Babai’s conjecture for high-rank classical groups with random generators

Abstract

Let \(G = {\text {SCl}}_n(q)\) be a quasisimple classical group with n large, and let \(x_1, \ldots , x_k \in G\) be random, where \(k \ge q^C\). We show that the diameter of the resulting Cayley graph is bounded by \(q^2 n^{O(1)}\) with probability \(1 - o(1)\). In the particular case \(G = {\text {SL}}_n(p)\) with p a prime of bounded size, we show that the same holds for \(k = 3\).

Introduction

Let G be a group and S a symmetric (\(S = S^{-1}\)) subset of G. Write \({\text {Cay}}(G, S)\) for the associated Cayley graph: the graph whose vertices are the elements \(g \in G\) and whose edges are pairs \(\{g, sg\}\) with \(g\in G, s\in S\). The graph \({\text {Cay}}(G, S)\) is connected if and only if S generates G, and its diameter is equal to the smallest d such that \((S \cup \{1\})^d = G\). A well-known conjecture of Babai [9] states that

$$\begin{aligned} {\text {diam}}{\text {Cay}}(G, S) = (\log |G|)^{O(1)}, \end{aligned}$$

uniformly over all nonabelian finite simple groups G and symmetric generating sets S. In other words, every connected Cayley graph of a nonabelian finite simple group has diameter within a power of the trivial lower bound.

By the classification of finite simple groups, Babai’s conjecture splits into essentially three broad cases:

  1. Groups of Lie type of bounded rank over \({\mathbf {F}}_q\) with \(q \rightarrow \infty \);

  2. Classical groups of unbounded rank over \({\mathbf {F}}_q\) with q arbitrary;

  3. Alternating groups \(A_n\) with \(n \rightarrow \infty \).

For groups of Lie type and bounded rank, Babai’s conjecture is now completely resolved, following breakthrough work of Helfgott [23], Pyber–Szabó [39], and Breuillard–Green–Tao [5]. In the other two cases the conjecture remains open. For the alternating groups, Helfgott and Seress [25] proved that

$$\begin{aligned} {\text {diam}}{\text {Cay}}(A_n, S) = \exp O((\log n)^4 \log \log n). \end{aligned}$$

For comparison, Babai’s conjecture (folkloric in this case) asserts that

$$\begin{aligned} {\text {diam}}{\text {Cay}}(A_n, S) = n^{O(1)}; \end{aligned}$$

thus we have a quasipolynomial bound instead of the expected polynomial bound. The case of classical groups of unbounded rank on the other hand is still wide open. The best bounds currently known are due to Biswas–Yang and Halasi–Maróti–Pyber–Qiao:

$$\begin{aligned} {\text {diam}}{\text {Cay}}(G, S)&\le q^{O(n (\log n + \log q)^3)}&\text {([BY17])};\nonumber \\ {\text {diam}}{\text {Cay}}(G, S)&\le q^{O(n (\log n)^2)}&\text {([HMPQ19])}. \end{aligned}$$
(1)

By contrast, Babai’s conjecture in this case asserts that

$$\begin{aligned} {\text {diam}}{\text {Cay}}(G, S) \le (n \log q)^{O(1)}, \end{aligned}$$

so the known bounds are still exponentially far from the conjectured one. A key open case is the family of groups \({\text {SL}}_n(2)\) with n tending to infinity.

In all cases, an important subproblem is the case of random generators (see, e.g., [38, Problem 10.8.6]). Let \(k \ge 2\) be a small constant and let \(S = \{x_1^{\pm 1}, \ldots , x_k^{\pm 1}\}\), where \(x_1, \ldots , x_k \in G\) are uniform and independent. For groups of Lie type of bounded rank, it was proved by Breuillard, Green, Guralnick, and Tao [4] that \({\text {Cay}}(G, S)\) is almost surely an expander, and in particular

$$\begin{aligned} {\text {diam}}{\text {Cay}}(G, S) = O(\log |G|). \end{aligned}$$

There is no consensus about whether such a strong bound is likely to hold for groups of unbounded rank. Babai’s conjecture for \(A_n\) and random generators was an open problem for some time. The first polynomial bound was proved by Babai and Hayes, and the exponent has been lowered by Schlage-Puchta and Helfgott–Seress–Zuk:

$$\begin{aligned} {\text {diam}}{\text {Cay}}(A_n, S)&\le n^{7+o(1)}&\text {([BH05])}; \nonumber \\ {\text {diam}}{\text {Cay}}(A_n, S)&\le O(n^3 \log n)&\text {([SP12])}; \nonumber \\ {\text {diam}}{\text {Cay}}(A_n, S)&\le n^2 (\log n)^{O(1)}&\text {([HSZ15])}. \end{aligned}$$
(2)

In this paper we consider the case of high-rank classical groups over a small field. Recall that these are obtained from the groups

$$\begin{aligned} \begin{array}{llll} {\text {GL}}_n(q),&{\text {Sp}}_n(q),&{\text {GO}}_{n}^{(\pm )}(q),&{\text {GU}}_n(q), \end{array} \end{aligned}$$

of automorphisms of a finite vector space \(V = {\mathbf {F}}_q^n\), in the latter three cases equipped with a nondegenerate alternating, quadratic, or hermitian form, respectively. Throughout we write \({\text {GCl}}_n(q)\) for any of these groups, and \({\text {SCl}}_n(q)\) for the corresponding derived subgroup

$$\begin{aligned} \begin{array}{llll} {\text {SL}}_n(q),&{\text {Sp}}_n(q),&\Omega _{n}^{(\pm )}(q),&{\text {SU}}_n(q). \end{array} \end{aligned}$$

We will write \({\text {Cl}}_n(q)\) for any intermediate group:

$$\begin{aligned} {\text {SCl}}_n(q) \le {\text {Cl}}_n(q) \le {\text {GCl}}_n(q). \end{aligned}$$

Omitting a few small exceptional cases, \({\text {SCl}}_n(q)\) is a quasisimple group, so Babai’s conjecture applies. For \({\text {SCl}}_n(q)\) with n large and random generators, the best bound previously known is just the uniform bound (1).

There is a promising programme of Pyber, which aims to prove Babai’s conjecture in three steps. The programme is motivated by the positive solution in the case of random generators in alternating groups, especially the result of Babai–Beals–Seress [3] that \({\text {diam}}{\text {Cay}}(A_n, S) \le n^{O(1)}\) provided only that S contains an element of degree at most \(n/(3 + \epsilon )\). Here the degree of a permutation is the number of non-fixed points. Analogously, the degree of an element \(g \in {\text {GL}}_n(q)\) is defined to be the rank of \(g - 1\), and Pyber’s programme is the following.

  1. Given some generators, find an element whose degree is at most \((1-\epsilon )n\).

  2. Given an element of degree \((1-\epsilon )n\), find an element of minimal degree.

  3. Given an element whose degree is minimal, finish the proof.

In the case of alternating groups, step 3 is essentially trivial, since there are only \(O(n^3)\) 3-cycles in \(A_n\), but for \({\text {SCl}}_n(q)\) it is highly nontrivial. In the case of \({\text {SL}}_n(p)\), p prime, step 3 was accomplished recently by Halasi [22].

We have two things to contribute in the case of large n, small q. First, assuming we have at least 3 random generators, we will do steps 1 and 2 of Pyber’s programme.

Theorem 1.1

Let \(G = {\text {Cl}}_n(q)\), and assume \(\log q < c n / \log ^2 n\) for a sufficiently small constant \(c>0\). Let \(x, y, z \in G\) be random. Then with probability \(1 - e^{-cn}\) there is a word \(w \in F_3\) of length \(n^{O(\log q)}\) such that \(w(x, y, z)\) has minimal degree in \(G' = {\text {SCl}}_n(q)\).

Combined with Halasi’s result, this settles Babai’s conjecture for \({\text {SL}}_n(p)\), p prime and bounded, with at least 3 random generators.

Theorem 1.2

Let \({\text {SL}}_n(p) \le G \le {\text {GL}}_n(p)\), where p is prime and \(\log p < cn / \log ^2 n\). Let \(x, y, z\) be elements of G chosen uniformly at random, and let \(S = \{ x^{\pm 1}, y^{\pm 1}, z^{\pm 1} \}\). Then with probability \(1 - e^{-cn}\) we have

$$\begin{aligned}&\langle S \rangle \ge {\text {SL}}_n(p),~\text {and} \\&{\text {diam}}{\text {Cay}}(\langle S \rangle , S) \le n^{O(\log p)}. \end{aligned}$$

Second, assuming we have sufficiently many random generators depending on q, we will do step 3 in a particularly satisfactory way. In fact, we will prove that the Schreier graph of the action of G on O(1)-tuples of vectors is almost surely a union of expander graphs. (The analogous result for the symmetric group is a result of Friedman, Joux, Roichman, Stern, and Tillich [15], and was essential in [26].)

Theorem 1.3

Let \(G = {\text {Cl}}_n(q)\), and let \(x_1, \ldots , x_k \in G\) be random. Let W be the set of r-tuples of vectors in the natural module \(V = {\mathbf {F}}_q^n\). Assume that \(r < cn^{1/3}\), and that \(k \ge q^{C r^3}\). Then almost surely the Schreier graph of G generated by \(x_1, \ldots , x_k\) on any of its orbits in W has a uniform spectral gap.

As we will explain, this implies that if we have an element of minimal degree then by conjugation we can rapidly obtain a full conjugacy class of elements of minimal degree, and it follows in short order that the diameter of G is not too large. This completes the proof of Babai’s conjecture for \({\text {SCl}}_n(q)\) for k random generators, as long as k is sufficiently large compared to q.

Theorem 1.4

There are constants \(c, C>0\) so that the following holds. Let \(G = {\text {Cl}}_n(q)\), where \(n > C\). Let \(x_1, \ldots , x_k\) be elements of G chosen uniformly at random, where \(k > q^C\), and let \(S = \{ x_1^{\pm 1}, \ldots , x_k^{\pm 1}\}\). Then with probability \(1 - q^{-cn}\) we have

$$\begin{aligned}&\langle S \rangle \ge {\text {SCl}}_n(q),~\text {and} \\&{\text {diam}}{\text {Cay}}(\langle S \rangle , S) \le q^2 n^C. \end{aligned}$$

Corollary 1.5

Babai’s conjecture holds in the following two cases:

  1. (1)

    \({\text {SL}}_n(p)\), p prime and bounded, and at least 3 random generators;

  2. (2)

    \({\text {SCl}}_n(q)\) and at least \(q^C\) random generators, where C is an absolute constant.

Our method does not depend on the classification of finite simple groups (CFSG) in any way. Having a CFSG-free method is valuable for transparency, but moreover we think it is essential for attacking Babai’s conjecture. It is well-known that two random elements of \({\text {SCl}}_n(q)\) almost surely generate the group: this is a result of Kantor and Lubotzky [30]. Kantor and Lubotzky rely on CFSG through Aschbacher’s theorem, so unfortunately their method does not adapt well to proving diameter bounds. By contrast, in [14] the first author and Virchow found a CFSG-free proof in the case of \({\text {SL}}_n(q)\) and expressed the hope that the method would be generalizable. We recycle several ideas from that paper in the present one.

Perhaps the most important idea in our method is the idea that if \(x, y, z \in G\) are random and independent, then the elements \(x w(y, z)\) for all short words \(w \in F_2\) behave roughly independently, which allows us to imitate having many more than just 3 generators. This is a more powerful version of the “\(xy^i\) trick”, which comes originally from [3, Sect. 4] and has been essential in all subsequent work on the random generator subproblem in high rank.

Let us mention one further result, of independent interest. In the appendix we give analogous arguments for \(A_n\), based on the standard fanciful idea that \(A_n = {\text {PSL}}_n(1)\). The value of doing so is mostly motivational, but we also obtain a new result. Provided \(k \ge 3\), we sharpen (2) to

$$\begin{aligned} {\text {diam}}{\text {Cay}}(A_n, S) \le O(n^2 \log n). \end{aligned}$$

This is a modest improvement, but it is interesting for being conjecturally sharp for any proof which uses elements of small support as a stepping stone. Decreasing the exponent 2 appears to require a radically new idea.

Reader’s guide

We first record some preliminaries (Sect. 2) regarding asymptotic notation, Cayley and Schreier graphs, classical groups and their associated formed spaces and the notions of degree and support, and adjacency operators.

Next we turn to a more specialized preparatory section (Sect. 3) dealing with word maps, where we introduce the vocabulary of queries, coincidences, and trajectories. Briefly, the idea is that if \(w \in F_k\) is a given word, \(v \in V\) a given vector, and \(x_1, x_2, \ldots , x_k \in G\) random, then evaluating \(w(x_1, \ldots , x_k) v\) can be thought of as a kind of random walk. As much as possible we recycle the key language used by [15] in the case of the symmetric group. The tools of this section will be used in two essentially different ways in the rest of the paper.

We proceed (Sect. 4) by showing that a given short word w evaluated at random elements \(x_1, \ldots , x_k \in G\) almost surely has large support (Theorem 4.2). This is a kind of antithesis to step 1 of Pyber’s programme: all sufficiently short words in random generators will in fact fail to have degree at most \((1-\epsilon )n\). However, this is interesting when combined with recent character bounds of Guralnick–Larsen–Tiep [18, 19], as it implies that the character ratio \(\chi (w(x_1, \ldots , x_k)) / \chi (1)\) is almost surely small for each nonlinear character \(\chi \) (Corollary 5.3).

This bound on the expectation of \(\chi (w(x_1, \ldots , x_k)) / \chi (1)\) is one of the two main ingredients in the “\(x w(y, z)\) trick”, which is the subject of Sect. 6. This trick shows that, given random generators \(x_0, x_1, \ldots , x_k\), one can almost surely find a short word \(x_0 w(x_1, \ldots , x_k)\) lying in a given normal subset \({\mathfrak {C}}\subseteq G\), provided that the density of \({\mathfrak {C}}\) is large compared to the expected values of character ratios. The trick is a simple consequence of the second moment method, following the observation that the elements \(x_0 w(x_1, \ldots , x_k)\) for various w are approximately pairwise independent.

The other main ingredient is the construction of an appropriate normal set \({\mathfrak {C}}\). This is the subject of Sect. 7. For each classical group we find a large normal set \({\mathfrak {C}}\), all of whose fibres over \(G^\text {ab}\) are large (allowing us to ignore linear characters), and a small integer m such that for every \(g \in {\mathfrak {C}}\) the power \(g^m\) has minimal degree in \({\text {SCl}}_n(q)\). This completes the proof of Theorem 1.1.

Once we have an element of minimal degree, we can act on that element by conjugation. Since the minimal degree in all cases is at most 2, this action is a constituent of the usual permutation action on 4-tuples of vectors. We analyze this action by again using the language of trajectories and coincidences, and the trace method: we bound a high moment of the second eigenvalue by bounding the trace of the corresponding power of the adjacency matrix, interpreting the latter in terms of closed trajectories. This is analogous to a result for the symmetric group due to Friedman, Joux, Roichman, Stern, and Tillich [15], building on earlier work of Broder–Shamir [8]. However, in the case of classical groups there are some extra combinatorial complications that do not arise for symmetric groups.

We first focus (Sect. 8) on describing the structure of a closed trajectory with only one coincidence. We deal with the motivational case of G acting on V first, and then generalize to the action on tuples of vectors.

These results are then (Sect. 9) used to show that, in an orbit of G of size N, the probability that a trajectory closes is close to 1/N, with a small relative error. Again we first deal with the motivational case of G acting on V. Provided that we have sufficiently many generators in terms of q, these bounds are good enough for the trace method to work. This completes the proof of Theorem 1.3.

Finally, in Sect. 10 we collect results and deduce Theorems 1.2 and 1.4.

Many (but not all) of our arguments have natural analogues for the symmetric group. For independent interest and for motivation, these are presented in Appendix A.

Preliminaries

This section fixes some notation and definitions that will be relevant throughout the paper. The reader needing an introduction to expansion, particularly in Cayley and Schreier graphs, could consult Kowalski [31]. For an introduction to classical groups, see Aschbacher [2, Chapter 7] or Grove [20].

Asymptotic notation

Many of the arguments we will use are asymptotic in nature, and we adopt standard asymptotic notation to state them. Given functions \(f, g\), we write \(f \ll g\) or equivalently \(f = O(g)\) to denote that there are absolute constants \(N, C > 0\) so that \(|f(n)| \le C \cdot g(n)\) for all \(n \ge N\). Let \(f \asymp g\) mean that \(f \ll g\) and \(g \ll f\). We write \(f = o(g)\) to denote that for every \(\epsilon > 0\) there is a constant N so that \(|f(n)| \le \epsilon \cdot g(n)\) for all \(n \ge N\). Let \(f = \omega (g)\) mean that \(g = o(f)\).

We will generally write statements that involve anonymous (usually absolute) constants by using c for small constants and C for big constants.

Cayley and Schreier graphs

Let G be a group with generating set S satisfying \(S = S^{-1}\). The (undirected, left) Cayley graph \({\text {Cay}}(G,S)\) is the graph whose vertices are elements of G and whose edges are pairs \(\{ g, s g \}\) for \(g \in G, s \in S\).

More generally, the (undirected) Schreier graph \({\text {Sch}}(G,S,\Omega )\) associated to a transitive action of G on a set \(\Omega \) is the graph whose vertices are elements of \(\Omega \) and whose edges are pairs \(\{ \omega , s \omega \}\) for \(\omega \in \Omega , s \in S\). Cayley graphs are Schreier graphs for the left regular representation of G on itself.

Let \(\Gamma \) be a connected graph. One can view \(\Gamma \) as a metric space in the following way. Define the length of a path in \(\Gamma \) to be the number of edges on the path, and let the distance \(d_\Gamma (v_1, v_2)\) between any two vertices \(v_1, v_2 \in V(\Gamma )\) be the length of the shortest path between \(v_1, v_2\). The diameter of a graph \(\Gamma \) is

$$\begin{aligned} {\text {diam}}\Gamma = \max _{v_1, v_2 \in V(\Gamma )} d_{\Gamma }(v_1, v_2). \end{aligned}$$

The diameter of \({\text {Cay}}(G, S)\) is just the smallest \(d \ge 0\) such that \((S \cup \{1\})^d = G\).
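As a small illustration of this characterization, the following Python sketch computes the diameter of a Cayley graph of a matrix group over \({\mathbf {F}}_p\) by breadth-first search from the identity. The names `mat_mul` and `cayley_diameter` are ours, not the paper's, and the method is only feasible for very small groups. It assumes that the generating set passed in is symmetric, and it uses the fact that a Cayley graph is vertex-transitive, so the eccentricity of the identity equals the diameter.

```python
from collections import deque


def mat_mul(a, b, p):
    """Product of two square matrices (tuples of tuples) modulo the prime p."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) % p for j in range(n))
        for i in range(n)
    )


def cayley_diameter(gens, p):
    """Return (order of <gens>, diameter of the left Cayley graph).

    gens is assumed to be a symmetric set of invertible matrices over F_p.
    """
    n = len(gens[0])
    identity = tuple(tuple(int(i == j) for j in range(n)) for i in range(n))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for s in gens:
            h = mat_mul(s, g, p)  # edge {g, s g}
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    return len(dist), max(dist.values())


# Example: SL_2(2) generated by two transvections, each its own inverse.
a = ((1, 1), (0, 1))
b = ((1, 0), (1, 1))
order, diam = cayley_diameter([a, b], 2)
```

Here `diam` is the smallest d with \((S \cup \{1\})^d = G\), since the ball of radius d around the identity is exactly \((S \cup \{1\})^d\).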

Classical groups

Throughout the paper we write \({\text {SCl}}_n(q) \le {\text {GCl}}_n(q) \le {\text {GL}}_n(q)\) for any of the following groups:

$$\begin{aligned} \begin{array}{rlllll} {\text {GCl}}_n(q):&{} &{}{\text {GL}}_n(q), &{}{\text {Sp}}_n(q), &{}{\text {GO}}_{n}^{(\pm )}(q), &{}{\text {GU}}_n(q), \\ {\text {SCl}}_n(q):&{} &{}{\text {SL}}_n(q), &{}{\text {Sp}}_n(q), &{}\Omega _{n}^{(\pm )}(q), &{}{\text {SU}}_n(q). \end{array} \end{aligned}$$

In all cases the defining module is \(V = {\mathbf {F}}_q^n\). We sometimes refer to the first case as the linear case. We make the following conventions in the other cases (notation in other literature sometimes differs, particularly in the \({\text {GU}}\) case):

\({\text {Sp}}_n\): n must be even.

\({\text {GO}}_n^{(\pm )}\): \(\Omega _n(q) = {\text {SO}}_n(q)'\). If n is even there are two possibilities, denoted \({\text {GO}}_n^+(q)\) and \({\text {GO}}_n^-(q)\), depending on the choice of quadratic form. If n is odd there is only \({\text {GO}}_n(q)\), and q must be odd.

\({\text {GU}}_n\): q must be a square \(q_0^2\). The field automorphism of \({\mathbf {F}}_q\) of order 2 is denoted \(\theta \).

We write \({\text {Cl}}_n(q)\) for any intermediate group:

$$\begin{aligned} {\text {SCl}}_n(q) \le {\text {Cl}}_n(q) \le {\text {GCl}}_n(q). \end{aligned}$$

Note that any such group corresponds to a subgroup of the abelianization \({\text {GCl}}_n(q)^\text {ab}\), which is given as follows:

$$\begin{aligned} {\text {GL}}_n(q)^\text {ab}&\cong {\mathbf {F}}_q^\times ,\\ {\text {Sp}}_n(q)^\text {ab}&\cong 1,\\ {\text {GO}}_n^{(\pm )}(q)^\text {ab}&\cong C_2 \times C_2&(q~\text {odd}, n\ge 2),\\ {\text {GO}}_n^{\pm }(q)^\text {ab}&\cong C_2&(q~\text {even}, n~\text {even}),\\ {\text {GU}}_n(q)^\text {ab}&\cong \{u \in {\mathbf {F}}_q : u u^\theta = 1\}. \end{aligned}$$

Binary and quadratic forms

In all cases we write f for the defining invariant binary form; thus f is zero in the linear case, alternating in the symplectic case, symmetric in the orthogonal case, and hermitian in the unitary case. Except in the linear case, f is nondegenerate.

In the orthogonal case, we write Q for the relevant quadratic form. Recall that Q is related to f by

$$\begin{aligned} Q(u + v) = Q(u) + Q(v) + f(u, v); \end{aligned}$$
(3)

in particular, in odd characteristic,

$$\begin{aligned} Q(v) = f(v, v) / 2. \end{aligned}$$

In even characteristic, Q is not determined by f, but is part of the defining data (and f is determined by Q via (3)). In the unitary case we write Q for the function

$$\begin{aligned} Q(v) = f(v, v), \end{aligned}$$

which we may regard as a quadratic form over \({\mathbf {F}}_{q_0}\). In the other cases define \(Q \equiv 0\). Define also \(q_0 = q\) in the orthogonal case and \(q_0=1\) in the linear and symplectic cases, so that Q always takes values in a \(q_0\)-element space.

It is important that we are able to count solutions to \(Q(v) = x\) in any affine subspace.

Lemma 2.1

Let \(v_0 + W\) be an affine subspace of V of codimension s. The number of \(v \in v_0 + W\) with a specified value of Q(v) differs from \(q^{n - s}/q_0\) by at most \(q^{n/2}\).

Proof

(Cf. Dickson [11, Chapter IV].) This is trivial in the linear and symplectic cases: \(Q \equiv 0\), so the number is exactly \(q^{n-s}\). The unitary case reduces to the orthogonal case by restriction of scalars, so it suffices to consider the orthogonal case.

For \(x \in {\mathbf {F}}_q\), let

$$\begin{aligned} \Phi (x) = |\{v \in v_0 + W : Q(v) = x\}|. \end{aligned}$$

The Fourier transform of \(\Phi \) is

$$\begin{aligned} \widehat{\Phi }(\chi )&= \sum _{x \in {\mathbf {F}}_q} \Phi (x) \overline{\chi (x)}&(\chi \in \widehat{{\mathbf {F}}_q}) \\&= \sum _{w \in W} \chi (-Q(v_0 + w)). \end{aligned}$$

For nontrivial \(\chi \) we have

$$\begin{aligned} |\widehat{\Phi }(\chi )|^2&= \sum _{w, h \in W} \chi (-Q(v_0+w) + Q(v_0+w+h)) \\&= \sum _{w, h \in W} \chi (Q(h) + f(v_0+w,h)). \end{aligned}$$

The sum over w is zero unless \(h \in W^\perp \). Note that \(\dim W^\perp = s\). Hence

$$\begin{aligned} |\widehat{\Phi }(\chi )|^2 \le |W|\,|W^\perp | = q^n. \end{aligned}$$

By Fourier inversion we have

$$\begin{aligned} \Phi (x) = q^{n-s-1} + \frac{1}{q} \sum _{1\ne \chi \in \widehat{{\mathbf {F}}_q}} \widehat{\Phi }(\chi ) \chi (x), \end{aligned}$$

so

$$\begin{aligned} |\Phi (x) - q^{n-s-1}| \le \frac{1}{q} \sum _{1 \ne \chi \in \widehat{{\mathbf {F}}_q}} |\widehat{\Phi }(\chi )| \le q^{n/2}. \end{aligned}$$
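The estimate can be checked by brute force in a tiny orthogonal example. The Python sketch below is our own illustration, not part of the paper: it takes the form \(Q(v) = v_1 v_2 + v_3 v_4\) over \({\mathbf {F}}_3\) (so \(n = 4\) and \(q_0 = q = 3\)) and counts the vectors with each value of Q, first in all of V and then in the affine hyperplane \(v_1 = 1\). The helper names `value_counts` and `hyperbolic_form` are hypothetical.

```python
from collections import Counter
from itertools import product


def value_counts(Q, points, q):
    """Number of points with each value of Q modulo the prime q."""
    counts = Counter({x: 0 for x in range(q)})
    for v in points:
        counts[Q(v) % q] += 1
    return dict(counts)


def hyperbolic_form(v):
    # Q(v) = v1*v2 + v3*v4, a nondegenerate quadratic form in dimension 4.
    return v[0] * v[1] + v[2] * v[3]


q, n = 3, 4
whole_space = list(product(range(q), repeat=n))      # codimension s = 0
hyperplane = [v for v in whole_space if v[0] == 1]   # affine, codimension s = 1

counts_V = value_counts(hyperbolic_form, whole_space, q)
counts_H = value_counts(hyperbolic_form, hyperplane, q)

# Lemma 2.1 predicts |count - q^(n-s-1)| <= q^(n/2) for every value of Q.
ok_V = all(abs(c - q ** (n - 1)) <= q ** (n // 2) for c in counts_V.values())
ok_H = all(abs(c - q ** (n - 2)) <= q ** (n // 2) for c in counts_H.values())
```

In the whole space the counts deviate from the main term \(q^{n-1} = 27\) by at most 6, well inside the error \(q^{n/2} = 9\); in the hyperplane the counts are exactly \(q^{n-2} = 9\).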

Relatedly, we have Witt’s lemma, which characterizes the orbits of \({\text {GCl}}_n(q)\) in terms of f and Q.

Lemma 2.2

(Witt’s lemma) Let \(u_1, \ldots , u_k, v_1, \ldots , v_k \in V\) be vectors such that

$$\begin{aligned} \dim \langle u_1, \ldots , u_k \rangle&= \dim \langle v_1, \ldots , v_k\rangle \\ f(u_i, u_j)&= f(v_i, v_j)&(1 \le i,j \le k) \\ Q(u_i)&= Q(v_i)&(1 \le i \le k). \end{aligned}$$

Then there is an element \(g \in {\text {GCl}}_n(q)\) such that \(g u_i = v_i\) for each \(1 \le i \le k\). If \(k \le n-2\) there is such an element in \({\text {SCl}}_n(q)\).

Proof

See, e.g., [2, Sect. 20]. \(\square \)

Degree and support

The concepts of degree and support are essential in the rest of the paper. Both are analogous to the size of the support of a permutation, where the support is the set of non-fixed points. The degree of an element \(g \in {\text {GL}}_n(q)\) is

$$\begin{aligned} \deg g = {\text {rank}}(g - 1); \end{aligned}$$

the support of \(g \in {\text {GL}}_n(q)\) is

$$\begin{aligned} {\text {supp}}g = \min _{\lambda \in \overline{{\mathbf {F}}_q}} {\text {rank}}(g - \lambda ) \end{aligned}$$

(the former definition follows [10] and [24]; the latter definition follows Larsen–Shalev–Tiep [37]). Equivalently, if \(V_\lambda = \ker (g - \lambda )\) denotes the \(\lambda \)-eigenspace of g (for \(\lambda \in \overline{{\mathbf {F}}_q}\)), then

$$\begin{aligned} \deg g&= {\text {codim}}V_1, \\ {\text {supp}}g&= \min _{\lambda \in \overline{{\mathbf {F}}_q}} {\text {codim}}V_\lambda . \end{aligned}$$
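To make the two definitions concrete, here is a minimal Python sketch for matrices over a prime field \({\mathbf {F}}_p\). The function names are ours. One simplification is an assumption of the sketch, not of the paper: `support_over_prime_field` minimizes only over \(\lambda \in {\mathbf {F}}_p\) rather than over the algebraic closure, so in general it returns an upper bound for \({\text {supp}}\, g\).

```python
def rank_mod_p(matrix, p):
    """Rank of an integer matrix over F_p (p prime), by Gaussian elimination."""
    rows = [[x % p for x in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [x * inv % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def shifted(g, lam):
    """The matrix g - lam * identity."""
    n = len(g)
    return [[g[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]


def degree(g, p):
    """deg g = rank(g - 1)."""
    return rank_mod_p(shifted(g, 1), p)


def support_over_prime_field(g, p):
    """min over lam in F_p of rank(g - lam); an upper bound for supp g."""
    return min(rank_mod_p(shifted(g, lam), p) for lam in range(p))


transvection = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
scalar = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
```

Over \({\mathbf {F}}_3\) the transvection has degree 1 and support 1, while the scalar matrix has degree 3 but support 0: degree and support can differ as much as possible.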

Support is closely related to the size of the centralizer, as in the following lemma.

Lemma 2.3

For \(g \in G \le {\text {GL}}_n(q)\),

$$\begin{aligned} |C_G(g)| \le q^{n (n - {\text {supp}}g)}. \end{aligned}$$

Proof

(Cf. [35, Lemma 3.1].) Clearly

$$\begin{aligned} |C_G(g)| \le |C_{{\text {M}}_n({\mathbf {F}}_q)}(g)|. \end{aligned}$$

Note that \(C_{{\text {M}}_n({\mathbf {F}}_q)}(g)\) is a vector space over \({\mathbf {F}}_q\), so it will suffice to bound its dimension. Consider g as an element of \({\text {GL}}_n(\overline{{\mathbf {F}}_q})\) and decompose it into Jordan blocks. For each eigenvalue \(\lambda \) of g, let \(\pi _\lambda \) be the partition whose parts are the sizes of Jordan blocks associated to \(\lambda \). Denote by \(S^i(\pi )\) the sum of ith powers of the parts of a partition \(\pi \) and let \(\pi '\) be the transposed partition of \(\pi \). By [27, Sect. 1.3],

$$\begin{aligned} \dim C_{{\text {M}}_n({\mathbf {F}}_q)}(g) = \sum _{\lambda } S^2(\pi _\lambda '). \end{aligned}$$

The largest part of \(\pi _\lambda '\) is the dimension of \(V_\lambda \), so

$$\begin{aligned} S^2(\pi _\lambda ') \le S^1(\pi _\lambda ') \dim V_\lambda . \end{aligned}$$

Combined with \(\sum _{\lambda } S^1(\pi _\lambda ') = n\), this implies

$$\begin{aligned} \dim C_{{\text {M}}_n({\mathbf {F}}_q)}(g) \le n \max _\lambda \dim V_\lambda = n (n - {\text {supp}}g). \end{aligned}$$

Adjacency operator

Given any group G and \(x_1, \ldots , x_k \in G\), let

$$\begin{aligned} {\mathcal {A}}= {\mathcal {A}}_{x_1, \ldots , x_k} = \frac{1}{2k} \sum _{i=1}^{k} (x_i + x_i^{-1}). \end{aligned}$$

This is an element of the group algebra \({\mathbf {C}}[G]\). Given any \({\mathbf {C}}[G]\)-module W, we may consider the action of \({\mathcal {A}}\) on W. Since \({\mathcal {A}}\) is self-adjoint its spectrum is real. Write \(\rho ({\mathcal {A}}, W)\) for the spectral radius of \({\mathcal {A}}\).

We are most interested in permutation modules. If G acts transitively on a set \(\Omega \) then there is a corresponding permutation module \({\mathbf {C}}[\Omega ]\) containing a single copy of the trivial representation, denoted \({\mathbf {C}}[\Omega ]^G\). Let \(W = {\mathbf {C}}[\Omega ]_0\) denote the orthogonal complement of \({\mathbf {C}}[\Omega ]^G\). The spectral gap is \(1 - \rho ({\mathcal {A}}, W)\). Equivalently, if \({\mathcal {A}}\) acting on \({\mathbf {C}}[\Omega ]\) has spectrum

$$\begin{aligned} 1 = \lambda _1 \ge \lambda _2 \ge \cdots \ge \lambda _N \ge -1, \end{aligned}$$

where \(N = |\Omega |\), then

$$\begin{aligned} \rho ({\mathcal {A}}, W) = \max (\lambda _2, -\lambda _N), \end{aligned}$$

so the spectral gap is

$$\begin{aligned} \min (1 - \lambda _2, 1 - |\lambda _N|). \end{aligned}$$

We say the action of \(x_1, \ldots , x_k\) on \(\Omega \) is expanding if the spectral gap is bounded away from zero. This is equivalent to rapid mixing of the random walk on \(\Omega \).
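For a small permutation action one can bound \(\rho ({\mathcal {A}}, W)\) without diagonalizing, using only traces of powers: since the spectrum of \({\mathcal {A}}\) on \({\mathbf {C}}[\Omega ]\) consists of the eigenvalue 1 together with the spectrum on W, we have \({\text {tr}}\, {\mathcal {A}}^{2m} \ge 1 + \rho ({\mathcal {A}}, W)^{2m}\). The Python sketch below implements this elementary bound in floating point; the function names and the cyclic example are our own choices, and the sketch is only meant to illustrate the definitions.

```python
def adjacency_matrix(perms):
    """Matrix of A = (1/2k) * sum(x_i + x_i^{-1}) acting on C[Omega].

    Omega = range(N); each perm is a list with perm[i] the image of i.
    """
    n, k = len(perms[0]), len(perms)
    a = [[0.0] * n for _ in range(n)]
    for perm in perms:
        for i, j in enumerate(perm):
            a[j][i] += 1 / (2 * k)  # x sends the point i to the point j
            a[i][j] += 1 / (2 * k)  # x^{-1} sends the point j back to i
    return a


def mat_mul(a, b):
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def trace_bound(perms, m):
    """Upper bound (tr A^(2m) - 1)^(1/(2m)) for the spectral radius on W."""
    a = adjacency_matrix(perms)
    power = a
    for _ in range(2 * m - 1):
        power = mat_mul(power, a)
    trace = sum(power[i][i] for i in range(len(a)))
    return max(trace - 1.0, 0.0) ** (1 / (2 * m))


# Example: Z/5 acting on itself, with the single generator x = +1.
cycle = [1, 2, 3, 4, 0]
bound = trace_bound([cycle], 10)
spectral_gap_lower_bound = 1 - bound
```

In this example the true value is \(\rho = \cos (\pi /5) \approx 0.809\), and the trace bound with \(2m = 20\) is only slightly larger.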

Word maps, queries, and trajectories

Word maps

Write \(F_k = F\{\xi _1, \ldots , \xi _k\}\) for the free group with generators \(\{\xi _1, \ldots , \xi _k\}\). Let \(w \in F_k\) have length \(\ell \), and let

$$\begin{aligned} w = w_\ell \cdots w_1 \qquad (w_i \in \{\xi _1^{\pm 1}, \ldots , \xi _k^{\pm 1}\}) \end{aligned}$$

be the reduced expression of w. Let G be a finite group and \(x_1, \ldots , x_k \in G\). Write

$$\begin{aligned} {{\overline{w}}} = w(x_1, \ldots , x_k) \end{aligned}$$

for the image of w under the homomorphism \(F_k \rightarrow G\) defined by \(\xi _i \mapsto x_i\).

Usually, but not always, \(x_1, \ldots , x_k\) will be chosen randomly. The following lemma is often useful for reducing to the cyclically reduced case.

Lemma 3.1

If \(x_1, \ldots , x_k \in G\) are uniform and independent then \({{\overline{w}}}\) is just the image of w under a uniformly random homomorphism \(F_k \rightarrow G\). In particular, the distribution of \({{\overline{w}}}\) depends only on the automorphism class of w.
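The lemma can be checked exhaustively in a very small group. The following Python sketch, with hypothetical helper names and the symmetric group \(S_3\) standing in for G, enumerates all pairs \((x_1, x_2)\) and tabulates \({{\overline{w}}}\). The words \(\xi _1\) and \(\xi _1 \xi _2\) differ by an automorphism of \(F_2\), so they give the same (uniform) distribution, whereas \(\xi _1^2\) does not.

```python
from collections import Counter
from itertools import permutations, product


def compose(a, b):
    """Composition of permutations given as tuples: (a * b)(i) = a[b[i]]."""
    return tuple(a[j] for j in b)


def inverse(a):
    inv = [0] * len(a)
    for i, j in enumerate(a):
        inv[j] = i
    return tuple(inv)


def evaluate(word, xs):
    """Evaluate w = w_l ... w_1 at xs.

    word lists the letters in application order [w_1, ..., w_l]; each letter
    is a pair (i, e) standing for xi_{i+1}^e with e = +1 or -1.
    """
    g = tuple(range(len(xs[0])))
    for i, e in word:
        g = compose(xs[i] if e == 1 else inverse(xs[i]), g)
    return g


def word_distribution(word, k, n):
    """Exact distribution of the word map over all k-tuples in S_n."""
    group = list(permutations(range(n)))
    return Counter(evaluate(word, xs) for xs in product(group, repeat=k))


dist_letter = word_distribution([(0, 1)], 2, 3)           # w = xi_1
dist_product = word_distribution([(1, 1), (0, 1)], 2, 3)  # w = xi_1 xi_2
dist_square = word_distribution([(0, 1), (0, 1)], 2, 3)   # w = xi_1^2
```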

Queries and coincidences

Let \(G = {\text {Cl}}_n(q)\) be a classical group and \(V = {\mathbf {F}}_q^n\) the defining module. Let \(x_1, \ldots , x_k \in G\). Define a query to be a pair \((\xi , v)\), where \(\xi \in \{\xi _1^{\pm 1}, \ldots , \xi _k^{\pm 1}\}\) and \(v \in V\); the result of the query is \({\overline{\xi }} v\). After any finite sequence of queries

$$\begin{aligned} (w_1, v_1), (w_2, v_2), \ldots , (w_{t-1}, v_{t-1}) \end{aligned}$$

the known domain of a letter \(\xi \) at time t is

$$\begin{aligned} D_\xi ^t = {\text {span}}\{ v_i : w_i = \xi , i< t\} + {\text {span}}\{\overline{w_i} v_i : w_i = \xi ^{-1}, i < t\}. \end{aligned}$$

Suppose we make a further query \((w_t, v_t)\). If \(v_t \in D_{w_t}^t\), then the result \(\overline{w_t} v_t\) is determined already by the values of \(\overline{w_1} v_1, \ldots , \overline{w_{t-1}} v_{t-1}\); we call this a forced choice. Otherwise, we say the query is a free choice.

Let R be some subset of V fixed in advance. If a query \((w_t, v_t)\) is a free choice and yet

$$\begin{aligned} \overline{w_t} v_t \in {\text {span}}R + {\text {span}}\{v_1, \overline{w_1} v_1, \ldots , v_{t-1}, \overline{w_{t-1}} v_{t-1}, v_t\} \end{aligned}$$

then we say the result of the query is a coincidence.
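The bookkeeping in these definitions can be sketched in Python as follows. This is only an illustration under our own conventions: vectors are tuples over a prime field \({\mathbf {F}}_p\), a letter \(\xi _i^{e}\) is encoded as a pair `(i, e)`, and `images` maps each letter used to its matrix. The label "coincidence" below means a free choice whose result is a coincidence.

```python
def rank_mod_p(vectors, p):
    """Rank over F_p of a list of vectors (p prime)."""
    rows = [[x % p for x in v] for v in vectors]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [x * inv % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def in_span(v, vectors, p):
    return rank_mod_p(vectors + [v], p) == rank_mod_p(vectors, p)


def apply(matrix, v, p):
    return tuple(sum(a * b for a, b in zip(row, v)) % p for row in matrix)


def classify_queries(queries, images, p, R=()):
    """Label each query (letter, vector) as 'forced', 'free' or 'coincidence'."""
    history, labels = [], []  # history holds triples (letter, v, result)
    for letter, v in queries:
        i, e = letter
        # Known domain: earlier inputs of this letter, and earlier outputs
        # of the inverse letter.
        domain = [u for (l, u, r) in history if l == letter]
        domain += [r for (l, u, r) in history if l == (i, -e)]
        result = apply(images[letter], v, p)
        if in_span(v, domain, p):
            labels.append("forced")
        else:
            seen = list(R) + [x for (_, u, r) in history for x in (u, r)] + [v]
            labels.append("coincidence" if in_span(result, seen, p) else "free")
        history.append((letter, v, result))
    return labels


# x permutes the standard basis of F_2^3 cyclically: e1 -> e2 -> e3 -> e1.
x = ((0, 0, 1), (1, 0, 0), (0, 1, 0))
e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
xi = (0, 1)
labels = classify_queries([(xi, e1), (xi, e2), (xi, e3), (xi, e1)], {xi: x}, 2)
```

In this example the first two queries are free with no coincidence, the third is free but returns to the span of the vectors already seen, and the fourth is forced because \(e_1\) lies in the known domain.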

The language is most interesting when \(x_1, \ldots , x_k \in G\) are chosen randomly. Then, by Witt’s lemma, whenever \((\xi , v)\) is a free choice, \({\overline{\xi }} v\) is, conditionally on the result of previous queries, uniformly distributed among vectors satisfying the relevant independence and form conditions. In particular, coincidences are unlikely. We formalize these key points in the following lemmas.

Lemma 3.2

Let \(x \in G\) be uniformly random, and let \(u_1, \ldots , u_t\) be linearly independent, where \(t \le n-2\). Then, conditionally on the values of \(v_1 = x u_1, \ldots , v_{t-1} = x u_{t-1}\), the value of \(x u_t\) is uniformly distributed among vectors \(v_t\) such that \(u_i \mapsto v_i\) defines an isometric isomorphism \(\langle u_1, \ldots , u_t\rangle \rightarrow \langle v_1, \ldots , v_t \rangle \), or in other words such that \(v_t \notin {\text {span}}\{v_1, \ldots , v_{t-1}\}\) and \(f(u_i, u_t) = f(v_i, v_t)\) for each \(i\le t\) and \(Q(u_t) = Q(v_t)\).

Proof

For each such \(v_t\), Witt’s lemma asserts that there is at least one suitable \(x \in G\). The distribution is uniform by the orbit–stabilizer theorem. \(\square \)

Lemma 3.3

Let \(x_1, \ldots , x_k \in G\) be uniformly random and independent, and let

$$\begin{aligned} (w_1, v_1), (w_2, v_2), \ldots , (w_{t-1}, v_{t-1}) \end{aligned}$$

be a sequence of queries. Assume that \((w_t, v_t)\) is a free choice. Assume

$$\begin{aligned} \dim \langle v_1, \ldots , v_t\rangle \le n-2. \end{aligned}$$

Then, conditionally on the values of \(\overline{w_1} v_1, \ldots , \overline{w_{t-1}} v_{t-1}\), the result \(\overline{w_t} v_t\) of the query \((w_t, v_t)\) is uniformly distributed outside \(D_{w_t^{-1}}^t\) subject to

$$\begin{aligned} f(\overline{w_i} v_i, \overline{w_t} v_t)&= f(v_i, v_t)&(i< t, w_i = w_t),\\ f(v_i, \overline{w_t} v_t)&= f(\overline{w_i} v_i, v_t)&(i < t, w_i = w_t^{-1}),\\ Q(\overline{w_t} v_t)&= Q(v_t). \end{aligned}$$

In particular, the conditional probability that \(\overline{w_t} v_t\) is a coincidence is bounded by

$$\begin{aligned} \frac{q^d}{q^{n-s}/q_0 - q^s - q^{n/2}} \end{aligned}$$

(provided the denominator is positive), where

$$\begin{aligned} d = \dim ({\text {span}}R + {\text {span}}\{v_1, \overline{w_1} v_1, \ldots , v_{t-1}, \overline{w_{t-1}} v_{t-1}, v_t\}) \end{aligned}$$

and s is the number of \(i < t\) with \(w_i \in \{w_t, w_t^{-1}\}\).

Proof

The first part of the lemma is immediate from the previous lemma. For the second part, note that \(\overline{w_t} v_t\) is drawn from an affine subspace of codimension at most s, less a subspace of dimension at most s, subject only to the quadratic condition; by Lemma 2.1 there are at least \(q^{n-s}/q_0 - q^{n/2} - q^s\) possibilities, so we get at least the denominator claimed. \(\square \)

Remark 3.4

In the linear case there are no form conditions, so we get the simpler bound \(q^d / (q^n - q^s)\) for the probability of a coincidence.
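To get a feeling for the size of these bounds, the following Python sketch simply evaluates the two expressions in floating point. The function names are ours, and the parameter values in the example are arbitrary choices made for illustration.

```python
def coincidence_bound(q, n, s, d, q0=1):
    """The bound q^d / (q^(n-s)/q0 - q^s - q^(n/2)) of Lemma 3.3."""
    denominator = q ** (n - s) / q0 - q ** s - q ** (n / 2)
    if denominator <= 0:
        raise ValueError("the bound requires a positive denominator")
    return q ** d / denominator


def coincidence_bound_linear(q, n, s, d):
    """The simpler bound q^d / (q^n - q^s) of Remark 3.4 (linear case)."""
    denominator = q ** n - q ** s
    if denominator <= 0:
        raise ValueError("the bound requires a positive denominator")
    return q ** d / denominator


# Illustrative parameters: q = 2, n = 20, s = 2 earlier queries with the
# same letter, and a d = 5 dimensional span of known vectors.
general = coincidence_bound(2, 20, 2, 5, q0=1)   # symplectic case, q0 = 1
linear = coincidence_bound_linear(2, 20, 2, 5)
```

With these parameters both bounds are of order \(10^{-4}\) or smaller, and the linear bound is the smaller of the two, as expected since no form conditions restrict the result.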

Trajectories

Let \(w \in F_k\), and let

$$\begin{aligned} w = w_\ell \cdots w_1 \qquad (w_i \in \{\xi _1^{\pm 1}, \ldots , \xi _k^{\pm 1}\}) \end{aligned}$$

be the reduced expression. For each \(v \in V\), the trajectory of v is the sequence of queries \((w_t, v^{t-1})\), where \(v^0 = v\) and for each \(t \ge 1\) the vector \(v^t\) is the result of the query \((w_t, v^{t-1})\); in other words, the sequence \(v^0, v^1, \ldots , v^\ell \) is defined by

$$\begin{aligned} v^0&= v, \\ v^t&= \overline{w_t} v^{t-1}&(1\le t\le \ell ). \end{aligned}$$
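As a concrete illustration of this definition, the following Python sketch computes a trajectory in a toy linear case. The field \({\mathbf {F}}_3\), the two matrices standing in for the random generators, and the names `mat_vec` and `trajectory` are assumptions made for the example; they do not come from the text, and no form conditions are modelled.

```python
P = 3  # toy field F_3 (an assumption for this example)

# Hypothetical generators x, y in GL_2(F_3), listed together with their inverses.
GENS = {
    "x": [[1, 1], [0, 1]],
    "x^-1": [[1, 2], [0, 1]],
    "y": [[1, 0], [1, 1]],
    "y^-1": [[1, 0], [2, 1]],
}


def mat_vec(m, v, p=P):
    """Apply the matrix m to the column vector v over F_p."""
    return tuple(sum(a * b for a, b in zip(row, v)) % p for row in m)


def trajectory(letters, v, gens=GENS, p=P):
    """Return [v^0, v^1, ..., v^l] for the word w = w_l ... w_1.

    letters = [w_1, ..., w_l] is listed in order of application,
    so that v^t is the result of applying the letter w_t to v^(t-1).
    """
    traj = [tuple(v)]
    for letter in letters:
        traj.append(mat_vec(gens[letter], traj[-1], p))
    return traj


# The word w = y x has w_1 = x and w_2 = y.
print(trajectory(["x", "y"], (0, 1)))
```

Each step of the loop corresponds to one query \((w_t, v^{t-1})\) in the sense of the definition above.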

The following lemma is trivial but essential.

Lemma 3.5

Suppose \(v \ne 0\) and \(v^\ell \in {\text {span}}R\). Then there is at least one coincidence in the trajectory of v.

Proof

Since \(D_{w_1}^1 = 0\), the first query \((w_1, v^0)\) is free. For each \(t\ge 1\), if \((w_t, v^{t-1})\) is free and not a coincidence then

$$\begin{aligned} v^t = \overline{w_t} v^{t-1} \notin {\text {span}}R + {\text {span}}\{v^0, \ldots , v^{t-1}\}, \end{aligned}$$

while

$$\begin{aligned} D_{w_{t+1}}^{t+1} \le {\text {span}}\{v^0, \ldots , v^{t-1}\}; \end{aligned}$$

hence the query \((w_{t+1}, v^t)\) is also free. Finally if \((w_\ell , v^{\ell -1})\) is free and not a coincidence then \(v^\ell \notin {\text {span}}R\). \(\square \)

More generally for any \(r\ge 1\) we consider the joint trajectory of an r-tuple

$$\begin{aligned} (v_1, \ldots , v_r) \in V^r, \end{aligned}$$

which is simply the r-tuple of individual trajectories, with the queries \((w_t, v_i^{t-1})\) ordered lexicographically by \((t, i)\); i.e., we answer the queries

$$\begin{aligned} (w_1, v_1^0)&(w_1, v_2^0)&\cdots&(w_1, v_r^0) \\ (w_2, v_1^1)&(w_2, v_2^1)&\cdots&(w_2, v_r^1) \\&\vdots&\end{aligned}$$

in reading order. Write \(\prec \) for this order, i.e., \((t',i') \prec (t,i)\) if \(t' < t\) or \(t'=t\) and \(i' < i\). The following lemma generalizes the previous one.

Lemma 3.6

Suppose \(v_i \notin {\text {span}}\{v_1, \ldots , v_{i-1}\}\) and \(v_i^\ell \in {\text {span}}R\). Then there is at least one coincidence in the trajectory of \(v_i\) (during the joint trajectory of \(v_1, \ldots , v_r\)).

Proof

At time (1, i), we have

$$\begin{aligned} D_{w_1}^{(1, i)} \le {\text {span}}\{v_1, \ldots , v_{i-1}\}, \end{aligned}$$

so the first query \((w_1, v_i^0)\) is free. For each \(t\ge 1\), if \((w_t, v_i^{t-1})\) is free and not a coincidence then

$$\begin{aligned} v_i^t = \overline{w_t} v_i^{t-1} \notin {\text {span}}R + {\text {span}}\{v_{i'}^{t'} : (t', i') \prec (t, i)\} \end{aligned}$$

(the vectors \(v_{i'}^{t'}\) with \(t' = t\) and \(i' < i\) get included because they are results of previous queries), while

$$\begin{aligned} D_{w_{t+1}}^{(t+1, i)} \le {\text {span}}\{v_{i'}^{t'} : (t', i') \prec (t, i)\}; \end{aligned}$$

hence the query \((w_{t+1}, v_i^t)\) is also free. Finally if \((w_\ell , v_i^{\ell -1})\) is free and not a coincidence then \(v_i^\ell \notin {\text {span}}R\). \(\square \)

The probability of small support

Let G be a finite group, let \(w \in F_k\), let \(x_1, \ldots , x_k \in G\) be random, and consider \({{\overline{w}}} = w(x_1, \ldots , x_k)\). The probability that \({{\overline{w}}} = 1\) quantifies the extent to which w is “almost a law” in G. This probability is a well-studied quantity, particularly when G is simple. For example, it is known that for any \(w \ne 1\) there is some \(c = c(w) > 0\) such that \({\mathbf {P}}({{\overline{w}}} = 1) \le |G|^{-c}\) for all sufficiently large finite simple groups G (Larsen–Shalev [35, Theorem 1.1]).

For groups of large rank (our particular interest), the following bounds have been proved recently. Let \(\ell > 0\) be the reduced length of w.

  1. 1.

    For \(G = A_n\) or \(G = S_n\), if \(\ell < cn^{1/2}\) then

    $${\mathbf {P}}({{\overline{w}}}=1) \le e^{-c n / \ell ^2}$$

    (Eberhard [12, Lemma 2.2]).

  2. 2.

    For any classical group \(G = {\text {Cl}}_n(q)\), if \(\ell < cn\) then

    $${\mathbf {P}}({{\overline{w}}}=1) \le |G|^{-c/\ell }$$

    (Liebeck–Shalev [36, Theorem 4]).

The proofs of these estimates can be adapted to show more, namely that with high probability \({{\overline{w}}}\) has large support. In this section we explain this observation in detail in the case of \(G = {\text {Cl}}_n(q)\). For the case of \(G = A_n\) or \(G = S_n\), see the appendix (Subsection A.2).

The following lemma generalizes a key step from the argument of [36, Theorem 4].

Lemma 4.1

Let \(G = {\text {Cl}}_n(q)\) be a classical group of dimension n. Let \(V = {\mathbf {F}}_q^n\) be the natural module, and let \(U \le V\) be a subspace of dimension \(r \le n-2\). Let \(w \in F_k\) be a nontrivial word of length \(\ell \le (\frac{n}{2} - 2)/r\). Then

$$\begin{aligned} {\mathbf {P}}\left( {{\overline{w}}} U = U\right) \le \left( C_{q^r} \frac{q^{\ell r}}{q^{n - \ell r - 1} - q^{\ell r} - q^{n/2}}\right) ^r, \end{aligned}$$

where \(C_{q^r} = 1 + (1 - q^{-r})^{-1} \le 3\).

Proof

Let \(v_1, \ldots , v_r\) be a basis for U. Consider the joint trajectory of \(v_1, \ldots , v_r\). By Lemma 3.6 with \(R = \{v_1, \ldots , v_r\}\), we can have \({{\overline{w}}} U = U\) only if there is at least one coincidence in each individual trajectory. We take a union bound over all possibilities for when the coincidences could occur. If \(t < \ell \), then by Lemma 3.3, the probability that step \((t, i)\) is a coincidence is bounded by

$$\begin{aligned} \frac{q^{(t+1) r}}{q^{n-\ell r-1} - q^{\ell r} - q^{n/2}}; \end{aligned}$$

indeed there are at most \(t r + i \le (t+1) r \le \ell r\) previous vectors. If \(t = \ell \), assuming \(v_j^\ell \in U\) for \(j < i\), we actually get a slightly stronger bound:

$$\begin{aligned} \frac{q^{\ell r}}{q^{n - \ell r - 1} - q^{\ell r} - q^{n/2}}. \end{aligned}$$

Summing over t, the probability that there is a coincidence in the trajectory of \(v_i\) is bounded by

$$\begin{aligned} (1 + 1 + q^{-r} + q^{-2r} + \cdots ) \frac{q^{\ell r}}{q^{n - \ell r - 1} - q^{\ell r} - q^{n/2}}. \end{aligned}$$

Taking the product over i gives the claimed bound. \(\square \)
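To get a feeling for the size of this bound, one can evaluate the right-hand side of Lemma 4.1 exactly. The sketch below is only an illustration: the function name `lemma_4_1_bound` is hypothetical, and the code assumes that n is even so that \(q^{n/2}\) is an integer.

```python
from fractions import Fraction


def lemma_4_1_bound(n, q, r, length):
    """Exact value of the right-hand side of Lemma 4.1 (sketch, n even)."""
    if n % 2 != 0:
        raise ValueError("this sketch assumes n is even")
    lr = length * r
    # Hypotheses of the lemma: r <= n - 2 and length <= (n/2 - 2) / r.
    if r > n - 2 or lr > n // 2 - 2:
        raise ValueError("hypotheses of Lemma 4.1 are not satisfied")
    denominator = q ** (n - lr - 1) - q ** lr - q ** (n // 2)
    if denominator <= 0:
        raise ValueError("denominator is not positive")
    # C_{q^r} = 1 + (1 - q^{-r})^{-1}, which is at most 3.
    c = 1 + 1 / (1 - Fraction(1, q ** r))
    return (c * Fraction(q ** lr, denominator)) ** r


print(float(lemma_4_1_bound(40, 2, 1, 5)))
```

For instance, with \(n = 40\), \(q = 2\), \(r = 1\) and \(\ell = 5\) the bound is already smaller than \(10^{-8}\).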

In the following proof we will refer to the “q-binomial coefficient”, defined by

$$\begin{aligned} \left( {\begin{array}{c}x\\ r\end{array}}\right) _q = \frac{(q^x - 1) (q^x - q) \cdots (q^x - q^{r-1})}{(q^r - 1) (q^r - q) \cdots (q^r - q^{r-1})}. \end{aligned}$$

When x is a nonnegative integer this is the number of r-dimensional subspaces of \({\mathbf {F}}_q^x\). For \(x \ge r\) note that \(x\mapsto \left( {\begin{array}{c}x\\ r\end{array}}\right) _q\) is increasing and nonnegative, and

$$\begin{aligned} \left( {\begin{array}{c}x\\ r\end{array}}\right) _q = q^{xr - r^2} \frac{(1-q^{-x+r-1}) \cdots (1 - q^{-x})}{(1-q^{-r}) \cdots (1-q^{-1})} \asymp q^{xr - r^2}. \end{aligned}$$
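For integer arguments the q-binomial coefficient is easy to compute directly from the definition. The following sketch (the function name `q_binomial` is our own choice) does so with exact integer arithmetic and compares the result with the approximation \(q^{xr - r^2}\).

```python
def q_binomial(x, r, q):
    """Number of r-dimensional subspaces of F_q^x, for integers x, r >= 0."""
    numerator = 1
    denominator = 1
    for i in range(r):
        numerator *= q ** x - q ** i
        denominator *= q ** r - q ** i
    # The quotient is an integer (and is 0 when x < r).
    return numerator // denominator


# Ratio to the leading term q^(xr - r^2), here for x = 4, r = 2, q = 2.
x, r, q = 4, 2, 2
print(q_binomial(x, r, q), q_binomial(x, r, q) / q ** (x * r - r * r))
```

The ratio printed on the last line stays bounded as x grows, in line with the estimate \(\asymp q^{xr - r^2}\).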

The following theorem will be used for an unspecified, but fixed, \(\delta > 0\).

Theorem 4.2

There are constants \(c, C>0\) such that the following holds for all \(\delta > 0\). Let \(G = {\text {Cl}}_n(q)\) be a classical group of dimension n, and let \(w \in F_k\) be a nontrivial word of reduced length \(\ell < \delta ^2 n / 20\). Assume \(q^{\delta n} > C\). Then

$$\begin{aligned} {\mathbf {P}}\left( {\text {supp}}{{\overline{w}}} \le (1-\delta )n\right) \le |G|^{-c \delta ^2/\ell }. \end{aligned}$$

Proof

Let \(x_1, \ldots , x_k\) be chosen independently and uniformly from G. Suppose some eigenspace \(V_\lambda \le \overline{{\mathbf {F}}_q}^n\) of \({{\overline{w}}}\) has dimension at least \(\delta n\). Let \(d = [{\mathbf {F}}_q(\lambda ):{\mathbf {F}}_q]\). Let \(\Lambda \) be the set of d Galois conjugates of \(\lambda \). Since \(\dim V_{\lambda '} = \dim V_\lambda \) for each \(\lambda ' \in \Lambda \), we have \(\dim V_\lambda \le n / d\), so \(d \le 1 / \delta \). Let \(W \le V_\lambda \) be an r-dimensional subspace defined over \({\mathbf {F}}_q(\lambda ) \cong {\mathbf {F}}_{q^d}\). Then there is a conjugate subspace \(W' \le V_{\lambda '}\) for each \(\lambda ' \in \Lambda \), and the sum \(U = \sum _{\lambda ' \in \Lambda } W'\) is dr-dimensional and \({\mathbf {F}}_q\)-rational since it is fixed by the Galois group, so it may be identified with a dr-dimensional subspace of V. Since \(U \cap V_\lambda = W\), this correspondence \(W \mapsto U\) is injective. Hence the number of dr-dimensional subspaces of V preserved by \({{\overline{w}}}\) is at least \(\left( {\begin{array}{c}\delta n\\ r\end{array}}\right) _{q^d}\).

Since \(\ell d \le \ell / \delta < \delta n / 20\), we may choose an integer \(r > 0\) such that \(\ell d r \in [\delta n / 5, \delta n / 4]\). Now by the previous lemma and Markov’s inequality, the probability that the number of dr-dimensional subspaces of V preserved by \({{\overline{w}}}\) is at least \(\left( {\begin{array}{c}\delta n\\ r\end{array}}\right) _{q^d}\) is bounded by

$$\begin{aligned}&\frac{\left( {\begin{array}{c}n\\ d r\end{array}}\right) _q}{\left( {\begin{array}{c}\delta n\\ r\end{array}}\right) _{q^d}} \left( 3 \frac{q^{\ell d r}}{q^{n - \ell d r - 1} - q^{\ell d r} - q^{n/2}} \right) ^{dr} \\&\quad \asymp \frac{q^{drn - d^2 r^2}}{(q^d)^{\delta r n - r^2}}\left( 3 \frac{q^{\ell d r}}{q^{n - \ell d r - 1} - q^{\ell d r} - q^{n/2}} \right) ^{dr} \\&\quad \le O\left( q^{-\delta n + 2 \ell dr + r - dr + 1}\right) ^{dr} \\&\quad \le O(1)^{\delta n / 4 \ell } \left( q^{-\delta n + 2\delta n / 4 + \delta n / 4}\right) ^{\delta n/5\ell } \\&\quad = O(1)^{\delta n / \ell } q^{-\frac{1}{20} \delta ^2 n^2 / \ell }. \end{aligned}$$

Taking the sum over all \(d \le 1/ \delta \), it follows that

$$\begin{aligned} {\mathbf {P}}\left( {\text {supp}}{{\overline{w}}} \le (1- \delta )n \right) = {\mathbf {P}}\left( \max _{\lambda \in \overline{{\mathbf {F}}_q}} \dim V_\lambda \ge \delta n \right) \le \delta ^{-1} O(1)^{\delta n / \ell } q^{-\frac{1}{20} \delta ^2 n^2 / \ell }. \end{aligned}$$

Assuming \(q^{\delta n}\) is sufficiently large, the first two factors are negligible compared to the third. \(\square \)
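The choice of r in the proof above can be made explicit. Since \(\ell d < \delta n / 20\), the interval \([\delta n / 5, \delta n / 4]\) is longer than \(\ell d\) and so contains a multiple of \(\ell d\). A minimal sketch, with the hypothetical name `choose_r` and exact rational arithmetic:

```python
from fractions import Fraction
from math import ceil


def choose_r(n, delta, length, d):
    """Return an integer r > 0 with length*d*r in [delta*n/5, delta*n/4]."""
    delta = Fraction(delta)
    step = length * d
    if not step < delta * n / 20:
        raise ValueError("need length * d < delta * n / 20")
    # Smallest multiple of step that is at least delta*n/5.
    r = ceil(delta * n / (5 * step))
    # It overshoots delta*n/5 by less than step < delta*n/20.
    assert delta * n / 5 <= step * r <= delta * n / 4
    return r


print(choose_r(1000, Fraction(1, 2), 10, 2))
```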

Remark 4.3

The restriction \(\ell < c \delta ^2 n\) in Theorem 4.2 is essential, and related to our reliance on linear algebra. For example, let \(G = {\text {SL}}_n(q)\), and suppose w is a word of length \(\ell \approx 10 n\). We do not know how to bound \({\mathbf {P}}({{\overline{w}}} = 1)\) satisfactorily. Is it true that \({\mathbf {P}}({{\overline{w}}} = 1) \le q^{-cn}\) for some \(c>0\)? Certainly w cannot be a law, because \({\text {SL}}_n(q)\) contains \({\text {SL}}_2(q^{\left\lfloor {n/2}\right\rfloor })\) and the shortest law in \({\text {SL}}_2(q^{\left\lfloor {n/2}\right\rfloor })\) has length at least \((q^{\left\lfloor {n/2}\right\rfloor } - 1)/3\) (see Hadad [21, Theorem 2]). The question is whether it can be an almost-law.

Expected values of characters

Throughout this section let \(G = {\text {Cl}}_n(q)\) be a classical group and \(\chi \in {\text {Irr}}G\) a nonlinear character. Our aim is to bound

$$\begin{aligned} {\mathbf {E}}_{x_1, \ldots , x_k} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)}\right) \end{aligned}$$

when w is a fixed nontrivial word of length cn, evaluated at random \(x_1, \ldots , x_k \in G\). The proof consists of two steps:

  1. 1.

    By the previous section, with high probability \({{\overline{w}}}\) has large support.

  2. 2.

    By recent character bounds of Guralnick, Larsen, and Tiep [18, 19], if \({{\overline{w}}}\) has large support then \(|\chi ({{\overline{w}}})| \le \chi (1)^\epsilon \).

We first deal with elements of large support.

Lemma 5.1

For every \(\epsilon >0\) there is a \(\delta > 0\) such that the following holds. Let \(g \in G\) with \({\text {supp}}g \ge (1-\delta )n\). Then \(|\chi (g)| \le \chi (1)^\epsilon \).

Proof

By Lemma 2.3, \(|C_G(g)| \le q^{\delta n^2}\). Hence by the character bound [18, Theorem 1.3] we have \(|\chi (g)| \le \chi (1)^\epsilon \). \(\square \)

Theorem 5.2

There is a constant \(c > 0\) such that the following holds. Let \(w \in F_k\) be a fixed nontrivial word of reduced length less than cn. Then

$$\begin{aligned} {\mathbf {E}}_{x_1, \ldots , x_k} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)} \right) < q^{-c n}. \end{aligned}$$

Proof

Let \(\delta \) be as in the previous lemma with \(\epsilon = 1/2\). By conditioning on whether or not \({\text {supp}}{{{\overline{w}}}} < (1-\delta )n\), we have

$$\begin{aligned} {\mathbf {E}}_{x_1, \ldots , x_k} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)}\right)&\le {\mathbf {P}}_{x_1, \ldots , x_k}\left( {\text {supp}}{{{\overline{w}}}} < (1-\delta )n\right) \\&\qquad + \max _{x_1, \ldots , x_k : {\text {supp}}{{{\overline{w}}}} \ge (1-\delta )n} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)} \right) . \end{aligned}$$

It follows from Theorem 4.2 that

$$\begin{aligned} {\mathbf {P}}_{x_1, \ldots , x_k}({\text {supp}}{{{\overline{w}}}} < (1-\delta )n) \le q^{-c_1n} \end{aligned}$$

for some constant \(c_1 > 0\). The other summand is bounded by Lemma 5.1:

$$\begin{aligned} \max _{x_1, \ldots , x_k :{\text {supp}}{{{\overline{w}}}} \ge (1-\delta )n} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)} \right) \le \chi (1)^{-1/2} \le q^{-c_2 n} \end{aligned}$$

for some constant \(c_2 > 0\). (Here we used \(\chi (1) \ge q^{c_3 n}\): see [32].) \(\square \)

Our main interest is the case in which w is the result of a simple random walk in \(F_k\). With high probability the result of the random walk is nontrivial, so we can apply the above theorem.

Corollary 5.3

There is a constant \(c > 0\) such that the following holds. Let w be the result of a simple random walk of length \(\ell < cn\) in \(F_k\). Then

$$\begin{aligned} {\mathbf {E}}_{x_1, \ldots , x_k\in G, w} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)} \right) < q^{-c n} + k^{-c \ell }. \end{aligned}$$

Proof

By conditioning on whether or not the word w is trivial, we get

$$\begin{aligned} {\mathbf {E}}_{x_1, \ldots , x_k, w} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)} \right) \le \max _{0< |w| < cn} {\mathbf {E}}_{x_1, \ldots , x_k} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)} \right) + {\mathbf {P}}_{w}(w = 1). \end{aligned}$$

The first term is bounded by Theorem 5.2. The second term is the return probability of a simple random walk on a 2k-regular tree, which is at most \(k^{-c \ell }\) for a constant \(c > 0\) (see [29, Theorem 3 and Lemma 2.2] or [15, Appendix B]). \(\square \)
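The return probability in the second term can be computed exactly for small parameters, since the distance from the identity in the 2k-regular tree is a birth-and-death chain: from the root the walk always moves to distance 1, and from any other vertex it moves towards the root with probability \(1/(2k)\). The following sketch (the function name `return_probability` is an assumption of the example) tracks the distribution of this distance.

```python
from fractions import Fraction


def return_probability(k, length):
    """P(w = 1) for a simple random walk of the given length in F_k."""
    back = Fraction(1, 2 * k)          # step towards the identity
    away = Fraction(2 * k - 1, 2 * k)  # step away from the identity
    dist = {0: Fraction(1)}            # distribution of the distance from 1
    for _ in range(length):
        nxt = {}
        for j, pr in dist.items():
            if j == 0:
                nxt[1] = nxt.get(1, 0) + pr
            else:
                nxt[j - 1] = nxt.get(j - 1, 0) + pr * back
                nxt[j + 1] = nxt.get(j + 1, 0) + pr * away
        dist = nxt
    return dist.get(0, Fraction(0))


for length in (2, 4, 6, 8):
    print(length, return_probability(2, length))
```

The printed values decay geometrically in the length, which is the behaviour asserted by the bound \(k^{-c \ell }\).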

Reaching a normal subset: the \(x w(y, z)\) trick

In this section, something of an interlude, let G be any finite group, and let \({\mathfrak {C}}\) be a normal (i.e., conjugacy-closed) subset of G. We will develop a criterion ensuring that one can, with high probability as \(x,y,z \in G\) are chosen uniformly at random, find a word \(w \in F_2\) of at most a prescribed length such that \(x w(y, z) \in {\mathfrak {C}}\). The criterion applies to sets \({\mathfrak {C}}\) whose density is large compared to the expected values of characters. This is a variation of the technique used in [13, Sect. 4]; see also [14, Sect. 2].

The following theorem expresses the most general such estimate we will need, in which we further allow arbitrary weights to be attached to elements of \({\mathfrak {C}}\). We express the result in terms of a nonnegative conjugation-invariant function (class function) f on G. We define the \(L^p\) norm of f by

$$\begin{aligned} \Vert f\Vert _p^p = \frac{1}{|G|} \sum _{x \in G} |f(x)|^p \qquad (p \in \{1, 2\}), \end{aligned}$$

and we use the standard inner product on functions on G defined by

$$\begin{aligned} \langle f, g \rangle = \frac{1}{|G|} \sum _{x \in G} f(x) \overline{g(x)}. \end{aligned}$$

Theorem 6.1

Let f be a nonnegative and conjugation-invariant function on G, and let \(\ell \) be a positive integer. Let \(x_0, x_1, \ldots , x_k\) be elements of G chosen uniformly at random. Let E be the event that \(f(x_0 {{\overline{u}}}) = 0\) for every word \(u \in F_k\) of length at most \(\ell \). Let w be the result of a simple random walk of length \(2\ell \) in \(F_k\). Then (see Footnote 3)

$$\begin{aligned} {\mathbf {P}}_{x_0, \ldots , x_k} \left( E\right) \le \frac{1}{\Vert f\Vert _1^2} \sum _{1 \ne \chi \in {\text {Irr}}{G}} |\langle f, \chi \rangle |^2 {\mathbf {E}}_{x_1, \ldots , x_k, w} \left( \frac{\chi ({{\overline{w}}})}{\chi (1)}\right) . \end{aligned}$$

In particular,

$$\begin{aligned} {\mathbf {P}}_{x_0, \ldots , x_k} \left( E\right) \le \frac{\Vert f\Vert _2^2}{\Vert f\Vert _1^2} \max _{\begin{array}{c} 1 \ne \chi \in {\text {Irr}}{G} \\ \langle f, \chi \rangle \ne 0 \end{array}} {\mathbf {E}}_{x_1, \ldots , x_k,w} \left( \frac{\chi ({{\overline{w}}})}{\chi (1)}\right) . \end{aligned}$$

Proof

Let \({\mathcal {A}}= {\mathcal {A}}_{x_1, \ldots , x_k}\) be the adjacency operator defined in Sect. 2.6, and consider its natural action on \(L^2(G)\). Let \(X = {\mathcal {A}}^\ell f(x_0)\), regarded as a random variable dependent on \(x_0, x_1, \ldots , x_k\), and note that E is precisely the event \(X = 0\). By Chebyshev’s inequality,

$$\begin{aligned} {\mathbf {P}}(X = 0) \le \frac{{\text {Var}}X}{\left( {\mathbf {E}}X \right) ^2}. \end{aligned}$$
(4)

The first moment is

$$\begin{aligned} {\mathbf {E}}X = \Vert f\Vert _1. \end{aligned}$$

The second moment is

$$\begin{aligned} {\mathbf {E}}X^2 = {\mathbf {E}}_{x_1, \ldots , x_k} \Vert {\mathcal {A}}^\ell f\Vert _2^2 = {\mathbf {E}}_{x_1, \ldots , x_k} \langle {\mathcal {A}}^{2\ell } f, f\rangle . \end{aligned}$$
(5)

Since f is conjugation-invariant, we can expand this further in terms of characters. By orthogonality of characters, if \(\tau _x\) is the translation operator defined by \(\tau _x(h)(y) = h(x^{-1} y)\), we have

$$\begin{aligned} \langle \tau _x \chi , \psi \rangle = {\left\{ \begin{array}{ll} \chi (x^{-1})/\chi (1) &{}\text {if}~\chi =\psi , \\ 0 &{}\text {else}. \end{array}\right. } \end{aligned}$$

Hence

$$\begin{aligned} \langle {\mathcal {A}}^{2\ell } \chi , \psi \rangle = 0 \qquad (\chi \ne \psi ), \end{aligned}$$

and

$$\begin{aligned} \langle {\mathcal {A}}^{2\ell } \chi , \chi \rangle = {\mathbf {E}}_w \left( \frac{\chi ({{\overline{w}}})}{\chi (1)}\right) , \end{aligned}$$

where w is the result of a simple (symmetric) random walk of length \(2\ell \) in \(F_k\). Hence, from (5),

$$\begin{aligned} {\mathbf {E}}X^2 = \sum _{\chi \in {\text {Irr}}{G}} |\langle f, \chi \rangle |^2 {\mathbf {E}}_{x_1, \ldots , x_k, w} \left( \frac{\chi ({{\overline{w}}})}{\chi (1)}\right) . \end{aligned}$$

The \(\chi = 1\) term is \(\Vert f\Vert _1^2\), which is the same as \(({\mathbf {E}}X)^2\). Hence the first part of the theorem follows from (4). The second part holds because

$$\begin{aligned} \sum _{\chi \in {\text {Irr}}{G}} |\langle f, \chi \rangle |^2 = \Vert f\Vert _2^2. \end{aligned}$$
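The two identities used at the end of the proof, namely that the \(\chi = 1\) coefficient of a nonnegative class function is \(\Vert f\Vert _1\) and that the squared coefficients sum to \(\Vert f\Vert _2^2\), can be checked by hand in a small group. The sketch below does this for \(G = S_3\) using its standard character table; the particular class function f and the helper names are assumptions made for the example.

```python
from fractions import Fraction

# Conjugacy classes of S_3: identity, transpositions, 3-cycles.
CLASS_SIZES = [1, 3, 2]
CHARACTERS = {
    "trivial": [1, 1, 1],
    "sign": [1, -1, 1],
    "standard": [2, 0, -1],
}
ORDER = sum(CLASS_SIZES)


def inner(f, chi):
    """<f, chi> for real-valued class functions given by their class values."""
    return sum(Fraction(s) * a * b for s, a, b in zip(CLASS_SIZES, f, chi)) / ORDER


def norm_p(f, p):
    """The quantity ||f||_p^p."""
    return sum(Fraction(s) * abs(a) ** p for s, a in zip(CLASS_SIZES, f)) / ORDER


# An arbitrary nonnegative class function (values on the three classes).
f = [Fraction(0), Fraction(1), Fraction(3, 2)]
coefficients = {name: inner(f, chi) for name, chi in CHARACTERS.items()}

print(coefficients["trivial"], norm_p(f, 1))
print(sum(c ** 2 for c in coefficients.values()), norm_p(f, 2))
```

Both printed lines show a pair of equal numbers, as the proof requires.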

Corollary 6.2

Let \({\mathfrak {C}}\) be a normal subset of G. Write

$$\begin{aligned} {\mathfrak {C}}= \bigcup _{\alpha \in G^\text {ab}} {\mathfrak {C}}_\alpha , \end{aligned}$$

where \({\mathfrak {C}}_\alpha = {\mathfrak {C}}\cap \alpha G'\) is the fibre of \({\mathfrak {C}}\) over \(\alpha \in G^\text {ab}\). Let \(\delta _\alpha = |{\mathfrak {C}}_\alpha | / |G'|\) be the fibre density, and let \(\delta = \min _{\alpha \in G^\text {ab}} \delta _\alpha \). Assume \(\delta > 0\).

Let \(x_0, x_1, \ldots , x_k \in G\) be chosen uniformly at random, and let E be the event that for every word \(u \in F_k\) of length at most \(\ell \) we have \(x_0 {{\overline{u}}} \notin {\mathfrak {C}}\). Let w be the result of a simple random walk of length \(2\ell \) in \(F_k\). Then

$$\begin{aligned} {\mathbf {P}}(E) \le \delta ^{-1} \max _{\begin{array}{c} \chi \in {\text {Irr}}{G} \\ \chi (1) > 1 \end{array}} {\mathbf {E}}_{x_1, \ldots , x_k, w} \left( \frac{\chi ({{\overline{w}}})}{\chi (1)}\right) . \end{aligned}$$

Proof

In the previous theorem, take

$$\begin{aligned} f = \sum _{\alpha \in G^\text {ab}} \frac{1_{{\mathfrak {C}}_\alpha }}{\delta _\alpha }. \end{aligned}$$

Then \(\Vert f\Vert _1 = 1\), and

$$\begin{aligned} \Vert f\Vert _2^2 = \frac{1}{|G^\text {ab}|} \sum _{\alpha \in G^\text {ab}} \delta _\alpha ^{-1} \le \delta ^{-1}. \end{aligned}$$

Thus

$$\begin{aligned} {\mathbf {P}}(E) \le \delta ^{-1} \max _{\begin{array}{c} 1 \ne \chi \in {\text {Irr}}{G} \\ \langle f, \chi \rangle \ne 0 \end{array}} {\mathbf {E}}_{x_1, \ldots , x_k, w} \left( \frac{\chi ({{\overline{w}}})}{\chi (1)}\right) . \end{aligned}$$

Now if \(\chi \ne 1\) is one-dimensional then \(\chi \) factors through \(G^\text {ab}\), so

$$\begin{aligned} \langle f, \chi \rangle = \frac{1}{|G^\text {ab}|} \sum _{\alpha \in G^\text {ab}} \chi (\alpha ) = 0. \end{aligned}$$

Hence

$$\begin{aligned} {\mathbf {P}}(E) \le \delta ^{-1} \max _{\begin{array}{c} 1 \ne \chi \in {\text {Irr}}{G} \\ \chi (1) > 1 \end{array}} {\mathbf {E}}_{x_1, \ldots , x_k, w} \left( \frac{\chi (\overline{w})}{\chi (1)}\right) . \end{aligned}$$
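The normalization of f in this proof can be checked in a toy case. The sketch below takes \(G = S_3\), so that \(G' = A_3\) and \(G^\text {ab}\) is detected by the sign, and takes \({\mathfrak {C}}\) to be the set of nonidentity elements; these choices, and all names in the code, are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations


def sign(perm):
    """Sign of a permutation given as a tuple of images."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


G = list(permutations(range(3)))      # the symmetric group S_3
IDENTITY = tuple(range(3))
C = [g for g in G if g != IDENTITY]   # a normal subset of G
DERIVED_ORDER = 3                     # |G'| = |A_3|

# Fibre densities delta_alpha, indexed by the sign alpha in G^ab = {+1, -1}.
densities = {
    alpha: Fraction(sum(1 for g in C if sign(g) == alpha), DERIVED_ORDER)
    for alpha in (1, -1)
}
delta = min(densities.values())


def f(g):
    """The weighted indicator function used in the proof of Corollary 6.2."""
    return 1 / densities[sign(g)] if g in C else Fraction(0)


norm_1 = sum(f(g) for g in G) / len(G)
norm_2_squared = sum(f(g) ** 2 for g in G) / len(G)
print(norm_1, norm_2_squared, 1 / delta)
```

Here \(\Vert f\Vert _1 = 1\) and \(\Vert f\Vert _2^2\) is the average of \(\delta _\alpha ^{-1}\) over the two fibres, which is at most \(\delta ^{-1}\).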

Obtaining an element of minimal degree

Let \(G = {\text {GCl}}_n(q)\). Let s be the minimal degree of a nontrivial element of \({\text {SCl}}_n(q)\); thus \(s = 2\) in the orthogonal case and \(s = 1\) otherwise. Let

$$\begin{aligned} {\mathfrak {M}}= \{ g \in {\text {SCl}}_n(q) : \deg g = s \}. \end{aligned}$$

In this section we exhibit a large normal subset \({\mathfrak {C}}_d \subseteq G\), depending on an integer parameter d, such that the \(\kappa (q^d - 1)\)th power of every element of \({\mathfrak {C}}_d\) lies in \({\mathfrak {M}}\) (here \(\kappa \in \{1, 2\}\) is as in Proposition 7.1). We will use \({\mathfrak {C}}_d\) in combination with Corollaries 5.3 and 6.2 to obtain an element of minimal degree as a short word in random generators.

Proposition 7.1

There is a constant \(C > 0\) so that the following holds. Let \(d \in [2, n]\) be an integer parameter. Assume \(q^d > Cn\). Then there is a normal subset \({\mathfrak {C}}_d \subseteq G\) with the following properties.

  1. (1)

    For every \(\alpha \in G^\text {ab}\), if \({\mathfrak {C}}_{d;\alpha }\) is the fibre of \({\mathfrak {C}}_d\) over \(\alpha \), then

    $$\begin{aligned} \frac{|{\mathfrak {C}}_{d; \alpha }|}{|G|} \ge \exp \left( -O(d^2 \log q) - O(d^{-1} n \log n) \right) . \end{aligned}$$
  2. (2)

    For every \(g \in {\mathfrak {C}}_d\), we have

    $$\begin{aligned} g^{\kappa (q^d - 1)} \in {\mathfrak {M}}, \end{aligned}$$

    where \(\kappa = 2\) if G is orthogonal in even characteristic, and \(\kappa = 1\) otherwise.

The proof is split into cases depending on the type of G.

The linear case

Let \(G = {\text {GL}}_n(q)\). In this case \({\mathfrak {M}}\) is the set of transvections. Let V be the natural module for G. Write

$$\begin{aligned} n - 3 = kd + r, \qquad (0 \le r < d), \end{aligned}$$

i.e., let \(k = \left\lfloor {\frac{n-3}{d}}\right\rfloor \) and \(r = n - 3 - k d\). Decompose V as

$$\begin{aligned} V = L \oplus V_1 \oplus \cdots \oplus V_k \oplus R \oplus W, \end{aligned}$$

where \(\dim L = 2\), \(\dim V_i = d\), \(\dim R = 1\), and \(\dim W = r\). Fix a basis for each of the subspaces.

We now define a particular element \(g \in {\text {GL}}(V)\) respecting the above decomposition. We define g by its action on the chosen basis for each of the subspaces above.

Subspace L::

Let g act as a transvection on L, say \({\begin{matrix} 1 &{}\quad 1 \\ 0 &{}\quad 1 \end{matrix}}\). Note that \((g|_L)^{q^d-1} = (g|_L)^{-1}\) is also a transvection.

Subspace \(V_i\) ::

Let \(p_i\) be a monic irreducible polynomial of degree d over \({\mathbf {F}}_q\). Identify \(V_i\) with \({\mathbf {F}}_q[t]/(p_i(t))\). The variable t acts on the latter space by multiplication. Let g act on \(V_i\) as multiplication by t. Note that the minimal polynomial of this transformation is \(p_i\), and \((g|_{V_i})^{q^d-1} = 1\).

Subspace R::

Let \(\alpha \in G^\text {ab}\). Let g act on R as the scalar \(\det (\alpha )/ \prod _{i = 1}^k (-1)^d p_i(0)\).

Subspace W::

Let g act trivially on W.
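The two power computations in this construction can be verified numerically on small blocks. The sketch below works over a prime field \({\mathbf {F}}_p\) (an assumption; the text allows any \({\mathbf {F}}_q\)), builds the matrix of multiplication by t on \({\mathbf {F}}_p[t]/(p_i(t))\) as a companion matrix, and raises both blocks to the power \(q^d - 1\). The helper names and the example polynomial \(t^2 + 1\) over \({\mathbf {F}}_3\) are our own choices.

```python
def mat_mul(a, b, p):
    """Product of two square matrices over F_p."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) % p for j in range(n)]
        for i in range(n)
    ]


def mat_pow(a, e, p):
    """a^e over F_p by repeated squaring (e >= 0)."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    while e:
        if e & 1:
            result = mat_mul(result, a, p)
        a = mat_mul(a, a, p)
        e >>= 1
    return result


def companion(coeffs, p):
    """Multiplication by t on F_p[t]/(t^d + c_{d-1} t^{d-1} + ... + c_0).

    coeffs = [c_0, ..., c_{d-1}]; the basis is 1, t, ..., t^{d-1}.
    """
    d = len(coeffs)
    m = [[0] * d for _ in range(d)]
    for j in range(d - 1):
        m[j + 1][j] = 1                  # t * t^j = t^(j+1)
    for i in range(d):
        m[i][d - 1] = (-coeffs[i]) % p   # t * t^(d-1) reduced modulo the polynomial
    return m


p, d = 3, 2
exponent = p ** d - 1                    # q^d - 1 with q = p
block_V = companion([1, 0], p)           # t^2 + 1, irreducible over F_3
block_L = [[1, 1], [0, 1]]               # the transvection on L

print(mat_pow(block_V, exponent, p))     # identity matrix
print(mat_pow(block_L, exponent, p))     # the inverse transvection
```

Thus the power \(g^{q^d - 1}\) kills the blocks \(V_i\) while leaving a transvection on L, which is the mechanism behind property (2) of Proposition 7.1 in the linear case.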

Let \({\mathfrak {I}}_d\) denote the set of monic irreducible polynomials of degree d over \({\mathbf {F}}_q\). For every tuple \(p_1, \ldots , p_k \in {\mathfrak {I}}_d\) with \(p_{i} \ne p_{i'}\) for \(i \ne i'\) and \(\alpha \in G^\text {ab}\) we thus have an element \(g = g_{p_1, \ldots , p_k; \alpha } \in G\). Let \(g_{p_1, \ldots , p_k; \alpha }^G\) denote the conjugacy class of \(g_{p_1, \ldots , p_k; \alpha }\) (this class does not depend on the order of \(p_1, \ldots , p_k\)). Let

$$\begin{aligned} {\mathfrak {C}}_{d; \alpha } = \bigcup _{ \{ p_1, \ldots , p_k \} \in \left( {\begin{array}{c}{\mathfrak {I}}_d\\ k\end{array}}\right) } g_{p_1, \ldots , p_k; \alpha }^G. \end{aligned}$$

The union is disjoint, because the minimal polynomial of each element of \(g_{p_1, \ldots , p_k; \alpha }^G\) is divisible by \(p_1(t) \cdots p_k(t)\) (the other factors are \((t-1)^2\) and \((t-\lambda )\) for \(\lambda = \det (\alpha ) / \prod _{i=1}^k (-1)^d p_i(0)\) if \(\lambda \ne 1\)). Finally let

$$\begin{aligned} {\mathfrak {C}}_d = \bigcup _{\alpha \in G^\text {ab}} {\mathfrak {C}}_{d; \alpha }. \end{aligned}$$

Remark 7.2

This is a variation of the construction in [14, Sect. 3.2].

Proof of Proposition 7.1 for \({\text {GL}}\)

By construction, \({\mathfrak {C}}_{d;\alpha }\) is the fibre of \({\mathfrak {C}}_d\) over \(\alpha \), and for every \(p_1, \ldots , p_k,\alpha \) we have \(g_{p_1, \ldots , p_k; \alpha }^{q^d-1} \in {\mathfrak {M}}\). It remains only to estimate the density of \({\mathfrak {C}}_{d;\alpha }\).

For \(g = g_{p_1, \ldots , p_k; \alpha }\), we have (as in the proof of Lemma 2.3)

$$\begin{aligned} |C_G(g)| \le |C_{{\text {M}}_n({\mathbf {F}}_q)}(g)| = q^{n + r^2 + O(r)}. \end{aligned}$$

Therefore

$$\begin{aligned} |{\mathfrak {C}}_{d; \alpha }| \ge \left( {\begin{array}{c}|{\mathfrak {I}}_d|\\ k\end{array}}\right) \cdot \frac{|G|}{q^{n + r^2 + O(r)}}. \end{aligned}$$
(6)

Recall that

$$\begin{aligned} |{\mathfrak {I}}_d| = q^d / d - O(q^{d/2} / d). \end{aligned}$$

In particular, by the hypothesis \(q^d > Cn\) we have \(|{\mathfrak {I}}_d| > k\), and in fact

$$\begin{aligned} \left( {\begin{array}{c}|{\mathfrak {I}}_d|\\ k\end{array}}\right)&= \left( {\begin{array}{c}q^d/d - O(q^{d/2}/d)\\ k\end{array}}\right) \\&\ge \left( \frac{ q^d/d - O(q^{d/2}/d) }{k} \right) ^k \\&= \left( \frac{ q^d (1 - O(q^{-d/2})) }{dk} \right) ^k \\&\ge \frac{q^{n-3-r}}{ n^{n/d} } \cdot e^{- O \left( n/ d \cdot q^{-d/2} \right) }. \end{aligned}$$

Hence, from (6), since \(r < d\),

$$\begin{aligned} \frac{|{\mathfrak {C}}_{d; \alpha }|}{|G|} \ge \exp \left( - (d^2 + O(d)) \log q - \frac{n}{d} \log n - O(n/d \cdot q^{-d/2}) \right) . \end{aligned}$$

This proves the proposition. \(\square \)
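The count \(|{\mathfrak {I}}_d|\) used in the proof above is given exactly by the classical formula \(|{\mathfrak {I}}_d| = \frac{1}{d} \sum _{e \mid d} \mu (d/e) q^e\), where \(\mu \) is the Möbius function. The following sketch (function names are our own) evaluates this formula and compares it with the main term \(q^d / d\).

```python
def mobius(n):
    """The Moebius function of a positive integer n, by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0        # n has a square factor
            result = -result
        p += 1
    if n > 1:
        result = -result        # one remaining prime factor
    return result


def count_monic_irreducibles(q, d):
    """Number of monic irreducible polynomials of degree d over F_q."""
    total = sum(mobius(d // e) * q ** e for e in range(1, d + 1) if d % e == 0)
    return total // d


for d in (2, 4, 10):
    print(d, count_monic_irreducibles(2, d), 2 ** d / d)
```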

Other classical groups

Let \(G = {\text {GCl}}_n(q)\), where \({\text {GCl}}\ne {\text {GL}}\). Let V be the natural module for G equipped with a nondegenerate binary form f and possibly a quadratic form Q. By Witt’s decomposition theorem, there is an orthogonal decomposition of V of the form

$$\begin{aligned} V = H \perp V_\text {an}, \end{aligned}$$

where H is an orthogonal direct sum of hyperbolic planes and \(V_{\text {an}}\) is anisotropic, and \(\dim V_\text {an}\le 2\) by the Chevalley–Warning theorem. Let \(\delta = \dim V_\text {an}+ 4 + 2 \kappa \), where \(\kappa = 2\) if G is orthogonal in even characteristic, and \(\kappa = 1\) otherwise. Let \(D = 2d\) and write

$$\begin{aligned} n - \delta = kD + r, \qquad (0 \le r < D), \end{aligned}$$

i.e., let \(k = \left\lfloor {(n-\delta )/D}\right\rfloor \) and \(r = n - \delta - k D\). Write the hyperbolic space H as

$$\begin{aligned} H = L \perp V_1 \perp \cdots \perp V_k \perp R \perp W', \end{aligned}$$

where each constituent is an orthogonal direct sum of hyperbolic planes with \(\dim L = 2 \kappa + 2\), \(\dim V_i = D\), \(\dim R = 2\), and \(\dim W' = r\). Let \(W = W' \perp V_{\text {an}}\). Thus we have the following orthogonal decomposition of V:

$$\begin{aligned} V = L \perp V_1 \perp \cdots \perp V_k \perp R \perp W. \end{aligned}$$
(7)

Fix a hyperbolic basis for each of the hyperbolic spaces, and fix a basis for W.

We now define a particular element \(g \in {\text {GCl}}(V)\) respecting the decomposition (7). As before we will define g by its action on the chosen bases.

Subspace L::

Let \(v_1, \ldots , v_{\kappa + 1}, w_1, \ldots , w_{\kappa + 1}\) be the chosen hyperbolic basis for L, i.e., such that \(L_1 = \langle v_1, \ldots , v_{\kappa + 1} \rangle \) and \(L_2 = \langle w_1, \ldots , w_{\kappa + 1} \rangle \) are totally singular subspaces, and f is represented with respect to \(v_1, \ldots , v_{\kappa + 1}, w_1, \ldots , w_{\kappa + 1}\) by

$$\begin{aligned} \begin{pmatrix} 0 &{}\quad I \\ \pm I &{}\quad 0 \end{pmatrix}. \end{aligned}$$

Symplectic case::

Let g act on L as the transvection

$$\begin{aligned} \begin{pmatrix} 1 &{}\quad 0 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 \\ \end{pmatrix}. \end{aligned}$$
Unitary case::

Pick a nonzero \(\lambda \in {\mathbf {F}}_q\) such that \(\lambda + \lambda ^\theta = 0\) (where \(\theta \) is the field automorphism) and let g act on L as the transvection

$$\begin{aligned} \begin{pmatrix} 1 &{}\quad 0 &{}\quad \lambda &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 \\ \end{pmatrix}. \end{aligned}$$
Orthogonal case::

Let \(g|_L\) be represented by the matrix

$$\begin{aligned} \begin{pmatrix} A &{}\quad 0 \\ 0 &{}\quad A^{-T} \end{pmatrix}, \end{aligned}$$

where in odd characteristic

$$\begin{aligned} A = \begin{pmatrix} 1 &{}\quad 1 \\ 0 &{}\quad 1 \end{pmatrix} \end{aligned}$$

and in even characteristic

$$\begin{aligned} A = \begin{pmatrix} 1 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 1 \\ 0 &{}\quad 0 &{}\quad 1 \end{pmatrix}. \end{aligned}$$

In all cases we have \(g|_L \in {\text {SCl}}(L)\) and \((g|_L)^{\kappa (q^d-1)} \ne 1\).

Subspace \(V_i\)::

Fix a monic irreducible polynomial \(p_i \in {\mathbf {F}}_q[t]\) of degree d. Let \(v_1, \ldots , v_d, w_1, \ldots , w_d\) be the chosen hyperbolic basis for \(V_i\). Thus there is a decomposition

$$\begin{aligned} V_i = V_{i,1} \oplus V_{i,2} \end{aligned}$$

into totally singular subspaces \(V_{i,1} = \langle v_1, \ldots , v_d \rangle \) and \(V_{i,2} = \langle w_1, \ldots , w_d \rangle \) with \(f(v_a, w_b) = \delta _{ab}\). Identify \(V_{i,1}\) with \({\mathbf {F}}_q[t]/(p_i(t))\). The variable t acts on the latter space by multiplication. By Witt’s lemma, this action extends to the space \(V_i\). This extension is moreover unique provided we demand that it preserves the decomposition of \(V_i\) (see [28, Hilfssatz 3.1]). Let \(g|_{V_i}\) be defined by this unique extension. The minimal polynomial of this transformation can be determined as follows (see [41]). In the symplectic and orthogonal cases, let \(p^*(t) = p(0)^{-1} t^d p(t^{-1})\). In the unitary case, let \(p^*(t) = p^\theta (0)^{-1} t^d p^\theta (t^{-1})\), where \(\theta \) acts on the coefficients. The minimal polynomial of g acting on \(V_i\) is \(*\)-symmetric, divisible by \(p_i\) (since \(p_i\) is irreducible), and hence also divisible by \(p_i^*\). Under the assumption that \(p_i \ne p_i^*\), the minimal polynomial of \(g|_{V_i}\) must therefore be equal to \(p_i p_i^*\). If \(p_i = p_i^*\) then the minimal polynomial is \(p_i\).

Subspace R::

Let \(\alpha \in G^\text {ab}\).

Symplectic case::

Let g act trivially on R. (Note \(G^\text {ab}\) is trivial.)

Unitary case::

Let g act as the matrix

$$\begin{aligned} \begin{pmatrix} a &{} 0 \\ 0 &{} a^{-\theta } \end{pmatrix}, \end{aligned}$$

where \(a \in {\mathbf {F}}_q\) satisfies \(a^{1-\theta } \prod _{i=1}^k p_i(0)^{1-\theta } = \det \alpha \). Such an element always exists since \(\det \alpha \) has norm 1.

Orthogonal case::

The natural map \({\text {GO}}(R)^\text {ab}\rightarrow G^\text {ab}\) is bijective (see Footnote 4). Let g act on R so that for every linear character \(\lambda \) of G we have

$$\begin{aligned} \lambda (g|_R) \prod _{i=1}^k \lambda (g|_{V_i}) = \lambda (\alpha ). \end{aligned}$$

In all cases note that \((g|_R)^{\kappa (q^d-1)}\) is trivial (see Footnote 5).

Subspace W::

Let g act trivially on W.

For every k-tuple \(p_1, \ldots , p_k \in {\mathfrak {I}}_d\) and every \(\alpha \in G^\text {ab}\), we thus have an element \(g = g_{p_1, \ldots , p_k ; \alpha } \in G\). The conjugacy class \(g_{p_1, \ldots , p_k ; \alpha }^G\) is invariant under reordering \(p_1, \ldots , p_k\), and under replacing any \(p_i\) by \(p_i^*\). Conversely, \(g_{p_1, \ldots , p_k ; \alpha }^G\) is determined by \(p_1(t) p_1^*(t) \cdots p_k(t) p_k^*(t)\) and \(\alpha \). Let \({\mathfrak {I}}_d'\) be the set of unordered pairs \(\{p, p^*\}\) of monic irreducible polynomials \(p, p^* \in {\mathfrak {I}}_d\) with \(p \ne p^*\). Let

$$\begin{aligned} {\mathfrak {C}}_{d ; \alpha } = \bigcup \left\{ g_{p_1, \ldots , p_k ; \alpha }^G : \{ \{p_1, p_1^*\}, \ldots , \{p_k, p_k^*\} \} \in \left( {\begin{array}{c}{\mathfrak {I}}_d'\\ k\end{array}}\right) \right\} . \end{aligned}$$

The union is disjoint, because the minimal polynomial of every element of \(g_{p_1, \ldots , p_k ; \alpha }^G\) is divisible by \(p_1(t) p_1^*(t) \cdots p_k(t) p_k^*(t)\) and has no other nonlinear factors. Finally let

$$\begin{aligned} {\mathfrak {C}}_d = \bigcup _{\alpha \in G^\text {ab}} {\mathfrak {C}}_{d ; \alpha }. \end{aligned}$$

Proof of Proposition 7.1 for other classical groups

By construction, \(g_{p_{1},\ldots ,p_{k} ; \alpha }\) lies over \(\alpha \) and \(g_{p_1, \ldots , p_k ; \alpha }^{\kappa (q^d-1)} \in {\mathfrak {M}}\). We must estimate the density of \({\mathfrak {C}}_{d;\alpha }\).

Consider \(g = g_{p_1, \ldots , p_k ; \alpha }\) for some \(p_1, \ldots , p_k \in {\mathfrak {I}}_d'\) with \(p_i \ne p_{i'}, p_{i'}^*\) for \(i \ne i'\). Let \(h \in C_G(g)\). Then h preserves each \(V_{i,1}\) and \(V_{i,2}\), those being the \(p_i\)- and \(p_i^*\)-primary subspaces of g. The restrictions of h to \(V_{i,1}\) and \(V_{i,2}\) determine one another, and there are at most \(q^d\) possibilities for \(h|_{V_{i,1}}\) (as in Lemma 2.3). Hence, since \(\delta = O(1)\),

$$\begin{aligned} |C_G(g)| \le (q^d)^k |{\text {M}}_{r+\delta }({\mathbf {F}}_q)| \le q^{dk + r^2 + O(r)+O(1)}. \end{aligned}$$

Therefore

$$\begin{aligned} |{\mathfrak {C}}_{d; \alpha }| \ge \left( {\begin{array}{c}|{\mathfrak {I}}_d'|\\ k\end{array}}\right) \cdot \frac{|G|}{q^{dk + r^2 + O(r) + O(1)}}. \end{aligned}$$

The number of monic irreducible polynomials of degree d over \({\mathbf {F}}_q\) is \(q^d/d - O(q^{d/2}/d)\), while the number of \(*\)-symmetric polynomials of degree d is at most \(q^{d/2}\), so

$$\begin{aligned} |{\mathfrak {I}}_d'| = (q^d / d - O(q^{d/2})) / 2 \ge c q^d/d. \end{aligned}$$
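The count of monic irreducible polynomials quoted above is given exactly by Gauss's formula \(\frac{1}{d} \sum _{e \mid d} \mu (d/e) q^e\). The following sketch evaluates it and derives a lower bound for \(|{\mathfrak {I}}_d'|\) from the stated bound \(q^{d/2}\) on the number of \(*\)-symmetric polynomials. The function names are ours and only illustrative; `pairs_lower_bound` is a lower bound, not the exact value of \(|{\mathfrak {I}}_d'|\).

```python
import math


def mobius(n):
    """Moebius function, computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n has a square factor
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def num_irreducible(q, d):
    """Number of monic irreducible polynomials of degree d over F_q."""
    total = sum(mobius(d // e) * q**e for e in range(1, d + 1) if d % e == 0)
    return total // d


def pairs_lower_bound(q, d):
    """Lower bound for the number of pairs {p, p*} with p != p*.

    Uses only the bound q^(d/2) on the number of *-symmetric polynomials.
    """
    symmetric_upper = math.isqrt(q**d)  # floor of q^(d/2)
    return max(0, (num_irreducible(q, d) - symmetric_upper) // 2)
```

For instance, over \({\mathbf {F}}_2\) there are 30 monic irreducible polynomials of degree 8, so this bound gives at least 7 pairs \(\{p, p^*\}\).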

By the hypothesis \(q^d > Cn\) this is at least k, and in fact

$$\begin{aligned} \left( {\begin{array}{c}|{\mathfrak {I}}_d'|\\ k\end{array}}\right) \ge \left( \frac{cq^d}{dk}\right) ^k \ge q^{dk} \exp \left( -O(k \log n)\right) , \end{aligned}$$

so

$$\begin{aligned} \frac{|{\mathfrak {C}}_{d; \alpha }|}{|G|} \ge \exp \left( -O(d^2 \log q) - O(d^{-1} n \log n)\right) . \end{aligned}$$

This proves the proposition. \(\square \)
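The proof above uses the elementary inequality \(\left( {\begin{array}{c}m\\ k\end{array}}\right) \ge (m/k)^k\) for \(m \ge k \ge 1\), which follows from \((m-i)/(k-i) \ge m/k\) for \(0 \le i < k\). A small sketch that checks it exactly in integer arithmetic (the function name is ours):

```python
from math import comb


def binomial_lower_bound_holds(m, k):
    """Check comb(m, k) >= (m / k)^k exactly, by clearing denominators."""
    return comb(m, k) * k**k >= m**k


# Exhaustive check over a small range of m >= k >= 1.
all_hold = all(
    binomial_lower_bound_holds(m, k)
    for m in range(1, 40)
    for k in range(1, m + 1)
)
```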

Collecting results

We now collect the results from the previous sections to conclude that, when three elements of G are chosen uniformly at random, with high probability there is a short word in these elements that belongs to \({\mathfrak {M}}\).

Theorem 7.3

There are constants \(c, C > 0\) so that the following holds. Let \(G = {\text {Cl}}_n(q)\), where \(\log q < c n \log ^{-2} n\). Let \(x, y, z\) be elements of G chosen uniformly at random. Let M be the event that there exists a word \(w \in F_3\) of length at most \(n^{C \log q}\) such that \(w(x,y,z) \in {\mathfrak {M}}\). Then

$$\begin{aligned} {\mathbf {P}}_{x,y,z}(M) \ge 1 - e^{- c n}. \end{aligned}$$

Proof

By Corollaries 5.3 and 6.2 there are constants \(c_1, c_2 > 0\) and \(C_1, C_2\) such that the following holds. Let \(\ell = \left\lfloor {c_1 n / 2}\right\rfloor \), and let E be the event that every word \(u \in F_2\) of length at most \(\ell \) satisfies \(x u(y, z) \notin {\mathfrak {C}}_d\). Then

$$\begin{aligned} {\mathbf {P}}(E)&\le \max _{\alpha \in G^\text {ab}} \frac{|G'|}{|{\mathfrak {C}}_{d;\alpha }|} (q^{-c_1 n} + 2^{-c_1 2 \ell })\nonumber \\&\le \exp (C_1 d^2 \log q + C_1 d^{-1} n \log n - c_2 n), \end{aligned}$$
(8)

provided \(q^d > C_2 n\). Take \(d \sim C_3 \log n\) for a constant \(C_3\). If \(\log q < cn / \log ^2 n\) for a sufficiently small constant c so that \(c, C_3\) satisfy \(C_1 C_3^2 c + C_1/C_3 - c_2 < - c\), then \({\mathbf {P}}(E) \le e^{-cn}\).

On the other hand suppose E fails, i.e., suppose there is a word u of length at most \(\ell \le c_1 n / 2\) such that \(x u(y, z) \in {\mathfrak {C}}_d\). Let \(w \in F_3\) be the word

$$\begin{aligned} w = (\xi _1 u(\xi _2, \xi _3))^{\kappa (q^d-1)}. \end{aligned}$$

The length of w is at most

$$\begin{aligned} \kappa (q^d - 1) (1 + c_1 n / 2) \le n^{C \log q}, \end{aligned}$$

and

$$\begin{aligned} w(x, y, z) = (x u(y, z))^{\kappa (q^d - 1)} \in {\mathfrak {M}}. \end{aligned}$$

Hence \(E^c \subseteq M\). This completes the proof. \(\square \)

This completes the proof of Theorem 1.1.

If we are allowed \(q^C\) random generators, we can reach the set \({\mathfrak {M}}\) using shorter words.

Theorem 7.4

There are constants \(c, C > 0\) so that the following holds. Let \(G = {\text {Cl}}_n(q)\), where \(n > C\). Let \(x_0, x_1, \ldots , x_k\) be elements of G chosen uniformly at random, where \(k > q^C\). Let M be the event that there exists a word \(w \in F_{k+1}\) of length at most \(q^2 n^C\) such that \(w(x_0, \ldots , x_k) \in {\mathfrak {M}}\). Then

$$\begin{aligned} {\mathbf {P}}_{x_0, \ldots , x_k}(M) \ge 1 - q^{- c n}. \end{aligned}$$

Proof

Follow the proof of the previous theorem, replacing \(u \in F_2\) with \(u \in F_k\). Since \(\log k > C \log q\), we can replace (8) with the bound

$$\begin{aligned} {\mathbf {P}}(E)&\le \max _{\alpha \in G^\text {ab}} \frac{|G'|}{|{\mathfrak {C}}_{d;\alpha }|} q^{- c_2 n} \\&\le \exp (C_1 d^2 \log q + C_1 d^{-1} n \log n - c_2 n \log q), \end{aligned}$$

provided \(q^d > C_2 n\). Take \(d = \max (\left\lceil {C_3 \log n / \log q}\right\rceil , 2)\) for sufficiently large \(C_3\). As long as \(n > C\) we find \({\mathbf {P}}(E) \le q^{-cn}\). Note that \(q^d \le q^2 n^C\) in this case. The rest of the argument is the same. \(\square \)
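The choice of d in this proof can be sanity-checked numerically. The sketch below computes \(d = \max (\lceil C_3 \log n / \log q \rceil , 2)\) and tests the two-sided bound \(n^{C_3} \le q^d \le q^2 n^{C_3}\). The lower bound gives \(q^d > C_2 n\) once \(C_3\) and n are large enough, and the upper bound is the remark \(q^d \le q^2 n^C\). The value of \(C_3\) used here is a hypothetical placeholder (the proof only requires it to be sufficiently large), and we take it to be an integer so that the comparison is exact.

```python
import math


def choose_d(n, q, C3):
    """d = max(ceil(C3 * log n / log q), 2)."""
    return max(math.ceil(C3 * math.log(n) / math.log(q)), 2)


def within_bounds(n, q, C3):
    """Check n^C3 <= q^d <= q^2 * n^C3 in exact integer arithmetic."""
    d = choose_d(n, q, C3)
    return n**C3 <= q**d <= q**2 * n**C3


# Hypothetical value C3 = 3, for illustration only.
checks = [within_bounds(n, q, 3) for n in (10, 16, 100, 1000) for q in (2, 3, 4, 5, 7, 9)]
```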

Closed trajectories with only one coincidence

A trajectory is closed if \(v^\ell = v^0\). In Sect. 9 we will need to understand the structure of closed trajectories with only one coincidence. More generally the joint trajectory of an r-tuple \((v_1, \ldots , v_r)\) is called closed if each individual trajectory is closed, and we will need to understand the structure of closed joint trajectories with only one coincidence in each individual trajectory. We begin with the single-trajectory case, for motivation.

Lemma 8.1

Assume w is nontrivial and cyclically reduced. Suppose the trajectory \(v^0, \ldots , v^\ell \) is closed, and suppose there is only one coincidence, at step t say. Then

$$\begin{aligned} w = (w_d \cdots w_1)^{\ell / d}, \qquad \text {where}~d = \gcd (t, \ell ). \end{aligned}$$

In particular if w is not a proper power then \(t = \ell \).

Proof

Let

$$\begin{aligned} {{\widetilde{w}}} = \cdots w_1 w_\ell \cdots w_1 \end{aligned}$$

be the left-infinite \(\ell \)-periodic extension of w. Since \(v^\ell = v^0\), the trajectory of v under \({{\widetilde{w}}}\) (defined in the obvious way) is just the \(\ell \)-periodic extension of \(v^0, \ldots , v^\ell \), and still there is only one coincidence, at step t. The choices at steps \(1, \ldots , t\) are free and all subsequent choices are forced (as in the proof of Lemma 3.5). We claim that \({{\widetilde{w}}}\) is in fact \(\gcd (t, \ell )\)-periodic, and it suffices to prove that it is t-periodic.

Since the choices at steps \(1, \ldots , t - 1\) are free and not coincidences, the choice at step t is a coincidence, and all subsequent choices are forced, the vectors \(v^0, \ldots , v^{t-1}\) are linearly independent and the whole trajectory is contained in their span. In particular

$$\begin{aligned} v^t = a_0 v^0 + \cdots + a_{t-1} v^{t-1} \qquad (a_0, \ldots , a_{t-1} \in {\mathbf {F}}_q). \end{aligned}$$
(9)

Given that step \(t+1\) is forced, we must have \(v^i \in D_{w_{t+1}}^{t+1}\) for each i such that \(a_i \ne 0\). Thus either \(w_{t+1} = w_{i+1}\) or \(w_{t+1} = w_i^{-1}\) (\(i > 0\)). Similarly,

$$\begin{aligned} v^{\ell - 1} = b_0 v^0 + \cdots + b_{t-1} v^{t-1} \qquad (b_0, \ldots , b_{t-1} \in {\mathbf {F}}_q), \end{aligned}$$

and \(v^\ell = v^0\) is forced. Since \(w_\ell \ne w_1^{-1}\), we must have \(w_\ell = w_t\) and \(a_0 \ne 0\) (see Remark 8.2 for more details). Therefore

$$\begin{aligned} w_{t+1} = w_1. \end{aligned}$$

Consider now the trajectory of \(v^1\) under

$$\begin{aligned} {{\widetilde{w}}}' = {{\widetilde{w}}} w_1^{-1} = \cdots w_3 w_2. \end{aligned}$$

The trajectory is just \(v^1, v^2, \ldots , v^\ell , v^0, v^1, \ldots \). By (9) and \(a_0 \ne 0\), \(v^1, \ldots , v^t\) are linearly independent, and, for every letter \(\xi \),

$$\begin{aligned} {\text {span}}\{ v^i \mid 0< i \le t, v^i \in D_\xi ^{t+1} \} = {\text {span}}\{ v^i \mid 0 \le i < t, v^i \in D_\xi ^{t} \}. \end{aligned}$$

Therefore the trajectory of \(v^1\) also has just one coincidence, again at step t (when \(v^{t+1}\) is chosen). Therefore by the same argument we must have \(w'_{t+1} = w'_1\), or

$$\begin{aligned} w_{t+2} = w_2. \end{aligned}$$

Repeating this argument as many times as necessary proves that \({{\widetilde{w}}}\) is t-periodic, as claimed. \(\square \)
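The conclusion of Lemma 8.1 is a statement about the period of the letter sequence of w. As an illustration, the following sketch computes the smallest period of a word and tests whether it is a proper repetition. The encoding is our own assumption: a word is a list of nonzero integers, with i standing for the letter \(\xi _i\) and \(-i\) for \(\xi _i^{-1}\). For cyclically reduced words, being a proper repetition of a shorter letter sequence is the same as being a proper power in the free group.

```python
def is_cyclically_reduced(word):
    """True if no letter is cyclically adjacent to its inverse."""
    n = len(word)
    return all(word[i] != -word[(i + 1) % n] for i in range(n))


def smallest_period(word):
    """Smallest d dividing len(word) such that word is a repetition of word[:d]."""
    n = len(word)
    for d in range(1, n + 1):
        if n % d == 0 and all(word[i] == word[i % d] for i in range(n)):
            return d
    return n  # only reached for the empty word


def is_proper_power(word):
    """True if the letter sequence is a repetition of a strictly shorter block."""
    return len(word) > 0 and smallest_period(word) < len(word)
```

For example, `[1, 2, 1, 2, 1, 2]` has smallest period 2, while the commutator `[1, 2, -1, -2]` is cyclically reduced and not a proper power, so in the setting of the lemma its single coincidence could only occur at \(t = \ell \).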

Remark 8.2

If \(t = \ell \), we must have \(a_0 = 1\) and all other \(a_i = 0\). The general case \(t < \ell \) is more complicated, but we can still describe the possibilities. From (9), because step \(t+1\) is forced we must have

$$\begin{aligned} v^{t+1} = \overline{w_{t+1}} v^t = \sum _{i=0}^{t-1} a_i v^{i \pm 1}, \end{aligned}$$
(10)

the signs depending on whether \(w_{t+1} = w_{i+1}\) or \(w_{t+1} = w_i^{-1}\) (\(a_i \ne 0\)). At the next step,

$$\begin{aligned} v^{t+2} = \sum _{i=0}^{t-1} a_i v^{i \pm 1 \pm 1}, \end{aligned}$$

and so on. We make a few observations:

  1. The vectors \(v^{i \pm 1}\), etc., obey a no-crossing rule: we cannot have

    $$\begin{aligned}&v^i \xrightarrow {w_{s+1}} v^{i+1}, \\&v^{i+1} \xrightarrow {w_{s+1}} v^i, \end{aligned}$$

    as then we would have both \(w_{s+1} = w_{i+1}\) and \(w_{s+1} = w_{i+1}^{-1}\), for some i.

  2. Similarly, there is a no-meeting rule: we cannot have

    $$\begin{aligned} v^i&\xrightarrow {w_{s+1}} v^{i+1}, \\ v^{i+2}&\xrightarrow {w_{s+1}} v^{i+1}, \end{aligned}$$

    as then we would have both \(w_{s+1} = w_{i+1}\) and \(w_{s+1} = w_{i+2}^{-1}\), but the expression for w is supposed to be reduced.

  3. Finally, there is a time-consistency rule: we cannot have

    $$\begin{aligned} v^i \xrightarrow {w_{s+1}} v^{i+1} \xrightarrow {w_{s+2}} v^i, \end{aligned}$$

    as then we would have \(w_{s+1} = w_{i+1}\) and \(w_{s+2} = w_{i+1}^{-1}\), but again the expression for w is supposed to be reduced; nor could we have

    $$\begin{aligned} v^i \xrightarrow {w_{s+1}} v^{i-1} \xrightarrow {w_{s+2}} v^i, \end{aligned}$$

    as then we would have \(w_{s+1} = w_i^{-1}\) and \(w_{s+2} = w_i\).

Since \(a_0 \ne 0\) and \(w_{t+1} v^0 = v^1\), the only resolution is that

$$\begin{aligned} v^{t+s} = \sum _{i=0}^{t-1} a_i v^{i + s} \end{aligned}$$

for all \(s \ge 0\) (extending \(\ell \)-periodically). In other words, the sequence \((v^s)\) in \({\text {span}}\{v^0, \ldots , v^{t-1}\}\) corresponds with the sequence \((X^s)\) in \({\mathbf {F}}_q[X] / (f)\), where

$$\begin{aligned} f = X^t - a_{t-1} X^{t-1} - \cdots - a_0 X^0. \end{aligned}$$

Since \(v^\ell = v^0\) we must have

$$\begin{aligned} f \mid X^\ell - 1. \end{aligned}$$

Conversely, if f is a divisor of \(X^\ell - 1\), and if the period of w divides t and \(i - i'\) whenever \(a_i \ne 0\) and \(a_{i'} \ne 0\), then a one-coincidence trajectory of this type exists.
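The correspondence between \((v^s)\) and \((X^s)\) in \({\mathbf {F}}_q[X]/(f)\) can be illustrated computationally. The sketch below, which assumes q is a prime p and uses helper names of our own, computes \(X^s \bmod f\) by repeated multiplication by X and uses it to test whether \(f \mid X^\ell - 1\). With \(f = X^t - a_{t-1} X^{t-1} - \cdots - a_0\), the coefficient list (lowest degree first) is \([-a_0, \ldots , -a_{t-1}, 1]\).

```python
def x_power_mod(f, s, p):
    """Coefficients (lowest degree first) of X^s mod f over F_p.

    f is a monic polynomial of degree t >= 1, given lowest degree first.
    """
    t = len(f) - 1
    r = [1] + [0] * (t - 1)  # the class of X^0
    for _ in range(s):
        top = r[-1]
        r = [0] + r[:-1]  # multiply by X, dropping the X^t term
        # Put the X^t term back using X^t = -(f_0 + ... + f_{t-1} X^{t-1}).
        r = [(c - top * fc) % p for c, fc in zip(r, f[:t])]
    return r


def divides_x_l_minus_1(f, l, p):
    """True if f divides X^l - 1 over F_p, i.e. X^l = 1 in F_p[X]/(f)."""
    return x_power_mod(f, l, p) == x_power_mod(f, 0, p)
```

For example, over \({\mathbf {F}}_2\) the polynomial \(f = X^2 + X + 1\) divides \(X^3 - 1\) but not \(X^4 - 1\), matching a sequence \(v^0, v^1, v^2 = v^0 + v^1, v^3 = v^0\) that closes up at \(\ell = 3\).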

We now consider closed joint trajectories with only one coincidence in each individual trajectory. The following lemma generalizes Lemma 8.1.

Lemma 8.3

Assume w is nontrivial and cyclically reduced. Let \(v_1, \ldots , v_r \in V\) be linearly independent. Suppose the joint trajectory of \(v_1, \ldots , v_r\) is closed. Suppose there is just one coincidence in each individual trajectory, and suppose the coincidence in the trajectory of \(v_i\) occurs at step \((t_i, i)\). Then

$$\begin{aligned} w = (w_d \cdots w_1)^{\ell / d}, \qquad \text {where}~d = \gcd (t_1, \ldots , t_r, \ell ). \end{aligned}$$

In particular if w is not a proper power then \(t_i = \ell \) for each i.

Proof

As in the proof of Lemma 8.1, let \({{\widetilde{w}}}\) be the left-infinite \(\ell \)-periodic extension of w, and note that the trajectory of \(v_1, \ldots , v_r\) under \({{\widetilde{w}}}\) is just the \(\ell \)-periodic extension of the trajectory under w, and there are no further free choices.

The choice at step \((t, i)\) must be free for \(t \le t_i\) and forced for \(t > t_i\). Therefore the vectors \((v_i^t)_{1 \le i \le r, 0 \le t < t_i}\) are linearly independent and the whole trajectory is contained in their span. Since there is a coincidence at step \((t_i, i)\), we have

$$\begin{aligned} v_i^{t_i} = \sum _{(t, j) \prec (t_i, i)} a_{itj} v_j^t \qquad (a_{itj} \in {\mathbf {F}}_q), \end{aligned}$$
(11)

where \(a_{itj} = 0\) whenever \(t \ge t_j\) (and \((t, j) \prec (t_i, i)\) means \(t < t_i\) or \(t = t_i\) and \(j < i\), as in Sect. 3.3). Let \(A_0\) be the \(r \times r\) matrix

$$\begin{aligned} A_0 = (a_{i0j} : 1 \le i,j \le r). \end{aligned}$$

The matrix \(A_0\) must be nonsingular, for otherwise we could not have \((v_1^\ell , \ldots , v_r^\ell ) = (v_1^0, \ldots , v_r^0)\). In particular, for each i there is some j such that \(a_{i0j} \ne 0\). Since step \((t_i+1, i)\) is forced, the value of \(w_{t_i+1} v_j^0\) must be known; hence

$$\begin{aligned} w_{t_i+1} = w_1. \end{aligned}$$

Consider the joint trajectory of \((v_1^1, \ldots , v_r^1)\) under

$$\begin{aligned} {{\widetilde{w}}}' = {{\widetilde{w}}} w_1^{-1} = \cdots w_3 w_2, \end{aligned}$$

which is just \((v_i^t)_{1 \le i \le r, t \ge 1}\). Since \(A_0\) is nonsingular, we have

$$\begin{aligned} {\text {span}}\{ v_i^t : 1 \le i \le r, 1 \le t \le t_i\} = {\text {span}}\{ v_i^t : 1 \le i \le r, 0 \le t \le t_i - 1\}. \end{aligned}$$

Therefore the vectors \((v_i^t)_{1 \le i \le r, 1 \le t \le t_i}\) are linearly independent, and the joint trajectory of \((v_1^1, \ldots , v_r^1)\) under \({{\widetilde{w}}}'\) has the same behaviour as that of \((v_1, \ldots , v_r)\) under \({{\widetilde{w}}}\): the trajectory of \(v_i^1\) has just one coincidence, at step \((t_i, i)\) (when \(v_i^{t_i+1}\) is chosen). Therefore by the same argument \(w'_{t_i+1} = w'_1\), or

$$\begin{aligned} w_{t_i+2} = w_2. \end{aligned}$$

Repeating the argument as many times as necessary, we conclude that the period of \({{\widetilde{w}}}\) divides \(t_i\) for each i. \(\square \)

Remark 8.4

The discussion in Remark 8.2 generalizes too. From (11) and forcedness, we have

$$\begin{aligned} v_i^{t_i + 1} = \sum _{(t, j) \prec (t_i, i)} a_{itj} v_j^{t \pm 1} \qquad (i \in \{1, \ldots , r\}), \end{aligned}$$
(12)

where the signs are chosen depending on whether \(w_{t+1} = w_1\) or \(w_t = w_1^{-1}\). The latter case can arise only for \(t > 0\), so no \(v_j^0\) can appear in this expression. Hence (12) is the analogue of (10) for the joint trajectory of \((v_1^1, \ldots , v_r^1)\). As before there are no-crossing, no-meeting, and time-consistency rules for the indices t such that \(a_{itj}\ne 0\) for some \(i, j\), so in fact we can never have \(v_j^{t-1}\).

We conclude that

$$\begin{aligned} v_i^{t_i+s} = \sum _{(t, j) \prec (t_i, i)} a_{itj} v_j^{t+s} \end{aligned}$$

for all \(s \ge 0\), and hence the trajectory of \((v_1^s, \ldots , v_r^s)\) corresponds with the trajectory of \((Z^s X_1, \ldots , Z^s X_r)\) in the \({\mathbf {F}}_q[Z]\)-module \(({\mathbf {F}}_q[Z] X_1 \oplus \cdots \oplus {\mathbf {F}}_q[Z] X_r) / \langle f_1, \ldots , f_r \rangle \), where

$$\begin{aligned} f_i = Z^{t_i} X_i - \sum _{(t, j) \prec (t_i, i)} a_{itj} Z^t X_j, \end{aligned}$$

and we must have

$$\begin{aligned} \langle (Z^\ell - 1) X_1, \ldots , (Z^\ell -1) X_r \rangle \subseteq \langle f_1, \ldots , f_r \rangle . \end{aligned}$$

Write \(f_i = \sum _j p_{ij} X_j\) for some \(p_{ij} \in {\mathbf {F}}_q[Z]\) and let \(F = (p_{ij} : 1 \le i, j \le r)\). Then there must exist a matrix \(E \in {\text {M}}_r({\mathbf {F}}_q[Z])\) with

$$\begin{aligned} (Z^\ell - 1) I = E F. \end{aligned}$$

This is possible if and only if \(\det F\) divides \(Z^\ell - 1\).

Expansion in low-degree representations

We turn now to the proof of Theorem 1.3. We again consider the action of \(G = {\text {Cl}}_n(q)\) on linearly independent r-tuples of vectors, and we again consider trajectories under the action of a fixed word \(w \in F_k\), much as in Sect. 4. The difference is mainly one of parameter regime. In Sect. 4 we considered r-tuples with r as large as cn for constant c, and we were satisfied with somewhat crude bounds. In this section we consider \(r = O(1)\), and we seek sharper bounds. Our aim is to show that, in an orbit of G of size N, the probability that a trajectory under a given word closes is close to 1/N, with a small relative error; if we can do this it follows that there is a uniform spectral gap. We begin with the case of \(r=1\), which contains most of the key ideas.

The defining representation

Now let \(x_1, \ldots , x_k \in G\) be chosen uniformly at random. Let \({{\overline{w}}} = w(x_1, \ldots , x_k)\). Let \(v \in V \setminus \{0\}\). Let \(N = |G v|\). By Witt’s lemma (Lemma 2.2), N is the number of \(u \in V \setminus \{0\}\) such that \(Q(u) = Q(v)\). Thus, by Lemma 2.1, \(N = q^n/q_0 + O(q^{n/2})\). More generally, if \(U \le V\) is a subspace of dimension d then

$$\begin{aligned} |Gv \cap U| = q^d/q_0 + O(q^{n/2}). \end{aligned}$$
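As a sanity check on the shape of this estimate, one can count level sets of a quadratic form by brute force in a small example. The sketch below takes the hyperbolic form \(Q(u) = u_1 u_2 + \cdots + u_{2m-1} u_{2m}\) on \({\mathbf {F}}_q^{2m}\) with q prime. It assumes that in this orthogonal example the leading term is \(q^{n-1}\), i.e. that \(q_0 = q\) here, which this section does not state explicitly. The function name is ours.

```python
from collections import Counter
from itertools import product


def level_set_sizes(q, m):
    """Sizes of {u in F_q^(2m) : Q(u) = c} for the hyperbolic form Q, q prime.

    Note that the zero vector is included in the level set c = 0.
    """
    counts = Counter()
    for u in product(range(q), repeat=2 * m):
        value = sum(u[2 * i] * u[2 * i + 1] for i in range(m)) % q
        counts[value] += 1
    return counts


# Example with q = 3 and n = 2m = 4: each level set has size 27 + O(9).
sizes = level_set_sizes(3, 2)
```

Here the level sets have sizes 33, 24 and 24, each within \(q^{n/2} = 9\) of \(q^{n-1} = 27\).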

Lemma 9.1

Assume w is nontrivial and not a proper power. Assume \(\ell < n/4\). Then

$$\begin{aligned} {\mathbf {P}}({{\overline{w}}} v = v) \le \frac{1}{N} \left( 1 + O(q^{2\ell - n/2}) \right) . \end{aligned}$$

Proof

By Lemma 3.1 we may also assume that w is cyclically reduced, as replacing w by its cyclic reduction can only decrease its length. In this case Lemma 8.1 implies that the event that \({{\overline{w}}} v = v\) is contained in the union of the following two events:

\(E_1\)::

the trajectory \(v^0, \ldots , v^\ell \) has exactly one coincidence, occurring at step \(\ell \), and \(v^\ell = v^0\),

\(E_2\)::

the trajectory \(v^0, \ldots , v^\ell \) has at least two coincidences.

We can bound the probability of \(E_2\) using Lemma 3.3. Suppose there is a free choice at step \(t \le \ell \). There are t previous vectors, so the probability of a coincidence, conditional on previous steps, is bounded by

$$\begin{aligned} \frac{q^t}{q^{n-t} - q^{t-1} - q^{n/2}}. \end{aligned}$$

Similarly, the conditional probability of a coincidence at a later step \(t'\) is bounded by

$$\begin{aligned} \frac{q^{t'-1}}{q^{n-t'} - q^{t'-1} - q^{n/2}}. \end{aligned}$$

Summing over \(t < t' \le \ell \), we find, using \(\ell < n/4\),

$$\begin{aligned} {\mathbf {P}}(E_2)\le & {} \sum _{1\le t< t' \le \ell } \frac{q^{t+t'-1}}{(q^{n-\ell } - q^{\ell - 1} - q^{n/2})^2} \\\ll & {} q^{4\ell - 2n} \le q^{4 \ell - n} / N < q^{2 \ell - n/2} / N. \end{aligned}$$

Hence we may focus on the event \(E_1\). In the linear case (\({\text {Cl}}={\text {SL}}\)), \(v^\ell \) is chosen uniformly at random outside a linear subspace of dimension at most \(\ell -1\), so the probability of \(E_1\) is bounded by

$$\begin{aligned} \frac{1}{q^n - q^{\ell - 1}} = \frac{1}{N} \left( 1 + O(q^{\ell - n}) \right) . \end{aligned}$$

This completes the proof in this case.

In general, the situation is complicated by form conditions, as previous choices may significantly impact the probability that \(v^\ell = v^0\), even if there were no previous coincidences.

Let \(\xi = w_\ell \). The choice of \(v^\ell \) is subject to one linear constraint for every occurrence of \(\xi = w_\ell \) as \(w_t\) or \(w_{t+1}^{-1}\) for some \(t < \ell \). Each such occurrence is the end of a maximal subword matching a prefix \(u = w_\ell \cdots w_{\ell -s+1}\) of w, forward in the case \(\xi = w_t\) and backward in the case \(\xi = w_{t+1}^{-1}\) (see Fig. 1). Write \(s = s(t)\) and \(u = u(t)\). Define

$$\begin{aligned} T_1&= \{t< \ell : \xi = w_{t+1}^{-1}\}, \\ T_2&= \{t< \ell : \xi = w_t, \ell - s(t) > t\}, \\ T_3&= \{t < \ell : \xi = w_t, \ell - s(t) \le t\}. \end{aligned}$$

Note that, for \(t \in T_1\), we must have \(t + s < \ell - s\), because \(w_\ell \cdots w_{t+1}\) is reduced. In the \(\xi = w_t\) case it is possible that the subword overlaps (or is adjacent to) the matching prefix, and the division into \(T_2\) and \(T_3\) reflects this possibility.

Fig. 1. The word w and one of its maximal subwords matching a prefix u. Each occurrence of the letter \(w_\ell \) or \(w_\ell ^{-1}\) is the end of one such subword. In the \(w_\ell = w_{t+1}^{-1}\) case we must have \(t + s < \ell - s\)

The choice of \(v^\ell \) at step \(\ell \) is constrained by the linear conditions

$$\begin{aligned} f(v^\ell , v^t)&= f(v^{\ell -s}, v^{t+s})&(t \in T_1) \\ f(v^\ell , v^t)&= f(v^{\ell -s}, v^{t-s})&(t \in T_2 \cup T_3) \end{aligned}$$

(where \(s = s(t)\)). We need to determine whether \(v^0\) is in this affine subspace. Obviously this is the case if and only if

$$\begin{aligned} f(v^0, v^t)&= f(v^{\ell -s}, v^{t+s})&(t \in T_1) \\ f(v^0, v^t)&= f(v^{\ell -s}, v^{t-s})&(t \in T_2 \cup T_3). \end{aligned}$$

Write \(C_t\) for this condition. For \(t \in T_1 \cup T_2\), the truth or falsity of \(C_t\) is determined at step \(\ell - s\), because \(\ell - s > t + s\) in the \(t \in T_1\) case and \(\ell - s > t\) in the \(t \in T_2\) case. The condition is not determined before step \(\ell - s\) by maximality of u(t). For \(t \in T_3\), \(C_t\) is settled at step t, because \(t \ge \ell - s\). The condition is not settled before step t because \(w_t = w_\ell \ne w_1^{-1}\) (since w is cyclically reduced).

Note that we may have \(\ell - s = t\) for \(t \in T_3\): this is the case in which the subword is adjacent to the prefix (see Fig. 2). In this case the condition \(C_t\) is

$$\begin{aligned} f(v^0, v^t) = f(v^t, v^{t-s}). \end{aligned}$$

However, we cannot also have \(t - s = 0\), for then we would have \(w = u^2\). Hence, by linear independence of \(v^0, \ldots , v^{t-1}\), the condition \(C_t\) is still settled at step t and not before. Note, however, that if G is unitary then \(C_t\) is linear only over \({\mathbf {F}}_{q_0}\) (because the form f is only sesquilinear).

Fig. 2. The case \(t = \ell - s \in T_3\). In this case we must have \(t - s > 0\), or else \(w = u^2\)

There is a case that may arise in which the various conditions \(C_{t'}\) settled at a given step t are not independent. This is the case in which \(t \in T_3\) and \(t = \ell - s'\) for some \(t' \in T_2\), where \(s' = s(t')\), and \(t' - s' = 0\) (see Fig. 3). Let \(T_4\) be the set of such steps t and let \(T_3' = T_3 \setminus T_4\). If \(t \in T_4\) then we have an overdetermined pair of conditions

$$\begin{aligned} f(v^0, v^{t'})&= f(v^t, v^0)&(C_{t'})\\ f(v^0, v^t)&= f(v^{\ell -s}, v^{t-s})&(C_t). \end{aligned}$$

This system is consistent if and only if

$$\begin{aligned} f(v^{t'}, v^0) = f(v^{\ell -s}, v^{t-s}). \end{aligned}$$

For \(t \in T_4\) let us redefine \(C_t\) to be this reduced condition. Certainly \(t - s < \ell - s\), and if \(t' = \ell - s\) then \(wu' = u'w\), so w is a proper power, contrary to hypothesis. Hence \(C_t\) is settled at step \(\ell - s \le t\).

Fig. 3. The case \(t \in T_4 \subseteq T_3\). Here \(t = \ell - s'\) for some \(t' \in T_2\) with \(t' - s' = 0\). If \(t' = \ell - s\) then w must be a proper power

Now consider any step \(t \in \{1, \ldots , \ell -1\}\), and consider all those conditions \(C_{t'}\) which are settled at step t. These conditions are \(C_{t'}\) for \(t' \in T_1 \cup T_2 \cup T_4\) such that \(\ell - s' = t\), as well as \(C_t\) if \(t \in T_3'\), i.e.,

$$\begin{aligned} f(v^t, v^{t' + s'})&= f(v^0, v^{t'})&(t' \in T_1, \ell - s' = t) \\ f(v^t, v^{t' - s'})&= f(v^0, v^{t'})&(t' \in T_2, \ell - s' = t) \\ f(v^t, v^{t' - s'})&= f(v^{t''}, v^0)&(t' \in T_4, \ell - s' = t) \\ f(v^t, v^0)&= f(v^{t-s}, v^{\ell -s})&(\text {if}~t \in T_3'.) \end{aligned}$$

We claim that these affine conditions for \(v^t\) are independent, and it suffices to demonstrate that the indices \(t' + s'\) (\(t' \in T_1, \ell - s' = t\)), \(t' - s'\) (\(t' \in T_2 \cup T_4\), \(\ell - s' = t\)), and 0 if \(t \in T_3'\) are all distinct. Since \(s' = \ell - t\) is a constant, the indices \(t'+s'\) are all distinct for \(t' \in T_1\), as are the indices \(t' - s'\) for \(t' \in T_2 \cup T_4\). Moreover we cannot have \(t_1 + s_1 = t_2 - s_2\) for \(t_1 \in T_1\) and \(t_2 \in T_2 \cup T_4\) with \(\ell - s_1 = \ell - s_2 = t\), because then we would have \(w_{t_1+s_1} = w_{t_2-s_2+1}^{-1} = w_{t_1+s_1+1}^{-1}\), in contradiction with the reducedness of w. If \(t' - s' = 0\) for some \(t' \in T_2\) then \(t \in T_4\) by definition, so \(t \notin T_3'\). Finally, if \(t' \in T_4\) then we cannot have \(t' - s' = 0\) unless w is a proper power, as discussed.

Hence, by linear independence of \(v^0, \ldots , v^{t-1}\), the h (say) conditions \(C_{t'}\) settled at step t consist of h independent affine linear conditions for \(v^t\), or, in the unitary case, if \(t = \ell - s \in T_3\), 2h independent affine linear conditions over \({\mathbf {F}}_{q_0}\). Suppose \(v^t\) is drawn from a subspace of codimension d (d is the number of previous occurrences of \(w_t\) or \(w_{t}^{-1}\)). Then, by Lemma 2.1 and Lemma 3.3, the probability that all these conditions are satisfied, conditional on the past trajectory \(v^0, \ldots , v^{t-1}\), is

$$\begin{aligned} \frac{q^{n-d-h}/q_0 + O(q^d + q^{n/2})}{q^{n-d}/q_0 + O(q^d + q^{n/2})}&= q^{-h} \left( 1 + O(q^{h+d-n/2} q_0)\right) \nonumber \\&= q^{-h} \left( 1 + O(q^{\ell + t - n/2})\right) \end{aligned}$$
(13)

(in the second line we used \(h < \ell \), \(d < t\), and \(q_0 \le q\)).

Let \(H = |T_1| + |T_2| + |T_3'| + |T_4|\) (i.e., let \(H+1\) be the number of appearances of \(w_\ell \) or \(w_\ell ^{-1}\) in w). Taking the product of (13) over all t, the probability that \(C_{t'}\) is satisfied for every \(t' \in T_1 \cup T_2 \cup T_3' \cup T_4\) is

$$\begin{aligned} q^{-H}\left( 1 + O(q^{2\ell -n/2}) \right) . \end{aligned}$$

The conditions \(C_t\) are prerequisite to the event \(v^\ell = v^0\). If all these conditions are satisfied, then at step \(\ell \) the vector \(v^\ell \) is drawn from an affine subspace of codimension H which includes \(v^0\). Note also that \(Q(v^{\ell -1}) = Q(v^0)\). Hence, from Lemma 3.3,

$$\begin{aligned} {\mathbf {P}}(v^\ell = v^0 \mid v^0, \ldots , v^{\ell -1}) = \frac{1}{q^{n-H} / q_0 - O(q^H) - O(q^{n/2})}. \end{aligned}$$

Hence the overall probability of \(E_1\) is bounded by

$$\begin{aligned} \frac{q^{-H}\left( 1 + O(q^{2\ell - n/2})\right) }{q^{n-H} / q_0 - O(q^H) - O(q^{n/2})}&= \frac{1}{q^n/q_0} \left( 1 + O(q^{2\ell - n/2}) \right) \\&= \frac{1}{N} \left( 1 + O(q^{2\ell - n/2}) \right) . \end{aligned}$$

Thus in all cases the error is bounded as claimed. \(\square \)

Remark 9.2

In the linear case, the hypothesis that w is not a proper power is needed only to ensure that the event \(v^\ell = v^0\) is contained in \(E_1 \cup E_2\); we do not need the hypothesis in order to bound \({\mathbf {P}}(E_1)\) or \({\mathbf {P}}(E_2)\). By contrast, at least in the orthogonal case, we do need this hypothesis in order to bound \({\mathbf {P}}(E_1)\) satisfactorily, so at least some of the complexity of the above proof is necessary. Suppose \(G = {\text {GO}}_n(q)\) and \(w = u^2\) for some word u of length \(\ell / 2\). Then the choice of \(v^\ell \) is constrained by

$$\begin{aligned} f(v^\ell , v^{\ell /2}) = f(u v^{\ell /2}, u v^0) = f(v^{\ell /2}, v^0) = f(v^0, v^{\ell /2}). \end{aligned}$$

Hence \(v^\ell \) is always restricted to an affine hyperplane that includes \(v^0\), so the probability that \({{\overline{w}}} v = v\) will be at least approximately q/N, even conditionally on there being only one coincidence.

Remark 9.3

On the other hand, it is usually possible to cyclically rotate w so that much of the complexity in the previous proof disappears. For example, if w can be cyclically rotated so that it has no square prefix, then, after such a rotation, \(T_3 = \emptyset \). Not every non-proper-power has this property (see Footnote 6), but almost all words do.

We can now prove that the permutation action of uniformly random \(x_1, \ldots , x_k \in G\) on an orbit \(Gv \subseteq V\) has a uniform spectral gap. Assume \(v \ne 0\). As usual let \({\mathcal {A}}\) be the normalized adjacency operator

$$\begin{aligned} {\mathcal {A}}= \frac{1}{2k} \sum _{i=1}^k (x_i + x_i^{-1}) \end{aligned}$$

acting on \({\mathbf {C}}[Gv]\), and let \(1 = \lambda _1 \ge \lambda _2 \ge \cdots \ge \lambda _N\) be the spectrum. Let \(\lambda = \max (\lambda _2, -\lambda _N)\). Then, for even \(\ell \),

$$\begin{aligned} 1 + \lambda ^\ell \le {\text {tr}}{\mathcal {A}}^\ell = {\mathbf {E}}_w |\{u \in Gv : {{\overline{w}}} u = u\}|, \end{aligned}$$

where w is the result of a simple random walk of length \(\ell \) in \(F_k\). Let \({\mathcal {P}}\subseteq F_k\) be the set of proper powers \(w^m\) (\(w \in F_k, m \ge 2\)). Then

$$\begin{aligned} {\mathbf {E}}\lambda ^\ell&\le {\mathbf {E}}_{x_1, \ldots , x_k} {\mathbf {E}}_w |\{ u \in Gv : {{\overline{w}}} u = u \}| - 1\\&= {\mathbf {E}}_w \left( {\mathbf {P}}({{\overline{w}}} v = v) - \frac{1}{N} \right) N\\&\le {\mathbf {P}}(w\in {\mathcal {P}}) N + \max _{w \notin {\mathcal {P}}, |w| \le \ell } \left( {\mathbf {P}}({{\overline{w}}} v = v) - \frac{1}{N} \right) N. \end{aligned}$$
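The identity \({\text {tr}}{\mathcal {A}}^\ell = {\mathbf {E}}_w |\{u : {{\overline{w}}} u = u\}|\) used above can be verified by brute force for a small permutation action. The sketch below does this with exact rational arithmetic. It is only an illustration on an arbitrary pair of permutations of four points (our choice, not an orbit of a classical group), and it enumerates all \((2k)^\ell \) letter sequences of the simple random walk, so it is feasible only for tiny parameters.

```python
from fractions import Fraction
from itertools import product


def compose(a, b):
    """Composition of permutations given as tuples: (a o b)[i] = a[b[i]]."""
    return tuple(a[j] for j in b)


def inverse(a):
    inv = [0] * len(a)
    for i, j in enumerate(a):
        inv[j] = i
    return tuple(inv)


def trace_of_power(gens, length):
    """tr(A^length) where A = (1/2k) sum_i (x_i + x_i^{-1}), by matrix powers."""
    n = len(gens[0])
    letters = list(gens) + [inverse(g) for g in gens]
    A = [[Fraction(0)] * n for _ in range(n)]
    for g in letters:
        for i in range(n):
            A[g[i]][i] += Fraction(1, len(letters))
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(length):
        M = [[sum(A[i][k] * M[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return sum(M[i][i] for i in range(n))


def expected_fixed_points(gens, length):
    """Average number of fixed points of a uniformly random letter sequence."""
    n = len(gens[0])
    letters = list(gens) + [inverse(g) for g in gens]
    total = 0
    for word in product(letters, repeat=length):
        perm = tuple(range(n))
        for g in word:
            perm = compose(g, perm)
        total += sum(1 for i in range(n) if perm[i] == i)
    return Fraction(total, len(letters) ** length)


gens = [(1, 2, 0, 3), (1, 0, 3, 2)]  # arbitrary example on 4 points
```

For even \(\ell \) the common value is at least 1, reflecting the contribution of the trivial eigenvalue \(\lambda _1 = 1\).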

By [15, Lemma 2.6],

$$\begin{aligned} {\mathbf {P}}(w \in {\mathcal {P}}) \ll \ell \left( \frac{2k - 1}{k^2}\right) ^{\ell / 2} \ll k^{-c\ell }. \end{aligned}$$

By Lemma 9.1,

$$\begin{aligned} \max _{w \notin {\mathcal {P}}, |w| \le \ell } {\mathbf {P}}({{\overline{w}}} v = v) \le \frac{1}{N}\left( 1 + O(q^{2\ell - n/2})\right) , \end{aligned}$$

provided \(\ell < n/4\). Hence

$$\begin{aligned} {\mathbf {E}}\lambda ^\ell \ll k^{-c\ell } q^n + q^{2\ell - n/2}. \end{aligned}$$

Take \(\ell \sim n/5\). If \(\log k / \log q\) is sufficiently large then

$$\begin{aligned} {\mathbf {E}}\lambda ^\ell \le q^{-c' \ell }. \end{aligned}$$

Hence, by Markov’s inequality,

$$\begin{aligned} {\mathbf {P}}(\lambda \ge q^{-c'/2}) = {\mathbf {P}}(\lambda ^\ell \ge q^{-c'\ell /2}) \le q^{c'\ell /2} {\mathbf {E}}\lambda ^\ell \le q^{-c'\ell /2} \le q^{-c''n}, \end{aligned}$$

so with probability at least \(1 - q^{-c''n}\) we have \(\lambda < q^{-c'/2}\).

The action on r-tuples

We now generalize the argument of the previous subsection to r-tuples of vectors, where r is bounded. It will be convenient to use the following notation. For \(v, v' \in V^r\), let \(f(v, v')\) denote the \(r \times r\) matrix

$$\begin{aligned} f(v, v')_{ij} = f(v_i, v'_j). \end{aligned}$$

Define also

$$\begin{aligned} Q(v)_i = Q(v_i). \end{aligned}$$

Let \(v = (v_1, \ldots , v_r) \in V^r\), where \(v_1, \ldots , v_r \in V\) are linearly independent. Let \(N = |Gv|\). By Witt’s lemma, N is the number of \(v' \in V^r\) with \(v'_1, \ldots , v'_r\) linearly independent such that \(f(v, v) = f(v', v')\) and \(Q(v) = Q(v')\). In the linear case,

$$\begin{aligned} N&= (q^n - 1) (q^n - q) \cdots (q^n - q^{r-1})\\&= q^{rn} \left( 1 - O(q^{-n+r-1})\right) . \end{aligned}$$
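The product formula for N in the linear case can be checked by brute force for very small parameters. The following sketch, with q prime and function names of our own, counts linearly independent r-tuples in \({\mathbf {F}}_q^n\) directly and compares with \((q^n - 1)(q^n - q) \cdots (q^n - q^{r-1})\).

```python
from itertools import product


def count_independent_tuples(p, n, r):
    """Brute-force count of linearly independent r-tuples in F_p^n (p prime)."""
    vectors = list(product(range(p), repeat=n))
    # All nonzero coefficient vectors for a potential linear dependence.
    coeffs = [c for c in product(range(p), repeat=r) if any(c)]
    count = 0
    for tup in product(vectors, repeat=r):
        dependent = any(
            all(sum(c[i] * tup[i][j] for i in range(r)) % p == 0 for j in range(n))
            for c in coeffs
        )
        if not dependent:
            count += 1
    return count


def product_formula(q, n, r):
    """(q^n - 1)(q^n - q) ... (q^n - q^(r-1))."""
    out = 1
    for i in range(r):
        out *= q**n - q**i
    return out
```

For \(q = 2\), \(n = 3\), \(r = 2\) both counts equal \(7 \cdot 6 = 42\).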

In the other cases we have, inductively, using Lemma 2.1,

$$\begin{aligned} N&= |G(v_1, \ldots , v_{r-1})| (q^{n-r+1} / q_0 + O(q^{n/2})) \nonumber \\&= q^{rn - r(r-1)/2} / q_0^r \left( 1+ O(q^{-n/2+r-1} q_0)\right) . \end{aligned}$$
(14)

Lemma 9.4

Assume w is nontrivial and not a proper power. Assume \(\ell r^2 < n/4\). Then

$$\begin{aligned} {\mathbf {P}}({{\overline{w}}} v = v) \le \frac{1}{N} \left( 1 + O( q^{2 \ell r - n/2} ) \right) . \end{aligned}$$

Proof

Again we may assume w is cyclically reduced. In this case Lemma 8.3 implies that the event that \({{\overline{w}}} v = v\) is contained in the union of the following two events:

\(E_1\): the joint trajectory \((v_i^t)\) has exactly one coincidence in each individual trajectory, each occurring at the final step \(t=\ell \), and \(v_i^\ell = v_i^0\) for each i;

\(E_2\): the joint trajectory \((v_i^t)\) has at least \(r+1\) coincidences.

Again we can bound the probability of \(E_2\) using Lemma 3.3. Suppose there is a free choice at step \((t, i)\). There are at most \(t r + i - 1 \le \ell r\) previous vectors, so the conditional probability of a coincidence is bounded by

$$\begin{aligned} \frac{q^{t r + i - 1}}{q^{n - \ell r} - q^{\ell r - 1} - q^{n/2}} = q^{tr + i - 1 + \ell r - n} \left( 1 + O(q^{\ell r - n/2})\right) . \end{aligned}$$

Hence the probability of \(E_2\) is bounded by (summing over all possibilities for \(r+1\) coincidences)

$$\begin{aligned} q^{(2 \ell r - n)(r+1)} \left( 1 + O(r q^{\ell r - n/2})\right) \ll q^{(2 \ell r - n)(r+1)}. \end{aligned}$$

Using \(N \le q^{rn}\), this is at most

$$\begin{aligned} q^{2 \ell r (r+1) - n} / N \le q^{2 \ell r - n/2} / N. \end{aligned}$$

Hence we may focus on the event \(E_1\). In the linear case, for each i the vector \(v_i^\ell \) is chosen uniformly at random outside a linear subspace of dimension at most \(\ell r\), so the probability of \(E_1\) is bounded by

$$\begin{aligned} \left( \frac{1}{q^n - q^{\ell r}}\right) ^r&= q^{-rn} \left( 1 + O(r q^{\ell r - n}) \right) \\&= \frac{1}{N} \left( 1 + O(r q^{\ell r - n}) \right) . \end{aligned}$$

This completes the proof in this case.

As in the previous subsection, the general situation is complicated by form conditions, but fortunately few changes are necessary in the \(r > 1\) case. Let \(\xi = w_\ell \). Assume there are \(H+1\) occurrences of \(\xi \) or \(\xi ^{-1}\) in w, and consider the H maximal subwords u ending with \(\xi \) or \(\xi ^{-1}\) and matching a proper prefix of w, as in Fig. 1. Define \(T_1\), \(T_2\), and \(T_3 = T'_3 \cup T_4\) as before.

The choice of \(v^\ell \) at step \(\ell \) is constrained by the linear conditions

$$\begin{aligned} f(v^\ell , v^t)&= f(v^{\ell -s}, v^{t+s})&(t \in T_1) \\ f(v^\ell , v^t)&= f(v^{\ell -s}, v^{t-s})&(t \in T_2 \cup T_3) \end{aligned}$$

(where \(s = s(t)\)). For \(t \in T_1 \cup T_2 \cup T_3'\) we have a condition \(C_t\) defined by

$$\begin{aligned} f(v^0, v^t)&= f(v^{\ell -s}, v^{t+s})&(t \in T_1) \\ f(v^0, v^t)&= f(v^{\ell -s}, v^{t-s})&(t \in T_2 \cup T_3'). \end{aligned}$$

For \(t \in T_4\) the condition \(C_t\) is the reduced condition

$$\begin{aligned} f(v^{t'}, v^0) = f(v^{\ell - s}, v^{t-s}). \end{aligned}$$

Conditional on linear independence of \(v_i^t\) for \(1 \le i \le r\) and \(t < \ell \), it can be verified exactly as in the \(r = 1\) case that the conditions settled at any given step \(t < \ell \) are precisely \(C_{t'}\) for \(t' \in T_1 \cup T_2 \cup T_4\) and \(\ell - s' = t\), as well as \(C_t\) if \(t \in T_3'\), and these conditions are linearly independent.

Suppose at step \(t < \ell \) there are h conditions \(C_{t'}\) to be settled. Assume first that we are not in the case \(t = \ell - s \in T_3'\) (the case in which the subword is adjacent to the prefix, as in Fig. 2). Let d be the number of previous occurrences of \(w_t\) or \(w_t^{-1}\). Then, by Lemma 3.3, at step \((t, i)\) the vector \(v_i^t\) is drawn from an affine subspace of codimension \(d' = dr + i-1\), less a subspace of dimension \(d'\), subject to the quadratic condition \(Q(v_i^t) = Q(v_i^{t-1})\). Hence, using Lemma 2.1, the probability that the ji-component of each \(C_{t'}\) is satisfied for each \(j \in \{1, \ldots , r\}\) is

$$\begin{aligned} \frac{q^{n - d' - hr}/q_0 + O(q^{d'} + q^{n/2})}{q^{n-d'}/q_0 + O(q^{d'} + q^{n/2})}&= q^{-hr} \left( 1 + O( q^{hr+d'-n/2} q_0) \right) \nonumber \\&= q^{-hr} \left( 1 + O( q^{\ell r + (t-1) r + i-1 -n/2}) \right) \end{aligned}$$
(15)

(using \(h < \ell \), \(d' \le (t-1)r + i-1\), and \(q_0 \le q\)). Taking the product over all i, the probability that each \(C_{t'}\) is satisfied after step t is

$$\begin{aligned} q^{-hr^2} \left( 1 + O(q^{\ell r + tr - n/2}) \right) . \end{aligned}$$
(16)

The case \(t = \ell - s \in T_3'\) is slightly different. In this case the ji-component of \(C_t\) is

$$\begin{aligned} f(v_j^0, v_i^t) = f(v_j^t, v_i^{t-s}). \end{aligned}$$

This condition is settled at step \((t, k)\), where \(k = \max (i, j)\). Hence \(2k-1\) components of \(C_t\) are settled at step \((t, k)\). Therefore, in this case, (15) must be replaced with

$$\begin{aligned} q^{-(h-1)r - (2i - 1)} \left( 1 + O( q^{\ell r + (t-1) r + i-1 -n/2}) \right) . \end{aligned}$$

Taking the product over all i again gives (16).

Taking the product of (16) over all t, the probability that \(C_{t'}\) is satisfied for every \(t' \in T_1 \cup T_2 \cup T_3' \cup T_4\) is

$$\begin{aligned} q^{-Hr^2}\left( 1 + O(q^{2\ell r - n/2}) \right) . \end{aligned}$$
(17)

Finally, if all the conditions \(C_t\) are satisfied, then for each i the vector \(v_i^\ell \) is drawn from an affine subspace of codimension \(Hr+i-1\) which includes \(v_i^0\), less a subspace of dimension \(Hr+i-1\), subject to the quadratic condition \(Q(v_i^\ell ) = Q(v_i^{\ell -1}) = Q(v_i^0)\). Hence

$$\begin{aligned}&{\mathbf {P}}(v_i^\ell = v_i^0 \mid (v_j^t, (t, j) \prec (\ell , i)))\\&\quad = \frac{1}{q^{n-Hr-i+1} / q_0 - O(q^{Hr+i-1}) - O(q^{n/2})} \\&\quad = (q^{n-Hr-i+1}/q_0)^{-1} \left( 1 + O(q^{Hr+i-1-n/2} q_0) \right) \end{aligned}$$

Hence the conditional probability that \(v^\ell = v^0\) is

$$\begin{aligned} (q^{nr-Hr^2-r(r-1)/2} / q_0^r)^{-1} \left( 1 + O(q^{(H+1)r - n/2}) \right) . \end{aligned}$$

Hence the overall probability of \(E_1\) is, multiplying the previous line by (17),

$$\begin{aligned} (q^{nr - r(r-1)/2} / q_0^r)^{-1} \left( 1 + O(q^{2\ell r - n/2}) \right) . \end{aligned}$$

Comparing with (14), this is

$$\begin{aligned} N^{-1} \left( 1 + O(q^{2\ell r - n/2}) \right) . \end{aligned}$$

Thus in all cases the error is bounded as claimed. \(\square \)

We can now prove that the permutation action of uniformly random \(x_1, \ldots , x_k \in G\) on an orbit \(Gv \subseteq V^r\) has a uniform spectral gap. The argument is little different from that in the previous subsection. We may assume \(v_1, \ldots , v_r\) are linearly independent, by reducing r if necessary. Suppose the adjacency operator \({\mathcal {A}}\) acting on \({\mathbf {C}}[Gv]\) has spectrum \(1 = \lambda _1 \ge \cdots \ge \lambda _N\). Let \(\lambda = \max (\lambda _2, -\lambda _N)\). For even \(\ell \), let w be the result of a simple random walk of length \(\ell \) in \(F_k\). Then

$$\begin{aligned} {\mathbf {E}}\lambda ^\ell \le {\mathbf {P}}(w \in {\mathcal {P}}) N + \max _{w \notin {\mathcal {P}}, |w|\le \ell } \left( {\mathbf {P}}({{\overline{w}}} v = v) - \frac{1}{N}\right) N. \end{aligned}$$
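
The starting point of this bound is the trace identity \({\text {tr}}\, {\mathcal {A}}^\ell = {\mathbf {E}}_w |\{u : {{\overline{w}}} u = u\}|\), where w ranges over simple random walks of length \(\ell \). The sketch below checks that identity exactly on a toy permutation action of two fixed permutations of five points. This is an assumption made for illustration: it is not the linear action on \(Gv\) studied here, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def invert(perm):
    """Inverse of a permutation given as a list of images."""
    inverse = [0] * len(perm)
    for point, image in enumerate(perm):
        inverse[image] = point
    return inverse


def letters_of(generators):
    """The 2k letters x_1, ..., x_k, x_1^{-1}, ..., x_k^{-1}."""
    return [list(g) for g in generators] + [invert(g) for g in generators]


def adjacency_matrix(letters):
    """A = (1/2k) * sum of the permutation matrices of the letters."""
    n = len(letters[0])
    weight = Fraction(1, len(letters))
    A = [[Fraction(0)] * n for _ in range(n)]
    for s in letters:
        for u in range(n):
            A[s[u]][u] += weight
    return A


def trace_of_power(A, ell):
    """tr(A^ell), computed by repeated exact matrix multiplication."""
    n = len(A)
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(ell):
        power = [[sum(power[i][m] * A[m][j] for m in range(n)) for j in range(n)]
                 for i in range(n)]
    return sum(power[i][i] for i in range(n))


def expected_fixed_points(letters, ell):
    """Average number of fixed points over all (2k)^ell unreduced words."""
    n = len(letters[0])
    fixed = 0
    for word in product(letters, repeat=ell):
        for u in range(n):
            v = u
            for s in word:
                v = s[v]
            fixed += (v == u)
    return Fraction(fixed, len(letters) ** ell)
```

Since the spectrum is real and \(\ell \) is even, subtracting the contribution 1 of the trivial eigenvalue from the trace leaves an upper bound for \(\lambda ^\ell \) whenever the action is transitive.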

We bound \({\mathbf {P}}(w \in {\mathcal {P}})\) as before, while by Lemma 9.4 we have

$$\begin{aligned} \max _{w \notin {\mathcal {P}}, |w| \le \ell } {\mathbf {P}}({{\overline{w}}} v = v) \le \frac{1}{N} \left( 1 + O(q^{2\ell r - n/2}) \right) , \end{aligned}$$

provided \(\ell r^2 < n/4\). Hence

$$\begin{aligned} {\mathbf {E}}\lambda ^\ell \ll k^{-c \ell } q^{rn} + q^{2 \ell r - n/2}. \end{aligned}$$

Take \(\ell \sim n/(5 r^2)\). If \(\log k / \log q \ge C r^3\), for a sufficiently large constant C, then

$$\begin{aligned} {\mathbf {E}}\lambda ^\ell \le q^{-c' \ell }. \end{aligned}$$
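
To see where the exponent \(r^3\) comes from, one can repeat the bookkeeping with \(\ell = n/(5r^2)\) exactly and \(k = q^{\kappa }\) (again \(\kappa \) is notation introduced only for this sketch):

$$\begin{aligned} k^{-c\ell } q^{rn}&= q^{(5r^3 - c\kappa )\ell }, \\ q^{2\ell r - n/2}&= q^{(2r - 5r^2/2)\ell } \le q^{-r\ell /2}. \end{aligned}$$

Thus \(\kappa \ge 6 r^3 / c\) makes the first term at most \(q^{-r^3 \ell }\), while the second term is small for every \(r \ge 1\).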

Hence, by Markov’s inequality,

$$\begin{aligned} {\mathbf {P}}(\lambda \ge q^{-c'/2}) \le q^{c' \ell / 2} {\mathbf {E}}\lambda ^\ell \le q^{-c'\ell / 2} < q^{-c''n/r^2}, \end{aligned}$$

so almost surely \(\lambda < q^{-c'/2}\), as before.

Other low-degree representations

The result of the final argument of the previous subsection can be expressed as follows.

Theorem 9.5

Let \({\mathbf {C}}[V^r]_0\) be the orthogonal complement of \({\mathbf {C}}[V^r]^G\) in \({\mathbf {C}}[V^r]\). Let \(x_1, \ldots , x_k \in G\) be uniform and independent, where \(k \ge q^{Cr^3}\) and \(r < cn^{1/4}\). Let \(\rho = \rho ({\mathcal {A}}, {\mathbf {C}}[V^r]_0)\) be the spectral radius of \({\mathcal {A}}= {\mathcal {A}}_{x_1, \ldots , x_k}\) acting on \({\mathbf {C}}[V^r]_0\). Then

$$\begin{aligned} {\mathbf {P}}(\rho > q^{-c}) < q^{-cn/r^2}. \end{aligned}$$

Proof

By Witt’s lemma, there are \(O(q^{r^2})\) orbits of G on \(V^r\). Let \(Gv_1, \ldots , Gv_s\) be a decomposition of \(V^r\) into G-orbits, where \(s \ll q^{r^2}\). Then

$$\begin{aligned} {\mathbf {C}}[V^r]_0 = {\mathbf {C}}[Gv_1]_0 \oplus \cdots \oplus {\mathbf {C}}[Gv_s]_0. \end{aligned}$$

Let \(\rho _i = \rho ({\mathcal {A}}, {\mathbf {C}}[Gv_i]_0)\) be the spectral radius of \({\mathcal {A}}\) on \({\mathbf {C}}[Gv_i]_0\). Then

$$\begin{aligned} \rho = \max _{1\le i \le s} \rho _i. \end{aligned}$$

From the previous subsection (possibly with a smaller r, if the components of \(v_i\) are not linearly independent), for each i we have

$$\begin{aligned} {\mathbf {P}}(\rho _i > q^{-c}) < q^{-c'n/r^2}. \end{aligned}$$

Hence

$$\begin{aligned} {\mathbf {P}}(\rho > q^{-c}) \ll q^{r^2 - c'n/r^2} < q^{-c'' n/r^2}. \end{aligned}$$

Our main interest is the conjugation action of G on a conjugacy class \({\mathfrak {C}}\subseteq {\text {SCl}}_n(q)\) of elements of degree \(s = O(1)\), which is actually a quotient of an orbit of G on \(V^s \oplus (V^*)^s\), where \(V^*\) is the dual space. It is possible to repeat the analysis of the previous subsection allowing also r factors of \(V^*\), but in fact this generalization follows formally, since \({\mathbf {C}}[V^*] \cong {\mathbf {C}}[V]\) (as both have character \(\chi (g) = q^{\dim \ker (g-1)}\)), so

$$\begin{aligned} {\mathbf {C}}[V^r \oplus (V^*)^r] \cong {\mathbf {C}}[V]^{\otimes r} \otimes {\mathbf {C}}[V^*]^{\otimes r} \cong {\mathbf {C}}[V]^{\otimes 2r} \cong {\mathbf {C}}[V^{2r}]. \end{aligned}$$
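
The character identity behind \({\mathbf {C}}[V^*] \cong {\mathbf {C}}[V]\) can be checked by brute force in tiny cases. The sketch below assumes \(G = {\text {GL}}_n(p)\) with p prime and small n, and uses the fact that a functional is fixed by g exactly when the corresponding vector is fixed by the transpose of g; the function names are hypothetical.

```python
from itertools import product


def mat_vec(g, v, p):
    """Matrix-vector product over F_p."""
    return tuple(sum(a * b for a, b in zip(row, v)) % p for row in g)


def transpose(g):
    return tuple(zip(*g))


def general_linear_group(n, p):
    """All invertible n x n matrices over F_p, found by testing injectivity."""
    vectors = list(product(range(p), repeat=n))
    group = []
    for rows in product(vectors, repeat=n):
        if len({mat_vec(rows, v, p) for v in vectors}) == len(vectors):
            group.append(rows)
    return group


def fixed_point_count(g, n, p):
    """Number of v in F_p^n with g v = v, i.e. q^(dim ker(g - 1))."""
    return sum(1 for v in product(range(p), repeat=n) if mat_vec(g, v, p) == v)


def characters_agree(n, p):
    """Check that g and its transpose fix equally many vectors, for all g."""
    return all(
        fixed_point_count(g, n, p) == fixed_point_count(transpose(g), n, p)
        for g in general_linear_group(n, p)
    )
```

Calling `characters_agree(3, 2)` runs over all 168 elements of \({\text {GL}}_3(2)\).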

Corollary 9.6

(the conjugation action on \({\mathfrak {M}}\) is expanding) Let \(x_1, \ldots , x_k \in G\) be independent and uniformly random, where \(k > q^C\) and \(n > C\). Let \(\rho = \rho ({\mathcal {A}}, {\mathbf {C}}[{\mathfrak {M}}]_0)\) be the spectral radius of \({\mathcal {A}}\) acting on \({\mathbf {C}}[{\mathfrak {M}}]_0\). Then

$$\begin{aligned} {\mathbf {P}}(\rho > q^{-c}) \le q^{-cn}. \end{aligned}$$

Proof

We claim that \({\mathbf {C}}[{\mathfrak {M}}]\) is contained in \({\mathbf {C}}[V^{2s}]\). The map

$$\begin{aligned} V^s \oplus (V^*)^s&\rightarrow {\text {M}}_n({\mathbf {F}}_q) \\ (v_i, \phi _i)&\mapsto 1 + \sum _{i=1}^s v_i \otimes \phi _i \end{aligned}$$

is a map of permutation representations (where G acts by conjugation on \({\text {M}}_n({\mathbf {F}}_q)\)), and hence induces a map of \({\mathbf {C}}[G]\)-modules \({\mathbf {C}}[V^s \oplus (V^*)^s] \rightarrow {\mathbf {C}}[{\text {M}}_n({\mathbf {F}}_q)]\). The module \({\mathbf {C}}[{\mathfrak {M}}]\) is contained in the image, so it is isomorphic to a submodule of \({\mathbf {C}}[V^s \oplus (V^*)^s] \cong {\mathbf {C}}[V^{2s}]\) by complete reducibility. Hence the result follows from the previous theorem with \(r = 2s\). \(\square \)
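
The equivariance used in this proof amounts to the identity \(g (1 + \sum v_i \otimes \phi _i) g^{-1} = 1 + \sum (g v_i) \otimes (\phi _i g^{-1})\). A minimal numerical check, assuming a prime field \({\mathbf {F}}_p\) and arbitrary small test data (the helper names are hypothetical), might look as follows.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def mat_mul(A, B, p):
    """Matrix product over F_p."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]


def mat_inverse(A, p):
    """Inverse over F_p (p prime) by Gauss-Jordan elimination."""
    n = len(A)
    aug = [list(row) + e for row, e in zip(A, identity(n))]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] % p)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)
        aug[col] = [(x * inv) % p for x in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] % p:
                f = aug[i][col]
                aug[i] = [(a - f * b) % p for a, b in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]


def one_plus_rank_one_sum(vs, phis, p):
    """The matrix 1 + sum_i v_i (x) phi_i, with v_i columns and phi_i rows."""
    n = len(vs[0])
    M = identity(n)
    for v, phi in zip(vs, phis):
        for a in range(n):
            for b in range(n):
                M[a][b] = (M[a][b] + v[a] * phi[b]) % p
    return M


def check_equivariance(g, vs, phis, p):
    """Compare g M g^{-1} with the image of the transformed tuple."""
    n = len(g)
    g_inv = mat_inverse(g, p)
    conjugated = mat_mul(mat_mul(g, one_plus_rank_one_sum(vs, phis, p), p), g_inv, p)
    moved_vs = [[sum(a * b for a, b in zip(row, v)) % p for row in g] for v in vs]
    moved_phis = [[sum(phi[a] * g_inv[a][b] for a in range(n)) % p for b in range(n)]
                  for phi in phis]
    return conjugated == one_plus_rank_one_sum(moved_vs, moved_phis, p)
```

Here G acts on vectors by \(v \mapsto gv\) and on functionals by \(\phi \mapsto \phi g^{-1}\), which is the convention assumed in the sketch.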

Diameter of the Cayley graph

We now collect results from the previous sections and bound the diameter of the Cayley graph of the subgroup of \({\text {Cl}}_n(q)\) generated by random elements.

\({\text {GL}}_n(p)\) and 3 random elements

In this subsection we prove Theorem 1.2. Recall that \({\text {SL}}_n(p) \le G \le {\text {GL}}_n(p)\), where p is prime and \(\log p < cn / \log ^2 n\), the elements \(x, y, z \in G\) are chosen uniformly at random, and \(S = \{x^{\pm 1}, y^{\pm 1}, z^{\pm 1}\}\). We claim that with probability \(1-e^{-cn}\) we have

$$\begin{aligned}&\langle S \rangle \ge {\text {SL}}_n(p),~\text {and} \\&{\text {diam}}{\text {Cay}}(\langle S \rangle , S) \le n^{O(\log p)}. \end{aligned}$$

First we show that \(\langle S \rangle \ge {\text {SL}}_n(p)\) with high probability. The argument is a slight modification of [14, Sect. 5] (see Note 7).

Let \({\mathfrak {C}}_1\) be the set of all irreducible \(g \in {\text {GL}}_n(p)\) of order \(d(p^n-1)/(p-1)\) for some \(d \mid (p-1)\). Each such g is equivalent to the multiplication action of some \(x \in {\mathbf {F}}_{p^n}\) of the same order, and \(\det g = N(x)\). Therefore, for each \(\alpha \in G^\text {ab}\cong {\mathbf {F}}_p^\times \), the \({\text {GL}}_n(p)\)-classes in \({\mathfrak {C}}_{1;\alpha } = {\mathfrak {C}}_1 \cap \alpha G'\) are in bijection with elements of \({\mathbf {F}}_{p^n}\), up to Galois conjugacy, of order \(d(p^n-1)/(p-1)\) and norm \(\alpha \), where d is the order of \(\alpha \). Note there are \(\phi (d)\) elements \(\alpha \) of order d. Moreover, each such \(g \in G\) has centralizer isomorphic to \({\mathbf {F}}_{p^n}^\times \). Hence

$$\begin{aligned} \frac{|{\mathfrak {C}}_{1;\alpha }|}{|{\text {GL}}_n(p)|} = \frac{\phi (d(p^n-1) / (p-1)) / \phi (d)}{n (p^n-1)} > e^{-o(n)}. \end{aligned}$$

Here we used the standard estimate \(\phi (m) \gg m / \log \log m\).
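
For small parameters the proportion above can be evaluated exactly. The sketch below computes the middle expression of the display with Euler's function implemented by trial division; the function names are hypothetical, and the parameter `d` is assumed to divide \(p - 1\).

```python
from fractions import Fraction


def euler_phi(m):
    """Euler's totient function, by trial division."""
    result, remaining, d = m, m, 2
    while d * d <= remaining:
        if remaining % d == 0:
            while remaining % d == 0:
                remaining //= d
            result -= result // d
        d += 1
    if remaining > 1:
        result -= result // remaining
    return result


def proportion_c1(p, n, d=1):
    """phi(d (p^n - 1)/(p - 1)) / phi(d), divided by n (p^n - 1)."""
    if (p - 1) % d != 0:
        raise ValueError("d must divide p - 1")
    m = d * (p**n - 1) // (p - 1)
    return Fraction(euler_phi(m), euler_phi(d) * n * (p**n - 1))
```

For instance `proportion_c1(2, 3)` equals 2/7, matching the 48 elements of order 7 among the 168 elements of \({\text {GL}}_3(2)\).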

Let \({\mathfrak {C}}_2\) be the set of all \(g \in {\text {GL}}_n(p)\) of order \(p^{n-1}-1\) splitting V as \(\ell \oplus W\) for some \(\ell , W\) with \(\dim \ell = 1\), \(\dim W = n-1\). A similar calculation shows that

$$\begin{aligned} \frac{|{\mathfrak {C}}_{2;\alpha }|}{|{\text {GL}}_n(p)|} > e^{-o(n)} \end{aligned}$$

for each \(\alpha \in {\mathbf {F}}_p^\times \) in this case as well. (In fact, \({\mathfrak {C}}_2\) is uniform over \(\det \) fibres.)

Hence, by Corollaries 5.3 and 6.2 as in the proof of Theorem 7.3, with probability at least \(1 - e^{- c n}\) there are words \(w_1, w_2\) such that

$$\begin{aligned} w_i(x,y,z) \in {\mathfrak {C}}_i \qquad (i \in \{1, 2\}). \end{aligned}$$

By a straightforward adaptation of [14, Lemma 5.2] (assuming \(n > 6\), say),

$$\begin{aligned} \langle w_1(x, y, z), w_2(x, y, z) \rangle \ge {\text {SL}}_n(p). \end{aligned}$$

Hence indeed \(\langle S \rangle \ge {\text {SL}}_n(p)\).

In particular, using Schreier generators, there is a symmetric set \(S' \subseteq S^{2p} \cap {\text {SL}}_n(p)\) such that \(\langle S'\rangle = {\text {SL}}_n(p)\).

Meanwhile, by Theorem 1.1, with probability \(1 - e^{-cn}\) there is another word w of length \(n^{O(\log p)}\) such that

$$\begin{aligned} w(x, y, z) \in {\mathfrak {M}}. \end{aligned}$$

Let \(X = S' \cup \{w(x, y, z)^{\pm 1}\}\). By [22, Theorem 1.5] we have

$$\begin{aligned} {\text {diam}}{\text {Cay}}({\text {SL}}_n(p), X) \ll p n^{12}. \end{aligned}$$

As \(|\langle S \rangle /{\text {SL}}_n(p)| < p\), we thus have

$$\begin{aligned} {\text {diam}}{\text {Cay}}(\langle S \rangle , S) \ll p^2 n^{12 + C \log p} = n^{O(\log p)}. \end{aligned}$$

This completes the proof.
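
One way to account for the exponents in the final display (a sketch of the bookkeeping, under the reading that each element of X is a word in S of length at most \(\max (2p, n^{C \log p})\), and that coset representatives of \({\text {SL}}_n(p)\) in \(\langle S \rangle \) can be taken of length less than p) is

$$\begin{aligned} {\text {diam}}{\text {Cay}}(\langle S \rangle , S)&\ll p n^{12} \cdot \max (2p, n^{C \log p}) + p \\&\ll p^2 n^{12 + C \log p}, \end{aligned}$$

and \(p^2 \le n^{2 \log p}\) for \(n \ge 3\), which gives \(n^{O(\log p)}\).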

Classical groups and \(q^C\) random elements

In this subsection we prove Theorem 1.4. Recall that \(G = {\text {Cl}}_n(q)\), where \(n > C\), elements \(x_1, \ldots , x_k \in G\) are chosen uniformly at random where \(k > q^C\), and \(S = \{x_1^{\pm 1}, \ldots , x_k^{\pm 1}\}\). We claim that with probability \(1-q^{-cn}\) we have

$$\begin{aligned}&\langle S \rangle \ge {\text {SCl}}_n(q),~\text {and} \\&{\text {diam}}{\text {Cay}}(\langle S \rangle , S) \le q^2 n^C. \end{aligned}$$

By Theorem 7.4, with probability at least \(1 - q^{-c_1 n}\) there is a word w of length at most \(q^2 n^{C_1}\) so that

$$\begin{aligned} w(x_1, \ldots , x_k) \in {\mathfrak {M}}. \end{aligned}$$

Let \({\mathfrak {C}}\) be the conjugacy class of \(w(x_1, \ldots , x_k)\) in G. Note that \({\mathfrak {C}}\subseteq {\text {SCl}}_n(q)\). It follows from Corollary 9.6 that, with probability at least \(1 - q^{-c_2 n}\), the conjugation action of G on \({\mathfrak {C}}\) is expanding with spectral gap bounded away from zero. Hence (see, e.g., [31, Proposition 3.1.5 and Proposition 3.3.6])

$$\begin{aligned} {\text {diam}}{\text {Sch}}(G, S, {\mathfrak {C}}) \ll \log |{\mathfrak {C}}|. \end{aligned}$$

It follows that with probability at least \(1 - q^{-c_3 n}\), every element of \({\mathfrak {C}}\) is a word in S of length at most

$$\begin{aligned} q^2 n^{C_1} + O(\log |{\mathfrak {C}}|) \ll q^2 n^{C_2}. \end{aligned}$$

This already proves that \(\langle S \rangle \ge {\text {SCl}}_n(q)\). It follows from [33] that

$$\begin{aligned} {\text {diam}}{\text {Cay}}({\text {SCl}}_n(q), {\mathfrak {C}}) \ll \log |{\text {SCl}}_n(q)| / \log |{\mathfrak {C}}| \ll n. \end{aligned}$$

Hence

$$\begin{aligned} {\text {diam}}{\text {Cay}}(\langle S \rangle , S) \ll q^2 n^{C_2 + 1}. \end{aligned}$$

This completes the proof.

Corollary 1.5(2) follows immediately for \(q < n^{O(1)}\), since \(\log |G| \asymp n^2 \log q\). If q is larger then the claim follows from Alon–Roichman [1], which implies that the Cayley graph on \(C n^2 \log q\) random generators is almost surely an expander.

Notes

  1. 1.

    Throughout the paper, we use the terms “almost surely” or “with high probability” to mean with probability \(1-o(1)\) as the relevant parameters tend to infinity.

  2. 2.

    The diameter of \({\text {SCl}}_n(q)\) with respect to a set S is essentially the same (up to a factor of 3) as the diameter of the simple quotient \({\text {PSCl}}_n(q)\) with respect to \(S \bmod Z\). Indeed, if \(S^d = G\) then certainly \(S^d Z = G\), and conversely if \(S^d Z = G\) then it is possible to show that \(S^{3d} = G\). Hence there is no need to consider \({\text {PSCl}}_n(q)\) explicitly.

  3. 3.

    Note that the distribution of \(\overline{w}\) is symmetric, so \({\mathbf {E}}_{x_1, \ldots , x_k, w} \chi ({{\overline{w}}}) / \chi (1)\) is real.

  4. 4.

    Note that \({\text {GO}}(R) \cong {\text {GO}}_2^+(q) \cong D_{2(q-1)}\). In odd characteristic, \(G^\text {ab}\cong C_2 \times C_2\), and determinant and spinor norm are independent characters on \({\text {GO}}_2^+(q)\). In even characteristic, \(G^\text {ab}\cong C_2\), and the Dickson invariant is nontrivial on \({\text {GO}}_2^+(q)\).

  5. 5.

    The existence of an even-order linear character of \({\text {GO}}_n(q)\) in even characteristic is why we need the extra factor of 2 in that case.

  6. 6.

    e.g.,

  7. 7.

    Alternatively, we could just cite [30]. The given argument avoids CFSG.

  8. 8.

    The authors state only \(n^2 (\log n)^c\), but a careful inspection of the proof gives \(n^2 (\log n)^2 \omega (1)\), for an arbitrarily slowly growing \(\omega (1)\). A word v of length \(\omega (1)\) is obtained such that \(v(x, y)^{O(n)}\) has support less than n/4. A random commutator process is then used to iteratively reduce the support. Each step quadruples the length of the word and roughly squares the density of the support, so the whole process multiplies the length of the word by \(O((\log n)^2)\). Thus a word w of length \(n (\log n)^2 \omega (1)\) is obtained such that \(w(x, y)\) has support 3.
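
The length count in Note 8 can be illustrated with a deterministic idealization of the commutator process, in which each step exactly squares the support density and exactly quadruples the word length. Both are simplifying assumptions, as are the function name and its default parameters.

```python
def commutator_reduction(n, initial_density=0.25, target_support=3):
    """Idealized support reduction: returns (number of steps, length factor)."""
    density = initial_density
    steps = 0
    length_factor = 1
    while density * n > target_support:
        density = density ** 2  # the density of the support is roughly squared
        length_factor *= 4      # the word length is quadrupled
        steps += 1
    return steps, length_factor
```

After t steps the density is \(4^{-2^t}\), so about \(\log _2 \log _4 n\) steps are needed and the length factor \(4^t = (2^t)^2\) is of order \((\log n)^2\).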

References

  1. 1.

    Alon, N., Roichman, Y.: Random Cayley graphs and expanders. Random Struct. Algorithms 5(2), 271–284 (1994)

  2. 2.

    Aschbacher, M.: Finite Group Theory, vol. 10 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, second edition (2000)

  3. 3.

    Babai, L., Beals, R., Seress, Á: On the diameter of the symmetric group: polynomial bounds. In: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1108–1112. ACM, New York (2004)

  4. 4.

    Breuillard, E., Green, B., Guralnick, R., Tao, T.: Expansion in finite simple groups of Lie type. J. Eur. Math. Soc. 17(6), 1367–1434 (2015)

  5. 5.

    Breuillard, E., Green, B., Tao, T.: Approximate subgroups of linear groups. Geom. Funct. Anal. 21(4), 774–819 (2011)

  6. 6.

    Babai, L., Hayes, T.P.: Near-independence of permutations and an almost sure polynomial bound on the diameter of the symmetric group. In: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1057–1066. ACM, New York (2005)

  7. 7.

    Bordenave, C.: A new proof of Friedman’s second eigenvalue Theorem and its extension to random lifts. Ann. Sci. de l’Ecole normale supérieure (2019)

  8. 8.

    Broder, A., Shamir, E.: On the second eigenvalue of random regular graphs. In: 28th Annual Symposium on Foundations of Computer Science (sfcs 1987), pp. 286–294 (1987)

  9. 9.

    Babai, L., Seress, Á.: On the diameter of permutation groups. Eur. J. Combin. 13(4), 231–243 (1992)

  10. 10.

    Biswas, A., Yang, Y.: A diameter bound for finite simple groups of large rank. J. Lond. Math. Soc. (2) 95(2), 455–474 (2017)

  11. 11.

    Dickson, L.E.: Linear Groups, with an Exposition of the Galois Field Theory. Teubner, Leipzig B.G (1901)

  12. 12.

    Eberhard, S.: The trivial lower bound for the girth of \(S_n\). arXiv e-prints, arXiv:1706.09972 (2017)

  13. 13.

    Eberhard, S., Virchow, S.-C.: The probability of generating the symmetric group. Combinatorica 39(2), 273–288 (2019)

  14. 14.

    Eberhard, S., Virchow, S.-C.: Random generation of the special linear group. Trans. Am. Math. Soc., to appear (2020)

  15. 15.

    Friedman, J., Joux, A., Roichman, Y., Stern, J., Tillich, J.-P.: The action of a few permutations on \(r\)-tuples is quickly transitive. Random Struct. Algorithms 12(4), 335–350 (1998)

  16. 16.

    Friedman, J.: A proof of Alon’s second eigenvalue conjecture and related problems. Mem. Am. Math. Soc., 195(910):viii+100 (2008)

  17. 17.

    Gromov, M., Guth, L.: Generalizations of the Kolmogorov–Barzdin embedding estimates. Duke Math. J. 161(13), 2549–2603 (2012)

  18. 18.

    Guralnick, R.M., Larsen, M., Tiep, P.H.: Character levels and character bounds. II. arXiv e-prints, arXiv:1904.08070 (2019)

  19. 19.

    Guralnick, R.M., Larsen, M., Tiep, P.H.: Character levels and character bounds. Forum Math. Pi, 8:e2 (2020)

  20. 20.

    Grove, L.C.: Classical Groups and Geometric Algebra. Graduate Studies in Mathematics, vol. 39. American Mathematical Society, Providence, RI (2002)

  21. 21.

    Hadad, U.: On the shortest identity in finite simple groups of Lie type. J. Group Theory 14(1), 37–47 (2011)

  22. 22.

    Halasi, Z.: Diameter of Cayley graphs of \(SL(n,p)\) with generating sets containing a transvection. arXiv e-prints, arXiv:2002.10443 (2020)

  23. 23.

    Helfgott, H.A.: Growth and generation in \({{\rm SL}}_2({\mathbb{Z}}/p{\mathbb{Z}})\). Ann. Math. (2) 167(2), 601–623 (2008)

  24. 24.

    Halasi, Z., Maróti, A., Pyber, L., Qiao, Y.: An improved diameter bound for finite simple groups of Lie type. Bull. Lond. Math. Soc. 51(4), 645–657 (2019)

  25. 25.

    Helfgott, H.A., Seress, Á.: On the diameter of permutation groups. Ann. Math. (2) 179(2), 611–658 (2014)

  26. 26.

    Helfgott, H.A., Seress, Á., Zuk, A.: Random generators of the symmetric group: diameter, mixing time and spectral gap. J. Algebra 421, 349–368 (2015)

  27. 27.

    Humphreys, J.E.: Conjugacy classes in semisimple algebraic groups. American Mathematical Society (1995)

  28. 28.

    Huppert, B.: Isometrien von vektorräumen. ii. Mathematische Zeitschrift 175(1), 5–20 (1980)

  29. 29.

    Kesten, H.: Symmetric random walks on groups. Trans. Am. Math. Soc. 92, 336–354 (1959)

  30. 30.

    Kantor, W.M., Lubotzky, A.: The probability of generating a finite classical group. Geom. Dedicata. 36(1), 67–87 (1990)

  31. 31.

    Kowalski, E.: An Introduction to Expander Graphs. Cours Spécialisés [Specialized Courses], vol. 26. Société Mathématique de France, Paris (2019)

  32. 32.

    Landazuri, V., Seitz, G.M.: On the minimal degrees of projective representations of the finite Chevalley groups. J. Algebra 32, 418–443 (1974)

  33. 33.

    Liebeck, M.W., Shalev, A.: Diameters of finite simple groups: sharp bounds and applications. Ann. Math. (2) 154(2), 383–406 (2001)

  34. 34.

    Larsen, M., Shalev, A.: Characters of symmetric groups: sharp bounds and applications. Invent. Math. 174(3), 645–687 (2008)

  35. 35.

    Larsen, M., Shalev, A.: Fibers of word maps and some applications. J. Algebra 354, 36–48 (2012)

  36. 36.

    Liebeck, M.W., Shalev, A.: Girth, words and diameter. Bull. Lond. Math. Soc. 51(3), 539–546 (2019)

  37. 37.

    Larsen, M., Shalev, A., Tiep, P.H.: The Waring problem for finite simple groups. Ann. Math. (2) 174(3), 1885–1950 (2011)

  38. 38.

    Lubotzky, A.: Discrete Groups, Expanding Graphs and Invariant Measures. Modern Birkhäuser Classics. Birkhäuser Verlag, Basel, 2010. With an appendix by Jonathan D. Rogawski, Reprint of the 1994 edition

  39. 39.

    Pyber, L., Szabó, E.: Growth in finite simple groups of Lie type. J. Am. Math. Soc. 29(1), 95–146 (2016)

  40. 40.

    Schlage-Puchta, J.-C.: Applications of character estimates to statistical problems for the symmetric group. Combinatorica 32(3), 309–323 (2012)

  41. 41.

    Wall, G.E.: On the conjugacy classes in the unitary, symplectic and orthogonal groups. J. Austral. Math. Soc. 3, 1–62 (1963)

Acknowledgements

We thank László Pyber, Endre Szabó, and Péter Varjú for helpful discussions. Thanks are due to Emmanuel Breuillard and Bob Guralnick for discussions pertaining to the low-degree representation theory of \({\text {SCl}}_n(q)\), and to Aner Shalev for discussions about character bounds. We thank Zoltán Halasi for sharing the preprint [22]. We would also like to thank two anonymous referees for a thorough inspection of the paper and suggesting many improvements.

Funding

Open access funding provided by ELKH Alfréd Rényi Institute of Mathematics.

Author information

Corresponding author

Correspondence to Urban Jezernik.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

S. Eberhard has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 803711). U. Jezernik has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 741420).

Appendix A: Analogous arguments for \(S_n\)

In this appendix we give analogous arguments for \(S_n\). The main reason to do so is to motivate and give context to some of the arguments in the main body, as the arguments in the context of \(S_n\) are easier and somewhat more natural, involving only trajectories of points rather than vectors. A secondary reason is that a couple of results are actually new, and of independent interest:

  1. 1.

    if w is a word of length \(o(n^{1/2})\), then with high probability \({{\overline{w}}}\) has o(n) fixed points (Theorem A.4);

  2. 2.

    the Cayley graph with respect to three random generators almost surely has diameter \(O(n^2 \log n)\).

Queries and trajectories

The following definitions only slightly generalize those in [8, 15].

Let \(G = S_n\) and \(\Omega = \{1, \ldots , n\}\). Let \(x_1, \ldots , x_k \in G\). Define a query to be a pair \((\xi , v)\), where \(\xi \in \{\xi _1^{\pm 1}, \ldots , \xi _k^{\pm 1}\}\) and \(v \in \Omega \); the result of the query is \({\overline{\xi }} v\). After any finite sequence of queries

$$\begin{aligned} (w_1, v_1), (w_2, v_2), \ldots , (w_{t-1}, v_{t-1}) \end{aligned}$$

the known domain of a letter \(\xi \) at time t is

$$\begin{aligned} D_\xi ^t = \{ v_i : w_i = \xi , i< t\} \cup \{\overline{w_i} v_i : w_i = \xi ^{-1}, i < t\}. \end{aligned}$$

Suppose we make a further query \((w_t, v_t)\). If \(v_t \in D_{w_t}^t\), then the result \(\overline{w_t} v_t\) is determined already by the values of \(\overline{w_1} v_1, \ldots , \overline{w_{t-1}} v_{t-1}\); we call this a forced choice. Otherwise, we say the query is a free choice.

Let R be some subset of \(\Omega \) fixed in advance. If a query \((w_t, v_t)\) is a free choice and yet

$$\begin{aligned} \overline{w_t} v_t \in R \cup \{v_1, \overline{w_1} v_1, \ldots , v_{t-1}, \overline{w_{t-1}} v_{t-1}, v_t\} \end{aligned}$$

then we say the result of the query is a coincidence.

Again, the language is most interesting when \(x_1, \ldots , x_k \in G\) are chosen randomly. The following lemma is trivial, and parallels Lemma 3.3.

Lemma A.1

Let \(x_1, \ldots , x_k \in G\) be uniformly random and independent, and let

$$\begin{aligned} (w_1, v_1), (w_2, v_2), \ldots , (w_{t-1}, v_{t-1}) \end{aligned}$$

be a sequence of queries. Assume that \((w_t, v_t)\) is a free choice. Then, conditionally on the values of \(\overline{w_1} v_1, \ldots , \overline{w_{t-1}} v_{t-1}\), the result \(\overline{w_t} v_t\) of the query \((w_t, v_t)\) is uniformly distributed in \(\Omega \setminus D_{w_t^{-1}}^t\).

In particular, the conditional probability that \(\overline{w_t} v_t\) is a coincidence is bounded by

$$\begin{aligned} \frac{d}{n - s}, \end{aligned}$$

where

$$\begin{aligned} d = | R \cup \{v_1, \overline{w_1} v_1, \ldots , v_{t-1}, \overline{w_{t-1}} v_{t-1}, v_t\} | \end{aligned}$$

and s is the number of \(i < t\) with \(w_i \in \{w_t, w_t^{-1}\}\).

Let \(w \in F_k\), and let

$$\begin{aligned} w = w_\ell \cdots w_1 \qquad (w_i \in \{\xi _1^{\pm 1}, \ldots , \xi _k^{\pm 1}\}) \end{aligned}$$

be the reduced expression. For each \(v \in \Omega \), the trajectory of v is the sequence of queries \((w_t, v^{t-1})\), where \(v^0 = v\) and for each \(t \ge 1\) the point \(v^t\) is the result of the query \((w_t, v^{t-1})\); in other words, the sequence \(v^0, v^1, \ldots , v^\ell \) is defined by

$$\begin{aligned} v^0&= v, \\ v^t&= \overline{w_t} v^{t-1}&(1\le t\le \ell ). \end{aligned}$$

Note that if step t is free and not a coincidence then step \(t+1\) is also free, and hence if \(v^\ell \in R\) then there must be at least one coincidence in the trajectory (cf. Lemma 3.5).
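
The language of queries, free and forced choices, and coincidences can be made concrete by sampling the permutations lazily, revealing each value only when it is queried. The following is a minimal sketch under that reading of Lemma A.1; the class and function names are hypothetical, letters are encoded as pairs `(j, sign)`, and a word is passed as the list \(w_1, \ldots , w_\ell \) in order of application.

```python
import random


class LazyRandomPermutations:
    """k uniformly random permutations of {0, ..., n-1}, revealed query by query."""

    def __init__(self, n, k, rng=None):
        self.n = n
        self.rng = rng or random.Random()
        self.forward = [dict() for _ in range(k)]   # known values of x_j
        self.backward = [dict() for _ in range(k)]  # known values of x_j^{-1}

    def query(self, letter, v):
        """Return (result, was_free) for the query (letter, v)."""
        j, sign = letter
        if sign == 1:
            known, other = self.forward[j], self.backward[j]
        else:
            known, other = self.backward[j], self.forward[j]
        if v in known:  # v lies in the known domain: forced choice
            return known[v], False
        # Free choice: uniform outside the known domain of the inverse letter.
        candidates = [u for u in range(self.n) if u not in other]
        u = self.rng.choice(candidates)
        known[v] = u
        other[u] = v
        return u, True


def trajectory(perms, word, v, R=frozenset()):
    """Follow v through the word; return the points visited and the coincidence count."""
    seen = set(R) | {v}
    points = [v]
    coincidences = 0
    for letter in word:
        result, free = perms.query(letter, points[-1])
        if free and result in seen:
            coincidences += 1
        seen.add(result)
        points.append(result)
    return points, coincidences
```

For a reduced word, a trajectory returned by this sketch can end in R only if `coincidences` is at least 1, in line with the observation above.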

More generally for any \(r\ge 1\) the joint trajectory of an r-tuple \(v_1, \ldots , v_r \in \Omega \) is simply the r-tuple of individual trajectories, with the queries \((w_t, v_i^{t-1})\) ordered lexicographically by \((t, i)\). Again write \(\prec \) for this order, i.e., \((t',i') \prec (t,i)\) if \(t' < t\) or \(t'=t\) and \(i' < i\). Note that if step \((t, i)\) is free and not a coincidence then

$$\begin{aligned} v_i^t = \overline{w_t} v_i^{t-1} \notin R \cup \{v_{i'}^{t'} : (t', i') \prec (t, i)\}; \end{aligned}$$

while

$$\begin{aligned} D_{w_{t+1}}^{(t+1, i)} \subseteq \{v_{i'}^{t'} : (t', i') \prec (t, i)\}; \end{aligned}$$

hence step \((t+1, i)\) is also free. Hence if \(v_i^\ell \in R\) then there must be at least one coincidence in the trajectory of \(v_i\). This observation is recorded as the following lemma (cf. Lemma 3.6).

Lemma A.2

Suppose \(v_i \notin \{v_1, \ldots , v_{i-1}\}\) and \(v_i^\ell \in R\). Then there is at least one coincidence in the trajectory of \(v_i\) (during the joint trajectory of \(v_1, \ldots , v_r\)).

The probability of small support

For \(g \in S_n\), define

$$\begin{aligned} {\text {fix}}g = \{ v \in \Omega : gv = v \}. \end{aligned}$$

In this section we show that if w is a short word then almost surely \(|{\text {fix}}{{\overline{w}}}|\) is small. The following lemma is similar to the argument used in [12, Lemma 2.2]; the only difference is that the set R is fixed in advance.

Lemma A.3

Let \(G = S_n\). Let \(R \subseteq \Omega \) be a subset of size r. Let \(w \in F_k\) be a nontrivial word of length \(\ell < n / r\). Then

$$\begin{aligned} {\mathbf {P}}({{\overline{w}}} R = R) \le \left( \frac{\ell ^2 r}{n - \ell r}\right) ^r. \end{aligned}$$

Proof

Let \(R = \{v_1, \ldots , v_r\}\) and consider the joint trajectory of \(v_1, \ldots , v_r\). By Lemma A.2, we can have \(\overline{w} R = R\) only if there is at least one coincidence in each individual trajectory. We take a union bound over all possibilities for when the coincidences could occur. By Lemma A.1, the conditional probability that step \((t, i)\) is a coincidence is bounded by

$$\begin{aligned} \frac{\ell r}{n - \ell r}; \end{aligned}$$

indeed there are at most \(\ell r\) previous points (if \(t = \ell \), assuming \(v_j^\ell \in R\) for \(j < i\)). There are \(\ell ^r\) possibilities for when the first coincidences might occur. Hence the claimed bound holds. \(\square \)
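As an informal sanity check of Lemma A.3, the following Python sketch estimates \({\mathbf {P}}({{\overline{w}}} R = R)\) by sampling and evaluates the stated bound for comparison. The encoding of a word as a list of (index, exponent) pairs, the function names, and the sample parameters are illustrative assumptions and do not come from the paper.

```python
import random


def random_perm(n, rng):
    """Return a uniformly random permutation of {0, ..., n-1} as an image list."""
    p = list(range(n))
    rng.shuffle(p)
    return p


def inverse(p):
    """Return the inverse of the permutation p."""
    q = [0] * len(p)
    for v, pv in enumerate(p):
        q[pv] = v
    return q


def apply_word(word, perms, v):
    """Apply w = w_l ... w_1 to the point v, rightmost letter first.

    `word` lists the letters w_1, ..., w_l in the order they act; each letter
    is a pair (i, e) standing for x_i^e with e = +1 or -1.
    `perms[i]` is the pair (x_i, x_i^{-1}) of image lists.
    """
    for i, e in word:
        v = perms[i][0][v] if e == 1 else perms[i][1][v]
    return v


def estimate_invariance(n, word, k, R, trials, seed=0):
    """Monte Carlo estimate of P(w(x_1, ..., x_k) R = R) for uniform x_i in S_n."""
    rng = random.Random(seed)
    target = set(R)
    hits = 0
    for _ in range(trials):
        perms = []
        for _ in range(k):
            x = random_perm(n, rng)
            perms.append((x, inverse(x)))
        # A permutation maps R onto R exactly when it maps R into R.
        if all(apply_word(word, perms, v) in target for v in R):
            hits += 1
    return hits / trials


def lemma_bound(n, ell, r):
    """The bound (ell^2 r / (n - ell r))^r from Lemma A.3 (needs ell < n / r)."""
    return (ell * ell * r / (n - ell * r)) ** r


if __name__ == "__main__":
    commutator = [(0, 1), (1, 1), (0, -1), (1, -1)]  # a reduced word of length 4
    print(estimate_invariance(200, commutator, 2, [0, 1], trials=500))
    print(lemma_bound(200, 4, 2))
```

A sampled estimate of this kind only illustrates the statement; for small trial counts it will often be exactly zero, since the event is rare.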

Theorem A.4

There is a constant \(c>0\) such that the following holds for all \(f \ge 0\). Let \(G = S_n\), and let \(w \in F_k\) be a nontrivial word of reduced length \(\ell < c f^{1/2}\). Then

$$\begin{aligned} {\mathbf {P}}\left( |{\text {fix}}{{\overline{w}}}| \ge f\right) \le \exp \left( -c f / \ell ^2\right) . \end{aligned}$$

Proof

Let \(x_1, \ldots , x_k\) be chosen independently and uniformly from G. Let \(F = |{\text {fix}}{{\overline{w}}}|\). By Lemma A.3, for any subset \(R \subseteq \Omega \) of size r (for \(r < n /\ell \)) we have

$$\begin{aligned} {\mathbf {P}}(R \subseteq {\text {fix}}{{\overline{w}}}) = {\mathbf {P}}({{\overline{w}}} R = R) \le \left( \frac{r \ell ^2}{n - r \ell }\right) ^r. \end{aligned}$$

Therefore, by a union bound,

$$\begin{aligned} {\mathbf {E}}\left( {\begin{array}{c}F\\ r\end{array}}\right) \le \left( {\begin{array}{c}n\\ r\end{array}}\right) \left( \frac{r\ell ^2}{n-r\ell }\right) ^r. \end{aligned}$$

Since \(x\mapsto \left( {\begin{array}{c}x\\ r\end{array}}\right) \) is increasing for \(x > r\), for \(r < f/2\) we have

$$\begin{aligned} {\mathbf {P}}\left( F \ge f\right)&\le \left( {\begin{array}{c}f\\ r\end{array}}\right) ^{-1} {\mathbf {E}}\left( {\begin{array}{c}F\\ r\end{array}}\right) \\&\le \frac{n^r}{\left( f - f / 2\right) ^r} \left( \frac{r \ell ^2}{n-r\ell }\right) ^r \\&= \left( \frac{n r \ell ^2}{(f/2)(n-r\ell )}\right) ^r \end{aligned}$$

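Take \(r \sim f / (4 \ell ^2)\). We may assume \(f \le n\), since otherwise the probability is zero; then \(r \ell \le n/4\) (up to rounding), so the base of the power above is at most roughly \(2/3\). The conclusion is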

$$\begin{aligned} {\mathbf {P}}(F \ge f) \le \exp \left( -c f / \ell ^2\right) \end{aligned}$$

for some constant \(c > 0\). \(\square \)

Remark A.5

If \(\ell < c \log \log n\), a stronger bound is proved in [35, Sect. 2].

Expected values of characters

A notable difference between \(S_n\) and \({\text {Cl}}_n(q)\) is that \(S_n\) has several low-degree characters: for example, the irreducible component of the standard representation has degree \(n-1\). However, we can show that the expected value of \(|\chi ({{\overline{w}}})| / \chi (1)\) is smaller than \(\chi (1)^{-c}\) using the Larsen–Shalev character bound [34]. For most characters, \(\chi (1)\) is exponentially large in n, so this bound is similar in strength to Theorem 5.2. In application, low-degree characters may have to be treated specially (as in the next section).

Theorem A.6

Let \(G = S_n\). Let \(w \in F_k\) be a fixed nontrivial word of reduced length \(\ell \). Then, for any \(f \ge C \ell ^2\),

$$\begin{aligned} {\mathbf {E}}_{x_1, \ldots , x_k} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)} \right) < \exp \left( -c f / \ell ^2\right) + \chi (1)^{-\frac{\log (n/f)}{2 \log n} + o(1)}. \end{aligned}$$

In particular, taking \(f = n^{1/2}\), for \(\ell < cn^{1/4}\) we have

$$\begin{aligned} {\mathbf {E}}_{x_1, \ldots , x_k} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)}\right) < \exp (- cn^{1/2} / \ell ^2) + \chi (1)^{-1/4 + o(1)}. \end{aligned}$$

Proof

By conditioning on whether or not \(|{\text {fix}}{{{\overline{w}}}}| \ge f\), we have

$$\begin{aligned} {\mathbf {E}}_{x_1, \ldots , x_k} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)}\right) \le {\mathbf {P}}_{x_1, \ldots , x_k}\left( |{\text {fix}}{{{\overline{w}}}}| \ge f\right) + \max _{\begin{array}{c} x_1, \ldots , x_k \\ |{\text {fix}}{{{\overline{w}}}}| < f \end{array}} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)} \right) . \end{aligned}$$

The first term is bounded by Theorem A.4. The second term is bounded by [34, Theorem 1.3]. \(\square \)
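For the character of degree \(n-1\) mentioned above one has \(\chi (g) = |{\text {fix}}g| - 1\), so the expectation in Theorem A.6 can be estimated directly by counting fixed points. The following Python sketch does this by sampling; the word encoding, function names, and parameters are illustrative assumptions, and the output is only a numerical illustration, not a substitute for the bound.

```python
import random


def random_perm_with_inverse(n, rng):
    """Return a uniformly random permutation of {0, ..., n-1} and its inverse."""
    p = list(range(n))
    rng.shuffle(p)
    q = [0] * n
    for v, pv in enumerate(p):
        q[pv] = v
    return p, q


def fixed_points_of_word(word, perms):
    """Count the fixed points of w(x_1, ..., x_k).

    `word` lists letters (i, e), meaning x_i^e, in the order they act;
    `perms[i]` is the pair (x_i, x_i^{-1}) of image lists.
    """
    n = len(perms[0][0])
    count = 0
    for v in range(n):
        u = v
        for i, e in word:
            u = perms[i][0][u] if e == 1 else perms[i][1][u]
        if u == v:
            count += 1
    return count


def mean_standard_character_ratio(n, word, k, trials, seed=0):
    """Estimate E |chi(w)| / chi(1) for chi = (number of fixed points) - 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        perms = [random_perm_with_inverse(n, rng) for _ in range(k)]
        total += abs(fixed_points_of_word(word, perms) - 1) / (n - 1)
    return total / trials


if __name__ == "__main__":
    commutator = [(0, 1), (1, 1), (0, -1), (1, -1)]
    print(mean_standard_character_ratio(200, commutator, 2, trials=200))
```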

The following corollary follows exactly as in Sect. 5.

Corollary A.7

There is a constant \(c > 0\) such that the following holds. Let w be the result of a simple random walk of length \(\ell < cn^{1/4}\) in \(F_k\). Then

$$\begin{aligned} {\mathbf {E}}_{x_1, \ldots , x_k\in G, w} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)} \right) < \exp (-cn^{1/2} / \ell ^2) + \chi (1)^{-1/4 + o(1)} + k^{-c \ell }. \end{aligned}$$

Expansion in low-degree representations: a brief survey

Let \(G = S_n\), let \(x_1, \ldots , x_k \in G\) be random, where \(k \ge 2\) and bounded, and consider the action of \(x_1, \ldots , x_k\) on \(\Omega = \{1, \ldots , n\}\). The resulting Schreier graph is one of the standard models for a random 2k-regular graph, and the spectral properties of this graph are well studied. The earliest results on the combinatorial expansion of bounded-degree random graphs essentially coincide with the dawn of expansion, beginning with Barzdin–Kolmogorov and Pinsker (see Gromov–Guth [17, Sect. 1.2] for some history), and such results are equivalent to lower bounds on the spectral gap by the discrete Cheeger inequality (due to Dodziuk and Alon–Milman): see Kowalski [31, Sect. 4.1].

Such bounds are weak, however. The strongest results on the spectral gap of a random regular graph are based on the trace method, which is an adaptation of Wigner’s proof of the semicircle law to the bounded-degree setting. These results begin with Broder and Shamir [8]. Let \(\rho \) be the spectral radius of \({\mathcal {A}}\) on \({\mathbf {C}}[\Omega ]_0\). Broder and Shamir proved that

$$\begin{aligned} \rho \ll k^{-1/4}. \end{aligned}$$

In particular, \(\rho \) is bounded away from 1 as long as k is large enough. On the other hand, there is a deterministic lower bound

$$\begin{aligned} \rho \ge (2k-1)^{1/2} / k + O(1 / \log _{2k} n), \end{aligned}$$

usually attributed to Alon and Boppana. The conjecture, due to Alon, that almost surely

$$\begin{aligned} \rho = (2k-1)^{1/2} / k + o_k(1) \end{aligned}$$

remained open for some time, but was finally and famously settled by Friedman, using an ingenious elaboration of the trace method: see [16] for the proof, and for much more background. (See also Bordenave [7] for a simplified proof.)
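The quantity \(\rho \) can be explored numerically on small examples. The sketch below assumes that \({\mathcal {A}}\) is the normalized averaging operator \(\frac{1}{2k} \sum _i (x_i + x_i^{-1})\) acting on functions on \(\Omega \) (its definition lies outside this appendix, so this normalization is an assumption), and approximates its spectral radius on mean-zero functions by power iteration. The function names and parameters are hypothetical, and the returned value is an approximation from below.

```python
import math
import random


def averaging_operator(perms, f):
    """Return A f, where A = (1/2k) * sum_i (x_i + x_i^{-1}) (assumed normalization)."""
    n = len(f)
    out = [0.0] * n
    for x in perms:
        for v in range(n):
            out[x[v]] += f[v]  # x_i carries the value at v to x_i(v)
            out[v] += f[x[v]]  # x_i^{-1} carries the value at x_i(v) to v
    scale = 1.0 / (2 * len(perms))
    return [scale * t for t in out]


def spectral_radius_mean_zero(perms, iterations=300, seed=0):
    """Approximate the spectral radius of A on mean-zero functions by power iteration."""
    rng = random.Random(seed)
    n = len(perms[0])
    f = [rng.gauss(0.0, 1.0) for _ in range(n)]
    rho = 0.0
    for _ in range(iterations):
        mean = sum(f) / n
        f = [t - mean for t in f]  # project onto the mean-zero subspace
        norm = math.sqrt(sum(t * t for t in f))
        if norm == 0.0:
            return 0.0
        f = [t / norm for t in f]
        f = averaging_operator(perms, f)
        rho = math.sqrt(sum(t * t for t in f))  # ||A f|| for a unit vector f
    return rho


if __name__ == "__main__":
    rng = random.Random(1)
    k, n = 2, 300
    perms = []
    for _ in range(k):
        p = list(range(n))
        rng.shuffle(p)
        perms.append(p)
    print("estimated rho:", spectral_radius_mean_zero(perms))
    print("(2k-1)^(1/2) / k:", math.sqrt(2 * k - 1) / k)
```

Since the operator is symmetric, the norm of \({\mathcal {A}} f\) for a unit mean-zero vector f never exceeds the true spectral radius, which is why the estimate errs on the low side.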

The trace method also generalizes well, unlike the pure “counting” proof of expansion. Consider the action of \({\mathcal {A}}\) on \({\mathbf {C}}[\left( {\begin{array}{c}\Omega \\ r\end{array}}\right) ]\) for bounded r. This action was studied by Friedman–Joux–Roichman–Stern–Tillich [15], who showed that there is almost surely a uniform spectral gap. Their method is an elaboration of the Broder–Shamir method, and was the direct inspiration for the argument of Sects. 8 and 9. We quote their result here, which will be used in the next section:

Theorem A.8

Let \(G = S_n\), and \(x_1, \ldots , x_k \in G\) random. Let \(\rho = \rho ({\mathcal {A}}, {\mathbf {C}}[\left( {\begin{array}{c}\Omega \\ r\end{array}}\right) ]_0)\) be the spectral radius of \({\mathcal {A}}= {\mathcal {A}}_{x_1, \ldots , x_k}\) acting on \({\mathbf {C}}[\left( {\begin{array}{c}\Omega \\ r\end{array}}\right) ]_0\). Then, for fixed k, r, and \(\epsilon > 0\),

$$\begin{aligned} {\mathbf {P}}\left( \rho > (1 + \epsilon ) (\sqrt{2k - 1} / k)^{1/(r+1)}\right) = o(1). \end{aligned}$$

Diameter with respect to 3 random elements

Let \(G = S_n\). Let \(x_1, \ldots , x_k \in G\) be random, and let \(S = \{x_1^{\pm 1}, \ldots , x_k^{\pm 1}\}\). Helfgott, Seress, and Zuk [26] showed that, if \(k \ge 2\), then with high probability

$$\begin{aligned} {\text {diam}}{\text {Cay}}(\langle S \rangle , S) \ll n^2 (\log n)^{2+o(1)}. \end{aligned}$$

We show in this section that if \(k \ge 3\) then with high probability

$$\begin{aligned} {\text {diam}}{\text {Cay}}(\langle S \rangle , S) \ll n^2 \log n. \end{aligned}$$

While this is only a modest improvement, it is interesting for being conjecturally sharp for any proof which uses elements of small support as a stepping stone: it seems unlikely that an element of small support can be obtained in fewer than \(O(n \log n)\) steps on average, and a generic element of \(A_n\) cannot be written as a product of fewer than O(n) elements of small support.

The argument is most closely related to the argument of Schlage-Puchta [40], which shows that for \(k = 2\) the diameter is bounded by \(O(n^3 \log n)\). We get a saving for \(k \ge 3\) by replacing the \(xy^i\) trick with the more powerful \(x w(y, z)\) trick.

Alternative 1

Write

$$\begin{aligned} n - 5 = n' + r \end{aligned}$$

where \(3 \not \mid n'\) and \(r \in \{4, 5\}\). Let \({\mathfrak {C}}\subseteq S_n\) be the normal subset of all elements whose cycle type is either \((1, 1, 3, r, n')\) or \((2, 3, r, n')\). Note that

$$\begin{aligned} \frac{|{\mathfrak {C}}|}{n!} = \frac{1}{2! \cdot 3 \cdot r \cdot n'} + \frac{1}{2 \cdot 3 \cdot r \cdot n'} \asymp 1/n, \end{aligned}$$

while if \({\text {sgn}}\) is the sign character then

$$\begin{aligned} \langle 1_{\mathfrak {C}}, {\text {sgn}}\rangle = \frac{(-1)^{r + n'}}{2! \cdot 3 \cdot r \cdot n'} - \frac{(-1)^{r+n'}}{2 \cdot 3 \cdot r \cdot n'} = 0. \end{aligned}$$

Let \(x, y, z \in G\) be random. Then by Theorem 6.1 with \(f = 1_{\mathfrak {C}}\) and Corollary A.7, if E is the event that every word \(u \in F_2\) of length at most \(\ell < c n^{1/4}\) satisfies \(x u(y, z) \notin {\mathfrak {C}}\) and w is the result of a simple random walk of length \(2\ell \) in \(F_2\),

$$\begin{aligned}&{\mathbf {P}}_{x,y,z}(E)\\&\quad \ll n^2 \sum _{1 \ne \chi \in {\text {Irr}}G} |\langle 1_{\mathfrak {C}}, \chi \rangle |^2 {\mathbf {E}}_{y, z, w} \left( \frac{\chi ({{\overline{w}}})}{\chi (1)}\right) \\&\quad \le n^2 \sum _{1 \ne \chi \in {\text {Irr}}G} |\langle 1_{\mathfrak {C}}, \chi \rangle |^2 \left( \exp (-cn^{1/2} / \ell ^2) + \chi (1)^{-1/4 + o(1)} + 2^{-c \ell } \right) . \end{aligned}$$

Fixing \(\ell = \left\lfloor {C \log n}\right\rfloor \) for a sufficiently large constant C, we have, for sufficiently large n,

$$\begin{aligned} {\mathbf {P}}(E) \ll n^2 \sum _{1 \ne \chi \in {\text {Irr}}G} |\langle 1_{\mathfrak {C}}, \chi \rangle |^2 \left( \chi (1)^{-1/5} + n^{-100}\right) . \end{aligned} \tag{18}$$

Let \({\mathcal {X}}\) be the set of characters \(\chi \in {\text {Irr}}{G}\) such that \(\chi (1) < n^{1000}\). The part of the sum (18) with \(\chi \notin {\mathcal {X}}\) is bounded by

$$\begin{aligned} n^2 \sum _{\chi \notin {\mathcal {X}}} |\langle 1_{\mathfrak {C}}, \chi \rangle |^2 n^{-100} \ll n^{-98} \sum _{\chi \in {\text {Irr}}{G}} |\langle 1_{\mathfrak {C}}, \chi \rangle |^2 \asymp n^{-99}. \end{aligned}$$

Now consider some \(\chi \in {\mathcal {X}}\). Let \(\pi \in {\mathfrak {C}}\). It follows from the Murnaghan–Nakayama rule (splitting off an \(n'\)-cycle) that \(|\chi (\pi )| = O(1)\). Hence

$$\begin{aligned} |\langle 1_{\mathfrak {C}}, \chi \rangle | \ll \frac{|{\mathfrak {C}}|}{|G|} \asymp n^{-1}. \end{aligned}$$

It follows from the hook length formula that \(|{\mathcal {X}}| = O(1)\). Hence, since \(\langle 1_{\mathfrak {C}}, {\text {sgn}}\rangle = 0\),

$$\begin{aligned} n^2 \sum _{1 \ne \chi \in {\mathcal {X}}} |\langle 1_{\mathfrak {C}}, \chi \rangle |^2 (\chi (1)^{-1/5} + n^{-100}) \ll n^{-1/5} \end{aligned}$$

(the main term coming from the characters of degree \(n-1\)). Hence, from (18),

$$\begin{aligned} {\mathbf {P}}(E) \ll n^{-1/5}. \end{aligned}$$

We conclude that with high probability there is a word \(w \in F_3\) of length \(O(\log n)\) such that \(w(x, y, z) \in {\mathfrak {C}}\). Hence there is a word \(w' = w^{2 r n'}\) of length \(O(n \log n)\) such that \(w'(x, y, z)\) is a 3-cycle. With high probability the conjugation action of \(x, y, z\) on the set of 3-cycles has a uniform spectral gap (by Theorem A.8), so it follows that every 3-cycle is a word in \(x, y, z\) of length \(O(n \log n)\). Thus every element of \(A_n\) is a word in \(x, y, z\) of length \(O(n^2 \log n)\).
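The step from an element of \({\mathfrak {C}}\) to a 3-cycle can be checked concretely. The Python sketch below picks \(n'\) and r as in Alternative 1, builds one permutation of each of the two cycle types, and verifies that its power with exponent \(2 r n'\) is a 3-cycle. The helper names and the choice \(n = 20\) are illustrative assumptions.

```python
def choose_split(n):
    """Return (n_prime, r) with n - 5 = n_prime + r, r in {4, 5}, 3 not dividing n_prime."""
    for r in (4, 5):
        n_prime = n - 5 - r
        if n_prime % 3 != 0:
            return n_prime, r
    # Unreachable: n - 9 and n - 10 cannot both be multiples of 3.
    raise ValueError("no valid split")


def perm_from_cycle_lengths(lengths):
    """Build a permutation (as an image list) with cycles of the given lengths."""
    perm = []
    start = 0
    for length in lengths:
        # The cycle start -> start + 1 -> ... -> start + length - 1 -> start.
        perm.extend(start + (j + 1) % length for j in range(length))
        start += length
    return perm


def compose(p, q):
    """Return the permutation v -> p[q[v]]."""
    return [p[v] for v in q]


def power(perm, m):
    """Return perm^m for m >= 0 by repeated squaring."""
    result = list(range(len(perm)))
    base = perm
    while m > 0:
        if m & 1:
            result = compose(base, result)
        base = compose(base, base)
        m >>= 1
    return result


def cycle_type(perm):
    """Return the sorted list of cycle lengths of perm (fixed points included)."""
    seen = [False] * len(perm)
    lengths = []
    for v in range(len(perm)):
        if not seen[v]:
            length = 0
            u = v
            while not seen[u]:
                seen[u] = True
                u = perm[u]
                length += 1
            lengths.append(length)
    return sorted(lengths)


if __name__ == "__main__":
    n = 20
    n_prime, r = choose_split(n)
    for lengths in ([1, 1, 3, r, n_prime], [2, 3, r, n_prime]):
        g = perm_from_cycle_lengths(lengths)
        print(lengths, "->", cycle_type(power(g, 2 * r * n_prime)))
```

The exponent \(2 r n'\) kills the cycles of length 2, r, and \(n'\), while the 3-cycle survives because 3 divides none of 2, r, \(n'\).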

Alternative 2

The crude bound \(n^{-1/5}\) for the probability can be improved as follows. Write

$$\begin{aligned} n - 101 = n' + r \end{aligned}$$

where \(101 \not \mid n'\) and \(r \in \{99, 100\}\). Let \({\mathfrak {C}}\subseteq S_n\) be the normal subset of all elements having both a 101-cycle and an \(n'\)-cycle (the remaining part is an arbitrary element of \(S_r\)). Assuming \(n' > 101\),

$$\begin{aligned} \frac{|{\mathfrak {C}}|}{n!} = \frac{1}{101 n'} \asymp 1/n, \end{aligned}$$

and as before we have \(\langle 1_{\mathfrak {C}}, {\text {sgn}}\rangle = 0\). In fact, \(\langle 1_{\mathfrak {C}}, \chi \rangle = 0\) for all low-degree \(\chi \).

Lemma A.9

If \(1 \ne \chi \in {\text {Irr}}G\) and \(\langle 1_{\mathfrak {C}}, \chi \rangle \ne 0\), then \(\chi (1) \gg n^{98}\).

Proof

It is well-known that characters of \(S_n\) are parameterized by partitions \(\lambda \vdash n\). Let \(\chi = \chi _\lambda \) be a character such that \(\langle 1_{\mathfrak {C}}, \chi \rangle \ne 0\). By the Murnaghan–Nakayama rule, it must be the case that \(\lambda \) can be obtained by starting from (r) and adding a 101-rim-hook and an \(n'\)-rim-hook. Hence if \(\chi \) is nontrivial and n is sufficiently large then \(\lambda _1 \le n - 100\) and \(\lambda _1' \le n - 98\). From the hook length formula it follows that, for sufficiently large n,

$$\begin{aligned} \chi (1) \ge \chi _{(99, 1^{n-99})}(1) = \frac{n!}{n \, 98! \, (n-99)!} \asymp n^{98}. \end{aligned}$$

\(\square \)
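The degree computation in the proof can be verified numerically. The following Python sketch implements the hook length formula and compares the degree of \(\chi _{(99, 1^{n-99})}\) with the closed form \(\frac{n!}{n \, 98! \, (n-99)!} = \left( {\begin{array}{c}n-1\\ 98\end{array}}\right) \) for one sample value of n. The function name and the value \(n = 150\) are illustrative assumptions.

```python
from math import comb, factorial


def hook_length_dimension(partition):
    """Degree of the character of S_n indexed by `partition`.

    `partition` is a nonempty list of weakly decreasing positive integers.
    """
    n = sum(partition)
    # Column lengths of the Young diagram (the conjugate partition).
    conjugate = [sum(1 for part in partition if part > j) for j in range(partition[0])]
    product = 1
    for i, part in enumerate(partition):
        for j in range(part):
            arm = part - j - 1          # cells to the right in row i
            leg = conjugate[j] - i - 1  # cells below in column j
            product *= arm + leg + 1
    return factorial(n) // product


if __name__ == "__main__":
    n = 150
    hook = [99] + [1] * (n - 99)
    print(hook_length_dimension(hook) == comb(n - 1, 98))
```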

It follows as before that, with probability at least

$$\begin{aligned} 1 - O(n^{-98/5}), \end{aligned}$$

there is a word \(w \in F_3\) of length \(O(\log n)\) such that \(w(x, y, z) \in {\mathfrak {C}}\). Hence there is a word \(w' = w^{r! n'}\) of length \(O(n \log n)\) such that \(w'(x, y, z)\) is a 101-cycle. By Theorem A.8 (and inspecting the proof), the conjugation action of \(x, y, z\) on the set of 101-cycles has spectral gap at least \(\delta \) with probability at least

$$\begin{aligned} 1 - O(n^{-1 + O(\delta ) + o(1)}). \end{aligned}$$

Taking \(\delta = 1/\log n\) (say), it follows that every 101-cycle is a word in \(x, y, z\) of length \(O(n \log n)\), and hence the diameter of \({\text {Cay}}(\langle S \rangle , S)\) is \(O(n^2 \log n)\), with probability

$$\begin{aligned} 1 - n^{-1+o(1)}. \end{aligned}$$

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Eberhard, S., Jezernik, U. Babai’s conjecture for high-rank classical groups with random generators. Invent. math. (2021). https://doi.org/10.1007/s00222-021-01065-x
