Abstract
Let \(G = {\text {SCl}}_n(q)\) be a quasisimple classical group with n large, and let \(x_1, \ldots , x_k \in G\) be random, where \(k \ge q^C\). We show that the diameter of the resulting Cayley graph is bounded by \(q^2 n^{O(1)}\) with probability \(1 - o(1)\). In the particular case \(G = {\text {SL}}_n(p)\) with p a prime of bounded size, we show that the same holds for \(k = 3\).
Introduction
Let G be a group and S a symmetric (\(S = S^{-1}\)) subset of G. Write \({\text {Cay}}(G, S)\) for the associated Cayley graph: the graph whose vertices are the elements \(g \in G\) and whose edges are pairs \(\{g, sg\}\) with \(g\in G, s\in S\). The graph \({\text {Cay}}(G, S)\) is connected if and only if S generates G, and in that case its diameter is equal to the smallest d such that \((S \cup \{1\})^d = G\). A well-known conjecture of Babai [9] states that
$${\text {diam}}{\text {Cay}}(G, S) \le (\log |G|)^{O(1)}$$
uniformly over all non-abelian finite simple groups G and symmetric generating sets S. In other words, every connected Cayley graph of a non-abelian finite simple group has diameter within a power of the trivial lower bound.
By the classification of finite simple groups, Babai’s conjecture splits into essentially three broad cases:

1.
Groups of Lie type of bounded rank over \({\mathbf {F}}_q\) with \(q \rightarrow \infty \);

2.
Classical groups of unbounded rank over \({\mathbf {F}}_q\) with q arbitrary;

3.
Alternating groups \(A_n\) with \(n \rightarrow \infty \).
For groups of Lie type and bounded rank, Babai’s conjecture is now completely resolved, following breakthrough work of Helfgott [23], Pyber–Szabó [39], and Breuillard–Green–Tao [5]. In the other two cases the conjecture remains open. For the alternating groups, Helfgott and Seress [25] proved that
$${\text {diam}}{\text {Cay}}(A_n, S) \le \exp (O((\log n)^4 \log \log n)).$$
For comparison, Babai’s conjecture (folkloric in this case) asserts that
$${\text {diam}}{\text {Cay}}(A_n, S) \le n^{O(1)};$$
thus we have a quasi-polynomial bound instead of the expected polynomial bound. The case of classical groups of unbounded rank, on the other hand, is still wide open. The best bounds currently known are due to Biswas–Yang and Halasi–Maróti–Pyber–Qiao:
By contrast, Babai’s conjecture in this case asserts that
$${\text {diam}}{\text {Cay}}({\text {SCl}}_n(q), S) \le (n \log q)^{O(1)},$$
so we are still exponentially far off. A key open case is the family of groups \({\text {SL}}_n(2)\) with n tending to infinity.
In all cases, an important subproblem is the case of random generators (see, e.g., [38, Problem 10.8.6]). Let \(k \ge 2\) be a small constant and let \(S = \{x_1^{\pm 1}, \ldots , x_k^{\pm 1}\}\), where \(x_1, \ldots , x_k \in G\) are uniform and independent. For groups of Lie type of bounded rank, it was proved by Breuillard, Green, Guralnick, and Tao [4] that \({\text {Cay}}(G, S)\) is almost surely an expander, and in particular
$${\text {diam}}{\text {Cay}}(G, S) \ll \log |G|.$$
There is no consensus about whether such a strong bound is likely to hold for groups of unbounded rank. Babai’s conjecture for \(A_n\) and random generators was an open problem for some time. The first polynomial bound was proved by Babai and Hayes, and the exponent has been lowered by Schlage-Puchta and Helfgott–Seress–Zuk:
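In this paper we consider the case of high-rank classical groups over a small field. Recall that these are obtained from the groups
$${\text {GL}}_n(q), \quad {\text {Sp}}_n(q), \quad {\text {GO}}_n^{(\pm )}(q), \quad {\text {GU}}_n(q)$$
of automorphisms of a finite vector space \(V = {\mathbf {F}}_q^n\), in the latter three cases equipped with a nondegenerate alternating, quadratic, or hermitian form, respectively. Throughout we write \({\text {GCl}}_n(q)\) for any of these groups, and \({\text {SCl}}_n(q)\) for the corresponding derived subgroup
$${\text {SL}}_n(q), \quad {\text {Sp}}_n(q), \quad \Omega _n^{(\pm )}(q), \quad {\text {SU}}_n(q).$$
We will write \({\text {Cl}}_n(q)\) for any intermediate group:
$${\text {SCl}}_n(q) \le {\text {Cl}}_n(q) \le {\text {GCl}}_n(q).$$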
Omitting a few small exceptional cases, \({\text {SCl}}_n(q)\) is a quasisimple group, so Babai’s conjecture applies. For \({\text {SCl}}_n(q)\) with n large and random generators, the best known bound is just the uniform bound (1).
There is a promising programme of Pyber, which aims to prove Babai’s conjecture in three steps. The programme is motivated by the positive solution in the case of random generators in alternating groups, especially the result of Babai–Beals–Seress [3] that \({\text {diam}}{\text {Cay}}(A_n, S) \le n^{O(1)}\) provided only that S contains an element of degree at most \(n/(3 + \epsilon )\). Here the degree of a permutation is the number of non-fixed points. Analogously, the degree of an element \(g \in {\text {GL}}_n(q)\) is defined to be the rank of \(g - 1\), and Pyber’s programme is the following.

1.
Given some generators, find an element whose degree is at most \((1-\epsilon )n\).

2.
Given an element of degree \((1-\epsilon )n\), find an element of minimal degree.

3.
Given an element whose degree is minimal, finish the proof.
In the case of alternating groups, step 3 is essentially trivial, since there are only \(O(n^3)\) 3-cycles in \(A_n\), but for \({\text {SCl}}_n(q)\) it is highly nontrivial. In the case of \({\text {SL}}_n(p)\), p prime, step 3 was accomplished recently by Halasi [22].
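To make the analogy between the two notions of degree concrete, here is a minimal sketch. The helper names are hypothetical, and it works over a prime field \({\mathbf {F}}_p\) only. It computes the degree \({\text {rank}}(g - 1)\) of a matrix by Gaussian elimination mod p, alongside the permutation degree. For a permutation matrix \(P_\sigma \), the rank of \(P_\sigma - 1\) is n minus the number of cycles of \(\sigma \). So a 3-cycle has permutation degree 3 but matrix degree 2, a transposition has matrix degree 1, and a nontrivial scalar matrix has full degree n.

```python
def rank_mod_p(rows, p):
    """Rank over F_p (p prime) of a list of equal-length integer vectors."""
    m = [[a % p for a in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # find a row at or below `rank` with a nonzero entry in this column
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(a * inv) % p for a in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                factor = m[i][col]
                m[i] = [(a - factor * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def matrix_degree(g, p):
    """deg g = rank(g - 1) for a square integer matrix g read modulo p."""
    n = len(g)
    shifted = [[g[i][j] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    return rank_mod_p(shifted, p)


def permutation_matrix(sigma):
    """Matrix P with P e_j = e_{sigma[j]} (sigma given as a tuple)."""
    n = len(sigma)
    return [[1 if sigma[j] == i else 0 for j in range(n)] for i in range(n)]


def permutation_degree(sigma):
    """Number of non-fixed points of sigma."""
    return sum(1 for j, s in enumerate(sigma) if j != s)
```

For example, `matrix_degree(permutation_matrix((1, 2, 0)), 5)` returns 2, while `permutation_degree((1, 2, 0))` returns 3.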
We have two things to contribute in the case of large n, small q. First, assuming we have at least 3 random generators, we will do steps 1 and 2 of Pyber’s programme.
Theorem 1.1
Let \(G = {\text {Cl}}_n(q)\), and assume \(\log q < c n / \log ^2 n\) for a sufficiently small constant \(c>0\). Let \(x, y, z \in G\) be random. Then with probability \(1 - e^{-cn}\) there is a word \(w \in F_3\) of length \(n^{O(\log q)}\) such that w(x, y, z) has minimal degree in \(G' = {\text {SCl}}_n(q)\).
Combined with Halasi’s result, this settles Babai’s conjecture for \({\text {SL}}_n(p)\), p prime and bounded, with at least 3 random generators.
Theorem 1.2
Let \({\text {SL}}_n(p) \le G \le {\text {GL}}_n(p)\), where p is prime and \(\log p < cn / \log ^2 n\). Let x, y, z be elements of G chosen uniformly at random, and let \(S = \{ x^{\pm 1}, y^{\pm 1}, z^{\pm 1} \}\). Then with probability \(1 - e^{-cn}\) we have
Second, assuming we have sufficiently many random generators depending on q, we will do step 3 in a particularly satisfactory way. In fact, we will prove that the Schreier graph of the action of G on O(1)-tuples of vectors is almost surely a union of expander graphs. (The analogous result for the symmetric group is a result of Friedman, Joux, Roichman, Stern, and Tillich [15], and was essential in [26].)
Theorem 1.3
Let \(G = {\text {Cl}}_n(q)\), and let \(x_1, \ldots , x_k \in G\) be random. Let W be the set of r-tuples of vectors in the natural module \(V = {\mathbf {F}}_q^n\). Assume that \(r < cn^{1/3}\), and that \(k \ge q^{C r^3}\). Then almost surely the Schreier graph of G generated by \(x_1, \ldots , x_k\) on any of its orbits in W has a uniform spectral gap.
As we will explain, this implies that if we have an element of minimal degree then by conjugation we can rapidly obtain a full conjugacy class of elements of minimal degree, and it follows in short order that the diameter of G is not too large. This completes the proof of Babai’s conjecture for \({\text {SCl}}_n(q)\) for k random generators, as long as k is sufficiently large compared to q.
Theorem 1.4
There are constants \(c, C>0\) so that the following holds. Let \(G = {\text {Cl}}_n(q)\), where \(n > C\). Let \(x_1, \ldots , x_k\) be elements of G chosen uniformly at random, where \(k > q^C\), and let \(S = \{ x_1^{\pm 1}, \ldots , x_k^{\pm 1}\}\). Then with probability \(1 - q^{-cn}\) we have
Corollary 1.5
Babai’s conjecture holds in the following two cases:

(1)
\({\text {SL}}_n(p)\), p prime and bounded, and at least 3 random generators;

(2)
\({\text {SCl}}_n(q)\) and at least \(q^C\) random generators, where C is an absolute constant.
Our method does not depend on the classification of finite simple groups (CFSG) in any way. Having a CFSG-free method is valuable for transparency, but moreover we think it is essential for attacking Babai’s conjecture. It is well known that two random elements of \({\text {SCl}}_n(q)\) almost surely generate the group: this is a result of Kantor and Lubotzky [30]. Kantor and Lubotzky rely on CFSG through Aschbacher’s theorem, so unfortunately their method does not adapt well to proving diameter bounds. By contrast, in [14] the first author and Virchow found a CFSG-free proof in the case of \({\text {SL}}_n(q)\) and expressed the hope that the method would be generalizable. We recycle several ideas from that paper in the present one.
Perhaps the most important idea in our method is the observation that if \(x, y, z \in G\) are random and independent, then the elements xw(y, z) for all short words \(w \in F_2\) behave roughly independently, which allows us to imitate having many more than just 3 generators. This is a more powerful version of the “\(xy^i\) trick”, which comes originally from [3, Sect. 4] and has been essential in all subsequent work on the random generator subproblem in high rank.
Let us mention one further result, of independent interest. In the appendix we give analogous arguments for \(A_n\), based on the standard fanciful idea that \(A_n = {\text {PSL}}_n(1)\). The value of doing so is mostly motivational, but we also obtain a new result. Provided \(k \ge 3\), we sharpen (2) to
This is a modest improvement, but it is interesting for being conjecturally sharp for any proof which uses elements of small support as a stepping stone. Decreasing the exponent 2 appears to require a radically new idea.
Reader’s guide
We first record some preliminaries (Sect. 2) regarding asymptotic notation, Cayley and Schreier graphs, classical groups and their associated formed spaces and the notions of degree and support, and adjacency operators.
Next we turn to a more specialized preparatory section (Sect. 3) dealing with word maps, where we introduce the vocabulary of queries, coincidences, and trajectories. Briefly, the idea is that if \(w \in F_k\) is a given word, \(v \in V\) a given vector, and \(x_1, x_2, \ldots , x_k \in G\) random, then evaluating \(w(x_1, \ldots , x_k) v\) can be thought of as a kind of random walk. As much as possible we recycle the key language used by [15] in the case of the symmetric group. The tools of this section will be used in two essentially different ways in the rest of the paper.
We proceed (Sect. 4) by showing that a given short word w evaluated at random elements \(x_1, \ldots , x_k \in G\) almost surely has large support (Theorem 4.2). This is a kind of antithesis to step 1 of Pyber’s programme: all sufficiently short words in random generators will in fact fail to have degree at most \((1-\epsilon )n\). However, this is interesting when combined with recent character bounds of Guralnick–Larsen–Tiep [18, 19], as it implies that the character ratio \(|\chi (w(x_1, \ldots , x_k))| / \chi (1)\) is almost surely small for each nonlinear character \(\chi \) (Corollary 5.3).
This bound on the expectation of \(\chi (w(x_1, \ldots , x_k)) / \chi (1)\) is one of the two main ingredients in the “xw(y, z) trick”, which is the subject of Sect. 6. This trick shows that, given random generators \(x_0, x_1, \ldots , x_k\), one can almost surely find a short word \(x_0 w(x_1, \ldots , x_k)\) lying in a given normal subset \({\mathfrak {C}}\subseteq G\), provided that the density of \({\mathfrak {C}}\) is large compared to the expected values of character ratios. The trick is a simple consequence of the second moment method, following the observation that the elements \(x_0 w(x_1, \ldots , x_k)\) for various w are approximately pairwise independent.
The other main ingredient is the construction of an appropriate normal set \({\mathfrak {C}}\). This is the subject of Sect. 7. For each classical group we find a large normal set \({\mathfrak {C}}\), all of whose fibres over \(G^\text {ab}\) are large (allowing us to ignore linear characters), and a small integer m such that for every \(g \in {\mathfrak {C}}\) the power \(g^m\) has minimal degree in \({\text {SCl}}_n(q)\). This completes the proof of Theorem 1.1.
Once we have an element of minimal degree, we can act on that element by conjugation. Since the minimal degree in all cases is at most 2, this action is a constituent of the usual permutation action on 4-tuples of vectors. We analyze this action by again using the language of trajectories and coincidences, and the trace method: we bound a high moment of the second eigenvalue by bounding the trace of the corresponding power of the adjacency matrix, interpreting the latter in terms of closed trajectories. This is analogous to a result for the symmetric group due to Friedman, Joux, Roichman, Stern, and Tillich [15], building on earlier work of Broder–Shamir [8]. However, in the case of classical groups there are some extra combinatorial complications that do not arise for symmetric groups.
We first focus (Sect. 8) on describing the structure of a closed trajectory with only one coincidence. We deal with the motivational case of G acting on V first, and then generalize to the action on tuples of vectors.
These results are then (Sect. 9) used to show that, in an orbit of G of size N, the probability that a trajectory closes is close to 1/N, with a small relative error. Again we first deal with the motivational case of G acting on V. Provided that we have sufficiently many generators in terms of q, these bounds are good enough for the trace method to work. This completes the proof of Theorem 1.3.
Finally, in Sect. 10 we collect results and deduce Theorems 1.2 and 1.4.
Many (but not all) of our arguments have natural analogues for the symmetric group. For independent interest and for motivation, these are presented in Appendix A.
Preliminaries
This section fixes some notation and definitions that will be relevant throughout the paper. The reader needing an introduction to expansion, particularly in Cayley and Schreier graphs, could consult Kowalski [31]. For an introduction to classical groups, see Aschbacher [2, Chapter 7] or Grove [20].
Asymptotic notation
Many of the arguments we will use are of asymptotic nature and we adopt standard asymptotic notation to state these. Given functions f, g, we write \(f \ll g\) or equivalently \(f = O(g)\) to denote that there are absolute constants \(N, C > 0\) so that \(f(n) \le C \cdot g(n)\) for all \(n \ge N\). Let \(f \asymp g\) mean that \(f \ll g\) and \(g \ll f\). We write \(f = o(g)\) to denote that for every \(\epsilon > 0\) there is a constant N so that \(f(n) \le \epsilon \cdot g(n)\) for all \(n \ge N\). Let \(f = \omega (g)\) mean that \(g = o(f)\).
We will generally write statements that involve anonymous (usually absolute) constants by using c for small constants and C for big constants.
Cayley and Schreier graphs
Let G be a group with generating set S satisfying \(S = S^{-1}\). The (undirected, left) Cayley graph \({\text {Cay}}(G,S)\) is the graph whose vertices are elements of G and whose edges are pairs \(\{ g, s g \}\) for \(g \in G, s \in S\).
More generally, the (undirected) Schreier graph \({\text {Sch}}(G,S,\Omega )\) associated to a transitive action of G on a set \(\Omega \) is the graph whose vertices are elements of \(\Omega \) and whose edges are pairs \(\{ \omega , s \omega \}\) for \(\omega \in \Omega , s \in S\). Cayley graphs are Schreier graphs for the left regular representation of G on itself.
Let \(\Gamma \) be a connected graph. One can view \(\Gamma \) as a metric space in the following way. Define the length of a path in \(\Gamma \) to be the number of edges on the path, and let the distance \(d_\Gamma (v_1, v_2)\) between any two vertices \(v_1, v_2 \in V(\Gamma )\) be the length of the shortest path between \(v_1, v_2\). The diameter of a graph \(\Gamma \) is
$${\text {diam}}\,\Gamma = \max _{v_1, v_2 \in V(\Gamma )} d_\Gamma (v_1, v_2).$$
The diameter of \({\text {Cay}}(G, S)\) is just the smallest \(d \ge 0\) such that \((S \cup \{1\})^d = G\).
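The characterization of the diameter as the least d with \((S \cup \{1\})^d = G\) suggests computing it by breadth-first search from the identity. The sketch below does this for a permutation group given by generators. Permutations are stored as tuples, and the function names are ours. It relies on Cayley graphs being vertex-transitive, since right multiplications are graph automorphisms. Consequently, the largest distance from the identity is the diameter. The approach is only practical for very small groups.

```python
from collections import deque


def inverse(perm):
    """Inverse of a permutation stored as a tuple (perm[j] is the image of j)."""
    inv = [0] * len(perm)
    for j, pj in enumerate(perm):
        inv[pj] = j
    return tuple(inv)


def cayley_diameter(gens):
    """Return (diameter, |G|) for Cay(G, S), S = gens and their inverses, G = <gens>."""
    n = len(gens[0])
    S = set(gens) | {inverse(g) for g in gens}
    identity = tuple(range(n))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for s in S:
            sg = tuple(s[g[j]] for j in range(n))  # edge {g, s g}: (s*g)(j) = s(g(j))
            if sg not in dist:
                dist[sg] = dist[g] + 1
                queue.append(sg)
    # dist[h] <= d exactly when h lies in (S ∪ {1})^d
    return max(dist.values()), len(dist)
```

For instance, \(S_3\) generated by the two adjacent transpositions gives a 6-cycle, so `cayley_diameter([(1, 0, 2), (0, 2, 1)])` returns `(3, 6)`.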
Classical groups
Throughout the paper we write \({\text {SCl}}_n(q) \le {\text {GCl}}_n(q) \le {\text {GL}}_n(q)\) for any of the following groups:
In all cases the defining module is \(V = {\mathbf {F}}_q^n\). We sometimes refer to the first case as the linear case. We make the following conventions in the other cases (notation in other literature sometimes differs, particularly in the \({\text {GU}}\) case):
\({\text {Sp}}_n\): n must be even.

\({\text {GO}}_n^{(\pm )}\): \(\Omega _n(q) = {\text {SO}}_n(q)'\). If n is even there are two possibilities, denoted \({\text {GO}}_n^+(q)\) and \({\text {GO}}_n^-(q)\), depending on the choice of quadratic form. If n is odd there is only \({\text {GO}}_n(q)\), and q must be odd.

\({\text {GU}}_n\): q must be a square \(q_0^2\). The field automorphism of \({\mathbf {F}}_q\) of order 2 is denoted \(\theta \).
Note that any such group corresponds to a subgroup of the abelianization \({\text {GCl}}_n(q)^\text {ab}\), which is given as follows:
Binary and quadratic forms
In all cases we write f for the defining invariant binary form; thus f is zero in the linear case, alternating in the symplectic case, symmetric in the orthogonal case, and hermitian in the unitary case. Except in the linear case, f is nondegenerate.
In the orthogonal case, we write Q for the relevant quadratic form. Recall that Q is related to f by
$$f(u, v) = Q(u + v) - Q(u) - Q(v); \tag{3}$$
in particular, in odd characteristic,
$$Q(v) = \tfrac{1}{2} f(v, v).$$
In even characteristic, Q is not determined by f, but is part of the defining data (and f is determined by Q via (3)). In the unitary case we write Q for the function
$$Q(v) = f(v, v),$$
which we may regard as a quadratic form over \({\mathbf {F}}_{q_0}\). In the other cases define \(Q \equiv 0\). Define also \(q_0 = q\) in the orthogonal case and \(q_0=1\) in the linear and symplectic cases, so that Q always takes values in a \(q_0\)-element space.
It is important that we are able to count solutions to \(Q(v) = x\) in any affine subspace.
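Lemma 2.1 below can be sanity-checked by brute force in tiny cases. The sketch assumes the orthogonal case with a hypothetical example form \(Q(v) = v_1^2 + \cdots + v_n^2\) over \({\mathbf {F}}_p\), p odd. It takes the affine subspace \(v_0 + W\) obtained by pinning the first s coordinates, so that W has codimension s and \(q_0 = q = p\). It then checks that every value of Q is taken within \(q^{n/2}\) of \(q^{n-s}/q\) times.

```python
import itertools


def quad_form(v, p):
    """Hypothetical example form Q(v) = v_1^2 + ... + v_n^2 over F_p (p odd)."""
    return sum(a * a for a in v) % p


def value_counts(n, p, pinned):
    """counts[x] = #{v in v0 + W : Q(v) = x}, where v0 + W consists of the vectors
    whose first s = len(pinned) coordinates equal `pinned` (codimension s)."""
    s = len(pinned)
    counts = [0] * p
    for tail in itertools.product(range(p), repeat=n - s):
        counts[quad_form(tuple(pinned) + tail, p)] += 1
    return counts


def check_lemma_2_1(n, p, pinned):
    """Return (all counts within p^(n/2) of p^(n-s)/p, counts)."""
    s = len(pinned)
    main_term = p ** (n - s) / p  # q^{n-s}/q_0 with q_0 = q = p
    error = p ** (n / 2)
    counts = value_counts(n, p, pinned)
    return all(abs(c - main_term) <= error for c in counts), counts
```

For \(n = 4\), \(p = 3\) and first coordinate pinned to 1, the counts are `[12, 9, 6]` against a main term of 9 and an allowed error of 9.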
Lemma 2.1
Let \(v_0 + W\) be an affine subspace of V of codimension s. The number of \(v \in v_0 + W\) with a specified value of Q(v) is within \(q^{n - s}/q_0 \pm q^{n/2}\).
Proof
(Cf. Dickson [11, Chapter IV].) This is trivial in the linear and symplectic cases: \(Q \equiv 0\), so the number is exactly \(q^{n-s}\). The unitary case reduces to the orthogonal case by restriction of scalars, so it suffices to consider the orthogonal case.
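For \(x \in {\mathbf {F}}_q\), let
$$\Phi (x) = |\{v \in v_0 + W : Q(v) = x\}|.$$
The Fourier transform of \(\Phi \) is
$${\hat{\Phi }}(\chi ) = \sum _{x \in {\mathbf {F}}_q} \Phi (x) \overline{\chi (x)} = \sum _{v \in v_0 + W} \overline{\chi (Q(v))} \qquad (\chi \in \widehat{{\mathbf {F}}_q}).$$
For nontrivial \(\chi \) we have, since \(Q(v + h) = Q(v) + Q(h) + f(v, h)\) by (3),
$$|{\hat{\Phi }}(\chi )|^2 = \sum _{v \in v_0 + W} \sum _{h \in W} \chi (Q(v)) \overline{\chi (Q(v + h))} = \sum _{h \in W} \overline{\chi (Q(h))} \sum _{w \in W} \overline{\chi (f(v_0 + w, h))}.$$
The sum over w is zero unless \(h \in W^\perp \), since otherwise \(w \mapsto f(w, h)\) is a nonzero linear functional on W. Note that \(\dim W^\perp = s\), since f is nondegenerate. Hence
$$|{\hat{\Phi }}(\chi )|^2 \le |W| \cdot |W \cap W^\perp | \le q^{n-s} \cdot q^s = q^n.$$
By Fourier inversion we have
$$\Phi (x) = \frac{1}{q} \sum _{\chi } {\hat{\Phi }}(\chi ) \chi (x), \qquad {\hat{\Phi }}(1) = |v_0 + W| = q^{n-s},$$
so
$$\left| \Phi (x) - \frac{q^{n-s}}{q}\right| \le \frac{1}{q} \sum _{\chi \ne 1} |{\hat{\Phi }}(\chi )| \le q^{n/2}.$$
Since \(q_0 = q\) in the orthogonal case, this is the claim. \(\square \)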
Relatedly, we have Witt’s lemma, which characterizes the orbits of \({\text {GCl}}_n(q)\) in terms of f and Q.
Lemma 2.2
(Witt’s lemma) Let \(u_1, \ldots , u_k, v_1, \ldots , v_k \in V\) be vectors such that
Then there is an element \(g \in {\text {GCl}}_n(q)\) such that \(g u_i = v_i\) for each \(1 \le i \le k\). If \(k \le n-2\) there is such an element in \({\text {SCl}}_n(q)\).
Proof
See, e.g., [2, Sect. 20]. \(\square \)
Degree and support
The concepts of degree and support are essential in the rest of the paper. Both concepts are analogous to the size of the support of a permutation, defined as the set of non-fixed points. The degree of an element \(g \in {\text {GL}}_n(q)\) is
$$\deg g = {\text {rank}}(g - 1);$$
the support of \(g \in {\text {GL}}_n(q)\) is the codimension of its largest eigenspace over \(\overline{{\mathbf {F}}_q}\)
(the former definition follows [10] and [24]; the latter definition follows Larsen–Shalev–Tiep [37]). Equivalently, if \(V_\lambda = \ker (g - \lambda )\) denotes the \(\lambda \)-eigenspace of g (for \(\lambda \in \overline{{\mathbf {F}}_q}\)), then
$$\deg g = n - \dim V_1, \qquad {\text {supp}}\,g = n - \max _{\lambda } \dim V_\lambda .$$
Support is closely related to the size of the centralizer, as in the following lemma.
Lemma 2.3
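For \(g \in G \le {\text {GL}}_n(q)\),
$$|C_G(g)| \le q^{n(n - {\text {supp}}\,g)}.$$
Proof
(Cf. [35, Lemma 3.1].) Clearly
$$|C_G(g)| \le |C_{{\text {M}}_n({\mathbf {F}}_q)}(g)|.$$
Note that \(C_{{\text {M}}_n({\mathbf {F}}_q)}(g)\) is a vector space over \({\mathbf {F}}_q\), so it will suffice to bound its dimension. Consider g as an element of \({\text {GL}}_n(\overline{{\mathbf {F}}_q})\) and decompose it into Jordan blocks. For each eigenvalue \(\lambda \) of g, let \(\pi _\lambda \) be the partition whose parts are the sizes of the Jordan blocks associated to \(\lambda \). Denote by \(S^i(\pi )\) the sum of the i-th powers of the parts of a partition \(\pi \) and let \(\pi '\) be the transposed partition of \(\pi \). By [27, Sect. 1.3],
$$\dim C_{{\text {M}}_n({\mathbf {F}}_q)}(g) = \sum _\lambda S^2(\pi _\lambda ').$$
The largest part of \(\pi _\lambda '\) is the dimension of \(V_\lambda \), so
$$S^2(\pi _\lambda ') \le \dim V_\lambda \cdot S^1(\pi _\lambda ') \le (n - {\text {supp}}\,g)\, S^1(\pi _\lambda ').$$
Combined with \(\sum _{\lambda } S^1(\pi _\lambda ') = n\), this implies
$$\dim C_{{\text {M}}_n({\mathbf {F}}_q)}(g) \le n(n - {\text {supp}}\,g),$$
which gives the claim. \(\square \)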
Adjacency operator
Given any group G and \(x_1, \ldots , x_k \in G\), let
$${\mathcal {A}} = \frac{1}{2k} \sum _{i=1}^k \left( x_i + x_i^{-1}\right) .$$
This is an element of the group algebra \({\mathbf {C}}[G]\). Given any \({\mathbf {C}}[G]\)-module W, we may consider the action of \({\mathcal {A}}\) on W. Since \({\mathcal {A}}\) is self-adjoint, its spectrum is real. Write \(\rho ({\mathcal {A}}, W)\) for the spectral radius of \({\mathcal {A}}\) acting on W.
We are most interested in permutation modules. If G acts transitively on a set \(\Omega \) then the corresponding permutation module \({\mathbf {C}}[\Omega ]\) contains a single copy of the trivial representation, denoted \({\mathbf {C}}[\Omega ]^G\). Let \(W = {\mathbf {C}}[\Omega ]_0\) denote the orthogonal complement of \({\mathbf {C}}[\Omega ]^G\). The spectral gap is \(1 - \rho ({\mathcal {A}}, W)\). Equivalently, if \({\mathcal {A}}\) acting on \({\mathbf {C}}[\Omega ]\) has spectrum
$$1 = \lambda _1 \ge \lambda _2 \ge \cdots \ge \lambda _N \ge -1,$$
where \(N = |\Omega |\), then
$$\rho ({\mathcal {A}}, W) = \max (|\lambda _2|, |\lambda _N|),$$
so the spectral gap is
$$1 - \max (|\lambda _2|, |\lambda _N|).$$
We say the action of \(x_1, \ldots , x_k\) on \(\Omega \) is expanding if the spectral gap is bounded away from zero. This is equivalent to rapid mixing of the random walk on \(\Omega \).
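For small permutation actions the spectral gap can be computed numerically. The sketch below assumes the usual normalization \({\mathcal {A}} = \frac{1}{2k}\sum _i (x_i + x_i^{-1})\), and assumes that the generators act transitively on \(\Omega = \{0, \ldots , N-1\}\), so that the eigenvalue 1 belongs to the constant functions. It builds the matrix of \({\mathcal {A}}\) on \({\mathbf {C}}[\Omega ]\) and returns \(1 - \max (|\lambda _2|, |\lambda _N|)\). The function names are ours.

```python
import numpy as np


def adjacency_operator(perms):
    """Matrix of A = (1/2k) * sum_i (x_i + x_i^{-1}) on C[Omega],
    where perms[i][j] is the image of the point j under x_i."""
    N, k = len(perms[0]), len(perms)
    A = np.zeros((N, N))
    for perm in perms:
        for j in range(N):
            A[perm[j], j] += 1.0  # x_i sends delta_j to delta_{x_i(j)}
            A[j, perm[j]] += 1.0  # x_i^{-1} acts by the transpose
    return A / (2 * k)


def spectral_gap(perms):
    """1 - max(|lambda_2|, |lambda_N|), assuming the action is transitive."""
    eigs = np.linalg.eigvalsh(adjacency_operator(perms))  # ascending order
    # eigs[-1] = 1 is the trivial eigenvalue; eigs[-2] = lambda_2, eigs[0] = lambda_N
    return 1.0 - max(abs(eigs[-2]), abs(eigs[0]))
```

For a single 5-cycle the eigenvalues are \(\cos (2\pi j/5)\), so the gap is \(1 - \cos (\pi /5)\). A 4-cycle gives a bipartite graph with \(\lambda _N = -1\) and hence gap 0, which illustrates why \(|\lambda _N|\) must be included.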
Word maps, queries, and trajectories
Word maps
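Write \(F_k = F\{\xi _1, \ldots , \xi _k\}\) for the free group with generators \(\{\xi _1, \ldots , \xi _k\}\). Let \(w \in F_k\) have length \(\ell \), and let
$$w = w_\ell \cdots w_2 w_1 \qquad (w_t \in \{\xi _1^{\pm 1}, \ldots , \xi _k^{\pm 1}\},\ w_{t+1} \ne w_t^{-1})$$
be the reduced expression of w. Let G be a finite group and \(x_1, \ldots , x_k \in G\). Write
$${{\overline{w}}} = w(x_1, \ldots , x_k)$$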
for the image of w under the homomorphism \(F_k \rightarrow G\) defined by \(\xi _i \mapsto x_i\).
Usually, but not always, \(x_1, \ldots , x_k\) will be chosen randomly. The following lemma is often useful for reducing to the cyclically reduced case.
Lemma 3.1
If \(x_1, \ldots , x_k \in G\) are uniform and independent then \({{\overline{w}}}\) is just the image of w under a uniformly random homomorphism \(F_k \rightarrow G\). In particular, the distribution of \({{\overline{w}}}\) depends only on the automorphism class of w.
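Lemma 3.1 can be seen concretely in a tiny group. The sketch below encodes a word as the list \([w_1, \ldots , w_\ell ]\) of letters (i, ±1), read so that \(w = w_\ell \cdots w_1\). Generators are indexed from 0 in the code, so `(0, 1)` stands for \(\xi _1\). The code evaluates the word on permutations and tabulates the exact distribution of \({{\overline{w}}}\) over all k-tuples in \(S_3\). All helper names are ours. Words in the same automorphism class, such as \(\xi _1\), \(\xi _1 \xi _2\) and the conjugate \(\xi _2 \xi _1 \xi _2^{-1}\) (which cyclically reduces to \(\xi _1\)), give the same uniform distribution, while \(\xi _1^2\) does not.

```python
import itertools
from collections import Counter


def free_reduce(word):
    """Cancel adjacent pairs x x^{-1}; letters are (generator index, +1 or -1)."""
    out = []
    for i, e in word:
        if out and out[-1] == (i, -e):
            out.pop()
        else:
            out.append((i, e))
    return out


def cyclic_reduce(word):
    """Strip conjugating letters: w_1^{-1} u w_1 -> u (an inner automorphism of F_k)."""
    w = free_reduce(word)
    while len(w) >= 2 and w[0] == (w[-1][0], -w[-1][1]):
        w = w[1:-1]
    return w


def compose(a, b):
    """(a * b)(j) = a(b(j)) for permutations stored as tuples."""
    return tuple(a[j] for j in b)


def inverse(a):
    inv = [0] * len(a)
    for j, aj in enumerate(a):
        inv[aj] = j
    return tuple(inv)


def evaluate(word, xs):
    """Return w(x_1, ..., x_k) = w_l ... w_1, where word = [w_1, ..., w_l]."""
    g = tuple(range(len(xs[0])))
    for i, e in word:
        letter = xs[i] if e == 1 else inverse(xs[i])
        g = compose(letter, g)  # left-multiply by the next letter
    return g


def word_distribution(word, n, k):
    """Exact distribution of w(x_1, ..., x_k) for independent uniform x_i in S_n."""
    perms = list(itertools.permutations(range(n)))
    return Counter(evaluate(word, xs) for xs in itertools.product(perms, repeat=k))
```

Exhaustive enumeration is only feasible for very small n and k. It is meant to illustrate the lemma, not to estimate probabilities in classical groups.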
Queries and coincidences
Let \(G = {\text {Cl}}_n(q)\) be a classical group and \(V = {\mathbf {F}}_q^n\) the defining module. Let \(x_1, \ldots , x_k \in G\). Define a query to be a pair \((\xi , v)\), where \(\xi \in \{\xi _1^{\pm 1}, \ldots , \xi _k^{\pm 1}\}\) and \(v \in V\); the result of the query is \({\overline{\xi }} v\). After any finite sequence of queries
$$(w_1, v_1), (w_2, v_2), \ldots , (w_{t-1}, v_{t-1}),$$
the known domain of a letter \(\xi \) at time t is
$$D_\xi ^t = {\text {span}}\big (\{v_s : s < t,\ w_s = \xi \} \cup \{\overline{w_s} v_s : s < t,\ w_s = \xi ^{-1}\}\big ).$$
Suppose we make a further query \((w_t, v_t)\). If \(v_t \in D_{w_t}^t\), then the result \(\overline{w_t} v_t\) is determined already by the values of \(\overline{w_1} v_1, \ldots , \overline{w_{t-1}} v_{t-1}\); we call this a forced choice. Otherwise, we say the query is a free choice.
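The bookkeeping of free and forced choices along a trajectory can be sketched as follows. This is a hypothetical illustration over a prime field, not the authors' code. It assumes that the known domain of a letter is spanned by the earlier inputs to that letter together with the earlier outputs of its inverse, and that the caller supplies the matrices of both \(x_i\) and \(x_i^{-1}\) in a dict `gens` keyed by letters (i, ±1). Coincidences are not tracked, since they depend on the choice of R.

```python
def rank_mod_p(rows, p):
    """Rank over F_p (p prime) of a list of equal-length integer vectors."""
    m = [[a % p for a in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)
        m[rank] = [(a * inv) % p for a in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                factor = m[i][col]
                m[i] = [(a - factor * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def in_span(v, vectors, p):
    """True if v lies in the F_p-span of `vectors` (the empty span is {0})."""
    return rank_mod_p(vectors + [v], p) == rank_mod_p(vectors, p)


def mat_vec(M, v, p):
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]


def trajectory_with_choices(word, gens, v, p):
    """Follow v^t = w_t v^{t-1} for word = [w_1, ..., w_l]; label each query."""
    history = []  # (letter, input, output) of every answered query
    vecs, kinds = [list(v)], []
    for letter in word:
        i, e = letter
        # assumed known domain: earlier inputs to `letter`, earlier outputs of its inverse
        domain = [inp for (l, inp, out) in history if l == letter]
        domain += [out for (l, inp, out) in history if l == (i, -e)]
        cur = vecs[-1]
        kinds.append("forced" if in_span(cur, domain, p) else "free")
        nxt = mat_vec(gens[letter], cur, p)
        history.append((letter, cur, nxt))
        vecs.append(nxt)
    return vecs, kinds
```

With \(x_1\) the identity and \(w = \xi _1^2\), the second query \((\xi _1, x_1 v) = (\xi _1, v)\) repeats a known input, so it is forced. With \(x_1\) swapping two basis vectors, both queries are free.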
Let R be some subset of V fixed in advance. If a query \((w_t, v_t)\) is a free choice and yet
then we say the result of the query is a coincidence.
The language is most interesting when \(x_1, \ldots , x_k \in G\) are chosen randomly. Then, by Witt’s lemma, whenever \((\xi , v)\) is a free choice, \({\overline{\xi }} v\) is, conditionally on the result of previous queries, uniformly distributed among vectors satisfying the relevant independence and form conditions. In particular, coincidences are unlikely. We formalize these key points in the following lemmas.
Lemma 3.2
Let \(x \in G\) be uniformly random, and let \(u_1, \ldots , u_t\) be linearly independent, where \(t \le n-2\). Then, conditionally on the values of \(v_1 = x u_1, \ldots , v_{t-1} = x u_{t-1}\), the value of \(x u_t\) is uniformly distributed among vectors \(v_t\) such that \(u_i \mapsto v_i\) defines an isometric isomorphism \(\langle u_1, \ldots , u_t\rangle \rightarrow \langle v_1, \ldots , v_t \rangle \), or in other words such that \(v_t \notin {\text {span}}\{v_1, \ldots , v_{t-1}\}\) and \(f(u_i, u_t) = f(v_i, v_t)\) for each \(i\le t\) and \(Q(u_t) = Q(v_t)\).
Proof
For each such \(v_t\), Witt’s lemma asserts that there is at least one suitable \(x \in G\). The distribution is uniform by the orbit–stabilizer theorem. \(\square \)
Lemma 3.3
Let \(x_1, \ldots , x_k \in G\) be uniformly random and independent, and let
be a sequence of queries. Assume that \((w_t, v_t)\) is a free choice. Assume
Then, conditionally on the values of \(\overline{w_1} v_1, \ldots , \overline{w_{t-1}} v_{t-1}\), the result \(\overline{w_t} v_t\) of the query \((w_t, v_t)\) is uniformly distributed outside \(D_{w_t^{-1}}^t\) subject to
In particular, the conditional probability that \(\overline{w_t} v_t\) is a coincidence is bounded by
(provided the denominator is positive), where
and s is the number of \(i < t\) with \(w_i \in \{w_t, w_t^{-1}\}\).
Proof
The first part of the lemma is immediate from the previous lemma. For the second part, note that \(\overline{w_t} v_t\) is drawn from an affine subspace of codimension at most s, less a subspace of dimension at most s, subject only to the quadratic condition; by Lemma 2.1 there are at least \(q^{n-s}/q_0 - q^{n/2} - q^s\) possibilities, so we get at least the denominator claimed. \(\square \)
Remark 3.4
In the linear case there are no form conditions, so we get the simpler bound \(q^d / (q^n - q^s)\) for the probability of a coincidence.
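The simplest instance of Remark 3.4 can be checked exhaustively. Take the linear case, a single free query (so s = 0 and nothing is known yet about the letter), and \(G = {\text {GL}}_n(q)\). Lemma 3.2 then says that x v is uniform over the nonzero vectors. Hence the probability of landing in a fixed d-dimensional subspace is exactly \((q^d - 1)/(q^n - 1) \le q^d/(q^n - 1)\). The sketch below is a hypothetical check that enumerates all of \({\text {GL}}_3({\mathbf {F}}_2)\).

```python
import itertools
from fractions import Fraction


def rank_mod_p(rows, p):
    """Rank over F_p (p prime) of a list of equal-length integer vectors."""
    m = [[a % p for a in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)
        m[rank] = [(a * inv) % p for a in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                factor = m[i][col]
                m[i] = [(a - factor * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def mat_vec(M, v, p):
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]


def coincidence_probability(n, p, v, W_basis):
    """Exact P(x v in span(W_basis)) for x uniform in GL_n(F_p), by enumeration."""
    d = rank_mod_p(W_basis, p)
    hits = total = 0
    for entries in itertools.product(range(p), repeat=n * n):
        M = [list(entries[r * n:(r + 1) * n]) for r in range(n)]
        if rank_mod_p(M, p) < n:
            continue  # skip singular matrices
        total += 1
        if rank_mod_p(W_basis + [mat_vec(M, v, p)], p) == d:  # x v lies in W
            hits += 1
    return Fraction(hits, total)
```

For \(v = e_1\) in \({\mathbf {F}}_2^3\), a line through \(e_1\) is hit with probability 1/7 and a plane with probability 3/7. These values sit below the bounds 2/7 and 4/7 from the remark with s = 0.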
Trajectories
Let \(w \in F_k\), and let
$$w = w_\ell \cdots w_2 w_1$$
be the reduced expression. For each \(v \in V\), the trajectory of v is the sequence of queries \((w_t, v^{t-1})\), where \(v^0 = v\) and for each \(t \ge 1\) the vector \(v^t\) is the result of the query \((w_t, v^{t-1})\); in other words, the sequence \(v^0, v^1, \ldots , v^\ell \) is defined by
$$v^0 = v, \qquad v^t = \overline{w_t}\, v^{t-1} \quad (1 \le t \le \ell ).$$
The following lemma is trivial but essential.
Lemma 3.5
Suppose \(v \ne 0\) and \(v^\ell \in {\text {span}}R\). Then there is at least one coincidence in the trajectory of v.
Proof
Since \(D_{w_1}^1 = 0\), the first query \((w_1, v^0)\) is free. For each \(t\ge 1\), if \((w_t, v^{t-1})\) is free and not a coincidence then
while
hence the query \((w_{t+1}, v^t)\) is also free. Finally if \((w_\ell , v^{\ell -1})\) is free and not a coincidence then \(v^\ell \notin {\text {span}}R\). \(\square \)
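More generally for any \(r\ge 1\) we consider the joint trajectory of an r-tuple
$$(v_1, \ldots , v_r) \in V^r,$$
which is simply the r-tuple of individual trajectories, with the queries \((w_t, v_i^{t-1})\) ordered lexicographically by (t, i); i.e., we answer the queries
$$(w_1, v_1^0), \ldots , (w_1, v_r^0),\ (w_2, v_1^1), \ldots , (w_2, v_r^1),\ \ldots ,\ (w_\ell , v_1^{\ell -1}), \ldots , (w_\ell , v_r^{\ell -1})$$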
in reading order. Write \(\prec \) for this order, i.e., \((t',i') \prec (t,i)\) if \(t' < t\) or \(t'=t\) and \(i' < i\). The following lemma generalizes the previous one.
Lemma 3.6
Suppose \(v_i \notin {\text {span}}\{v_1, \ldots , v_{i-1}\}\) and \(v_i^\ell \in {\text {span}}R\). Then there is at least one coincidence in the trajectory of \(v_i\) (during the joint trajectory of \(v_1, \ldots , v_r\)).
Proof
At time (1, i), we have
so the first query \((w_1, v_i^0)\) is free. For each \(t\ge 1\), if \((w_t, v_i^{t-1})\) is free and not a coincidence then
(the vectors \(v_{i'}^{t'}\) with \(t' = t\) and \(i' < i\) get included because they are results of previous queries), while
hence the query \((w_{t+1}, v_i^t)\) is also free. Finally if \((w_\ell , v_i^{\ell -1})\) is free and not a coincidence then \(v_i^\ell \notin {\text {span}}R\). \(\square \)
The probability of small support
Let G be a finite group, let \(w \in F_k\), let \(x_1, \ldots , x_k \in G\) be random, and consider \({{\overline{w}}} = w(x_1, \ldots , x_k)\). The probability that \({{\overline{w}}} = 1\) quantifies the extent to which w is “almost a law” in G. This probability is a well-studied quantity, particularly when G is simple. For example, it is known that for any \(w \ne 1\) there is some \(c = c(w) > 0\) such that \({\mathbf {P}}({{\overline{w}}} = 1) \le |G|^{-c}\) for all sufficiently large finite simple groups G (Larsen–Shalev [35, Theorem 1.1]).
For groups of large rank (our particular interest), the following bounds have been proved recently. Let \(\ell > 0\) be the reduced length of w.

1.
For \(G = A_n\) or \(G = S_n\), if \(\ell < cn^{1/2}\) then
$${\mathbf {P}}({{\overline{w}}}=1) \le e^{-c n / \ell ^2}$$ (Eberhard [12, Lemma 2.2]).

2.
For any classical group \(G = {\text {Cl}}_n(q)\), if \(\ell < cn\) then
$${\mathbf {P}}({{\overline{w}}}=1) \le |G|^{-c/\ell }$$ (Liebeck–Shalev [36, Theorem 4]).
The proofs of these estimates can be adapted to show more, namely that with high probability \({{\overline{w}}}\) has large support. In this section we explain this observation in detail in the case of \(G = {\text {Cl}}_n(q)\). For the case of \(G = A_n\) or \(G = S_n\), see the appendix (Subsection A.2).
The following lemma generalizes a key step from the argument of [36, Theorem 4].
Lemma 4.1
Let \(G = {\text {Cl}}_n(q)\) be a classical group of dimension n. Let \(V = {\mathbf {F}}_q^n\) be the natural module, and let \(U \le V\) be a subspace of dimension \(r \le n2\). Let \(w \in F_k\) be a nontrivial word of length \(\ell \le (\frac{n}{2}  2)/r\). Then
where \(C_{q^r} = 1 + (1  q^{r})^{1} \le 3\).
Proof
Let \(v_1, \ldots , v_r\) be a basis for U. Consider the joint trajectory of \(v_1, \ldots , v_r\). By Lemma 3.6 with \(R = \{v_1, \ldots , v_r\}\), we can have \({{\overline{w}}} U = U\) only if there is at least one coincidence in each individual trajectory. We take a union bound over all possibilities for when the coincidences could occur. If \(t < \ell \), then by Lemma 3.3, the probability that step (t, i) is a coincidence is bounded by
indeed there are at most \(t r + i \le (t+1) r \le \ell r\) previous vectors. If \(t = \ell \), assuming \(v_j^\ell \in U\) for \(j < i\), we actually get a slightly stronger bound:
Summing over t, the probability that there is a coincidence in the trajectory of \(v_i\) is bounded by
Taking the product over i gives the claimed bound. \(\square \)
In the following proof we will refer to the “q-binomial coefficient”, defined by
$$\left( {\begin{array}{c}x\\ r\end{array}}\right) _q = \prod _{i=0}^{r-1} \frac{q^{x-i} - 1}{q^{r-i} - 1}.$$
When x is a nonnegative integer this is the number of r-dimensional subspaces of \({\mathbf {F}}_q^x\). For \(x \ge r\) note that \(x\mapsto \left( {\begin{array}{c}x\\ r\end{array}}\right) _q\) is increasing and nonnegative, and
The following theorem will be used for an unspecified, but fixed, \(\delta > 0\).
Theorem 4.2
There are constants \(c, C>0\) such that the following holds for all \(\delta > 0\). Let \(G = {\text {Cl}}_n(q)\) be a classical group of dimension n, and let \(w \in F_k\) be a nontrivial word of reduced length \(\ell < \delta ^2 n / 20\). Assume \(q^{\delta n} > C\). Then
Proof
Let \(x_1, \ldots , x_k\) be chosen independently and uniformly from G. Suppose some eigenspace \(V_\lambda \le \overline{{\mathbf {F}}_q}^n\) of \({{\overline{w}}}\) has dimension at least \(\delta n\). Let \(d = [{\mathbf {F}}_q(\lambda ):{\mathbf {F}}_q]\). Let \(\Lambda \) be the set of d Galois conjugates of \(\lambda \). Since \(\dim V_{\lambda '} = \dim V_\lambda \) for each \(\lambda ' \in \Lambda \), \(\dim V_\lambda \le n / d\), so \(d \le 1 / \delta \). Let \(W \le V_\lambda \) be an r-dimensional subspace defined over \({\mathbf {F}}_q(\lambda ) \cong {\mathbf {F}}_{q^d}\). Then there is a conjugate subspace \(W' \le V_{\lambda '}\) for each \(\lambda ' \in \Lambda \), and the sum \(U = \sum _{\lambda ' \in \Lambda } W'\) is dr-dimensional and \({\mathbf {F}}_q\)-rational, since it is fixed by the Galois group, so it may be identified with a dr-dimensional subspace of V. Since \(U \cap V_\lambda = W\), this correspondence \(W \mapsto U\) is injective. Hence the number of dr-dimensional subspaces of V preserved by \({{\overline{w}}}\) is at least \(\left( {\begin{array}{c}\delta n\\ r\end{array}}\right) _{q^d}\).
Since \(\ell d \le \ell / \delta < \delta n / 20\), we may choose an integer \(r > 0\) such that \(\ell d r \in [\delta n / 5, \delta n / 4]\). Now by the previous lemma and Markov’s inequality, the probability that the number of dr-dimensional subspaces of V preserved by \({{\overline{w}}}\) is at least \(\left( {\begin{array}{c}\delta n\\ r\end{array}}\right) _{q^d}\) is bounded by
Taking the sum over all \(d \le 1/ \delta \), it follows that
Assuming \(q^{\delta n}\) is sufficiently large, the first two factors are negligible compared to the third. \(\square \)
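The proof above counts invariant subspaces through the Gaussian binomial coefficient. As a sanity check of that interpretation (an illustration, not part of the argument), the following Python sketch evaluates \(\left( {\begin{array}{c}x\\ r\end{array}}\right) _q\) from the standard product formula \(\prod _{i=0}^{r-1} (q^{x-i}-1)/(q^{i+1}-1)\) and compares it with a brute-force count of r-dimensional subspaces of \({\mathbf {F}}_2^x\), with vectors encoded as bitmasks. The function names are hypothetical helpers introduced only for this example.

```python
from itertools import product


def gaussian_binomial(x: int, r: int, q: int) -> int:
    """Gaussian binomial [x choose r]_q via the standard product formula."""
    if r < 0 or r > x:
        return 0
    num, den = 1, 1
    for i in range(r):
        num *= q ** (x - i) - 1
        den *= q ** (i + 1) - 1
    return num // den  # the quotient is always an integer


def span_gf2(vectors):
    """Span over F_2 of vectors encoded as bitmasks (XOR is vector addition)."""
    span = {0}
    for v in vectors:
        span |= {s ^ v for s in span}
    return frozenset(span)


def count_subspaces_gf2(x: int, r: int) -> int:
    """Brute-force number of r-dimensional subspaces of F_2^x."""
    subspaces = set()
    for vecs in product(range(1, 2 ** x), repeat=r):
        s = span_gf2(vecs)
        if len(s) == 2 ** r:  # the r chosen vectors were linearly independent
            subspaces.add(s)
    return len(subspaces)


if __name__ == "__main__":
    for x, r in [(4, 1), (4, 2), (5, 2)]:
        print(x, r, gaussian_binomial(x, r, 2), count_subspaces_gf2(x, r))
```

Running it prints the formula value next to the brute-force count for each pair; they agree (15, 35 and 155), and for fixed r the values increase with x, in line with the monotonicity remark before Theorem 4.2.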
Remark 4.3
The restriction \(\ell < c \delta ^2 n\) in Theorem 4.2 is essential, and related to our reliance on linear algebra. For example, let \(G = {\text {SL}}_n(q)\), and suppose w is a word of length \(\ell \approx 10 n\). We do not know how to bound \({\mathbf {P}}({{\overline{w}}} = 1)\) satisfactorily. Is it true that \({\mathbf {P}}({{\overline{w}}} = 1) \le q^{-cn}\) for some \(c>0\)? Certainly w cannot be a law, because \({\text {SL}}_n(q)\) contains \({\text {SL}}_2(q^{\left\lfloor {n/2}\right\rfloor })\) and the shortest law in \({\text {SL}}_2(q^{\left\lfloor {n/2}\right\rfloor })\) has length at least \((q^{\left\lfloor {n/2}\right\rfloor } - 1)/3\) (see Hadad [21, Theorem 2]). The question is whether it can be an almost-law.
Expected values of characters
Throughout this section let \(G = {\text {Cl}}_n(q)\) be a classical group and \(\chi \in {\text {Irr}}G\) a nonlinear character. Our aim is to bound
when w is a fixed nontrivial word of length cn, evaluated at random \(x_1, \ldots , x_k \in G\). The proof consists of two steps:

1.
By the previous section, with high probability \({{\overline{w}}}\) has large support.

2.
By recent character bounds of Guralnick, Larsen, and Tiep [18, 19], if \({{\overline{w}}}\) has large support then \(|\chi ({{\overline{w}}})| \le \chi (1)^\epsilon \).
We first deal with elements of large support.
Lemma 5.1
For every \(\epsilon >0\) there is a \(\delta > 0\) such that the following holds. Let \(g \in G\) with \({\text {supp}}g \ge (1\delta )n\). Then \(\chi (g) \le \chi (1)^\epsilon \).
Proof
By Lemma 2.3, \(|C_G(g)| \le q^{\delta n^2}\). Hence by the character bound [18, Theorem 1.3] we have \(|\chi (g)| \le \chi (1)^\epsilon \). \(\square \)
Theorem 5.2
There is a constant \(c > 0\) such that the following holds. Let \(w \in F_k\) be a fixed nontrivial word of reduced length less than cn. Then
Proof
Let \(\delta \) be as in the previous lemma with \(\epsilon = 1/2\). By conditioning on whether or not \({\text {supp}}{{{\overline{w}}}} < (1-\delta )n\), we have
It follows from Theorem 4.2 that
for some constant \(c_1 > 0\). The other summand is bounded by Lemma 5.1:
for some constant \(c_2 > 0\). (Here we used \(\chi (1) \ge q^{c_3 n}\): see [32].) \(\square \)
Our main interest is the case in which w is the result of a simple random walk in \(F_k\). With high probability the result of the random walk is nontrivial, so we can apply the above theorem.
Corollary 5.3
There is a constant \(c > 0\) such that the following holds. Let w be the result of a simple random walk of length \(\ell < cn\) in \(F_k\). Then
Proof
By conditioning on whether or not the word w is trivial, we get
The first term is bounded by Theorem 5.2. The second term is the return probability of a simple random walk on a 2k-regular tree, which is at most \(k^{-c \ell }\) for a constant \(c > 0\) (see [29, Theorem 3 and Lemma 2.2] or [15, Appendix B]). \(\square \)
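The return probability quoted in the proof of Corollary 5.3 can be computed exactly for small lengths. The Python sketch below is an illustration, not the argument of [29] or [15]: it tracks the reduced length of a simple random walk on \(F_k\), i.e. the distance from the root of the 2k-regular tree. From distance 0 every step moves away; from positive distance a step cancels the last letter with probability 1/(2k). It also compares the result with the standard Kesten bound \(\rho ^\ell \), where \(\rho = \sqrt{2k-1}/k\) is the spectral radius of this walk; for \(k \ge 2\) this is of the form \(k^{-c\ell }\). The helper names are hypothetical.

```python
import math
from fractions import Fraction


def return_probability(k: int, length: int) -> Fraction:
    """Exact P(a simple random walk of the given length on F_k ends at the identity)."""
    back = Fraction(1, 2 * k)   # probability of cancelling the last letter
    forward = 1 - back          # probability of extending the reduced word
    dist = {0: Fraction(1)}     # distribution of the reduced length
    for _ in range(length):
        new = {}
        for d, p in dist.items():
            if d == 0:
                new[1] = new.get(1, 0) + p
            else:
                new[d - 1] = new.get(d - 1, 0) + p * back
                new[d + 1] = new.get(d + 1, 0) + p * forward
        dist = new
    return dist.get(0, Fraction(0))


def kesten_radius(k: int) -> float:
    """Spectral radius of simple random walk on the 2k-regular tree."""
    return math.sqrt(2 * k - 1) / k


if __name__ == "__main__":
    for k in (2, 3, 10):
        print(k, float(return_probability(k, 20)), kesten_radius(k) ** 20)
```

Odd lengths always give probability 0, and for even lengths the exact value stays below \(\rho ^\ell \), which is why the second term in Corollary 5.3 decays exponentially in \(\ell \).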
Reaching a normal subset: the xw(y, z) trick
In this section, something of an interlude, let G be any finite group, and let \({\mathfrak {C}}\) be a normal (i.e., conjugacy-closed) subset of G. We will develop a criterion ensuring that, with high probability as \(x,y,z \in G\) are chosen uniformly at random, one can find a word \(w \in F_2\) of at most a prescribed length such that \(x w(y, z) \in {\mathfrak {C}}\). The criterion applies to sets \({\mathfrak {C}}\) whose density is large compared to the expected values of characters. This is a variation of the technique used in [13, Sect. 4]; see also [14, Sect. 2].
The following theorem expresses the most general such estimate we will need, in which we further allow arbitrary weights to be attached to elements of \({\mathfrak {C}}\). We express the result in terms of a nonnegative conjugation-invariant function (class function) f on G. We define the \(L^p\) norm of f by
and we use the standard inner product on functions on G defined by
Theorem 6.1
Let f be a nonnegative and conjugation-invariant function on G, and let \(\ell \) be a positive integer. Let \(x_0, x_1, \ldots , x_k\) be elements of G chosen uniformly at random. Let E be the event that \(f(x_0 {{\overline{u}}}) = 0\) for every word \(u \in F_k\) of length at most \(\ell \). Let w be the result of a simple random walk of length \(2\ell \) in \(F_k\). Then
In particular,
Proof
Let \({\mathcal {A}}= {\mathcal {A}}_{x_1, \ldots , x_k}\) be the adjacency operator defined in Sect. 2.6, and consider its natural action on \(L^2(G)\). Let \(X = {\mathcal {A}}^\ell f(x_0)\), regarded as a random variable dependent on \(x_0, x_1, \ldots , x_k\), and note that E is precisely the event \(X = 0\). By Chebyshev’s inequality,
The first moment is
The second moment is
Since f is conjugation-invariant, we can expand this further in terms of characters. By orthogonality of characters, if \(\tau _x\) is the translation operator defined by \(\tau _x(h)(y) = h(x^{-1} y)\), we have
Hence
and
where w is the result of a simple (symmetric) random walk of length \(2\ell \) in \(F_k\). Hence, from (5),
The \(\chi = 1\) term is \(\Vert f\Vert _1^2\), which is the same as \(({\mathbf {E}}X)^2\). Hence the first part of the theorem follows from (4). The second part holds because
Corollary 6.2
Let \({\mathfrak {C}}\) be a normal subset of G. Write
where \({\mathfrak {C}}_\alpha = {\mathfrak {C}}\cap \alpha G'\) is the fibre of \({\mathfrak {C}}\) over \(\alpha \in G^\text {ab}\). Let \(\delta _\alpha = |{\mathfrak {C}}_\alpha | / |G'|\) be the fibre density, and let \(\delta = \min _{\alpha \in G^\text {ab}} \delta _\alpha \). Assume \(\delta > 0\).
Let \(x_0, x_1, \ldots , x_k \in G\) be chosen uniformly at random, and let E be the event that for every word \(u \in F_k\) of length at most \(\ell \) we have \(x_0 {{\overline{u}}} \notin {\mathfrak {C}}\). Let w be the result of a simple random walk of length \(2\ell \) in \(F_k\). Then
Proof
In the previous theorem, take
Then \(\Vert f\Vert _1 = 1\), and
Thus
Now if \(\chi \ne 1\) is one-dimensional then \(\chi \) factors through \(G^\text {ab}\), so
Hence
Obtaining an element of minimal degree
Let \(G = {\text {GCl}}_n(q)\). Let s be the minimal degree of a nontrivial element of \({\text {SCl}}_n(q)\); thus \(s = 2\) in the orthogonal case and \(s = 1\) otherwise. Let
In this section we exhibit a large normal subset \({\mathfrak {C}}_d \subseteq G\), depending on an integer parameter d, such that the \(\kappa (q^d - 1)\)th power of every element of \({\mathfrak {C}}_d\) lies in \({\mathfrak {M}}\), where \(\kappa \in \{1, 2\}\) is specified in Proposition 7.1. We will use \({\mathfrak {C}}_d\) in combination with Corollaries 5.3 and 6.2 to obtain an element of minimal degree as a short word in random generators.
Proposition 7.1
There is a constant \(C > 0\) so that the following holds. Let \(d \in [2, n]\) be an integer parameter. Assume \(q^d > Cn\). Then there is a normal subset \({\mathfrak {C}}_d \subseteq G\) with the following properties.

(1)
For every \(\alpha \in G^\text {ab}\), if \({\mathfrak {C}}_{d;\alpha }\) is the fibre of \({\mathfrak {C}}_d\) over \(\alpha \), then
$$\begin{aligned} \frac{|{\mathfrak {C}}_{d; \alpha }|}{|G|} \ge \exp \left( -O(d^2 \log q) - O(d^{-1} n \log n) \right) . \end{aligned}$$
(2)
For every \(g \in {\mathfrak {C}}_d\), we have
$$\begin{aligned} g^{\kappa (q^d - 1)} \in {\mathfrak {M}}, \end{aligned}$$where \(\kappa = 2\) if G is orthogonal in even characteristic, and \(\kappa = 1\) otherwise.
The proof is split into cases depending on the type of G.
The linear case
Let \(G = {\text {GL}}_n(q)\). In this case \({\mathfrak {M}}\) is the set of transvections. Let V be the natural module for G. Write
i.e., let \(k = \left\lfloor {\frac{n-3}{d}}\right\rfloor \) and \(r = n - 3 - k d\). Decompose V as
where \(\dim L = 2\), \(\dim V_i = d\), \(\dim R = 1\), and \(\dim W = r\). Fix a basis for each of the subspaces.
We now define a particular element \(g \in {\text {GL}}(V)\) respecting the above decomposition. We define g by its action on the chosen basis for each of the subspaces above.
 Subspace L::

Let g act on L as a transvection, say \(\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\). Since \(g_L\) has order p and p divides \(q^d\), we have \((g_L)^{q^d-1} = (g_L)^{-1}\), which is also a transvection.
 Subspace \(V_i\) ::

Let \(p_i\) be a monic irreducible polynomial of degree d over \({\mathbf {F}}_q\). Identify \(V_i\) with \({\mathbf {F}}_q[t]/(p_i(t))\); the variable t acts on the latter space by multiplication. Let g act on \(V_i\) as multiplication by t. Note that the minimal polynomial of this transformation is \(p_i\), and \((g_{V_i})^{q^d-1} = 1\).
 Subspace R::

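Let \(\alpha \in G^\text {ab}\). Let g act on R as the scalar \(\det (\alpha )/ \prod _{i = 1}^k (-1)^d p_i(0)\) (recall that multiplication by t on \({\mathbf {F}}_q[t]/(p_i(t))\) has determinant \((-1)^d p_i(0)\)).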
 Subspace W::

Let g act trivially on W.
The union is disjoint, because the minimal polynomial of each element of \(g_{p_1, \ldots , p_k; \alpha }^G\) is divisible by \(p_1(t) \cdots p_k(t)\) (the other factors are \((t-1)^2\) and \((t-\lambda )\) for \(\lambda = \det (\alpha ) / \prod _{i=1}^k (-1)^d p_i(0)\) if \(\lambda \ne 1\)). Finally let
Remark 7.2
This is a variation of the construction in [14, Sect. 3.2].
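The two power identities used for \(g_L\) and \(g_{V_i}\), and the determinant used to define \(g_R\), can be checked numerically. The Python sketch below assumes for simplicity that q = p is prime, so arithmetic is modulo p; the helper names are hypothetical. It builds the matrix of multiplication by t on \({\mathbf {F}}_p[t]/(p_i(t))\) (a companion matrix), confirms that its \((q^d-1)\)th power is the identity and that its determinant is \((-1)^d p_i(0)\), and checks that the \((q^d-1)\)th power of the transvection on L is its inverse.

```python
def mat_mul(a, b, mod):
    """Product of two square matrices over F_mod."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) % mod for j in range(n)]
            for i in range(n)]


def mat_pow(m, e, mod):
    """m**e over F_mod by repeated squaring."""
    n = len(m)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    while e:
        if e & 1:
            result = mat_mul(result, m, mod)
        m = mat_mul(m, m, mod)
        e >>= 1
    return result


def det_mod(m, mod):
    """Determinant over F_mod (mod prime) by Gaussian elimination."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] % mod), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det = det * a[col][col] % mod
        inv = pow(a[col][col], -1, mod)
        for r in range(col + 1, n):
            factor = a[r][col] * inv % mod
            a[r] = [(x - factor * y) % mod for x, y in zip(a[r], a[col])]
    return det % mod


def multiplication_by_t(coeffs, mod):
    """Matrix of t on F_mod[t]/(f), f = t^d + coeffs[d-1] t^(d-1) + ... + coeffs[0],
    in the basis 1, t, ..., t^(d-1); column j is the image of t^j."""
    d = len(coeffs)
    m = [[0] * d for _ in range(d)]
    for j in range(d - 1):
        m[j + 1][j] = 1                   # t * t^j = t^(j+1)
    for i in range(d):
        m[i][d - 1] = (-coeffs[i]) % mod  # t * t^(d-1) = -(coeffs[0] + coeffs[1] t + ...)
    return m


if __name__ == "__main__":
    q, d = 2, 3
    g_v = multiplication_by_t([1, 1, 0], q)  # p_i(t) = t^3 + t + 1, irreducible over F_2
    print(mat_pow(g_v, q ** d - 1, q))      # the identity matrix
    print(det_mod(g_v, q))                  # (-1)^3 * p_i(0) mod 2
```

The same functions work for any small prime and any monic irreducible \(p_i\); only the prime-field assumption would need to be dropped to treat general q.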
Proof of Proposition 7.1 for \({\text {GL}}\). By construction, \({\mathfrak {C}}_{d;\alpha }\) is the fibre of \({\mathfrak {C}}_d\) over \(\alpha \), and for every \(p_1, \ldots , p_k,\alpha \) we have \(g_{p_1, \ldots , p_k; \alpha }^{q^d-1} \in {\mathfrak {M}}\). It remains only to estimate the density of \({\mathfrak {C}}_{d;\alpha }\).
For \(g = g_{p_1, \ldots , p_k; \alpha }\), we have (as in the proof of Lemma 2.3)
Therefore
Recall that
In particular, by the hypothesis \(q^d > Cn\) we have \(|{\mathfrak {I}}_d| > k\), and in fact
Hence, from (6), since \(r < d\),
This proves the proposition. \(\square \)
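The density estimate needs more than k distinct choices of \(p_i\). Assuming \({\mathfrak {I}}_d\) denotes the set of monic irreducible polynomials of degree d over \({\mathbf {F}}_q\) (as the construction suggests), its size is given by the classical Möbius-inversion formula \(|{\mathfrak {I}}_d| = \frac{1}{d}\sum _{e \mid d} \mu (e) q^{d/e}\), which is at most \(q^d/d\). The Python sketch below evaluates this formula and checks it against brute-force trial division over \({\mathbf {F}}_2\), with bit i of an integer holding the coefficient of \(t^i\). All names are hypothetical.

```python
def mobius(n: int) -> int:
    """Möbius function by trial division."""
    result, m, f = 1, n, 2
    while f * f <= m:
        if m % f == 0:
            m //= f
            if m % f == 0:
                return 0          # square factor
            result = -result
        f += 1
    if m > 1:
        result = -result          # one remaining prime factor
    return result


def count_irreducible(d: int, q: int) -> int:
    """Number of monic irreducible polynomials of degree d over F_q."""
    return sum(mobius(e) * q ** (d // e) for e in range(1, d + 1) if d % e == 0) // d


def gf2_mod(a: int, b: int) -> int:
    """Remainder of a modulo b in F_2[t]."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a


def is_irreducible_gf2(f: int) -> bool:
    """Trial division by every polynomial of degree 1 .. deg(f) // 2."""
    d = f.bit_length() - 1
    return d >= 1 and all(gf2_mod(f, g) != 0 for g in range(2, 1 << (d // 2 + 1)))


if __name__ == "__main__":
    for d in range(1, 9):
        brute = sum(is_irreducible_gf2(f) for f in range(1 << d, 1 << (d + 1)))
        print(d, count_irreducible(d, 2), brute)
```

The main term \(q^d/d\) is what makes \(|{\mathfrak {I}}_d|\) exceed \(k \le n/d\) once \(q^d\) is a large multiple of n.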
Other classical groups
Let \(G = {\text {GCl}}_n(q)\), where \({\text {GCl}}\ne {\text {GL}}\). Let V be the natural module for G equipped with a nondegenerate binary form f and possibly a quadratic form Q. By Witt’s decomposition theorem, there is an orthogonal decomposition of V of the form
where H is an orthogonal direct sum of hyperbolic planes and \(V_{\text {an}}\) is anisotropic, and \(\dim V_\text {an}\le 2\) by the Chevalley–Warning theorem. Let \(\delta = \dim V_\text {an}+ 4 + 2 \kappa \), where \(\kappa = 2\) if G is orthogonal in even characteristic, and \(\kappa = 1\) otherwise. Let \(D = 2d\) and write
i.e., let \(k = \left\lfloor {(n-\delta )/D}\right\rfloor \) and \(r = n - \delta - k D\). Write the hyperbolic space H as
where each constituent is an orthogonal direct sum of hyperbolic planes with \(\dim L = 2 \kappa + 2\), \(\dim V_i = D\), \(\dim R = 2\), and \(\dim W' = r\). Let \(W = W' \perp V_{\text {an}}\). Thus we have the following orthogonal decomposition of V:
Fix a hyperbolic basis for each of the hyperbolic spaces, and fix a basis for W.
We now define a particular element \(g \in {\text {GCl}}(V)\) respecting the decomposition (7). As before we will define g by its action on the chosen bases.
 Subspace L::

Let \(v_1, \ldots , v_{\kappa + 1}, w_1, \ldots , w_{\kappa + 1}\) be the chosen hyperbolic basis for L, i.e., such that \(L_1 = \langle v_1, \ldots , v_{\kappa + 1} \rangle \) and \(L_2 = \langle w_1, \ldots , w_{\kappa + 1} \rangle \) are totally singular subspaces, and f is represented with respect to \(v_1, \ldots , v_{\kappa + 1}, w_1, \ldots , w_{\kappa + 1}\) by
$$\begin{aligned} \begin{pmatrix} 0 &{}\quad I \\ \pm I &{}\quad 0 \end{pmatrix}. \end{aligned}$$
 Symplectic case::

Let g act on L as the transvection
$$\begin{aligned} \begin{pmatrix} 1 &{}\quad 0 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 \end{pmatrix}. \end{aligned}$$
 Unitary case::

Pick a nonzero \(\lambda \in {\mathbf {F}}_q\) such that \(\lambda + \lambda ^\theta = 0\) (where \(\theta \) is the field automorphism) and let g act on L as the transvection
$$\begin{aligned} \begin{pmatrix} 1 &{}\quad 0 &{}\quad \lambda &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 \end{pmatrix}. \end{aligned}$$
 Orthogonal case::

Let \(g_L\) be represented by the matrix
$$\begin{aligned} \begin{pmatrix} A &{}\quad 0 \\ 0 &{}\quad A^{-T} \end{pmatrix}, \end{aligned}$$where in odd characteristic
$$\begin{aligned} A = \begin{pmatrix} 1 &{}\quad 1 \\ 0 &{}\quad 1 \end{pmatrix} \end{aligned}$$and in even characteristic
$$\begin{aligned} A = \begin{pmatrix} 1 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 1 \\ 0 &{}\quad 0 &{}\quad 1 \end{pmatrix}. \end{aligned}$$
In all cases we have \(g_L \in {\text {SCl}}(L)\) and \((g_L)^{\kappa (q^d-1)} \ne 1\).
 Subspace \(V_i\)::

Fix a monic irreducible polynomial \(p_i \in {\mathbf {F}}_q[t]\) of degree d. Let \(v_1, \ldots , v_d, w_1, \ldots , w_d\) be the chosen hyperbolic basis for \(V_i\). Thus there is a decomposition
$$\begin{aligned} V_i = V_{i,1} \oplus V_{i,2} \end{aligned}$$into totally singular subspaces \(V_{i,1} = \langle v_1, \ldots , v_d \rangle \) and \(V_{i,2} = \langle w_1, \ldots , w_d \rangle \) with \(f(v_a, w_b) = \delta _{ab}\). Identify \(V_{i,1}\) with \({\mathbf {F}}_q[t]/(p_i(t))\). The variable t acts on the latter space by multiplication. By Witt’s lemma, this action extends to the space \(V_i\). This extension is moreover unique provided we demand that it preserves the decomposition of \(V_i\) (see [28, Hilfssatz 3.1]). Let \(g_{V_i}\) be defined by this unique extension. The minimal polynomial of this transformation can be determined as follows (see [41]). In the symplectic and orthogonal cases, let \(p^*(t) = p(0)^{1} t^d p(t^{1})\). In the unitary case, let \(p^*(t) = p^\theta (0)^{1} t^d p^\theta (t^{1})\), where \(\theta \) acts on the coefficients. The minimal polynomial of g acting on \(V_i\) is \(*\)symmetric, divisible by \(p_i\) (since \(p_i\) is irreducible), and hence also divisible by \(p_i^*\). Under the assumption that \(p_i \ne p_i^*\), the minimal polynomial of \(g_{V_i}\) must therefore be equal to \(p_i p_i^*\). If \(p_i = p_i^*\) then the minimal polynomial is \(p_i\).
 Subspace R::

Let \(\alpha \in G^\text {ab}\).
 Symplectic case::

Let g act trivially on R. (Note \(G^\text {ab}\) is trivial.)
 Unitary case::

Let g act as the matrix
$$\begin{aligned} \begin{pmatrix} a &{} 0 \\ 0 &{} a^{-\theta } \end{pmatrix}, \end{aligned}$$where \(a \in {\mathbf {F}}_q\) satisfies \(a^{1-\theta } \prod _{i=1}^k p_i(0)^{1-\theta } = \det \alpha \). Such an element always exists since \(\det \alpha \) has norm 1.
 Orthogonal case::

The natural map \({\text {GO}}(R)^\text {ab}\rightarrow G^\text {ab}\) is bijective. Let g act on R so that for every linear character \(\lambda \) of G we have
$$\begin{aligned} \lambda (g_R) \prod _{i=1}^k \lambda (g_{V_i}) = \lambda (\alpha ). \end{aligned}$$
In all cases note that \((g_R)^{\kappa (q^d-1)}\) is trivial.
 Subspace W::

Let g act trivially on W.
The union is disjoint, because the minimal polynomial of every element of \(g_{p_1, \ldots , p_k ; \alpha }^G\) is divisible by \(p_1(t) p_1^*(t) \cdots p_k(t) p_k^*(t)\) and has no other nonlinear factors. Finally let
Proof of Proposition 7.1 for other classical groups
By construction, \(g_{p_{1},\ldots ,p_{k} ; \alpha }\) lies over \(\alpha \) and \(g_{p_1, \ldots , p_k ; \alpha }^{\kappa (q^d1)} \in {\mathfrak {M}}\). We must estimate the density of \({\mathfrak {C}}_{d;\alpha }\).
Consider \(g = g_{p_1, \ldots , p_k ; \alpha }\) for some \(p_1, \ldots , p_k \in {\mathfrak {I}}_d'\) with \(p_i \ne p_{i'}, p_{i'}^*\) for \(i \ne i'\). Let \(h \in C_G(g)\). Then h preserves each \(V_{i,1}\) and \(V_{i,2}\), those being the \(p_i\)- and \(p_i^*\)-primary subspaces of g. The restrictions of h to \(V_{i,1}\) and \(V_{i,2}\) determine one another, and there are at most \(q^d\) possibilities for \(h_{V_{i,1}}\) (as in Lemma 2.3). Hence, since \(\delta = O(1)\),
Therefore
The number of monic irreducible polynomials of degree d over \({\mathbf {F}}_q\) is \(q^d/d - O(q^{d/2}/d)\), while the number of \(*\)-symmetric polynomials of degree d is at most \(q^{d/2}\), so
By the hypothesis \(q^d > Cn\) this is at least k, and in fact
so
This proves the proposition. \(\square \)
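To illustrate the role of \(p^*\) in the symplectic and orthogonal cases, the Python sketch below, again assuming q = p is a small prime, computes \(p^*(t) = p(0)^{-1} t^d p(t^{-1})\) by reversing and rescaling coefficients, and splits the monic irreducible polynomials of degree \(d \ge 2\) into \(*\)-symmetric ones (\(p = p^*\)) and the rest, which come in pairs \(\{p, p^*\}\). Only the latter give a block \(V_i\) on which g has minimal polynomial \(p_i p_i^*\). Helper names are hypothetical.

```python
from itertools import product


def poly_mod(a, b, mod):
    """Remainder of a modulo a monic b over F_mod; coefficient lists, lowest degree first."""
    a = [x % mod for x in a]
    db = len(b) - 1
    for i in range(len(a) - 1, db - 1, -1):
        c = a[i]
        if c:
            for j in range(db + 1):
                a[i - db + j] = (a[i - db + j] - c * b[j]) % mod
    return a[:db]


def monic_polys(e, mod):
    """All monic polynomials of degree e over F_mod."""
    return [list(c) + [1] for c in product(range(mod), repeat=e)]


def is_irreducible(f, mod):
    """Trial division by all monic polynomials of degree 1 .. deg(f) // 2."""
    d = len(f) - 1
    return all(any(poly_mod(f, g, mod))
               for e in range(1, d // 2 + 1) for g in monic_polys(e, mod))


def star(f, mod):
    """p*(t) = p(0)^{-1} t^d p(1/t): reverse the coefficients, then rescale to be monic."""
    inv = pow(f[0], -1, mod)  # requires p(0) != 0
    return [(x * inv) % mod for x in reversed(f)]


def split_counts(d, mod):
    """(number of monic irreducibles of degree d >= 2, number of those with p == p*)."""
    irreducible = [f for f in monic_polys(d, mod) if is_irreducible(f, mod)]
    symmetric = [f for f in irreducible if star(f, mod) == f]
    return len(irreducible), len(symmetric)


if __name__ == "__main__":
    for d in (2, 3, 4, 5):
        print(d, split_counts(d, 3))
```

For example, over \({\mathbf {F}}_3\) the polynomial \(t^2+1\) is \(*\)-symmetric, while \(t^2+t+2\) and \(t^2+2t+2\) are swapped by \(*\); in odd degree \(d > 1\) no irreducible polynomial is \(*\)-symmetric, so the \(*\)-symmetric ones are a small minority, as used in the count above.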
Collecting results
We now collect the results from the previous sections to conclude that, with high probability, if three elements of G are chosen uniformly at random then some short word in these elements belongs to \({\mathfrak {M}}\).
Theorem 7.3
There are constants \(c, C > 0\) so that the following holds. Let \(G = {\text {Cl}}_n(q)\), where \(\log q < c n / \log ^{2} n\). Let x, y, z be elements of G chosen uniformly at random. Let M be the event that there exists a word \(w \in F_3\) of length at most \(n^{C \log q}\) such that \(w(x,y,z) \in {\mathfrak {M}}\). Then
Proof
By Corollaries 5.3 and 6.2 there are constants \(c_1, c_2 > 0\) and \(C_1, C_2\) such that the following holds. Let \(\ell = \left\lfloor {c_1 n / 2}\right\rfloor \), and let E be the event that every word \(u \in F_2\) of length at most \(\ell \) satisfies \(x u(y, z) \notin {\mathfrak {C}}_d\). Then
provided \(q^d > C_2 n\). Take \(d \sim C_3 \log n\) for a constant \(C_3\). If \(\log q < cn / \log ^2 n\) for a sufficiently small constant c, so that \(c, C_3\) satisfy \(C_1 C_3^2 c + C_1/C_3 - c_2 < -c\), then \({\mathbf {P}}(E) \le e^{-cn}\).
On the other hand suppose E fails, i.e., suppose there is a word u of length at most \(c_1 n\) such that \(x u(y, z) \in {\mathfrak {C}}_d\). Let \(w \in F_3\) be the word
The length of w is at most
and
Hence \(E^c \subseteq M\). This completes the proof. \(\square \)
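The existence of constants c and \(C_3\) with \(C_1 C_3^2 c + C_1/C_3 - c_2 < -c\), used in this proof, can be made explicit. One admissible choice, worked out under the stated assumption \(C_1, c_2 > 0\) and not necessarily the choice intended by the authors, is:

```latex
\begin{aligned}
C_3 &:= \frac{2C_1}{c_2}
  &&\Longrightarrow\quad \frac{C_1}{C_3} - c_2 = -\frac{c_2}{2},\\
0 < c &< \frac{c_2}{2\,(C_1 C_3^2 + 1)}
  &&\Longrightarrow\quad C_1 C_3^2\, c < \frac{c_2}{2} - c,\\
&&&\Longrightarrow\quad C_1 C_3^2\, c + \frac{C_1}{C_3} - c_2
  < \Bigl(\frac{c_2}{2} - c\Bigr) - \frac{c_2}{2} = -c.
\end{aligned}
```

If \(C_3\) must also be large enough to guarantee \(q^d > C_2 n\), any larger \(C_3\) works as well, since \(C_1/C_3 - c_2 \le -c_2/2\) still holds; c then has to be shrunk according to the second line.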
This completes the proof of Theorem 1.1.
If we are allowed \(q^C\) random generators, we can reach the set \({\mathfrak {M}}\) using shorter words.
Theorem 7.4
There are constants \(c, C > 0\) so that the following holds. Let \(G = {\text {Cl}}_n(q)\), where \(n > C\). Let \(x_0, x_1, \ldots , x_k\) be elements of G chosen uniformly at random, where \(k > q^C\). Let M be the event that there exists a word \(w \in F_{k+1}\) of length at most \(q^2 n^C\) such that \(w(x_0, \ldots , x_k) \in {\mathfrak {M}}\). Then
Proof
Follow the proof of the previous theorem, replacing \(u \in F_2\) with \(u \in F_k\). Since \(\log k > C \log q\), we can replace (8) with the bound
provided \(q^d > C_2 n\). Take \(d = \max (\left\lceil {C_3 \log n / \log q}\right\rceil , 2)\) for sufficiently large \(C_3\). As long as \(n > C\) we find \({\mathbf {P}}(E) \le q^{-cn}\). Note that \(q^d \le q^2 n^C\) in this case. The rest of the argument is the same. \(\square \)
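The choice of d at the end of this proof is plain arithmetic. The Python sketch below computes \(d = \max (\lceil C_3 \log n / \log q\rceil , 2)\) and checks the two inequalities that matter: \(q^d \ge n^{C_3}\), which exceeds \(C_2 n\) once \(C_3 > 1\) and n is large, and \(q^d \le q^2 n^{C_3}\), which holds because d exceeds \(C_3 \log n / \log q\) by less than 1 unless \(d = 2\). Here `c3` is a hypothetical stand-in for the unspecified constant \(C_3\), and `choose_d` is an illustrative name.

```python
import math


def choose_d(n: int, q: int, c3: int) -> int:
    """d = max(ceil(c3 * log n / log q), 2); c3 is an illustrative stand-in for C_3."""
    return max(math.ceil(c3 * math.log(n) / math.log(q)), 2)


if __name__ == "__main__":
    for n, q in [(10 ** 6, 2), (10 ** 6, 49), (100, 9)]:
        d = choose_d(n, q, 2)
        # both comparisons use exact integer arithmetic
        print(n, q, d, q ** d >= n ** 2, q ** d <= q ** 2 * n ** 2)
```

The ceiling is computed in floating point; for values of \(C_3 \log n / \log q\) lying extremely close to an integer one would compute d with integer arithmetic instead.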
Closed trajectories with only one coincidence
A trajectory is closed if \(v^\ell = v^0\). In Sect. 9 we will need to understand the structure of closed trajectories with only one coincidence. More generally, the joint trajectory of an r-tuple \((v_1, \ldots , v_r)\) is called closed if each individual trajectory is closed, and we will need to understand the structure of closed joint trajectories with only one coincidence in each individual trajectory. We begin with the single-trajectory case, for motivation.
Lemma 8.1
Assume w is nontrivial and cyclically reduced. Suppose the trajectory \(v^0, \ldots , v^\ell \) is closed, and suppose there is only one coincidence, at step t say. Then
In particular if w is not a proper power then \(t = \ell \).
Proof
Let
be the left-infinite \(\ell \)-periodic extension of w. Since \(v^\ell = v^0\), the trajectory of v under \({{\widetilde{w}}}\) (defined in the obvious way) is just the \(\ell \)-periodic extension of \(v^0, \ldots , v^\ell \), and still there is only one coincidence, at step t. The choices at steps \(1, \ldots , t\) are free and all subsequent choices are forced (as in the proof of Lemma 3.5). We claim that \({{\widetilde{w}}}\) is in fact \(\gcd (t, \ell )\)-periodic, and since \({{\widetilde{w}}}\) is already \(\ell \)-periodic it suffices to prove that it is t-periodic.
Since the choices at steps \(1, \ldots , t - 1\) are free and not coincidences, the choice at step t is a coincidence, and all subsequent choices are forced, the vectors \(v^0, \ldots , v^{t-1}\) are linearly independent and the whole trajectory is contained in their span. In particular
Given that step \(t+1\) is forced, we must have \(v^i \in D_{w_{t+1}}^{t+1}\) for each i such that \(a_i \ne 0\). Thus either \(w_{t+1} = w_{i+1}\) or \(w_{t+1} = w_i^{-1}\) (\(i > 0\)). Similarly,
and \(v^\ell = v^0\) is forced. Since \(w_\ell \ne w_1^{-1}\), we must have \(w_\ell = w_t\) and \(a_0 \ne 0\) (see Remark 8.2 for more details). Therefore
Consider now the trajectory of \(v^1\) under
The trajectory is just \(v^1, v^2, \ldots , v^\ell , v^0, v^1, \ldots \). By (9) and \(a_0 \ne 0\), \(v^1, \ldots , v^t\) are linearly independent, and, for every letter \(\xi \),
Therefore the trajectory of \(v^1\) also has just one coincidence, again at step t (when \(v^{t+1}\) is chosen). Therefore by the same argument we must have \(w'_{t+1} = w'_1\), or
Repeating this argument as many times as necessary proves that \({{\widetilde{w}}}\) is t-periodic, as claimed. \(\square \)
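Lemma 8.1 is phrased in terms of cyclically reduced words, proper powers and periodic extensions. The Python sketch below is a hypothetical helper set, with letters encoded as nonzero integers and \(-a\) standing for the inverse of a. It computes free and cyclic reductions, decides whether a word is a proper power, and illustrates the reduction step in the proof: for an \(\ell \)-periodic sequence, t-periodicity is equivalent to \(\gcd (t, \ell )\)-periodicity, because rotation by t generates the same subgroup of \({\mathbf {Z}}/\ell \) as rotation by \(\gcd (t, \ell )\).

```python
from math import gcd


def free_reduce(word):
    """Freely reduce a word; letters are nonzero ints, -a is the inverse of a."""
    out = []
    for a in word:
        if out and out[-1] == -a:
            out.pop()
        else:
            out.append(a)
    return out


def cyclic_reduce(word):
    """Free reduction followed by stripping mutually inverse first/last letters."""
    w = free_reduce(word)
    i, j = 0, len(w) - 1
    while i < j and w[i] == -w[j]:
        i += 1
        j -= 1
    return w[i:j + 1]


def period(word):
    """Least p dividing len(word) such that word is a power of its length-p prefix."""
    n = len(word)
    for p in range(1, n + 1):
        if n % p == 0 and word == word[:p] * (n // p):
            return p
    return n


def is_proper_power(word):
    """A nontrivial element of F_k is a proper power iff its cyclic reduction is."""
    w = cyclic_reduce(word)
    return len(w) > 0 and period(w) < len(w)


def has_cyclic_period(word, t):
    """True if the len(word)-periodic extension of word is also t-periodic."""
    n = len(word)
    return all(word[i] == word[(i + t) % n] for i in range(n))


if __name__ == "__main__":
    w = [1, 2, 1, 2, 1, 2]  # (ab)^3
    print(is_proper_power(w), period(w), gcd(4, len(w)), has_cyclic_period(w, 4))
```

For instance \(w = (ab)^3\) has period 2 and is a proper power, while \(a b a b^{-1}\) is cyclically reduced and not a proper power, so Lemma 8.1 forces \(t = \ell \) for it.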
Remark 8.2
If \(t = \ell \), we must have \(a_0 = 1\) and all other \(a_i = 0\). The general case \(t < \ell \) is more complicated, but we can still describe the possibilities. From (9), because step \(t+1\) is forced we must have
the signs depending on whether \(w_{t+1} = w_{i+1}\) or \(w_{t+1} = w_i^{-1}\) (\(a_i \ne 0\)). At the next step,
and so on. We make a few observations:

1.
The vectors \(v^{i \pm 1}\), etc., obey a no-crossing rule: we cannot have
$$\begin{aligned}&v^i \xrightarrow {w_{s+1}} v^{i+1}, \\&v^{i+1} \xrightarrow {w_{s+1}} v^i, \end{aligned}$$as then we would have both \(w_{s+1} = w_{i+1}\) and \(w_{s+1} = w_{i+1}^{-1}\), for some i.

2.
Similarly, there is a no-meeting rule: we cannot have
$$\begin{aligned} v^i&\xrightarrow {w_{s+1}} v^{i+1}, \\ v^{i+2}&\xrightarrow {w_{s+1}} v^{i+1}, \end{aligned}$$as then we would have both \(w_{s+1} = w_{i+1}\) and \(w_{s+1} = w_{i+2}^{-1}\), but the expression for w is supposed to be reduced.

3.
Finally, there is a time-consistency rule: we cannot have
$$\begin{aligned} v^i \xrightarrow {w_{s+1}} v^{i+1} \xrightarrow {w_{s+2}} v^i, \end{aligned}$$as then we would have \(w_{s+1} = w_{i+1}\) and \(w_{s+2} = w_{i+1}^{-1}\), but again the expression for w is supposed to be reduced; nor could we have
$$\begin{aligned} v^i \xrightarrow {w_{s+1}} v^{i-1} \xrightarrow {w_{s+2}} v^i, \end{aligned}$$as then we would have \(w_{s+1} = w_i^{-1}\) and \(w_{s+2} = w_i\).
Since \(a_0 \ne 0\) and \(w_{t+1} v^0 = v^1\), the only resolution is that
for all \(s \ge 0\) (extending \(\ell \)-periodically). In other words, the sequence \((v^s)\) in \({\text {span}}\{v^0, \ldots , v^{t-1}\}\) corresponds with the sequence \((X^s)\) in \({\mathbf {F}}_q[X] / (f)\), where
Since \(v^\ell = v^0\) we must have
Conversely, if f is a divisor of \(X^\ell - 1\), and if the period of w divides t and \(i - i'\) whenever \(a_i \ne 0\) and \(a_{i'} \ne 0\), then a one-coincidence trajectory of this type exists.
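The closing condition in Remark 8.2, and its analogue \(\det F \mid Z^\ell - 1\) in Remark 8.4, is a divisibility test in a polynomial ring over \({\mathbf {F}}_q\). The Python sketch below assumes q = p is prime for simplicity; it does not reproduce the formula for f in terms of the coefficients \(a_i\), but takes f as input. It tests whether f divides \(X^\ell - 1\) and computes the least such \(\ell \), the order of f; when \(f(0) \ne 0\), f divides \(X^\ell - 1\) exactly when its order divides \(\ell \). The function names are hypothetical.

```python
def poly_rem(g, f, mod):
    """Remainder of g modulo f over F_mod (mod prime); lists, lowest degree first."""
    g = [c % mod for c in g]
    f = [c % mod for c in f]
    while f and f[-1] == 0:
        f.pop()                          # drop zero leading coefficients
    inv = pow(f[-1], -1, mod)
    df = len(f) - 1
    for i in range(len(g) - 1, df - 1, -1):
        c = g[i] * inv % mod
        if c:
            for j in range(df + 1):
                g[i - df + j] = (g[i - df + j] - c * f[j]) % mod
    return g[:df]


def x_pow_minus_one(ell, mod):
    """Coefficient list of X^ell - 1 over F_mod (ell >= 1)."""
    return [mod - 1] + [0] * (ell - 1) + [1]


def divides(f, g, mod):
    """True if f divides g in F_mod[X]."""
    return not any(poly_rem(g, f, mod))


def order(f, mod, bound=10_000):
    """Least ell >= 1 with f | X^ell - 1, or None if none up to bound; needs f(0) != 0."""
    for ell in range(1, bound + 1):
        if divides(f, x_pow_minus_one(ell, mod), mod):
            return ell
    return None


if __name__ == "__main__":
    print(order([1, 1, 1], 5))     # X^2 + X + 1 over F_5
    print(order([1, 1, 0, 1], 2))  # X^3 + X + 1 over F_2
```

Thus, in the setting of Remark 8.2, a trajectory of this shape can close after \(\ell \) steps only if \(\ell \) is a multiple of the order of f.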
We now consider closed joint trajectories with only one coincidence in each individual trajectory. The following lemma generalizes Lemma 8.1.
Lemma 8.3
Assume w is nontrivial and cyclically reduced. Let \(v_1, \ldots , v_r \in V\) be linearly independent. Suppose the joint trajectory of \(v_1, \ldots , v_r\) is closed. Suppose there is just one coincidence in each individual trajectory, and suppose the coincidence in the trajectory of \(v_i\) occurs at step \((t_i, i)\). Then
In particular if w is not a proper power then \(t_i = \ell \) for each i.
Proof
As in the proof of Lemma 8.1, let \({{\widetilde{w}}}\) be the left-infinite \(\ell \)-periodic extension of w, and note that the trajectory of \(v_1, \ldots , v_r\) under \({{\widetilde{w}}}\) is just the \(\ell \)-periodic extension of the trajectory under w, and there are no further free choices.
The choice at step (t, i) must be free for \(t \le t_i\) and forced for \(t > t_i\). Therefore the vectors \((v_i^t)_{1 \le i \le r, 0 \le t < t_i}\) are linearly independent and the whole trajectory is contained in their span. Since there is a coincidence at step \((t_i, i)\), we have
where \(a_{itj} = 0\) whenever \(t \ge t_j\) (and \((t, j) \prec (t_i, i)\) means \(t < t_i\) or \(t = t_i\) and \(j < i\), as in Sect. 3.3). Let \(A_0\) be the \(r \times r\) matrix
The matrix \(A_0\) must be nonsingular, for otherwise we could not have \((v_1^\ell , \ldots , v_r^\ell ) = (v_1^0, \ldots , v_r^0)\). In particular, for each i there is some j such that \(a_{i0j} \ne 0\). Since step \((t_i+1, i)\) is forced, the value of \(w_{t_i+1} v_j^0\) must be known; hence
Consider the joint trajectory of \((v_1^1, \ldots , v_r^1)\) under
which is just \((v_i^t)_{1 \le i \le r, t \ge 1}\). Since \(A_0\) is nonsingular, we have
Therefore the vectors \((v_i^t)_{1 \le i \le r, 1 \le t \le t_i}\) are linearly independent, and the joint trajectory of \((v_1^1, \ldots , v_r^1)\) under \({{\widetilde{w}}}'\) has the same behaviour as that of \((v_1, \ldots , v_r)\) under \({{\widetilde{w}}}\): the trajectory of \(v_i^1\) has just one coincidence, at step \((t_i, i)\) (when \(v_i^{t_i+1}\) is chosen). Therefore by the same argument \(w'_{t_i+1} = w'_1\), or
Repeating the argument as many times as necessary, we conclude that the period of \({{\widetilde{w}}}\) divides \(t_i\) for each i. \(\square \)
Remark 8.4
The discussion in Remark 8.2 generalizes too. From (11) and forcedness, we have
where the signs are chosen depending on whether \(w_{t+1} = w_1\) or \(w_t = w_1^{-1}\). The latter case can arise only for \(t > 0\), so no \(v_j^0\) can appear in this expression. Hence (12) is the analogue of (10) for the joint trajectory of \((v_1^1, \ldots , v_r^1)\). As before there are no-crossing, no-meeting, and time-consistency rules for the indices t such that \(a_{itj}\ne 0\) for some i, j, so in fact we can never have \(v_j^{t-1}\).
We conclude that
for all \(s \ge 0\), and hence the trajectory of \((v_1^s, \ldots , v_r^s)\) corresponds with the trajectory of \((Z^s X_1, \ldots , Z^s X_r)\) in the \({\mathbf {F}}_q[Z]\)-module \(({\mathbf {F}}_q[Z] X_1 \oplus \cdots \oplus {\mathbf {F}}_q[Z] X_r) / \langle f_1, \ldots , f_r \rangle \), where
and we must have
Write \(f_i = \sum _j p_{ij} X_j\) for some \(p_{ij} \in {\mathbf {F}}_q[Z]\) and let \(F = (p_{ij} : 1 \le i, j \le r)\). Then there must exist a matrix \(E \in {\text {M}}_r({\mathbf {F}}_q[Z])\) with
This is possible if and only if \(\det F\) divides \(Z^\ell - 1\).
Expansion in low-degree representations
We turn now to the proof of Theorem 1.3. We again consider the action of \(G = {\text {Cl}}_n(q)\) on linearly independent r-tuples of vectors, and we again consider trajectories under the action of a fixed word \(w \in F_k\), much as in Sect. 4. The difference is mainly one of parameter regime. In Sect. 4 we considered r-tuples with r as large as cn for constant c, and we were satisfied with somewhat crude bounds. In this section we consider \(r = O(1)\), and we seek sharper bounds. Our aim is to show that, in an orbit of G of size N, the probability that a trajectory under a given word closes is close to 1/N, with a small relative error; if we can do this it follows that there is a uniform spectral gap. We begin with the case of \(r=1\), which contains most of the key ideas.
The defining representation
Now let \(x_1, \ldots , x_k \in G\) be chosen uniformly at random. Let \({{\overline{w}}} = w(x_1, \ldots , x_k)\). Let \(v \in V \setminus \{0\}\). Let \(N = |G v|\). By Witt’s lemma (Lemma 2.2), N is the number of \(u \in V \setminus \{0\}\) such that \(Q(u) = Q(v)\). Thus, by Lemma 2.1, \(N = q^n/q_0 + O(q^{n/2})\). More generally, if \(U \le V\) is a subspace of dimension d then
Lemma 9.1
Assume w is nontrivial and not a proper power. Assume \(\ell < n/4\). Then
Proof
By Lemma 3.1 we may also assume that w is cyclically reduced, as replacing w by its cyclic reduction can only decrease its length. In this case Lemma 8.1 implies that the event that \({{\overline{w}}} v = v\) is contained in the union of the following two events:
 \(E_1\)::

the trajectory \(v^0, \ldots , v^\ell \) has exactly one coincidence, occurring at step \(\ell \), and \(v^\ell = v^0\),
 \(E_2\)::

the trajectory \(v^0, \ldots , v^\ell \) has at least two coincidences.
Similarly, the conditional probability of a coincidence at a later step \(t'\) is bounded by
Summing over \(t < t' \le \ell \), we find, using \(\ell < n/4\),
Hence we may focus on the event \(E_1\). In the linear case (\({\text {Cl}}={\text {SL}}\)), \(v^\ell \) is chosen uniformly at random outside a linear subspace of dimension at most \(\ell - 1\), so the probability of \(E_1\) is bounded by
This completes the proof in this case.
In general, the situation is complicated by form conditions, as previous choices may significantly impact the probability that \(v^\ell = v^0\), even if there were no previous coincidences.
Let \(\xi = w_\ell \). The choice of \(v^\ell \) is subject to one linear constraint for every occurrence of \(\xi = w_\ell \) as \(w_t\) or \(w_{t+1}^{-1}\) for some \(t < \ell \). Each such occurrence is the end of a maximal subword matching a prefix \(u = w_\ell \cdots w_{\ell - s+1}\) of w, forward in the case \(\xi = w_t\) and backward in the case \(\xi = w_{t+1}^{-1}\) (see Fig. 1). Write \(s = s(t)\) and \(u = u(t)\). Define
Note that, for \(t \in T_1\), we must have \(t + s < \ell - s\), because \(w_\ell \cdots w_{t+1}\) is reduced. In the \(\xi = w_t\) case it is possible that the subword overlaps (or is adjacent to) the matching prefix, and the division into \(T_2\) and \(T_3\) reflects this possibility.
The choice of \(v^\ell \) at step \(\ell \) is constrained by the linear conditions
(where \(s = s(t)\)). We need to determine whether \(v^0\) is in this affine subspace. Obviously this is the case if and only if
Write \(C_t\) for this condition. For \(t \in T_1 \cup T_2\), the truth or falsity of \(C_t\) is determined at step \(\ell - s\), because \(\ell - s > t + s\) in the \(t \in T_1\) case and \(\ell - s > t\) in the \(t \in T_2\) case. The condition is not determined before step \(\ell - s\) by maximality of u(t). For \(t \in T_3\), \(C_t\) is settled at step t, because \(t \ge \ell - s\). The condition is not settled before step t because \(w_t = w_\ell \ne w_1^{-1}\) (since w is cyclically reduced).
Note that we may have \(\ell - s = t\) for \(t \in T_3\): this is the case in which the subword is adjacent to the prefix (see Fig. 2). In this case the condition \(C_t\) is
However, we cannot also have \(t - s = 0\), for then we would have \(w = u^2\). Hence, by linear independence of \(v^0, \ldots , v^{t-1}\), the condition \(C_t\) is still settled at step t and not before. Note, however, that if G is unitary then \(C_t\) is linear only over \({\mathbf {F}}_{q_0}\) (because the form f is only sesquilinear).
There is one case in which the various conditions \(C_{t'}\) settled at a given step t are not independent. This is the case in which \(t \in T_3\) and \(t = \ell - s'\) for some \(t' \in T_2\), where \(s' = s(t')\), and \(t' - s' = 0\) (see Fig. 3). Let \(T_4\) be the set of such steps t and let \(T_3' = T_3 \setminus T_4\). If \(t \in T_4\) then we have an overdetermined pair of conditions
This system is consistent if and only if
For \(t \in T_4\) let us redefine \(C_t\) to be this reduced condition. Certainly \(t - s < \ell - s\), and if \(t' = \ell - s\) then \(wu' = u'w\), so w is a proper power, contrary to hypothesis. Hence \(C_t\) is settled at step \(\ell - s \le t\).
Now consider any step \(t \in \{1, \ldots , \ell - 1\}\), and consider all those conditions \(C_{t'}\) which are settled at step t. These conditions are \(C_{t'}\) for \(t' \in T_1 \cup T_2 \cup T_4\) such that \(\ell - s' = t\), as well as \(C_t\) if \(t \in T_3'\), i.e.,
We claim that these affine conditions for \(v^t\) are independent, and it suffices to demonstrate that the indices \(t' + s'\) (\(t' \in T_1, \ell - s' = t\)), \(t' - s'\) (\(t' \in T_2 \cup T_4\), \(\ell - s' = t\)), and 0 if \(t \in T_3'\) are all distinct. Since \(s' = \ell - t\) is a constant, the indices \(t'+s'\) are all distinct for \(t' \in T_1\), as are the indices \(t' - s'\) for \(t' \in T_2 \cup T_4\). Moreover we cannot have \(t_1 + s_1 = t_2 - s_2\) for \(t_1 \in T_1\) and \(t_2 \in T_2 \cup T_4\) with \(\ell - s_1 = \ell - s_2 = t\), because then we would have \(w_{t_1+s_1} = w_{t_2-s_2+1}^{-1} = w_{t_1+s_1+1}^{-1}\), in contradiction with the reducedness of w. If \(t' - s' = 0\) for some \(t' \in T_2\) then \(t \in T_4\) by definition, so \(t \notin T_3'\). Finally, if \(t' \in T_4\) then we cannot have \(t' - s' = 0\) unless w is a proper power, as discussed.
Hence, by linear independence of \(v^0, \ldots , v^{t1}\), the h (say) conditions \(C_{t'}\) settled at step t consist of h independent affine linear conditions for \(v^t\), or, in the unitary case, if \(t = \ell  s \in T_3\), 2h independent affine linear conditions over \({\mathbf {F}}_{q_0}\). Suppose \(v^t\) is drawn from a subspace of codimension d (d is the number of previous occurences of \(w_t\) or \(w_{t}^{1}\)). Then, by Lemma 2.1 and Lemma 3.3, the probability that all these conditions are satisfied, conditional on the past trajectory \(v^0, \ldots , v^{t1}\), is
(in the second line we used \(h < \ell \), \(d < t\), and \(q_0 \le q\)).
Let \(H = |T_1| + |T_2| + |T_3'| + |T_4|\) (i.e., let \(H+1\) be the number of appearances of \(w_\ell \) or \(w_\ell ^{-1}\) in w). Taking the product of (13) over all t, the probability that \(C_{t'}\) is satisfied for every \(t' \in T_1 \cup T_2 \cup T_3' \cup T_4\) is
The conditions \(C_t\) are prerequisite to the event \(v^\ell = v^0\). If all these conditions are satisfied, then at step \(\ell \) the vector \(v^\ell \) is drawn from an affine subspace of codimension H which includes \(v^0\). Note also that \(Q(v^{\ell -1}) = Q(v^0)\). Hence, from Lemma 3.3,
Hence the overall probability of \(E_1\) is bounded by
Thus in all cases the error is bounded as claimed. \(\square \)
Remark 9.2
In the linear case, the hypothesis that w is not a proper power is needed only to ensure that the event \(v^\ell = v^0\) is contained in \(E_1 \cup E_2\); we do not need the hypothesis in order to bound \({\mathbf {P}}(E_1)\) or \({\mathbf {P}}(E_2)\). By contrast, at least in the orthogonal case, we do need this hypothesis in order to bound \({\mathbf {P}}(E_1)\) satisfactorily, so at least some of the complexity of the above proof is necessary. Suppose \(G = {\text {GO}}_n(q)\) and \(w = u^2\) for some word u of length \(\ell / 2\). Then the choice of \(v^\ell \) is constrained by
Hence \(v^\ell \) is always restricted to an affine hyperplane that includes \(v^0\), so the probability that \({{\overline{w}}} v = v\) will be at least approximately q/N, even conditionally on there being only one coincidence.
Remark 9.3
On the other hand, it is usually possible to cyclically rotate w so that much of the complexity in the previous proof disappears. For example, if w can be cyclically rotated so that it has no square prefix, then, after such a rotation, \(T_3 = \emptyset \). Not every non-proper-power has this property,^{Footnote 6} but almost all words do.
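The condition in Remark 9.3 is purely combinatorial and easy to test. The Python sketch below is only an illustration, not part of the proof: it assumes words are stored as strings of single-character letters (a hypothetical convention, for instance `A` standing for \(a^{-1}\)), checks whether a word has a nonempty square prefix \(uu\), and searches the cyclic rotations for one that does not.

```python
def has_square_prefix(w: str) -> bool:
    """Return True if w has a nonempty prefix of the form u + u."""
    for half in range(1, len(w) // 2 + 1):
        if w[:half] == w[half:2 * half]:
            return True
    return False


def rotation_without_square_prefix(w: str):
    """Return the first cyclic rotation of w with no square prefix, or None if none exists."""
    for i in range(len(w)):
        rotated = w[i:] + w[:i]
        if not has_square_prefix(rotated):
            return rotated
    return None


print(rotation_without_square_prefix("aab"))   # aba
print(rotation_without_square_prefix("abab"))  # None
```

For a proper power such as `abab` every rotation begins with a square, so the search fails; the remark's point is that among non-proper-powers such failures are rare.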
We can now prove that the permutation action of uniformly random \(x_1, \ldots , x_k \in G\) on an orbit \(Gv \subseteq V\) has a uniform spectral gap. Assume \(v \ne 0\). As usual let \({\mathcal {A}}\) be the normalized adjacency operator
acting on \({\mathbf {C}}[Gv]\), and let \(1 = \lambda _1 \ge \lambda _2 \ge \cdots \ge \lambda _N\) be the spectrum. Let \(\lambda = \max (\lambda _2, -\lambda _N)\). Then, for even \(\ell \),
where w is the result of a simple random walk of length \(\ell \) in \(F_k\). Let \({\mathcal {P}}\subseteq F_k\) be the set of proper powers \(w^m\) (\(w \in F_k, m \ge 2\)). Then
By [15, Lemma 2.6],
By Lemma 9.1,
provided \(\ell < n/4\). Hence
Take \(\ell \sim n/5\). If \(\log k / \log q\) is sufficiently large then
Hence, by Markov’s inequality,
so almost surely \(\lambda < q^{-c'/2}\).
The action on r-tuples
We now generalize the argument of the previous subsection to r-tuples of vectors, where r is bounded. It will be convenient to use the following notation. For \(v, v' \in V^r\), let \(f(v, v')\) denote the \(r \times r\) matrix
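The trace method above rests on \(\operatorname{tr}{\mathcal {A}}^\ell = \sum _i \lambda _i^\ell \), together with the fact that for a permutation action \(\operatorname{tr}{\mathcal {A}}^\ell \) is the average number of fixed points of \({{\overline{w}}}\) over all \((2k)^\ell \) letter sequences w, which is the same as averaging over the simple random walk in \(F_k\). The Python sketch below checks this identity exactly, with rational arithmetic, for random permutations of a small set \(\{0, \ldots , n-1\}\); these permutations are a stand-in for the action on \(Gv\), and all function names are hypothetical.

```python
import itertools
import random
from fractions import Fraction


def inverse(p):
    inv = [0] * len(p)
    for v, pv in enumerate(p):
        inv[pv] = v
    return inv


def num_fixed_points(p):
    return sum(1 for v, pv in enumerate(p) if v == pv)


def adjacency(letters, n):
    """Normalized adjacency operator (1/|S|) sum_{s in S} P_s, where P_s e_v = e_{s(v)}."""
    A = [[Fraction(0)] * n for _ in range(n)]
    for s in letters:
        for v in range(n):
            A[s[v]][v] += Fraction(1, len(letters))
    return A


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(n)) for j in range(n)] for i in range(n)]


def trace_of_power(A, ell):
    """tr(A^ell) for ell >= 1."""
    P = A
    for _ in range(ell - 1):
        P = matmul(P, A)
    return sum(P[i][i] for i in range(len(A)))


def mean_fixed_points(letters, ell):
    """Average number of fixed points of the product over all |S|^ell letter sequences."""
    n = len(letters[0])
    total = 0
    for seq in itertools.product(letters, repeat=ell):
        g = list(range(n))
        for s in seq:
            g = [s[v] for v in g]  # apply s after the letters read so far
        total += num_fixed_points(g)
    return Fraction(total, len(letters) ** ell)


rng = random.Random(1)
n, k, ell = 5, 2, 4
gens = [rng.sample(range(n), n) for _ in range(k)]
letters = gens + [inverse(x) for x in gens]  # S = {x_i, x_i^{-1}}
A = adjacency(letters, n)
assert trace_of_power(A, ell) == mean_fixed_points(letters, ell)
```

Exact fractions avoid floating-point tolerances; the brute-force enumeration is only feasible for tiny n, k and \(\ell \).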
Define also
Let \(v = (v_1, \ldots , v_r) \in V^r\), where \(v_1, \ldots , v_r \in V\) are linearly independent. Let \(N = |Gv|\). By Witt’s lemma, N is the number of \(v' \in V^r\) with \(v'_1, \ldots , v'_r\) linearly independent such that \(f(v, v) = f(v', v')\) and \(Q(v) = Q(v')\). In the linear case,
In the other cases we have, inductively, using Lemma 2.1,
Lemma 9.4
Assume w is nontrivial and not a proper power. Assume \(\ell r^2 < n/4\). Then
Proof
Again we may assume w is cyclically reduced. In this case Lemma 8.3 implies that the event that \({{\overline{w}}} v = v\) is contained in the union of the following two events:
\(E_1\): the joint trajectory \((v_i^t)\) has exactly one coincidence in each individual trajectory, each occurring at the final step \(t=\ell \), and \(v_i^\ell = v_i^0\) for each i;
\(E_2\): the joint trajectory \((v_i^t)\) has at least \(r+1\) coincidences.
Hence the probability of \(E_2\) is bounded by (summing over all possibilities for \(r+1\) coincidences)
Using \(N \le q^{rn}\), this is at most
Hence we may focus on the event \(E_1\). In the linear case, for each i the vector \(v_i^\ell \) is chosen uniformly at random outside a linear subspace of dimension at most \(\ell r\), so the probability of \(E_1\) is bounded by
This completes the proof in this case.
As in the previous subsection, the general situation is complicated by form conditions, but fortunately few changes are necessary in the \(r > 1\) case. Let \(\xi = w_\ell \). Assume there are \(H+1\) occurrences of \(\xi \) or \(\xi ^{-1}\) in w, and consider the H maximal subwords u ending with \(\xi \) or \(\xi ^{-1}\) and matching a proper prefix of w, as in Fig. 1. Define \(T_1\), \(T_2\), and \(T_3 = T'_3 \cup T_4\) as before.
The choice of \(v^\ell \) at step \(\ell \) is constrained by the linear conditions
(where \(s = s(t)\)). For \(t \in T_1 \cup T_2 \cup T_3'\) we have a condition \(C_t\) defined by
For \(t \in T_4\) the condition \(C_t\) is the reduced condition
Conditional on linear independence of \(v_i^t\) for \(1 \le i \le r\) and \(t < \ell \), it can be verified exactly as in the \(r = 1\) case that the conditions settled at any given step \(t < \ell \) are precisely \(C_{t'}\) for \(t' \in T_1 \cup T_2 \cup T_4\) and \(\ell - s' = t\), as well as \(C_t\) if \(t \in T_3'\), and these conditions are linearly independent.
Suppose at step \(t < \ell \) there are h conditions \(C_{t'}\) to be settled. Assume first that we are not in the case \(t = \ell - s \in T_3'\) (the case in which the subword is adjacent to the prefix, as in Fig. 2). Let d be the number of previous occurrences of \(w_t\) or \(w_t^{-1}\). Then, by Lemma 3.3, at step (t, i) the vector \(v_i^t\) is drawn from an affine subspace of codimension \(d' = dr + i - 1\), less a subspace of dimension \(d'\), subject to the quadratic condition \(Q(v_i^t) = Q(v_i^{t-1})\). Hence, using Lemma 2.1, the probability that the ji-component of each \(C_{t'}\) is satisfied for each \(j \in \{1, \ldots , r\}\) is
(using \(h < \ell \), \(d' \le (t-1)r + i - 1\), and \(q_0 \le q\)). Taking the product over all i, the probability that each \(C_{t'}\) is satisfied after step t is
The case \(t = \ell - s \in T_3'\) is slightly different. In this case the ji-component of \(C_t\) is
This condition is settled at step (t, k), where \(k = \max (i, j)\). Hence \(2k-1\) components of \(C_t\) are settled at step (t, k). Therefore, in this case, (15) must be replaced with
Taking the product over all i again gives (16).
Taking the product of (16) over all t, the probability that \(C_{t'}\) is satisfied for every \(t' \in T_1 \cup T_2 \cup T_3' \cup T_4\) is
Finally, if all the conditions \(C_t\) are satisfied, then for each i the vector \(v_i^\ell \) is drawn from an affine subspace of codimension \(Hr+i-1\) which includes \(v_i^0\), less a subspace of dimension \(Hr+i-1\), subject to the quadratic condition \(Q(v_i^\ell ) = Q(v_i^{\ell -1}) = Q(v_i^0)\). Hence
Hence the conditional probability that \(v^\ell = v^0\) is
Hence the overall probability of \(E_1\) is, multiplying the previous line by (17),
Comparing with (14), this is
Thus in all cases the error is bounded as claimed. \(\square \)
We can now prove that the permutation action of uniformly random \(x_1, \ldots , x_k \in G\) on an orbit \(Gv \subseteq V^r\) has a uniform spectral gap. The argument differs little from that in the previous subsection. We may assume \(v_1, \ldots , v_r\) are linearly independent, by reducing r if necessary. Suppose the adjacency operator \({\mathcal {A}}\) acting on \({\mathbf {C}}[Gv]\) has spectrum \(1 = \lambda _1 \ge \cdots \ge \lambda _N\). Let \(\lambda = \max (\lambda _2, -\lambda _N)\). For even \(\ell \), let w be the result of a simple random walk of length \(\ell \) in \(F_k\). Then
We bound \({\mathbf {P}}(w \in {\mathcal {P}})\) as before, while by Lemma 9.4 we have
provided \(\ell r^2 < n/4\). Hence
Take \(\ell \sim n/(5 r^2)\). If \(\log k / \log q \ge C r^3\), for a sufficiently large constant C, then
Hence, by Markov’s inequality,
so almost surely \(\lambda < q^{-c'/2}\), as before.
Other low-degree representations
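For reference, the Markov step used in both spectral-gap arguments can be written out in its generic form. Here \(\theta > 0\) is an arbitrary threshold (in the text \(\theta = q^{-c'/2}\)), \(\ell \) is even, \(N = |Gv| \ge 2\), \(x_1, \ldots , x_k\) are uniform and independent, and w is the random walk word; the second line uses only that the law of \({{\overline{w}}}\) is invariant under conjugation. The quantitative input, a bound on \({\mathbf {P}}({{\overline{w}}} v = v)\), is the one supplied by the lemmas above and is not reproduced here.

```latex
\[
\begin{aligned}
\operatorname{tr} \mathcal{A}^\ell &= \sum_{i=1}^N \lambda_i^\ell \;\ge\; 1 + \lambda^\ell,
  \qquad \lambda = \max(\lambda_2, -\lambda_N) \ge 0,\\
\mathbf{E}_{x_1, \ldots, x_k} \operatorname{tr} \mathcal{A}^\ell
  &= \sum_{u \in Gv} \mathbf{P}(\overline{w} u = u) = N \, \mathbf{P}(\overline{w} v = v),\\
\mathbf{P}(\lambda \ge \theta) &= \mathbf{P}(\lambda^\ell \ge \theta^\ell)
  \;\le\; \theta^{-\ell} \, \mathbf{E}\, \lambda^\ell
  \;\le\; \theta^{-\ell} \bigl( N \, \mathbf{P}(\overline{w} v = v) - 1 \bigr).
\end{aligned}
\]
```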
The result of the final argument of the previous subsection can be expressed as follows.
Theorem 9.5
Let \({\mathbf {C}}[V^r]_0\) be the orthogonal complement of \({\mathbf {C}}[V^r]^G\) in \({\mathbf {C}}[V^r]\). Let \(x_1, \ldots , x_k \in G\) be uniform and independent, where \(k \ge q^{Cr^3}\) and \(r < cn^{1/4}\). Let \(\rho = \rho ({\mathcal {A}}, {\mathbf {C}}[V^r]_0)\) be the spectral radius of \({\mathcal {A}}= {\mathcal {A}}_{x_1, \ldots , x_k}\) acting on \({\mathbf {C}}[V^r]_0\). Then
Proof
By Witt’s lemma, there are \(O(q^{r^2})\) orbits of G on \(V^r\). Let \(Gv_1, \ldots , Gv_s\) be a decomposition of \(V^r\) into G-orbits, where \(s \ll q^{r^2}\). Then
Let \(\rho _i = \rho ({\mathcal {A}}, {\mathbf {C}}[Gv_i]_0)\) be the spectral radius of \({\mathcal {A}}\) on \({\mathbf {C}}[Gv_i]_0\). Then
From the previous subsection (possibly with a smaller r, if the components of \(v_i\) are not linearly independent), for each i we have
Hence
Our main interest is the conjugation action of G on a conjugacy class \({\mathfrak {C}}\subseteq {\text {SCl}}_n(q)\) of elements of degree \(s = O(1)\), which is actually a quotient of an orbit of G on \(V^s \oplus (V^*)^s\), where \(V^*\) is the dual space. It is possible to repeat the analysis of the previous subsection allowing also r factors of \(V^*\), but in fact this generalization follows formally, since \({\mathbf {C}}[V^*] \cong {\mathbf {C}}[V]\) (as both have character \(\chi (g) = q^{\dim \ker (g-1)}\)), so
Corollary 9.6
(the conjugation action on \({\mathfrak {M}}\) is expanding) Let \(x_1, \ldots , x_k \in G\) be independent and uniformly random, where \(k > q^C\) and \(n > C\). Let \(\rho = \rho ({\mathcal {A}}, {\mathbf {C}}[{\mathfrak {M}}]_0)\) be the spectral radius of \({\mathcal {A}}\) acting on \({\mathbf {C}}[{\mathfrak {M}}]_0\). Then
Proof
We claim that \({\mathbf {C}}[{\mathfrak {M}}]\) is contained in \({\mathbf {C}}[V^{2s}]\). The map
is a map of permutation representations (where G acts by conjugation on \(M_n({\mathbf {F}}_q)\)), and hence induces a map of \({\mathbf {C}}[G]\)-modules \({\mathbf {C}}[V^s \oplus (V^*)^s] \rightarrow {\mathbf {C}}[M_n({\mathbf {F}}_q)]\). The module \({\mathbf {C}}[{\mathfrak {M}}]\) is contained in the image, so it is isomorphic to a submodule of \({\mathbf {C}}[V^s \oplus (V^*)^s] \cong {\mathbf {C}}[V^{2s}]\) by complete reducibility. Hence the result follows from the previous theorem with \(r = 2s\). \(\square \)
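The isomorphism \({\mathbf {C}}[V^*] \cong {\mathbf {C}}[V]\) used in this subsection comes down to the two permutation characters agreeing, and that is easy to confirm by brute force in tiny cases. The Python sketch below assumes \(q = p\) prime (so field arithmetic is arithmetic mod p) and the dual action \((g\phi )(v) = \phi (g^{-1} v)\); a functional is fixed exactly when \(g^{T} \phi = \phi \). The helper names are hypothetical, and over \({\mathbf {C}}\) equal characters give isomorphic modules.

```python
import itertools
import random


def mat_vec(g, x, p):
    """Compute g x over F_p, with g a list of rows and x a tuple."""
    return tuple(sum(g[i][j] * x[j] for j in range(len(x))) % p for i in range(len(g)))


def transpose(g):
    return [list(row) for row in zip(*g)]


def count_fixed_vectors(g, p):
    """Number of x in F_p^n with g x = x: the permutation character of C[F_p^n] at g."""
    n = len(g)
    return sum(1 for x in itertools.product(range(p), repeat=n) if mat_vec(g, x, p) == x)


def rank_mod_p(m, p):
    """Rank of the matrix m over F_p (p prime), by Gaussian elimination."""
    m = [[a % p for a in row] for row in m]
    rank = 0
    for c in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][c]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][c], -1, p)
        m[rank] = [a * inv % p for a in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][c]:
                factor = m[r][c]
                m[r] = [(a - factor * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


p, n = 3, 3
rng = random.Random(0)
checked = 0
while checked < 20:
    g = [[rng.randrange(p) for _ in range(n)] for _ in range(n)]
    if rank_mod_p(g, p) < n:
        continue  # keep only invertible g, i.e. elements of GL_n(p)
    on_v = count_fixed_vectors(g, p)
    on_dual = count_fixed_vectors(transpose(g), p)  # phi fixed iff phi o g = phi
    g_minus_1 = [[g[i][j] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    assert on_v == on_dual == p ** (n - rank_mod_p(g_minus_1, p))
    checked += 1
```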
Diameter of the Cayley graph
We now collect results from the previous sections and bound the diameter of the Cayley graph of the subgroup of \({\text {Cl}}_n(q)\) generated by random elements.
\({\text {GL}}_n(p)\) and 3 random elements
In this subsection we prove Theorem 1.2. Recall that \({\text {SL}}_n(p) \le G \le {\text {GL}}_n(p)\), where p is prime and \(\log p < cn / \log ^2 n\), the elements \(x, y, z \in G\) are chosen uniformly at random, and \(S = \{x^{\pm 1}, y^{\pm 1}, z^{\pm 1}\}\). We claim that with probability \(1-e^{-cn}\) we have
First we show that \(\langle S \rangle \ge {\text {SL}}_n(p)\) with high probability. The argument is a slight modification of [14, Sect. 5].^{Footnote 7}
Let \({\mathfrak {C}}_1\) be the set of all irreducible \(g \in {\text {GL}}_n(p)\) of order \(d(p^n-1)/(p-1)\) for some \(d \mid (p-1)\). Each such g is equivalent to the multiplication action of some \(x \in {\mathbf {F}}_{p^n}\) of the same order, and \(\det g = N(x)\). Therefore, for each \(\alpha \in G^\text {ab}\cong {\mathbf {F}}_p^\times \), the \({\text {GL}}_n(p)\)-classes in \({\mathfrak {C}}_{1;\alpha } = {\mathfrak {C}}_1 \cap \alpha G'\) are in bijection with elements of \({\mathbf {F}}_{p^n}\), up to Galois conjugacy, of order \(d(p^n-1)/(p-1)\) and norm \(\alpha \), where d is the order of \(\alpha \). Note there are \(\phi (d)\) elements \(\alpha \) of order d. Moreover, each such \(g \in G\) has centralizer isomorphic to \({\mathbf {F}}_{p^n}^\times \). Hence
Here we used the standard estimate \(\phi (m) \gg m / \log \log m\).
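The count of elements of \({\mathbf {F}}_{p^n}^\times \) of order \(d(p^n-1)/(p-1)\) with prescribed norm \(\alpha \) can be sanity-checked in a model of the cyclic group \({\mathbf {F}}_{p^n}^\times \): write \(x = \gamma ^a\) for a fixed generator \(\gamma \), so that \(N(x) = x^m\) with \(m = (p^n-1)/(p-1)\) sends the exponent a to am. The Python sketch below (hypothetical helper names) counts, for each \(d \mid p-1\), the elements of order dm grouped by norm; every \(\alpha \) of order d should receive exactly \(\phi (dm)/\phi (d)\) elements. Each such element has degree n over \({\mathbf {F}}_p\), so dividing a count by n gives the number of Galois classes.

```python
from math import gcd


def euler_phi(m):
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)


def order_mod(a, modulus):
    """Order of the element a in the additive cyclic group Z/modulus."""
    return modulus // gcd(a, modulus)


def norm_fibres(p, n):
    """For each d | p - 1, map each norm exponent alpha to the number of x of order d*m.

    Elements x = gamma^a are represented by exponents a mod p^n - 1, and the norm
    N(x) = x^m is represented by a*m; F_p^x corresponds to the multiples of m.
    """
    big = p ** n - 1
    m = big // (p - 1)
    fibres = {}
    for d in range(1, p):
        if (p - 1) % d:
            continue
        counts = {}
        for a in range(big):
            if order_mod(a, big) == d * m:
                alpha = a * m % big
                counts[alpha] = counts.get(alpha, 0) + 1
        fibres[d] = counts
    return fibres


p, n = 7, 2
big = p ** n - 1
m = big // (p - 1)
for d, counts in norm_fibres(p, n).items():
    assert len(counts) == euler_phi(d)  # every alpha of order d occurs
    assert all(order_mod(alpha, big) == d for alpha in counts)
    assert all(c == euler_phi(d * m) // euler_phi(d) for c in counts.values())
```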
Let \({\mathfrak {C}}_2\) be the set of all \(g \in {\text {GL}}_n(p)\) of order \(p^{n-1}-1\) splitting V as \(\ell \oplus W\) for some \(\ell , W\) with \(\dim \ell = 1\), \(\dim W = n-1\). A similar calculation shows that
for each \(\alpha \in {\mathbf {F}}_p^\times \) in this case as well. (In fact, \({\mathfrak {C}}_2\) is uniform over \(\det \) fibres.)
Hence, by Corollaries 5.3 and 6.2 as in the proof of Theorem 7.3, with probability at least \(1 - e^{-cn}\) there are words \(w_1, w_2\) such that
By a straightforward adaptation of [14, Lemma 5.2] (assuming \(n > 6\), say),
Hence indeed \(\langle S \rangle \ge {\text {SL}}_n(p)\).
In particular, using Schreier generators, there is a symmetric set \(S' \subseteq S^{2p} \cap {\text {SL}}_n(p)\) such that \(\langle S'\rangle = {\text {SL}}_n(p)\).
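Schreier's lemma, invoked above, can be illustrated in a permutation-group stand-in. In the sketch below (hypothetical names), the subgroup is the stabilizer of a point rather than \({\text {SL}}_n(p)\), and orbit points play the role of cosets; in the text the cosets of \({\text {SL}}_n(p)\) in \(\langle S \rangle \) are indexed by the determinant. A breadth-first search gives transversal words of length at most (index minus 1), so each Schreier generator \(u_{s\omega }^{-1} s u_\omega \) has length at most \(2(\text {index} - 1) + 1\), which is consistent with the bound \(S^{2p}\) when the index is less than p.

```python
from collections import deque


def compose(p, q):
    """The permutation v -> p[q[v]], i.e. apply q first and then p."""
    return tuple(p[v] for v in q)


def inverse(p):
    inv = [0] * len(p)
    for v, pv in enumerate(p):
        inv[pv] = v
    return tuple(inv)


def schreier_generators(gens, base):
    """Schreier generators of the stabilizer of `base` in <gens>; returns (generators, max depth).

    BFS over the orbit gives u[w] with u[w](base) = w, a word of length depth[w] in gens.
    """
    identity = tuple(range(len(gens[0])))
    u, depth = {base: identity}, {base: 0}
    queue = deque([base])
    while queue:
        w = queue.popleft()
        for s in gens:
            if s[w] not in u:
                u[s[w]] = compose(s, u[w])
                depth[s[w]] = depth[w] + 1
                queue.append(s[w])
    result = set()
    for w, uw in u.items():
        for s in gens:
            h = compose(inverse(u[s[w]]), compose(s, uw))  # fixes base
            if h != identity:
                result.add(h)
    return result, max(depth.values())


def closure(gens, n):
    """All elements of the group generated by gens (brute force, tiny n only)."""
    identity = tuple(range(n))
    seen, queue = {identity}, deque([identity])
    while queue:
        g = queue.popleft()
        for s in gens:
            h = compose(s, g)
            if h not in seen:
                seen.add(h)
                queue.append(h)
    return seen


cycle, swap = (1, 2, 3, 0), (1, 0, 2, 3)  # these generate S_4
H_gens, d = schreier_generators([cycle, swap], base=0)
assert all(h[0] == 0 for h in H_gens)      # every Schreier generator fixes the base point
assert len(closure(list(H_gens), 4)) == 6  # together they generate Stab(0), of order 6
```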
Meanwhile, by Theorem 1.1, with probability \(1 - e^{-cn}\) there is another word w of length \(n^{O(\log p)}\) such that
Let \(X = S' \cup \{w(x, y, z)^{\pm 1}\}\). By [22, Theorem 1.5] we have
As \(|\langle S \rangle /{\text {SL}}_n(p)| < p\), we thus have
This completes the proof.
Classical groups and \(q^C\) random elements
In this subsection we prove Theorem 1.4. Recall that \(G = {\text {Cl}}_n(q)\), where \(n > C\), elements \(x_1, \ldots , x_k \in G\) are chosen uniformly at random where \(k > q^C\), and \(S = \{x_1^{\pm 1}, \ldots , x_k^{\pm 1}\}\). We claim that with probability \(1-q^{-cn}\) we have
By Theorem 7.4, with probability at least \(1 - q^{-c_1 n}\) there is a word w of length at most \(q^2 n^{C_1}\) so that
Let \({\mathfrak {C}}\) be the conjugacy class of \(w(x_1, \ldots , x_k)\) in G. Note that \({\mathfrak {C}}\subseteq {\text {SCl}}_n(q)\). It follows from Corollary 9.6 that, with probability at least \(1 - q^{-c_2 n}\), the conjugation action of G on \({\mathfrak {C}}\) is expanding with spectral gap bounded away from zero. Hence (see, e.g., [31, Proposition 3.1.5 and Proposition 3.3.6])
It follows that with probability at least \(1 - q^{-c_3 n}\), every element of \({\mathfrak {C}}\) is a word in S of length at most
This already proves that \(\langle S \rangle \ge {\text {SCl}}_n(q)\). It follows from [33] that
Hence
This completes the proof.
Corollary 1.5(2) follows immediately for \(q < n^{O(1)}\), since \(\log |G| \asymp n^2 \log q\). If q is larger then the claim follows from Alon–Roichman [1], which implies that the Cayley graph on \(C n^2 \log q\) random generators is almost surely an expander.
Notes
 1.
Throughout the paper, we use the terms “almost surely” or “with high probability” to mean with probability \(1-o(1)\) as the relevant parameters tend to infinity.
 2.
The diameter of \({\text {SCl}}_n(q)\) with respect to a set S is essentially the same (up to a factor of 3) as the diameter of the simple quotient \({\text {PSCl}}_n(q)\) with respect to \(S \bmod Z\). Indeed, if \(S^d = G\) then certainly \(S^d Z = G\), and conversely if \(S^d Z = G\) then it is possible to show that \(S^{3d} = G\). Hence there is no need to consider \({\text {PSCl}}_n(q)\) explicitly.
 3.
Note that the distribution of \(\overline{w}\) is symmetric, so \({\mathbf {E}}_{x_1, \ldots , x_k, w} \chi ({{\overline{w}}}) / \chi (1)\) is real.
 4.
Note that \({\text {GO}}(R) \cong {\text {GO}}_2^+(q) \cong D_{2(q-1)}\). In odd characteristic, \(G^\text {ab}\cong C_2 \times C_2\), and determinant and spinor norm are independent characters on \({\text {GO}}_2^+(q)\). In even characteristic, \(G^\text {ab}\cong C_2\), and the Dickson invariant is nontrivial on \({\text {GO}}_2^+(q)\).
 5.
The existence of an even-order linear character of \({\text {GO}}_n(q)\) in even characteristic is why we need the extra factor of 2 in that case.
 6.
e.g.,
 7.
Alternatively, we could just cite [30]. The given argument avoids CFSG.
 8.
The authors state only \(n^2 (\log n)^c\), but a careful inspection of the proof gives \(n^2 (\log n)^2 \omega (1)\), for an arbitrarily slowly growing \(\omega (1)\). A word v of length \(\omega (1)\) is obtained such that \(v(x, y)^{O(n)}\) has support less than n/4. A random commutator process is then used to iteratively reduce the support. Each step quadruples the length of the word and roughly squares the density of the support, so the whole process multiplies the length of the word by \(O((\log n)^2)\). Thus a word w of length \(n (\log n)^2 \omega (1)\) is obtained such that w(x, y) has support 3.
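A back-of-the-envelope version of the arithmetic in this note, under the simplifying assumption that each commutator step exactly squares the support density \(\delta \) (the note says "roughly") and exactly quadruples the word length, starting from \(3/n < \delta _0 < 1/4\) with a word of length \(O(n)\,\omega (1)\):

```latex
\[
\begin{aligned}
\delta_j &= \delta_0^{\,2^j}, \qquad \tfrac{3}{n} < \delta_0 < \tfrac14,\\
\delta_J \le \tfrac{3}{n} &\iff 2^J \ge \frac{\log(n/3)}{\log(1/\delta_0)},
  \quad\text{so the least such } J \text{ has } 2^J < \frac{2\log(n/3)}{\log 4},\\
4^J &= \bigl(2^J\bigr)^2 < \Bigl(\frac{2\log(n/3)}{\log 4}\Bigr)^2 = O\bigl((\log n)^2\bigr),\\
|w| &\le 4^J \cdot O(n)\,\omega(1) \le n(\log n)^2\,\omega(1)
  \quad (\text{constants absorbed into } \omega(1)).
\end{aligned}
\]
```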
References
 1.
Alon, N., Roichman, Y.: Random Cayley graphs and expanders. Random Struct. Algorithms 5(2), 271–284 (1994)
 2.
Aschbacher, M.: Finite Group Theory, vol. 10 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, second edition (2000)
 3.
Babai, L., Beals, R., Seress, Á.: On the diameter of the symmetric group: polynomial bounds. In: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1108–1112. ACM, New York (2004)
 4.
Breuillard, E., Green, B., Guralnick, R., Tao, T.: Expansion in finite simple groups of Lie type. J. Eur. Math. Soc. 17(6), 1367–1434 (2015)
 5.
Breuillard, E., Green, B., Tao, T.: Approximate subgroups of linear groups. Geom. Funct. Anal. 21(4), 774–819 (2011)
 6.
Babai, L., Hayes, T.P.: Near-independence of permutations and an almost sure polynomial bound on the diameter of the symmetric group. In: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1057–1066. ACM, New York (2005)
 7.
Bordenave, C.: A new proof of Friedman’s second eigenvalue Theorem and its extension to random lifts. Ann. Sci. de l’Ecole normale supérieure (2019)
 8.
Broder, A., Shamir, E.: On the second eigenvalue of random regular graphs. In: 28th Annual Symposium on Foundations of Computer Science (sfcs 1987), pp. 286–294 (1987)
 9.
Babai, L., Seress, Á.: On the diameter of permutation groups. Eur. J. Combin. 13(4), 231–243 (1992)
 10.
Biswas, A., Yang, Y.: A diameter bound for finite simple groups of large rank. J. Lond. Math. Soc. (2) 95(2), 455–474 (2017)
 11.
Dickson, L.E.: Linear Groups, with an Exposition of the Galois Field Theory. B.G. Teubner, Leipzig (1901)
 12.
Eberhard, S.: The trivial lower bound for the girth of \(S_n\). arXiv e-prints, arXiv:1706.09972 (2017)
 13.
Eberhard, S., Virchow, S.C.: The probability of generating the symmetric group. Combinatorica 39(2), 273–288 (2019)
 14.
Eberhard, S., Virchow, S.C.: Random generation of the special linear group. Trans. Am. Math. Soc., to appear (2020)
 15.
Friedman, J., Joux, A., Roichman, Y., Stern, J., Tillich, J.-P.: The action of a few permutations on \(r\)-tuples is quickly transitive. Random Struct. Algorithms 12(4), 335–350 (1998)
 16.
Friedman, J.: A proof of Alon’s second eigenvalue conjecture and related problems. Mem. Am. Math. Soc., 195(910):viii+100 (2008)
 17.
Gromov, M., Guth, L.: Generalizations of the Kolmogorov–Barzdin embedding estimates. Duke Math. J. 161(13), 2549–2603 (2012)
 18.
Guralnick, R.M., Larsen, M., Tiep, P.H.: Character levels and character bounds. II. arXiv e-prints, arXiv:1904.08070 (2019)
 19.
Guralnick, R.M., Larsen, M., Tiep, P.H.: Character levels and character bounds. Forum Math. Pi, 8:e2 (2020)
 20.
Grove, L.C.: Classical Groups and Geometric Algebra. Graduate Studies in Mathematics, vol. 39. American Mathematical Society, Providence, RI (2002)
 21.
Hadad, U.: On the shortest identity in finite simple groups of Lie type. J. Group Theory 14(1), 37–47 (2011)
 22.
Halasi, Z.: Diameter of Cayley graphs of \(SL(n,p)\) with generating sets containing a transvection. arXiv e-prints, arXiv:2002.10443 (2020)
 23.
Helfgott, H.A.: Growth and generation in \({{\rm SL}}_2({\mathbb{Z}}/p{\mathbb{Z}})\). Ann. Math. (2) 167(2), 601–623 (2008)
 24.
Halasi, Z., Maróti, A., Pyber, L., Qiao, Y.: An improved diameter bound for finite simple groups of Lie type. Bull. Lond. Math. Soc. 51(4), 645–657 (2019)
 25.
Helfgott, H.A., Seress, Á.: On the diameter of permutation groups. Ann. Math. (2) 179(2), 611–658 (2014)
 26.
Helfgott, H.A., Seress, Á., Zuk, A.: Random generators of the symmetric group: diameter, mixing time and spectral gap. J. Algebra 421, 349–368 (2015)
 27.
Humphreys, J.E.: Conjugacy classes in semisimple algebraic groups. American Mathematical Society (1995)
 28.
Huppert, B.: Isometrien von Vektorräumen. II. Mathematische Zeitschrift 175(1), 5–20 (1980)
 29.
Kesten, H.: Symmetric random walks on groups. Trans. Am. Math. Soc. 92, 336–354 (1959)
 30.
Kantor, W.M., Lubotzky, A.: The probability of generating a finite classical group. Geom. Dedicata. 36(1), 67–87 (1990)
 31.
Kowalski, E.: An Introduction to Expander Graphs. Cours Spécialisés [Specialized Courses], vol. 26. Société Mathématique de France, Paris (2019)
 32.
Landazuri, V., Seitz, G.M.: On the minimal degrees of projective representations of the finite Chevalley groups. J. Algebra 32, 418–443 (1974)
 33.
Liebeck, M.W., Shalev, A.: Diameters of finite simple groups: sharp bounds and applications. Ann. Math. (2) 154(2), 383–406 (2001)
 34.
Larsen, M., Shalev, A.: Characters of symmetric groups: sharp bounds and applications. Invent. Math. 174(3), 645–687 (2008)
 35.
Larsen, M., Shalev, A.: Fibers of word maps and some applications. J. Algebra 354, 36–48 (2012)
 36.
Liebeck, M.W., Shalev, A.: Girth, words and diameter. Bull. Lond. Math. Soc. 51(3), 539–546 (2019)
 37.
Larsen, M., Shalev, A., Tiep, P.H.: The Waring problem for finite simple groups. Ann. Math. (2) 174(3), 1885–1950 (2011)
 38.
Lubotzky, A.: Discrete Groups, Expanding Graphs and Invariant Measures. Modern Birkhäuser Classics. Birkhäuser Verlag, Basel, 2010. With an appendix by Jonathan D. Rogawski, Reprint of the 1994 edition
 39.
Pyber, L., Szabó, E.: Growth in finite simple groups of Lie type. J. Am. Math. Soc. 29(1), 95–146 (2016)
 40.
Schlage-Puchta, J.-C.: Applications of character estimates to statistical problems for the symmetric group. Combinatorica 32(3), 309–323 (2012)
 41.
Wall, G.E.: On the conjugacy classes in the unitary, symplectic and orthogonal groups. J. Austral. Math. Soc. 3, 1–62 (1963)
Acknowledgements
We thank László Pyber, Endre Szabó, and Péter Varjú for helpful discussions. Thanks are due to Emmanuel Breuillard and Bob Guralnick for discussions pertaining to the low-degree representation theory of \({\text {SCl}}_n(q)\), and to Aner Shalev for discussions about character bounds. We thank Zoltán Halasi for sharing the preprint [22]. We would also like to thank two anonymous referees for a thorough inspection of the paper and for suggesting many improvements.
Funding
Open access funding provided by ELKH Alfréd Rényi Institute of Mathematics.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
S. Eberhard has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 803711). U. Jezernik has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 741420).
Appendix A: Analogous arguments for \(S_n\)
In this appendix we give analogous arguments for \(S_n\). The main reason to do so is to motivate and give context to some of the arguments in the main body, as the arguments in the context of \(S_n\) are easier and somewhat more natural, involving only trajectories of points rather than vectors. A secondary reason is that a couple of results are actually new, and of independent interest:

1.
if w is a word of length \(o(n^{1/2})\), then with high probability \({{\overline{w}}}\) has o(n) fixed points (Theorem A.4);

2.
the Cayley graph with respect to three random generators almost surely has diameter \(O(n^2 \log n)\).
Queries and trajectories
The following definitions only slightly generalize those in [8, 15].
Let \(G = S_n\) and \(\Omega = \{1, \ldots , n\}\). Let \(x_1, \ldots , x_k \in G\). Define a query to be a pair \((\xi , v)\), where \(\xi \in \{\xi _1^{\pm 1}, \ldots , \xi _k^{\pm 1}\}\) and \(v \in \Omega \); the result of the query is \({\overline{\xi }} v\). After any finite sequence of queries
the known domain of a letter \(\xi \) at time t is
Suppose we make a further query \((w_t, v_t)\). If \(v_t \in D_{w_t}^t\), then the result \(\overline{w_t} v_t\) is already determined by the values of \(\overline{w_1} v_1, \ldots , \overline{w_{t-1}} v_{t-1}\); we call this a forced choice. Otherwise, we say the query is a free choice.
Let R be some subset of \(\Omega \) fixed in advance. If a query \((w_t, v_t)\) is a free choice and yet
then we say the result of the query is a coincidence.
Again, the language is most interesting when \(x_1, \ldots , x_k \in G\) are chosen randomly. The following lemma is trivial, and parallels Lemma 3.3.
Lemma A.1
Let \(x_1, \ldots , x_k \in G\) be uniformly random and independent, and let
be a sequence of queries. Assume that \((w_t, v_t)\) is a free choice. Then, conditionally on the values of \(\overline{w_1} v_1, \ldots , \overline{w_{t-1}} v_{t-1}\), the result \(\overline{w_t} v_t\) of the query \((w_t, v_t)\) is uniformly distributed in \(\Omega \setminus D_{w_t^{-1}}^t\).
In particular, the conditional probability that \(\overline{w_t} v_t\) is a coincidence is bounded by
where
and s is the number of \(i < t\) with \(w_i \in \{w_t, w_t^{-1}\}\).
Let \(w \in F_k\), and let
be the reduced expression. For each \(v \in \Omega \), the trajectory of v is the sequence of queries \((w_t, v^{t-1})\), where \(v^0 = v\) and for each \(t \ge 1\) the point \(v^t\) is the result of the query \((w_t, v^{t-1})\); in other words, the sequence \(v^0, v^1, \ldots , v^\ell \) is defined by
Note that if step t is free and not a coincidence then step \(t+1\) is also free, and hence if \(v^\ell \in R\) then there must be at least one coincidence in the trajectory (cf. Lemma 3.5).
More generally, for any \(r\ge 1\) the joint trajectory of an r-tuple \(v_1, \ldots , v_r \in \Omega \) is simply the r-tuple of individual trajectories, with the queries \((w_t, v_i^{t-1})\) ordered lexicographically by (t, i). Again write \(\prec \) for this order, i.e., \((t',i') \prec (t,i)\) if \(t' < t\) or \(t'=t\) and \(i' < i\). Note that if step (t, i) is free and not a coincidence then
while
hence step \((t+1, i)\) is also free. Hence if \(v_i^\ell \in R\) then there must be at least one coincidence in the trajectory of \(v_i\). This observation is recorded as the following lemma (cf. Lemma 3.6).
Lemma A.2
Suppose \(v_i \notin \{v_1, \ldots , v_{i-1}\}\) and \(v_i^\ell \in R\). Then there is at least one coincidence in the trajectory of \(v_i\) (during the joint trajectory of \(v_1, \ldots , v_r\)).
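The query model above is the "principle of deferred decisions": random permutations are revealed only where they are queried. The Python sketch below is a minimal illustration under stated assumptions (all names hypothetical; a letter is a pair (i, e) with e = +1 for \(x_i\) and e = -1 for \(x_i^{-1}\); points are \(0, \ldots , n-1\)). It records whether each query is free or forced and, as in Lemma A.1, answers a free query uniformly outside the known domain of the inverse letter. It does not test for coincidences, since that depends on the displayed definition, which is not reproduced here.

```python
import random


class LazyPermutations:
    """Reveal uniformly random permutations x_0, ..., x_{k-1} of {0, ..., n-1} query by query."""

    def __init__(self, n, k, rng):
        self.n, self.rng = n, rng
        self.forward = [{} for _ in range(k)]   # revealed values v -> x_i(v)
        self.backward = [{} for _ in range(k)]  # revealed values u -> x_i^{-1}(u)

    def query(self, letter, v):
        """Answer the query (letter, v); return (result, is_free)."""
        i, e = letter
        if e == 1:
            known, other = self.forward[i], self.backward[i]
        else:
            known, other = self.backward[i], self.forward[i]
        if v in known:
            return known[v], False  # v in the known domain of the letter: forced choice
        # free choice: uniform outside the known domain of the inverse letter
        u = self.rng.choice([x for x in range(self.n) if x not in other])
        known[v], other[u] = u, v
        return u, True


def trajectory(oracle, word, v):
    """Points v^0, ..., v^ell, with v^t the result of the query (w_t, v^{t-1}), plus free flags."""
    points, free = [v], []
    for letter in word:
        v, is_free = oracle.query(letter, v)
        points.append(v)
        free.append(is_free)
    return points, free


def joint_trajectory(oracle, word, starts):
    """Joint trajectory of an r-tuple, with queries ordered lexicographically by (t, i)."""
    current, history = list(starts), [tuple(starts)]
    for letter in word:  # outer loop over steps t
        current = [oracle.query(letter, v)[0] for v in current]  # inner loop over i
        history.append(tuple(current))
    return history


oracle = LazyPermutations(n=10, k=2, rng=random.Random(0))
word = [(0, 1), (1, 1), (0, -1), (1, -1)]  # hypothetical example word
points, free = trajectory(oracle, word, 3)
print(points, free)
```

Re-running the same trajectory afterwards makes every step forced and reproduces the same points, which is a quick consistency check on the bookkeeping.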
The probability of small support
For \(g \in S_n\), define
In this section we show that if w is a short word then almost surely \({\text {fix}}{{\overline{w}}}\) is small. The following lemma is similar to the argument used in [12, Lemma 2.2]; the only difference is that the set R is fixed in advance.
Lemma A.3
Let \(G = S_n\). Let \(R \subseteq \Omega \) be a subset of size r. Let \(w \in F_k\) be a nontrivial word of length \(\ell < n / r\). Then
Proof
Let \(R = \{v_1, \ldots , v_r\}\) and consider the joint trajectory of \(v_1, \ldots , v_r\). By Lemma A.2, we can have \(\overline{w} R = R\) only if there is at least one coincidence in each individual trajectory. We take a union bound over all possibilities for when the coincidences could occur. By Lemma A.1, the conditional probability that step (t, i) is a coincidence is bounded by
indeed there are at most \(\ell r\) previous points (if \(t = \ell \), assuming \(v_j^\ell \in R\) for \(j < i\)). There are \(\ell ^r\) possibilities for when the first coincidences might occur. Hence the claimed bound holds. \(\square \)
Theorem A.4
There is a constant \(c>0\) such that the following holds for all \(f \ge 0\). Let \(G = S_n\), and let \(w \in F_k\) be a nontrivial word of reduced length \(\ell < c f^{1/2}\). Then
Proof
Let \(x_1, \ldots , x_k\) be chosen independently and uniformly from G. Let \(F = {\text {fix}}{{\overline{w}}}\). By the lemma, for any subset \(R \subseteq \Omega \) of size r (for \(r < n /\ell \)) we have
Therefore, by a union bound,
Since \(x\mapsto \binom{x}{r}\) is increasing for \(x > r\), for \(r < f/2\) we have
Take \(r \sim f / (4 \ell ^2)\). The conclusion is
for some constant \(c > 0\). \(\square \)
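Theorem A.4 concerns the number of fixed points of \({{\overline{w}}}\) for short words. For very small n one can instead compute \({\mathbf {E}}\,{\text {fix}}{{\overline{w}}}\) exactly by enumerating all of \(S_n^k\). The Python sketch below assumes a word is a list of letters (i, e), applied in the listed order (a hypothetical convention), and is only feasible for n up to about 5. For example \({\mathbf {E}}\,{\text {fix}}\,x = 1\) and \({\mathbf {E}}\,{\text {fix}}\,x^2 = 2\) once \(n \ge 2\), so proper powers have more fixed points on average.

```python
import itertools
from fractions import Fraction


def evaluate(word, perms):
    """Evaluate a word, given as a list of letters (i, e), on perms; the first letter acts first."""
    n = len(perms[0])
    inverses = [sorted(range(n), key=lambda v, p=p: p[v]) for p in perms]
    g = list(range(n))
    for i, e in word:
        s = perms[i] if e == 1 else inverses[i]
        g = [s[v] for v in g]
    return g


def expected_fixed_points(word, n, k):
    """Exact E[fix(w(x_1, ..., x_k))] over uniform independent x_i in S_n (tiny n only)."""
    perms = list(itertools.permutations(range(n)))
    total = count = 0
    for xs in itertools.product(perms, repeat=k):
        g = evaluate(word, xs)
        total += sum(1 for v in range(n) if g[v] == v)
        count += 1
    return Fraction(total, count)


x, x_inv, y, y_inv = (0, 1), (0, -1), (1, 1), (1, -1)
print(expected_fixed_points([x], 4, 1))                   # 1
print(expected_fixed_points([x, x], 4, 1))                # 2
print(expected_fixed_points([x, y, x_inv, y_inv], 4, 2))  # a commutator
```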
Remark A.5
If \(\ell < c \log \log n\), a stronger bound is proved in [35, Sect. 2].
Expected values of characters
A notable difference between \(S_n\) and \({\text {Cl}}_n(q)\) is that \(S_n\) has several low-degree characters: for example, the irreducible component of the standard representation has degree \(n-1\). However, we can show that the expected value of \(\chi ({{\overline{w}}}) / \chi (1)\) is smaller than \(\chi (1)^{-c}\) using the Larsen–Shalev character bound [34]. For most characters, \(\chi (1)\) is exponentially large in n, so this bound is similar in strength to Theorem 5.2. In applications, low-degree characters may have to be treated separately (as in the next section).
Theorem A.6
Let \(G = S_n\). Let \(w \in F_k\) be a fixed nontrivial word of reduced length \(\ell \). Then, for any \(f \ge C \ell ^2\),
In particular, taking \(f = n^{1/2}\), for \(\ell < cn^{1/4}\) we have
Proof
By conditioning on whether or not \({\text {fix}}{{{\overline{w}}}} \ge f\), we have
The first term is bounded by Theorem A.4. The second term is bounded by [34, Theorem 1.3]. \(\square \)
The following corollary follows exactly as in Sect. 5.
Corollary A.7
There is a constant \(c > 0\) such that the following holds. Let w be the result of a simple random walk of length \(\ell < cn^{1/4}\) in \(F_k\). Then
Expansion in low-degree representations: a brief survey
Let \(G = S_n\), let \(x_1, \ldots , x_k \in G\) be random, where \(k \ge 2\) and bounded, and consider the action of \(x_1, \ldots , x_k\) on \(\Omega = \{1, \ldots , n\}\). The resulting Schreier graph is one of the standard models for a random 2k-regular graph, and the spectral properties of this graph are well studied. The earliest results on the combinatorial expansion of bounded-degree random graphs essentially coincide with the dawn of expansion, beginning with Barzdin–Kolmogorov and Pinsker (see Gromov–Guth [17, Sect. 1.2] for some history), and such results are equivalent to lower bounds on the spectral gap by the discrete Cheeger inequality (due to Dodziuk and Alon–Milman): see Kowalski [31, Sect. 4.1].
Such bounds are weak, however. The strongest results on the spectral gap of a random regular graph are based on the trace method, which is an adaptation of Wigner’s proof of the semicircle law to the bounded-degree setting. These results begin with Broder and Shamir [8]. Let \(\rho \) be the spectral radius of \({\mathcal {A}}\) on \({\mathbf {C}}[\Omega ]_0\). Broder and Shamir proved that
In particular, \(\rho \) is bounded away from 1 as long as k is large enough. On the other hand, there is a deterministic lower bound
usually attributed to Alon and Boppana. The conjecture, due to Alon, that almost surely
remained open for some time, but was finally and famously settled by Friedman, using an ingenious elaboration of the trace method: see [16] for the proof, and for much more background. (See also Bordenave [7] for a simplified proof.)
The trace method also generalizes well, unlike the pure “counting” proof of expansion. Consider the action of \({\mathcal {A}}\) on \({\mathbf {C}}[\left( {\begin{array}{c}\Omega \\ r\end{array}}\right) ]\) for bounded r. This action was studied by Friedman–Joux–Roichman–Stern–Tillich [15], who showed that there is almost surely a uniform spectral gap. Their method is an elaboration of the Broder–Shamir method, and was the direct inspiration for the argument of Sects. 8 and 9. We quote their result here; it will be used in the next section:
Theorem A.8
Let \(G = S_n\), and \(x_1, \ldots , x_k \in G\) random. Let \(\rho = \rho ({\mathcal {A}}, {\mathbf {C}}[\left( {\begin{array}{c}\Omega \\ r\end{array}}\right) ]_0)\) be the spectral radius of \({\mathcal {A}}= {\mathcal {A}}_{x_1, \ldots , x_k}\) acting on \({\mathbf {C}}[\left( {\begin{array}{c}\Omega \\ r\end{array}}\right) ]_0\). Then, for fixed k, r, and \(\epsilon > 0\),
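To see what Theorem A.8 measures, the sketch below estimates \(\rho\) numerically for a small random instance by power iteration. It is a hypothetical illustration: the function names, the use of \({\mathcal A}^2\) in the iteration, and the reading of the subscript 0 as "mean-zero functions" are our assumptions (if the paper uses a finer invariant subspace, its spectral radius can only be smaller). A finite experiment of course says nothing about the asymptotic statement itself.

```python
import itertools
import math
import random


def random_permutations(n, k, rng):
    """k independent uniform permutations of range(n), as lists p with i -> p[i]."""
    perms = []
    for _ in range(k):
        p = list(range(n))
        rng.shuffle(p)
        perms.append(p)
    return perms


def spectral_radius_on_subsets(n, k, r, iters=300, seed=0):
    """Estimate rho(A) on mean-zero functions on the r-subsets of range(n).

    A = (1/2k) * sum_i (P_i + P_i^{-1}), where P_i moves r-subsets by x_i.
    A is symmetric and fixes constants, so it preserves the mean-zero space.
    """
    rng = random.Random(seed)
    perms = random_permutations(n, k, rng)
    inverses = []
    for p in perms:
        inv = [0] * n
        for i, image in enumerate(p):
            inv[image] = i
        inverses.append(inv)
    subsets = [frozenset(c) for c in itertools.combinations(range(n), r)]
    index = {s: i for i, s in enumerate(subsets)}
    # Each vertex has 2k neighbours: its images under x_i and x_i^{-1}.
    nbrs = [[index[frozenset(q[j] for j in s)] for q in perms + inverses]
            for s in subsets]
    size, degree = len(subsets), 2 * k

    def apply_a(v):
        return [sum(v[j] for j in nbrs[i]) / degree for i in range(size)]

    def project(v):  # orthogonal projection onto mean-zero functions
        mean = sum(v) / size
        return [x - mean for x in v]

    v = project([rng.gauss(0.0, 1.0) for _ in range(size)])
    estimate = 0.0
    for _ in range(iters):
        norm_v = math.sqrt(sum(x * x for x in v))
        if norm_v == 0.0:
            return 0.0
        v = [x / norm_v for x in v]
        # A^2 is positive semidefinite, so ||A^2 v|| for unit v tends to rho^2
        # from below without the sign oscillation of a negative top eigenvalue.
        w = project(apply_a(apply_a(v)))
        estimate = math.sqrt(math.sqrt(sum(x * x for x in w)))
        v = w
    return estimate
```

For example, `spectral_radius_on_subsets(40, 3, 1)` can be compared with `math.sqrt(5) / 3`; the estimate is a lower approximation that increases with `iters`.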
Diameter with respect to 3 random elements
Let \(G = S_n\). Let \(x_1, \ldots , x_k \in G\) be random, and let \(S = \{x_1^{\pm 1}, \ldots , x_k^{\pm 1}\}\). Helfgott, Seress, and Zuk [26] showed that, if \(k \ge 2\), then with high probability
\[{\text {diam}}\, {\text {Cay}}(\langle S \rangle , S) \le n^2 (\log n)^{O(1)}.\]
We show in this section that if \(k \ge 3\) then with high probability
\[{\text {diam}}\, {\text {Cay}}(\langle S \rangle , S) = O(n^2 \log n).\]
While this is only a modest improvement, it is interesting for being conjecturally sharp for any proof which uses elements of small support as a stepping stone: it seems unlikely that an element of small support can be obtained in \(o(n \log n)\) steps on average, and a generic element of \(A_n\) cannot be written as a product of \(o(n)\) elements of small support.
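The second half of this heuristic is a support count. Write \(\operatorname{supp}(g)\) for the set of points moved by g, and take "small support" to mean support of size at most some fixed s (our reading of the phrase). Then:

```latex
% The support of a product lies inside the union of the supports of its factors.
\[
\operatorname{supp}(g_1 g_2 \cdots g_m) \subseteq \bigcup_{i=1}^{m} \operatorname{supp}(g_i)
\quad\Longrightarrow\quad
|\operatorname{supp}(g_1 \cdots g_m)| \le m s .
\]
```

So an element moving at least \(n-1\) points, such as an n-cycle or an \((n-1)\)-cycle, needs \(m \ge (n-1)/s\) factors of support at most s.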
The argument is most closely related to the argument of Schlage-Puchta [40], which shows that for \(k = 2\) the diameter is bounded by \(O(n^3 \log n)\). We get a saving for \(k \ge 3\) by replacing the \(xy^i\) trick with the more powerful \(x w(y, z)\) trick.
Alternative 1
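Write
\[n = n' + r + 5,\]
where \(3 \not \mid n'\) and \(r \in \{4, 5\}\) (this is possible since \(n' = n - r - 5\) takes two consecutive values as r ranges over \(\{4, 5\}\)). Let \({\mathfrak {C}}\subseteq S_n\) be the normal subset of all elements whose cycle type is either \((1, 1, 3, r, n')\) or \((2, 3, r, n')\). Note that, for n large,
\[\frac{|{\mathfrak {C}}|}{|G|} = \frac{1}{3 r n'} \gg \frac{1}{n},\]
while if \({\text {sgn}}\) is the sign character then
\[\langle 1_{\mathfrak {C}}, {\text {sgn}}\rangle = 0.\]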
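The density of \({\mathfrak C}\) and the vanishing of \(\langle 1_{\mathfrak C}, {\text{sgn}}\rangle = |G|^{-1}\sum_{g \in {\mathfrak C}} {\text{sgn}}(g)\) both follow from the class-size formula \(n!/\prod_i i^{m_i} m_i!\). The two cycle types have equal class sizes, but their numbers of cycles differ by one, so their signs are opposite and cancel. A hypothetical check with exact fractions (the function names are ours):

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod


def class_density(cycle_type):
    """|class| / n! for a cycle type: 1 / prod_i (i^{m_i} * m_i!), m_i = multiplicity of i."""
    counts = Counter(cycle_type)
    return Fraction(1, prod(i ** m * factorial(m) for i, m in counts.items()))


def sign(cycle_type):
    """Sign of a permutation with this cycle type: (-1)^(n - number of cycles)."""
    return (-1) ** (sum(cycle_type) - len(cycle_type))


def density_and_sign_pairing(r, n_prime):
    """Return (|C|/|G|, <1_C, sgn>) for C = classes (1,1,3,r,n') and (2,3,r,n').

    Assumes r, n' >= 4 and r != n', so that the two cycle types are as written.
    """
    types = [(1, 1, 3, r, n_prime), (2, 3, r, n_prime)]
    density = sum(class_density(t) for t in types)
    pairing = sum(sign(t) * class_density(t) for t in types)
    return density, pairing
```

With \(r = 4\), \(n' = 7\) (so \(n = 16\)) each class has density \(1/168\), and the pair gives \(|{\mathfrak C}|/|G| = 1/84 = 1/(3 r n')\).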
Let \(x, y, z \in G\) be random. Then by Theorem 6.1 with \(f = 1_{\mathfrak {C}}\) and Corollary A.7, if E is the event that every word \(u \in F_2\) of length at most \(\ell < c n^{1/4}\) satisfies \(x u(y, z) \notin {\mathfrak {C}}\) and w is the result of a simple random walk of length \(2\ell \) in \(F_2\),
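The event E concerns the elements \(x u(y, z)\). For intuition only, the following brute-force sketch searches reduced words u of length at most \(\ell\) for one with \(x u(y, z) \in {\mathfrak C}\). All helper names are hypothetical, and this is not the paper's argument, which bounds the probability of E via Theorem 6.1 rather than searching. Permutations are Python lists with \(i \mapsto p[i]\); the choice of product convention only changes \(x u(y,z)\) up to conjugation and word reversal, to which the search over all words and membership in the normal subset \({\mathfrak C}\) are insensitive.

```python
import random


def compose(a, b):
    """Product a*b of permutations of range(n), applying b first: i -> a[b[i]]."""
    return [a[bi] for bi in b]


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return inv


def cycle_type(p):
    """Sorted tuple of cycle lengths of p (fixed points included)."""
    seen, lengths = [False] * len(p), []
    for start in range(len(p)):
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = p[j]
            length += 1
        if length:
            lengths.append(length)
    return tuple(sorted(lengths))


def in_class_C(p, r, n_prime):
    """Membership in C: cycle type (1,1,3,r,n') or (2,3,r,n')."""
    targets = {tuple(sorted((1, 1, 3, r, n_prime))), tuple(sorted((2, 3, r, n_prime)))}
    return cycle_type(p) in targets


def _letters(y, z):
    # Letters of F_2 = <y, z>: +1/-1 for y^{+-1}, +2/-2 for z^{+-1}.
    return {1: y, -1: inverse(y), 2: z, -2: inverse(z)}


def evaluate(word, y, z):
    """The permutation u(y, z) for a word u given as a tuple of letters."""
    gens, value = _letters(y, z), list(range(len(y)))
    for g in word:
        value = compose(value, gens[g])
    return value


def find_short_word(x, y, z, max_len, r, n_prime):
    """Breadth-first search over reduced words u, |u| <= max_len, with x*u(y,z) in C."""
    gens = _letters(y, z)
    frontier = [((), list(range(len(x))))]  # pairs (word u, permutation u(y, z))
    for length in range(max_len + 1):
        for word, value in frontier:
            if in_class_C(compose(x, value), r, n_prime):
                return word
        if length < max_len:
            frontier = [(word + (g,), compose(value, perm))
                        for word, value in frontier
                        for g, perm in gens.items()
                        if not (word and word[-1] == -g)]
    return None
```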
Fixing \(\ell = \left\lfloor {C \log n}\right\rfloor \) for a sufficiently large constant C, we have, for sufficiently large n,
Let \({\mathcal {X}}\) be the set of characters \(\chi \in {\text {Irr}}{G}\) such that \(\chi (1) < n^{1000}\). The part of the sum (18) with \(\chi \notin {\mathcal {X}}\) is bounded by
Now consider some \(\chi \in {\mathcal {X}}\). Let \(\pi \in {\mathfrak {C}}\). It follows from the Murnaghan–Nakayama rule (splitting off an \(n'\)-cycle) that \(\chi (\pi ) = O(1)\). Hence
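The Murnaghan–Nakayama rule is easy to run by machine. The sketch below uses the standard abacus (beta-number) formulation, in which removing an m-rim hook corresponds to moving one bead from a position b to an empty position \(b - m\), with sign \((-1)^{\text{leg length}}\). The function names are hypothetical; this is a textbook implementation, not code from the paper.

```python
from functools import lru_cache


def beta_set(partition, length):
    """Beta-numbers lambda_i + (length - i) of a partition padded to `length` parts."""
    parts = list(partition) + [0] * (length - len(partition))
    return frozenset(p + length - 1 - i for i, p in enumerate(parts))


@lru_cache(maxsize=None)
def _mn(beta, cycle_type):
    if not cycle_type:
        return 1  # every box removed: the empty partition
    m, rest = cycle_type[0], cycle_type[1:]
    total = 0
    for b in beta:
        target = b - m
        # Removing an m-rim hook = sliding one bead from b to the empty slot b - m.
        if target < 0 or target in beta:
            continue
        # The leg length of the rim hook is the number of beads jumped over.
        leg = sum(1 for c in beta if target < c < b)
        total += (-1) ** leg * _mn((beta - {b}) | {target}, rest)
    return total


def character(partition, cycle_type):
    """chi_lambda at a permutation of the given cycle type (Murnaghan-Nakayama)."""
    if sum(partition) != sum(cycle_type):
        raise ValueError("partition and cycle type must have the same size")
    return _mn(beta_set(partition, len(partition)), tuple(cycle_type))
```

For \(\lambda = (n-1, 1)\) it reproduces \(\chi(\pi) = \#\{\text{fixed points of } \pi\} - 1\): on the two cycle types of \({\mathfrak C}\) with \(r = 4\), \(n' = 7\) it returns 1 and -1. Listing the long cycle first, as in the proof, keeps the recursion small.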
It follows from the hook length formula that \(|{\mathcal {X}}| = O(1)\). Hence, since \(\langle 1_{\mathfrak {C}}, {\text {sgn}}\rangle = 0\),
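The hook length formula \(\chi_\lambda(1) = n!/\prod_{\square \in \lambda} h(\square)\) can be sketched as follows (a hypothetical helper; partitions are weakly decreasing tuples).

```python
from math import factorial


def hook_length_dimension(partition):
    """chi_lambda(1) = n! / (product of hook lengths) for a partition lambda of n."""
    n = sum(partition)
    # conjugate[j] = length of column j = number of rows longer than j
    conjugate = ([sum(1 for row in partition if row > j) for j in range(partition[0])]
                 if partition else [])
    hooks = 1
    for i, row in enumerate(partition):
        for j in range(row):
            arm = row - j - 1           # boxes to the right of (i, j)
            leg = conjugate[j] - i - 1  # boxes below (i, j)
            hooks *= arm + leg + 1
    return factorial(n) // hooks
```

For instance, it gives \(n - 1\) for \(\lambda = (n-1, 1)\) and \(n(n-3)/2\) for \(\lambda = (n-2, 2)\), illustrating how the degree grows roughly like a power of n governed by \(n - \lambda_1\) (or, by symmetry, \(n - \lambda_1'\)).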
(the main term coming from the characters of degree \(n-1\)). Hence, from (18),
We conclude that with high probability there is a word \(w \in F_3\) of length \(O(\log n)\) such that \(w(x, y, z) \in {\mathfrak {C}}\). Hence there is a word \(w' = w^{2 r n'}\) of length \(O(n \log n)\) such that \(w'(x, y, z)\) is a 3-cycle. With high probability the conjugation action of x, y, z on the set of 3-cycles has a uniform spectral gap (by Theorem A.8), so it follows that every 3-cycle is a word in x, y, z of length \(O(n \log n)\). Thus every element of \(A_n\) is a word in x, y, z of length \(O(n^2 \log n)\).
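The step from w to \(w' = w^{2rn'}\) rests on how cycles behave under powers: an m-cycle raised to the power e splits into \(\gcd(m, e)\) cycles of length \(m/\gcd(m, e)\). The hypothetical helper below applies this rule to a cycle type. With \(e = 2rn'\), the cycles of lengths 1, 2, r and \(n'\) collapse to fixed points, while the 3-cycle survives because \(3 \nmid 2rn'\).

```python
from math import factorial, gcd


def power_cycle_type(cycle_type, e):
    """Cycle type of pi**e: each m-cycle splits into gcd(m, e) cycles of length m // gcd(m, e)."""
    result = []
    for m in cycle_type:
        g = gcd(m, e)
        result.extend([m // g] * g)
    return sorted(result)


def nontrivial_cycles(cycle_type):
    """The cycle lengths greater than 1."""
    return sorted(m for m in cycle_type if m > 1)


# Usage: r = 4, n' = 7, exponent 2 r n' = 56; both cycle types of C become a 3-cycle.
print(nontrivial_cycles(power_cycle_type((1, 1, 3, 4, 7), 56)))
print(nontrivial_cycles(power_cycle_type((2, 3, 4, 7), 56)))
```

The same computation with the exponent \(r!\, n'\) used in Alternative 2 leaves exactly the 101-cycle, since 101 is a prime exceeding r and \(101 \nmid n'\), while every cycle of length at most r, and the \(n'\)-cycle, is killed.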
Alternative 2
The crude bound \(n^{-1/5}\) for the probability can be improved as follows. Write
\[n = n' + r + 101,\]
where \(101 \not \mid n'\) and \(r \in \{99, 100\}\). Let \({\mathfrak {C}}\subseteq S_n\) be the normal subset of all elements having both a 101-cycle and an \(n'\)-cycle (the remaining part is an arbitrary element of \(S_r\)). Assuming \(n' > 101\),
\[\frac{|{\mathfrak {C}}|}{|G|} = \frac{1}{101\, n'} \gg \frac{1}{n},\]
and as before we have \(\langle 1_{\mathfrak {C}}, {\text {sgn}}\rangle = 0\). In fact, \(\langle 1_{\mathfrak {C}}, \chi \rangle = 0\) for all nontrivial low-degree \(\chi \).
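For completeness, here is a count giving the density of \({\mathfrak C}\) and the identity \(\langle 1_{\mathfrak C}, {\text{sgn}}\rangle = 0\), under the stated assumption \(n' > 101\) (so \(r < 101 < n'\)). It is our own derivation, not a reproduction of the paper's display.

```latex
% An element of C is a 101-cycle, a disjoint n'-cycle, and any sigma in Sym of the
% remaining r points; since r < 101 < n', this decomposition is unique.
\[
|\mathfrak C| = \frac{n!}{101 \cdot n' \cdot r!}\cdot r! = \frac{n!}{101\, n'},
\qquad\text{so}\qquad
\frac{|\mathfrak C|}{|G|} = \frac{1}{101\, n'} \gg \frac{1}{n}.
\]
% A 101-cycle is even and an n'-cycle has sign (-1)^{n'-1}.
\[
\langle 1_{\mathfrak C}, \operatorname{sgn}\rangle
= \frac{1}{n!}\sum_{g \in \mathfrak C}\operatorname{sgn}(g)
= \frac{(-1)^{n'-1}}{101\, n'}\cdot\frac{1}{r!}\sum_{\sigma \in S_r}\operatorname{sgn}(\sigma)
= 0 \qquad (r \ge 2).
\]
```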
Lemma A.9
If \(1 \ne \chi \in {\text {Irr}}G\) and \(\langle 1_{\mathfrak {C}}, \chi \rangle \ne 0\), then \(\chi (1) \gg n^{98}\).
Proof
It is well known that the characters of \(S_n\) are parameterized by partitions \(\lambda \vdash n\). Let \(\chi = \chi _\lambda \) be a character such that \(\langle 1_{\mathfrak {C}}, \chi \rangle \ne 0\). By the Murnaghan–Nakayama rule, \(\lambda \) must be obtainable from \((r)\) by adding a 101-rim hook and an \(n'\)-rim hook. Hence if \(\chi \) is nontrivial and n is sufficiently large then \(\lambda _1 \le n - 100\) and \(\lambda _1' \le n - 98\). From the hook length formula it follows that, for sufficiently large n,
\[\chi (1) \gg n^{98}.\]
\(\square \)
It follows as before that, with probability at least
there is a word \(w \in F_3\) of length \(O(\log n)\) such that \(w(x, y, z) \in {\mathfrak {C}}\). Hence there is a word \(w' = w^{r! n'}\) of length \(O(n \log n)\) such that \(w'(x, y, z)\) is a 101-cycle. By Theorem A.8 (and inspecting the proof), the conjugation action of x, y, z on the set of 101-cycles has spectral gap at least \(\delta \) with probability at least
Taking \(\delta = 1/\log n\) (say), it follows that every 101-cycle is a word in x, y, z of length \(O(n \log n)\), and hence the diameter of \({\text {Cay}}(\langle S \rangle , S)\) is \(O(n^2 \log n)\), with probability
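The passage from a spectral gap \(\delta\) to short words can be made explicit by a standard argument (our sketch, not quoted from the paper). Let V be the set of 101-cycles and A the averaging operator of the conjugation action of \(x^{\pm 1}, y^{\pm 1}, z^{\pm 1}\) on V. Since A is symmetric and doubly stochastic, a spectral radius at most \(1 - \delta\) on mean-zero functions gives:

```latex
% Write delta_v = 1/|V| + f with f mean-zero and ||f|| <= 1; then |(A^t f)(u)| <= (1 - delta)^t.
\[
|V| = \binom{n}{101}\cdot 100! \le n^{101},
\qquad
\Bigl| A^t(u,v) - \tfrac{1}{|V|} \Bigr| \le (1-\delta)^t \le e^{-\delta t}
\quad (u, v \in V),
\]
\[
A^t(u,v) > 0 \ \text{ once } \ t > \frac{\log |V|}{\delta},
\qquad
\delta = \frac{1}{\log n} \ \Longrightarrow\ t = O\bigl((\log n)^2\bigr).
\]
```

Thus each 101-cycle has the form \(g\, w'(x,y,z)\, g^{-1}\) with g a word of length \(O((\log n)^2)\), which is a word of length \(O(n \log n) + O((\log n)^2) = O(n \log n)\).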
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Eberhard, S., Jezernik, U. Babai’s conjecture for high-rank classical groups with random generators. Invent. math. (2021). https://doi.org/10.1007/s00222-021-01065-x