Babai's conjecture for high-rank classical groups with random generators

Let $G = \mathrm{SCl}_n(q)$ be a quasisimple classical group with $n$ large, and let $x_1, \dots, x_k \in G$ be chosen uniformly at random, where $k \geq q^C$ for a suitable absolute constant $C$. We show that the diameter of the resulting Cayley graph is bounded by $q^2 n^{O(1)}$ with probability $1 - o(1)$. In the particular case $G = \mathrm{SL}_n(p)$ with $p$ a prime of bounded size, we show that the same holds for $k = 3$.


Introduction
Let G be a group and S a symmetric (S = S^{-1}) subset of G. Write Cay(G, S) for the associated Cayley graph: the graph whose vertices are the elements g ∈ G and whose edges are the pairs {g, sg} with g ∈ G, s ∈ S. The graph Cay(G, S) is connected if and only if S generates G, and its diameter is equal to the smallest d such that (S ∪ {1})^d = G. A well-known conjecture of Babai [BS92] states that diam Cay(G, S) = (log |G|)^{O(1)}, uniformly over all nonabelian finite simple groups G and symmetric generating sets S. In other words, every connected Cayley graph of a nonabelian finite simple group has diameter within a power of the trivial lower bound.
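For intuition, the diameter of a small Cayley graph can be computed directly by breadth-first search from the identity. The following sketch is our own illustration (not part of the paper's argument), for S_4 with a 4-cycle and a transposition as a symmetric generating set.

```python
from itertools import permutations
from collections import deque

def mul(a, b):                       # compose permutations: (a*b)(i) = a[b[i]]
    return tuple(a[b[i]] for i in range(len(b)))

def cayley_diameter(group, S):
    """BFS from the identity: the diameter is the smallest d with (S ∪ {1})^d = G."""
    e = tuple(range(len(group[0])))
    dist = {e: 0}
    queue = deque([e])
    while queue:
        g = queue.popleft()
        for s in S:
            h = mul(s, g)            # edge {g, sg}
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    assert len(dist) == len(group)   # the graph is connected iff S generates G
    return max(dist.values())

cycle = (1, 2, 3, 0)                 # the 4-cycle (0 1 2 3)
cycle_inv = (3, 0, 1, 2)             # its inverse
swap = (1, 0, 2, 3)                  # the transposition (0 1), self-inverse
S4 = list(permutations(range(4)))
print(cayley_diameter(S4, [cycle, cycle_inv, swap]))
```

The same BFS applies verbatim to Schreier graphs by replacing group elements with points of the orbit.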
By the classification of finite simple groups, Babai's conjecture splits into essentially three broad cases:
1. groups of Lie type of bounded rank over F_q with q → ∞;
2. classical groups of unbounded rank over F_q with q arbitrary;
3. alternating groups A_n with n → ∞.
S. Eberhard has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 803711). U. Jezernik has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 741420).
For groups of Lie type and bounded rank, Babai's conjecture is now completely resolved, following breakthrough work of Helfgott [Hel08], Pyber–Szabó [PS16], and Breuillard–Green–Tao [BGT11]. In the other two cases the conjecture remains open. For the alternating groups, Helfgott and Seress [HS14] proved that diam Cay(A_n, S) = exp(O((log n)^4 log log n)). A key open case is the family of groups SL_n(2) with n tending to infinity.
In all cases, an important subproblem is the case of random generators (see, e.g., [Lub10, Problem 10.8.6]). Let k ≥ 2 be a small constant and let S = {x_1^{±1}, …, x_k^{±1}}, where x_1, …, x_k ∈ G are uniform and independent. For groups of Lie type of bounded rank, it was proved by Breuillard, Green, Guralnick, and Tao [BGGT15] that Cay(G, S) is almost surely^1 an expander, and in particular diam Cay(G, S) = O(log |G|).
There is no consensus about whether such a strong bound is likely to hold for groups of unbounded rank. Babai's conjecture for A_n and random generators was an open problem for some time; the first polynomial bound was proved by Babai and Hayes.
In this paper we consider the case of high-rank classical groups over a small field. Recall that these are obtained from the groups GL_n(q), Sp_n(q), GO_n^{(±)}(q), GU_n(q) of automorphisms of a finite vector space V = F_q^n, in the latter three cases equipped with a nondegenerate alternating, quadratic, or hermitian form, respectively. Throughout we write GCl_n(q) for any of these groups, and SCl_n(q) for the corresponding derived subgroup SL_n(q), Sp_n(q), Ω_n^{(±)}(q), SU_n(q).
1 Throughout the paper, we use the terms "almost surely" or "with high probability" to mean with probability 1 − o(1) as the relevant parameters tend to infinity.
We will write Cl_n(q) for any intermediate group: SCl_n(q) ≤ Cl_n(q) ≤ GCl_n(q).
Omitting a few small exceptional cases, SCl_n(q) is a quasisimple group, so Babai's conjecture applies.^2 For SCl_n(q) with n large and random generators, the best known bound is just the uniform bound (1).
There is a promising programme of Pyber, which aims to prove Babai's conjecture in three steps. The programme is motivated by the positive solution in the case of random generators in alternating groups, especially the result of Babai–Beals–Seress [BBS04] that diam Cay(A_n, S) ≤ n^{O(1)} provided only that S contains an element of degree at most n/(3 + ε). Here the degree of a permutation is the number of non-fixed points. Analogously, the degree of an element g ∈ GL_n(q) is defined to be the rank of g − 1, and Pyber's programme is the following.
… suppose that k ≥ q^{Cr^3}. Then almost surely the Schreier graph of G generated by x_1, …, x_k on any of its orbits in W has a uniform spectral gap.
As we will explain, this implies that if we have an element of minimal degree then by conjugation we can rapidly obtain a full conjugacy class of elements of minimal degree, and it follows in short order that the diameter of G is not too large. This completes the proof of Babai's conjecture for SCl n (q) for k random generators, as long as k is sufficiently large compared to q.
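Since the notion of degree drives the whole programme, here is a small self-contained illustration of ours (not from the paper): deg g = rank(g − 1) computed over a prime field by Gaussian elimination. Transvections attain the minimal degree 1.

```python
def rank_mod_p(M, p):
    """Row-reduce M over F_p and return its rank."""
    M = [row[:] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] % p), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], p - 2, p)        # inverse mod the prime p
        M[r] = [x * inv % p for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] % p:
                f = M[i][c]
                M[i] = [(a - f * b) % p for a, b in zip(M[i], M[r])]
        r += 1
    return r

def degree(g, p):
    """deg g = rank(g - 1), the linear analogue of a permutation's degree."""
    n = len(g)
    return rank_mod_p([[(g[i][j] - (i == j)) % p for j in range(n)]
                       for i in range(n)], p)

# A transvection in SL_3(5): the identity plus one off-diagonal entry.
t = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
print(degree(t, 5))   # transvections have the minimal degree 1
```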
Theorem 1.4. There are constants c, C > 0 so that the following holds. Let G = Cl_n(q), where n > C. Let x_1, …, x_k be elements of G chosen uniformly at random, where k > q^C, and let S = {x_1^{±1}, …, x_k^{±1}}. Then with probability 1 − q^{−cn} we have ⟨S⟩ ≥ SCl_n(q), and diam Cay(⟨S⟩, S) ≤ q^2 n^C.
Corollary 1.5. Babai's conjecture holds in the following two cases: (1) SL_n(p), p prime and bounded, and at least 3 random generators; (2) SCl_n(q) and at least q^C random generators, where C is an absolute constant.
Our method does not depend on the classification of finite simple groups (CFSG) in any way. Having a CFSG-free method is valuable for transparency, but moreover we think it is essential for attacking Babai's conjecture. It is well-known that two random elements of SCl n (q) almost surely generate the group: this is a result of Kantor and Lubotzky [KL90]. Kantor and Lubotzky rely on CFSG through Aschbacher's theorem, so unfortunately their method does not adapt well to proving diameter bounds. By contrast, in [EV20] the first author and Virchow found a CFSG-free proof in the case of SL n (q) and expressed the hope that the method would be generalizable. We recycle several ideas from that paper in the present one.
Perhaps the most important idea in our method is that if x, y, z ∈ G are random and independent, then the elements xw(y, z) for all short words w ∈ F_2 behave roughly independently, which allows us to imitate having many more than just 3 generators. This is a more powerful version of the "xy^i trick", which comes originally from [BBS04, Section 4] and has been essential in all subsequent work on the random generator subproblem in high rank.
Let us mention one further result, of independent interest. In the appendix we give analogous arguments for A_n, based on the standard fanciful idea that A_n = PSL_n(1). The value of doing so is mostly motivational, but we also obtain a new result. Provided k ≥ 3, we sharpen (2) to diam Cay(A_n, S) ≤ O(n^2 log n). This is a modest improvement, but it is interesting for being conjecturally sharp for any proof which uses elements of small support as a stepping stone. Decreasing the exponent 2 appears to require a radically new idea.
Reader's guide. We first record some preliminaries (Section 2) regarding asymptotic notation, Cayley and Schreier graphs, classical groups and their associated formed spaces and the notions of degree and support, and adjacency operators.
Next we turn to a more specialized preparatory section (Section 3) dealing with word maps, where we introduce the vocabulary of queries, coincidences, and trajectories. Briefly, the idea is that if w ∈ F_k is a given word, v ∈ V a given vector, and x_1, x_2, …, x_k ∈ G random, then evaluating w(x_1, …, x_k)v can be thought of as a kind of random walk. As much as possible we recycle the key language used by [FJR+98] in the case of the symmetric group. The tools of this section will be used in two essentially different ways in the rest of the paper.
We proceed (Section 4) by showing that a given short word w evaluated at random elements x_1, …, x_k ∈ G almost surely has large support (Theorem 4.2). This is a kind of antithesis to step 1 of Pyber's programme: all sufficiently short words in random generators will in fact fail to have degree (1 − ε)n. However, this is interesting when combined with recent character bounds of Guralnick–Larsen–Tiep [GLT20, GLT19], as it implies that the character ratio χ(w(x_1, …, x_k))/χ(1) is almost surely small for each nonlinear character χ (Corollary 5.3).
This bound on the expectation of χ(w(x_1, …, x_k))/χ(1) is one of the two main ingredients in the "xw(y, z) trick", which is the subject of Section 6. This trick shows that, given random generators x_0, x_1, …, x_k, one can almost surely find a short word x_0 w(x_1, …, x_k) lying in a given normal subset C ⊆ G, provided that the density of C is large compared to the expected values of character ratios. The trick is a simple consequence of the second moment method, following the observation that the elements x_0 w(x_1, …, x_k) for various w are approximately pairwise independent.
The other main ingredient is the construction of an appropriate normal set C. This is the subject of Section 7. For each classical group we find a large normal set C, all of whose fibres over G^ab are large (allowing us to ignore linear characters), and a small integer m such that for every g ∈ C the power g^m has minimal degree in SCl_n(q). This completes the proof of Theorem 1.1.
Once we have an element of minimal degree, we can act on that element by conjugation. Since the minimal degree in all cases is at most 2, this action is a constituent of the usual permutation action on 4-tuples of vectors. We analyze this action by again using the language of trajectories and coincidences, and the trace method: we bound a high moment of the second eigenvalue by bounding the trace of the corresponding power of the adjacency matrix, interpreting the latter in terms of closed trajectories. This is analogous to a result for the symmetric group due to Friedman, Joux, Roichman, Stern, and Tillich [FJR+98], building on earlier work of Broder–Shamir [BS87]. However, in the case of classical groups there are some extra combinatorial complications that do not arise for symmetric groups.
We first focus (Section 8) on describing the structure of a closed trajectory with only one coincidence. We deal with the motivational case of G acting on V first, and then generalize to the action on tuples of vectors.
These results are then (Section 9) used to show that, in an orbit of G of size N , the probability that a trajectory closes is close to 1/N , with a small relative error. Again we first deal with the motivational case of G acting on V . Provided that we have sufficiently many generators in terms of q, these bounds are good enough for the trace method to work. This completes the proof of Theorem 1.3.
Finally, in Section 10 we collect results and deduce Theorems 1.2 and 1.4. Many (but not all) of our arguments have natural analogues for the symmetric group. For independent interest and for motivation, these are presented in Appendix A.
Acknowledgments. We thank László Pyber, Endre Szabó, and Péter Varjú for helpful discussions. Thanks are due to Emmanuel Breuillard and Bob Guralnick for discussions pertaining to the low-degree representation theory of SCl n (q), and to Aner Shalev for discussions about character bounds. We thank Zoltán Halasi for sharing the preprint [Hal20]. We would also like to thank two anonymous referees for a thorough inspection of the paper and suggesting many improvements.

Preliminaries
This section fixes some notation and definitions that will be relevant throughout the paper. The reader needing an introduction to expansion, particularly in Cayley and Schreier graphs, could consult Kowalski [Kow19]. For an introduction to classical groups, see Aschbacher [Asc00, Chapter 7] or Grove [Gro02].
2.1. Asymptotic notation. Many of the arguments we will use are of an asymptotic nature, and we adopt standard asymptotic notation to state them. Given functions f, g, we write f ≪ g or equivalently f = O(g) to denote that there are absolute constants N, C > 0 so that |f(n)| ≤ C·g(n) for all n ≥ N. Let f ≍ g mean that f ≪ g and g ≪ f. We write f = o(g) to denote that for every ε > 0 there is N > 0 such that |f(n)| ≤ ε·g(n) for all n ≥ N. We will generally write statements that involve anonymous (usually absolute) constants by using c for small constants and C for big constants.
2.2. Cayley and Schreier graphs. Let G be a group with generating set S satisfying S = S^{-1}. The (undirected, left) Cayley graph Cay(G, S) is the graph whose vertices are the elements of G and whose edges are the pairs {g, sg} for g ∈ G, s ∈ S.
More generally, the (undirected) Schreier graph Sch(G, S, Ω) associated to a transitive action of G on a set Ω is the graph whose vertices are elements of Ω and whose edges are pairs {ω, sω} for ω ∈ Ω, s ∈ S. Cayley graphs are Schreier graphs for the left regular representation of G on itself.
Let Γ be a connected graph. One can view Γ as a metric space in the following way. Define the length of a path in Γ to be the number of edges on the path, and let the distance d_Γ(v_1, v_2) between any two vertices v_1, v_2 ∈ V(Γ) be the length of the shortest path between v_1 and v_2. The diameter of Γ is the maximum distance between any two of its vertices.
2.3. Classical groups. Throughout the paper we write SCl_n(q) ≤ GCl_n(q) ≤ GL_n(q) for any of the following groups: GCl_n(q): GL_n(q), Sp_n(q), GO_n^{(±)}(q), GU_n(q); SCl_n(q): SL_n(q), Sp_n(q), Ω_n^{(±)}(q), SU_n(q).
In all cases the defining module is V = F_q^n. We sometimes refer to the first case as the linear case. We make the following conventions in the other cases (notation in other literature sometimes differs, particularly in the GU case): Sp_n: n must be even. GO_n^{(±)}: Ω_n(q) = SO_n(q)′. If n is even there are two possibilities, denoted GO_n^+(q) and GO_n^-(q), depending on the choice of quadratic form. If n is odd there is only GO_n(q), and q must be odd. GU_n: q must be a square q_0^2. The field automorphism of F_q of order 2 is denoted θ.
We write Cl_n(q) for any intermediate group: SCl_n(q) ≤ Cl_n(q) ≤ GCl_n(q). Note that any such group corresponds to a subgroup of the abelianization GCl_n(q)^ab; for example, GO_n^±(q)^ab ≅ C_2 when q is even and n is even. In the unitary case we write Q for the function Q(v) = f(v, v), which we may regard as a quadratic form over F_{q_0}. In the other cases, apart from the orthogonal case, define Q ≡ 0; in even characteristic, Q is not determined by f, but is part of the defining data (and f is determined by Q via (3)). Define also q_0 = q in the orthogonal case and q_0 = 1 in the linear and symplectic cases, so that Q always takes values in a q_0-element space. It is important that we are able to count solutions to Q(v) = x in any affine subspace (cf. [Dic01, Chapter IV]): in an affine subspace of codimension s the number of solutions is q^{n−s}/q_0 + O(q^{n/2}) (Lemma 2.1). This is trivial in the linear and symplectic cases: Q ≡ 0, so the number is exactly q^{n−s}. The unitary case reduces to the orthogonal case by restriction of scalars, so it suffices to consider the orthogonal case, where the count is a Fourier computation: the Fourier coefficient of the relevant indicator function Φ at a nontrivial character χ involves a sum over w which vanishes unless h ∈ W^⊥, and dim W^⊥ = s, so the claim follows by Fourier inversion. Relatedly, we have Witt's lemma, which characterizes the orbits of GCl_n(q) in terms of f and Q.
Lemma 2.2 (Witt's lemma). Let u_1, …, u_k, v_1, …, v_k ∈ V be such that u_i ↦ v_i defines an isometric isomorphism ⟨u_1, …, u_k⟩ → ⟨v_1, …, v_k⟩. Then there is an element g ∈ GCl_n(q) such that gu_i = v_i for each 1 ≤ i ≤ k. If k ≤ n − 2 there is such an element in SCl_n(q).
2.5. Degree and support. The concepts of degree and support are essential in the rest of the paper. Both concepts are analogous to the size of the support of a permutation, defined as the set of non-fixed points. The degree of an element g ∈ GL_n(q) is deg g = rank(g − 1); the support of g ∈ GL_n(q) is supp g = min_{λ ∈ F̄_q} rank(g − λ) (the former definition follows [BY17] and [HMPQ19]; the latter definition follows Larsen–Shalev–Tiep [LST11]). Equivalently, if V_λ = ker(g − λ) denotes the λ-eigenspace of g (for λ ∈ F̄_q), then supp g = n − max_λ dim V_λ. Support is closely related to the size of the centralizer, as in the following lemma.
Lemma 2.3. For g ∈ G ≤ GL_n(q), |C_G(g)| ≤ q^{n(n − supp g)}.
Proof. Note that C_{M_n(F_q)}(g) is a vector space over F_q, so it will suffice to bound its dimension. Consider g as an element of GL_n(F̄_q) and decompose it into Jordan blocks. For each eigenvalue λ of g, let π_λ be the partition whose parts are the sizes of the Jordan blocks associated to λ. Denote by S_i(π) the sum of ith powers of the parts of a partition π, and let π′ be the transposed partition of π. By [Hum95], dim C_{M_n(F̄_q)}(g) = Σ_λ S_2(π′_λ). The largest part of π′_λ is the dimension of V_λ, so S_2(π′_λ) ≤ (dim V_λ) S_1(π′_λ), and summing over λ gives dim C_{M_n(F̄_q)}(g) ≤ n max_λ dim V_λ = n(n − supp g).
2.6. Adjacency operator. Given any group G and x_1, …, x_k ∈ G, let A = A_{x_1,…,x_k} = (1/2k) Σ_{i=1}^k (x_i + x_i^{−1}). This is an element of the group algebra C[G]. Given any C[G]-module W, we may consider the action of A on W. Since A is self-adjoint its spectrum is real. Write ρ(A, W) for the spectral radius of A.
We say the action of x_1, …, x_k on Ω is expanding if the spectral gap is bounded away from zero. This is equivalent to rapid mixing of the random walk on Ω.
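As a toy illustration of ours (with the operator normalized as a Markov averaging operator, our convention for this sketch): the Schreier graph of two generators of SL_2(3) acting on the 8 nonzero vectors of F_3^2. Power iteration on the mean-zero subspace estimates ρ(A, L^2_0), and a value strictly below 1 reflects a spectral gap.

```python
from itertools import product
import math

p = 3
def act(M, v):                          # matrix action on a vector over F_p
    return tuple(sum(M[i][j] * v[j] for j in range(2)) % p for i in range(2))

def inv2(M):                            # inverse of a 2x2 matrix over F_p
    a, b, c, d = M[0][0], M[0][1], M[1][0], M[1][1]
    di = pow((a * d - b * c) % p, p - 2, p)
    return [[d * di % p, -b * di % p], [-c * di % p, a * di % p]]

x1 = [[1, 1], [0, 1]]
x2 = [[1, 0], [1, 1]]
gens = [x1, inv2(x1), x2, inv2(x2)]     # symmetric generating set
Omega = [v for v in product(range(p), repeat=2) if v != (0, 0)]

def A(f):                               # (Af)(v) = average of f over neighbours sv
    return {v: sum(f[act(g, v)] for g in gens) / len(gens) for v in Omega}

# Power iteration on the mean-zero subspace estimates rho(A, L^2_0).
f = {v: float(i) for i, v in enumerate(Omega)}
rho = 0.0
for _ in range(300):
    m = sum(f.values()) / len(Omega)    # project away the constant functions
    f = {v: x - m for v, x in f.items()}
    norm = math.sqrt(sum(x * x for x in f.values()))
    f = {v: x / norm for v, x in f.items()}
    g = A(f)
    rho = math.sqrt(sum(x * x for x in g.values()))   # ||Af|| with ||f|| = 1
    f = g
print(round(rho, 3))
```

The action is transitive and the generators have fixed vectors (self-loops), so the chain is aperiodic and the estimate settles strictly below 1.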
Usually, but not always, x_1, …, x_k will be chosen randomly. The following lemma is often useful for reducing to the cyclically reduced case.
Lemma 3.1. If x_1, …, x_k ∈ G are uniform and independent, then w(x_1, …, x_k) is just the image of w under a uniformly random homomorphism F_k → G. In particular, the distribution of w(x_1, …, x_k) depends only on the automorphism class of w.
3.2. Queries and coincidences. Let G = Cl_n(q) be a classical group and V = F_q^n the defining module. Let x_1, …, x_k ∈ G. Define a query to be a pair (ξ, v), where ξ ∈ {ξ_1^{±1}, …, ξ_k^{±1}} and v ∈ V; the result of the query is ξv. If, after a finite sequence of queries (w_1, v_1), …, (w_{t−1}, v_{t−1}), the vector v_t lies in the known domain of the letter w_t, then the result w_t v_t is determined already by the values of w_1 v_1, …, w_{t−1} v_{t−1}; we call this a forced choice. Otherwise, we say the query is a free choice.
Let R be some subset of V fixed in advance. If a query (w_t, v_t) is a free choice and yet its result lies in the span of R together with the previously known vectors, then we say the result of the query is a coincidence. The language is most interesting when x_1, …, x_k ∈ G are chosen randomly. Then, by Witt's lemma, whenever (ξ, v) is a free choice, ξv is, conditionally on the results of previous queries, uniformly distributed among vectors satisfying the relevant independence and form conditions. In particular, coincidences are unlikely. We formalize these key points in the following lemmas.
Lemma 3.2. Let x ∈ G be uniformly random, and let u_1, …, u_t be linearly independent, where t ≤ n − 2. Then, conditionally on the values of v_1 = xu_1, …, v_{t−1} = xu_{t−1}, the value of xu_t is uniformly distributed among vectors v_t such that u_i ↦ v_i defines an isometric isomorphism ⟨u_1, …, u_t⟩ → ⟨v_1, …, v_t⟩, or in other words such that v_t ∉ span{v_1, …, v_{t−1}} and v_t satisfies the relevant form conditions f(v_i, v_t) = f(u_i, u_t) (i < t) and Q(v_t) = Q(u_t).
Proof. For each such v_t, Witt's lemma asserts that there is at least one suitable x ∈ G. The distribution is uniform by the orbit-stabilizer theorem.
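In the linear case this uniformity can be checked exhaustively for a tiny group (our illustration, not from the paper): conditioning on x·u_1 = v_1 for uniform x ∈ GL_2(3), the value x·u_2 is uniform over the vectors outside span{v_1}.

```python
from itertools import product
from collections import Counter

p = 3
def act(M, v):                           # matrix-vector product over F_p
    return tuple(sum(M[i][j] * v[j] for j in range(2)) % p for i in range(2))

GL2 = [[[a, b], [c, d]] for a, b, c, d in product(range(p), repeat=4)
       if (a * d - b * c) % p]           # all invertible 2x2 matrices over F_3

u1, u2 = (1, 0), (0, 1)
v1 = (1, 1)
# Condition on x*u1 == v1 and tally the conditional distribution of x*u2.
tally = Counter(act(x, u2) for x in GL2 if act(x, u1) == v1)
span_v1 = {(0, 0), (1, 1), (2, 2)}
outside = [v for v in product(range(p), repeat=2) if v not in span_v1]
print(sorted(tally.items()))             # each vector outside span{v1} equally often
```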
Lemma 3.3. Let x_1, …, x_k ∈ G be uniformly random and independent, and consider a sequence of queries as above. Then, conditionally on the values of w_1 v_1, …, w_{t−1} v_{t−1}, the result w_t v_t of the query (w_t, v_t), if it is a free choice, is uniformly distributed among the vectors permitted by the independence and form conditions. In particular, the conditional probability that w_t v_t is a coincidence is bounded by q^d / (q^{n−s}/q_0 − q^s − q^{n/2}) (provided the denominator is positive), where d is the dimension of the span of R together with the results of the previous queries, and s is the number of i < t with w_i ∈ {w_t, w_t^{−1}}.
Proof. The first part of the lemma is immediate from the previous lemma. For the second part, note that w_t v_t is drawn from an affine subspace of codimension at most s, less a subspace of dimension at most s, subject only to the quadratic condition; by Lemma 2.1 there are at least q^{n−s}/q_0 − q^{n/2} − q^s possibilities, of which at most q^d are coincidences, so we get the claimed bound.
Remark 3.4. In the linear case there are no form conditions, so we get the simpler bound q^d/(q^n − q^s) for the probability of a coincidence.
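The simpler linear bound can be compared with an exact count in a tiny case (our illustration): with R = {e_1}, a first query (x, e_1) for uniform x ∈ GL_2(3) is a coincidence when x·e_1 ∈ span{e_1}, which happens with probability (q − 1)/(q^n − 1) = 1/4, below the bound q^d/(q^n − q^s) = 3/8 with d = 1, s = 0.

```python
from itertools import product

p, n = 3, 2
def act(M, v):                            # matrix-vector product over F_p
    return tuple(sum(M[i][j] * v[j] for j in range(n)) % p for i in range(n))

GL2 = [[[a, b], [c, d]] for a, b, c, d in product(range(p), repeat=4)
       if (a * d - b * c) % p]

e1 = (1, 0)
span_e1 = {(0, 0), (1, 0), (2, 0)}
# Count x in GL_2(3) whose first query result x*e1 falls back into span R = span{e1}.
coincidences = sum(1 for x in GL2 if act(x, e1) in span_e1)
prob = coincidences / len(GL2)
bound = p**1 / (p**n - p**0)              # q^d/(q^n - q^s) with d = 1, s = 0
print(prob, bound)
```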
3.3. Trajectories. Let w ∈ F_k be a word of length ℓ, say w = w_ℓ ⋯ w_1 with each w_t ∈ {ξ_1^{±1}, …, ξ_k^{±1}}, and let v ∈ V. Set v^0 = v and v^t = w_t v^{t−1}; the trajectory of v is the sequence of queries (w_1, v^0), …, (w_ℓ, v^{ℓ−1}). The following lemma is trivial but essential.
Lemma 3.5. Suppose v ≠ 0 and v^ℓ ∈ span R. Then there is at least one coincidence in the trajectory of v.
More generally, for any r ≥ 1 we consider the joint trajectory of an r-tuple v_1, …, v_r ∈ V, which is simply the r-tuple of individual trajectories, with the queries (w_t, v_i^{t−1}) ordered lexicographically by (t, i); i.e., we answer the queries (w_1, v_1^0), …, (w_1, v_r^0), (w_2, v_1^1), …, (w_2, v_r^1), …, (w_ℓ, v_r^{ℓ−1}). The following lemma generalizes the previous one.
Lemma 3.6. Suppose v_1, …, v_r are nonzero and v_i^ℓ ∈ span R for each i. Then there is at least one coincidence in the trajectory of each v_i (during the joint trajectory of v_1, …, v_r).
Proof. The argument is that of Lemma 3.5, applied to the trajectory of each v_i within the joint trajectory (results v_{i′}^{t′} with t′ = t and i′ < i get included because they are results of previous queries).

The probability of small support
Let G be a finite group, let w ∈ F_k, let x_1, …, x_k ∈ G be random, and consider w = w(x_1, …, x_k). The probability that w = 1 quantifies the extent to which w is "almost a law" in G. This probability is a well-studied quantity, particularly when G is simple. For example, it is known that for any w ≠ 1 there is some ε(w) > 0 such that P(w = 1) ≪_w |G|^{−ε(w)} for every finite simple group G. For groups of large rank (our particular interest), the following bounds have been proved recently; let ℓ > 0 be the reduced length of w.
1. For G = A_n or G = S_n: if ℓ < cn^{1/2}, then P(w = 1) tends to zero rapidly with n.
2. For any classical group G = Cl_n(q): if ℓ < cn, then P(w = 1) tends to zero rapidly with n.
The proofs of these estimates can be adapted to show more, namely that with high probability w has large support. In this section we explain this observation in detail in the case of G = Cl_n(q). For the case of G = A_n or G = S_n, see the appendix (Subsection A.2).
The following lemma generalizes a key step from the argument of [LS19, Theorem 4].
Lemma 4.1. Let G = Cl_n(q) be a classical group of dimension n. Let V = F_q^n be the natural module, and let U ≤ V be a subspace of dimension r ≤ n − 2. Let w ∈ F_k be a nontrivial word of length ℓ ≤ (n/2 − 2)/r. Then P(w(x_1, …, x_k)U = U) ≤ (4 q_0 q^{(2ℓ+1)r−n})^r.
Proof. Let v_1, …, v_r be a basis for U. Consider the joint trajectory of v_1, …, v_r. By Lemma 3.6 with R = {v_1, …, v_r}, we can have wU = U only if there is at least one coincidence in each individual trajectory. We take a union bound over all possibilities for when the coincidences could occur. If t < ℓ, then by Lemma 3.3, the probability that step (t, i) is a coincidence is bounded by q^{(t+1)r} / (q^{n−ℓr}/q_0 − q^{ℓr} − q^{n/2}): indeed there are at most tr + i ≤ (t + 1)r ≤ ℓr previous vectors. If t = ℓ, assuming v_j^ℓ ∈ U for j < i, we actually get a slightly stronger bound of the same shape. Summing over t, the probability that there is a coincidence in the trajectory of v_i is bounded by 4 q_0 q^{(2ℓ+1)r−n}. Taking the product over i gives the claimed bound.
In the following proof we will refer to the "q-binomial coefficient", defined by [x choose r]_q = ∏_{i=0}^{r−1} (q^{x−i} − 1)/(q^{r−i} − 1). When x is a nonnegative integer this is the number of r-dimensional subspaces of F_q^x. For x ≥ r note that x ↦ [x choose r]_q is increasing and nonnegative, and [x choose r]_q ≤ 4 q^{r(x−r)}. The following theorem will be used for an unspecified, but fixed, δ > 0.
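The q-binomial coefficient is easy to compute and to sanity-check against a direct subspace count (our illustration, not from the paper):

```python
def q_binomial(x, r, q):
    """Gaussian binomial [x choose r]_q = prod_{i<r} (q^{x-i} - 1)/(q^{r-i} - 1)."""
    num = den = 1
    for i in range(r):
        num *= q**(x - i) - 1
        den *= q**(r - i) - 1
    return num // den

def count_2dim_subspaces(n, q):
    # ordered pairs of independent vectors, divided by ordered bases per plane
    return ((q**n - 1) * (q**n - q)) // ((q**2 - 1) * (q**2 - q))

print(q_binomial(4, 2, 2), count_2dim_subspaces(4, 2))   # both count planes in F_2^4
```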
Theorem 4.2. There are constants c, C > 0 such that the following holds for all δ > 0. Let G = Cl_n(q) be a classical group of dimension n, and let w ∈ F_k be a nontrivial word of reduced length ℓ < δ^2 n/20. Assume q^{δn} > C. Then P(supp w(x_1, …, x_k) ≤ (1 − δ)n) ≤ q^{−cδ^2 n}.
Proof. Let x_1, …, x_k be chosen independently and uniformly from G, and write w = w(x_1, …, x_k). Suppose some eigenspace V_λ ≤ F̄_q^n of w has dimension at least δn. Let d be the degree of λ over F_q, and let Λ be the set of d Galois conjugates of λ; note d ≤ 1/δ, since the conjugate eigenspaces all have the same dimension. Given an r-dimensional subspace W ≤ V_λ, there is a conjugate subspace W′ ≤ V_{λ′} for each λ′ ∈ Λ, and the sum U = Σ_{λ′∈Λ} W′ is dr-dimensional and F_q-rational since it is fixed by the Galois group, so it may be identified with a dr-dimensional subspace of V. Since U ∩ V_λ = W, this correspondence W ↦ U is injective. Hence the number of dr-dimensional subspaces of V preserved by w is at least [δn choose r]_{q^d}. Since ℓd ≤ ℓ/δ < δn/20, we may choose an integer r > 0 such that ℓdr ∈ [δn/5, δn/4]. Now by the previous lemma and Markov's inequality, the probability that the number of dr-dimensional subspaces of V preserved by w is at least [δn choose r]_{q^d} is at most [n choose dr]_q (4 q_0 q^{(2ℓ+1)dr−n})^{dr} / [δn choose r]_{q^d}. Assuming q^{δn} is sufficiently large, the first two factors are negligible compared to the third, and taking the sum over all d ≤ 1/δ gives the claimed bound.
Remark 4.3. The restriction ℓ < cδ^2 n in Theorem 4.2 is essential, and related to our reliance on linear algebra. For example, let G = SL_n(q), and suppose w is a word of length ℓ ≈ 10n. We do not know how to bound P(w = 1) satisfactorily. Is it true that P(w = 1) ≤ q^{−cn} for some c > 0? Certainly w cannot be a law, because SL_n(q) contains SL_2(q^{⌊n/2⌋}) and the shortest law in SL_2(q^{⌊n/2⌋}) has length at least (q^{⌊n/2⌋} − 1)/3 (see Hadad [Had11, Theorem 2]). The question is whether it can be an almost-law.

Expected values of characters
Throughout this section let G = Cl_n(q) be a classical group and χ ∈ Irr G a nonlinear character. Our aim is to bound the expected value of |χ(w)|/χ(1) when w is a fixed nontrivial word of length at most cn, evaluated at random x_1, …, x_k ∈ G.
The proof consists of two steps: 1. By the previous section, with high probability w has large support. 2. By recent character bounds of Guralnick, Larsen, and Tiep [GLT20, GLT19], if w has large support then |χ(w)| ≤ χ(1)^ε.
We first deal with elements of large support.
Theorem 5.2. There is a constant c > 0 such that the following holds. Let w ∈ F_k be a fixed nontrivial word of reduced length less than cn. Then E|χ(w)|/χ(1) ≤ q^{−cn}.
Proof. Let δ be as in the previous lemma with ε = 1/2. By conditioning on whether or not supp w < (1 − δ)n, we have E|χ(w)|/χ(1) ≤ P(supp w < (1 − δ)n) + max_{supp g ≥ (1−δ)n} |χ(g)|/χ(1). It follows from Theorem 4.2 that P(supp w < (1 − δ)n) ≤ q^{−c_1 n} for some constant c_1 > 0. The other summand is bounded by Lemma 5.1: for g of support at least (1 − δ)n we have |χ(g)|/χ(1) ≤ χ(1)^{−1/2} ≤ q^{−c_2 n} for some constant c_2 > 0. (Here we used χ(1) ≥ q^{c_3 n}: see [LS74].)
Our main interest is the case in which w is the result of a simple random walk in F_k. With high probability the result of the random walk is nontrivial, so we can apply the above theorem.
Corollary 5.3. There is a constant c > 0 such that the following holds. Let w be the result of a simple random walk of length ℓ < cn in F_k. Then E|χ(w(x_1, …, x_k))|/χ(1) ≤ q^{−cn} + k^{−cℓ}.
Proof. By conditioning on whether or not the word w is trivial, we get E|χ(w)|/χ(1) ≤ max_{w′ ≠ 1} E|χ(w′(x_1, …, x_k))|/χ(1) + P(w = 1). The first term is bounded by Theorem 5.2. The second term is the return probability of a simple random walk on a 2k-regular tree, which is at most k^{−cℓ} for a constant c > 0 (see [Kes59]).

The xw(y, z) trick
In this section, something of an interlude, let G be any finite group, and let C be a normal (i.e., conjugation-closed) subset of G. We will develop a criterion ensuring that one can, with high probability as x, y, z ∈ G are chosen uniformly at random, find a word w ∈ F_2 of at most a prescribed length such that xw(y, z) ∈ C. The criterion applies to sets C whose density is large compared to the expected values of characters. This is a variation of the technique used in [EV19, Section 4]; see also [EV20, Section 2].
The following theorem expresses the most general such estimate we will need, in which we further allow arbitrary weights to be attached to elements of C. We express the result in terms of a nonnegative conjugation-invariant function (class function) f on G. We define the L^p norm of f by ‖f‖_p = (E_{g∈G} |f(g)|^p)^{1/p}, and we use the standard inner product ⟨f_1, f_2⟩ = E_{g∈G} f_1(g) f_2(g)̄ on functions on G.

Theorem 6.1. Let f be a nonnegative and conjugation-invariant function on G, and let ℓ be a positive integer. Let x_0, x_1, …, x_k be elements of G chosen uniformly at random. Let E be the event that f(x_0 u) = 0 for every word u ∈ F_k of length at most ℓ. Let w be the result of a simple random walk of length 2ℓ in F_k. Then

P(E) ≤ ‖f‖_1^{−2} Σ_{χ≠1} |⟨f, χ⟩|^2 E_{x_1,…,x_k,w} χ(w)/χ(1).

In particular,

P(E) ≤ (‖f‖_2^2 / ‖f‖_1^2) max_{χ≠1} E_{x_1,…,x_k,w} χ(w)/χ(1).

(Note that the distribution of w is symmetric, so E_{x_1,…,x_k,w} χ(w)/χ(1) is real.)

Proof. Let A = A_{x_1,…,x_k} be the adjacency operator defined in Subsection 2.6, and consider its natural action on L^2(G). Let X = (A^ℓ f)(x_0), regarded as a random variable dependent on x_0, x_1, …, x_k, and note that E is precisely the event X = 0. By Chebyshev's inequality,

P(X = 0) ≤ (EX^2 − (EX)^2)/(EX)^2.   (4)

The first moment is EX = E f(x_0 w′) = ‖f‖_1, where w′ is the result of a simple random walk of length ℓ in F_k. The second moment is

EX^2 = E ⟨f, τ_w f⟩,   (5)

where τ_x is the translation operator defined by τ_x f(g) = f(gx), and w is the result of a simple (symmetric) random walk of length 2ℓ in F_k. Since f is a class function we can expand this further in terms of characters: by orthogonality of characters, ⟨f, τ_x f⟩ = Σ_χ |⟨f, χ⟩|^2 χ(x)/χ(1). Hence, from (5), EX^2 = Σ_χ |⟨f, χ⟩|^2 E χ(w)/χ(1). The χ = 1 term is ‖f‖_1^2, which is the same as (EX)^2. Hence the first part of the theorem follows from (4). The second part holds because Σ_χ |⟨f, χ⟩|^2 = ‖f‖_2^2.

Corollary 6.2. Let C be a normal subset of G. Write C = ⋃_{α∈G^ab} C_α,
where C_α = C ∩ αG′ is the fibre of C over α ∈ G^ab. Let δ_α = |C_α|/|G′| be the fibre density, and let δ = min_{α∈G^ab} δ_α. Assume δ > 0. Let x_0, x_1, …, x_k ∈ G be chosen uniformly at random, and let E be the event that for every word u ∈ F_k of length at most ℓ we have x_0 u ∉ C. Let w be the result of a simple random walk of length 2ℓ in F_k. Then

P(E) ≤ δ^{−1} max_{χ nonlinear} |E_{x_1,…,x_k,w} χ(w)/χ(1)|.

Proof. In the previous theorem, take f = Σ_{α∈G^ab} δ_α^{−1} 1_{C_α}. Then ‖f‖_1 = 1, and ⟨f, χ⟩ = 0 for every nontrivial linear character χ, since f has the same average on every coset of G′. Moreover ‖f‖_2^2 = E_{α∈G^ab} δ_α^{−1} ≤ δ^{−1}, so the claim follows from the first part of Theorem 6.1, with the sum running over nonlinear characters only.

Obtaining an element of minimal degree
Let G = GCl_n(q). Let s be the minimal degree of a nontrivial element of SCl_n(q); thus s = 2 in the orthogonal case and s = 1 otherwise. Let M be the set of elements of SCl_n(q) of degree s. In this section we exhibit a large normal subset C_d ⊆ G, depending on an integer parameter d, whose (q^d − 1)th powers (up to the factor κ below) are contained in M. We will use C_d in combination with Corollaries 5.3 and 6.2 to obtain an element of minimal degree as a short word in random generators.
Proposition 7.1. There is a constant C > 0 so that the following holds. Let d ∈ [2, n] be an integer parameter. Assume q^d > Cn. Then there is a normal subset C_d ⊆ G with the following properties.
(1) For every α ∈ G^ab, if C_{d;α} is the fibre of C_d over α, then |C_{d;α}|/|G′| ≥ q^{−Cd^2} n^{−Cn/d}. (2) For every g ∈ C_d, we have g^{κ(q^d−1)} ∈ M, where κ = 2 if G is orthogonal in even characteristic, and κ = 1 otherwise.
The proof is split into cases depending on the type of G.
7.1. The linear case. Let G = GL_n(q). In this case M is the set of transvections. Let V be the natural module for G. Write n − 3 = kd + r (0 ≤ r < d), i.e., let k = ⌊(n − 3)/d⌋ and r = n − 3 − kd, and fix a decomposition V = L ⊕ V_1 ⊕ ⋯ ⊕ V_k ⊕ R ⊕ W with dim L = 2, dim V_i = d, dim R = 1, and dim W = r. Fix a basis for each of the subspaces. We now define a particular element g ∈ GL(V) respecting the above decomposition, by giving its action on the chosen basis for each of the subspaces.
Subspace L: Let g act as a transvection on L, say by the matrix (1 1; 0 1). Subspace V_i: Identify V_i with F_q[t]/(p_i(t)), where p_i is a monic irreducible polynomial of degree d; the variable t acts on the latter space by multiplication. Let g act on V_i as multiplication by t. Note that the minimal polynomial of this transformation is p_i. Subspace R: Let g act as multiplication by a scalar λ ∈ F_q^×, chosen so that g lies over the prescribed α ∈ G^ab (identifying G^ab with F_q^× via the determinant). Subspace W: Let g act trivially.
Let I_d denote the set of monic irreducible polynomials of degree d over F_q. For every tuple p_1, …, p_k ∈ I_d with p_i ≠ p_{i′} for i ≠ i′ and every α ∈ G^ab, we thus have an element g = g_{p_1,…,p_k;α} ∈ G. Let g^G_{p_1,…,p_k;α} denote the conjugacy class of g_{p_1,…,p_k;α} (this class does not depend on the order of p_1, …, p_k). Let C_{d;α} be the union of these classes over all admissible p_1, …, p_k, and let C_d = ⋃_{α∈G^ab} C_{d;α}. The union is disjoint, because the minimal polynomial of each element of g^G_{p_1,…,p_k;α} is divisible by p_1(t) ⋯ p_k(t) (the other factors are (t − 1)^2 and (t − λ)).
Remark 7.2. This is a variation of the construction in [EV20, Section 3.2].
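The key mechanism — the companion-matrix blocks die when raised to the power q^d − 1, leaving only the transvection block — can be verified directly in a small case (our illustration: q = 2, d = 3, a single block V_1, and no R or W blocks).

```python
def matmul(A, B, p):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]

def matpow(A, e, p):
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    while e:
        if e & 1:
            R = matmul(R, A, p)
        A = matmul(A, A, p)
        e >>= 1
    return R

def rank_mod2(M):
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] % 2), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] % 2:
                M[i] = [(a + b) % 2 for a, b in zip(M[i], M[r])]
        r += 1
    return r

q, d = 2, 3
T = [[1, 1], [0, 1]]                        # transvection block on L
comp = [[0, 0, 1], [1, 0, 1], [0, 1, 0]]    # companion matrix of t^3 + t + 1 over F_2
n = 5
g = [[0] * n for _ in range(n)]             # block-diagonal g = T (+) comp
for i in range(2):
    for j in range(2):
        g[i][j] = T[i][j]
for i in range(3):
    for j in range(3):
        g[2 + i][2 + j] = comp[i][j]

h = matpow(g, q**d - 1, 2)                  # companion block has order dividing q^d - 1
deg = rank_mod2([[(h[i][j] - (i == j)) % 2 for j in range(n)] for i in range(n)])
print(deg)                                  # g^{q^d - 1} has degree 1: a transvection
```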
Proof of Proposition 7.1 for GL. By construction, C_{d;α} is the fibre of C_d over α, and for every p_1, …, p_k, α we have g_{p_1,…,p_k;α}^{q^d−1} ∈ M. It remains only to estimate the density of C_{d;α}.
For g = g_{p_1,…,p_k;α}, we have (as in the proof of Lemma 2.3)

|C_G(g)| ≤ q^{dk+Cd^2},   (6)

since each block V_i contributes at most q^d to the centralizer and the remaining blocks L, R, W have total dimension at most d + 3.

Recall that the number of monic irreducible polynomials of degree d over F_q is |I_d| = q^d/d − O(q^{d/2}/d). In particular, by the hypothesis q^d > Cn we have |I_d| > k, and in fact (|I_d| choose k) ≥ (q^d/(2d))^k/k!. Hence, from (6), since r < d, |C_{d;α}|/|G′| ≥ (|I_d| choose k) q^{−dk−Cd^2} ≥ q^{−Cd^2} n^{−Cn/d}. This proves the proposition.
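The count of monic irreducibles used above follows Gauss's formula |I_d| = (1/d) Σ_{e|d} μ(e) q^{d/e}, which is simple to implement (our illustration):

```python
def mobius(n):
    """Möbius function by trial division."""
    result, i = 1, 2
    while i * i <= n:
        if n % i == 0:
            n //= i
            if n % i == 0:
                return 0          # squarefull
            result = -result
        i += 1
    return -result if n > 1 else result

def num_irreducibles(q, d):
    """|I_d| = (1/d) * sum over e | d of mu(e) * q^(d/e)."""
    return sum(mobius(e) * q**(d // e) for e in range(1, d + 1) if d % e == 0) // d

print(num_irreducibles(2, 3), num_irreducibles(2, 4), num_irreducibles(3, 2))
```

The leading term q^d/d and error O(q^{d/2}/d) quoted in the proof are immediate from the formula.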
7.2. The other classical cases. In the remaining cases V carries a form, and we use the Witt decomposition V = H ⊥ V_an, where H is an orthogonal direct sum of hyperbolic planes and V_an is anisotropic, and dim V_an ≤ 2 by the Chevalley–Warning theorem. Let δ = dim V_an + 4 + 2κ, where κ = 2 if G is orthogonal in even characteristic, and κ = 1 otherwise. Let D = 2d and write n − δ = kD + r (0 ≤ r < D), i.e., let k = ⌊(n − δ)/D⌋ and r = n − δ − kD. Write the hyperbolic space H as H = L ⊥ V_1 ⊥ ⋯ ⊥ V_k ⊥ R ⊥ W′, where each constituent is an orthogonal direct sum of hyperbolic planes with dim L = 2κ + 2, dim V_i = D, dim R = 2, and dim W′ = r. Let W = W′ ⊥ V_an. Thus we have the following orthogonal decomposition of V: V = L ⊥ V_1 ⊥ ⋯ ⊥ V_k ⊥ R ⊥ W. (7) Fix a hyperbolic basis for each of the hyperbolic spaces, and fix a basis for W. We now define a particular element g ∈ GCl(V) respecting the decomposition (7). As before we will define g by its action on the chosen bases.
Subspace V_i: Let v_1, …, v_d, w_1, …, w_d be the chosen hyperbolic basis for V_i, so that there is a decomposition of V_i into complementary totally isotropic subspaces V_{i,1} and V_{i,2}. Identify V_{i,1} with F_q[t]/(p_i(t)); the variable t acts on the latter space by multiplication. By Witt's lemma, this action extends to the space V_i. This extension is moreover unique provided we demand that it preserves the decomposition of V_i (see [Hup80, Hilfssatz 3.1]). Let g|_{V_i} be defined by this unique extension. The minimal polynomial of this transformation can be determined as follows (see [Wal63]). In the symplectic and orthogonal cases, let p*(t) = p(0)^{−1} t^d p(t^{−1}). In the unitary case, let p*(t) = p^θ(0)^{−1} t^d p^θ(t^{−1}), where θ acts on the coefficients. The minimal polynomial of g acting on V_i is ∗-symmetric, divisible by p_i (since p_i is irreducible), and hence also divisible by p_i*. Under the assumption that p_i ≠ p_i*, the minimal polynomial of g|_{V_i} must therefore be equal to p_i p_i*. If p_i = p_i* then the minimal polynomial is p_i alone, and we will avoid this case. Symplectic case: Let g act trivially on R. (Note G^ab is trivial.) Unitary case: Let g act on R as the matrix diag(a, a^{−θ}) with respect to the chosen hyperbolic basis, where a ∈ F_q satisfies a^{1−θ} ∏_{i=1}^k p_i(0)^{1−θ} = det α. Such an element always exists since det α has norm 1.
Orthogonal case: The natural map GO(R) ab → G ab is bijective. 4 Let g act on R so that for every linear character λ of G we have In all cases note that (g| R ) κ(q d −1) is trivial. 5 Subspace W : Let g act trivially on W .
For every k-tuple p 1 , . . . , p k ∈ I d and every α ∈ G ab , we thus have an element g = g p1,...,p k ;α ∈ G. The conjugacy class g G p1,...,p k ;α is invariant under reordering p 1 , . . . , p k , and under replacing any p i by p*_i . Conversely, g G p1,...,p k ;α is determined by p 1 (t)p*_1 (t) · · · p k (t)p*_k (t) and α. Let I ′ d be the set of unordered pairs {p, p*} of monic irreducible polynomials p, p* ∈ I d with p ≠ p* . Let The union is disjoint, because the minimal polynomial of every element of g G p1,...,p k ;α is divisible by p 1 (t)p*_1 (t) · · · p k (t)p*_k (t) and has no other nonlinear factors. Finally let Proof of Proposition 7.1 for other classical groups. By construction, g p1,...,p k ;α lies over α and g κ(q d −1) p1,...,p k ;α ∈ M. We must estimate the density of C d;α . Consider g = g p1,...,p k ;α for some p 1 , . . . , p k and α. Let h ∈ C G (g). Then h preserves each V i,1 and V i,2 , those being the p i - and p*_i -primary subspaces of g. The restrictions of h to V i,1 and V i,2 determine one another, and there are at most q d possibilities for h| Vi,1 (as in Lemma 2.3). Hence, since The number of monic irreducible polynomials of degree d over F q is q d /d − O(q d/2 /d), while the number of *-symmetric polynomials of degree d is at most q d/2 , so By the hypothesis q d > Cn this is at least k, and in fact This proves the proposition.
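The count of monic irreducible polynomials invoked here can be checked exactly. A short sketch (ours, not part of the proof) using the standard Möbius (necklace) formula N_q(d) = (1/d) Σ_{e|d} μ(e) q^{d/e}, whose main term is q^d/d with error O(q^{d/2}/d):

```python
# Count monic irreducible polynomials of degree d over F_q via the
# Mobius (necklace) formula: N_q(d) = (1/d) * sum_{e | d} mu(e) * q^(d/e).

def mobius(n):
    """Mobius function, computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n has a squared prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

def num_irreducibles(q, d):
    total = sum(mobius(e) * q ** (d // e) for e in range(1, d + 1) if d % e == 0)
    return total // d  # the sum is always divisible by d

# Main term q^d/d, error O(q^(d/2)/d), as used in the density estimate.
print(num_irreducibles(3, 4))  # (3^4 - 3^2)/4 = 18
```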
4 Note that GO(R) ∼ = GO + 2 (q) ∼ = D 2(q−1) . In odd characteristic, G ab ∼ = C 2 × C 2 , and determinant and spinor norm are independent characters on GO + 2 (q). In even characteristic, G ab ∼ = C 2 , and the Dickson invariant is nontrivial on GO + 2 (q). 5 The existence of an even-order linear character of GO n (q) in even characteristic is why we need the extra factor of 2 in that case. 7.3. Collecting results. We now collect the results from the previous sections to conclude that when three elements of G are chosen uniformly at random, with high probability there is a short word in these elements that belongs to M.
Theorem 7.3. There are constants c, C > 0 so that the following holds. Let G = Cl n (q), where log q < cn log −2 n. Let x, y, z be elements of G chosen uniformly at random. Let M be the event that there exists a word w ∈ F 3 of length at most n C log q such that w(x, y, z) ∈ M. Then Proof. By Corollaries 5.3 and 6.2 there are constants c 1 , c 2 > 0 and C 1 , C 2 such that the following holds. Let ℓ = ⌊c 1 n/2⌋, and let E be the event that every word u ∈ F 2 of length at most ℓ satisfies xu(y, z) / ∈ C d . Then provided q d > C 2 n. Take d ∼ C 3 log n for a constant C 3 . If log q < cn/ log 2 n for a sufficiently small constant c so that c, C 3 satisfy C 1 C 2 On the other hand suppose E fails, i.e., suppose there is a word u of length at most c 1 n such that xu(y, z) ∈ C d . Let w ∈ F 3 be the word The length of w is at most κ(q d − 1)(1 + c 1 n/2) ≤ n C log q , and w(x, y, z) = (xu(y, z)) κ(q d −1) ∈ M.
Hence E c ⊆ M . This completes the proof.
This completes the proof of Theorem 1.1. If we are allowed q C random generators, we can reach the set M using shorter words.
Theorem 7.4. There are constants c, C > 0 so that the following holds. Let G = Cl n (q), where n > C. Let x 0 , x 1 , . . . , x k be elements of G chosen uniformly at random, where k > q C . Let M be the event that there exists a word w ∈ F k+1 of length at most q 2 n C such that w(x 0 , . . . , x k ) ∈ M. Then Proof. Follow the proof of the previous theorem, replacing u ∈ F 2 with u ∈ F k . Since log k > C log q, we can replace (8) with the bound ≤ exp(C 1 d 2 log q + C 1 d −1 n log n − c 2 n log q), provided q d > C 2 n. Take d = max(⌈C 3 log n/ log q⌉ , 2) for sufficiently large C 3 . As long as n > C we find P(E) ≤ q −cn . Note that q d ≤ q 2 n C in this case. The rest of the argument is the same.
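The claim q^d ≤ q^2 n^C at the end of the proof can be checked directly from the choice d = max(⌈C 3 log n/ log q⌉, 2); here is a short reconstruction of the omitted computation (assuming natural logarithms, so that q^{log n/log q} = n):

```latex
q^{d} \;=\; q^{\max(\lceil C_3 \log n/\log q\rceil,\,2)}
\;\le\; \max\!\bigl(q^{\,1+C_3\log n/\log q},\; q^{2}\bigr)
\;=\; \max\!\bigl(q\,n^{C_3},\; q^{2}\bigr)
\;\le\; q^{2} n^{C_3},
```

using q^{⌈x⌉} ≤ q^{1+x} and q^{C_3 log n/log q} = n^{C_3}.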

Closed trajectories with only one coincidence
A trajectory is closed if v ℓ = v 0 . In Section 9 we will need to understand the structure of closed trajectories with only one coincidence. More generally the joint trajectory of an r-tuple (v 1 , . . . , v r ) is called closed if each individual trajectory is closed, and we will need to understand the structure of closed joint trajectories with only one coincidence in each individual trajectory. We begin with the single-trajectory case, for motivation.
Lemma 8.1. Assume w is nontrivial and cyclically reduced. Suppose the trajectory v 0 , . . . , v ℓ is closed, and suppose there is only one coincidence, at step t say. Then In particular if w is not a proper power then t = ℓ.
Proof. Let w = · · · w 1 w ℓ · · · w 1 be the left-infinite ℓ-periodic extension of w. Since v ℓ = v 0 , the trajectory of v under w (defined in the obvious way) is just the ℓ-periodic extension of v 0 , . . . , v ℓ , and still there is only one coincidence, at step t. The choices at steps 1, . . . , t are free and all subsequent choices are forced (as in the proof of Lemma 3.5). We claim that w is in fact gcd(t, ℓ)-periodic, and it suffices to prove that it is t-periodic.
Since the choices at steps 1, . . . , t − 1 are free and not coincidences, the choice at step t is a coincidence, and all subsequent choices are forced, the vectors v 0 , . . . , v t−1 are linearly independent and the whole trajectory is contained in their span. In particular (9) v t = a 0 v 0 + · · · + a t−1 v t−1 (a 0 , . . . , a t−1 ∈ F q ).
Given that step t + 1 is forced, we must have v i ∈ D t+1 wt+1 for each i such that a i ≠ 0. Thus either w t+1 = w i+1 or w t+1 = w −1 i whenever a i ≠ 0, and v ℓ = v 0 is forced. Since w ℓ ≠ w −1 1 , we must have w t+1 = w 1 and a 0 ≠ 0 (see Remark 8.2 for more details). Therefore Consider now the trajectory of v 1 under w ′ = ww −1 1 = · · · w 3 w 2 . The trajectory is just v 1 , v 2 , . . . , v ℓ , v 0 , v 1 , . . . . By (9) and a 0 ≠ 0, v 1 , . . . , v t are linearly independent, and, for every letter ξ, Therefore the trajectory of v 1 also has just one coincidence, again at step t (when v t+1 is chosen). Therefore by the same argument we must have w ′ t+1 = w ′ 1 , or Repeating this argument as many times as necessary proves that w is t-periodic, as claimed.
Remark 8.2. If t = ℓ, we must have a 0 = 1 and all other a i = 0. The general case t < ℓ is more complicated, but we can still describe the possibilities. From (9), because step t + 1 is forced we must have the signs depending on whether w t+1 = w i+1 or w t+1 = w −1 i (a i ≠ 0). At the next step, and so on. We make a few observations: 1. The vectors v i±1 , etc., obey a no-crossing rule: we cannot have as then we would have both w s+1 = w i+1 and w s+1 = w −1 i+1 , for some i.

Similarly, there is a no-meeting rule: we cannot have
as then we would have both w s+1 = w i+1 and w s+1 = w −1 i+2 , but the expression for w is supposed to be reduced. 3. Finally, there is a time-consistency rule: we cannot have as then we would have w s+1 = w i+1 and w s+2 = w −1 i+1 , but again the expression for w is supposed to be reduced; nor could we have as then we would have w s+1 = w −1 i and w s+2 = w i .
Since a 0 ≠ 0 and w t+1 v 0 = v 1 , the only resolution is that for all s ≥ 0 (extending ℓ-periodically). In other words, the sequence (v s ) in span{v 0 , . . . , v t−1 } corresponds with the sequence (X s ) in Conversely, if f is a divisor of X ℓ − 1, and if the period of w divides t and i − i ′ whenever a i ≠ 0 and a i ′ ≠ 0, then a one-coincidence trajectory of this type exists.
We now consider closed joint trajectories with only one coincidence in each individual trajectory. The following lemma generalizes Lemma 8.1. Lemma 8.3. Assume w is nontrivial and cyclically reduced. Let v 1 , . . . , v r ∈ V be linearly independent. Suppose the joint trajectory of v 1 , . . . , v r is closed. Suppose there is just one coincidence in each individual trajectory, and suppose the coincidence in the trajectory of v i occurs at step (t i , i). Then where d = gcd(t 1 , . . . , t r , ℓ).
In particular if w is not a proper power then t i = ℓ for each i.
Proof. As in the proof of Lemma 8.1, let w be the left-infinite ℓ-periodic extension of w, and note that the trajectory of v 1 , . . . , v r under w is just the ℓ-periodic extension of the trajectory under w, and there are no further free choices. The choice at step (t, i) must be free for t ≤ t i and forced for t > t i . Therefore the vectors (v t i ) 1≤i≤r,0≤t<ti are linearly independent and the whole trajectory is contained in their span. Since there is a coincidence at step (t i , i), we have where a itj = 0 whenever t ≥ t j (and (t, j) ≺ (t i , i) means t < t i or t = t i and j < i, as in Subsection 3.3). Let A 0 be the r × r matrix The matrix A 0 must be nonsingular, for otherwise we could not have (v ℓ 1 , . . . , v ℓ r ) = (v 0 1 , . . . , v 0 r ). In particular, for each i there is some j such that a i0j ≠ 0. Since step (t i + 1, i) is forced, the value of w ti+1 v 0 j must be known; hence w ti+1 = w 1 .
Repeating the argument as many times as necessary, we conclude that the period of w divides t i for each i.
Remark 8.4. The discussion in Remark 8.2 generalizes too. From (11) and forcedness, we have where the signs are chosen depending on whether w t+1 = w 1 or w t = w −1 1 . The latter case can arise only for t > 0, so no v 0 j can appear in this expression. Hence (12) is the analogue of (10) for the joint trajectory of (v 1 1 , . . . , v 1 r ). As before there are no-crossing, no-meeting, and time-consistency rules for the indices t such that a itj = 0 for some i, j, so in fact we can never have v t−1 j . We conclude that v ti+s i = (t,j)≺(ti,i) a itj v t+s j for all s ≥ 0, and hence the trajectory of (v s 1 , . . . , v s r ) corresponds with the trajectory of (Z s X 1 , . . . , and we must have This is possible if and only if det F divides Z ℓ − 1.

Expansion in low-degree representations
We turn now to the proof of Theorem 1.3. We again consider the action of G = Cl n (q) on linearly independent r-tuples of vectors, and we again consider trajectories under the action of a fixed word w ∈ F k , much as in Section 4. The difference is mainly one of parameter regime. In Section 4 we considered r-tuples with r as large as cn for constant c, and we were satisfied with somewhat crude bounds. In this section we consider r = O(1), and we seek sharper bounds. Our aim is to show that, in an orbit of G of size N , the probability that a trajectory under a given word closes is close to 1/N , with a small relative error; if we can do this it follows that there is a uniform spectral gap. We begin with the case of r = 1, which contains most of the key ideas. 9.1. The defining representation. Now let x 1 , . . . , x k ∈ G be chosen uniformly at random. Let w = w(x 1 , . . . , x k ). Let v ∈ V \ {0}. Let N = |Gv|. By Witt's lemma (Lemma 2.2), N is the number of u ∈ V \ {0} such that Q(u) = Q(v). Thus, by Lemma 2.1, N = q n /q 0 + O(q n/2 ). More generally, if U ≤ V is a subspace of dimension d then Lemma 9.1. Assume w is nontrivial and not a proper power. Assume ℓ < n/4. Then Proof. By Lemma 3.1 we may also assume that w is cyclically reduced, as replacing w by its cyclic reduction can only decrease its length. In this case Lemma 8.1 implies that the event that wv = v is contained in the union of the following two events: E 1 : the trajectory v 0 , . . . , v ℓ has exactly one coincidence, occurring at step ℓ, and v ℓ = v 0 , E 2 : the trajectory v 0 , . . . , v ℓ has at least two coincidences. Figure 1. The word w and one of its maximal subwords matching a prefix u. Each occurrence of the letter w ℓ or w −1 ℓ is the end of one such subword. In the w ℓ = w −1 t+1 case we must have t+s < ℓ−s.
We can bound the probability of E 2 using Lemma 3.3. Suppose there is a free choice at step t ≤ ℓ. There are t previous vectors, so the probability of a coincidence, conditional on previous steps, is bounded by Similarly, the conditional probability of a coincidence at a later step t ′ is bounded by Summing over t < t ′ ≤ ℓ, we find, using ℓ < n/4, Hence we may focus on the event E 1 . In the linear case (Cl = SL), v ℓ is chosen uniformly at random outside a linear subspace of dimension at most ℓ − 1, so the probability of E 1 is bounded by This completes the proof in this case. In general, the situation is complicated by form conditions, as previous choices may significantly impact the probability that v ℓ = v 0 , even if there were no previous coincidences.
Let ξ = w ℓ . The choice of v ℓ is subject to one linear constraint for every occurrence of ξ = w ℓ as w t or w −1 t+1 for some t < ℓ. Each such occurrence is the end of a maximal subword matching a prefix u = w ℓ · · · w ℓ−s+1 of w, forward in the case ξ = w t and backward in the case ξ = w −1 t+1 (see Figure 1). Write s = s(t) and u = u(t). Define Note that, for t ∈ T 1 , we must have t + s < ℓ − s, because w ℓ · · · w t+1 is reduced. In the ξ = w t case it is possible that the subword overlaps (or is adjacent to) the matching prefix, and the division into T 2 and T 3 reflects this possibility. Figure 2. The case t = ℓ − s ∈ T 3 . In this case we must have t − s > 0, or else w = u 2 .
The choice of v ℓ at step ℓ is constrained by the linear conditions . We need to determine whether v 0 is in this affine subspace.
Obviously this is the case if and only if Write C t for this condition. For t ∈ T 1 ∪ T 2 , the truth or falsity of C t is determined at step ℓ − s, because ℓ − s > t + s in the t ∈ T 1 case and ℓ − s > t in the t ∈ T 2 case. The condition is not determined before step ℓ − s by maximality of u(t). For t ∈ T 3 , C t is settled at step t, because t ≥ ℓ − s. The condition is not settled before step t because w t = w ℓ ≠ w −1 1 (since w is cyclically reduced). Note that we may have ℓ − s = t for t ∈ T 3 : this is the case in which the subword is adjacent to the prefix (see Figure 2). In this case the condition C t is However, we cannot have also t − s = 0, for then we would have w = u 2 . Hence, by linear independence of v 0 , . . . , v t−1 , still the condition C t is settled at step t and not before. Note, however, that if G is unitary then C t is linear only over F q0 (because the form f is only sesquilinear).
There is a case that may arise in which the various conditions C t ′ settled at a given step t are not independent. This is the case in which t ∈ T 3 and t = ℓ − s ′ for some t ′ ∈ T 2 , where s ′ = s(t ′ ), and t ′ − s ′ = 0 (see Figure 3). Let T 4 be the set of such steps t and let T ′ 3 = T 3 \ T 4 . If t ∈ T 4 then we have an overdetermined pair of conditions This system is consistent if and only if For t ∈ T 4 let us redefine C t to be this reduced condition. Certainly t − s < ℓ − s, and if t ′ = ℓ − s then wu ′ = u ′ w, so w is a proper power, contrary to hypothesis. Hence C t is settled at step ℓ − s ≤ t. Now consider any step t ∈ {1, . . . , ℓ − 1}, and consider all those conditions C t ′ which are settled at step t. These conditions are C t ′ for t ′ ∈ T 1 ∪ T 2 ∪ T 4 such that ℓ − s ′ = t, as well as C t if t ∈ T ′ 3 . We claim that these affine conditions for v t are independent, and it suffices to demonstrate that the relevant indices (t ′ + s ′ for t ′ ∈ T 1 , and t ′ − s ′ for t ′ ∈ T 2 ∪ T 4 ) are all distinct. Since s ′ = ℓ − t is a constant, the indices t ′ + s ′ are all distinct for t ′ ∈ T 1 , as are the indices t ′ − s ′ for t ′ ∈ T 2 ∪ T 4 . Moreover we cannot have t 1 + s 1 = t 2 − s 2 for t 1 ∈ T 1 and t 2 ∈ T 2 ∪ T 4 with ℓ − s 1 = ℓ − s 2 = t, because then we would have w t1+s1 = w −1 t2−s2+1 = w −1 t1+s1+1 , in contradiction with the reducedness of w. If t ′ − s ′ = 0 for some t ′ ∈ T 2 then t ∈ T 4 by definition, so t / ∈ T ′ 3 . Finally, if t ′ ∈ T 4 then we cannot have t ′ − s ′ = 0 unless w is a proper power, as discussed.
Hence, by linear independence of v 0 , . . . , v t−1 , the h (say) conditions C t ′ settled at step t consist of h independent affine linear conditions for v t , or, in the unitary case, if t = ℓ − s ∈ T 3 , 2h independent affine linear conditions over F q0 . Suppose v t is drawn from a subspace of codimension d (d is the number of previous occurrences of w t or w −1 t ). Then, by Lemma 2.1 and Lemma 3.3, the probability that all these conditions are satisfied, conditional on the past trajectory v 0 , . . . , v t−1 , is (in the second line we used h < ℓ, d < t, and q 0 ≤ q).
Let H = |T 1 |+ |T 2 |+ |T ′ 3 |+ |T 4 | (i.e., let H + 1 be the number of appearances of w ℓ or w −1 ℓ in w). Taking the product of (13) over all t, we obtain the probability that C t ′ is satisfied for every t ′ . The conditions C t are prerequisite to the event v ℓ = v 0 . If all these conditions are satisfied, then at step ℓ the vector v ℓ is drawn from an affine subspace of codimension H which includes v 0 . Note also that Q(v ℓ−1 ) = Q(v 0 ). Hence, from Lemma 3.3, Hence the overall probability of E 1 is bounded by Thus in all cases the error is bounded as claimed.
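The orbit-size estimate N = q^n/q 0 + O(q^{n/2}) used above can be sanity-checked by brute force in a tiny orthogonal example, where the level set of Q should have about q^{n−1} points. In the sketch below, Q(x) = x 1^2 + · · · + x n^2 over F 3 is our own illustrative choice of nondegenerate form (with q 0 = q in this case):

```python
from itertools import product

# Count u in F_q^n with Q(u) = c, for the illustrative quadratic form
# Q(x) = x_1^2 + ... + x_n^2 (nondegenerate in odd characteristic).
def level_set_size(q, n, c):
    return sum(1 for u in product(range(q), repeat=n)
               if sum(x * x for x in u) % q == c % q)

q, n = 3, 4
for c in (1, 2):
    N = level_set_size(q, n, c)
    # Main term q^(n-1) = 27, with error within q^(n/2) = 9.
    print(c, N, abs(N - q ** (n - 1)) <= q ** (n // 2))
```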
Remark 9.2. In the linear case, the hypothesis that w is not a proper power is needed only to ensure that the event v ℓ = v 0 is contained in E 1 ∪ E 2 ; we do not need the hypothesis in order to bound P(E 1 ) or P(E 2 ). By contrast, at least in the orthogonal case, we do need this hypothesis in order to bound P(E 1 ) satisfactorily, so at least some of the complexity of the above proof is necessary. Suppose G = GO n (q) and w = u 2 for some word u of length ℓ/2. Then the choice of v ℓ is constrained by Hence v ℓ is always restricted to an affine hyperplane that includes v 0 , so the probability that wv = v will be at least approximately q/N , even conditionally on there being only one coincidence.
Remark 9.3. On the other hand, it is usually possible to cyclically rotate w so that much of the complexity in the previous proof disappears. For example, if w can be cyclically rotated so that it has no square prefix, then, after such a rotation, T 3 = ∅. Not every non-proper-power has this property, 6 but almost all words do.
Take ℓ ∼ n/5. If log k/ log q is sufficiently large then Hence, by Markov's inequality, so almost surely λ < q −c ′ /2 . 9.2. The action on r-tuples. We now generalize the argument of the previous subsection to r-tuples of vectors, where r is bounded. It will be convenient to use the following notation.
Lemma 9.4. Assume w is nontrivial and not a proper power. Assume ℓr 2 < n/4. Then Proof. Again we may assume w is cyclically reduced. In this case Lemma 8.3 implies that the event that wv = v is contained in the union of the following two events: E 1 : the joint trajectory (v t i ) has exactly one coincidence in each individual trajectory, each occurring at the final step t = ℓ, and v ℓ i = v 0 i for each i, E 2 : the joint trajectory (v t i ) has at least r + 1 coincidences. Again we can bound the probability of E 2 using Lemma 3.3. Suppose there is a free choice at step (t, i). There are at most tr + i − 1 ≤ ℓr previous vectors, so the conditional probability of a coincidence is bounded by q^{tr+i−1}/(q^{n−ℓr} − q^{ℓr−1} − q^{n/2}) = q^{tr+i−1+ℓr−n}(1 + O(q^{ℓr−n/2})).
Hence we may focus on the event E 1 . In the linear case, for each i the vector v ℓ i is chosen uniformly at random outside a linear subspace of dimension at most ℓr, so the probability of E 1 is bounded by This completes the proof in this case. As in the previous subsection, the general situation is complicated by form conditions, but fortunately few changes are necessary in the r > 1 case. Let ξ = w ℓ . Assume there are H + 1 occurrences of ξ or ξ −1 in w, and consider the H maximal subwords u ending with ξ or ξ −1 and matching a proper prefix of w, as in Figure 1. Define T 1 , T 2 , and T 3 = T ′ 3 ∪ T 4 as before. The choice of v ℓ at step ℓ is constrained by the linear conditions For t ∈ T 4 the condition C t is the reduced condition Conditional on linear independence of v t i for 1 ≤ i ≤ r and t < ℓ, it can be verified exactly as in the r = 1 case that the conditions settled at any given step t < ℓ are precisely C t ′ for t ′ ∈ T 1 ∪ T 2 ∪ T 4 and ℓ − s ′ = t, as well as C t if t ∈ T ′ 3 , and these conditions are linearly independent.
Suppose at step t < ℓ there are h conditions C t ′ to be settled. Assume first that we are not in the case t = ℓ − s ∈ T ′ 3 (the case in which the subword is adjacent to the prefix, as in Figure 2). Let d be the number of previous occurrences of w t or w −1 t . Then, by Lemma 3.3, at step (t, i) the vector v t i is drawn from an affine subspace of codimension d ′ = dr + i − 1, less a subspace of dimension d ′ , subject to the quadratic condition Q(v t i ) = Q(v t−1 i ). Hence, using Lemma 2.1, the probability that the ji-component of each C t ′ is satisfied for each j ∈ {1, . . . , r} is (using h < ℓ, d ′ ≤ (t − 1)r + i − 1, and q 0 ≤ q). Taking the product over all i, the probability that each C t ′ is satisfied after step t is (16) q^{−hr^2}(1 + O(q^{ℓr+tr−n/2})).
The case t = ℓ − s ∈ T ′ 3 is slightly different. In this case the ji-component of C t is This condition is settled at step (t, k), where k = max(i, j). Hence 2k − 1 components of C t are settled at step (t, k). Therefore, in this case, (15) must be replaced with Taking the product over all i again gives (16). Taking the product of (16) over all t, the probability that C t ′ is satisfied for every t ′ is given by (17). Finally, if all the conditions C t are satisfied, then for each i the vector v ℓ i is drawn from an affine subspace of codimension Hr + i − 1 which includes v 0 i , less a subspace of dimension Hr+i−1, subject to the quadratic condition Hence the conditional probability that v ℓ = v 0 is (q^{nr−Hr^2−r(r−1)/2}/q 0^r)^{−1}(1 + O(q^{(H+1)r−n/2})). Hence the overall probability of E 1 is, multiplying the previous line by (17), (q^{nr−r(r−1)/2}/q 0^r)^{−1}(1 + O(q^{2ℓr−n/2})). Comparing with (14), this is Thus in all cases the error is bounded as claimed.
We can now prove that the permutation action of uniformly random x 1 , . . . , x k ∈ G on an orbit Gv ⊆ V r has a uniform spectral gap. The argument is little different from that in the previous subsection. We may assume v 1 , . . . , v r are linearly independent, by reducing r if necessary. Suppose the adjacency operator A acting on C[Gv] has spectrum 1 = λ 1 ≥ · · · ≥ λ N . Let λ = max(λ 2 , −λ N ). For even ℓ, let w be the result of a simple random walk of length ℓ in F k . Then We bound P(w ∈ P) as before, while by Lemma 9.4 we have max w / ∈P,|w|≤ℓ provided ℓr 2 < n/4. Hence Eλ ℓ ≪ k −cℓ q rn + q 2ℓr−n/2 .
9.3. Other low-degree representations. The result of the final argument of the previous subsection can be expressed as follows.
. Let x 1 , . . . , x k ∈ G be uniform and independent, where k ≥ q Cr 3 and r < cn 1/4 . Let ρ = ρ(A, C[V r ] 0 ) be the spectral radius of A = A x1,...,x k acting on C[V r ] 0 . Then Proof. By Witt's lemma, there are O(q r 2 ) orbits of G on V r . Let Gv 1 , . . . , Gv s be a decomposition of V r into G-orbits, where s ≪ q r 2 . Then From the previous subsection (possibly with a smaller r, if the components of v i are not linearly independent), for each i we have Our main interest is the conjugation action of G on a conjugacy class C ⊆ SCl n (q) of elements of degree s = O(1), which is actually a quotient of an orbit of G on V s ⊕ (V * ) s , where V * is the dual space. It is possible to repeat the analysis of the previous subsection allowing also r factors of V * , but in fact this generalization follows formally, since C[V * ] ∼ = C[V ] (as both have character χ(g) = q dim ker(g−1) ), so Corollary 9.6 (the conjugation action on M is expanding). Let x 1 , . . . , x k ∈ G be independent and uniformly random, where k > q C and n > C. Let ρ = ρ(A, C[M] 0 ) be the spectral radius of A acting on C[M] 0 . Then is a map of permutation representations (where G acts by conjugation on M n (F q )), and hence induces a map of ] by complete reducibility. Hence the result follows from the previous theorem with r = 2s.

Diameter of the Cayley graph
We now collect results from the previous sections and bound the diameter of the Cayley graph of the subgroup of Cl n (q) generated by random elements.
10.1. GL n (p) and 3 random elements. In this subsection we prove Theorem 1.2. Recall that SL n (p) ≤ G ≤ GL n (p), where p is prime and log p < cn/ log 2 n, the elements x, y, z ∈ G are chosen uniformly at random, and S = {x ±1 , y ±1 , z ±1 }. We claim that with probability 1 − e −cn we have ⟨S⟩ ≥ SL n (p), and diam Cay(⟨S⟩, S) ≤ n O(log p) .
First we show that ⟨S⟩ ≥ SL n (p) with high probability. The argument is a slight modification of [EV20, Section 5]. 7 Let C 1 be the set of all irreducible g ∈ GL n (p) of order d(p n − 1)/(p − 1) for some d | (p − 1). Each such g is equivalent to the multiplication action of some x ∈ F p n of the same order, and det g = N (x). Therefore, for each α ∈ G ab ∼ = F × p , the GL n (p)-classes in C 1;α = C 1 ∩ αG ′ are in bijection with elements of F p n , up to Galois conjugacy, of order d(p n − 1)/(p − 1) and norm α, where d is the order of α. Note that there are φ(d) elements α of order d. Moreover, each such g ∈ G has centralizer isomorphic to F × p n . Hence Here we used the standard estimate φ(m) ≫ m/ log log m. Let C 2 be the set of all g ∈ GL n (p) of order p n−1 − 1 splitting V as ℓ ⊕ W for some ℓ, W with dim ℓ = 1, dim W = n − 1. A similar calculation shows that for each α ∈ F × p in this case as well. (In fact, C 2 is uniform over det fibres.) Hence, by Corollaries 5.3 and 6.2 as in the proof of Theorem 7.3, with probability at least 1 − e −cn there are words w 1 , w 2 such that By a straightforward adaptation of [EV20, Lemma 5.2] (assuming n > 6, say), ⟨w 1 (x, y, z), w 2 (x, y, z)⟩ ≥ SL n (p).
Hence indeed ⟨S⟩ ≥ SL n (p). In particular, using Schreier generators, there is a symmetric set S ′ ⊆ S 2p ∩ SL n (p) such that ⟨S ′⟩ = SL n (p).
Meanwhile, by Theorem 1.1, with probability 1 − e −cn there is another word w of length n O(log p) such that w(x, y, z) ∈ M.
As |⟨S⟩/ SL n (p)| < p, we thus have diam Cay(⟨S⟩, S) ≪ p 2 n 12+C log p = n O(log p) .
This completes the proof.
10.2. Classical groups and q C random elements. In this subsection we prove Theorem 1.4. Recall that G = Cl n (q), where n > C, elements x 1 , . . . , x k ∈ G are chosen uniformly at random where k > q C , and S = {x ±1 1 , . . . , x ±1 k }. We claim that with probability 1 − q −cn we have ⟨S⟩ ≥ SCl n (q), and diam Cay(⟨S⟩, S) ≤ q 2 n C . By Theorem 7.4, with probability at least 1 − q −c1n there is a word w of length at most q 2 n C1 so that w(x 1 , . . . , x k ) ∈ M.
Let C be the conjugacy class of w(x 1 , . . . , x k ) in G. Note that C ⊆ SCl n (q). It follows from Corollary 9.6 that, with probability at least 1 − q −c2n , the conjugation action of G on C is expanding with spectral gap bounded away from zero. Hence (see, e.g., [Kow19, Proposition 3.1.5 and Proposition 3.3.6]) diam Sch(G, S, C) ≪ log |C|.
It follows that with probability at least 1 − q −c3n , every element of C is a word in S of length at most q 2 n C1 + O(log |C|) ≪ q 2 n C2 .
This completes the proof. Corollary 1.5(2) follows immediately for q < n O(1) , since log |G| ≍ n 2 log q. If q is larger then the claim follows from Alon-Roichman [AR94], which implies that the Cayley graph on Cn 2 log q random generators is almost surely an expander.
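The Alon-Roichman step can be illustrated numerically in an abelian toy model, where the Cayley-graph eigenvalues are explicit character sums. This sketch is only an illustration of the phenomenon, with Z_m standing in for G and all parameters ours:

```python
import math
import random

# Cayley graph of Z_m with k random generators (symmetrized).  Its
# normalized adjacency eigenvalues are explicit character sums:
#   lambda_j = (1/2k) * sum_{a in gens} 2*cos(2*pi*j*a/m),  j = 0, ..., m-1.
random.seed(0)
m, k = 499, 40  # k is on the order of C*log(m), as in Alon-Roichman
gens = [random.randrange(1, m) for _ in range(k)]

def normalized_eigenvalue(j):
    return sum(2 * math.cos(2 * math.pi * j * a / m) for a in gens) / (2 * k)

# j = 0 gives the trivial eigenvalue 1; the rest should be uniformly small.
rho = max(abs(normalized_eigenvalue(j)) for j in range(1, m))
print(rho)  # typically well below 1: a visible spectral gap
```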
Appendix A. Analogous arguments for S n
In this appendix we give analogous arguments for S n . The main reason to do so is to motivate and give context to some of the arguments in the main body, as the arguments in the context of S n are easier and somewhat more natural, involving only trajectories of points rather than vectors. A secondary reason is that a couple of results are actually new, and of independent interest: 1. if w is a word of length o(n 1/2 ), then with high probability w has o(n) fixed points (Theorem A.4); 2. the Cayley graph with respect to three random generators almost surely has diameter O(n 2 log n).
A.1. Queries and trajectories. The following definitions only slightly generalize those in [BS87, FJR + 98]. Let G = S n and Ω = {1, . . . , n}. Let x 1 , . . . , x k ∈ G. Define a query to be a pair (ξ, v), where ξ ∈ {ξ ±1 1 , . . . , ξ ±1 k } and v ∈ Ω; the result of the query is ξv. After any finite sequence of queries (w 1 , v 1 ), (w 2 , v 2 ), . . . , (w t−1 , v t−1 ) the known domain of a letter ξ at time t is Suppose we make a further query (w t , v t ). If v t ∈ D t wt , then the result w t v t is determined already by the values of w 1 v 1 , . . . , w t−1 v t−1 ; we call this a forced choice. Otherwise, we say the query is a free choice.
Let R be some subset of Ω fixed in advance. If a query (w t , v t ) is a free choice and yet then we say the result of the query is a coincidence. Again, the language is most interesting when x 1 , . . . , x k ∈ G are chosen randomly. The following lemma is trivial, and parallels Lemma 3.3.
Lemma A.1. Let x 1 , . . . , x k ∈ G be uniformly random and independent, and let be a sequence of queries. Assume that (w t , v t ) is a free choice. Then, conditionally on the values of In particular, the conditional probability that w t v t is a coincidence is bounded by Let w ∈ F k , and let w = w ℓ · · · w 1 (w i ∈ {ξ ±1 1 , . . . , ξ ±1 k }) be the reduced expression. For each v ∈ Ω, the trajectory of v is the sequence of queries (w t , v t−1 ), where v 0 = v and for each t ≥ 1 the point v t is the result of the query (w t , v t−1 ); in other words, the sequence v 0 , v 1 , . . . , v ℓ is defined by Note that if step t is free and not a coincidence then step t + 1 is also free, and hence if v ℓ ∈ R then there must be at least one coincidence in the trajectory (cf. Lemma 3.5).
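The query model can be made concrete in code. In this sketch (all names are ours), each letter is a lazily-revealed random permutation: a query is forced when the relevant value is already known, and otherwise the value is sampled uniformly among the unused points, as in Lemma A.1:

```python
import random

# Lazily-revealed random permutations of {0, ..., n-1}: each letter is a
# partial bijection, extended uniformly at random on each free query.
class LazyPermutation:
    def __init__(self, n, rng):
        self.n, self.rng = n, rng
        self.forward, self.backward = {}, {}

    def query(self, v, inverse=False):
        table, other = ((self.backward, self.forward) if inverse
                        else (self.forward, self.backward))
        if v in table:
            return table[v], True  # forced choice: value already determined
        # free choice: sample uniformly among the points not yet used
        w = self.rng.choice([u for u in range(self.n) if u not in other])
        table[v], other[w] = w, v
        return w, False

def trajectory(word, perms, v0):
    """word is a list of (letter index, sign) pairs, applied in order
    w_1, ..., w_l; returns the trajectory v0, ..., vl and the free steps."""
    v, traj, free_steps = v0, [v0], []
    for t, (i, sign) in enumerate(word, start=1):
        v, forced = perms[i].query(v, inverse=(sign < 0))
        traj.append(v)
        if not forced:
            free_steps.append(t)
    return traj, free_steps

rng = random.Random(1)
n, k = 100, 2
perms = [LazyPermutation(n, rng) for _ in range(k)]
# w = x_1^-1 x_0^-1 x_1 x_0 (a commutator), read right to left as w_4 w_3 w_2 w_1
word = [(0, 1), (1, 1), (0, -1), (1, -1)]
traj, free_steps = trajectory(word, perms, 0)
print(traj, free_steps)
```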
More generally for any r ≥ 1 the joint trajectory of an r-tuple v 1 , . . . , v r ∈ Ω is simply the r-tuple of individual trajectories, with the queries (w t , v t−1 i ) ordered lexicographically by (t, i). Again write ≺ for this order, i.e., (t ′ , i ′ ) ≺ (t, i) if t ′ < t or t ′ = t and i ′ < i. Note that if step (t, i) is free and not a coincidence then step (t + 1, i) is also free. Hence if v ℓ i ∈ R then there must be at least one coincidence in the trajectory of v i . This observation is recorded as the following lemma (cf. Lemma 3.6).
Lemma A.2. Suppose v ℓ i ∈ R. Then there is at least one coincidence in the trajectory of v i (during the joint trajectory of v 1 , . . . , v r ).
A.2. The probability of small support. For g ∈ S n , define fix g = {v ∈ Ω : gv = v}.
In this section we show that if w is a short word then almost surely | fix w| is small. The following lemma is similar to the argument used in [Ebe17, Lemma 2.2]; the only difference is that the set R is fixed in advance.
Lemma A.3. Let G = S n . Let R ⊆ Ω be a subset of size r. Let w ∈ F k be a nontrivial word of length ℓ < n/r. Then Proof. Let R = {v 1 , . . . , v r } and consider the joint trajectory of v 1 , . . . , v r . By Lemma A.2, we can have wR = R only if there is at least one coincidence in each individual trajectory. We take a union bound over all possibilities for when the coincidences could occur. By Lemma A.1, the conditional probability that step (t, i) is a coincidence is bounded by ℓr/(n − ℓr); indeed there are at most ℓr previous points (if t = ℓ, assuming v ℓ j ∈ R for j < i). There are ℓ^r possibilities for when the first coincidences might occur. Hence the claimed bound holds.
Theorem A.4. There is a constant c > 0 such that the following holds for all f ≥ 0. Let G = S n , and let w ∈ F k be a nontrivial word of reduced length ℓ < cf 1/2 . Then P(| fix w| ≥ f ) ≤ exp(−cf /ℓ 2 ).
Proof. Let x 1 , . . . , x k be chosen independently and uniformly from G. Let F = | fix w|. By the lemma, for any subset R ⊆ Ω of size r (for r < n/ℓ) we have Therefore, by a union bound, Take r ∼ f /(4ℓ 2 ). The conclusion is for some constant c > 0.
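Theorem A.4 predicts that a fixed short word in random permutations rarely has many fixed points; a quick Monte Carlo check with the commutator word (our choice of test word, purely for illustration) is consistent with this:

```python
import random

# Monte Carlo: fixed points of the commutator word w = x y x^-1 y^-1
# evaluated at uniformly random permutations x, y of {0, ..., n-1}.
def inverse(p):
    q = [0] * len(p)
    for i, j in enumerate(p):
        q[j] = i
    return q

def commutator(x, y):
    # the permutation v -> x(y(x^-1(y^-1(v)))), as a list
    xi, yi = inverse(x), inverse(y)
    return [x[y[xi[yi[v]]]] for v in range(len(x))]

random.seed(2)
n, trials = 500, 20
max_fix = 0
for _ in range(trials):
    x = list(range(n)); random.shuffle(x)
    y = list(range(n)); random.shuffle(y)
    w = commutator(x, y)
    max_fix = max(max_fix, sum(1 for v in range(n) if w[v] == v))
print(max_fix)  # small compared to n, in line with Theorem A.4
```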
Remark A.5. If ℓ < c log log n, a stronger bound is proved in [LS12, Section 2].
A.3. Expected values of characters. A notable difference between S n and Cl n (q) is that S n has several low-degree characters: for example, the irreducible component of the standard representation has degree n − 1. However, we can show that the expected value of |χ(w)|/χ(1) is smaller than χ(1) −c using the Larsen-Shalev character bound [LS08]. For most characters, χ(1) is exponentially large in n, so this bound is similar in strength to Theorem 5.2. In application, low-degree characters may have to be treated specially (as in the next section).
The first term is bounded by Theorem A.4. The second term is bounded by [LS08, Theorem 1.3].
The following corollary follows exactly as in Section 5.
A.4. Expansion in low-degree representations: a brief survey. Let G = S_n, let x_1, …, x_k ∈ G be random, where k ≥ 2 is bounded, and consider the action of x_1, …, x_k on Ω = {1, …, n}. The resulting Schreier graph is one of the standard models for a random 2k-regular graph, and the spectral properties of this graph are well studied. The earliest results on the combinatorial expansion of bounded-degree random graphs essentially coincide with the dawn of expansion, beginning with Barzdin–Kolmogorov and Pinsker (see Gromov–Guth [GG12, Section 1.2] for some history), and such results are equivalent to lower bounds on the spectral gap by the discrete Cheeger inequality (due to Dodziuk and Alon–Milman): see Kowalski [Kow19, Section 4.1]. Such bounds are weak, however.

The strongest results on the spectral gap of a random regular graph are based on the trace method, which is an adaptation of Wigner's proof of the semicircle law to the bounded-degree setting. These results begin with Broder and Shamir [BS87]. Let ρ be the spectral radius of A on C[Ω]_0. Broder and Shamir proved that almost surely
ρ = O(((2k − 1)^{1/2}/k)^{1/2}).
In particular, ρ is bounded away from 1 as long as k is large enough. On the other hand, there is a deterministic lower bound ρ ≥ (2k − 1)^{1/2}/k − O(1/log² n), usually attributed to Alon and Boppana. The conjecture, due to Alon, that almost surely ρ = (2k − 1)^{1/2}/k + o_k(1) remained open for some time, but was finally and famously settled by Friedman, using an ingenious elaboration of the trace method: see [Fri08] for the proof, and for much more background. (See also Bordenave [Bor19] for a simplified proof.)

The trace method also generalizes well, unlike the pure "counting" proof of expansion. Consider the action of A on C[Ω^r] for bounded r. This action was studied by Friedman–Joux–Roichman–Stern–Tillich [FJR+98], who showed that there is almost surely a uniform spectral gap.
Their method is an elaboration of the Broder–Shamir method, and was direct inspiration for the argument of Sections 8 and 9. We quote their result here; it will be used in the next section.

Theorem A.8. Let G = S_n, and let x_1, …, x_k ∈ G be random. Let ρ = ρ(A, C[Ω^r]_0) be the spectral radius of A = A_{x_1,…,x_k} acting on C[Ω^r]_0. Then, for fixed k, r, and ε > 0,
P(ρ > (1 + ε)((2k − 1)^{1/2}/k)^{1/(r+1)}) = o(1).
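The spectral gap in the base case (the action on Ω itself) is easy to observe numerically. The following sketch (our choices of n, k, seed, and iteration count; pure-Python power iteration rather than a library eigensolver) estimates the spectral radius ρ of A on C[Ω]_0 for a random Schreier graph and compares it with the scale (2k − 1)^{1/2}/k:

```python
import random

random.seed(0)
n, k = 300, 4

def rand_perm(n):
    p = list(range(n))
    random.shuffle(p)
    return p

def inverse(p):
    q = [0] * len(p)
    for i, j in enumerate(p):
        q[j] = i
    return q

perms = [rand_perm(n) for _ in range(k)]
invs = [inverse(p) for p in perms]

def apply_A(v):
    """Adjacency operator: average of v over the 2k neighbours x_i(u), x_i^{-1}(u)."""
    return [sum(v[p[u]] + v[q[u]] for p, q in zip(perms, invs)) / (2 * k)
            for u in range(n)]

def norm(v):
    return sum(t * t for t in v) ** 0.5

# Power iteration on C[Omega]_0 (mean-zero functions) estimates the
# spectral radius rho of A there; the top eigenvalue 1 lives on constants.
v = [random.gauss(0.0, 1.0) for _ in range(n)]
for _ in range(400):
    m = sum(v) / n
    v = [t - m for t in v]          # project away the constant functions
    w = apply_A(v)
    nw = norm(w)
    v = [t / nw for t in w]

m = sum(v) / n
v = [t - m for t in v]
rho = norm(apply_A(v)) / norm(v)

alon = (2 * k - 1) ** 0.5 / k       # (2k-1)^{1/2}/k, about 0.66 for k = 4
```

For n = 300 and k = 4 the estimate lands well below 1 and close to the Alon–Boppana/Friedman scale, as the theorem predicts.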
We show in this section that if k ≥ 3 then with high probability diam Cay(⟨S⟩, S) ≪ n² log n.
While this is only a modest improvement, it is interesting for being conjecturally sharp for any proof which uses elements of small support as a stepping stone: it seems unlikely that an element of small support can be obtained in fewer than O(n log n) steps on average, and a generic element of A_n cannot be written as a product of fewer than O(n) elements of small support.
The argument is most closely related to the argument of Schlage-Puchta [SP12], which shows that for k = 2 the diameter is bounded by O(n³ log n). We get a saving for k ≥ 3 by replacing the xy^i trick with the more powerful xw(y, z) trick.
⁸ The authors state only n²(log n)^c, but a careful inspection of the proof gives n²(log n)²·ω(1), for an arbitrarily slowly growing ω(1). A word v of length ω(1) is obtained such that v(x, y)^{O(n)} has support less than n/4. A random commutator process is then used to iteratively reduce the support. Each step quadruples the length of the word and roughly squares the density of the support, so the whole process multiplies the length of the word by O((log n)²). Thus a word w of length n(log n)²·ω(1) is obtained such that w(x, y) has support 3.

Let x, y, z ∈ G be random. Then by Theorem 6.1 with f = 1_C and Corollary A.7, if E is the event that every word u ∈ F_2 of length at most ℓ < cn^{1/4} satisfies xu(y, z) ∉ C, and w is the result of a simple random walk of length 2ℓ in F_2, then
P_{x,y,z}(E) ≪ n² Σ_{1≠χ∈Irr G} |⟨1_C, χ⟩|² E_{y,z,w} χ(w)/χ(1)
  ≤ n² Σ_{1≠χ∈Irr G} |⟨1_C, χ⟩|² (exp(−cn^{1/2}/ℓ²) + χ(1)^{−1/4+o(1)} + 2^{−cℓ}).
We conclude that with high probability there is a word w ∈ F_3 of length O(log n) such that w(x, y, z) ∈ C. Hence there is a word w′ = w^{2rn′} of length O(n log n) such that w′(x, y, z) is a 3-cycle. With high probability the conjugation action of x, y, z on the set of 3-cycles has a uniform spectral gap (by Theorem A.8), so it follows that every 3-cycle is a word in x, y, z of length O(n log n). Thus every element of A_n is a word in x, y, z of length O(n² log n).
A.5.2. Alternative 2. The crude bound n^{−1/5} for the probability can be improved as follows. Write n − 101 = n′ + r, where 101 ∤ n′ and r ∈ {99, 100}. Let C ⊆ S_n be the normal subset of all elements having both a 101-cycle and an n′-cycle (the remaining part being an arbitrary element of S_r). Assuming n′ > 101,
|C|/n! = 1/(101n′) ≍ 1/n,
and as before we have ⟨1_C, sgn⟩ = 0. In fact, ⟨1_C, χ⟩ = 0 for all low-degree χ.
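The count behind |C|/n! = 1/(101n′) can be checked by brute force in a small analogue (with 101, n′, r replaced by 5, 3, 1, so n = 9): the number of elements of S_9 of cycle type (5, 3, 1) should be 9!/(5·3) = 24192. A sketch:

```python
from itertools import permutations
from math import factorial

def cycle_type(p):
    """Sorted list of cycle lengths of the permutation p (p[i] = image of i)."""
    seen, lens = [False] * len(p), []
    for i in range(len(p)):
        if not seen[i]:
            length, j = 0, i
            while not seen[j]:
                seen[j] = True
                j = p[j]
                length += 1
            lens.append(length)
    return sorted(lens)

# elements of S_9 with a 5-cycle, a 3-cycle, and a fixed point
count = sum(1 for p in permutations(range(9)) if cycle_type(p) == [1, 3, 5])
predicted = factorial(9) // (5 * 3)
```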
Proof. It is well known that the characters of S_n are parameterized by partitions λ ⊢ n. Let χ = χ^λ be a character such that ⟨1_C, χ⟩ ≠ 0. By the Murnaghan–Nakayama rule, it must be the case that λ can be obtained by starting from (r) and adding a 101-rim-hook and an n′-rim-hook. Hence if χ is nontrivial and n is sufficiently large then λ_1 ≤ n − 100 and λ′_1 ≤ n − 98. From the hook length formula it follows that, for sufficiently large n,
χ(1) ≥ χ^{(99,1^{n−99})}(1) = (n−1)!/(98!(n−99)!) ≍ n^{98}.
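The degree of a hook character can be read off directly from the hook length formula: for λ = (a, 1^b) with n = a + b, the hook lengths are n, a−1, …, 1 (rest of the first row) and b, …, 1 (rest of the first column), giving χ^λ(1) = n!/(n·(a−1)!·b!) = binom(n−1, a−1); for a = 99 this is binom(n−1, 98) ≍ n^{98}. A small check (the specific values of b are our choices):

```python
from math import comb, factorial, prod

def hook_degree(a, b):
    """Degree of the hook character of S_n labelled (a, 1^b), n = a + b,
    computed directly from the hook length formula."""
    n = a + b
    hooks = [n] + list(range(a - 1, 0, -1)) + list(range(b, 0, -1))
    return factorial(n) // prod(hooks)

small = hook_degree(2, 2)    # the partition (2,1,1) of 4
big = hook_degree(99, 51)    # the partition (99, 1^51) of 150
```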
It follows as before that, with probability at least 1 − n^{−1+o(1)}, there is a word w ∈ F_3 of length O(log n) such that w(x, y, z) ∈ C. Hence there is a word w′ = w^{r!n′} of length O(n log n) such that w′(x, y, z) is a 101-cycle. By Theorem A.8 (and inspecting the proof), the conjugation action of x, y, z on the set of 101-cycles has spectral gap at least δ with probability at least 1 − O(n^{−1+O(δ)+o(1)}).
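The step w′ = w^{r!n′} is elementary: raising an element of C to the power r!n′ kills the n′-cycle and the S_r-part (whose orders divide n′ and r! respectively), while the 101-cycle survives, since 101 is prime, 101 > r, and 101 ∤ n′. A toy version, with (101, n′, r) replaced by (5, 4, 3):

```python
from math import factorial

def compose(p, q):
    """(p * q)(i) = p(q(i))."""
    return [p[q[i]] for i in range(len(p))]

def power(p, m):
    r = list(range(len(p)))
    for _ in range(m):
        r = compose(r, p)
    return r

# g has a 5-cycle on {0..4}, a 4-cycle on {5..8}, and (as its "S_r part")
# a 3-cycle on {9, 10, 11}
n = 12
g = list(range(n))
for i in range(5):
    g[i] = (i + 1) % 5
for i in range(5, 9):
    g[i] = 5 + ((i - 5 + 1) % 4)
for i in range(9, 12):
    g[i] = 9 + ((i - 9 + 1) % 3)

e = factorial(3) * 4          # r! * n' = 6 * 4 = 24
h = power(g, e)               # only the 5-cycle survives, as a 5-cycle
support = [i for i in range(n) if h[i] != i]
```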
Taking δ = 1/log n (say), it follows that every 101-cycle is a word in x, y, z of length O(n log n), and hence the diameter of Cay(⟨S⟩, S) is O(n² log n), with probability 1 − n^{−1+o(1)}.