1 Introduction

If G is a finite group, we define the class-k nilpotency degree by

$$\begin{aligned} d_k(G) = |\{(x_1, \dots , x_{k+1}) \in G^{k+1} : [x_1, \dots , x_{k+1}] = 1\}| / |G^{k+1}|. \end{aligned}$$

In probabilistic notation,

$$\begin{aligned} d_k(G) = \textbf{P}_{x_1, \dots , x_{k+1} \in G}([x_1, \dots , x_{k+1}] = 1). \end{aligned}$$

Then G is nilpotent of class at most k if and only if \(d_k(G) = 1\), so class-k nilpotency degree is a statistical relaxation of class-k nilpotency. We may call G probabilistically nilpotent of class k if \(d_k(G)\) is bounded away from zero (so G has statistically significant class-k nilpotency).

For example, \(d_1(G)\) is the probability that two random elements of G commute, sometimes called the commuting probability of G. It is well known that \(d_1(G) \le 5/8\) for any nonabelian group G. Another important but less well-known result is a theorem of Peter Neumann which states that if \(d_1(G)\) is bounded away from zero then G has a subgroup H such that [G : H] and \(|H'|\) are both bounded. Thus a finite probabilistically abelian group is bounded-by-abelian-by-bounded.
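For very small groups, \(d_k(G)\) can be computed by brute force directly from the definition. The following Python sketch is our own illustration: groups are encoded as lists of permutations (tuples), and the helper names `mul`, `comm`, `generate` and `d` are hypothetical. It uses the conventions \([x, y] = x^{-1}y^{-1}xy\) and left-normed higher commutators. It recovers \(d_1(S_3) = 1/2\), shows that the dihedral group of order 8 attains the bound 5/8, and confirms \(d_2 = 1\) for that group, which has class 2.

```python
from fractions import Fraction
from itertools import product


def mul(p, q):
    """Product of permutations given as tuples: (p*q)(i) = p(q(i))."""
    return tuple(p[i] for i in q)


def inv(p):
    r = [0] * len(p)
    for i, j in enumerate(p):
        r[j] = i
    return tuple(r)


def comm(x, y):
    """[x, y] = x^-1 y^-1 x y."""
    return mul(mul(inv(x), inv(y)), mul(x, y))


def generate(gens):
    """Closure of gens under multiplication; in a finite group this is <gens>."""
    e = tuple(range(len(gens[0])))
    elems, frontier = {e}, [e]
    while frontier:
        frontier = list({mul(g, s) for g in frontier for s in gens} - elems)
        elems.update(frontier)
    return sorted(elems)


def d(G, k):
    """d_k(G): proportion of tuples (x_1, ..., x_{k+1}) with [x_1, ..., x_{k+1}] = 1."""
    e = tuple(range(len(G[0])))
    hits = 0
    for xs in product(G, repeat=k + 1):
        c = xs[0]
        for x in xs[1:]:
            c = comm(c, x)  # left-normed: [[x_1, x_2], x_3], ...
        hits += (c == e)
    return Fraction(hits, len(G) ** (k + 1))


S3 = generate([(1, 0, 2), (1, 2, 0)])
D8 = generate([(1, 2, 3, 0), (0, 3, 2, 1)])  # symmetries of a square, order 8
print(d(S3, 1), d(D8, 1), d(D8, 2))  # 1/2 5/8 1
```

Enumerating \(G^{k+1}\) is only feasible for tiny groups and small k; the sketch is meant as a sanity check of the definition, not as a practical tool.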

It is natural to ask for an analogous qualitative description of probabilistically nilpotent groups of class k. Essentially this question has been asked by several authors, including Shalev [16], Martino et al. [12, Question 1.10], and Green [personal communication]. If G has boundedly many generators then it was shown by Shalev [16, Theorem 1.1] that G is class-k-by-bounded. If not, the structure of G is much less transparent, and certainly any structure theorem must at least include the class of bounded-by-class-k-by-bounded groups.

In this paper we consider finite groups G with \(d_2(G)\) bounded away from zero. We show that G need not be bounded-by-class-2-by-bounded (as might be hoped), nor even class-3-by-bounded, but we will show that G must be bounded-by-class-3-by-bounded.

Theorem 1.1

Let G be a finite group such that \(d_2(G) \ge \epsilon > 0\). Then G has a subgroup H of nilpotency class at most 4 such that [G : H] and \(|\gamma _4(H)|\) are both \(\epsilon \)-bounded.

Further details about the structure of G can be extracted from the proof of the theorem. It is noteworthy that if T is the trilinear map \((H / H')^3 \rightarrow \gamma _3(H) / \gamma _4(H)\) induced by the triple commutator [x, y, z], then there is an expression for the map T of the form

$$\begin{aligned} T(x, y, z) = A(a(x, y), z) + B(b(x, z), y) + C(c(y, z), x), \end{aligned}$$

where A, B, C, a, b, c are bilinear maps and a, b, c have \(\epsilon \)-bounded codomains. Conversely, if G has this structure then \(d_2(G)\) is bounded away from zero by a function of [G : H], \(|\gamma _4(H)|\), and the sizes of the codomains of a, b, c.

Much of the proof of Theorem 1.1 consists of a study of groups satisfying a certain commutator covering condition. If X and Y are subsets of a group G, let \({\text {Comm}}(X, Y)\) denote the set of commutators \(\{[x, y] : x \in X, y \in Y\}\). Let \(n \ge 1\), let \(B = \{x \in G: |x^G| \le n\}\), and let \(S \subseteq G'\). Then we say G satisfies the commutator covering condition (with given n and S) if

$$\begin{aligned} {\text {Comm}}(G, G) \subseteq B S. \end{aligned}$$
(1.1)

We will prove that a finite group with \(d_2(G)\) bounded away from zero has a bounded-index subgroup satisfying the covering condition with bounded n and |S|, and conversely the covering condition implies \(d_2(G)\) is bounded away from zero by a function of n and |S|, so for finite groups the condition that \(d_2(G)\) is bounded away from zero and the covering condition are loosely equivalent. On the other hand the covering condition makes sense for infinite as well as finite groups, and is satisfied by groups with boundedly finite conjugacy classes of commutators (the case \(S = \{1\}\)), which were studied in [6]. We will prove the following structure theorem for groups (finite or infinite) satisfying the commutator covering condition.
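For a small finite group given explicitly, the covering condition (1.1) can be tested directly. The sketch below is a hypothetical brute-force checker of our own (function names are ours, permutations encoded as tuples). In \(S_3\) the commutators form \(A_3\), which lies in B when n = 2, so \(S = \{1\}\) suffices; when n = 1 we have \(B = \{1\}\), and S must then cover \(A_3\) by itself.

```python
from itertools import product


def mul(p, q):
    """Product of permutations given as tuples: (p*q)(i) = p(q(i))."""
    return tuple(p[i] for i in q)


def inv(p):
    r = [0] * len(p)
    for i, j in enumerate(p):
        r[j] = i
    return tuple(r)


def generate(gens):
    """Closure of gens under multiplication; in a finite group this is <gens>."""
    e = tuple(range(len(gens[0])))
    elems, frontier = {e}, [e]
    while frontier:
        frontier = list({mul(g, s) for g in frontier for s in gens} - elems)
        elems.update(frontier)
    return sorted(elems)


def class_size(x, G):
    """|x^G| with x^g = g^-1 x g."""
    return len({mul(mul(inv(g), x), g) for g in G})


def satisfies_covering(G, n, S):
    """Brute-force test of Comm(G, G) ⊆ B S, where B = {x in G : |x^G| <= n}."""
    B = [x for x in G if class_size(x, G) <= n]
    BS = {mul(b, s) for b in B for s in S}
    comms = {mul(mul(inv(x), inv(y)), mul(x, y)) for x, y in product(G, repeat=2)}
    return comms <= BS


S3 = generate([(1, 0, 2), (1, 2, 0)])
e = (0, 1, 2)
A3 = [e, (1, 2, 0), (2, 0, 1)]
print(satisfies_covering(S3, 2, [e]))  # True: A_3 lies in B when n = 2
print(satisfies_covering(S3, 1, [e]))  # False: B = {1} when n = 1
print(satisfies_covering(S3, 1, A3))   # True
```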

Theorem 1.2

Let G be a group satisfying the commutator covering condition (1.1). Then G has a subgroup H of nilpotency class at most 4 such that [G : H] and \(|\gamma _4(H)|\) are both finite and (n, |S|)-bounded.

We do not know whether the techniques used in this paper can be adapted to handle groups G with \(d_k(G)\) bounded away from zero for \(k \ge 3\) (that is, probabilistically nilpotent groups of class k), or even, more specially, groups G in which all weight-k commutators \([x_1, \dots , x_k]\) have boundedly many conjugates. It may be that any such group has a subgroup H for which [G : H] and \(|\gamma _{k+2}(H)|\) are bounded. However, writing \(X_k = \{[x_1, \dots , x_k] : x_1, \dots , x_k \in G\}\) for the set of weight-k commutators, it is known that

  • if \(|x^G| \le n\) for all \(x \in X_k\) then \(|\gamma _k(G)'|\) is finite and (k, n)-bounded [4];

  • if \(|g^{X_k}| \le n\) for all \(g \in G\) then \(|\gamma _{k+1}(G)|\) is finite and (k, n)-bounded [3].

2 Example

In this section, we construct a family of finite groups G for which \(d_2(G)\) is bounded away from zero but such that the largest class-3 subgroup of G does not have bounded index. A slight variant also gives an infinite group G in which commutators have boundedly many conjugates but such that G is not virtually class-3. In both cases G is nilpotent of class 4 and bounded-by-class-3.

Let \(K = \textbf{F}_p\) where p is a small prime (say 2, 3, or 5) and let V be a vector space over K. Let \(f : V \times V \rightarrow K\) be a (generic) bilinear map. We define a graded unital K-algebra

$$\begin{aligned} R = R_0 \oplus R_1 \oplus R_2 \oplus R_3 \oplus R_4 \end{aligned}$$

with

$$\begin{aligned} R_0&= K,\\ R_1&= V,\\ R_2&= V \otimes V,\\ R_3&= V,\\ R_4&= K, \end{aligned}$$

and \(R_k = 0\) for \(k > 4\). We must define the multiplication maps \(R_i \times R_j \rightarrow R_{i+j}\) for all \(i, j \ge 0\). The unit 1 of \(R_0 = K\) is defined to be the identity element of R. Multiplication \(R_1 \times R_1 \rightarrow R_2\) is defined to be the universal map \(V \times V \rightarrow V \otimes V\). Let \(\theta : R_1 \rightarrow R_3\) be an isomorphism. We define multiplication between \(R_1\) and \(R_2\) and between \(R_1\) and \(R_3\) by the rules

$$\begin{aligned} xyz&= f(y, z) x^\theta + f(x, y) z^\theta{} & {} (x, y, z \in R_1) \\ x^\theta y&= x y^\theta = f(x, y){} & {} (x, y \in R_1). \end{aligned}$$

Multiplication \(R_2 \times R_2 \rightarrow R_4\) is then determined by \((xy)(zw) = x(yzw)\).

Associativity holds on \(R_1 \times R_1 \times R_1\) by definition, so it suffices to check associativity on \(R_2 \times R_1 \times R_1\), \(R_1 \times R_2 \times R_1\), and \(R_1 \times R_1 \times R_2\). Since \((xy)(zw) = x(yzw)\) for all \(x, y, z, w \in R_1\), it suffices to verify that \((xyz)w = x(yzw)\) for \(x, y, z, w \in R_1\), and this is a straightforward check:

$$\begin{aligned} (xyz)w = f(y, z) (x^\theta w) + f(x, y) (z^\theta w) = f(y, z) f(x, w) + f(x, y) f(z, w), \\ x(yzw) = f(z, w) (x y^\theta ) + f(y, z) (x w^\theta ) = f(z, w) f(x, y) + f(y, z) f(x, w). \end{aligned}$$

Let \(f^S(x, y) = f(x, y) + f(y, x)\) and \(f^A(x, y) = f(x, y) - f(y, x)\). Write \([x, y]_L = xy - yx\) for the Lie bracket (\(x, y \in R\)). Then we compute (for \(x, y, z, w \in R_1\))

$$\begin{aligned}{}[x, y]_L z&= f(y, z) x^\theta - f(x, z) y^\theta + f^A(x, y) z^\theta \\ z [x, y]_L&= f^A(x, y) z^\theta + f(z, x) y^\theta - f(z, y) x^\theta \\ [x, y, z]_L&= f^S(y, z) x^\theta - f^S(x, z) y^\theta \\ [x, y, z, w]_L&= f^A(x, w) f^S(y, z) - f^A(y, w) f^S(x, z). \end{aligned}$$
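The associativity check and the bracket formulas above can be tested numerically. The following sketch is our own illustration under stated assumptions: \(V = \textbf{F}_p^m\), f is a random bilinear form \(f(u, v) = u^T F v\), and \(\theta \) is taken to be the identity on coordinates. It only multiplies elements of \(R_1\), using the defining rule for xyz and the rule \(x^\theta y = x y^\theta = f(x, y)\), so it does not implement the whole algebra R.

```python
import random


def check_construction(p=5, m=4, trials=200, seed=0):
    """Randomized test of (xyz)w = x(yzw) and of the formulas for [x,y,z]_L, [x,y,z,w]_L."""
    rng = random.Random(seed)
    F = [[rng.randrange(p) for _ in range(m)] for _ in range(m)]

    def f(u, v):  # f(u, v) = u^T F v over F_p
        return sum(u[i] * F[i][j] * v[j] for i in range(m) for j in range(m)) % p

    def f_sym(u, v):
        return (f(u, v) + f(v, u)) % p

    def f_alt(u, v):
        return (f(u, v) - f(v, u)) % p

    def lin(*terms):
        """Linear combination sum(c * v) of coordinate vectors, reduced mod p."""
        return [sum(c * v[i] for c, v in terms) % p for i in range(m)]

    def triple(x, y, z):
        """xyz = f(y, z) x^theta + f(x, y) z^theta in R_3 (theta = identity on coordinates)."""
        return lin((f(y, z), x), (f(x, y), z))

    for _ in range(trials):
        x, y, z, w = ([rng.randrange(p) for _ in range(m)] for _ in range(4))
        # (xyz)w = x(yzw), using u^theta w = f(u, w) and x u^theta = f(x, u)
        if f(triple(x, y, z), w) != f(x, triple(y, z, w)):
            return False
        # [x, y, z]_L = xyz - yxz - zxy + zyx should be f^S(y, z) x^theta - f^S(x, z) y^theta
        t = lin((1, triple(x, y, z)), (-1, triple(y, x, z)),
                (-1, triple(z, x, y)), (1, triple(z, y, x)))
        if t != lin((f_sym(y, z), x), (-f_sym(x, z), y)):
            return False
        # [x, y, z, w]_L = t w - w t = f(t, w) - f(w, t) = f^A(t, w)
        expected = (f_alt(x, w) * f_sym(y, z) - f_alt(y, w) * f_sym(x, z)) % p
        if f_alt(t, w) != expected:
            return False
    return True


print(check_construction())  # True
```

Since the identities are polynomial identities in the entries of F, a passing random test is only evidence, but a failing one would expose a sign or index error in the displayed formulas.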

Assume \(f^S\) and \(f^A\) are both nondegenerate. Let \(H \le R_1\) be a subspace of finite codimension d. If \(2d < \dim V\) (or V is infinite-dimensional), we can find \(x, w \in H\) such that \(f^A(x, w) \ne 0\). Let \(H_1 = H \cap \ker f^S(x, \cdot )\). Then \(H_1\) has codimension at most \(d+1\), and if \(d + (d+1) < \dim V\) then we can find \(y \in H\) and \(z \in H_1\) such that \(f^S(y, z) \ne 0\). Since \(f^S(x, z) = 0\), it follows that

$$\begin{aligned}{}[x, y, z, w]_L = f^A(x, w) f^S(y, z) \ne 0. \end{aligned}$$

Hence if \([x, y, z, w]_L = 0\) identically on H then H has codimension at least \((\dim V - 1) / 2\).

Let \(L_i = \bigoplus _{k \ge i} R_k\) and let \(G = 1 + L_1\). Then G is a nilpotent group of class 4. The group commutator and Lie bracket are related by

$$\begin{aligned}{}[1 + x, 1 + y] = 1 + (1+x)^{-1} (1 + y)^{-1} [x, y]_L. \end{aligned}$$

It follows that

$$\begin{aligned}{}[1 + x, 1 + y, 1 + z] = 1 + [x, y, z]_L \pmod {L_4}. \end{aligned}$$

For any \(x, y \in L_1\), the formula for \([x, y, z]_L\) (which, modulo \(L_4\), depends only on the \(R_1\)-components of x, y, z) shows that the set of \(z \in L_1\) for which \([x, y, z]_L \in L_4\) is a subspace of codimension at most 2. Hence the subgroup of elements \(1 + z \in G\) for which \([1+x, 1+y, 1+z] \in 1 + L_4\) (the preimage of the centralizer of \([1+x, 1+y]\) in \(G/(1 + L_4)\)) has index at most \(p^2\). Since \(|1+L_4| = |K| = p\), it follows that \([1+x, 1+y]\) has at most \(p^3\) conjugates. Hence every commutator has boundedly many conjugates. On the other hand, the quadruple commutator is

$$\begin{aligned}{}[1+x, 1+y, 1+z, 1+w] = 1 + [x, y, z, w]_L. \end{aligned}$$

By our earlier remarks about \([x, y, z, w]_L\), if \(H \le G\) has class at most 3 then [G : H] is at least \(p^{(\dim V - 1)/2}\).

To give a concrete example let V be the \(\textbf{F}_2\)-vector space with basis \((e_i, e'_i : i \in I)\) (for some index set I) and let

$$\begin{aligned}&f(e_i, e_j) = f(e'_i, e_j) = f(e'_i, e'_j) = 0\\&f(e_i, e'_j) = \delta _{ij}. \end{aligned}$$

Then \(f^S = f^A\) is the bilinear map with matrix

$$\begin{aligned} \begin{pmatrix} 0 &{} I \\ I &{} 0 \end{pmatrix}, \end{aligned}$$

which is clearly nondegenerate. By taking \(I = \{1, \dots , n\}\) we get a family of finite groups with \(d_2(G) \ge 1/2^3\) and no bounded-index class-3 subgroup. By taking \(I = \{1, 2, 3, \dots \}\) we get an infinite group G in which every commutator has at most \(2^3\) conjugates and yet G has no finite-index class-3 subgroup.
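The concrete example can be checked mechanically for small |I|. In the sketch below (our own; helper names are hypothetical) the basis is ordered \(e_1, \dots , e_r, e'_1, \dots , e'_r\). It confirms that \(f^S = f^A\) has the displayed block matrix, that this matrix has full rank over \(\textbf{F}_2\), and that \(x = z = e_1\), \(y = w = e'_1\) give \([x, y, z, w]_L = f^A(e_1, e'_1) f^S(e'_1, e_1) - f^A(e'_1, e'_1) f^S(e_1, e_1) = 1\). The modular inverse `pow(a, -1, p)` requires Python 3.8 or later.

```python
def example_form(r):
    """Gram matrix of f: f(e_i, e'_j) = delta_ij, all other basis values 0.
    Index i stands for e_{i+1} and index r + i for e'_{i+1}."""
    m = 2 * r
    F = [[0] * m for _ in range(m)]
    for i in range(r):
        F[i][r + i] = 1
    return F


def rank_mod_p(M, p):
    """Rank of an integer matrix over F_p (p prime) by Gaussian elimination."""
    M = [[v % p for v in row] for row in M]
    rank = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(rank, len(M)) if M[i][c]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        scale = pow(M[rank][c], -1, p)
        M[rank] = [v * scale % p for v in M[rank]]
        for i in range(len(M)):
            if i != rank and M[i][c]:
                factor = M[i][c]
                M[i] = [(a - factor * b) % p for a, b in zip(M[i], M[rank])]
        rank += 1
    return rank


def form(M, u, v, p=2):
    """Evaluate the bilinear form with matrix M at (u, v) over F_p."""
    return sum(u[i] * M[i][j] * v[j] for i in range(len(u)) for j in range(len(v))) % p


r, p = 3, 2
m = 2 * r
F = example_form(r)
f_sym = [[(F[a][b] + F[b][a]) % p for b in range(m)] for a in range(m)]
f_alt = [[(F[a][b] - F[b][a]) % p for b in range(m)] for a in range(m)]
J = [[1 if abs(a - b) == r else 0 for b in range(m)] for a in range(m)]  # [[0, I], [I, 0]]
print(f_sym == f_alt == J, rank_mod_p(f_sym, p) == m)  # True True


def unit(k):
    return [1 if i == k else 0 for i in range(m)]


x = z = unit(0)  # e_1
y = w = unit(r)  # e'_1
quad = (form(f_alt, x, w) * form(f_sym, y, z) - form(f_alt, y, w) * form(f_sym, x, z)) % p
print(quad)  # 1
```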

Remark 2.1

A similar but weaker example (class-3, but not bounded-by-class-2-by-bounded) appeared in [8, Section 2.7].

3 Outline

In this section, we give an outline of the proof and in the next we list some tools that we need. The proof of the main theorems then occupies the rest of the paper and consists of the following steps.

  1. In Sect. 5, we abstract the method in the proof of Peter Neumann’s theorem (Theorem 4.2) and we obtain a general statement about groups with an invariant seminorm. Applied to the seminorm \(\Vert x\Vert = \log |x^G|\), the result is that if \(d_2(G)\) is bounded away from zero then G has a subgroup of bounded index satisfying the covering condition (1.1) with n and |S| bounded in terms of \(d_2(G)\). Thus Theorem 1.1 reduces to Theorem 1.2.

  2. In Sect. 6, we show that a group satisfying the covering condition is virtually soluble of bounded derived length. In the special case \(S = \{1\}\) (groups in which commutators have boundedly many conjugates), it was shown in [6] that G is bounded-by-metabelian. For arbitrary S we use induction on |S| and adapt the method of [6].

  3. In Sect. 7, we show that a soluble group of bounded derived length satisfying the covering condition has a nilpotent subgroup of bounded index and class. A key tool in this part of the proof is an asymmetric version of Neumann’s theorem about \(d_1(G)\). We also use Hall’s criterion for nilpotency, which enables us to reduce to the metabelian case. In the metabelian case we argue that all elements in an appropriate subgroup satisfy the Engel identity, and it follows that the subgroup is nilpotent of bounded class.

  4. In Sect. 8, we complete the proof by showing that a bounded-class nilpotent group satisfying the covering condition is bounded-by-class-3-by-bounded. To do this, we use induction on nilpotency class to reduce to the class-4 case. In the class-4 case, we use the theory of multilinear bias (or analytic rank). The quadruple commutator [x, y, z, w] is a uniformly biased 4-linear map, and a version of the Jacobi identity severely restricts the form of this map; from this we conclude that G has a subgroup H of bounded index with \(|\gamma _4(H)|\) bounded, as desired.

4 Tools

4.1 The Neumanns’ theorems

Key tools for, and prototypes of, our main theorems are two results due to two different Neumanns. The first, due to Bernhard Neumann, describes the structure of a BFC group, that is, a group in which \(|x^G| \le n\) for all \(x \in G\). The conclusion is that \(|G'|\) is finite; conversely, if \(G'\) is finite then \(|x^G| \le |G'|\) for every \(x \in G\), since \(x^G \subseteq xG'\).

Theorem 4.1

(B. Neumann’s BFC theorem [13]; see also [15, 14.5.11]) Let G be a group in which \(|x^G| \le n\) for every \(x \in G\). Then \(|G'|\) is finite and n-bounded.

An explicit bound for \(|G'|\) was first established by Wiegold [17], and the bound has since been improved by several authors. Guralnick and Maróti proved in [11, Theorem 1.9] that \(|G'| \le n^{(7 + \log _2 n)/2}\).

The second is the theorem of Peter Neumann about \(d_1\) already mentioned in the introduction.

Theorem 4.2

(Neumann [14]) Let G be a finite group such that \(d_1(G) \ge \epsilon > 0\). Then G has a normal subgroup H such that [G : H] and \(|H'|\) are \(\epsilon \)-bounded.

This theorem bears roughly the same relation to Theorem 4.1 as Theorem 1.1 bears to Theorem 1.2.

Actually we will use an asymmetric version of Peter Neumann’s theorem that considers not just \(d_1(G)\) but the commuting probability between two normal subgroups \(A, B \trianglelefteq G\).

Theorem 4.3

(P. Neumann’s theorem, asymmetric version) Let G be a finite group with normal subgroups \(A, B \trianglelefteq G\) such that \(\textbf{P}_{a \in A, b \in B}([a, b] = 1) \ge 1/C\). Then there are C-bounded-index subgroups \(H \le A\) and \(K \le B\), both normal in G, such that \(|[H, K]|\) is C-bounded.

To our knowledge this result does not appear in the literature, though it is similar to (and easier than) [5, Proposition 1.2]. It can be established by adapting the proof in the symmetric case appropriately. This result is also a corollary of a more general result we will prove, so a proof will be given in Sect. 5.

4.2 Tools for reducing to the finite case

In Theorem 1.2, we allow G to be infinite, mostly because we can reduce to the finite case whenever G is locally residually finite.

Say that G is m-by-class-s-by-n if G has a subgroup H such that \([G : H] \le n\) and \(|\gamma _s(H)| \le m\).

Lemma 4.4

Let G be a group.

  (1) G is m-by-class-s-by-n if and only if every finitely generated subgroup of G is so.

  (2) If G is residually finite, G is m-by-class-s-by-n if and only if every finite quotient of G is so.

Proof

(1) Suppose \([G:H] \le n\) and \(|\gamma _s(H)| \le m\). If \(\Gamma \le G\) is any subgroup, then \([\Gamma : H \cap \Gamma ] \le n\) and \(|\gamma _s(H \cap \Gamma )| \le m\), so the forward implication is clear. Now suppose every finitely generated subgroup \(\Gamma \le G\) is m-by-class-s-by-n. Let I be the set of finitely generated subgroups of G. Since any d-generated group has at most \(n!^d\) subgroups of index at most n, every factor of the following product is finite, and hence by Tychonoff’s theorem the product space

$$\begin{aligned} X = \prod _{\Gamma \in I} \{H \le \Gamma : [\Gamma :H] \le n, |\gamma _s(H)| \le m\} \end{aligned}$$

is compact. For each \(\Gamma \in I\) let \(C_\Gamma \) be the set of all vectors \((H_\Delta : \Delta \in I) \in X\) such that \(H_\Delta = H_\Gamma \cap \Delta \) for \(\Delta \le \Gamma \). Then the sets \(C_\Gamma \) are closed, \(C_{\Gamma _1} \cap \cdots \cap C_{\Gamma _k} \supseteq C_\Delta \) where \(\Delta = \langle \Gamma _1, \dots , \Gamma _k\rangle \), and the hypothesis that every \(\Gamma \in I\) is m-by-class-s-by-n implies that \(C_\Gamma \ne \emptyset \). Hence the sets \(C_\Gamma \) have the finite intersection property, so by compactness \(\bigcap _{\Gamma \in I} C_\Gamma \ne \emptyset \). If \((H_\Gamma ) \in \bigcap _{\Gamma \in I} C_\Gamma \) then \(H_\Gamma = H \cap \Gamma \) for all \(\Gamma \), where \(H = \langle H_\Gamma : \Gamma \in I\rangle \). Now \([\Gamma : H \cap \Gamma ] \le n\) for every \(\Gamma \in I\), so \([G : H] \le n\), and \(|\gamma _s(H \cap \Gamma )| \le m\) for every \(\Gamma \in I\), so \(|\gamma _s(H)| \le m\). Hence G is m-by-class-s-by-n, as claimed.

(2) Suppose \([G:H] \le n\) and \(|\gamma _s(H)| \le m\). Let \({\overline{G}} = G / N\) be a quotient of G. Then \([{\overline{G}} : {\overline{H}}] = [G : HN] \le n\) and \(|\gamma _s({\overline{H}})| \le |\gamma _s(H)| \le m\), so the forward implication is clear. Now suppose every finite quotient \(Q = G / N\) is m-by-class-s-by-n. Let I be the set of finite quotients of G. A compactness argument as in the previous part establishes that there are subgroups \(H_Q \le Q\) for each \(Q \in I\) obeying \([Q : H_Q] \le n\) and \(|\gamma _s(H_Q)| \le m\) and which are consistent in the sense that if R is (naturally) a quotient of Q then \(H_R\) is the quotient of \(H_Q\). Since \([Q : H_Q] \le n\) for all Q there is some Q such that \([Q : H_Q]\) is maximal. Let H be the preimage of \(H_Q\) in G for this Q. Then \([G : H] = [Q : H_Q] \le n\) and \(H_{{\overline{G}}} = {\overline{H}}\) for every quotient \({\overline{G}} \in I\), so \(|\gamma _s(H)| = \max _{{\overline{G}} \in I} |\gamma _s({\overline{H}})| \le m\). Hence G is m-by-class-s-by-n, as claimed. \(\square \)

We will also need to know that the covering condition (1.1) behaves well with respect to subgroups and quotients.

Lemma 4.5

Suppose G satisfies the covering condition (1.1).

  (1) If \(H \le G\) then H satisfies the covering condition with \((n^2, S')\) in place of (n, S) for some set \(S'\) of size at most |S|.

  (2) If \({\overline{G}}\) is a homomorphic image of G then \({\overline{G}}\) satisfies the covering condition with \((n, {\overline{S}})\) in place of (n, S).

Proof

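(1) For each \(s \in S\) with \(Bs \cap H \ne \emptyset \), choose a point of \(Bs \cap H\), and let \(S'\) be the set of chosen points, so that \(|S'| \le |S|\). If \(x \in {\text {Comm}}(H, H) \subseteq {\text {Comm}}(G, G)\) then \(x \in Bs \cap H\) for some \(s \in S\); let \(s' \in S'\) be the point chosen from \(Bs \cap H\). Write \(x = b_1s\) and \(s' = b_2s\) with \(b_1, b_2 \in B\). Then \(x = b_1b_2^{-1}s'\), where \(b_1b_2^{-1} = x s'^{-1} \in H\) and \(|(b_1b_2^{-1})^H| \le |(b_1b_2^{-1})^G| \le n^2\). Hence \({\text {Comm}}(H, H) \subseteq B' S'\), where \(B' = \{h \in H : |h^H| \le n^2\}\).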

(2) Clear. \(\square \)

Somewhat related to this lemma, the beautiful result [12, Theorem 1.21] states that, for G finite,

$$\begin{aligned} d_k(G) \le d_k(N) d_k(G/N) \end{aligned}$$
(4.1)

whenever \(N \trianglelefteq G\). This shows that \(d_k\) behaves very well with respect to normal subgroups and quotients. However, we do not need a result of this form because we work mainly with the covering condition instead of \(d_2\).

4.3 Multilinear bias and a 4-linear rank-reduction lemma

We need the following structure theorem for biased multilinear maps. If \(A_1, \dots , A_k\) are groups and \(I \subseteq [k] = \{1, \dots , k\}\) we write \(A_I = \prod _{i \in I} A_i\) and we write \(x_I\) for the projection of \(x \in A_{[k]}\) to \(A_I\). We write \(I^c\) for the complement \([k] {\setminus } I\). If g is a (multilinear) function we write \({\text {cod}}(g)\) for its codomain.

Theorem 4.6

[9, Corollary 1.3] Suppose \(A_1, \dots , A_k, B\) are finite abelian groups and \(F : A_{[k]} \rightarrow B\) is a multilinear map such that \(\textbf{P}(F = 0) \ge \epsilon > 0\). Then there is an expression

$$\begin{aligned} F(x) = \sum _{\emptyset \ne I \subseteq [k]} G_I(g_I(x_I), x_{I^c}), \end{aligned}$$
(4.2)

where for each I the functions \(g_I\) and \(G_I\) are multilinear maps

$$\begin{aligned}&g_I : A_I \rightarrow {\text {cod}}(g_I), \\&G_I : {\text {cod}}(g_I) \times A_{I^c} \rightarrow B, \end{aligned}$$

and \(|{\text {cod}}(g_I)|\) is \((\epsilon , k)\)-bounded.

If we call

$$\begin{aligned} \prod _{\emptyset \ne I \subseteq [k]} |{\text {cod}}(g_I)| \end{aligned}$$

(or perhaps its logarithm) the rank of the expression (4.2), the conclusion of Theorem 4.6 is simply that there is an expression for F of \((\epsilon , k)\)-bounded rank. This terminology is reasonable given the connection to the theory of analytic rank, which is explained in [9].
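For k = 2 the link between bias and rank is exact and easy to check. If \(F(x, y) = x^T M y\) on \(\textbf{F}_p^m \times \textbf{F}_p^n\) and M has rank r, then for fixed x the functional \(y \mapsto F(x, y)\) vanishes identically exactly when \(x^T M = 0\), which happens with probability \(p^{-r}\), and otherwise vanishes with probability 1/p; hence \(\textbf{P}(F = 0) = p^{-r} + (1 - p^{-r})/p = (p^r + p - 1)/p^{r+1}\). Correspondingly, \(F(x, y) = G(g(x), y)\) with \(g(x) = x^T M\), whose image has size \(p^r\). The brute-force sketch below (ours; names hypothetical) verifies this formula for a few matrices over \(\textbf{F}_3\).

```python
import itertools
import random
from fractions import Fraction


def rank_mod_p(M, p):
    """Rank of an integer matrix over F_p (p prime) by Gaussian elimination."""
    M = [[v % p for v in row] for row in M]
    rank = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(rank, len(M)) if M[i][c]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        scale = pow(M[rank][c], -1, p)  # modular inverse, Python 3.8+
        M[rank] = [v * scale % p for v in M[rank]]
        for i in range(len(M)):
            if i != rank and M[i][c]:
                factor = M[i][c]
                M[i] = [(a - factor * b) % p for a, b in zip(M[i], M[rank])]
        rank += 1
    return rank


def zero_probability(M, p):
    """P_{x, y}(x^T M y = 0) for uniform x in F_p^m, y in F_p^n, by brute force."""
    m, n = len(M), len(M[0])
    ys = list(itertools.product(range(p), repeat=n))
    zeros = 0
    for x in itertools.product(range(p), repeat=m):
        row = [sum(x[i] * M[i][j] for i in range(m)) % p for j in range(n)]  # x^T M
        zeros += sum(1 for y in ys if sum(a * b for a, b in zip(row, y)) % p == 0)
    return Fraction(zeros, p ** (m + n))


def predicted(r, p):
    """p^{-r} + (1 - p^{-r}) / p = (p^r + p - 1) / p^{r+1}."""
    return Fraction(p ** r + p - 1, p ** (r + 1))


rng = random.Random(0)
p, m = 3, 3
examples = [[[0] * m for _ in range(m)], [[1, 1, 0], [2, 2, 0], [0, 0, 0]]]
examples += [[[rng.randrange(p) for _ in range(m)] for _ in range(m)] for _ in range(4)]
for M in examples:
    r = rank_mod_p(M, p)
    print(r, zero_probability(M, p), predicted(r, p))
```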

We need to probe the uniqueness of bounded-rank expressions such as (4.2). Strictly speaking, the expression certainly need not be unique, but suppose F has two bounded-rank expressions

$$\begin{aligned} F(x) = \sum _{I \in \mathcal {I}} G_I(g_I(x_I), x_{I^c}) = \sum _{J \in \mathcal {J}} G_J(g_J(x_J), x_{J^c}) \end{aligned}$$

for some sets \(\mathcal {I}\) and \(\mathcal {J}\) of nonempty subsets of [k]. If \(\mathcal {I}\ne \mathcal {J}\), then it is natural to ask whether there is a third expression, still of bounded rank, only involving terms which could have appeared in either sum. This may be true generally. We will establish it in the special case we need.

Lemma 4.7

Suppose \(F : A_1 \times A_2 \times A_3 \times A_4 \rightarrow B\) is a multilinear map of abelian groups with expressions

$$\begin{aligned} F(x) = \sum _{I \in \mathcal {I}} F_I(f_I(x_I), x_{I^c}) = \sum _{J \in \mathcal {J}} G_J(g_J(x_J), x_{J^c}), \end{aligned}$$

both of rank at most C, where

$$\begin{aligned} \mathcal {I}&= \{\{1\}, \{2\}, \{3\}, \{4\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\},\\ \mathcal {J}&= \{\{1\}, \{2\}, \{3\}, \{4\}, \{1, 2\}, \{1, 4\}, \{2, 4\}, \{3, 4\}, \\&\qquad \{1, 2, 4\}, \{1, 3, 4\}, \{2,3,4\}, \{1,2,3,4\}\}. \end{aligned}$$

Then there are C-bounded-index subgroups \(A_i' \le A_i\) such that the restriction \(F'\) of F to \({A_1' \times A_2' \times A_3' \times A_4'}\) has the form

$$\begin{aligned} F'(x, y, z, w) = F_1(f_1(x, y), z, w) + F_2(f_2(x, z), f_2'(y, w)) + F_3(f_3(x, w), f_3'(y, z)), \end{aligned}$$

where \(F_1, F_2, F_3, f_1, f_2, f_2', f_3, f_3'\) are multilinear and \(f_1, f_2, f_2', f_3, f_3'\) have C-bounded codomains.

We will continue to use large Roman letters \(F, G, H, \dots \) for arbitrary multilinear maps and little Roman letters \(f, g, h, \dots \) for multilinear maps with bounded codomain.

Proof

Throughout the proof, “bounded” means “C-bounded”. We will repeatedly use the observation that if we have boundedly many maps \(g_1, \dots , g_n\) of one variable and bounded codomain then we can pass to the kernel of all these maps to get rid of them. In particular we can immediately get rid of all terms of the forms

$$\begin{aligned} G(g(x), y, z, w), G(x, g(y), z, w), G(x, y, g(z), w), G(x, y, z, g(w)), \end{aligned}$$

so forget those. Hence we may assume the \(\mathcal {I}\)-expression takes the simpler form

$$\begin{aligned} F(x, y, z, w)&= F_1(f_1(x, y), z, w) + F_2(f_2(x, z), y, w) + F_3(f_3(y, z), x, w)\nonumber \\&\quad + F_4(f_4(x, y, z), w). \end{aligned}$$
(4.3)

First consider the term \(F_2(f_2(x, z), y, w)\). Fix \(u = f_2(x, z)\). By equating (4.3) with a \(\mathcal {J}\)-expression we find an identity of the form

$$\begin{aligned} F_2(u, y, w) = G_u(g_u(y, w)) + G_u'(g_u'(y), w) + G_u''(g_u''(w), y), \end{aligned}$$
(4.4)

where \(g_u, g_u', g_u''\) have bounded codomain. Since \(|{\text {cod}}(f_2)|\) is bounded, in fact we have (4.4) for every u in the group

$$\begin{aligned} U = \langle f_2(x, z) : x \in A_1, z \in A_3\rangle \le {\text {cod}}(f_2). \end{aligned}$$

Pass to the subgroups \(\bigcap _{u \in U} \ker g_u' \le A_2\) and \(\bigcap _{u \in U} \ker g_u'' \le A_4\) to reduce to

$$\begin{aligned} F_2(u, y, w) = G_u(g_u(y, w)). \end{aligned}$$

Now replace \(g_u\) with \(f_2'' = (g_u : u \in U)\), which has bounded codomain \(\prod _{u \in U} {\text {cod}}(g_u)\), and replace \(G_u\) with \(G_u \circ \pi _u\), where \(\pi _u\) is the projection; thus

$$\begin{aligned} F_2(u, y, w) = G_u(f_2''(y, w)). \end{aligned}$$

We may restrict the codomain of \(f_2''\) to \(\langle f_2''(y, w) : y \in A_2, w \in A_4\rangle \) and we may restrict the domain of \(G_u\) to \({\text {cod}}(f_2'')\). But now since \(F_2(u, y, w)\) is u-linear, \(G_u(v)\) must be u-linear for every \(v \in {\text {cod}}(f_2'')\), so

$$\begin{aligned} F_2(u, y, w) = F_2'(u, f_2''(y, w)) \end{aligned}$$

for some multilinear function \(F_2'\). Hence

$$\begin{aligned} F_2(f_2(x, z), y, w) = F_2'(f_2(x, z), f_2''(y, w)), \end{aligned}$$

and by an analogous argument

$$\begin{aligned} F_3(f_3(y, z), x, w) = F_3'(f_3(y, z), f_3''(x, w)). \end{aligned}$$

Finally, consider \(F_4(f_4(x, y, z), w)\). By comparing the \(\mathcal {I}\)- and \(\mathcal {J}\)-expressions, we get an identity of the form

$$\begin{aligned} F_4(f_4(x, y, z), w) = H(h(x, y), z, w) + H'(x, y, z, w), \end{aligned}$$
(4.5)

where \(H'(x, y, z, w)\) has an expression involving terms only of the forms

$$\begin{aligned}&G(g(x, z), g'(y, w)), G(g(y, z), g'(x, w)), \\&G(g(x, w), y, z), G(g(y, w), x, z), G(g(z, w), x, y), \\&G(g(x, y, w), z), G(g(x, z, w), y), G(g(y, z, w), x), G(g(x, y, z, w)). \end{aligned}$$

Fix x, y and let \(u = h(x, y)\). Then by rearranging (4.5) we get an identity of the form

$$\begin{aligned} H(u, z, w) = G_u(g_u(z, w)) + G'_u(g'_u(z), w) + G''_u(g''_u(w), z). \end{aligned}$$

By arguing exactly as before we establish that

$$\begin{aligned} H(h(x, y), z, w) = G(h(x, y), g(z, w)). \end{aligned}$$

Now fix x, y, z and let \(u = f_4(x, y, z)\). Then from (4.5) we have an expression of the form

$$\begin{aligned} F_4(u, w) = H_u(h_u(w)). \end{aligned}$$

Let \(V = \langle f_4(x, y, z) : (x, y, z) \in A_1 \times A_2 \times A_3 \rangle \le {\text {cod}}(f_4)\). Since |V| is bounded, passing to the subgroup \(\bigcap _{u \in V} \ker h_u\) of \(A_4\) gives \(F_4(u, w) = 0\) for all \(u \in V\). Thus the \(\mathcal {I}\)-expression (4.3) is reduced to the desired form. \(\square \)

5 Neumann’s theorem for metric entropy

Let G be a group. An invariant seminorm on G is a function \(\Vert \cdot \Vert : G \rightarrow [0, \infty ]\) satisfying identically

  (1) (reflexivity) \(\Vert 1\Vert = 0\),

  (2) (symmetry) \(\Vert x^{-1}\Vert = \Vert x\Vert \),

  (3) (invariance) \(\Vert x^y\Vert = \Vert x\Vert \),

  (4) (triangle inequality) \(\Vert xy\Vert \le \Vert x\Vert + \Vert y\Vert \).

If \(\Vert \cdot \Vert \) is an invariant seminorm then \(d(x, y) = \Vert xy^{-1}\Vert \) defines a bi-invariant pseudometric on G. Conversely if d is a bi-invariant pseudometric then \(\Vert x\Vert = d(x, 1)\) is an invariant seminorm. This makes it reasonable to use metric language.

Example 5.1

The discrete norm is

$$\begin{aligned} \Vert x\Vert = {\left\{ \begin{array}{ll} 0 &{}: x = 1 \\ \infty &{}: x \ne 1 \end{array}\right. }. \end{aligned}$$

Example 5.2

The conjugacy class norm is

$$\begin{aligned} \Vert x\Vert = \log |x^G|. \end{aligned}$$
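A quick sanity check (our own sketch; helper names hypothetical) that the conjugacy class norm satisfies the axioms above. In multiplicative form, symmetry and invariance say \(|(x^{-1})^G| = |(x^y)^G| = |x^G|\), and the triangle inequality becomes \(|(xy)^G| \le |x^G| |y^G|\), which holds because \((xy)^g = x^g y^g\). The code verifies these exhaustively in \(S_4\), working with integer class sizes to avoid floating-point logarithms.

```python
def mul(p, q):
    """Product of permutations given as tuples: (p*q)(i) = p(q(i))."""
    return tuple(p[i] for i in q)


def inv(p):
    r = [0] * len(p)
    for i, j in enumerate(p):
        r[j] = i
    return tuple(r)


def generate(gens):
    """Closure of gens under multiplication; in a finite group this is <gens>."""
    e = tuple(range(len(gens[0])))
    elems, frontier = {e}, [e]
    while frontier:
        frontier = list({mul(g, s) for g in frontier for s in gens} - elems)
        elems.update(frontier)
    return sorted(elems)


def class_size(x, G):
    """|x^G| with x^g = g^-1 x g."""
    return len({mul(mul(inv(g), x), g) for g in G})


def is_class_norm_seminorm(G):
    """Check the seminorm axioms for ||x|| = log|x^G| in multiplicative form."""
    e = tuple(range(len(G[0])))
    size = {x: class_size(x, G) for x in G}
    if size[e] != 1:                                # reflexivity
        return False
    for x in G:
        if size[inv(x)] != size[x]:                 # symmetry
            return False
        for y in G:
            if size[mul(mul(inv(y), x), y)] != size[x]:  # invariance
                return False
            if size[mul(x, y)] > size[x] * size[y]:      # triangle inequality
                return False
    return True


S4 = generate([(1, 0, 2, 3), (1, 2, 3, 0)])
print(len(S4), sorted(set(class_size(x, S4) for x in S4)), is_class_norm_seminorm(S4))
# 24 [1, 3, 6, 8] True
```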

Theorem 5.3

Let G be a finite group with an invariant seminorm \(\Vert \cdot \Vert \). Let \(A, B \trianglelefteq G\) be normal subgroups and suppose \(\textbf{P}_{a \in A, b \in B}(\Vert [a, b]\Vert \le C) \ge 1/C\). Then there are subgroups \(H \le A\) and \(K \le B\), both normal in G, such that [A : H] and [B : K] are C-bounded and such that \({\text {Comm}}(H, K)\) is covered by C-boundedly many balls of C-bounded radius.

Conversely, if there are subgroups \(H \le A\) and \(K \le B\) such that \({[A:H]},{[B:K]} \le C\) and such that \({\text {Comm}}(H, K)\) is covered by at most C balls of radius C, then there is some C-bounded number D such that \(\textbf{P}_{a \in A, b \in B}(\Vert [a, b]\Vert \le D) \ge 1/D\).

We will use the following lemma from [7] multiple times.

Lemma 5.4

(See [7, Lemma 2.1]) Let G be a finite group and X a symmetric subset of G containing the identity. Then \(\langle X \rangle = X^r\) for \(r = 3\left\lfloor {|G|/|X|}\right\rfloor \).
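Lemma 5.4 is easy to test in a small group. The sketch below (our own; helper names hypothetical) draws random symmetric subsets X of \(S_4\) containing the identity and checks that \(X^r = \langle X \rangle \) with \(r = 3\lfloor |G|/|X| \rfloor \). Since \(1 \in X\), the powers \(X \subseteq X^2 \subseteq \cdots \) increase, so the lemma says that they stabilize at \(\langle X \rangle \) by step r.

```python
import random


def mul(p, q):
    """Product of permutations given as tuples: (p*q)(i) = p(q(i))."""
    return tuple(p[i] for i in q)


def inv(p):
    r = [0] * len(p)
    for i, j in enumerate(p):
        r[j] = i
    return tuple(r)


def generate(gens):
    """Closure of gens under multiplication; in a finite group this is <gens>."""
    e = tuple(range(len(gens[0])))
    elems, frontier = {e}, [e]
    while frontier:
        frontier = list({mul(g, s) for g in frontier for s in gens} - elems)
        elems.update(frontier)
    return sorted(elems)


def power(X, r, e):
    """X^r = {x_1 ... x_r : x_i in X}."""
    P = {e}
    for _ in range(r):
        P = {mul(a, x) for a in P for x in X}
    return P


def check_lemma_5_4(G, trials=100, seed=0):
    rng = random.Random(seed)
    e = tuple(range(len(G[0])))
    for _ in range(trials):
        X = {e}
        for g in rng.sample(G, rng.randrange(1, 6)):
            X |= {g, inv(g)}  # keep X symmetric
        r = 3 * (len(G) // len(X))
        if power(X, r, e) != set(generate(sorted(X))):
            return False
    return True


S4 = generate([(1, 0, 2, 3), (1, 2, 3, 0)])
print(check_lemma_5_4(S4))  # True
```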

Proof of Theorem 5.3

Let \(X = \{a \in A: \textbf{P}_{b \in B}(\Vert [a, b]\Vert \le C) \ge 1/2C\}\). Then

$$\begin{aligned} 1/C \le \textbf{P}_{a \in A, b \in B}(\Vert [a, b]\Vert \le C) \le \textbf{P}(a \in X) + 1/2C, \end{aligned}$$

so \(\textbf{P}(a \in X) \ge 1/2C\). Fix \(x \in X\). Let \(Y_x = \{b \in B : \Vert [x, b]\Vert \le C\}\). Then \(\textbf{P}_{b \in B}(b \in Y_x) \ge 1/2C\). Moreover \(1 \in Y_x\), and \(Y_x\) is symmetric because \([x, b^{-1}] = ([x, b]^{-1})^{b^{-1}}\). It follows that \(K_x = \langle Y_x\rangle \) has C-bounded index in B and, by Lemma 5.4, every \(k \in K_x\) is the product of C-boundedly many elements of \(Y_x\). Suppose \(k \in K_x\), so \(k = y_1 \ldots y_n\) for some C-bounded n and \(y_1, \dots , y_n \in Y_x\). Then

$$\begin{aligned}{}[x, k] = [x, y_1 \ldots y_n] = [x, y_n] [x, y_{n-1}]^{y_n} \ldots [x, y_1]^{y_2 \ldots y_n}, \end{aligned}$$

so \(\Vert [x, k]\Vert \le nC\). Hence \(\Vert [x, k]\Vert \) is C-bounded for every \(x \in X\) and \(k \in K_x\).
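The expansion of \([x, y_1 \ldots y_n]\) used in this step follows by induction from \([x, ab] = [x, b][x, a]^b\). A small randomized check (our own sketch; permutations of six points stand in for G, with the conventions \([x, y] = x^{-1}y^{-1}xy\) and \(a^b = b^{-1}ab\), and all helper names hypothetical) is below. Together with invariance and the triangle inequality, the identity is what gives \(\Vert [x, y_1 \ldots y_n]\Vert \le \sum _i \Vert [x, y_i]\Vert \).

```python
import random


def mul(p, q):
    """Product of permutations given as tuples: (p*q)(i) = p(q(i))."""
    return tuple(p[i] for i in q)


def inv(p):
    r = [0] * len(p)
    for i, j in enumerate(p):
        r[j] = i
    return tuple(r)


def comm(x, y):
    """[x, y] = x^-1 y^-1 x y."""
    return mul(mul(inv(x), inv(y)), mul(x, y))


def conj(a, b):
    """a^b = b^-1 a b."""
    return mul(mul(inv(b), a), b)


def product_of(ys):
    out = tuple(range(len(ys[0])))
    for y in ys:
        out = mul(out, y)
    return out


def expansion(x, ys):
    """[x, y_n] [x, y_{n-1}]^{y_n} ... [x, y_1]^{y_2 ... y_n}."""
    e = tuple(range(len(x)))
    result, tail = e, e  # tail holds y_{i+1} ... y_n
    for y in reversed(ys):
        result = mul(result, conj(comm(x, y), tail))
        tail = mul(y, tail)
    return result


rng = random.Random(0)


def random_perm(k):
    p = list(range(k))
    rng.shuffle(p)
    return tuple(p)


ok = True
for _ in range(500):
    x = random_perm(6)
    ys = [random_perm(6) for _ in range(rng.randrange(1, 7))]
    ok = ok and comm(x, product_of(ys)) == expansion(x, ys)
print(ok)  # True
```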

Now let \(H = \langle X \rangle \). Since X is a normal subset of G, H is normal in G. Since \(\textbf{P}(a \in X) \ge 1/2C\), \([A : H] \le 2C\), and if \(h \in H\) then \(h = x_1 \ldots x_n\) for some \(x_1, \dots , x_n \in X\) and some C-bounded n by Lemma 5.4 again. Then the subgroup \(K_h = K_{x_1} \cap \cdots \cap K_{x_n}\) has C-bounded index and every \(k \in K_h\) is such that \(\Vert [h, k]\Vert \) is C-bounded.

Symmetrically, there is a subgroup \(K \le B\) of index at most 2C such that for every \(k \in K\) there is a subgroup \(H_k \le A\) of C-bounded index and such that \(\Vert [h, k]\Vert \) is C-bounded for every \(h \in H_k\).

Hence there is a C-bounded number D such that for every \(h \in H\) we have

$$\begin{aligned} \textbf{P}_{k \in K}(\Vert [h, k]\Vert \le D) \ge 1/D \end{aligned}$$

and for every \(k \in K\) we have

$$\begin{aligned} \textbf{P}_{h \in H}(\Vert [h, k]\Vert \le D) \ge 1/D. \end{aligned}$$

Let \((h, k) \in H \times K\). Then if \(y \in K\) and \(\Vert [h, y]\Vert \le D\) we have

$$\begin{aligned}{}[h, y k] = [h, k] [h, y]^k, \end{aligned}$$

so, letting \(k' = yk\),

$$\begin{aligned} \Vert [h, k'] [h, k]^{-1}\Vert \le D. \end{aligned}$$

Similarly if \(x \in H\) and \(\Vert [x, k']\Vert \le D\) we have

$$\begin{aligned}{}[x h, k'] = [x, k']^h [h, k'], \end{aligned}$$

so, letting \(h' = xh\),

$$\begin{aligned} \Vert [h', k'] [h, k]^{-1}\Vert \le \Vert [h', k'] [h, k']^{-1}\Vert + \Vert [h, k'] [h, k]^{-1}\Vert \le 2D. \end{aligned}$$

Hence

$$\begin{aligned} \textbf{P}_{h' \in H, k' \in K}(\Vert [h', k'] [h, k]^{-1}\Vert \le 2D) \ge 1/D^2. \end{aligned}$$

It follows that if S is any set of \((4D+1)\)-separated points of \({\text {Comm}}(H, K)\) we must have \(|S| \le D^2\). Hence there is a set S of size at most \(D^2\) such that any point of \({\text {Comm}}(H, K)\) is within distance \(4D+1\) of a point of S.
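The last step uses the standard fact that a maximal R-separated subset is an R-net: if some point were at distance more than R from every chosen point, it could be added to the set. A greedy sketch of this (our illustration; the function name and the choice of \(S_4\) with the conjugacy class norm are assumptions, not part of the proof) follows. For \({\text {Comm}}(S_4, S_4) = A_4\) and R = 1.5, two commutators are within distance R exactly when their quotient lies in the Klein four-group, so the greedy net has one point per coset, three in all.

```python
import math
from itertools import product


def mul(p, q):
    """Product of permutations given as tuples: (p*q)(i) = p(q(i))."""
    return tuple(p[i] for i in q)


def inv(p):
    r = [0] * len(p)
    for i, j in enumerate(p):
        r[j] = i
    return tuple(r)


def generate(gens):
    """Closure of gens under multiplication; in a finite group this is <gens>."""
    e = tuple(range(len(gens[0])))
    elems, frontier = {e}, [e]
    while frontier:
        frontier = list({mul(g, s) for g in frontier for s in gens} - elems)
        elems.update(frontier)
    return sorted(elems)


def class_size(x, G):
    """|x^G| with x^g = g^-1 x g."""
    return len({mul(mul(inv(g), x), g) for g in G})


def greedy_net(points, dist, R):
    """Choose points pairwise more than R apart; the result is maximal, hence an R-net."""
    centers = []
    for q in points:
        if all(dist(q, c) > R for c in centers):
            centers.append(q)
    return centers


S4 = generate([(1, 0, 2, 3), (1, 2, 3, 0)])
sizes = {x: class_size(x, S4) for x in S4}
comms = sorted({mul(mul(inv(x), inv(y)), mul(x, y)) for x, y in product(S4, repeat=2)})


def dist(x, y):
    """Pseudometric d(x, y) = ||x y^-1|| for the conjugacy class norm."""
    return math.log(sizes[mul(x, inv(y))])


R = 1.5
centers = greedy_net(comms, dist, R)
print(len(comms), len(centers))  # 12 3
```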

For the converse, write U for the ball of radius C about 1 (to avoid a clash with the subgroup B) and S for a set of size at most C such that

$$\begin{aligned} {\text {Comm}}(H, K) \subseteq U S. \end{aligned}$$

Let \(h \in H\). By the pigeonhole principle there is some \(s \in S\) such that

$$\begin{aligned} \textbf{P}_{k \in K}([h, k] \in Us) \ge 1/|S| \ge 1/C. \end{aligned}$$

Fix \(k_0\) such that \([h,k_0] \in Us\). Then whenever \([h, k] \in Us\) we have

$$\begin{aligned}{}[h,kk_0^{-1}] = \left( [h, k_0]^{-1} [h, k] \right) ^{k_0^{-1}} \in U^2, \end{aligned}$$

since U is symmetric and closed under conjugation, so \(\Vert [h, kk_0^{-1}]\Vert \le 2C\). As \(k \mapsto kk_0^{-1}\) is a bijection of K, it follows that

$$\begin{aligned} \textbf{P}_{k \in K}(\Vert [h, k]\Vert \le 2C) \ge 1/C. \end{aligned}$$

This holds for every \(h \in H\), so

$$\begin{aligned} \textbf{P}_{a \in A, b \in B}(\Vert [a, b]\Vert \le 2C) \ge \frac{1}{C [A:H] [B:K]} \ge 1/C^3. \end{aligned}$$

Letting \(D = 2C^3\), we get the claim. \(\square \)

By taking the discrete norm we recover the asymmetric version of Neumann’s theorem stated earlier as Theorem 4.3.

Corollary 5.5

(Neumann’s theorem, asymmetric version) Let G be a finite group with normal subgroups \(A, B \trianglelefteq G\) such that \(\textbf{P}_{a \in A, b \in B}([a, b] = 1) \ge 1/C\). Then there are C-bounded-index subgroups \(H \le A\) and \(K \le B\), both normal in G, such that \(|[H, K]|\) is C-bounded.

Proof

By Theorem 5.3 applied to the discrete norm, there are subgroups \(H \le A\) and \(K \le B\) of C-bounded index, both normal in G, such that \({\text {Comm}}(H, K)\) has C-bounded size. Now we may apply [5, Lemma 2.1] to conclude that \(|[H, K]|\) is C-bounded, or we may argue directly as follows. Let \([h_1, k_1], \dots , [h_n, k_n]\) be an enumeration of the elements of \({\text {Comm}}(H, K)\). Let \(H_0 = \langle h_1, \dots , h_n, {\text {Comm}}(H, K)\rangle \) and \(K_0 = \langle k_1, \dots , k_n, {\text {Comm}}(H, K) \rangle \). Then \(H_0, K_0 \trianglelefteq \langle H_0, K_0\rangle \) and \([H_0, K_0] = [H, K]\). Since both \(H_0\) and \(K_0\) have boundedly many generators and \({\text {Comm}}(H, K)\) has bounded size, the centralizers \(C_{H_0}(K_0)\) and \(C_{K_0}(H_0)\) have bounded indices in \(H_0\) and \(K_0\), respectively. Now it follows from Baer’s asymmetric version of Schur’s theorem [15, 14.5.2] that \([H_0, K_0] = [H, K]\) has C-bounded size. \(\square \)

The main corollary we need in the rest of the paper is the conjugacy class norm case.

Corollary 5.6

Let G be a finite group such that \(d_2(G) \ge 1/C\). Then there is a subgroup \(H \le G\) of C-bounded index satisfying the covering condition (1.1) with n and |S| both C-bounded.

Proof

By Theorem 5.3 applied with \(A = B = G\) and the conjugacy class norm \(\Vert x\Vert = \log |x^G|\), there is a subgroup H of C-bounded index such that \({\text {Comm}}(H, H)\) is covered by C-boundedly many balls of C-bounded radius. Let S be a set containing one point of \({\text {Comm}}(H, H)\) from each ball that meets it, so that \(S \subseteq H'\). Then |S| is C-bounded and for every \(x, y \in H\) there is some \(s \in S\) such that \(\Vert [x, y] s^{-1}\Vert \) is C-bounded, so \([x, y] s^{-1}\) has C-boundedly many conjugates. In other words, \([x, y] \in Bs\) with n C-bounded. \(\square \)
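The conjugacy class norm fits \(d_2\) because \(\textbf{P}_{z \in G}([w, z] = 1) = |C_G(w)|/|G| = 1/|w^G|\), so \(d_2(G)\) is the average of \(1/|[x, y]^G|\) over pairs (x, y). The following sketch (our own helper names, with groups given by permutation generators) computes \(d_2\) both directly and via this average for two small groups:

```python
from fractions import Fraction
from itertools import product

def mul(p, q):
    # Product pq of permutations stored as tuples: apply p first, then q.
    return tuple(q[i] for i in p)

def inv(p):
    r = [0] * len(p)
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def comm(x, y):
    # [x, y] = x^{-1} y^{-1} x y
    return mul(mul(mul(inv(x), inv(y)), x), y)

def conj(x, y):
    # x^y = y^{-1} x y
    return mul(mul(inv(y), x), y)

def closure(gens):
    """The finite permutation group generated by gens."""
    group = {tuple(range(len(gens[0])))}
    frontier = set(group)
    while frontier:
        frontier = {mul(g, s) for g in frontier for s in gens} - group
        group |= frontier
    return sorted(group)

def d2_direct(G):
    # Proportion of triples (x, y, z) with [x, y, z] = [[x, y], z] = 1.
    e = tuple(range(len(G[0])))
    hits = sum(1 for x, y, z in product(G, repeat=3) if comm(comm(x, y), z) == e)
    return Fraction(hits, len(G) ** 3)

def d2_via_classes(G):
    # P_z([w, z] = 1) = 1 / |w^G|, averaged over w = [x, y].
    size = {w: len({conj(w, g) for g in G}) for w in G}
    total = sum(Fraction(1, size[comm(x, y)]) for x, y in product(G, repeat=2))
    return total / len(G) ** 2

S3 = closure([(1, 0, 2), (1, 2, 0)])
D4 = closure([(1, 2, 3, 0), (0, 3, 2, 1)])  # symmetries of a square
print(d2_direct(S3), d2_via_classes(S3))  # 3/4 3/4
print(d2_direct(D4))                      # 1 (D4 has class 2)
```

Brute force over \(G^3\) is only feasible for very small groups; the class-size formula reduces the work to \(|G|^2\) commutators plus the class sizes.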

6 From covering to bounded derived length

Throughout this section, we assume the covering condition (1.1), which states

$$\begin{aligned} {\text {Comm}}(G, G) \subseteq B S, \end{aligned}$$

where \({\text {Comm}}(G, G)\) is the set of commutators of G, \(B = \{x \in G: |x^G| \le n\}\), and \(S \subseteq G'\) is finite. We will show that G is bounded-by-derived-length-4-by-bounded, with a bound depending only on |S| and n.
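To make the covering condition (1.1) concrete, here is a brute-force checker for small permutation groups. It is only an illustration: `covering_holds` and the other helpers are hypothetical names of ours, and B is recomputed from n exactly as in the definition above.

```python
from itertools import product

def mul(p, q):
    # Apply p first, then q.
    return tuple(q[i] for i in p)

def inv(p):
    r = [0] * len(p)
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def comm(x, y):
    return mul(mul(mul(inv(x), inv(y)), x), y)

def conj(x, y):
    return mul(mul(inv(y), x), y)

def closure(gens):
    group = {tuple(range(len(gens[0])))}
    frontier = set(group)
    while frontier:
        frontier = {mul(g, s) for g in frontier for s in gens} - group
        group |= frontier
    return sorted(group)

def class_size(x, G):
    return len({conj(x, g) for g in G})

def covering_holds(G, S, n):
    """Check that Comm(G, G) is contained in BS, where B = {x in G : |x^G| <= n}."""
    B = [x for x in G if class_size(x, G) <= n]
    commutators = {comm(x, y) for x, y in product(G, repeat=2)}
    BS = {mul(b, s) for b in B for s in S}
    return commutators <= BS

S3 = closure([(1, 0, 2), (1, 2, 0)])
e = (0, 1, 2)
A3 = [g for g in S3 if class_size(g, S3) != 3]  # the identity and the two 3-cycles
print(covering_holds(S3, [e], 2))  # True: commutators lie in A3, classes of size <= 2
print(covering_holds(S3, [e], 1))  # False: B = {e} misses the 3-cycles
print(covering_holds(S3, A3, 1))   # True: S = A3 = S3' does all the work
```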

Proposition 6.1

Let G be a group satisfying the covering condition (1.1). Then G has a subgroup H such that [G : H] and \(|H^{(4)}|\) are both (|S|, n)-bounded. In particular if \(K = C_H(H^{(4)})\) then [G : K] is (|S|, n)-bounded and \(K^{(5)} = 1\).

Remark 6.2

  1. (1)

    If \(S = \{1\}\), the covering condition states that the commutators in G have boundedly many conjugates. The main result of [6] states in this case that G is bounded-by-metabelian. We closely follow some parts of [6] to prove the more general but weaker result stated above.

  2. (2)

    The proof actually shows that G is bounded-by-metabelian-by-class-2-by-bounded. We do not need this more detailed information right now, and we will eventually prove the stronger result Theorem 1.2 anyway.

  3. (3)

    Results of Bors, Larsen, and Shalev (see [2, Corollary 1.5] and references), relying on the classification of finite simple groups, imply that if G is finite and \(d_k(G) \ge \epsilon > 0\) then \([G : {\text {rad}}(G)]\) is \((k, \epsilon )\)-bounded, where \({\text {rad}}(G)\) is the soluble radical of G, and it follows from (4.1) that \({\text {rad}}(G)\) has \((k, \epsilon )\)-bounded derived length. By comparison, in Proposition 6.1, we do not depend on the classification, the group G is allowed to be infinite, and there is a stronger conclusion (though it will be superseded by Theorem 1.2).

Now assume the hypothesis of Proposition 6.1. We may assume \(1 \in S\) and we will use induction on |S|. Suppose there is some commutator \(x \in {\text {Comm}}(G, G)\) such that \(n < |x^G| \le n^{100}\). By the covering condition \(x \in Bs\) for some \(s \in S\), and \(s \ne 1\) since \(x \notin B\). Writing \(s = b^{-1}x\) with \(b \in B\), this implies that \(|s^G| \le |b^G| |x^G| \le n^{101}\). Then for any \(g \in Bs\) we have \(|g^G| \le n^{102}\). Hence we can remove s from S at the cost of increasing n to \(n^{102}\) and then apply induction on |S|. Hence we may assume B satisfies the following stability condition:

$$\begin{aligned} x \in {\text {Comm}}(G, G) \wedge |x^G| \le n^{100} \implies x \in B. \end{aligned}$$
(6.1)
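Both bounds in this reduction come from the submultiplicativity \(|(gh)^G| \le |g^G| |h^G|\), which holds because \(C_G(g) \cap C_G(h) \le C_G(gh)\). A quick exhaustive check in \(S_4\), an example we chose for illustration:

```python
from itertools import permutations

def mul(p, q):
    # Apply p first, then q.
    return tuple(q[i] for i in p)

def inv(p):
    r = [0] * len(p)
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def conj(x, y):
    # x^y = y^{-1} x y
    return mul(mul(inv(y), x), y)

S4 = list(permutations(range(4)))
size = {x: len({conj(x, g) for g in S4}) for x in S4}  # |x^G| for G = S4
submultiplicative = all(size[mul(g, h)] <= size[g] * size[h] for g in S4 for h in S4)
print(sorted(set(size.values())), submultiplicative)  # [1, 3, 6, 8] True
```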

Let \(H = \langle B \rangle \). Given an element \(g\in H\), we write l(g) for the minimal number l with the property that g can be written as a product of l elements of B.

Lemma 6.3

(See [6, Lemma 2.1]) Let \(K\le H\) be a subgroup of index m in H, and let \(b\in H\). Then the coset Kb contains an element g such that \(l(g)\le m-1\).
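The word length l and the statement of Lemma 6.3 can be explored by breadth-first search in the Cayley graph of \(\langle B \rangle \) with respect to B. This sketch uses hypothetical helper names and the illustrative choice \(G = S_4\), n = 6; it computes l and checks the lemma for three subgroups K:

```python
from collections import deque
from itertools import permutations

def mul(p, q):
    # Apply p first, then q.
    return tuple(q[i] for i in p)

def inv(p):
    r = [0] * len(p)
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def conj(x, y):
    return mul(mul(inv(y), x), y)

def is_even(p):
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2 == 0

def word_lengths(B, identity):
    """l(g) for every g in <B>, by breadth-first search from the identity."""
    dist, queue = {identity: 0}, deque([identity])
    while queue:
        g = queue.popleft()
        for b in B:
            h = mul(g, b)
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    return dist

def lemma_6_3_holds(K, H, l):
    """Each right coset Kb of K in H contains some g with l(g) <= [H : K] - 1."""
    m = len(H) // len(K)
    cosets = {frozenset(mul(k, b) for k in K) for b in H}
    return len(cosets) == m and all(min(l[g] for g in c) <= m - 1 for c in cosets)

S4 = list(permutations(range(4)))
e = tuple(range(4))
n = 6
B = [x for x in S4 if len({conj(x, g) for g in S4}) <= n]  # everything except 3-cycles
l = word_lengths(B, e)
H = sorted(l)                                 # H = <B>, here all of S4
stabilizer = [g for g in H if g[0] == 0]      # index 4
alternating = [g for g in H if is_even(g)]    # index 2
print(len(H), max(l.values()))                # 24 2
print(all(lemma_6_3_holds(K, H, l) for K in (stabilizer, alternating, [e])))  # True
```

In this small example every element has \(l(g) \le 2\), so the bound \(m - 1\) is far from tight; the lemma matters when B generates H slowly.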

Lemma 6.4

For any \(x\in B\) the subgroup \([H, x]\) has n-bounded order. Consequently, the order of \(\langle [H,x]^G\rangle \) is n-bounded.

Proof

Let \(m = [H : C_H(x)] = |x^H|\), so \(m \le n\). By Lemma 6.3 we can choose \(b_1, \dots , b_m\) such that \(x^H = \{x^{b_1}, \dots , x^{b_m}\}\) and \(l(b_i) \le m\) for each i, so \(|b_i^G| \le n^m\) (since \(|(gh)^G| \le |g^G| |h^G|\) for all \(g, h \in G\)). Note then that \([H, x]\) is generated by \([b_1, x], \dots , [b_m, x]\). Let \(T = \langle x, b_1, \dots , b_m\rangle \) and \(U = C_G(T)\). Then since \(Z(T) = T \cap C_G(T)\) we have

$$\begin{aligned}{}[T : Z(T)] \le [G : C_G(T)] \le n^{m^2+1}. \end{aligned}$$

By Schur’s theorem, \(|T'|\) is n-bounded. Since \([H, x] \le T'\), this gives the first statement.

For the second statement, recall that \([H, x] \trianglelefteq H \trianglelefteq G\). The G-conjugates of \([H, x]\) are the subgroups \([H, x]^g = [H, x^g]\) with \(g \in G\), so there are at most \(|x^G| \le n\) of them, and each is normal in H. Hence \(\langle [H, x]^G \rangle = \prod _{y \in x^G} [H, y]\), so \(|\langle [H, x]^G\rangle | \le |[H, x]|^n\). \(\square \)
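The generating set \([b_1, x], \dots , [b_m, x]\) in this proof works because \([h, x] = (x^h)^{-1}x\), so \([h, x]\) depends only on \(x^h\), and distinct conjugates give distinct commutators. A brute-force check in \(S_4\) (our choice of example, with H = G):

```python
from itertools import permutations

def mul(p, q):
    # Apply p first, then q.
    return tuple(q[i] for i in p)

def inv(p):
    r = [0] * len(p)
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def comm(x, y):
    return mul(mul(mul(inv(x), inv(y)), x), y)

def conj(x, y):
    return mul(mul(inv(y), x), y)

S4 = list(permutations(range(4)))
# [h, x] = h^{-1} x^{-1} h x = (x^h)^{-1} x, a function of x^h alone.
identity_holds = all(comm(h, x) == mul(inv(conj(x, h)), x) for x in S4 for h in S4)
# Hence the number of distinct [h, x] equals |x^H|.
counts_match = all(
    len({comm(h, x) for h in S4}) == len({conj(x, h) for h in S4}) for x in S4
)
print(identity_holds, counts_match)  # True True
```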

Now let \(X = B \cap {\text {Comm}}(G, G)\) and pick \(x \in X\) so that \(m = |x^H|\) is maximal. Define \(b_1, \dots , b_m, T, U\) as in the proof of Lemma 6.4.

Lemma 6.5

If \(u\in U\) and \(ux\in X\), then \([H,u]\le [H,x]\).

Proof

Since \(ux \in X\), \(|(ux)^H| \le m\). For each \(i=1,\dots ,m\) we have \((ux)^{b_i} = ux^{b_i}\). Since these elements are distinct it follows that

$$\begin{aligned} (ux)^H = \{ux^{b_i} : i = 1, \dots , m\}. \end{aligned}$$

Thus for any \(h \in H\) there is \(b \in \{b_1, \dots , b_m\}\) such that \((ux)^h = u^h x^h = ux^b\). It follows that \([u, h] = u^{-1} u^h = x^b x^{-h} \in [H, x]\), so the claim holds. \(\square \)

Let \(R=\langle [H, x]^G\rangle \). By Lemma 6.4 the order of R is bounded. By Lemma 6.5, \([H, u] \le R\) whenever \(u \in U\) and \(ux \in X\).

Let \(U_0\) be the normal core of U. Then \([G : U_0]\) is n-bounded. Let \(B_0 = B^2 \cap U_0\).

Now write \(x=[x_1 ,x_2]\) where \(x_1, x_2\in G\). Let \(t_1, t_2 \in B_0\) and consider

$$\begin{aligned} y = [x_1 t_1, x_2 t_2]. \end{aligned}$$

We may expand y as a product of four commutators:

$$\begin{aligned} y = [x_1, t_2]^{t_1} [x_1, x_2]^{t_2 t_1} [t_1, t_2] [t_1, x_2]^{t_2}. \end{aligned}$$
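This four-commutator expansion holds in any group, before any use of \(U_0\). A randomized check in \(S_7\), assuming the conventions \([a, b] = a^{-1}b^{-1}ab\) and \(a^b = b^{-1}ab\) (helper names are ours):

```python
import random
from itertools import permutations

def mul(p, q):
    # Apply p first, then q.
    return tuple(q[i] for i in p)

def inv(p):
    r = [0] * len(p)
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def comm(x, y):
    return mul(mul(mul(inv(x), inv(y)), x), y)

def conj(x, y):
    return mul(mul(inv(y), x), y)

def expansion_holds(x1, x2, t1, t2):
    """[x1 t1, x2 t2] == [x1,t2]^{t1} [x1,x2]^{t2 t1} [t1,t2] [t1,x2]^{t2}."""
    y = comm(mul(x1, t1), mul(x2, t2))
    factors = [
        conj(comm(x1, t2), t1),
        conj(comm(x1, x2), mul(t2, t1)),
        comm(t1, t2),
        conj(comm(t1, x2), t2),
    ]
    rhs = factors[0]
    for f in factors[1:]:
        rhs = mul(rhs, f)
    return y == rhs

random.seed(2)
S7 = list(permutations(range(7)))
all_hold = all(expansion_holds(*random.choices(S7, k=4)) for _ in range(300))
print(all_hold)  # True
```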

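Since \([x_1, x_2] = x\) and \([x, U_0] = 1\), where \(t_1, t_2 \in U_0\) and also \([x_1, t_2]^{t_1} \in U_0\) because \(U_0 \trianglelefteq G\), we may simplify this slightly to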

$$\begin{aligned} y = x [x_1, t_2]^{t_1} [t_1, t_2] [t_1, x_2]^{t_2}. \end{aligned}$$

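Each commutator involving a \(t_i\) is in \(B^4\), and \(x \in B\), so \(y \in B^{13}\) and hence \(|y^G| \le n^{13}\). Since y is a commutator, (6.1) implies \(y \in B\), so \(y \in X\). Moreover \(y = ux\), where \(u = [x_1, t_2]^{t_1} [t_1, t_2] [t_1, x_2]^{t_2} \in U_0 \le U\) commutes with x. Applying Lemma 6.5, we find that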

$$\begin{aligned}{}[H, [x_1, t_2]^{t_1} [t_1, t_2] [t_1, x_2]^{t_2}] \le R, \end{aligned}$$

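for all \(t_1, t_2 \in B_0\). Taking \(t_2 = 1\) we get \([H, [t_1, x_2]] \le R\). Taking \(t_1 = 1\) we get \([H, [x_1, t_2]] \le R\). Since R and H are normal in G, the set \(\{v \in G : [H, v] \le R\}\) is a subgroup closed under conjugation by G, so also \([H, [t_1, t_2]] \le R\). Since this holds for all \(t_1, t_2 \in B_0\) it follows that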

$$\begin{aligned}{}[H, \langle B_0 \rangle '] \le R. \end{aligned}$$

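Also \([t_1, t_2]\) is a commutator lying in \(B^4\), so \([t_1, t_2] \in B\) by (6.1), and thus \(\langle B_0 \rangle ' \le H\). Hence \(\langle B_0 \rangle '' \le [H, \langle B_0 \rangle '] \le R\), so \(\langle B_0 \rangle \) is metabelian mod R.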

Next suppose \(y_1, y_2 \in {\text {Comm}}(U_0, U_0) \cap Bs\) for some \(s \in S\). Then \(y_1 y_2^{-1} \in B^2 \cap U_0 = B_0\). Since \(B_0\) is closed under conjugation, \(\langle B_0 \rangle \trianglelefteq G\), and there are at most |S| commutators in \(U_0/\langle B_0\rangle \), so \((U_0 / \langle B_0 \rangle )'\) has bounded order by Neumann’s BFC theorem (Theorem 4.1). We can pass to a bounded-index subgroup V of \(U_0\) whose image in \(U_0 / \langle B_0\rangle \) is nilpotent of class at most 2. Then \(V'' \le \langle B_0 \rangle \), so \(V^{(4)} \le \langle B_0 \rangle '' \le R\), which has bounded order. Hence G is bounded-by-metabelian-by-class-2-by-bounded. This proves Proposition 6.1.

7 From bounded derived length to bounded class

Let G be a soluble group satisfying the covering condition (1.1). In this section, we show that G has a nilpotent subgroup of bounded index and class, with bounds depending on |S|, n, and the derived length l of G.

Proposition 7.1

Let G be a group satisfying the covering condition (1.1). Assume G is soluble of derived length l. Then G has a nilpotent subgroup H such that [G : H] and the class of H are both (|S|, n, l)-bounded.

The proof of this proposition fills the rest of this section. Throughout, we say “bounded” to mean “(|S|, n, l)-bounded”.

7.1 Reduction to the finite metabelian case

Recall Hall’s criterion for nilpotency (see [15, 5.2.10]): if N is a normal subgroup of G such that N and \(G/N'\) are both nilpotent then G is nilpotent with class bounded in terms of the classes of N and \(G/N'\).

By induction on derived length, \(G'\) has a nilpotent subgroup A of bounded index and class. Replacing A with its normal core in \(G'\), we may assume A is normal in \(G'\). Then it follows from Fitting’s theorem that the characteristic closure of A has bounded class, so we may assume A is characteristic in \(G'\). If \(A' \ne 1\) then by induction on the class of A there is a nilpotent subgroup of \(G / A'\) of bounded index and class. By Hall’s criterion we are done in this case, so we may assume A is abelian. Since \(|G'/A|\) is bounded, we may replace G with a bounded-index subgroup whose image in G/A has class at most 2.

In particular, G is abelian-by-class-2. By another classical result of Hall, any abelian-by-nilpotent group is locally residually finite (see [15, 15.4.1]). Applying Lemmas 4.4 and 4.5, we may assume G is finite.

Let \(g \in G\) and assume g is central modulo A. Then the subgroup \([A, g]\) is normal in G and consists of commutators, so \([A, g] \subseteq BS\). Note also that, since A is abelian and normal, \(a \mapsto [a, g]\) is a homomorphism \(A \rightarrow [A, g]\). Suppose \(a, b \in A\) satisfy \([a, g], [b, g] \in Bs\) for some \(s \in S\). Then

$$\begin{aligned}{}[ab^{-1}, g] = [a, g] [b, g]^{-1} \in B^2. \end{aligned}$$

It follows that

$$\begin{aligned} \textbf{P}_{a \in A}([a, g] \in B^2) \ge 1/|S|, \end{aligned}$$

so

$$\begin{aligned} \textbf{P}_{a \in A, h \in G}([a, g, h] = 1) \ge \frac{1}{|S| n^2}. \end{aligned}$$
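The homomorphism property of \(a \mapsto [a, g]\) uses only that A is abelian and normal: \([ab, g] = [a, g]^b [b, g] = [a, g][b, g]\). Its fibres are cosets of the kernel, all of the same size, which is what turns a covering of \([A, g]\) by |S| translates of \(B^2\) into the bound \(1/|S|\). A toy check with \(A = V_4 \trianglelefteq S_4\) (our example, not from the paper):

```python
from itertools import permutations

def mul(p, q):
    # Apply p first, then q.
    return tuple(q[i] for i in p)

def inv(p):
    r = [0] * len(p)
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def comm(x, y):
    return mul(mul(mul(inv(x), inv(y)), x), y)

S4 = list(permutations(range(4)))
e = (0, 1, 2, 3)
V4 = [e, (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]  # abelian normal subgroup of S4

def is_hom_on_A(A, g):
    """True if a -> [a, g] is multiplicative on A."""
    return all(comm(mul(a, b), g) == mul(comm(a, g), comm(b, g)) for a in A for b in A)

def fibres_equal(A, g):
    """All nonempty fibres of a -> [a, g] have the same size (cosets of the kernel)."""
    counts = {}
    for a in A:
        c = comm(a, g)
        counts[c] = counts.get(c, 0) + 1
    return len(set(counts.values())) == 1

all_ok = all(is_hom_on_A(V4, g) and fibres_equal(V4, g) for g in S4)
print(all_ok)  # True
```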

Applying Theorem 4.3, there are subgroups \(B_g \le [A, g]\) and \(T_g \le G\), both normal in G, such that the indices \([[A, g]: B_g]\) and \([G:T_g]\) and the order of \([B_g,T_g]\) are bounded.

Write \(G' = \langle b_1, \dots , b_s \rangle A\), where s is bounded. Since G/A has class 2, \(b_1, \dots , b_s\) are central mod A. Write \(T_i\) for \(T_{b_i}\), and \(B_i\) for \(B_{b_i}\). Then \([A,G'] = \prod _{i=1}^s [A, b_i]\). Let \(U = \bigcap _{i=1}^s T_i \cap C_G([B_i, T_i]) \cap C_G([A, b_i] / B_i)\). Then [G : U] is bounded and \(U'\) has class at most 4, since

$$\begin{aligned}&\gamma _3(U) \le A{} & {} (\text {since }G/A\text { has class 2}), \\&[\gamma _3(U), U'] \le [A, G'] = \prod _{i=1}^s [A, b_i], \\&[\gamma _3(U), U', U] \le \prod _{i=1}^s B_i{} & {} (\text {since }U \le C_G([A, b_i] / B_i)),\\&[\gamma _3(U), U', U, U] \le \prod _{i=1}^s [B_i, T_i]{} & {} (\hbox { since}\ U \le T_i),\\&[\gamma _3(U), U', U, U, U] = 1{} & {} (\text {since }U \le C_G([B_i, T_i])), \end{aligned}$$

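and, since \([U', U'] \le \gamma _4(U)\) and, by the three subgroup lemma, \([\gamma _4(U), U'] = [\gamma _3(U), U, U'] \le [\gamma _3(U), U', U]\,[\gamma _3(U), \gamma _3(U)] = [\gamma _3(U), U', U]\) (as \(\gamma _3(U) \le A\) is abelian),

$$\begin{aligned} \gamma _5(U') \le [\gamma _4(U), U', U, U] \le [\gamma _3(U), U', U, U, U] = 1. \end{aligned}$$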

If we can show that \(U / U''\) has a subgroup \(H / U''\) of bounded index and class then H has bounded index in G and it would follow from Hall’s criterion that H has bounded class. Thus Proposition 7.1 is reduced to the finite metabelian case.

For the rest of this section, we assume G is a finite metabelian group and \(A = G'\). We will continue to refer to the subgroups \(B_g \le [A, g]\) and \(T_g \le G\), now valid for any \(g \in G\) since G/A is abelian.

7.2 The case of p-groups

We use the notation \([x, {}_l y]\) for the long commutator \([x, y, \dots , y]\) where y is repeated l times. If \(l = 0\) then \([x, {}_l y] = x\). An element y of a group \(\Gamma \) is l-Engel if \([x, {}_l y] = 1\) for all \(x \in \Gamma \), and \(\Gamma \) is l-Engel if every \(y \in \Gamma \) is l-Engel. The following lemma will be useful.

Lemma 7.2

Let \(\Gamma \) be a metabelian group and suppose that \(a,b\in \Gamma \) are l-Engel elements. If \(x\in \langle a,b\rangle \), then x is \((2l+1)\)-Engel.

Proof

Both subgroups \(\Gamma '\langle a\rangle \) and \(\Gamma '\langle b\rangle \) are normal in \(\Gamma \) and nilpotent of class at most l. By Fitting’s theorem, \(\Gamma '\langle a,b\rangle \) is nilpotent of class at most 2l and the lemma follows. \(\square \)
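Lemma 7.2 can be checked by brute force in a small metabelian group. The sketch below (hypothetical helper names) computes the Engel degree of every element of the dihedral group of order 16, which is metabelian and nilpotent of class 3, and verifies the bound \(2l + 1\) for all pairs a, b:

```python
def mul(p, q):
    # Apply p first, then q.
    return tuple(q[i] for i in p)

def inv(p):
    r = [0] * len(p)
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def comm(x, y):
    return mul(mul(mul(inv(x), inv(y)), x), y)

def closure(gens):
    group = {tuple(range(len(gens[0])))}
    frontier = set(group)
    while frontier:
        frontier = {mul(g, s) for g in frontier for s in gens} - group
        group |= frontier
    return sorted(group)

r = (1, 2, 3, 4, 5, 6, 7, 0)  # rotation i -> i + 1 (mod 8)
s = (0, 7, 6, 5, 4, 3, 2, 1)  # reflection i -> -i (mod 8)
dihedral16 = closure([r, s])  # order 16, metabelian, nilpotent of class 3
e = tuple(range(8))

def engel_degree(y, G):
    """Least l with [x, _l y] = 1 for all x in G, or None if there is none."""
    xs = list(G)
    for l in range(len(G) + 1):
        if all(x == e for x in xs):
            return l
        xs = [comm(x, y) for x in xs]
    return None

deg = {y: engel_degree(y, dihedral16) for y in dihedral16}

def lemma_7_2_holds(a, b):
    l = max(deg[a], deg[b])
    return all(deg[x] <= 2 * l + 1 for x in closure([a, b]))

print(len(dihedral16), max(deg.values()))  # 16 3
print(all(lemma_7_2_holds(a, b) for a in dihedral16 for b in dihedral16))  # True
```

Since this group has class 3, every element is 3-Engel, so the check is weak here; it is meant only to make the definitions concrete.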

Now let G be a finite metabelian p-group obeying the covering condition (1.1). Let \(A = G'\) and \(B_g \le [A, g]\) and \(T_g \le G\) be as in the previous subsection. Since \(|[B_g, T_g]|\) is bounded, there is a bounded i such that \(Z_i(G)\) contains \([B_g, T_g]\) for every \(g\in G\). Passing to the quotient \(G/Z_i(G)\), we may assume \([B_g, T_g] = 1\) identically. We may also assume that \(B_g = C_{[A, g]}(T_g)\) and \(T_g = C_G(B_g)\).

Lemma 7.3

If G is l-Engel then G is nilpotent of (|S|, n, l)-bounded class.

Proof

Since \(G/T_g\) has bounded order, \(G = \langle x_1, \dots , x_k\rangle T_g\) for some \(x_1, \dots , x_k \in G\) and bounded k. Since G is metabelian and l-Engel, the normal subgroup \(A\langle x_i\rangle \) has class at most l for each i, so \(A \langle x_1, \dots , x_k \rangle \) has bounded class by Fitting’s theorem. In particular \(B_g \langle x_1, \dots , x_k \rangle \) has bounded class. Since \([B_g, T_g] = 1\), this shows that \(B_g\) is contained in \(Z_j(G)\) for some bounded j. Since \([A,g] / B_g\) has bounded order, \([A, g]\) is contained in \(Z_{j'}(G)\) for bounded \(j'\). Since \([A, g] \le Z_{j'}(G)\) for all g, \(\gamma _3(G) = \prod _{g \in G} [A, g] \le Z_{j'}(G)\), so G has class at most \(j'+2\). \(\square \)

Thus it suffices to find a bounded-index subgroup of G which is l-Engel for some bounded l. Since \([A, g] / B_g\) has bounded order, there is a bounded number f such that \([A, {}_fg] \le B_g\) for all \(g \in G\). For \(g \in G\) and \(i \ge 0\) write \(B_{i,g}\) for \([B_g, {}_ig]\), and note that \(B_{i,g}\) is normal in G. Let \(T_{i,g} = C_G(B_{i,g})\). Then \(B_g = B_{0,g} \ge B_{1,g} \ge \cdots \) and \(T_g = T_{0,g} \le T_{1,g} \le \cdots \). Let \(\beta _i = \max _{g \in G} [G : T_{i,g}]\), so \(\beta _0 \ge \beta _1 \ge \cdots \). Since \(\beta _0 = \max _{g \in G} [G : T_g]\) is bounded, there is a bounded number i such that \(\beta _i = \beta _{2i+f}\): along the sequence \(i_0 = 0\), \(i_{j+1} = 2i_j + f\), the positive integers \(\beta _{i_j}\) can decrease strictly at most \(\beta _0 - 1\) times. Pick g so that \([G : T_{i,g}] = \beta _i = \beta _{2i+f}\). Then for any \(h \in T_{i,g}\) we have

$$\begin{aligned} B_{2i+f, g} = [B_{i,g}, {}_{f+i} g] = [B_{i,g}, {}_{f+i} gh] \le B_{i,gh}, \end{aligned}$$

so \(T_{i,gh} \le T_{2i+f,g} = T_{i,g}\). Since \([G:T_{i,g}] = \beta _i\), this implies that \(T_{i,gh} = T_{i,g}\) whenever \(h \in T_{i,g}\). Hence \(T_{i,g}\) centralizes \(B_{i,gh}\) for all \(h \in T_{i,g}\).
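The choice of i above is a pigeonhole argument on the nonincreasing positive integers \(\beta _i\): along \(i_0 = 0\), \(i_{j+1} = 2i_j + f\), a strict drop can happen at most \(\beta _0 - 1\) times, so some \(i = i_j = f(2^j - 1)\) with \(j < \beta _0\) works. A sketch of the search, treating \(\beta \) as a hypothetical Python function:

```python
def stabilizing_index(beta, f):
    """Return some i with beta(i) == beta(2*i + f).

    beta is assumed to be a nonincreasing function from the nonnegative
    integers to the positive integers. Along i_0 = 0, i_{j+1} = 2*i_j + f the
    values beta(i_j) can drop strictly at most beta(0) - 1 times, so the loop
    succeeds within beta(0) steps, at i = f * (2**j - 1) for some j < beta(0).
    """
    i = 0
    for _ in range(beta(0)):
        if beta(i) == beta(2 * i + f):
            return i
        i = 2 * i + f
    raise ValueError("beta is not nonincreasing with positive integer values")

def example_beta(i):
    # A hypothetical nonincreasing positive sequence: 10, 9, ..., 2, 1, 1, 1, ...
    return max(1, 10 - i)

print(stabilizing_index(example_beta, 3))  # 9
```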

Let \(H = T_{i,g}\) and \(Z = Z(H)\) and \({\overline{G}} = G/Z\). Let \(h \in H\). Since A is abelian and \(B_{i,g} \le A\), we have \(A \le H\); as H centralizes \(B_{i,gh} \le A\), it follows that Z contains \(B_{i,gh}\), and hence \(\overline{gh}\) is \((f + i + 1)\)-Engel in \({\overline{G}}\). In particular \({\overline{g}}\) is \((f+i+1)\)-Engel. Since \(h = g^{-1}(gh)\), applying Lemma 7.2 shows that \({\overline{h}}\) is \((2f+2i+3)\)-Engel. Hence \({\overline{H}}\) is \((2f+2i+3)\)-Engel. Applying Lemma 7.3, H is nilpotent of bounded class, and \([G : H] = \beta _i\) is bounded, as required.

7.3 The coprime case

Lemma 7.4

Let A be a finite nilpotent group and let \(\Gamma \) be a group acting on A with kernel K. Assume |A| and \(|\Gamma / K|\) are coprime. Then \([A, \Gamma ] = [A, \Gamma , \Gamma ]\).

Proof

We may assume \(K = 1\) and |A| and \(|\Gamma |\) are coprime. Since A is the direct product of its Sylow subgroups, which are preserved by \(\Gamma \), we may assume A is a p-group for some prime p, and so \(\Gamma \) is a \(p'\)-group. Now see [1, (24.5)]. \(\square \)
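In the simplest case, A is an elementary abelian q-group written additively and \(\Gamma = \langle M \rangle \) is cyclic, acting by a matrix M; then \([A, \Gamma ] = (M - I)A\) and \([A, \Gamma , \Gamma ] = (M - I)^2 A\). This special case (our simplification, not the general lemma) can be tested directly, including a non-coprime example where the conclusion fails:

```python
from itertools import product

def mat_mul(M, N, q):
    k = len(M)
    return [[sum(M[r][t] * N[t][c] for t in range(k)) % q for c in range(k)] for r in range(k)]

def image(M, q):
    """Image of v -> Mv on (Z/q)^k, as a set of tuples."""
    k = len(M)
    return {
        tuple(sum(M[r][c] * v[c] for c in range(k)) % q for r in range(k))
        for v in product(range(q), repeat=k)
    }

def commutator_images_agree(M, q):
    """Compare [A, <M>] = (M - I)A with [A, <M>, <M>] = (M - I)^2 A for A = (Z/q)^k."""
    k = len(M)
    D = [[(M[r][c] - (1 if r == c else 0)) % q for c in range(k)] for r in range(k)]
    return image(D, q) == image(mat_mul(D, D, q), q)

swap = [[0, 1], [1, 0]]     # an automorphism of order 2
cycle3 = [[0, 1], [1, 1]]   # an automorphism of order 3 of (Z/2)^2
print(commutator_images_agree(swap, 3))    # True: orders 2 and 3 are coprime
print(commutator_images_agree(cycle3, 2))  # True: orders 3 and 2 are coprime
print(commutator_images_agree(swap, 2))    # False: a 2-group acting on a 2-group
```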

We now consider the case of Proposition 7.1 in which G is finite and metabelian, \(G / G'\) is a p-group, and \(G'\) is a \(p'\)-group. As before let \(A = G'\) and let \(B_g, T_g\) be normal subgroups of G such that \(B_g \le [A, g]\), the indices \([[A, g] : B_g]\) and \([G : T_g]\) and the order of \([B_g, T_g]\) are bounded. Then \(C_G([B_g, T_g])\) has bounded index in G, so we may additionally assume that \(T_g\) centralizes \([B_g, T_g]\) by replacing \(T_g\) with \(C_{T_g}([B_g, T_g])\) if necessary. Similarly we may assume \(T_g\) centralizes \([A, g] / B_g\). The actions of \(T_g\) on the abelian \(p'\)-groups \([A, g]\) and \(B_g\) are trivial on \(T_g \cap A\), so they factor through the p-group \(T_g / (T_g \cap A)\), and two applications of Lemma 7.4 show

$$\begin{aligned}{}[A, g, T_g] = [A, g, T_g, T_g] \le [B_g, T_g] = [B_g, T_g, T_g] = 1. \end{aligned}$$

Hence we may assume \(B_g = [A, g]\) and \(T_g = C_G([A, g])\).

Choose \(g \in G\) so that \([G : T_g]\) is maximal. Let \(h \in T_g\). By Lemma 7.4,

$$\begin{aligned}{}[A, g] = [A, g, g] = [A, g, gh] \le [A, gh]. \end{aligned}$$

It follows that \(T_{gh} \le T_g\). Since \([G : T_g]\) is maximal, \(T_{gh} = T_g\). Hence \(T_g\) centralizes both \([A, g]\) and \([A, gh]\), and \([A, h] \le [A, g][A, gh]\), so \(T_g\) centralizes \([A, h]\). Since this holds for every \(h \in T_g\), \([A, T_g, T_g] = 1\). Since \(T_g' \le A\), it follows that \(T_g\) is nilpotent (and in fact abelian, since G has no nonabelian nilpotent subgroups). As \([G : T_g]\) is bounded, this settles the coprime case.

7.4 Finite metabelian groups in general

Let G be a finite metabelian group satisfying (1.1). Let \(e = n!\). Then \(G^e\) centralizes B. The group \(G^e/Z(G^e)\) has at most |S| commutators so it has a subgroup of bounded index of nilpotency class at most 2, so \(G^e\) has a bounded-index subgroup of class at most 3.
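Here is a small check of the first step, that \(g^{n!}\) centralizes B: each \(x \in B\) has a conjugacy class of size at most n, which G permutes, so \(g^{n!}\) acts trivially on it. The example \(G = S_4\), n = 3 is ours; it also shows that a smaller exponent can fail:

```python
from itertools import permutations
from math import factorial

def mul(p, q):
    # Apply p first, then q.
    return tuple(q[i] for i in p)

def inv(p):
    r = [0] * len(p)
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def conj(x, y):
    return mul(mul(inv(y), x), y)

def power(g, k):
    result = tuple(range(len(g)))
    for _ in range(k):
        result = mul(result, g)
    return result

def centralizes_B(exponent, G, B):
    """True if g^exponent commutes with every b in B, for all g in G."""
    return all(mul(power(g, exponent), b) == mul(b, power(g, exponent)) for g in G for b in B)

S4 = list(permutations(range(4)))
n = 3
B = [x for x in S4 if len({conj(x, g) for g in S4}) <= n]  # identity and double transpositions
print(len(B), centralizes_B(factorial(n), S4, B), centralizes_B(2, S4, B))  # 4 True False
```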

Since \(G / G'\) is finite and abelian it is the direct product of its Sylow p-subgroups. For each prime \(p \le n\) let \(G_p\) be the preimage in G of the Sylow p-subgroup of \(G / G'\). For \(p > n\), the Sylow p-subgroups of G are contained in \(G^e\). Hence \(G = G^e \prod _{p \le n} G_p\), so by Fitting’s theorem it suffices to prove the result for \(G = G_p\). Thus we may assume \(G / G'\) is a p-group.

Let P be a Sylow p-subgroup of G. Then \(G = G' P\). By the p-group case (Section 7.2), we can replace P with a subgroup of bounded index and bounded class, so without loss of generality P has bounded class c say. Since the action of G on \(G'\) factors through P, it follows that \([P \cap G', {}_c G] = 1\), so \(P \cap G' \le Z_c(G)\). Factoring out \(Z_c(G)\), we may thus assume \(G'\) is a \(p'\)-group. By the coprime case (Section 7.3) we are done.

8 Bounded-class groups

We need the following result from [10], which is a local version of a classical result of Baer.

Theorem 8.1

[10, Theorem B] Let G be a group such that \(n = [\gamma _s(G) : Z_t(G) \cap \gamma _s(G)]\) is finite. Then \([\gamma _{s+1}(G) : Z_{t-1}(G) \cap \gamma _{s+1}(G)]\) is finite and (n, s)-bounded.

Now we will complete the proofs of Theorems 1.1 and 1.2. By Corollary 5.6, it suffices to prove Theorem 1.2. By Propositions 6.1 and 7.1, G has a subgroup of bounded index and bounded nilpotency class. Replacing G with this subgroup, we may assume G has bounded class, say c. In particular G is locally residually finite, so by Lemmas 4.4 and 4.5 we may assume G is finite. We will show by induction on c that G has a subgroup H such that [G : H] and \(|\gamma _4(H)|\) are both (n, |S|, c)-bounded (G is bounded-by-class-3-by-bounded); if we can show this then, since \(C_H(\gamma _4(H))\) has class at most 4 and bounded index in G, we are done. By induction G/Z(G) has a subgroup H/Z(G) such that [G : H] and \([\gamma _4(H) : \gamma _4(H) \cap Z(G)]\) are both bounded. Applying Theorem 8.1 with s = 4 and t = 1 (note that \(Z(G) \cap H \le Z(H)\)), \(|\gamma _5(H)|\) is bounded. Hence, passing to the section \(H / \gamma _5(H)\), we may assume G has class at most 4. Moreover, by the converse of Theorem 5.3, \(d_2(G)\) is bounded away from zero. Hence the proof is completed by the following proposition.

Proposition 8.2

Let G be a finite group such that \(\gamma _5(G) = 1\) and \(d_2(G) \ge \epsilon > 0\). Then G has a subgroup H such that [G : H] and \(|\gamma _4(H)|\) are both \(\epsilon \)-bounded.

Proof

In this proof, “bounded” means “\(\epsilon \)-bounded”. Let \(\mathfrak {g}= \mathfrak {g}_1 \oplus \mathfrak {g}_2 \oplus \mathfrak {g}_3 \oplus \mathfrak {g}_4\) be the graded Lie ring associated to G, where \(\mathfrak {g}_i = \gamma _i(G) / \gamma _{i+1}(G)\) for \(1 \le i \le 4\). Since \(d_2(G) \ge \epsilon \),

$$\begin{aligned} \textbf{P}_{x, y, z \in \mathfrak {g}_1}([x, y, z] = 0) \ge \epsilon . \end{aligned}$$
(8.1)

We claim that also

$$\begin{aligned} \textbf{P}_{x, y \in \mathfrak {g}_1, z \in \mathfrak {g}_2}([x, y, z] = 0) \ge \epsilon . \end{aligned}$$
(8.2)

Since

$$\begin{aligned} \textbf{P}_{x, y, z_0 \in G, z \in G'}([x, y, zz_0] = 1) = \textbf{P}_{x, y, z \in G} ([x, y, z] = 1) \ge \epsilon , \end{aligned}$$

there is some \(z_0 \in G\) such that

$$\begin{aligned} \textbf{P}_{x, y \in G, z \in G'} ([x, y, zz_0] = 1) \ge \epsilon . \end{aligned}$$

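Fix \(x, y \in G\). If \([x, y, zz_0] \ne 1\) for every \(z \in G'\), the inequality displayed below holds trivially for this pair; otherwise pick \(z_1 \in G'\) such that \([x, y, z_1z_0] = 1\). Now for \(z \in G'\) we have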

$$\begin{aligned}{}[x, y, zz_0] = [x, y, z] [x, y, z_0], \end{aligned}$$

and for fixed x and y the map \(z \mapsto [x, y, z]\) is a homomorphism for \(z \in G'\), so

$$\begin{aligned}{}[x, y, zz_0] = 1 \implies [x, y, zz_1^{-1}] = 1. \end{aligned}$$
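The factorization and the implication above rest on a short commutator calculation, sketched here with the conventions \([a, b] = a^{-1}b^{-1}ab\) and \(a^b = b^{-1}ab\) (assumed, as they match the expansion in Section 6). Fix x and y, put \(w = [x, y] \in \gamma _2(G)\), and note that \([w, z] \in [\gamma _2(G), \gamma _2(G)] \le \gamma _4(G) \le Z(G)\) for \(z \in G'\), because \(\gamma _5(G) = 1\). Then for \(z, z' \in G'\) and \(z_0 \in G\),

$$\begin{aligned}{}[w, zz_0]&= [w, z_0]\,[w, z]^{z_0} = [w, z]\,[w, z_0],\\ [w, zz']&= [w, z']\,[w, z]^{z'} = [w, z]\,[w, z']. \end{aligned}$$

So if \([w, zz_0] = 1 = [w, z_1z_0]\), then \([w, z] = [w, z_0]^{-1} = [w, z_1]\), and hence \([w, zz_1^{-1}] = [w, z]\,[w, z_1]^{-1} = 1\).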

It follows that

$$\begin{aligned} \textbf{P}_{z \in G'}([x, y, zz_0] = 1) \le \textbf{P}_{z \in G'}([x, y, z] = 1). \end{aligned}$$

Hence

$$\begin{aligned} \textbf{P}_{x, y \in G, z \in G'}([x, y, z] = 1) \ge \textbf{P}_{x, y \in G, z \in G'} ([x,y,zz_0] = 1) \ge \epsilon . \end{aligned}$$

Descending to \(\mathfrak {g}\) gives (8.2).

From (8.1), (8.2), and Theorem 4.6, we have expressions

$$\begin{aligned}{}[x, y, z]&= F_1(x, y, z) + \cdots + F_7(x, y, z){} & {} (x, y, z \in \mathfrak {g}_1)\\ [x, y, z]&= F'_1(x, y, z) + \cdots + F'_7(x, y, z){} & {} (x, y \in \mathfrak {g}_1, z \in \mathfrak {g}_2), \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} F_1(x, y, z)&= G_1(g_1(x), y, z),\\ F_2(x, y, z)&= G_2(g_2(y), x, z),\\ F_3(x, y, z)&= G_3(g_3(z), x, y),\\ F_4(x, y, z)&= G_4(g_4(x, y), z),\\ F_5(x, y, z)&= G_5(g_5(x, z), y),\\ F_6(x, y, z)&= G_6(g_6(y, z), x),\\ F_7(x, y, z)&= G_7(g_7(x, y, z)), \end{aligned} \end{aligned}$$
(8.3)

likewise for \(F'_1, \dots , F'_7\), and where \(|{\text {cod}}(g_i)|\) and \(|{\text {cod}}(g'_i)|\) are \(\epsilon \)-bounded for each \(i \in \{1, \dots , 7\}\). Hence we obtain two expressions for \([x, y, z, w]\):

$$\begin{aligned}{}[x, y, z, w] = \sum _{i=1}^7 [F_i(x, y, z), w] \end{aligned}$$
(8.4)

and, by the Jacobi identity,

$$\begin{aligned}{}[x, y, z, w]&= [x, y, w, z] + [x, y, [z, w]] \nonumber \\&= \sum _{i=1}^7 [F_i(x, y, w), z] + \sum _{i=1}^7 F_i'(x, y, [z, w]). \end{aligned}$$
(8.5)
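The rewriting (8.5) uses only the Jacobi identity in the form \([u, z, w] = [u, w, z] + [u, [z, w]]\) with \(u = [x, y]\), so it holds in every Lie ring. A randomized check in the Lie ring of integer \(3 \times 3\) matrices with \([X, Y] = XY - YX\) (our choice of test ring):

```python
import random

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def bracket(X, Y):
    # Lie bracket of matrices: [X, Y] = XY - YX.
    XY, YX = matmul(X, Y), matmul(Y, X)
    return [[XY[i][j] - YX[i][j] for j in range(len(X))] for i in range(len(X))]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(len(X))] for i in range(len(X))]

def left_normed(*args):
    # [a1, a2, ..., ak] = [[...[a1, a2], ...], ak]
    out = args[0]
    for a in args[1:]:
        out = bracket(out, a)
    return out

def identity_8_5_holds(x, y, z, w):
    """[x, y, z, w] == [x, y, w, z] + [x, y, [z, w]]."""
    lhs = left_normed(x, y, z, w)
    rhs = add(left_normed(x, y, w, z), bracket(left_normed(x, y), bracket(z, w)))
    return lhs == rhs

def rand_matrix():
    return [[random.randint(-5, 5) for _ in range(3)] for _ in range(3)]

random.seed(3)
all_hold = all(identity_8_5_holds(*(rand_matrix() for _ in range(4))) for _ in range(200))
print(all_hold)  # True
```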

Note that (8.4) only has terms of the forms

$$\begin{aligned}&G(g(x), y, z, w), G(x, g(y), z, w), G(x, y, g(z), w), \\&G(g(x, y), z, w), G(g(x, z), y, w), G(g(y, z), x, w), \\&G(g(x, y, z), w); \end{aligned}$$

while (8.5) only has terms of the forms

$$\begin{aligned}&G(g(x), y, z, w), G(x, g(y), z, w), G(x, y, z, g(w)), \\&G(g(x, y), z, w), G(g(x, w), y, z), G(g(y, w), x, z), G(x, y, g(z, w)), \\&G(g(x, y, w), z), G(g(x, z, w), y), G(g(y, z, w), x), \\&G(g(x, y, z, w)). \end{aligned}$$

Applying Lemma 4.7, there is a bounded-index subgroup \(\mathfrak {h}_1 \le \mathfrak {g}_1\) such that for \((x, y, z, w) \in \mathfrak {h}_1^4\) we have an expression for \([x, y, z, w]\) of the form

$$\begin{aligned}{}[x,y,z,w] = F_1(f_1(x, y), z, w) + F_2(f_2(x, z), f'_2(y, w)) + F_3(f_3(x, w), f'_3(y, z)), \end{aligned}$$

where \(f_1, f_2, f_2', f_3, f_3'\) have bounded codomains. The latter two terms take values in a bounded part of \(\mathfrak {g}_4\), which we may quotient out, so without loss of generality

$$\begin{aligned}{}[x, y, z, w] = F_1(f_1(x, y), z, w). \end{aligned}$$

Now if \(f_1(x, y) = f_1(x', y')\) we have \([x, y] - [x', y'] \in Z_2(\mathfrak {h})\), where \(\mathfrak {h}\) is the subring generated by \(\mathfrak {h}_1\).

Let H be the preimage of \(\mathfrak {h}_1\) in G. Then \([G:H] = [\mathfrak {g}_1:\mathfrak {h}_1]\). For \(x \in H\) write \({\overline{x}}\) for the image in \(\mathfrak {h}_1\). Then for \(x, y, x', y' \in H\) if \(f_1({\overline{x}}, {\overline{y}}) = f_1(\overline{x'}, \overline{y'})\) then \([x, y] [x', y']^{-1} \in Z_2(H)\). Hence there are only boundedly many commutators in \(H / Z_2(H)\), so \([\gamma _2(H) : \gamma _2(H) \cap Z_2(H)]\) is bounded by Neumann’s BFC theorem (Theorem 4.1). Applying Theorem 8.1 twice, \(|\gamma _4(H)|\) is bounded, as claimed. \(\square \)

Remark 8.3

Let G be a finite group with \(d_2(G) \ge \epsilon > 0\) as in the statement of Theorem 1.1. We have established the existence of a subgroup H of class at most 4 such that [G : H] and \(|\gamma _4(H)|\) are bounded. The covering condition and Lemma 4.5 (or (4.1)) show that \(d_2(H / \gamma _4(H))\) is bounded away from zero, so by replacing G with \(H / \gamma _4(H)\) we may assume G has class at most 3. Now consider the expression

$$\begin{aligned}{}[x,y,z] = F_1(x, y, z) + \cdots + F_7(x, y, z) \qquad (x, y, z \in G / G') \end{aligned}$$

established in the proof above, where \(F_1, \dots , F_7\) are given by (8.3). By passing to the bounded-index subgroup \(\ker g_1 \cap \ker g_2 \cap \ker g_3\) we can kill off \(F_1, F_2, F_3\). The image of \(F_7\) is contained in a bounded-size subgroup of \(\gamma _3(G)\), so we may quotient it out. Thus we are left with

$$\begin{aligned}{}[x, y, z] = G_4(g_4(x, y), z) + G_5(g_5(x, z), y) + G_6(g_6(y, z), x). \end{aligned}$$

Conversely, suppose G has class 3 and the triple commutator has an expression of this form. Then for any \(x \in G\), the proportion of \(y \in G\) such that \(g_4(x, y) = 0\) is at least \(|{\text {cod}}(g_4)|^{-1}\), and for any \(x, y \in G\) the proportion of \(z \in G\) such that \(g_5(x, z) = 0\) and \(g_6(y, z) = 0\) is at least \(|{\text {cod}}(g_5)|^{-1} |{\text {cod}}(g_6)|^{-1}\). It follows that \(d_2(G) \ge |{\text {cod}}(g_4)|^{-1} |{\text {cod}}(g_5)|^{-1} |{\text {cod}}(g_6)|^{-1}\).
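In this converse, the bound uses only that, for fixed x and y, the maps \(z \mapsto g_5(x, z)\) and \(z \mapsto g_6(y, z)\) are homomorphisms (we assume this, as the argument implicitly does), so their common kernel has index at most \(|{\text {cod}}(g_5)| |{\text {cod}}(g_6)|\). A toy check in which random \(\mathbb {F}_2\)-linear maps stand in for \(g_5(x, \cdot )\) and \(g_6(y, \cdot )\):

```python
import random
from fractions import Fraction
from itertools import product

def kernel_proportion(maps, k):
    """Proportion of z in F_2^k killed by every linear map (given as 0/1 matrices)."""
    def zero_under(M, z):
        return all(sum(row[c] * z[c] for c in range(k)) % 2 == 0 for row in M)
    hits = sum(1 for z in product(range(2), repeat=k) if all(zero_under(M, z) for M in maps))
    return Fraction(hits, 2 ** k)

def random_map(rows, k):
    # A hypothetical linear map F_2^k -> F_2^rows, as a rows x k 0/1 matrix.
    return [[random.randint(0, 1) for _ in range(k)] for _ in range(rows)]

random.seed(4)
k = 6
ok = True
for _ in range(100):
    a, b = random.randint(0, 3), random.randint(0, 3)
    phi, psi = random_map(a, k), random_map(b, k)  # codomains of sizes 2^a and 2^b
    ok = ok and kernel_proportion([phi, psi], k) >= Fraction(1, 2 ** (a + b))
print(ok)  # True
```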