1 Introduction

The study of equations over algebraic structures has a long history in mathematics. Some of the first explicit decidability results in group theory are due to Makanin [25], who showed that equations over free groups are decidable. Subsequently several other decidability and undecidability results as well as complexity results on equations over infinite groups emerged (see [5, 9, 23, 29] for a random selection). Also Hilbert's famous tenth problem on Diophantine equations, which asks whether a polynomial equation over the ring of integers has a solution, was shown to be undecidable [26].

One can view polynomials over a ring R as terms over R in which some variables have already been evaluated by elements of R. The same can be done with groups to define polynomials over a group G. Now the problem \(\textsc {{PolSat}} \!\left ({G} \right )\) takes as input an equation of the form \(\mathbf {t}(x_{1},\dots ,x_{n}) = \mathbf {s}(x_{1},\dots ,x_{n})\) (or equivalently \(\mathbf {t}(x_{1},\dots ,x_{n}) = 1\), by replacing \(\mathbf {t} = \mathbf {s}\) by \(\mathbf {t}\mathbf {s}^{-1} = 1\)), where \(\mathbf {s}(\overline {x})\) and \(\mathbf {t}(\overline {x})\) are polynomials over G, and asks whether this equation has a solution in G. Obviously, when working with terms \(\mathbf {t}, \mathbf {s}\) rather than polynomials, this problem trivializes by setting all the \(x_{i}\)’s to 1. Likewise \(\textsc {{PolEqv}} \!\left ({G} \right )\) is the problem of deciding whether two polynomials \(\mathbf {t}(\overline {x}), \mathbf {s}(\overline {x})\) are equal for all evaluations of the variables \(\overline {x}\) in G.

While for infinite groups G the problems \(\textsc {{PolSat}} \!\left ({G} \right )\) and \(\textsc {{PolEqv}} \!\left ({G} \right )\) may be undecidable, they are solvable in exponential time in finite realms. In fact, \({\textsc {{PolSat}}} \!\left ({G} \right )\) is in NP, whereas \({\textsc {{PolEqv}}} \!\left ({G} \right )\) is in coNP. The hardest possible case occurs for all groups that are not solvable: for them PolSat is NP-complete and PolEqv is coNP-complete [10, 15]. On the other hand it is easy to see that both these problems can be solved in linear time for all finite abelian groups.

Also in nilpotent groups both PolSat and PolEqv can be solved in polynomial time. While the running time of the first such algorithm for PolSat, due to Goldmann and Russell [10], is bounded by a polynomial of very high degree (as this bound was obtained by a Ramsey-type argument), the first algorithm for PolEqv (due to [3]) is much faster. For polynomials of length \(\ell \) the running time for \({\textsc {{PolEqv}}} \!\left ({G} \right )\) is bounded by \(\mathcal {O}\left (\ell ^{k+1}\right )\), where \(k \leqslant \log \left | G \right |\) is the nilpotency class of the group G. Very recently two much faster algorithms for \({\textsc {{PolSat}}} \!\left ({G} \right )\) have been described. One by [7] runs in \(\mathcal {O}\left (\ell ^{\frac {1}{2} \left | G \right |^{2} \log \left | G \right |} \right )\) steps. The other one, provided in [21], runs even faster for all but finitely many nilpotent groups, i.e. in \(\mathcal {O}\left (\ell ^{\left | G \right |^{2}+1}\right )\) steps. The very same paper [21] concludes this race by providing randomized algorithms for PolSat and PolEqv working in linear time for all nilpotent groups.

However, the situation for solvable but non-nilpotent groups has been almost completely open. Due to [13] we know that PolSat and PolEqv for the symmetric group S3 (and some others) can be done in polynomial time. More examples of such solvable but non-nilpotent groups are provided in [8, 12]. Actually already in 2004 Burris and Lawrence [3] conjectured that PolEqv for all solvable groups is in P. In 2011 Horváth renewed this conjecture and extended it to PolSat [11]. Actually these conjectures have been strongly supported also by recent results in [8], where many other examples of solvable non-nilpotent groups are shown to be tractable.

Until recently, the smallest solvable non-nilpotent group with unknown complexity was the symmetric group S4. One reason that prevented existing techniques for polynomial time algorithms from working for S4 is that S4 does not have a nilpotent normal subgroup with a nilpotent quotient. Somewhat surprisingly, in [18] the first three authors succeeded in showing that neither \({\textsc {{PolSat}}} {\!\left ({S_{4}} \right )}\) nor \({\textsc {{PolEqv}}} {\!\left ({S_{4}} \right )}\) is in P as long as the Exponential Time Hypothesis (ETH) holds. Simultaneously, in [30] the fourth author proved super-polynomial lower bounds on PolSat and PolEqv for a broad class of finite solvable groups, again unless ETH fails. The lower bounds in both [18] and [30] depend on the so-called Fitting length, which is defined as the length d of the shortest chain

$$ 1 = G_{0} \leqslant G_{1} \leqslant {\ldots} \leqslant G_{d} = G $$

of normal subgroups \(G_{i}\) of G with all the quotients \(G_{i+1}/G_{i}\) being nilpotent.

Indeed, the lower bounds in [30] apply to all finite solvable groups of Fitting length at least four and to certain groups of Fitting length three. However, this class of groups does not include S4—although its Fitting length is three.

The present paper extends these results by showing super-polynomial lower bounds for the complexity of PolSat(G) and PolEqv(G)—again depending on the Fitting length. It strongly indicates that the mentioned conjectures by Burris and Lawrence and by Horváth fail by showing the following result.

Theorem 1

If G is a finite solvable group of Fitting length \(d\geqslant 3\), then both \({\textsc {{PolSat}}} \!\left ({G} \right )\) and \({\textsc {{PolEqv}}} \!\left ({G} \right )\) require at least \(2^{\Omega (\log ^{d-1} \ell )}\) steps unless ETH fails.

The paper [2] contains all necessary pieces to provide for \({\textsc {{PolSat}}} \!\left ({G} \right )\) an upper bound of the form \(2^{\mathcal {O}(\log ^{r} \ell )}\) with \(r\geqslant 1\) depending on G whenever G is a finite solvable group. This upper bound relies on the AND-weakness conjecture saying that each CC0 circuit for the n-input AND function has at least \(2^{n^{\delta }}\) gates. Thus, the AND-weakness conjecture implies that the lower bounds in Theorem 1 cannot be improved in an essential way.

Finally, we note that allowing definable polynomials as additional basic operations for building the input terms t, s may substantially shorten the input. For example, with the commutator \([x, y] = x^{-1}y^{-1}xy\) the expression \([\dots [[x,y_{1}], y_{2}],\ldots ,y_{n}]\) has linear size, while when presented in the pure group language it has exponential size. In this new setting PolSat (and PolEqv) have been shown [14, 22] to be NP-complete (or coNP-complete, respectively) for all non-nilpotent groups. Actually our proof of Theorem 1 shows this as well.
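The size blow-up of nested commutators can be made concrete with a small sketch. It is only an illustration, not part of the paper: the word representation (lists of pairs of a symbol and an exponent) and the helper names `inverse`, `commutator` and `nested_commutator` are assumptions made here, and no free cancellation is attempted.

```python
def inverse(word):
    """Formal inverse of a word given as a list of (symbol, exponent) letters."""
    return [(symbol, -exponent) for symbol, exponent in reversed(word)]


def commutator(u, v):
    """The word u^-1 v^-1 u v, written out letter by letter (no cancellation)."""
    return inverse(u) + inverse(v) + u + v


def nested_commutator(n):
    """Expand [...[[x, y1], y2], ..., yn] into the pure group language."""
    word = [("x", 1)]
    for i in range(1, n + 1):
        word = commutator(word, [(f"y{i}", 1)])
    return word


# Lengths of the expanded words for n = 0, 1, ..., 5 nested commutators.
lengths = [len(nested_commutator(n)) for n in range(6)]
```

Each nesting level copies the previous word twice and adds two letters, so the expanded length L(n) satisfies L(n) = 2L(n − 1) + 2 with L(0) = 1, that is, L(n) = 3 ⋅ 2^n − 2, whereas the commutator expression itself has size linear in n.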

Moreover, the paper [17] shows (in a very broad context of an arbitrary algebra) that allowing such definable polynomials can be simulated by circuits over this algebra.

2 Preliminaries

Complexity and the Exponential Time Hypothesis

We use standard notation from complexity theory as can be found in any textbook on complexity, e.g. [27].

The Exponential Time Hypothesis (ETH) is the conjecture that there is some \(\delta > 0\) such that every algorithm for 3SAT needs time \(\Omega (2^{\delta n})\) in the worst case where n is the number of variables of the given 3SAT instance. By the Sparsification Lemma [20, Thm. 1] this is equivalent to the existence of some \(\epsilon > 0\) such that every algorithm for 3SAT needs time \(\Omega (2^{\epsilon (m+n)})\) in the worst case where m is the number of clauses of the given 3SAT instance (see also [4, Thm. 14.4]). In particular, under ETH there is no algorithm for 3SAT running in time \(2^{o(n+m)}\).

Another classical NP-complete problem is C-Coloring for \(C\geqslant 3\). Given an undirected graph Γ = (V, E) the question is whether there is a valid C-coloring of Γ, i.e. a map \(\chi :V \to \left \{ \mathinner {1, \dots , C} \right \}\) satisfying χ(u)≠χ(v) whenever \(\left \{\mathinner {u,v}\right \}\in E\). Moreover, by [4, Thm. 14.6], 3-Coloring cannot be solved in time \(2^{o(\left |\mathinner {V}\right | + \left |\mathinner {E}\right |)}\) unless ETH fails. Since 3-Coloring can be reduced to C-Coloring for fixed \(C \geqslant 3\) by introducing only a linear number of additional edges and a constant number of vertices, it follows for every \(C\geqslant 3\) that also C-Coloring cannot be solved in time \(2^{o(\left |\mathinner {V}\right | + \left |\mathinner {E}\right |)}\) unless ETH fails.

Groups and Commutators

Throughout, we only consider finite groups G. We follow the notation of [28]. For groups G and H we write \(H\leqslant G\) if H is a subgroup of G, or H < G if H is a proper subgroup of G. Similarly we write \(H \mathrel {\unlhd } G\) (or \(H\mathrel {\lhd } G\)) if H is a normal subgroup of G (or a proper normal subgroup). For a subset \(X \subseteq G\) we write \(\left < \mathinner {X} \right >\) for the subgroup generated by X, and \(\left <\!\left < \mathinner {X} \right >\!\right > = \left < \mathinner {x^{g}} \middle | \mathinner {x\in X, g\in G} \right >\) for the normal subgroup generated by X.

We write \([x, y] = x^{-1}y^{-1}xy\) for the commutator and \(x^{y} = y^{-1}xy\) for the conjugation. Moreover, we write \([x_{1}, \dots , x_{n}] = [[x_{1}, \dots , x_{n-1}],x_{n}]\) for \(n\geqslant 3\).

We will also use commutators of (normal) subgroups (or even subsets) \(X,Y,X_{1},\dots , X_{k} \subseteq G\) defined by \([X,Y] = \left < \mathinner {[x,y]} \middle | \mathinner {x \in X, y \in Y} \right >\) and \([X_{1},\dots , X_{k}] = [[X_{1},\dots ,X_{k-1}], X_{k}]\). Note here that the commutator [H, K] is a normal subgroup of G whenever H and K are. Finally, we put \(\left [{x},_{k} {y} \right ] = [x,\underbrace {y,\dots ,y}_{k\text { times}}]\) and \(\left [{X},_{k} {Y} \right ] = [X,\underbrace {Y,\dots ,Y}_{k\text { times}}]\).

We will also need the concept of a centralizer of a subset X in G, which is defined as \(C_{G}(X) = \left \{ \mathinner {g \in G}\vphantom {[g,h]=1 \text { for all } h \in X} \left | \vphantom {g \in G}\mathinner {[g,h]=1 \text { for all } h \in X} \right . \right \}\). If N is a normal subgroup, then \(C_{G}(N)\) is a normal subgroup as well.

Below we collect some basic facts about commutators of elements and subgroups.

  1. (2.1)

    For \(g,x,y,z,x_{1}, \dots , x_{n},y_{1}, \dots , y_{n} \in G\) and normal subgroups \(K_{1},K_{2},M,N\) of a group G we have

    1. (i)

      \([xy, z] = [x, z]^{y}[y, z]\) and \([x, yz] = [x, z][x, y]^{z}\).

    2. (ii)

      \([K_{1}, K_{2}] = [K_{2},K_{1}] \leqslant K_{1} \cap K_{2}\) and \([K_{1}K_{2},N] = [K_{1},N][K_{2},N]\).

    3. (iii)

      If \(x \equiv y\) mod N and \(g \in M\), then for all \(k\in {\mathbb {N}}\) we have

      $$\left[{x},\kern.1em_{k} \kern.1em {g} \right] \equiv \left[{y},\kern.1em_{k} \kern.1em {g} \right] \mod \left[{N},\kern.1em_{k} \kern.1em {M} \right].$$
    4. (iv)

      If \(g \in M\) and \(x_{i} \equiv y_{i}\) mod N, then

      $$[g,x_{1}, \dots, x_{n}] \equiv [g,y_{1}, \dots, y_{n}]\mod [M,N].$$
    5. (v)

      For all \(f \in C_{G}(N)\), \(g \in G\), \(h \in N\) and \(k \in {\mathbb {N}}\) we have

      $$\left[{{hf}},\kern.1em_{{k}} \kern.1em {{g}} \right] = \left[{{h}},\kern.1em_{{k}} \kern.1em {{g}} \right]\left[{{f}},\kern.1em_{{k}} \kern.1em {{g}} \right].$$

Proof

(i) is a straightforward standard calculation (see also [28, 5.1.5]):

$$ \begin{array}{@{}rcl@{}} [x,z]^{y}[y,z] &=& y^{-1} (x^{-1}z^{-1}xz) y y^{-1}z^{-1}yz\\ &= & y^{-1}x^{-1}z^{-1}xyz = (xy)^{-1}z^{-1}(xy)z = [xy,z] \end{array} $$

The first part of (ii) is clear from the definition, while the second one follows immediately from (i). To see (iii) and (iv), let \(g \in M\), \(x, y \in G\) and \(h \in N\) with hx = y to see that

$$ \begin{array}{@{}rcl@{}} {[hx,g]} &=& {[h,g]}^{x}{[x,g]} \in {[N,M]} {[x,g]}\qquad \text{ and }\\ {[g,hx]} &=& {[g,x][g,h]}^{x} \in {[g,x]} {[M,N]}. \end{array} $$

Then our statements follow by induction.

Finally, for (v), let \(f \in C_{G}(N) = \left \{ \mathinner {u \in G} \middle | \mathinner {[u,h]=1 \text { for all } h \in N} \right \}\) and \(g \in G\), \(h \in N\). Then we have

$$ \begin{array}{@{}rcl@{}} [hf,g] &= [h,g]^{f}[f,g] = [h,g] [f,g]. \end{array} $$

Since \(C_{G}(N)\) is a normal subgroup, also \([f, g] \in C_{G}(N)\), so that we can then induct on k. □
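The two identities in (2.1.i) can also be checked mechanically on a small group. The following is a minimal sketch under our own conventions, not code from the paper: permutations are tuples of images, `mul(p, q)` applies p first and then q, and the helper names are hypothetical.

```python
from itertools import permutations, product


def mul(p, q):
    """Product of two permutations given as tuples: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))


def inv(p):
    """Inverse permutation."""
    result = [0] * len(p)
    for i, image in enumerate(p):
        result[image] = i
    return tuple(result)


def conj(x, y):
    """Conjugation x^y = y^-1 x y."""
    return mul(mul(inv(y), x), y)


def comm(x, y):
    """Commutator [x, y] = x^-1 y^-1 x y."""
    return mul(mul(inv(x), inv(y)), mul(x, y))


S3 = list(permutations(range(3)))

# [xy, z] = [x, z]^y [y, z] for all x, y, z in S3
first_identity = all(
    comm(mul(x, y), z) == mul(conj(comm(x, z), y), comm(y, z))
    for x, y, z in product(S3, repeat=3)
)

# [x, yz] = [x, z] [x, y]^z for all x, y, z in S3
second_identity = all(
    comm(x, mul(y, z)) == mul(comm(x, z), conj(comm(x, y), z))
    for x, y, z in product(S3, repeat=3)
)
```

Since both identities hold in every group, the check succeeds for any finite group supplied in this representation; S3 is used only to keep the enumeration small.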

Since G is finite, for all \(x, y \in G\), there are i < j such that \(\left [{x},_{i} {y} \right ]=\left [{x},_{j} {y} \right ]\). Writing k = j − i, we get \(\left [{{x}},_{{i}} {{y}} \right ]=\left [{{x}},_{{i+k}} {{y}} \right ]\) for all sufficiently large i’s. For each choice of x and y we might get a different value for k; yet, by taking a common multiple of all the k’s, we obtain some \(\omega \in \mathbb {N}\) such that for all \(x, y \in G\) and all \(i\geqslant \omega \) we have \(\left [{{x}},_{{i}} {{y}} \right ]=\left [{{x}},_{{i+\omega }} {{y}} \right ]\).

Since for normal subgroups M, N of G we have

$$ M \geqslant \left[{M},\kern.1em_{1} \kern.1em {N} \right] \geqslant \left[{M},\kern.1em_{2} \kern.1em {N} \right] \geqslant {\ldots} \geqslant \left[{M},\kern.1em_{i} \kern.1em {N} \right] \geqslant \left[{M},\kern.1em_{i+1} \kern.1em {N} \right]\geqslant \ldots, $$

the finiteness of G ensures that there is some \(k_{0} \in {\mathbb {N}}\) such that \(\left [{M},_{k_{0}} {N} \right ] = \left [{M},_{k} {N} \right ]\) for all \(k \geqslant k_{0}\) and all normal subgroups M, N of G. We can assume that \(\omega \geqslant k_{0}\). It is clear that ω = |G|! is large enough, but typically much smaller values suffice. Thus, we have:

  1. (2.2)

    For \(x, y \in G\), \(M, N \mathrel {\unlhd } G\) and \(i,j\geqslant \omega \) we have

    • \(\left [{x},_{i} {y} \right ]=\left [{x},_{i+\omega } {y} \right ]\),

    • \(\left [{M},_{i} {N} \right ]=\left [{M},_{j} {N} \right ]\).

We fix ω = ω(G) throughout. Be aware that it depends on the specific group G.
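To illustrate that values much smaller than |G|! can suffice, the sketch below computes the least ω satisfying the first bullet of (2.2) for S3. It is an illustration under our own assumptions: the permutation representation and all function names are hypothetical, and the second bullet of (2.2) (the bound \(\omega \geqslant k_{0}\) for chains of normal subgroups) is not computed here.

```python
from itertools import permutations
from math import gcd


def mul(p, q):
    """Product of two permutations given as tuples: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))


def inv(p):
    result = [0] * len(p)
    for i, image in enumerate(p):
        result[image] = i
    return tuple(result)


def comm(x, y):
    """Commutator [x, y] = x^-1 y^-1 x y."""
    return mul(mul(inv(x), inv(y)), mul(x, y))


def iterated_comm(x, y, k):
    """The iterated commutator [x, y, ..., y] with k copies of y."""
    for _ in range(k):
        x = comm(x, y)
    return x


def tail_and_period(x, y):
    """Preperiod and period of the sequence x, [x, y], [x, y, y], ..."""
    seen = {}
    step = 0
    while x not in seen:
        seen[x] = step
        x = comm(x, y)
        step += 1
    return seen[x], step - seen[x]


def element_omega(G):
    """Least omega >= 1 with [x,_i y] = [x,_{i+omega} y] for all i >= omega."""
    longest_tail, common_period = 1, 1
    for x in G:
        for y in G:
            tail, period = tail_and_period(x, y)
            longest_tail = max(longest_tail, tail)
            common_period = common_period * period // gcd(common_period, period)
    # smallest multiple of the common period that is at least the longest tail
    return common_period * -(-longest_tail // common_period)


S3 = list(permutations(range(3)))
omega_S3 = element_omega(S3)
```

For S3 this yields ω = 2, far below |S3|! = 720. In S3 the chains \(\left [{M},_{k} {N} \right ]\) already stabilize at k = 1, so this value also meets the requirement \(\omega \geqslant k_{0}\).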

Nilpotency and Fitting Series

The k-th term of the lower central series is \(\gamma _{k}(G) = \left [{G},_{k} {G} \right ]\). The nilpotent residual of G is defined as \(\bigcap _{k\geqslant 0 }\gamma _{k}(G) = \gamma _{\omega } (G)\) where ω is as above (i.e. \(\gamma _{\omega }(G) = \gamma _{i}(G)\) for every \(i \geqslant \omega \)). Recall that a finite group G is nilpotent if and only if \(\gamma _{\omega }(G) = 1\).

The Fitting subgroup Fit(G) is the union of all nilpotent normal subgroups. Let G be a finite solvable group. It is well-known that Fit(G) itself is a nilpotent normal subgroup (see e.g. [16, Satz 4.2]). We will need the following characterization of the Fitting subgroup due to Baer ([1, Satz L’], which is also an immediate consequence of [28, 12.3.8]).

  1. (2.3)

    \(\operatorname {Fit}(G) = \left \{ \mathinner {g \in G}\vphantom {\left [{h},_{\omega } {g} \right ] =1\text { for all } h\in G} \left | \vphantom {g \in G}\mathinner {\left [{h},_{\omega } {g} \right ] =1\text { for all } h\in G} \right . \right \}\).

Now we define the upper Fitting series

$$1 = \mathcal{U}_{0}(G) \mathrel{\lhd} \mathcal{U}_{1}(G) \mathrel{\lhd} {\cdots} \mathrel{\lhd} \mathcal{U}_{k} (G) = G $$

by \(\mathcal {U}_{i+1}(G)/\mathcal {U}_{i}(G) = \operatorname {Fit}(G/\mathcal {U}_{i}(G))\). If the group is clear, we simply write \(\mathcal {U}_{i} \) for \(\mathcal {U}_{i}(G)\). The number of factors k is called the Fitting length of G (denoted by FitLen(G)).

The following fact can be derived by a straightforward induction from the characterization of Fit(G) as largest nilpotent normal subgroup.

  1. (2.4)

    For \(H \mathrel {\unlhd } G\) and \(g \in G\) we have

    • \(\mathcal {U}_{i} (H) = \mathcal {U}_{i} \cap H\), for all i,

    • \(\operatorname {FitLen}(H) \leqslant i\) if and only if \(H \leqslant \mathcal {U}_{i}\),

    • \(\operatorname {FitLen}\left <\!\left < \mathinner {g} \right >\!\right > = i\) if and only if \(g \in \mathcal {U}_{i} \setminus \mathcal {U}_{i-1}\).

Example 1

The symmetric group on four elements S4 has Fitting length 3 with \( 1 \leqslant C_{2} \times C_{2} \leqslant A_{4} \leqslant S_{4}\) being the upper (and also lower) Fitting series.
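The series in Example 1 can be recomputed by brute force from Baer's characterization (2.3), applied to the successive quotients \(G/\mathcal {U}_{i}\). The following sketch is ours and makes several assumptions: permutations are tuples, all function names are hypothetical, and iterating the commutator |G| times is used in place of ω (once an iterated commutator lands in the normal subgroup \(\mathcal {U}_{i}\) it stays there, and if it ever lands there it does so within |G| steps).

```python
from itertools import permutations


def mul(p, q):
    """Product of two permutations given as tuples: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))


def inv(p):
    result = [0] * len(p)
    for i, image in enumerate(p):
        result[image] = i
    return tuple(result)


def comm(x, y):
    """Commutator [x, y] = x^-1 y^-1 x y."""
    return mul(mul(inv(x), inv(y)), mul(x, y))


def next_fitting_term(G, U):
    """Preimage in G of Fit(G/U), via (2.3) read modulo the normal subgroup U."""
    result = set()
    for g in G:
        in_fitting = True
        for h in G:
            c = h
            for _ in range(len(G)):
                c = comm(c, g)
            if c not in U:
                in_fitting = False
                break
        if in_fitting:
            result.add(g)
    return result


def upper_fitting_series(G):
    """List of the terms U_0 < U_1 < ... < U_d = G for a solvable group G."""
    identity = tuple(range(len(G[0])))
    series = [{identity}]
    while len(series[-1]) < len(G):
        following = next_fitting_term(G, series[-1])
        if following == series[-1]:
            raise ValueError("series does not reach G; the group is not solvable")
        series.append(following)
    return series


S4 = list(permutations(range(4)))
series_S4 = upper_fitting_series(S4)
sizes_S4 = [len(U) for U in series_S4]
```

The sizes 1, 4, 12, 24 correspond to the chain \(1 \leqslant C_{2} \times C_{2} \leqslant A_{4} \leqslant S_{4}\), so the number of factors is 3.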

Example 2

If \(G_{1}, \dots , G_{k}\) are nilpotent groups, then the Fitting length of the wreath product \(G_{1} \wr \dots \wr G_{k}\) is at most k (for a definition of wreath products, we refer to any standard textbook like [28]). The Fitting length is exactly k if and only if there are primes \(p_{1}, \dots , p_{k}\) with \(p_{i} \mid \left |\mathinner {G_{i}}\right |\) and \(p_{i} \neq p_{i+1}\) for all i.

More generally, every group of Fitting length k is a divisor (a quotient of a subgroup) of such a wreath product of k nilpotent groups. As we do not rely on these characterizations, we leave the proofs to the reader.

Equations in Groups

A term (in the language of groups) is a word over an alphabet \(\mathcal {X} \cup \mathcal {X}^{-1}\) where \(\mathcal {X}\) is a set of variables. A polynomial over a group G is a term where some of the variables are replaced by constants—i.e., a word over the alphabet \(G \cup \mathcal {X} \cup \mathcal {X}^{-1}\). Since we are dealing with finite groups only, a symbol \(X^{-1}\in \mathcal {X}^{-1}\) for \(X\in \mathcal {X}\) can be considered as an abbreviation for \(X^{\left |\mathinner {G}\right |-1}\). We write \({\mathbf {s}}(x_{1}, \dots , x_{n})\) or short \({\mathbf {s}}(\overline {x})\) for a polynomial (resp. term) s with variables from \(\left \{ \mathinner {x_{1}, \dots , x_{n}} \right \}\). There is a natural composition of terms and polynomials: if \(\mathbf {r}(x_{1}, \dots , x_{n}),\mathbf {s}_{1}, \dots , \mathbf {s}_{n}\) are polynomials (resp. terms), we write \(\mathbf {r}(\mathbf {s}_{1}, \dots , \mathbf {s}_{n})\) for the polynomials (resp. terms) obtained by substituting each occurrence of a variable xi by the polynomial (resp. term) si.

A tuple \((g_{1}, \dots , g_{n})\in G^{n}\) is a satisfying assignment for s if \({\mathbf {s}}(g_{1}, \dots , g_{n})=1\) in G. The problems \({\textsc {{PolSat}}} \!\left ({G} \right )\) and \({\textsc {{PolEqv}}} \!\left ({G} \right )\) are as follows: for both of them the input is a polynomial \({\mathbf {s}}(x_{1}, \dots , x_{n})\). For \({\textsc {{PolSat}}} \!\left ({G} \right )\) the question is whether there exists a satisfying assignment, for \({\textsc {{PolEqv}}} \!\left ({G} \right )\) the question is whether all assignments are satisfying. Note here that these problems have many other names. For example, in [10, 30], PolSat is denoted by EQN-SAT and PolEqv by EQN-ID.
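The two decision problems can be solved naively by trying all \(\left |\mathinner {G}\right |^{n}\) assignments, which is the exponential-time baseline against which the bounds of this paper are measured. The sketch below is an illustration only: the token encoding of polynomials (`("const", g)` and `("var", index, exponent)`) and all function names are assumptions made here.

```python
from itertools import permutations, product


def mul(p, q):
    """Product of two permutations given as tuples: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))


def inv(p):
    result = [0] * len(p)
    for i, image in enumerate(p):
        result[image] = i
    return tuple(result)


def evaluate(poly, assignment, identity):
    """Value of a polynomial (a list of tokens) under an assignment of the variables."""
    value = identity
    for token in poly:
        if token[0] == "const":
            factor = token[1]
        else:
            _, index, exponent = token
            factor = assignment[index] if exponent == 1 else inv(assignment[index])
        value = mul(value, factor)
    return value


def pol_sat(G, poly, n, identity):
    """Is there an assignment in G^n on which the polynomial evaluates to 1?"""
    return any(evaluate(poly, a, identity) == identity for a in product(G, repeat=n))


def pol_eqv(G, poly, n, identity):
    """Does the polynomial evaluate to 1 on every assignment in G^n?"""
    return all(evaluate(poly, a, identity) == identity for a in product(G, repeat=n))


S3 = list(permutations(range(3)))
e = (0, 1, 2)
t = (1, 0, 2)  # a transposition (its own inverse)
r = (1, 2, 0)  # a 3-cycle

# x^-1 t^-1 x t r^-1, i.e. the equation [x, t] = r
commutator_eq = [("var", 0, -1), ("const", t), ("var", 0, 1), ("const", t), ("const", inv(r))]
# x x t, i.e. the equation x^2 = t
square_eq = [("var", 0, 1), ("var", 0, 1), ("const", t)]
# x^6, which is trivial on S3 because every element order divides 6
sixth_power = [("var", 0, 1)] * 6

results = {
    "commutator_sat": pol_sat(S3, commutator_eq, 1, e),
    "commutator_eqv": pol_eqv(S3, commutator_eq, 1, e),
    "square_sat": pol_sat(S3, square_eq, 1, e),
    "sixth_power_eqv": pol_eqv(S3, sixth_power, 1, e),
}
```

In this example [x, t] = r is solvable (take x = r) but does not hold identically, x² = t has no solution since squares in S3 are never transpositions, and x⁶ = 1 holds for every assignment.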

Inducible Subgroups

According to [10], we call a subset \(S \subseteq G\) inducible if \(S = \left \{ \mathinner {{\mathbf {s}}(g_{1}, \dots , g_{n})}\vphantom {g_{1}, \dots , g_{n} \in G} \left | \vphantom {{\mathbf {s}}(g_{1}, \dots , g_{n})}\mathinner {g_{1}, \dots , g_{n} \in G} \right . \right \}\) for some polynomial \({\mathbf {s}}(x_{1}, \dots , x_{n})\) of G.

The importance of inducible subgroups lies in the observation that one can restrict variables in equations to inducible subgroups (simply by replacing each variable by the polynomial defining the inducible subgroup). This immediately gives the following lemma.

Lemma 1 ([10, Lemma 8], [14, Lemma 9, 10])

If H is an inducible subgroup of G, then

  • \({\textsc {{PolSat}}} {\!\left ({H} \right )}\) is polynomial time many-one reducible to \({\textsc {{PolSat}}} \!\left ({G} \right )\),

  • \({\textsc {{PolEqv}}} {\!\left ({H} \right )}\) is polynomial time many-one reducible to \({\textsc {{PolEqv}}} \!\left ({G} \right )\).

We will use this lemma to restrict our consideration to an appropriate subgroup of the form \(\gamma _{k}(G)\). We will see that such subgroups are inducible.

3 Proof of Theorem 1

The proof of the theorem is based on coding (by group polynomials) functions that imitate the behaviour of conjunctions. Unfortunately, the lengths of such n-ary conjunction-like group polynomials are not bounded by any polynomial in n and, therefore, they cannot be used to show NP-completeness of PolSat. However, the group polynomials we are going to produce have length bounded by \(2^{\mathcal {O}(n^{\frac {1}{d-1}})}\) where d = FitLen(G). Given such relatively short conjunction-like group polynomials we reduce graph coloring or 3SAT, depending on whether \(\left |\mathinner {G/H}\right | \geqslant 3\) for a carefully chosen large subgroup H of G. In any case such a reduction, together with the ETH, would give the lower bound \(2^{\Omega (\log ^{d-1} \ell )}\) for \({\textsc {{PolSat}}} \!\left ({G} \right )\).

To see how to produce such relatively short conjunction-like polynomials, we start with the upper Fitting series of G

$$ 1 = \mathcal{U}_{0} \mathrel{\lhd} \mathcal{U}_{1} \mathrel{\lhd} {\cdots} \mathrel{\lhd} \mathcal{U}_{d} = G $$

to go downwards along this series and consecutively carefully choose \(h_{\alpha } \in \mathcal {U}_{\alpha } \setminus \mathcal {U}_{\alpha -1}\) on each level α = d, d − 1,…,1 of this sequence. Then we get two different cosets \(\mathcal {U}_{\alpha -1}\) and \(h_{\alpha }\cdot {}\mathcal {U}_{\alpha -1}\) which are supposed to simulate false and true values, respectively.

The conjunction-like polynomials are based on the terms \(\tilde {\mathbf {q}}^{(k)}(z,x_{1}, \dots , x_{k})\) and \({\mathbf {q}}^{(k)}(z,x_{1}, \dots , x_{k},w)\) for \(k\geqslant 0\) defined by

$$ \begin{array}{@{}rcl@{}} \tilde{\mathbf{q}}^{(0)}(z) &=& z, \\ \tilde{\mathbf{q}}^{(k)}(z, x_{1}, \dots, x_{k}) &=& \left[{\tilde{\mathbf{q}}^{(k-1)}(z, x_{1}, \dots, x_{k-1})},\kern.1em_{\omega} \kern.1em {x_{k}} \right], \qquad \text{for } k \geqslant 1, \text{ and }\\ {\mathbf{q}}^{(k)}(z, x_{1}, \dots, x_{k},w) &=& \tilde{\mathbf{q}}^{(k+1)}(z, x_{1}, \dots, x_{k},w), \qquad\qquad\quad~ \text{for } k \geqslant 0. \end{array} $$

Note that our definition of the \({\mathbf {q}}^{(k)}\)’s immediately yields

$$ \begin{array}{@{}rcl@{}} {\mathbf{q}}^{(k+1)}(z,x_{1},\dots, x_{k},w,w)={\mathbf{q}}^{(k)}(z,x_{1},\dots, x_{k},w) \end{array} $$
(1)

The conjunction-like behaviour of the \({\mathbf {q}}^{(k)}\)’s on the \(\mathcal {U}_{\alpha }\)-cosets is precisely described in the following lemma.

Lemma 2

For any level \(1 \leqslant \alpha \leqslant d-1\) and \(h_{\alpha +1} \in \mathcal {U}_{\alpha +1} \setminus \mathcal {U}_{\alpha }\) there is some \(h_{\alpha } \in \mathcal {U}_{\alpha } \setminus \mathcal {U}_{\alpha -1}\) such that for each \(k\in {\mathbb {N}}\) we have

$$ \begin{array}{@{}rcl@{}} {\mathbf{q}}^{(k)}(h_{\alpha},x_{1}, \dots, x_{k}, h_{\alpha+1}) \in \left\{\begin{array}{lll} h_{\alpha}\cdot \mathcal{U}_{\alpha-1}, &\text{if } x_{i} \in h_{\alpha+1}\cdot \mathcal{U}_{\alpha} \text{ for all \textit{i},} \\ \hphantom{h_{\alpha}\cdot{}}\mathcal{U}_{\alpha-1}, &\text{if } x_{i} \in \mathcal{U}_{\alpha} \text{ for some \textit{i}}. \end{array}\right. \end{array} $$

Proof

In this proof we may, without loss of generality, factor out our group G by \(\mathcal {U}_{\alpha -1}\), or equivalently assume that α = 1. This means that \(\mathcal {U}_{\alpha } = \operatorname {Fit}(G)\) and so, by Baer’s theorem (2.3), there is some \(a \in G\) with \(\left [{a},_{\omega } {h_{\alpha +1}} \right ] \neq 1\). Let \(\beta \in {\mathbb {N}}\) be maximal such that \(\left [{a},_{\omega } {h_{\alpha +1}} \right ] \in \gamma _{\beta }(\mathcal {U}_{\alpha }) \setminus \{1\}\) for some \(a \in G\). Now, we simply put \(h_{\alpha } = \left [{a},_{\omega } {h_{\alpha +1}} \right ]\), to observe that \(h_{\alpha } = \left [{h_{\alpha }},_{\omega } {h_{\alpha +1}} \right ]\) and \(\left <\!\left < \mathinner {h_{\alpha }} \right >\!\right >\leqslant \gamma _{\beta }(\mathcal {U}_{\alpha })\). The last inclusion gives that for all \(x_{1}, \dots , x_{k+1}\in G\) we have \({\mathbf {q}}^{(k)}(h_{\alpha }, x_{1}, \dots , x_{k+1})\in \gamma _{\beta }(\mathcal {U}_{\alpha })\).

Suppose now that one of the \(x_{i}\)’s is in \(\mathcal {U}_{\alpha }\). Then \(\tilde {\mathbf { q}}^{(i)}(h_{\alpha }, x_{1}, \dots , x_{i}) = \left [{\tilde {\mathbf {q}}^{(i-1)}(h_{\alpha }, x_{1}, \dots , x_{i-1})},_{\omega } {x_{i}} \right ] \in \left [{\mathcal {U}_{\alpha }},_{\omega } {\mathcal {U}_{\alpha }} \right ] = \gamma _{\omega }(\mathcal {U}_{\alpha }) = \{1\}\). Hence, also \({\mathbf {q}}^{(k)}(h_{\alpha }, x_{1}, \dots , x_{k},h_{\alpha +1})=1\).

On the other hand, if all the \(x_{i}\)’s are in the coset \(h_{\alpha +1} \mathcal {U}_{\alpha }\), then, by (2.1.iv), we have \({\mathbf {q}}^{(k)}(h_{\alpha },x_{1}, \dots , x_{k}, h_{\alpha +1}) \equiv {\mathbf {q}}^{(k)}(h_{\alpha },h_{\alpha +1}, \dots , h_{\alpha +1}, h_{\alpha +1})= h_{\alpha }\) modulo \([\left <\!\left < \mathinner {h_{\alpha }} \right >\!\right >,\mathcal {U}_{\alpha }] \leqslant [\gamma _{\beta }(\mathcal {U}_{\alpha }) ,\mathcal {U}_{\alpha }] \leqslant \gamma _{\beta +1}(\mathcal {U}_{\alpha })\). Hence, \({\mathbf {q}}^{(k)}(h_{\alpha }, \overline {x},h_{\alpha +1}) = h_{\alpha } f\) for some \(f \in \gamma _{\beta +1}(\mathcal {U}_{\alpha })\). Thus, all we have to show is that \(f \in \mathcal {U}_{\alpha -1}\), or, in our setting, that f = 1. To do this we induct on \(j\geqslant \beta +1\) to show that \(f \in \gamma _{j}(\mathcal {U}_{\alpha })\) for all j’s.

Starting with \(f \in \gamma _{j}(\mathcal {U}_{\alpha }) \leqslant \gamma _{\beta +1}(\mathcal {U}_{\alpha })\), we also have \(\left [{f},_{\omega } {h_{\alpha +1}} \right ] \in \gamma _{\beta +1}(\mathcal {U}_{\alpha })\). But now, maximality of β ensures that

$$ \begin{array}{@{}rcl@{}} \left[{f},\kern.1em_{\omega} \kern.1em {h_{\alpha+1}} \right] =1. \end{array} $$
(2)

Obviously \([f,g] \in \gamma _{j+1}(\mathcal {U}_{\alpha })\) whenever \(f \in \gamma _{j}(\mathcal {U}_{\alpha })\) and \(g \in \mathcal {U}_{\alpha }\). This simply means that \(f\in C_{G/\gamma _{j+1}(\mathcal {U}_{\alpha })} (\mathcal {U}_{\alpha }/\gamma _{j+1} (\mathcal {U}_{\alpha }))\), and by (2.1.v) we obtain

$$ \begin{array}{@{}rcl@{}} \left[{h_{\alpha}f},\kern.1em_{\omega} \kern.1em {h_{\alpha+1}} \right] &\equiv \left[{h_{\alpha}},\kern.1em_{\omega} \kern.1em {h_{\alpha+1}} \right]\left[{f},\kern.1em_{\omega} \kern.1em {h_{\alpha+1}} \right] \mod \gamma_{j+1}(\mathcal{U}_{\alpha}). \end{array} $$
(3)

Summing up we get

$$ \begin{array}{@{}rcl@{}} h_{\alpha}f &=& \left[{h_{\alpha}f},\kern.1em_{\omega} \kern.1em {h_{\alpha+1}} \right] \text{(by (1))} \\[.2em] &\equiv& \left[{h_{\alpha}},\kern.1em_{\omega} \kern.1em {h_{\alpha+1}} \right]\left[{f},\kern.1em_{\omega} \kern.1em {h_{\alpha+1}} \right]\mod \gamma_{j+1}(\mathcal{U}_{\alpha}) \text{(by (3))} \\[.2em] &=& h_{\alpha} \cdot 1, \text{(by (2))} \end{array} $$

so that \(f\in \gamma _{j+1}(\mathcal {U}_{\alpha })\).

Going along the j’s we arrive at the conclusion that \(f \in \gamma _{\omega }(\mathcal {U}_{\alpha }) = \{1\}\), as promised. □
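Lemma 2 can be observed directly in the smallest non-nilpotent case, S3, where d = 2, α = 1, \(\mathcal {U}_{1} = A_{3}\) and \(\mathcal {U}_{0} = 1\). The sketch below is ours and rests on stated assumptions: permutations are tuples, all names are hypothetical, ω = 2 is taken as a valid choice for S3, \(h_{2}\) is a fixed transposition and \(h_{1}\) a 3-cycle (which satisfies \(h_{1} = \left [{h_{1}},_{\omega } {h_{2}} \right ]\)).

```python
from itertools import permutations, product


def mul(p, q):
    """Product of two permutations given as tuples: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))


def inv(p):
    result = [0] * len(p)
    for i, image in enumerate(p):
        result[image] = i
    return tuple(result)


def comm(x, y):
    """Commutator [x, y] = x^-1 y^-1 x y."""
    return mul(mul(inv(x), inv(y)), mul(x, y))


def iterated_comm(x, y, k):
    """The iterated commutator [x, y, ..., y] with k copies of y."""
    for _ in range(k):
        x = comm(x, y)
    return x


def q(z, xs, w, omega):
    """q^(k)(z, x_1, ..., x_k, w): apply [ . ,_omega x_i] for each i, then [ . ,_omega w]."""
    value = z
    for x in list(xs) + [w]:
        value = iterated_comm(value, x, omega)
    return value


S3 = list(permutations(range(3)))
identity = (0, 1, 2)
h2 = (1, 0, 2)                     # a transposition, outside U_1 = A_3
h1 = (1, 2, 0)                     # a 3-cycle, inside U_1 and outside U_0 = 1
A3 = {identity, h1, mul(h1, h1)}
OMEGA = 2                          # assumed value of omega for S3


def behaves_like_and(k):
    """Check the two cases of Lemma 2 for every tuple (x_1, ..., x_k) in S3^k."""
    for xs in product(S3, repeat=k):
        expected = h1 if all(x not in A3 for x in xs) else identity
        if q(h1, xs, h2, OMEGA) != expected:
            return False
    return True


checks = [behaves_like_and(k) for k in range(4)]
```

Here the coset \(h_{2}\cdot A_{3}\) (the transpositions) plays the role of true and \(A_{3}\) the role of false: the value is \(h_{1}\) exactly when every argument is a transposition, and 1 as soon as one argument lies in \(A_{3}\).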

Now, picking \(h_{d} \in G\setminus \mathcal {U}_{d-1}\), the consecutive use of Lemma 2 supplies us with elements \(h_{d-1}, \dots , h_{1}\) that allow us to define conjunction-like polynomials \({\mathbf {q}}^{(k)}_{\alpha + 1}(x_{1}, \dots , x_{k}) ={\mathbf {q}}^{(k)}(h_{\alpha },x_{1}, \dots , x_{k}, h_{\alpha + 1})\). Note here that, since the terms \({\mathbf {q}}^{(k)}\) use iterated commutators (ω ⋅ (k + 2) times), their sizes are exponential in k. However, to get a conjunction on \(n = k^{d-1}\) elements we first split these elements into \(k^{d-2}\) groups, each having k elements. If there were only two cosets of \(\mathcal {U}_{d-1}\) in G, then applying the polynomial \({\mathbf {q}}^{(k)}_{d}\) to each such k-element group would send everything into \(\mathcal {U}_{d-2} \cup h_{d-1}\cdot \mathcal {U}_{d-2}\). Now, we group the obtained \(k^{d-2}\) values into \(k^{d-3}\) groups, each of size k, and apply \({\mathbf {q}}^{(k)}_{d-1}\) to each such group. Repeating this procedure we finally arrive in \(\mathcal {U}_{1}\), ensuring that the appropriate composition of the \({\mathbf {q}}^{(k)}_{\alpha }\)’s returns either the value 1 or \(h_{1}\). One can easily notice that the size (i.e., length as a word) of such a composed polynomial is \(2^{\mathcal {O}(k)} = 2^{\mathcal {O}(n^{\frac {1}{d-1}})}\).

Unfortunately, the behaviour of the \({\mathbf {q}}^{(k)}_{d}\)’s and the entire long compositions can be controlled only on two cosets of \(\mathcal {U}_{d-1}\). This requires \(\left |\mathinner {G/\mathcal {U}_{d-1}}\right | = 2\), which is very seldom the case. Thus, the very top level requires a very careful treatment. First, we replace the group G with a smaller subgroup \(G_{0}\) of the same Fitting length but such that \(G_{0}\) is abelian over its \(\mathcal {U}_{d-1}\). Then we find a normal subgroup \(\mathcal {U}_{d-1}\leqslant H \mathrel {\lhd } G_{0}\) so that we will be able to control the behaviour of the \({\mathbf {q}}^{(k)}\)’s on all cosets of H in \(G_{0}\). The first step towards realizing this idea is described in the next observation.

Lemma 3

In each finite solvable group G there is a subgroup \(G_{0}\) satisfying:

  • \(G_{0}\) is inducible,

  • \(\operatorname {FitLen}(G_{0}) = \operatorname {FitLen}(G) = d\), and

  • \(G_{0}/\mathcal {U}_{d-1}(G_{0})\) is abelian.

Proof

We simply set \(G_{0} = \gamma _{m}(G)\) where m is maximal with \(\gamma _{m}(G) \not \leqslant \mathcal {U}_{d-1}(G)\). This secures \(\operatorname {FitLen}(G_{0}) = d\). To see that all groups \(\gamma _{j}(G)\) in the lower central series

$$ G = \gamma_{0}(G) \geqslant \gamma_{1}(G) \geqslant {\dots} \geqslant \gamma_{\omega}(G) $$

are inducible, we induct on j and argue like in [10, Lemma 5]. Let \(\gamma _{j}(G)\) be the image of the polynomial \(\mathbf {p}(\overline {x})\). Every element in \(\gamma _{j+1}(G) = [\gamma _{j}(G),G]\) is a product of at most |G| elements of the form [z, y], where z ranges over \(\gamma _{j}(G)\) and y over the entire group G. Thus, introducing new sequences of pairwise different variables \(\overline {x}^{1},\ldots , \overline {x}^{|{G}|}\) we can produce \(\gamma _{j+1}(G)\) as the image of the polynomial \({\prod }_{i=1}^{\left |\mathinner {G}\right |} [\mathbf {p}(\overline {x}^{i}),y_{i}]\).

Finally, \(G_{0}/\mathcal {U}_{d-1}(G_{0})\) is abelian as we have \([G_{0},G_{0}] = [\gamma _{m}(G),\gamma _{m}(G)] \leqslant [\gamma _{m}(G),G] =\gamma _{m+1}(G) \leqslant \mathcal {U}_{d-1}\), where the last inclusion is a consequence of the maximality of m. □
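The lower central series used in this proof is easy to compute for small groups. The sketch below does so for S4 with the indexing \(\gamma _{0}(G) = G\), \(\gamma _{k+1}(G) = [\gamma _{k}(G), G]\) used in this paper; the permutation representation and the function names are assumptions made for illustration.

```python
from itertools import permutations


def mul(p, q):
    """Product of two permutations given as tuples: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))


def inv(p):
    result = [0] * len(p)
    for i, image in enumerate(p):
        result[image] = i
    return tuple(result)


def comm(x, y):
    """Commutator [x, y] = x^-1 y^-1 x y."""
    return mul(mul(inv(x), inv(y)), mul(x, y))


def generated_subgroup(generators, identity):
    """Subgroup generated by a set of elements of a finite group."""
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = mul(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


def lower_central_series(G):
    """Terms gamma_0 = G, gamma_1 = [G, G], ... up to the point of stabilization."""
    identity = tuple(range(len(G[0])))
    series = [set(G)]
    while True:
        commutators = {comm(x, y) for x in series[-1] for y in G}
        following = generated_subgroup(commutators, identity)
        if following == series[-1]:
            return series
        series.append(following)


S4 = list(permutations(range(4)))
gammas = lower_central_series(S4)
gamma_sizes = [len(term) for term in gammas]
```

For S4 the series stabilizes at \(\gamma _{1}(S_{4}) = A_{4}\), which by Example 1 equals \(\mathcal {U}_{2}(S_{4})\). Hence m = 0 in the construction above and \(G_{0} = S_{4}\) itself, in agreement with \(S_{4}/A_{4}\) being abelian.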

From now on we simply change notation and replace our starting group G by \(G_{0}\), or in other words we assume that \(G/\mathcal {U}_{d-1}(G)\) is abelian. Now, to construct (and control) the promised normal subgroup H, we first pick \(K \mathrel {\unlhd } G\) among the minimal (with respect to inclusion) normal subgroups satisfying:

  • [K, G] = K and

  • FitLen(K) = d − 1.

Since \(\gamma _{\omega }(G)\) satisfies both conditions above, such a K indeed exists.

  1. (3.1)

    K is indecomposable, i.e. if \(K = K_{1}K_{2}\) for some \(K_{1}, K_{2} \mathrel {\unlhd } G\) then \(K = K_{1}\) or \(K = K_{2}\).

Proof

Suppose that \((K_{1},K_{2})\) is a minimal pair (coordinatewise) with \(K = K_{1}K_{2}\). Since \(K = [K, G] = [K_{1}K_{2},G] = [K_{1},G][K_{2},G]\) and \([K_{i},G] \leqslant K_{i}\), we immediately get \([K_{i},G] = K_{i}\) for both i = 1,2. Now if \(K_{i} < K\), then minimality of K gives \(\operatorname {FitLen}(K_{i}) \leqslant d-2\). If this happens for both \(K_{1}\) and \(K_{2}\), then \(d-1 =\operatorname {FitLen}(K)= \operatorname {FitLen}(K_{1}K_{2}) = \max \limits \left \{ \mathinner {\operatorname {FitLen}(K_{1}),\operatorname {FitLen}(K_{2})} \right \}\leqslant d-2\), a contradiction. □

By (3.1) we know that there exists a unique \(K_0 \mathrel {\unlhd } G\) with \(K_0 < K\) such that there is no normal subgroup of G that lies strictly between \(K_0\) and K.

Note that, if \(a \in K \setminus K_0\), we cannot have \(\left <\!\left < \mathinner {a} \right >\!\right >\leqslant K_0\). This gives

  1. (3.2)

    For all \(a \in K \setminus K_0\) we have \(\left <\!\left < \mathinner {a} \right >\!\right > = K\).

The other consequence of the fact that the solvable group G has no normal subgroups strictly between \(K_0\) and K is the following.

  1. (3.3)

    \(K/K_0\) is abelian.

We will also need:

  1. (3.4)

    \(\left [{K_0},_{\omega } {G} \right ] \leqslant \mathcal {U}_{d-2}(K)\).

Proof

By our choice of ω, we have \([\left [{K_0},_{\omega } {G} \right ], G ] = \left [{K_0},_{\omega } {G} \right ]\). Since \(\left [{K_0},_{\omega } {G} \right ]\leqslant K_0\) is strictly contained in K and K was chosen to be minimal with [K, G] = K and FitLen(K) = d − 1, we must have \(\operatorname {FitLen}(\left [{K_0},_{\omega } {G} \right ]) \leqslant d-2\). □

Now we are ready to define the normal subgroup H of G. We simply put H to be the centralizer in G of K modulo \(K_0\), i.e. the largest normal subgroup with \([H,K]\leqslant K_0\). Then obviously \(H = \left \{ \mathinner {g \in G}\vphantom {[K,g] \leqslant K_0} \left | \vphantom {g \in G}\mathinner {[K,g] \leqslant K_0} \right . \right \}\).

  1. (3.5)

    \(\mathcal {U}_{d-1} \leqslant H < G\). In particular, G/H is abelian.

Proof

To see that H < G suppose otherwise, i.e. \([K,G] \leqslant K_0\). This, however, contradicts our choice of K to satisfy [K, G] = K.

The first inclusion is simply equivalent to \([K,\mathcal {U}_{d-1}] \leqslant K_0\). Indeed, since FitLen(K) = d − 1, we have \(\left [{K},_{\omega } {\mathcal {U}_{d-1}} \right ] \leqslant \gamma _{\omega }(\mathcal {U}_{d-1}) \leqslant \mathcal {U}_{d-2}\) and, thus, \([K,\mathcal {U}_{d-1}]< K\). Since we assumed \(G/\mathcal {U}_{d-1}\) to be abelian, the second part of the statement follows. □

Directly from our definitions, we know that [x, y] ∈ K0 whenever x ∈ K and y ∈ H. But the reason for our careful choice of K and then H was to have precise control over the behaviour of [x, y] for y in other cosets of H (and x still in K).

Thus, for any g ∈ G we define a map φg : K → K/K0 by φg(x) = [x, g] ⋅ K0. Since K/K0 is abelian by (3.3), one can easily check, using (2.1.i), that φg is a group homomorphism for all g ∈ G. Also we have \(\varphi _{g}(K_0) \leqslant K_0\), i.e. the kernel of this homomorphism contains K0, so that φg actually induces a homomorphism K/K0 → K/K0. We also write φg for this induced homomorphism.
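The omitted check can be sketched as follows, assuming that (2.1.i) is the standard identity \([xy,g] = [x,g]^{y}\,[y,g]\) for commutators \([x,g] = x^{-1}g^{-1}xg\) (an assumption on our part, since (2.1.i) is not restated here). For x, y ∈ K the element [x, g] lies in the normal subgroup K, and conjugation by y ∈ K acts trivially on the abelian quotient K/K0, so

$$ \begin{array}{@{}rcl@{}} \varphi_{g}(xy) &=& [xy,g]\cdot K_0\\ &=& [x,g]^{y}\,[y,g]\cdot K_0\\ &=& [x,g]\,[y,g]\cdot K_0\\ &=& \varphi_{g}(x)\,\varphi_{g}(y). \end{array} $$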

  1. (3.6)

    If g ∈ G ∖ H, then φg : K/K0 → K/K0 is an isomorphism.

Proof

We start with showing that for g ∈ G

$$ \begin{array}{@{}rcl@{}} \varphi_{g}(x^{b}) = \varphi_{g}(x)^{b} \end{array} $$
(4)

whenever x ∈ K and b ∈ G. Indeed, since G/H is abelian by (3.5), we can write bg = hgb for some h ∈ H. Then we have

$$ \begin{array}{@{}rcl@{}} \varphi_{g}(x^{b}) &=& [x^{b},g]\cdot K_0\\ & =& (x^{b})^{-1} g^{-1} b^{-1} x bg \cdot K_0\\ & =& (x^{b})^{-1} b^{-1} g^{-1}h^{-1} x hgb \cdot K_0\\ & =& (x^{b})^{-1} b^{-1} g^{-1}x gb\cdot K_0 {}(\text{since}~h \in H)\\ & =& (x^{-1} g^{-1}x g)^{b}\cdot K_0 \\ &=& \varphi_{g}(x)^{b}. \end{array} $$

To see that the kernel of the original φg is K0, pick a ∈ K ∖ K0, so that, by (3.2), every element x ∈ K can be represented as \(x= a^{g_{1}} {\cdots } a^{g_{n}}\) for some \(g_{1}, \dots , g_{n} \in G\). Now, if φg(a) = K0, then (4) gives φg(x) = K0 for all x ∈ K. This would, however, put g into the centralizer H, contrary to our assumption. Hence the induced homomorphism on the finite group K/K0 is injective and therefore an isomorphism. □

Note that (4) means that φg is not only a group homomorphism but actually a homomorphism of G-modules. Here K/K0 is a G-module under the action of G on K/K0 via conjugation. In terms of modules, the proof of (3.6) can be stated even more simply: the kernel of φg has to be a submodule of K/K0. However, by (3.2), K/K0 is generated, as a G-module, by any of its non-trivial elements.

Remark 1

Notice that for (3.6) we need G/H to be abelian. Indeed, in general, if N is a minimal (and, thus, indecomposable) normal subgroup with [N, G] = N, the map N → N defined by x↦[x, g] is not necessarily bijective for all g ∉ CG(N). For instance, take the semidirect product \((C_{3} \times C_{3}) \rtimes D_{4}\) where \(D_{4} = \left < \mathinner {a,b} \middle | \mathinner {a^{2}=b^{2}=(ab)^{4}=1} \right >\) is the dihedral group of order 8 and a acts by exchanging the two components of C3 × C3 and b by inverting the second one. Then N = C3 × C3 is an indecomposable normal subgroup and [N, G] = N, but a ∉ CG(N) and [(1,1),a] = [(2,2),a] = 1, so x↦[x, a] is not bijective on N (here we use additive notation for \(C_{3} = \left \{ \mathinner {0,1,2} \right \} \)).
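The example in this remark is small enough to check by machine. The following is a minimal sketch, not part of the original argument: it models only the normal subgroup N = C3 × C3 additively together with the conjugation actions of a and b, and the helper names (`act_a`, `act_b`, `commutator`, `span`) are our own.

```python
from itertools import product

# N = C3 x C3 in additive notation; elements are pairs (u, v) with u, v in {0, 1, 2}.
N = list(product(range(3), repeat=2))

def act_a(x):
    """Conjugation by a: exchange the two components."""
    return (x[1], x[0])

def act_b(x):
    """Conjugation by b: invert the second component."""
    return (x[0], (-x[1]) % 3)

def commutator(x, act):
    """[x, g] = x^{-1} x^g, written additively as -x + act(x)."""
    y = act(x)
    return ((y[0] - x[0]) % 3, (y[1] - x[1]) % 3)

def span(gens):
    """Subgroup of N generated by gens (closure under addition)."""
    elems = {(0, 0)}
    frontier = [(0, 0)]
    while frontier:
        s = frontier.pop()
        for g in gens:
            t = ((s[0] + g[0]) % 3, (s[1] + g[1]) % 3)
            if t not in elems:
                elems.add(t)
                frontier.append(t)
    return elems

# Image of x -> [x, a]; a proper subset of N means the map is not bijective.
image_a = {commutator(x, act_a) for x in N}

# Commutators with the generators a and b already generate all of N, so [N, G] = N.
comm_subgroup = span([commutator(x, act) for x in N for act in (act_a, act_b)])
```

Here `image_a` is a proper subset of N, while `comm_subgroup` is all of N, matching the two claims made in the remark.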

We summarize our observations in the following claim.

  1. (3.7)

    For all x ∈ K we have

    $$ \begin{array}{@{}rcl@{}} \tilde{\mathbf{q}}^{(1)}(x,y) \in \left\{\begin{array}{lll} x K_0, &\text{if } y \not\in H,\\ K_0, &\text{if } y \in H. \end{array}\right. \end{array} $$

Proof

Note first that ω was chosen to satisfy \(\left [{x},_{\omega } {y} \right ] = \left [{x},_{2\omega } {y} \right ]\). Moreover, for a fixed g ∈ G the unary polynomial \(\tilde {\mathbf {q}}^{(1)}(x,g)\) acts on K as the composition \(\varphi _{g}^{\omega }\) of φg with itself ω times. Now, if g ∉ H, then (3.6) yields that \(\varphi _{g}^{\omega }\) is the identity on the quotient K/K0. Moreover, \(\varphi _{g}^{\omega }\) is constant K0 for g ∈ H. □
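The role of ω in this proof can be illustrated on plain maps of a finite set. The sketch below is only an analogy under our own assumptions: maps on {0, …, n−1} are given as tuples, and ω is taken to be n!, which need not coincide with the paper's choice of ω. With such an exponent, a bijection iterates to the identity and every map iterates to an idempotent one.

```python
from math import factorial

def iterate(f, times):
    """Return f composed with itself `times` times; f[i] is the image of i."""
    result = tuple(range(len(f)))  # start from the identity map
    for _ in range(times):
        result = tuple(f[i] for i in result)
    return result

n = 3
omega = factorial(n)       # hypothetical stand-in for the paper's omega

bijection = (1, 2, 0)      # plays the role of phi_g for g outside H
constant = (0, 0, 0)       # plays the role of phi_g for g in H
with_tail = (1, 2, 2)      # a non-bijective map, to show idempotency

power_of_bijection = iterate(bijection, omega)
power_of_constant = iterate(constant, omega)
```

The bijection returns to the identity and the constant map stays constant, which mirrors the two cases of (3.7).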

With claim (3.7) we are ready to construct polynomials that will allow us to code coloring or 3SAT at the very top level.

Lemma 4

There is \(h \in K\setminus \mathcal {U}_{d-2}\) and families of polynomials

$$ \begin{array}{@{}rcl@{}} &&{\mathbf{r}}^{(k)}(y_{1}, \dots, y_{k}) \qquad\qquad\qquad\qquad\qquad\qquad \text{ and }\\ &&{\mathbf{s}}^{(k)}(y_{1,1},y_{1,2},y_{1,3} \dots, y_{k,1},y_{k,2},y_{k,3}) \end{array} $$

of length \(2^{\mathcal {O}(k)}\) such that

$$ \begin{array}{@{}rcl@{}} {\mathbf{r}}^{(k)}(\overline{y}) &\in \left\{\begin{array}{lll} h \cdot \mathcal{U}_{d-2}, & \text{if } y_{i} \not\in H \text{ for all \textit{i}},\\ \hphantom{h\cdot{}}\mathcal{U}_{d-2}, & \text{if } y_{i} \in H \text{ for some \textit{i}}, \end{array}\right. \end{array} $$
(5)

and

$$ \begin{array}{@{}rcl@{}} {\mathbf{s}}^{(k)}(\overline{y}) &\in \left\{\begin{array}{lll} h \cdot \mathcal{U}_{d-2}, & \text{if for all}~i~\text{there is some}~j~\text{with } y_{i,j} \in H,\\ \hphantom{h\cdot{}}\mathcal{U}_{d-2}, & \text{if } y_{i,1},y_{i,2},y_{i,3} \not\in H \text{ for some}~i. \end{array}\right. \end{array} $$
(6)

Proof

First, we use (3.7) and induct on k in order to see that for all a ∈ K ∖ K0 we have

$$ \begin{array}{@{}rcl@{}} \tilde{\mathbf{q}}^{(k)}(a,y_{1}, \dots, y_{k}) \in \left\{\begin{array}{lll} a K_0, &\text{if } y_{i} \not\in H \text{ for all}~i,\\ K_0, &\text{if } y_{i} \in H \text{ for some}~i. \end{array}\right. \end{array} $$

Now we fix some arbitrary a ∈ K ∖ K0 and g ∈ G ∖ H. Then obviously also \(h = \left [{a},_{\omega } {g} \right ]\in K \setminus K_0\). Actually \(h\not \in \mathcal {U}_{d-2}\), as otherwise \(h\in \mathcal {U}_{d-2} \cap K \leqslant K_0\).

Now, by (3.4) we know that \(M := \left [{K_0},_{\omega } {G} \right ]\leqslant \mathcal {U}_{d-2}(K)\). By (2.1.iii) it follows that

$$ \begin{array}{@{}rcl@{}} {\mathbf{q}}^{(k)}(a,y_{1}, \dots, y_{k},g) \in \left\{\begin{array}{lll} h M, &\text{if } y_{i} \not\in H \text{ for all}~i,\\ M, &\text{if } y_{i} \in H \text{ for some}~i. \end{array}\right. \end{array} $$

Thus, \({\mathbf {r}}^{(k)}(y_{1}, \dots , y_{k}) = {\mathbf {q}}^{(k)}(a,y_{1}, \dots , y_{k},g) \) satisfies (5). Clearly, its length is in \(2^{\mathcal {O}(k)}\).

To construct the polynomials \({\mathbf {s}}^{(k)}\), we first define

$$ {\mathbf{p}}(x,y_{1}, y_{2},y_{3}) = x\cdot \tilde{\mathbf{q}}^{(3)}(x, y_{1},y_{2},y_{3})^{-1}. $$

Then for all x ∈ K, by (3.7), we have

$$ \begin{array}{@{}rcl@{}} {\mathbf{p}}(x,y_{1}, y_{2},y_{3}) \in \left\{\begin{array}{lll} K_0, &\text{if } y_{j} \not\in H \text{ for all}~j,\\ x K_0, &\text{if } y_{j} \in H \text{ for some}~j. \end{array}\right. \end{array} $$

Now, with a, g, h and M as above, we proceed as with the \({\mathbf {r}}^{(k)}\)’s to define

$$ \begin{array}{@{}rcl@{}} \tilde{\mathbf{s}}^{(k)}(\overline{y}) &= {\mathbf{p}}({\cdots} {\mathbf{p}}(a, y_{1,1},y_{1,2},y_{1,3}),{\dots} ,y_{k,1},y_{k,2},y_{k,3}) \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} {\mathbf{s}}^{(k)}(\overline{y}) &= \left[{\vphantom{k^{k}}\tilde{\mathbf{s}}^{(k)}(\overline{y})},\kern.1em_{\omega} \kern.1em {g} \right]. \end{array} $$

As previously, (6) follows from (2.1.iii). □

Our next claim summarizes Lemmas 2 and 4.

Lemma 5

For \(1 \leqslant \alpha \leqslant d-1\) there are elements hα ≠ 1 and families of polynomials

$$ \begin{array}{@{}rcl@{}} &&{\mathbf{r}}_{\alpha}^{(m)}(y_{1}, \dots, y_{m}) \qquad\qquad\qquad\qquad\qquad\qquad \text{ and }\\ &&{\mathbf{s}}_{\alpha}^{(m)}(y_{1,1},y_{1,2},y_{1,3} \dots, y_{m,1},y_{m,2},y_{m,3}) \end{array} $$

of length \(2^{\mathcal {O}(m^{\frac {1}{d-\alpha }})}\) such that

$$ \begin{array}{@{}rcl@{}} {\mathbf{r}}_{\alpha}^{(m)}(\overline{y}) &\in \left\{\begin{array}{lll} h_{\alpha}\cdot \mathcal{U}_{\alpha-1}, & \text{if } y_{i} \not\in H \text{ for all}~i,\\ \hphantom{h_{\alpha}\cdot{}}\mathcal{U}_{\alpha-1}, & \text{if } y_{i} \in H \text{ for some}~i, \end{array}\right. \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} {\mathbf{s}}_{\alpha}^{(m)}(\overline{y}) &\in \left\{\begin{array}{lll} h_{\alpha}\cdot \mathcal{U}_{\alpha-1}, & \text{if for all}~i~\text{there is some}~j~\text{with } y_{i,j} \in H,\\ \hphantom{h_{\alpha}\cdot{}}\mathcal{U}_{\alpha-1}, & \text{if } y_{i,1},y_{i,2},y_{i,3} \not\in H \text{ for some \textit{i}}. \end{array}\right. \end{array} $$

Proof

We induct downwards on α = d − 1,…,2,1. To start with, we refer to Lemma 4 to set hd−1 = h while \({\mathbf {r}}_{d-1}^{(m)}(\overline {y})={\mathbf {r}}^{(m)}(\overline {y})\) and \({\mathbf {s}}_{d-1}^{(m)}(\overline {y})={\mathbf {s}}^{(m)}(\overline {y})\).

Now let α < d − 1 and set \(k = \left \lceil \mathinner {\sqrt [d-\alpha ]{m}} \right \rceil \) and \(\ell = \left \lceil \mathinner {\frac {m}{k}} \right \rceil \). By possibly duplicating some of the variables we may assume that m = k · ℓ.
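The choice of the two parameters can be made concrete with a small sketch. The function name `split_parameters` and the argument `depth` (standing for d − α) are hypothetical names introduced here for illustration; the code uses integer arithmetic to avoid floating-point rounding in the root.

```python
def split_parameters(m, depth):
    """Return (k, l) with k = ceil(m ** (1/depth)) and l = ceil(m / k)."""
    k = 1
    while k ** depth < m:   # smallest k with k**depth >= m
        k += 1
    l = -(-m // k)          # ceiling division
    return k, l

k, l = split_parameters(100, 2)
```

Since k · ℓ is at least m, padding the variable list with duplicates up to k · ℓ entries is always possible, which is what the assumption m = k · ℓ amounts to.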

To define \({\mathbf {r}}_{\alpha }^{(m)}(\overline {y})={\mathbf {r}}_{\alpha }^{(m)}(y_{1}, \dots , y_{m})\) we first refer to Lemma 2 to get hα from hα+1 and then we set

$$ \begin{array}{@{}rcl@{}} {\mathbf{r}}_{\alpha}^{(m)}(\overline{y}) &= {\mathbf{q}}^{(k)}\!\left( h_{\alpha},\mathbf{r}_{\alpha+1}^{(\ell)}(y_{1},{\dots} ,y_{\ell}), \dots, \mathbf{r}_{\alpha+1}^{(\ell)}(y_{m-\ell+1},{\dots} ,y_{m}), h_{\alpha+1}\right), \end{array} $$

where the polynomial \({\mathbf {r}}_{\alpha +1}^{(\ell )}\) is supplied by the induction hypothesis. From Lemma 2 it should be clear that \({\mathbf {r}}_{\alpha }^{(m)}\) satisfies the condition claimed for it.

Its length can also be bounded inductively. Substituting the \(k= m^{\frac {1}{d-\alpha }}\) copies of the polynomial \({\mathbf {r}}_{\alpha +1}^{(\ell )}\) of length \(2^{\mathcal {O}\bigl (\ell ^{\frac {1}{d-\alpha -1}}\bigr )}\) into the polynomial \({\mathbf {q}}^{(k)}(h_{\alpha }, x_{1}, \dots , x_{k},h_{\alpha +1})\) of length \(2^{\mathcal {O}(k)}\) (by Lemma 2), and using \(\ell = m^{\frac {d-\alpha -1}{d-\alpha }}\), we arrive at the following bound for the length of \({\mathbf {r}}_{\alpha }^{(m)}\):

$$ \begin{array}{@{}rcl@{}} 2^{\mathcal{O}(k)}\cdot 2^{\mathcal{O}(\ell^{\frac{1}{d-\alpha-1}})} &=& 2^{\mathcal{O}\left( m^{\frac{1}{d-\alpha}} + \left( m^{\frac{d-\alpha-1}{d-\alpha}}\right)^{\frac{1}{d-\alpha-1}}\right)}\\ &=&2^{\mathcal{O}\left( m^{\frac{1}{d-\alpha}} + m^{\frac{d-\alpha-1}{d-\alpha}\cdot \frac{1}{d-\alpha-1}}\right)}\\ &=&2^{\mathcal{O}\left( m^{\frac{1}{d-\alpha}}\right)}. \end{array} $$

In a very similar way we produce \({\mathbf {s}}_{\alpha }^{(m)}(\overline {y})\) from the \({\mathbf {s}}_{\alpha +1}^{(\ell )}\)’s by simply putting

$$ \begin{array}{@{}rcl@{}} {\mathbf{s}}_{\alpha}^{(m)}(\overline{y}) &=& {\mathbf{q}}^{(k)}\big(h_{\alpha},{\mathbf{s}}_{\alpha+1}^{(\ell)}(y_{1,1},y_{1,2},y_{1,3}, \dots, y_{\ell,1},y_{\ell,2},y_{\ell,3}),\ldots\\ & &\dots, {\mathbf{s}}_{\alpha+1}^{(\ell)}(y_{m-\ell+1,1},y_{m-\ell+1,2},y_{m-\ell+1,3}, \dots, y_{m,1},y_{m,2},y_{m,3}),h_{\alpha+1}\big). \end{array} $$

Now we are ready to conclude our proof of Theorem 1. Recall that due to Lemma 3 we are working in a group G in which \(G/\mathcal {U}_{d-1}\) is abelian. We are going to reduce C-Coloring or 3SAT to \({\textsc {{PolSat}}} \!\left ({G} \right )\) and \({\textsc {{PolEqv}}} \!\left ({G} \right )\) depending on whether \(C=\left |\mathinner {G/H}\right |>2\) or C = 2. In either case the reduction from C-Coloring to \({\textsc {{PolSat}}} \!\left ({G} \right )\) and \({\textsc {{PolEqv}}} \!\left ({G} \right )\) works; however, the case C = 2 has to be treated in a different way since 2-Coloring is decidable in polynomial time.

In our reduction the formula Φ from 3SAT (or a graph Γ from C-Coloring) is transformed to a polynomial sΦ (or rΓ) and a group element h1 so that the following will hold:

  1. (A)

    the length of sΦ (resp. rΓ) is in \(2^{\mathcal {O}(\sqrt [d-1]{m})}\) where m is the number of clauses (resp. the number of edges),

  2. (B)

    sΦ (resp. rΓ) can be computed in time \(2^{\mathcal {O}(\sqrt [d-1]{m})}\) (i.e., polynomial in the length of sΦ (resp. rΓ)),

  3. (C)

    if Φ is satisfiable (resp. Γ has a valid C-coloring), then sΦ = h1 (resp. rΓ = h1) is satisfiable, and,

  4. (D)

    if Φ is not satisfiable (resp. Γ does not have a valid C-coloring), then sΦ = 1 (resp. rΓ = 1) holds under all evaluations.

The latter two points imply that sΦ = h1 (resp. rΓ = h1) is satisfiable if and only if Φ is satisfiable (resp. Γ has a valid C-coloring) and sΦ = 1 (resp. rΓ = 1) holds identically in G if and only if Φ is not satisfiable (resp. Γ does not have a valid C-coloring).

Now, if ℓ denotes the input length for PolSat or PolEqv (i.e. the size of sΦ or rΓ), then an algorithm for PolSat or PolEqv working in \(2^{o(\log ^{d-1}\ell )}\)-time would solve 3SAT (resp. C-Coloring) in time

$$ 2^{\mathcal{O}(\sqrt[d-1]{m})} + 2^{o(\log^{d-1}(2^{\sqrt[d-1]{m}}))} = 2^{o(m)}, $$

contradicting ETH.
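The estimate can be written out as follows. This is a sketch under our own notation: ℓ is the length of the constructed polynomial and c > 0 is a constant with \(\ell \leqslant 2^{c\sqrt [d-1]{m}}\), as provided by (A).

$$ \log^{d-1}\ell \leqslant \left( c\sqrt[d-1]{m}\right)^{d-1} = c^{d-1}\cdot m, \qquad\text{hence}\qquad 2^{o(\log^{d-1}\ell)} = 2^{o(m)}. $$

The construction time \(2^{\mathcal {O}(\sqrt [d-1]{m})}\) is likewise \(2^{o(m)}\), because \(d-1 \geqslant 2\) for the groups of Fitting length at least three considered in Theorem 1.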

We start with describing the reduction from C-Coloring to \({\textsc {{PolSat}}} \!\left ({G} \right )\) and \({\textsc {{PolEqv}}} \!\left ({G} \right )\) where \(C=\left |\mathinner {G/H}\right |\). The quotient G/H serves as the set of colors. For a graph Γ = (V, E) with \(E \subseteq \binom {V}{2}\), \(\left |\mathinner {V}\right | = n\) and \(\left |\mathinner {E}\right | = m\), we use variables xv for v ∈ V. For an edge {u, v} ∈ E the value of \(x_{u}x_{v}^{-1}\) (modulo H) decides whether the vertices u, v have the same color. To control whether the coloring of Γ is proper we define the polynomial rΓ by putting

$$ {\mathbf{r}}_{{\varGamma}}((x_{v})_{v\in V}) = {\mathbf{r}}_{1}^{(m)}\!\!\left( (x_{u}x_{v}^{-1})_{\{u,v\}\in E}\right) $$

where \({\mathbf {r}}_{1}^{(m)}\) and h1 are supplied by Lemma 5 and, thus, meet the length bound (A). Point (B) is clear from the definition of the polynomial. Notice that the edges can be fed into \(\mathbf {r}_{1}^{(m)}\) in any order without affecting the final value of the polynomial. Every evaluation of the variables xv by elements of G defines a coloring χ : V → G/H in a natural way. If this coloring is valid (i.e. χ(u) ≠ χ(v) for every edge {u, v} ∈ E), then none of the elements \(x_{u}x_{v}^{-1}\) lies in H and Lemma 5 ensures that rΓ((xv)v∈V) = h1. This shows (C).

Conversely, by Lemma 5, for every evaluation of the xv’s by elements of G that does not satisfy the equation rΓ((xv)v∈V) = 1, we have \(x_{u} x_{v}^{-1} \not \in H\) for all edges {u, v}. This obviously yields a valid coloring of Γ; hence, it proves (D).
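The logic of this reduction can be simulated at the level of cosets. The sketch below rests on simplifying assumptions of our own: G/H is modeled as the cyclic group of integers modulo C (the paper only guarantees that G/H is abelian), and the polynomial from Lemma 5 is replaced by its specification, returning the label `"h1"` or `"1"`. The function names are hypothetical.

```python
from itertools import product

def r_gamma_coset(edges, x, C):
    """'h1' if x_u x_v^{-1} lies outside H on every edge, else '1'."""
    return "h1" if all((x[u] - x[v]) % C != 0 for u, v in edges) else "1"

def has_solution(vertices, edges, C):
    """Brute-force check whether r_Gamma = h1 is satisfiable over the C cosets."""
    for values in product(range(C), repeat=len(vertices)):
        x = dict(zip(vertices, values))
        if r_gamma_coset(edges, x, C) == "h1":
            return True
    return False

triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]

triangle_3 = has_solution([0, 1, 2], triangle, 3)   # a triangle is 3-colorable
k4_3 = has_solution([0, 1, 2, 3], k4, 3)            # K4 is not 3-colorable
```

On these two graphs the equation is solvable exactly when the graph is C-colorable, as (C) and (D) predict.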

As 2-Coloring is solvable in polynomial time, in the case \(\left |\mathinner {G/H}\right | =2\) we encode 3SAT instead and use the two cosets of H in G as the boolean values true and false. We start with the formula

$$ {\varPhi} = (A_{1,1} \lor A_{1,2}\lor A_{1,3}) \land {\cdots} \land (A_{m,1} \lor A_{m,2}\lor A_{m,3}), $$

where each literal Ai,j is either one of the boolean variables \(X_{1}, \dots , X_{n}\) or its negation. First, we transform the literals Ai,j into the expressions xi,j that are supposed to range over G by picking g ∈ G ∖ H and then setting

$$ x_{i,j} = \left\{\begin{array}{lll} gx_{k}, &\text{if } A_{i,j} = X_{k},\\ x_{k}, &\text{if } A_{i,j} = \lnot X_{k}. \end{array}\right. $$

Finally, we set

$$ \begin{array}{@{}rcl@{}} {\mathbf{s}}_{{\varPhi}}(x_{1},\ldots,x_{n}) &= {\mathbf{s}}_{1}^{(m)}\!\left( x_{1,1},x_{1,2},x_{1,3}, \dots,x_{m,1},x_{m,2},x_{m,3}\right) \end{array} $$

where again \({\mathbf {s}}_{1}^{(m)}\) is supplied by Lemma 5.

Now, given an assignment to the boolean variables \(X_{1}, \dots , X_{n}\), we obtain an assignment for \(x_{1}, \dots , x_{n}\) by setting xi = g if Xi is true and xi = 1 if Xi is false. It can be easily checked using Lemma 5 that the original assignment satisfies Φ if and only if \({\mathbf {s}}_{{\varPhi }}(\overline {x}) = h_{1}\) holds under this assignment (notice that \(g^{2} \in H\)). This shows (C). On the other hand, if \({\mathbf {s}}_{{\varPhi }}(\overline {x}) \neq 1\), then, by Lemma 5, for all i there is some j with xi,j ∈ H. Hence, if we assign true to Xk if and only if xk ∉ H, we obtain a satisfying assignment for Φ, proving (D).
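The encoding of literals can again be checked at the level of cosets. In the sketch below, which is an illustration under our own conventions rather than the authors' construction, the two cosets of H are written as 0 (for H) and 1 (for gH), literals are signed integers (k for X_k, −k for its negation), and the polynomial from Lemma 5 is replaced by its specification.

```python
from itertools import product

def literal_coset(lit, x):
    """Coset of x_{i,j}: g * x_k for a positive literal, x_k for a negated one."""
    k = abs(lit)
    return (1 + x[k]) % 2 if lit > 0 else x[k]

def s_phi_coset(clauses, x):
    """'h1' if every clause contains some x_{i,j} in H (coset 0), else '1'."""
    ok = all(any(literal_coset(l, x) == 0 for l in c) for c in clauses)
    return "h1" if ok else "1"

def formula_holds(clauses, truth):
    """Ordinary evaluation of the 3-CNF formula under a truth assignment."""
    return all(any(truth[abs(l)] == (l > 0) for l in c) for c in clauses)

def check_equivalence(clauses, n):
    """Compare both evaluations on all assignments (x_k = g iff X_k is true)."""
    for bits in product([False, True], repeat=n):
        truth = {k + 1: bits[k] for k in range(n)}
        x = {k: (1 if truth[k] else 0) for k in truth}
        if (s_phi_coset(clauses, x) == "h1") != formula_holds(clauses, truth):
            return False
    return True

phi = [(1, 2, -3), (-1, -2, 3), (1, -2, -3)]
agrees = check_equivalence(phi, 3)
```

A literal evaluates to true exactly when the corresponding expression lands in H, which is why a positive literal is shifted by g while a negated one is not.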

Notice that also in the case \(\left |\mathinner {G/H}\right | \geqslant 3\) it would be possible to describe a reduction of 3SAT to \({\textsc {{PolSat}}} \!\left ({G} \right )\). However, in order to encode negations of literals, we need to restrict the variables to only two possible values (modulo H). This can be done using the polynomials we constructed for the reduction from C-Coloring (which we can use to “forbid” any undesired value). Nevertheless, the total construction would be more complicated than the two individual reductions we described above.

4 Conclusion

With Theorem 1 in mind, one could suspect that finite solvable groups of Fitting length 2 have polynomial time algorithms for PolSat. As we have already mentioned, the very recent paper [8] shows that PolSat is in P for many such groups, in particular, for all semidirect products \(G_{p} \rtimes A\), where Gp is a p-group and A is abelian. This, however, does not cover e.g. the dihedral group D15. In fact, in [19] \({\textsc {{PolSat}}} {\!\left ({D_{15}} \right )}\) is shown not to be in P, unless ETH fails. On the other hand, \({\textsc {{PolEqv}}} {\!\left ({D_{15}} \right )}\in \mathsf {P}\). Actually from [8] we know that \(\textsc {{PolEqv}} \!\left ({G} \right ) \in \mathsf {P}\) for each semidirect product \(G = N \rtimes A\) where N is nilpotent and A is abelian. In fact, D15 is the first known example of a group with polynomial time PolEqv and non-polynomial (under ETH) PolSat. The converse situation cannot happen as, for a finite group G, \({\textsc {{PolSat}}} \!\left ({G} \right )\in \mathsf {P}\) implies \({\textsc {{PolEqv}}} \!\left ({G} \right )\in \mathsf {P}\). Indeed, to confirm that \(\mathbf {t}(\overline {x})=1\) holds for all possible values of the \(\overline {x}\)’s, we check that for no g ∈ G ∖ {1} the equation \(\mathbf {t}(\overline {x})=g\) has a solution.
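The last observation is a reduction with |G| − 1 oracle calls, which is a constant number for a fixed group. A minimal sketch follows, with assumptions stated plainly: the group is the symmetric group on three points represented by permutation tuples, polynomials are Python callables, and the PolSat oracle is replaced by brute force. All names (`polsat`, `poleqv_via_polsat`, `mul`, `power`) are hypothetical.

```python
from itertools import permutations, product

S3 = list(permutations(range(3)))
IDENTITY = (0, 1, 2)

def mul(p, q):
    """Product of two permutations given as tuples (apply p, then q)."""
    return tuple(q[i] for i in p)

def power(p, n):
    """n-th power of the permutation p."""
    result = tuple(range(len(p)))
    for _ in range(n):
        result = mul(result, p)
    return result

def polsat(t, arity, target, group):
    """Brute-force stand-in for a PolSat oracle: is t(x) = target solvable?"""
    return any(t(*xs) == target for xs in product(group, repeat=arity))

def poleqv_via_polsat(t, arity, group, identity):
    """t = 1 holds identically iff t = g is unsolvable for every g != 1."""
    return not any(polsat(t, arity, g, group) for g in group if g != identity)

sixth_power_is_trivial = poleqv_via_polsat(lambda x: power(x, 6), 1, S3, IDENTITY)
square_is_trivial = poleqv_via_polsat(lambda x: power(x, 2), 1, S3, IDENTITY)
```

In this toy group the sixth power is identically trivial while the square is not, since every 3-cycle is the square of another 3-cycle.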

We conclude our paper with two obvious questions.

Problem 1

Characterize finite solvable groups (of Fitting length 2) with PolSat decidable in polynomial time.

Problem 2

Characterize finite solvable groups (of Fitting length 2) with PolEqv decidable in polynomial time.

Finally, we want to point out the consequences of our main result for another problem. For a finitely generated (but possibly infinite) group with a finite set of generators Σ the power word problem is as follows: the input is a tuple (p1,x1,p2,x2,…,pn,xn) where the pi are words over Σ and the xi are integers encoded in binary. The question is whether \(p_{1}^{x_{1}} {\cdots } p_{n}^{x_{n}}\) evaluates to the identity of the group. The complexity of the power word problem in a wreath product \(G \wr {\mathbb {Z}}\), where G is a finite group, behaves similarly to that of PolEqv: if G is nilpotent, the power word problem of \(G \wr {\mathbb {Z}}\) is in polynomial time [6] (actually even in \(\mathsf {TC}^{0}\)) and, if G is non-solvable, it is coNP-complete [24]. Indeed, in [6] a surprising connection to PolEqv has been pointed out: if G is a finite group, then PolEqv\(\left ({G} \right )\) can be reduced in polynomial time to the power word problem of the wreath product \(G \wr {\mathbb {Z}}\). In particular, Theorem 1 implies that the power word problem of \(G \wr {\mathbb {Z}}\), where G is a finite solvable group of Fitting length at least three, is not in P assuming ETH.