Algorithmically complex residually finite groups

We construct the first examples of algorithmically complex finitely presented residually finite groups and the first examples of finitely presented residually finite groups with arbitrarily large (recursive) Dehn function and depth function. The groups are solvable of class 3. We also prove that the universal theory of finite solvable groups of class 3 is undecidable.


Introduction
1.1. The problem and previous approaches for a solution. It is well known that finitely presented residually finite groups are much simpler algorithmically than arbitrary finitely presented groups. For example, the word problem in every such group is decidable. Moreover, the most "common" residually finite groups, the linear groups over fields, are algorithmically "tame": the word problem in any linear group is decidable in polynomial time and even in log-space [35]. Dehn functions are important witnesses of the complexity of the word problem. Although it is well known that groups with solvable word problem can have very large Dehn functions [40,58], no examples of finitely presented residually finite groups with superexponential Dehn function are known. Thus one of the main open problems in this area is how large the Dehn function of a residually finite finitely presented group can be. The question has been known since the early 1990s. It remained open for so long because all known methods of constructing algorithmically hard groups produced either non-residually finite groups or groups for which the question of residual finiteness is very difficult. Not much is known even for linear groups (Steve Gersten asked [19,20] whether there exists a uniform upper bound for Dehn functions of linear groups). Let us briefly discuss the previous attempts to solve the problem and the reasons why these methods did not work.
1.1.1. Method 1. Known groups with large Dehn functions. One could hope that some of the known finitely presented groups with very large Dehn function may turn out to be residually finite, which would shed some light on how to produce residually finite groups with even larger Dehn functions. Unfortunately, this is not the case: all these groups are non-residually finite. For example, the Dehn function of the one-relator group $G_{(1,2)} = \langle a, b \mid b^{-1}a^{-1}bab^{-1}ab = a^2 \rangle$ introduced by Baumslag in [2] is bigger than any iterated exponent (see Gersten [18]). Platonov [51] proved that it is equivalent to the function $\exp^{(\log_2 n)}(1)$, where $\exp^{(m)}(x)$ is the function defined by $\exp^{(0)}(n) = n$ and $\exp^{(k+1)}(n) = \exp(\exp^{(k)}(n))$. However, $G_{(1,2)}$ is not residually finite (and in fact has very few finite quotients) [2]. Furthermore, the word problem in $G_{(1,2)}$ is decidable in polynomial time [42].
1.1.2. Method 2. Using subgroups with very large distortion. Consider a finitely presented group $G$ and a "badly" distorted finitely generated subgroup $H$. Let $T = \langle G, t \mid t^{-1}ht = h\ (h \in H) \rangle$ be the HNN extension of $G$ where the free letter $t$ centralizes $H$. It was noticed by Bridson and Häfliger [11, Theorem 6.20.III] that the Dehn function of $T$ is at least as large as the distortion function of $H$ in $G$. The following result puts some limitations on this method of constructing complicated residually finite groups.
Lemma 1.1. If the group $T$ is residually finite then $H$ is closed in the pro-finite topology of $G$.
finitely presented group $Q$ gives a finitely presented residually finite small cancellation group $G$ with a short exact sequence $1 \to N \to G \to Q \to 1$, where $N$ is finitely generated. It is easy to see that the distortion function of $N$ in $G$ is at least as bad as the Dehn function of $Q$, so choosing $Q$ properly one can get a finitely presented residually finite group $G$ with a highly distorted subgroup $N$. Now, the subgroup $N$ is normal in the HNN extension $T$. So it is closed in the pro-finite topology of $T$ only if $Q = G/N$ is residually finite. By Lemma 1.1, $T$ can be residually finite only if $Q$ is residually finite. In other words, to construct a complicated finitely presented residually finite group $T$ one has to have the initial group $Q$ complicated, finitely presented and residually finite as well.
The second example is the standard Mikhailova construction. In this case highly distorted subgroups of the direct product of two free groups $F_2 \times F_2$ can be obtained as equalizers of two homomorphisms $\varphi_1 : E_1 \to M$ and $\varphi_2 : E_2 \to M$ where $E_1$, $E_2$ are finitely generated subgroups of $F_2$ and $M$ is finitely presented (see Sections 1.4,5 below). But by Remark 5.4 below the equalizer is closed in the pro-finite topology only if $M$ is residually finite. Thus, as in the previous example, in order to construct an algorithmically complex residually finite finitely presented group using Mikhailova's construction and the HNN extensions as above, one needs to have already a finitely presented residually finite algorithmically complex group $M$.
The third example is Cohen's construction of highly distorted subgroups employing the modular machines [12]. One can also prove that in that construction the subgroup will be pro-finitely closed only if the modular machine is very easy.
One can also try to use the hydra groups [14,10] to construct HNN extensions as above with Dehn functions bigger than any prescribed Ackermann function. The question of whether these groups are residually finite is open (most probably the answer is negative because the distorted subgroup should not be closed in the pro-finite topology, but this needs a proof).
1.1.3. Method 3. Boone-Novikov constructions. One of the standard ways to produce algorithmically complicated groups is by simulating Turing machines using free constructions (HNN extensions and amalgamated products), which goes back to the seminal papers by Boone and Novikov (see, for example, [54]). There are currently many versions of that construction (for a recent survey see [57]). But in fact, it can be shown that for each known version of the proof of the Boone-Novikov theorem using free constructions, even for easy Turing machines the corresponding group is non-residually finite. This is, for instance, the main idea of the example in [31]. Here is an even easier example. Let $G = G(M)$ be a group constructed by any of these constructions. Then for every input word $u$ of the Turing machine $M$ there exists a word $w = w(u)$ obtained by inserting some copies of $u$ in $w(\emptyset)$, so that $u$ is accepted by $M$ if and only if $w(u) = 1$ in $G$. Now consider $M$ that accepts a word $a^n$ if and only if $n \neq 0$ (that machine is actually one of the basic building blocks in [58]). Then $w(a^n) = 1$ in $G$ if and only if $n \neq 0$. Suppose that there exists a homomorphism $\varphi$ onto a finite group $H$ that separates $w(\emptyset) = w(a^0)$ from 1. Then $\varphi(a)$ has finite order, say, $s$, in $H$. Therefore $\varphi(w(\emptyset)) = \varphi(w(a^0)) = \varphi(w(a^s)) = \varphi(1) = 1$, a contradiction. Hence $G(M)$ is not residually finite.
1.1.4. Method 4. Residually finite groups obtained by free constructions. In general, the question about residual finiteness of free constructions is very difficult. Currently there are only two large classes of groups where the question was settled: these are ascending HNN extensions of free groups [7] and certain groups acting "nicely" on CAT(0)-cubical complexes including small cancellation groups (see the recent work of Wise, for example, [25] and references therein). All these groups have easy word problem and uniformly bounded Dehn functions. The reason for the lack of more examples is that groups obtained by free constructions from "nice" groups contain a lot of extra elements and it is not at all clear how to separate these elements from 1 by homomorphisms onto finite groups. In the two cases when it could be done, it was possible to reformulate the problem in the language of algebraic geometry and geometric topology, respectively.
1.1.5. Method 5. Other groups with complicated word problem. There are several other constructions of groups with complicated word problem but each of these almost always produces non-residually finite groups. For example, the group in [38] is based on the R. Thompson group $V$ which is infinite and simple (hence not residually finite).
1.1.6. The main results of the paper. In this paper, we construct finitely presented residually finite groups with arbitrarily complex word problem, and also with easy word problem but arbitrarily high (of course recursive) Dehn function and arbitrarily high depth function. We also give applications of these results to the questions about solvability of the universal theory of finite solvable groups.
Our constructions are based on simulating Minsky machines in groups. Surprisingly, the algebraic structure of the group not only depends on the construction itself, but also heavily depends on the computational properties of the machines we simulate: for the group to be residually finite the Minsky machines should be sym-universally halting, that is their transition graphs are vertex-disjoint unions of finite trees.
1.1.7. What next? We expect the approach used in this paper to be useful in solving other problems that are still open. For example, the residually finite version of the Higman embedding theorem would be very desirable. It is known [39,15,21,33] that a finitely generated recursively presented residually finite group may have undecidable word problem, and hence cannot be embedded into a finitely presented residually finite group. But it is not known whether unsolvability of the word problem is the only obstacle for such an embedding. Thus it would be very interesting to find out whether every finitely generated residually finite group with solvable word problem embeds into a finitely presented residually finite group. Note that usually a version of the Boone-Novikov theorem precedes a version of the Higman theorem, hence we can consider this paper as a step toward the residually finite version of Higman's theorem.
1.2. The "yes" and "no" parts of the McKinsey algorithm. One of the initial motivations for studying residually finite groups, semigroups and other algebraic structures was McKinsey's algorithm solving the word problem in finitely presented residually finite algebraic structures. Even though the algorithm is well known and classical, surprisingly little is known about its complexity. In this paper, we fill this gap.
Let $G = \langle X; R \rangle$ be a residually finite finitely presented algebraic structure of finite type (signature) $T$ (say, a group, semigroup, ring, etc.). Let us recall McKinsey's algorithm solving the word problem in $G$ (see [41], [37]). The word problem is divided into two parts.
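Let $F(X)$ be the free algebraic structure of type $T$ freely generated by $X$. Then we define the "yes" and "no" parts of the word problem in $G$ as follows: $WP_{\mathrm{yes}} = \{(w, w') \in F(X) \times F(X) \mid w =_G w'\}$ and $WP_{\mathrm{no}} = \{(w, w') \in F(X) \times F(X) \mid w \neq_G w'\}$. To solve the word problem in $G$ one runs in parallel two separate algorithms $A_{\mathrm{yes}}$ and $A_{\mathrm{no}}$, such that starting with a given pair of elements $w, w' \in F(X)$, $A_{\mathrm{yes}}$ stops if and only if $(w, w') \in WP_{\mathrm{yes}}$ and $A_{\mathrm{no}}$ stops if and only if $(w, w') \in WP_{\mathrm{no}}$.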
The algorithm $A_{\mathrm{yes}}$ enumerates one by one all consequences of the defining relations $R$ and waits until $w = w'$ appears in the list.
The algorithm $A_{\mathrm{no}}$ enumerates all homomorphisms $\varphi_1, \varphi_2, \ldots$ of $G$ into finite algebraic structures of type $T$ and waits until $\varphi_i(w) \neq \varphi_i(w')$ for some $i$.
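The "no" part can be illustrated for groups by a minimal Python sketch. It relies on the standard fact that every finite group embeds in a symmetric group, so it is enough to enumerate homomorphisms into the groups $S_n$. The function names, the word encoding (lists of (generator, exponent) pairs with exponent +1 or -1) and the cut-off `max_degree` are assumptions of this sketch; the algorithm described above has no such bound and simply runs until it succeeds.

```python
from itertools import permutations, product

def compose(p, q):
    """Permutation 'p then q' on {0, ..., n-1}; permutations are tuples."""
    return tuple(q[i] for i in p)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def evaluate(word, images, n):
    """Image in S_n of a word given as a list of (generator, +1 or -1) pairs."""
    result = tuple(range(n))
    for gen, exp in word:
        factor = images[gen] if exp == 1 else inverse(images[gen])
        result = compose(result, factor)
    return result

def find_separating_quotient(generators, relators, w, w_prime, max_degree):
    """Search S_1, S_2, ... for a homomorphism with phi(w) != phi(w').

    Returns (n, images) for the first hit, or None if no homomorphism
    into S_n with n up to max_degree separates the two words.
    The bound max_degree is an artefact of this sketch.
    """
    for n in range(1, max_degree + 1):
        identity = tuple(range(n))
        perms = list(permutations(range(n)))
        for choice in product(perms, repeat=len(generators)):
            images = dict(zip(generators, choice))
            # A choice of images defines a homomorphism iff it kills every relator.
            if any(evaluate(r, images, n) != identity for r in relators):
                continue
            if evaluate(w, images, n) != evaluate(w_prime, images, n):
                return n, images
    return None
```

For instance, with the single relator $aba^{-1}b^{-1}$ (the free Abelian group of rank 2), the search separates $a$ from the empty word already in $S_2$, while the relator itself is never separated from the empty word. The brute-force enumeration grows like $(n!)^{|X|}$, which gives a feeling for why nothing useful can be said a priori about the running time of the "no" part.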
Let now $G$ be a finitely presented residually finite group. Although it seems that in general $A_{\mathrm{yes}}$ and $A_{\mathrm{no}}$ are very slow, there were no examples of groups $G$ for which these algorithms were actually very slow. More precisely, there were no known examples of finitely presented residually finite groups with very hard "yes" or "no" part of the word problem. Indeed, the most "common" residually finite groups are linear groups, say, over fields [37]. In that case it is well known that the "yes" part can be solved in deterministic polynomial time [35,61]. The "no" part can be solved by considering factor groups corresponding to ideals of finite index of some polynomial rings, hence also can be shown to be solvable in deterministic polynomial time. In fact the same can be said about most finitely presented groups (where "most" means "with overwhelming probability" in one of several probabilistic models): recent results of Agol [1] and Ollivier and Wise [47] together with the older result of Olshanskii [48] imply that most finitely presented groups are linear (even over Z).
One of our main results is the following theorem (an immediate corollary of Theorem 4.19 below).
Theorem. Let $f(n)$ be a recursive function. Then there exists a residually finite finitely presented solvable group $G$ such that for any finite presentation $\langle X; R \rangle$ of $G$ the time complexity of both the "yes" and "no" parts of the word problem is at least as high as $f(n)$.
We also show that both algorithms $A_{\mathrm{yes}}$ and $A_{\mathrm{no}}$ can be very slow even when both "yes" and "no" parts of the word problem are easy.
Remark 1.2. Note that if we replace the "finitely presented" assumption by "recursively presented", then residually finite groups are known to be very complicated. As we have mentioned before, recursively presented finitely generated residually finite groups may have undecidable word problem [39,15,21]. Moreover, recently the second author and B. Khoussainov constructed residually finite Dehn monsters, i.e., infinite groups which are recursively presented, residually finite and algorithmically finite [33]. These are groups where the word problem is not only undecidable, but one cannot algorithmically enumerate an infinite set of pairwise distinct elements of the group.
Note also that although our groups are not linear they are (elementary Abelian)-by-linear since they are solvable of class 3 with the second derived subgroup elementary Abelian.
1.3. Quantification of the "yes" part: the Dehn function. It was noticed by Madlener and Otto [40] that the Dehn function of a group measures the complexity of the word problem. They also constructed finitely presented groups with arbitrarily large Dehn functions. For residually finite groups, the situation is different. Nilpotent groups are examples of residually finite groups with polynomial Dehn functions of arbitrarily high degree [4]. The Baumslag-Solitar groups $\langle x, y \mid x^y = x^k \rangle$, $k \geq 2$, are examples of residually finite (even linear) groups with exponential Dehn function. No examples of residually finite groups with bigger Dehn functions were known. This gap is filled by the following theorem.
Theorem 4.18. For every recursive function $f$, there is a residually finite finitely presented solvable of class 3 group $G$ with Dehn function greater than $f$. In addition, one can assume that the word problem in $G$ is at least as hard as any given recursive function or as easy as polynomial time.
As a corollary of Theorem 4.18 we mention the following exotic examples of groups.
Corollary. For every recursive function f , there is a residually finite finitely presented solvable of class 3 group G with Dehn function greater than f and the word problem decidable in polynomial time.
1.4. Quantification of the "no" part: the depth function. The function quantifying the algorithm $A_{\mathrm{no}}$ is the depth function introduced by Bou-Rabee [9]. Recall that if $G = \langle X \rangle$ is a finitely generated group or semigroup, the depth function $\rho_G(n)$ is the smallest function such that every two words $w \neq_G w'$ of length at most $n$ are separated by a homomorphism to a finite group (semigroup) $H$ with $|H| \leq \rho_G(n)$. That function does not depend on the choice of the finite generating set $X$ (up to the natural equivalence).
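As a small worked illustration of the definition, consider the infinite cyclic group generated by a single element (an example chosen here, not taken from the text). Two distinct elements represented by words of length at most $n$ differ by an integer $k$ with $1 \leq |k| \leq 2n$, and a homomorphism to a finite group separates them exactly when the order $m$ of the image of the generator does not divide $k$; the smallest such target is the cyclic group of order $m$. The following Python sketch computes the resulting value under these conventions; the function names are hypothetical.

```python
def smallest_separating_modulus(k):
    """Least m >= 2 that does not divide k, for a non-zero integer k.

    The cyclic group of order m is then the smallest finite quotient of
    the integers in which k has non-trivial image.
    """
    if k == 0:
        raise ValueError("0 cannot be separated from itself")
    m = 2
    while k % m == 0:
        m += 1
    return m

def depth_of_integers(n):
    """Depth function of the infinite cyclic group at n.

    Assumes the single generator 1, so that two words of length at most n
    differ by some k with 1 <= |k| <= 2n (a convention of this sketch).
    """
    return max(smallest_separating_modulus(k) for k in range(1, 2 * n + 1))
```

The worst cases are the integers divisible by all small numbers (6, 12, 60, ...), so the values grow very slowly, in sharp contrast with the arbitrarily fast growing depth functions constructed in this paper.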
It is easy to see that for every finitely generated linear group or semigroup $G$, $\rho_G$ is at most polynomial. Since finitely generated metabelian groups are subgroups of direct products of linear groups [62], the depth function of every finitely generated metabelian group is at most polynomial. By the recent result of Agol [1] based on the earlier results of Wise [63], every small cancellation group is a subgroup of a Right Angled Artin group, hence is linear and has polynomial depth function. In fact one can have much smaller bounds for many linear groups. For example, for the free group $F_2$, $\rho_{F_2}(n)$ is at most $n^{2/3}$ by a result of Kassabov and Matucci [26]. There are some finitely presented groups for which the depth function is unknown and very interesting. For example, the ascending HNN extensions of free groups are known to be residually finite and even virtually residually nilpotent (proved by A. Borisov and the third author [7,8]) but the only upper bound one can deduce from the proof is exponential. Although many of these groups have small cancellation presentations and so are covered by the results from [1], there are some groups of this kind for which the depth function is not known. One of these groups is $\langle x, y, t \mid txt^{-1} = xy,\ tyt^{-1} = yx \rangle$. The fact that it is hyperbolic follows from the Bestvina-Feighn combination theorem [6] and was proved by Minasyan (unpublished). If the depth function of that group is not polynomial, that group would not be linear, disproving a conjecture by Wise (he conjectured that all hyperbolic ascending HNN extensions of free groups are linear and, moreover, subgroups of Right Angled Artin groups).
For finitely generated infinitely presented groups (even amenable ones) the situation is much clearer now. Using the method of Kassabov and Nikolov [27] and the result of Nikolov and Segal [46] one can construct a finitely generated residually finite group with arbitrarily large recursive depth function.
In this paper, we show that a similar result holds for finitely presented solvable of class 3 groups.
Theorem 4.20. For every recursive function $f$, there is a residually finite finitely presented solvable of class 3 group $G$ with depth function greater than $f$. In addition, one can assume that the word problem in $G$ is at least as hard as the membership problem in a given recursive set of natural numbers $Z$ or as easy as polynomial time.
As a corollary of Theorem 4.20 we mention the following exotic examples of groups.
Corollary. For every recursive function f , there is a residually finite finitely presented solvable of class 3 group G with depth function greater than f and the word problem decidable in polynomial time.
1.5. Distortion of pro-finitely closed subgroups of finitely presented groups. Let $G$ be a group generated by a finite set $X$, and let $H \leq G$ be a subgroup generated by a finite set $Y$. Recall that the distortion function $f_{G,H}(n)$ is defined as the minimal number $f$ such that every element of $H$ represented as a word $w$ of length ≤ $n$ in the alphabet $X$ can be represented as a word of length ≤ $f$ in the alphabet $Y$ [16]. It is clear [16] that the distortion function $f_{G,H}$ is recursive if and only if the membership problem in $H$ is decidable.
Recall that $H$ is closed in the pro-finite topology of $G$ if $H$ is the intersection of some subgroups of $G$ of finite index. If $G$ is finitely presented and $H$ is closed in the pro-finite topology of $G$, then there exists a McKinsey-type algorithm $A(G, H)$ solving the membership problem for $H$ (and thus $f_{G,H}$ is recursive). For every word $w$ in the alphabet $X$, the "yes" part $A_{\mathrm{yes}}(G, H)$ of the algorithm lists all words in $Y$, rewrites them as words in $X$, and then applies relations of $G$ to check whether one of these words is equal to $w$. The "no" part $A_{\mathrm{no}}(G, H)$ of the algorithm lists all homomorphisms $\varphi$ of $G$ into finite groups and checks whether $\varphi(w) \notin \varphi(H)$. As in Section 1.2, one can ask what is the complexity of the "yes" and "no" parts of that algorithm, in particular, and of the membership problem for $H$ in general.
One can also quantify the complexity of the two parts $A_{\mathrm{yes}}(G, H)$ and $A_{\mathrm{no}}(G, H)$. The "yes" part is quantified by the distortion function $f_{G,H}(n)$ and the "no" part is quantified by the relative depth function $\rho_{G,H}(n)$, which is defined as the minimal number $r$ such that for every word $w$ of length ≤ $n$ in $X$ which does not represent an element of $H$ there exists a homomorphism $\varphi$ from $G$ to a finite group of order ≤ $r$ such that $\varphi(w) \notin \varphi(H)$.
As for the word problem in residually finite finitely presented groups (discussed above), there were no examples of finitely generated subgroups of finitely presented groups that are closed in the pro-finite topology but have "arbitrarily bad" distortion or "arbitrarily bad" relative depth function.
Mikhailova's construction shows that finitely generated subgroups of the residually finite group $F_2 \times F_2$ (here $F_2$ is a free group of rank 2) can be as distorted as one pleases. In fact the set of possible distortion functions of subgroups of $F_2 \times F_2$ coincides, up to a natural equivalence, with the set of Dehn functions of finitely presented groups [49]. By a result of Baumslag and Roseblade [5] subgroups of $F_2 \times F_2$ are equalizers of pairs of homomorphisms $\varphi : F_k \to G$, $\psi : F_n \to G$ (where $F_k$, $F_n$ are subgroups of $F_2$), i.e. the subgroups of the form $\{(x, y) \in F_k \times F_n \mid \varphi(x) = \psi(y)\}$. The equalizer subgroup is finitely generated if and only if $G$ is finitely presented. It is easy to prove (see Lemma 5.2 below) that if $G$ is residually finite, then the equalizer is closed in the pro-finite topology of $F_2 \times F_2$. Thus we can use the examples of residually finite finitely presented groups with complicated word problem and complicated depth function to prove the following theorem.
Theorem 5.5. For every recursive function $f(n)$ there exists a finitely generated subgroup $H \leq F_2 \times F_2$ that is closed in the pro-finite topology of $F_2 \times F_2$ and whose distortion function $f_{F_2 \times F_2, H}$, relative depth function, and time complexities of both "yes" and "no" parts of the membership problem are at least $f(n)$.
There is an analogous (though a bit weaker) result for subgroups of a direct product $S_3(X) \times S_3(X)$, where $S_3(X)$ is a free solvable group of class 3 with free generating set $X$.
Theorem 5.6. For any recursive function $f(n)$ there is a finite set $X$ and a finitely generated subgroup $H \leq S_3(X) \times S_3(X)$ such that $H$ is closed in the pro-finite topology on $S_3(X) \times S_3(X)$ and the distortion function, the relative depth function, and the time complexities of both "yes" and "no" parts of the membership problem in $H$ are at least $f(n)$.
1.6. Methods of proof. As we have shown above (see Section 1.1.3) most versions of the Boone-Novikov construction ( [54,38,12,58]) do not produce residually finite groups. Instead, we simulate Minsky machines in solvable groups of class 3. A similar construction is due to the first author [28]. We use a version from the unpublished thesis [29]. As in [28,29], our group is a split extension of an elementary Abelian group of prime exponent by a metabelian group. Since every metabelian group has easy word problem and is residually finite, we can concentrate only on the elementary Abelian subgroup which is spanned, basically, by the configurations of the Minsky machine encoded in a certain way. That encoding is a very important feature of the construction. It helps us avoid the problem from Section 1.1.3 because the words w(u) corresponding to the input configurations u of the Minsky machines are not obtained by inserting copies of u in w(∅).
Of course that construction also often leads to non-residually finite groups. But it turned out that the difficulty can be overcome by modifying the Minsky machine first. In this paper, we use the fact that every Turing machine and every Minsky machine with decidable halting problem is equivalent to a universally halting and even sym-universally halting machine (these machines can be characterized as the machines whose transition graphs are vertex disjoint unions of finite trees).

1.7. Structure of the paper. The paper is organized as follows. Section 2 contains preliminary results about Turing and Minsky machines that are needed further. We show that one can modify any Turing or Minsky machine that recognizes a recursive set into a machine that halts on every configuration. In fact we can even assume that the symmetrized machine always halts (we call such machines sym-universally halting).
In Section 3, we simulate sym-universally halting Minsky machines in residually finite finitely presented semigroups and prove the analogs of the above theorems for semigroups.
In Section 4 we simulate Minsky machines in solvable groups and construct complicated residually finite finitely presented groups.
Sections 5 and 6 contain applications of the main theorems. In Section 5 we prove, in particular, Theorem 5.5. In Section 6, we strengthen the well-known result of Slobodskoi about undecidability of the universal theory of finite groups. We show, in particular, that the universal theory of any set of finite groups that contains all finite solvable groups of class 3 is undecidable.
Acknowledgement. The authors are grateful to Jean-Camille Birget and Friedrich Otto for pointing to the reference [13], to Ben Steinberg for pointing to the reference [35], to Rostislav Grigorchuk for pointing to the references [15,21] and to Tim Riley for pointing to the references [19,20]. We are also grateful to Markus Lohrey and Ralph Strebel for their comments.

Turing machines and Minsky machines
2.1. Turing machines. Let us give a definition of a Turing machine. A Turing machine $M$ with $K$ tapes consists of hardware (the tape alphabet $A = \sqcup_{i=1}^{K} A_i$ and the state alphabet $Q = \sqcup_{i=1}^{K} Q_i$) and a program $P$ (the list of commands, defined below). A configuration of a Turing machine $M$ is a word Note that with every command $\theta$ one can consider the inverse command $\theta^{-1}$ which undoes what $\theta$ does.
A computation of $M$ is a sequence of configurations and commands from $P$: Here $l$ is called the length of the computation. We choose stop states $q_i^0$ in each $Q_i$; then we call a configuration $w$ accepted if there exists a computation starting with $w$ and ending with a configuration where all state symbols are $q_i^0$ and all tapes are empty. Also we choose start states $q_i^1$ in each $Q_i$. Then an input configuration corresponding to a word $u$ over $A_1$ is a configuration inp($u$) of the form We say that a word $u$ over $A_1$ is accepted by $M$ if the configuration inp($u$) is accepted. The set of all words accepted by $M$ is called the language accepted by $M$.
The time function $T_M(n)$ of $M$ is the minimal function such that every accepted word of length ≤ $n$ has an accepting computation of length ≤ $T_M(n)$. The space function $S_M(n)$ of $M$ is the minimal function such that every accepted word of length ≤ $n$ has an accepting computation where every configuration has length ≤ $S_M(n)$.
A Turing machine M is called deterministic if for every configuration, there exists at most one command from the program P that applies to this configuration.
In this paper, we shall consider several types of machines. A machine $M$ in general has an alphabet and a set of words in that alphabet called configurations. It also has a finite set of commands. Each command is a partial injective transformation of the set of configurations. A computation is a sequence where $w_j$ are configurations, $\theta_1, \ldots, \theta_n$ are commands and $\theta_i(w_i) = w_{i+1}$ for every $i = 1, \ldots, n$.
Proof. Indeed, since $M$ is deterministic, in any computation of Sym($M$) where no command is followed by its inverse, inverses of commands of $M$ cannot be followed by commands of $M$. Thus the computation is a concatenation of two (possibly empty) parts: the first part uses only commands of $M$, the second part uses only inverses of commands of $M$.
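The argument in the proof above can be checked mechanically on a toy example. The Python sketch below uses a hypothetical deterministic machine whose configurations are natural numbers, with two commands ("halve" on even numbers at least 2, "dec" on odd numbers at least 3), and takes Sym($M$) to be the machine whose commands are those of $M$ together with their inverses. All names and the toy machine itself are assumptions made for illustration; the sketch enumerates reduced computations up to a length bound and checks that each one consists of commands of $M$ followed by inverses of commands of $M$.

```python
# Commands of a toy deterministic machine M: partial injective maps on
# natural numbers, returning None where the command is not applicable.
def halve(n):
    return n // 2 if n >= 2 and n % 2 == 0 else None

def halve_inv(m):
    return 2 * m if m >= 1 else None

def dec(n):
    return n - 1 if n >= 3 and n % 2 == 1 else None

def dec_inv(m):
    return m + 1 if m >= 2 and m % 2 == 0 else None

# Sym(M): the commands of M together with their inverses.
SYM = {"halve": halve, "halve^-1": halve_inv, "dec": dec, "dec^-1": dec_inv}

def is_inverse_name(name):
    return name.endswith("^-1")

def inverse_of(name):
    return name[:-3] if is_inverse_name(name) else name + "^-1"

def reduced_computations(start, max_len):
    """Yield the command sequences of all reduced computations of Sym(M)
    of length at most max_len that start at the configuration `start`."""
    stack = [(start, [])]
    while stack:
        config, history = stack.pop()
        yield history
        if len(history) == max_len:
            continue
        for name, command in SYM.items():
            if history and name == inverse_of(history[-1]):
                continue  # would cancel the previous command: not reduced
            nxt = command(config)
            if nxt is not None:
                stack.append((nxt, history + [name]))

def forward_then_backward(history):
    """True if all commands of M precede all inverse commands."""
    flags = [is_inverse_name(name) for name in history]
    return flags == sorted(flags)
```

Starting from the configuration 6, every reduced computation found this way has the required shape; for example "halve, dec, halve^-1" moves 6 to 3 to 2 and then backward to 4. Note that this toy machine is not sym-universally halting, since the backward part can be continued indefinitely.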
We say that a set $X$ of natural numbers is enumerated by a machine $M$ if there exists a recursive encoding $\mu$ of natural numbers by input configurations of $M$ such that a number $u$ belongs to $X$ if and only if $\mu(u)$ is accepted by $M$. The set $X$ is recognized by $M$ if $M$ enumerates $X$ and for every input configuration every computation starting with that configuration eventually halts (arrives at a configuration to which no command of $M$ is applicable).
We call a deterministic machine $M$ sym-universally halting if Sym($M$) halts whenever it starts with a non-accepted configuration.
Theorem 2.2 (See, for example, [13]). For every recursive set $X$ of natural numbers that is accepted by a deterministic Turing machine $M$, there exists a universally halting deterministic Turing machine $M'$ with one tape accepting $X$ and polynomially equivalent to $M$.
Proof. The proof is by inspection of the proof from [17].
(c) If $M$ is any Turing machine recognizing $X$ then we can assume that $M''$ polynomially reduces to $M$.
Proof. Let $M$ be a deterministic universally halting Turing machine with $K$ tapes recognizing $L$. Consider a new Turing machine $M'$ constructed as follows. $M'$ has one more tape than $M$, called the history tape. Its alphabet $A'$ is in one-to-one correspondence with the set of commands $P$ of $M$. Let $P'$ be the program of $M'$. Now modify $M'$ further to obtain a new Turing machine $M''$. The program $P''$ of $M''$ contains a copy $\tilde{P}$ (the set of the main commands) of $P'$ and some new, auxiliary, commands. After each main command of $\tilde{P}$, $M''$ executes the history written on the history tape backward, without erasing the history tape: it just scans the tape from left to right, reading the symbols written there one by one and executing on the first $K$ tapes the inverses of the commands written on the history tape. The commands that do that will be called auxiliary. If, at the end of scanning the history tape, it reaches an input configuration, $M''$ executes on the first $K$ tapes the history written on the history tape in the natural order (scanning the history tape from right to left). After that $M''$ is ready to execute the next main command. We do not give a precise definition of the program of $M''$ because it is obvious on the one hand and long on the other hand. Clearly, the state alphabet of $M''$ must be bigger than the state alphabet of $M'$. The machine $M''$ is deterministic, universally halting, and recognizes the same language $L$.
Let us prove properties (a) and (b) of the theorem. Since $M''$ is deterministic, every reduced (i.e. without mutually inverse consecutive commands) computation $\Theta$ of Sym($M''$) is of the form $\Theta_1 \Theta_2^{-1}$ for some computations $\Theta_1, \Theta_2$ of $M''$ (because in a reduced computation a command of $(M'')^{-1}$ cannot be followed by a command of $M''$).
Let us show that Sym($M''$) halts when it starts with any non-accepted configuration (and then apply Lemma 2.3). Let $w$ be a configuration of $M''$ that is not accepted by $M''$. Since $M''$ is deterministic, every computation of Sym($M''$) starting at $w$ is a concatenation of a computation of $M''$ followed by a computation of $(M'')^{-1}$ (i.e. the machine $M''$ where every command is replaced by its inverse). Since $M''$ is universally halting, there are only finitely many computations of $M''$ starting with $w$. Thus we only need to show that there are finitely many computations of $(M'')^{-1}$ starting with $w$, or, equivalently, that there are only finitely many computations of $M''$ ending with $w$. Suppose that there are infinitely many computations of $M''$ ending with $w$. Then, by the definition of $M''$, there must exist infinitely many input configurations inp($u$) of $M''$ for which there exists a computation of $M''$ starting with inp($u$) and ending at $w$. But that is impossible because such inp($u$) is unique and is obtained by applying the inverse of the history written on the history tape of $M''$ to $w$.
(c) The fact that $M''$ polynomially reduces to $M$ is proved as follows. Consider two configurations $w, w'$ of $M''$. If $w$ is not equivalent to an input configuration, then by (b) we need to check only whether $w'$ is one of the $O(|w|)$ words that belong to the longest computation of Sym($M''$) containing $w$. That can be done in polynomial time without using the oracle checking equivalence of configurations of $M$. Suppose that both $w$ and $w'$ are equivalent to input configurations $u, u'$ of $M''$. Then we can find $u, u'$ in polynomial time and their lengths are at most $O(|w| + |w'|)$. If $u \neq u'$ and either $u$ or $u'$ is not accepted, then by (b) $w$ is not equivalent to $w'$. If $u = u'$, then $w$ is equivalent to $w'$. Thus, for $u \neq u'$, $w$ is equivalent to $w'$ if and only if $u$ and $u'$ are accepted. To check that $u$ is accepted, we need to remove letters corresponding to the extra tape from $u$, producing a configuration $u_1$ of $M$, and check whether $u_1$ is accepted, i.e. whether $u_1$ is equivalent to the stop word of $M$. This can be done by asking the oracle once. Thus to check whether $w$ and $w'$ are equivalent we only need polynomial time and asking the oracle about equivalence of two pairs of configurations, the lengths of which are bounded by $|w| + |w'|$. Thus $M''$ polynomially reduces to $M$.
2.3. Minsky machines. The hardware of a K-glass Minsky machine, K ≥ 2, consists of K glasses containing coins. We assume that these glasses are of infinite height. The machine can add a coin to a glass, and remove a coin from a glass (provided the glass is not empty). The commands of a Minsky machine are numbered. So a configuration of a K-glass Minsky machine is a (K + 1)-tuple (i; ε_1, ..., ε_K) where i is the number of the command that is to be executed and ε_j is the number of coins in glass #j.
More precisely, a command has one of the following forms:
• Put a coin in each of the glasses ##n_1, ..., n_l and go to command #j. We shall encode this command as i; → Add(n_1, ..., n_l); j where i is the number of the command;
• If the glasses ##n_1, ..., n_l are not empty then take a coin from each of these glasses and go to instruction #j. This command is encoded as i; ε_{n_1} > 0, ..., ε_{n_l} > 0 → Sub(n_1, ..., n_l); j;
• If glasses ##n_1, ..., n_l are empty, then go to instruction #j. This command is encoded as i; ε_{n_1} = 0, ..., ε_{n_l} = 0 → j;
• Stop. This command is encoded as i; → 0.
Remark 2.5. This defines deterministic Minsky machines. We will also need non-deterministic Minsky machines. Those will have two or more commands with the same number.
Proof. The proof of the 2-glass part can be found in [36]. To ensure Property (c), we can do the following. Note that after every series of commands M(θ) the configuration of MM_2 or MM_3 has (at least) one empty glass. After the series of commands M(θ) of MM_2 or MM_3 corresponding to a command θ of M is executed, that glass is again empty. So before MM_2 or MM_3 executes the next series M(θ′) we force it to move all coins from each of the non-empty glasses to the empty one and back. In the process, it will empty each glass at least once. Clearly, this modification increases the length of the computation by an amount proportional to the length of the configuration of MM_2 or MM_3.
Finally property (e) is obtained as follows. Suppose that w, w ′ are two configurations of MM 3 (for MM 2 the proof is similar). By construction (see [36]) in at most O(|w|) steps of MM 3 either w turns into a configuration corresponding to a configuration of the Turing machine M or MM 3 halts. In the latter case, we check whether w is equivalent to w ′ in O(|w|) steps. So we can assume that both w and w ′ are equivalent to configurations corresponding to configurations u, u ′ of the Turing machine M whose lengths are O(|w| + |w ′ |). Now w is equivalent to w ′ if and only if u and u ′ are equivalent configurations of M . Thus we need to use the oracle once.
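To make the machine model of Section 2.3 concrete, here is a minimal Python sketch of a Minsky machine interpreter. The tuple layout of commands, the names `step`, `run` and `MOVE`, and the convention that several commands may share a number and the first applicable one is executed are all illustrative assumptions, not part of the paper.

```python
from typing import Dict, List, Optional, Tuple

# A command is (kind, glasses, target). The kinds mirror the four command
# forms listed above; this encoding is an assumption made for illustration.
Command = Tuple[str, Tuple[int, ...], int]
Config = Tuple[int, Tuple[int, ...]]  # (command number i; coins eps_1..eps_K)


def step(program: Dict[int, List[Command]], config: Config) -> Optional[Config]:
    """Apply the first applicable command numbered config[0], or return None."""
    i, coins = config
    for kind, glasses, target in program.get(i, []):
        if kind == "stop":
            # "i; -> 0": pass to command number 0, which has no commands.
            return (0, coins)
        if kind == "add":
            new = list(coins)
            for g in glasses:
                new[g - 1] += 1
            return (target, tuple(new))
        if kind == "sub" and all(coins[g - 1] > 0 for g in glasses):
            new = list(coins)
            for g in glasses:
                new[g - 1] -= 1
            return (target, tuple(new))
        if kind == "zero" and all(coins[g - 1] == 0 for g in glasses):
            return (target, coins)
    return None  # no command applies: the machine has halted


def run(program: Dict[int, List[Command]], config: Config,
        max_steps: int = 10_000) -> Tuple[Config, int]:
    """Run until no command applies; return the last configuration and step count."""
    steps = 0
    while steps < max_steps:
        nxt = step(program, config)
        if nxt is None:
            return config, steps
        config, steps = nxt, steps + 1
    raise RuntimeError("step budget exhausted")


# Hypothetical 2-glass example: move all coins from glass 1 to glass 2, then stop.
MOVE: Dict[int, List[Command]] = {
    1: [("sub", (1,), 2), ("zero", (1,), 3)],
    2: [("add", (2,), 1)],
    3: [("stop", (), 0)],
}
```

Starting `run(MOVE, (1, (3, 0)))` moves the three coins one at a time and ends in command number 0 with all coins in glass 2.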

Simulation of Minsky machines by semigroups
3.1. The construction. Here we will show how to simulate a Minsky machine by a semigroup. All applications of Minsky machines are based on the following idea.
First, with every configuration ψ one associates a word (term) w(ψ).
Then with every command κ of the Minsky machine M one associates a finite set of defining relations R κ . The algebraic structure A(M ) will be defined by the relations from the union R of all R κ (which is finite since we have only a finite number of commands) and usually some other relations Q which are in a sense "independent" of R. We need Q, for example, to make sure A(M ) satisfies a particular identity.
We say that the algebra A(M ) simulates M if the following holds for arbitrary configurations ψ 1 , ψ 2 of M : Usually, in order to prove the property (1) one has to prove the following two lemmas.
which we shall call commutativity relations, the stop relation (3) q 0 = 0 (i.e. q 0 x = xq 0 = q 0 for every generator x), all relations of the form xy = 0 where xy is a two-letter word which is not a subword of a word of the form q i a ǫ 1 1 ...a ǫ K K A 1 ...A K modulo the commutativity relations (2), (for example q i q j = A i a i = a i q j = A i q j = 0), which we shall call 0-relations, and relations associated with commands of M according to the following table,

Command of M | Relation of S(M)
i → Add(n_1, ..., n_m); j | q_i = q_j a_{n_1} ... a_{n_m}
i, ε_{n_1} > 0, ..., ε_{n_m} > 0 → Sub(n_1, ..., n_m); j | q_i a_{n_1} ... a_{n_m} = q_j
i, ε_{n_1} = 0, ..., ε_{n_m} = 0 → j | q_i A_{n_1} ... A_{n_m} = q_j A_{n_1} ... A_{n_m}
These will be called the Minsky relations. The words in S(M) corresponding to configurations of M are the following: w(i; ε_1, ..., ε_K) = q_i a_1^{ε_1} ... a_K^{ε_K} A_1 ... A_K. The proof that Lemmas 3.1 and 3.2 hold in S(M) follows easily from Lemma 2.1, see [55,32].
Lemma 3.3. Suppose that a word W is not 0 in S(M), is a subword of w(i; ε_1, ..., ε_K) (up to the commutativity relations (2)), and does not contain either q_i or one of the A_j. Then there are at most O(|W|) different (up to the commutativity relations) words that are equal to W in S(M). All these words are subwords of words of the form w(i′; ε′_1, ..., ε′_K) such that the configurations (i; ε_1, ..., ε_K) and (i′; ε′_1, ..., ε′_K) of M are equivalent.
Proof. Since W ≠ 0 in S(M), the stop relations do not apply to W or to any word that is equal to W in S(M). If W does not contain q_i, then the only relations that apply to W are the commutativity relations, so the only words that are equal to W in S(M) are the words obtained from W by the use of commutativity relations.
Suppose that W contains q i but does not contain one of the A j . Without loss of generality, we can assume that W contains every letter from w(i; ǫ 1 , . . . , ǫ K ) except some of the A j 's.
Every application of the Minsky relation to W corresponds to a command of the Minsky machine, applied to the configuration c = (i; ǫ 1 , ..., ǫ K ). Let c = c 1 → c 2 → ... be any computation of Sym(M ) starting with c. Then the sequence of commands of M applied in that computation has the form θ 1 . . . θ n θ −1 n+1 . . . θ −1 k where θ s are commands of M (by Lemma 2.1). If this sequence can be applied to W , then this computation never checks whether glass #j is empty. By Property (c) of Theorem 2.6, both n and k must be at most O(|W |). This implies the statement of the lemma.
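The correspondence of Section 3.1 between configurations and words, and between commands and Minsky relations, can be sketched in Python as follows. The function names `config_word` and `minsky_relation`, the string spelling of the letters, and the command encoding are illustrative assumptions. Only the three command forms that appear in the table are handled.

```python
from typing import List, Sequence, Tuple

Word = List[str]


def config_word(i: int, coins: Sequence[int]) -> Word:
    """Word w(i; eps_1..eps_K) = q_i a_1^{eps_1} ... a_K^{eps_K} A_1 ... A_K."""
    letters = [f"q{i}"]
    for n, eps in enumerate(coins, start=1):
        letters += [f"a{n}"] * eps
    letters += [f"A{n}" for n in range(1, len(coins) + 1)]
    return letters


def minsky_relation(i: int, kind: str, glasses: Sequence[int],
                    j: int) -> Tuple[Word, Word]:
    """Return (lhs, rhs) of the relation attached to one command of M."""
    a_part = [f"a{n}" for n in glasses]
    A_part = [f"A{n}" for n in glasses]
    if kind == "add":   # q_i = q_j a_{n_1} ... a_{n_m}
        return [f"q{i}"], [f"q{j}"] + a_part
    if kind == "sub":   # q_i a_{n_1} ... a_{n_m} = q_j
        return [f"q{i}"] + a_part, [f"q{j}"]
    if kind == "zero":  # q_i A_{n_1} ... A_{n_m} = q_j A_{n_1} ... A_{n_m}
        return [f"q{i}"] + A_part, [f"q{j}"] + A_part
    raise ValueError(f"no table row for command kind {kind!r}")
```

For example, `config_word(1, (2, 0))` spells out q_1 a_1 a_1 A_1 A_2, and `minsky_relation(1, "sub", (1,), 2)` gives the pair for q_1 a_1 = q_2. The sketch only writes the relations down; it does not decide equality of words in S(M).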

3.2. Residually finite finitely presented semigroups.
Proof. This follows from the commutativity relations and 0-relations.
Proof. By Theorem 2.4, there exists a sym-universally halting Turing machine that recognizes Z. By Theorem 2.6 there exists a sym-universally halting Minsky machine M recognizing Z. By Lemma 3.5, the problem of recognizing equality to 0 in S(M) is at least as hard as the membership problem in Z. By Lemma 3.6, S(M) is residually finite.

3.3. Residually finite semigroups with large depth function. Recall the definition of the depth function ρ: for every finitely generated residually finite universal algebra A and every number n, ρ_A(n) is defined as the smallest number such that for every two different elements z, z′ in A of length ≤ n there exists a homomorphism φ from A onto a finite algebra B of cardinality at most ρ_A(n) such that φ(z) ≠ φ(z′).
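As a toy illustration of the depth function (not one of the algebras studied in this paper), consider the infinite cyclic group Z with generators ±1. Its finite quotients are the groups Z/m, two integers are separated in Z/m exactly when m does not divide their difference, and elements of length ≤ n are the integers of absolute value ≤ n. Under these assumptions the depth function can be computed directly; the names `smallest_non_divisor` and `depth_Z` are hypothetical.

```python
def smallest_non_divisor(d: int) -> int:
    """Smallest m >= 2 that does not divide d (d >= 1)."""
    m = 2
    while d % m == 0:
        m += 1
    return m


def depth_Z(n: int) -> int:
    """Depth function of Z for n >= 1.

    Distinct integers x, y with |x|, |y| <= n have difference d in 1..2n,
    and the smallest quotient Z/m separating them has m = smallest_non_divisor(d).
    """
    return max(smallest_non_divisor(d) for d in range(1, 2 * n + 1))
```

The function grows very slowly for Z, which is the opposite extreme from the semigroups and groups constructed in this paper, where the depth function exceeds any prescribed recursive function.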
The following lemma is well known [22].
Lemma 3.8. Suppose that every non-zero element of a semigroup S with 0 has finitely many divisors. Then S is residually finite.
Proof. Indeed, the set of all non-divisors of a non-zero element is an ideal with finite quotient. The intersection of all these ideals is {0}.
Theorem 3.9. For every recursive function f there exists a finitely presented residually finite semigroup S such that ρ S (n) > f (n) for all n. In addition, we can assume that the word problem in S is as hard as the membership problem for any prescribed recursive set of natural numbers.
Proof. Let M be a sym-universally halting Minsky machine with K glasses and N + 1 commands numbered 0, . . . , N . Consider the following new, non-deterministic Minsky machine M n . Its hardware consists of the K glasses of M plus two more glasses. In every command of M we add the instruction to add a coin to glass K + 1 provided glass K + 2 is empty. Also for every i = 0, ..., N we add two new commands number i Thus there will be three commands for each i = 1, . . . , N : one from M and the two new ones. The new command (5) allows us to add, at any step of the computation, equal (but arbitrary) number of coins in glasses K + 1 and K + 2, and if both glasses K + 1 and K + 2 are empty, the computation can stop. But we can execute a command of M only when the glass K + 2 is empty, so a new command cannot be followed by a command of M .
Let us say that the commands coming from M have weight 1 and new commands (5), (6) have weight 0. The weight of a computation is then the sum of the weights of all commands used in the computation. We also define the weight of a configuration as the number of coins in the first K + 1 glasses minus the number of coins in glass K + 2. Every computation C of Sym(M n ) projects onto a computation π(C) of Sym(M ): we simply forget the extra two glasses and the new commands. The weight of C is equal to the length of π(C). The numbers of coins used in C and π(C) in the first K glasses are the same, the number of coins in glass K + 1 in the last configuration W of C minus the number of coins in glass K + 2 of W is equal to the weight of C.
Also any computation C of M lifts to (possibly infinitely many) computations C n of M n , the weight of each C n is the same as the length of C, and the number of coins used in the first K glasses is the same.
Note that Lemma 2.1 still holds for M n even though M n is non-deterministic. It can be easily established by using the projection π.
This implies that if M is sym-universally halting, then for every configuration W of M n the weights of all computations C without repeated configurations of Sym(M n ) are bounded, the number of coins in the first K glasses of M n used during any of these computations is bounded, and the weights of configurations appearing in these computations are bounded.
Consider the semigroup S(M n ). Every non-zero element w in S(M n ) is represented by a word of the form u(w)v(w) where where α j ∈ {0, 1}, l j ≥ 0. Note that if two non-zero words w, w ′ are equal in S(M n ), then u(w) and u(w ′ ) are equal in S(M ). We claim that S(M n ) is residually finite. Indeed, consider two words w 1 , w 2 in the generators of S(M n ) which are not equal in S(M n ), w 2 does not divide w 1 in S(M n ) (clearly w 1 and w 2 cannot divide each other without being equal in S(M n )).
Suppose first that w_1 does not contain a q-letter. Then consider the ideal Q of S(M_n) generated by all q-letters. The inequality w_1 ≠ w_2 survives in the Rees factor-semigroup S(M_n)/Q. But in S(M_n)/Q every element has finitely many divisors, hence S(M_n)/Q is residually finite by Lemma 3.8, and so we can separate w_1 and w_2 by a homomorphism onto a finite semigroup.
Thus we can assume that w_1 starts with a q-letter q_i. Suppose that u(w_1) ≠ u(w_2) in S(M). Adding the relations a_{K+1}^2 = a_{K+1}, a_{K+2}^2 = a_{K+2} to S(M_n) we then obtain a new semigroup S̃(M_n) and a homomorphism φ : S(M_n) → S̃(M_n) which separates w_1 and w_2. In the semigroup S̃(M_n), every non-zero element has finitely many divisors since this is true for S(M) and the number of different elements of the form v(w) is finite. Hence S̃(M_n) is residually finite by Lemma 3.8.
Thus we can assume that u( Let us add the relations a_{K+1}^D = a_{K+1}^{2D}, a_{K+2}^D = a_{K+2}^{2D} to S(M_n). Let Ŝ(M_n) be the resulting semigroup, and ψ : S(M_n) → Ŝ(M_n) be the corresponding homomorphism. Then it is easy to see that ψ(w_1) ≠ ψ(w_2). Since in Ŝ(M_n) every element has a finite number of divisors (the same argument as for S̃(M_n)), we can again use Lemma 3.8.
The function ρ(n) for the semigroup S(M_n) is at least as large as the following function Ψ(n) associated with the machine M: Ψ(n) is the smallest number such that for every non-accepted input configuration of M of length ≤ n, the machine M halts after at most Ψ(n) steps (i.e. the co-time function of M). Indeed let c be an input configuration of length at most n such that M halts after exactly Ψ(n) steps starting at c. Suppose that the word w(c) in S(M_n) corresponding to the configuration c can be separated from 0 in a homomorphic image E of S(M_n) with at most Ψ(n) − 1 elements. Then the images of a_{K+1}, a_{K+2} in that semigroup satisfy z^D = z^{2D} for some D < Ψ(n). Since the halting computation has > D steps, the letter a_{K+1} occurs in w(c) exactly once, and every command of M_n corresponding to a command of M adds one coin in glass K + 1, there exists a word W which is equal to w(c) in S(M_n) and which has the form Modulo relations corresponding to the commands (5), this word is equal to The image of the latter word in E is equal to A_{K+2} which, again modulo the relations corresponding to the commands (5), is equal to which is equal to 0 by the relations corresponding to the commands (6), a contradiction.
Note that the co-time function of a Turing machine recognizing a recursive set can be larger than any given recursive function. Indeed, after the machine halts without accepting, we can make it work as long as we like. It remains to note that the co-time function of a Minsky machine simulating that Turing machine cannot be smaller.

Simulation of Minsky machines in solvable groups
Recall that a variety of algebraic structures is a class of all algebraic structures of a given type (signature) satisfying a given set of identities (also called laws). Equivalently, by a theorem of Birkhoff [36] a variety is a class of algebraic structures closed under taking cartesian products, homomorphic images and substructures. Every variety contains free objects (called relatively free algebraic structures). One can define algebraic structures that are finitely presented in a variety as factor-structures by congruence relations generated by finite number of equalities. Every finitely presented algebraic structure which belongs to a variety V is finitely presented inside V but the converse is very rarely true. See [32] for a survey of algorithmic problems for varieties of different algebraic structures (mostly semigroups, groups, associative and Lie algebras). In this section we concentrate on varieties of groups. The most well known varieties are the variety of Abelian groups A given by the identity If U and V are two varieties of groups then the class of groups consisting of extensions of groups from U by groups from V is again a variety (the product of U and V) denoted by U V. The product of varieties is associative [45]. For example the variety of all solvable groups of class c is the product of c copies of the variety A. If V is a variety of groups, then ZV is the variety consisting of all central extensions of groups from V. For example N 2 = ZA and, more generally, N c+1 = ZN c for every c ≥ 1.
The problem of finding a finitely presented group with undecidable word problem, belonging to a proper variety of groups (i.e. satisfying a non-trivial identity) was formulated by Adian [34] and solved by the first author in [28]. The construction was simplified in the unpublished dissertation [29]. In this section, we shall modify the construction from [29] to construct residually finite finitely presented solvable groups with complicated word problem.
4.1. The construction. Let M be a Minsky machine with K glasses and N + 1 commands (numbered 0, ..., N ). We are going to construct a group G(M ) simulating M . The group G(M ) will be in a sense similar to the semigroup S(M ) constructed above. The main idea will be to replace the product by another operation and make sure that with respect to the new operation the semigroup S(M ) "embeds" into our group.
Thus the group will be generated by the q-letters which will be related to the letters q_i from S(M), and also a-letters a_1, ..., a_K, A-letters A_1, ..., A_K and some extra a- and A-letters that help us impose the necessary commutativity relations that, in particular, make the group solvable. The group we are going to construct will be a semidirect product of the Abelian normal subgroup generated by the q-letters by the semidirect product of an Abelian subgroup generated by A-letters and an Abelian subgroup generated by a-letters. Thus we should have a way to ensure that in a subgroup generated by two sets of letters Z ∪ Y, the normal subgroup generated by Z is Abelian. This is done with the help of the following lemma due to Baumslag [3] and Remeslennikov [52]. In that lemma we denote u^a = a^{-1}ua and u^{a+b} = u^a u^b (note that although u^{a+b} is not necessarily equal to u^{b+a}, the equality will hold if the normal subgroup generated by u is Abelian, which is going to be the case every time we apply this lemma).
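The exponent notation u^a = a^{-1}ua and u^{a+b} = u^a u^b can be tried out on small permutation groups. The following Python sketch is only an illustration under our own conventions: permutations are tuples acting on the right, and the names `mul`, `inv`, `conj` and `exp_sum` are hypothetical. It shows that u^{a+b} and u^{b+a} can differ in the symmetric group S_3, while they agree inside an Abelian group.

```python
from typing import Tuple

Perm = Tuple[int, ...]  # p[i] is the image of i


def mul(p: Perm, q: Perm) -> Perm:
    """Product pq: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))


def inv(p: Perm) -> Perm:
    """Inverse permutation."""
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)


def conj(u: Perm, a: Perm) -> Perm:
    """u^a = a^{-1} u a."""
    return mul(mul(inv(a), u), a)


def exp_sum(u: Perm, a: Perm, b: Perm) -> Perm:
    """u^{a+b} = u^a u^b."""
    return mul(conj(u, a), conj(u, b))


# Elements of S_3 used for illustration.
u: Perm = (1, 0, 2)  # swaps 0 and 1
a: Perm = (0, 2, 1)  # swaps 1 and 2
b: Perm = (1, 2, 0)  # 3-cycle 0 -> 1 -> 2 -> 0
```

Here the normal subgroup generated by u is all of S_3, which is not Abelian, and `exp_sum(u, a, b)` differs from `exp_sum(u, b, a)`. Replacing u, a, b by powers of the 3-cycle b makes the two expressions equal.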

Lemma 4.1 ([3, 52]). Suppose that a group H is generated by three sets X, F = {a_i | i = 1, ..., m}, F′ = {a′_i | i = 1, ..., m} such that (1) The subgroup generated by F ∪ F′ is Abelian; (2) For every a ∈ F and every x ∈ X we have x^{f(a)} = x^{a′} for some monic polynomial f of a which has at least two terms (in all our applications f(t) = t − 1). Then the normal subgroup generated by X in the group H = ⟨X ∪ F ∪ F′⟩ is Abelian, and H is metabelian.
If the elements a_i and a′_i and the set X satisfy the conditions of Lemma 4.1, we will say that a′_i, i = 1, ..., m, are BR-conjoints to a_i with respect to X (and the polynomial f). Consider the free commutative monoid generated by letters A_0, ..., A_K. Let U_0 be the set of all divisors of the element A_0 A_1 ... A_K in that monoid, and U be the set of all symbols q_j w, w ∈ U_0, j = 0, ..., N. Also fix a prime p (say, p = 2).
Command of M Relation of G(M ) i → Add(n 1 , . . . , n m ); j x q i A 0 = x q j A 0 * a n 1 * ... * a nm i, ǫ n 1 > 0, . . . , ǫ nm > 0 → Sub(n 1 , . . . , n m ); j x q i A 0 * a n 1 * . . . * a nm = x q j A 0 i, ǫ n 1 = 0, . . . , ǫ nm = 0 → j The equality The equality Proof. Indeed by relations G2, Using relations G1, G3, G4, we can apply Lemma 4.1 to each of the factors in that direct product and conclude that each of them is metabelian and a semidirect product of the Abelian of exponent p normal subgroup generated by the intersection of {A i , i = 0, . . . , K} with that factor, and the Abelian group generated by the a-letters from that factor.
Lemma 4.5. The normal subgroup T of G generated by all the elements x u , u ∈ U, is Abelian of exponent p.
Proof. Relations G5 a) of the group G imply that every element x u , u ∈ U is a product of elements x z q j , z ∈ H 1 , i = 0, . . . , N. Therefore, it is enough to show that (8) x q k x z qt = x z qt x q k for any z ∈ H 1 , H 2 and any k, t. To reduce the proof of these equalities to the proof of more simple equalities notice that z = z 0 z 1 . . . z K where z i ∈ M i by G2. Therefore equalities (8) are equivalent to (9) x z 0 We can represent element x z i q j , i ≥ 1, as a product of elements of the form x Indeed we have the following sequence of equalities deduced using G2, G5, G6: Repeating this argument K times, one proves that x z 1 z 2 ...z K q j can be represented as a product of elements of the form x y u where u ∈ U , y ∈ H 2 . A similar proof (using also G4) gives that x z 0 q j is a product of elements of that form. It remains to note that elements of the form x y u , u ∈ U, y ∈ H 2 commute by Remark 4.2 and Lemma 4.1.
Remark 4.6. Note that equalities (10) and similar equalities when x q j is replaced by x u , u ∈ U , imply the following: if y is a product of elements of the form a r l i (a ′ i ) s l A i and l r l = l s l = 0, then [x u , y] is 1 if u contains A i or a product of conjugates of elements x uA i by elements from ã i × ã ′ i otherwise. Similarly, suppose that y is a product of elements from M 0 , each factor containing A 0 , and the total exponent of everyã i (resp.ã ′ i ) is 0. Then [x u , y] = 1 provided u contains A 0 and is a product of conjugates of x uA 0 by elements from a i , a ′ i provided u does not contain A 0 . By construction, the group G is a semidirect product of T and the metabelian group H H 2 1 ⋊ H 2 . By Lemma 4.4, G is solvable of class 3 and, moreover, belongs to A 2 p A. This means that G belongs to the variety ZN K+1 A.
Proof. Let P be the derived subgroup of G(M ). By Lemma 4.5, every element of P is a product of an element of T and an element of H H 2 1 . It also follows from Lemma 4.5 that [P, P ] ⊆ T , hence by Remark 4.7, it is generated by elements of the form x y u , u ∈ U, y ∈ H 2 , the word u contains at least one A i , i = 0, . . . , K. Since T is Abelian, the subgroup [P, P, . . . , P ] K+2 is generated by the commutators An easy induction shows that every such commutator is a conjugate of (11) [ . Then Remark 4.6 implies that [x u , h y ] is a product of elements of the form x y ′ u ′ where u ′ ∈ U contains letters A i 1 , ..., A is and it may not be equal to 1 only if one of the letters A i j does not occur in u. Therefore the commutator (11) is either equal to 1 or is a product of elements of the form x y ′′ u ′ where the word u ′ ∈ U contains all letters A 0 , A 1 , ..., A K , y ′′ ∈ H 2 . But every such x u ′ is in the center of G(M ) by G5 c). Hence [P, . . . , P ] K+2 is contained in the center of G(M ).
We now prove (b) and (c). For this, as we mentioned before Lemma 3.1, we need to prove Lemmas 3.1 and 3.2. Lemma 3.1 for G(M) is proved in the same way as for the semigroup S(M) (see [55,32]), since the only property of S(M) used there was that the word w = q_i a_1^{l_1} ... a_K^{l_K} A_1^{α_1} ... A_K^{α_K} is equal to any word obtained from w by permuting a_i with a_j, A_i with A_j and a_i with A_j (i ≠ j). The same is true for words of the form (12) x in G(M) by the definition of the operation *, relations G1, G2 and Lemma 4.5.
In order to prove Lemma 3.2 we will define a new groupḠ that is a quotient of G and injective on elements of the form (12).
Let Š be the semigroup with the same generating set as S(M) subject to all the relations of S(M) except the relations (4) corresponding to the commands of M (that semigroup does not depend on M). Thus non-zero elements in Š have the form where l_j ∈ N, α_j ∈ {0, 1}. Let W be the set of all non-zero elements of Š containing a q-letter, and W_0 be the set of elements from W viewed as elements of S(M) (i.e. different words may represent equal elements) with A_0 inserted next to the q-letter. Consider the free Abelian group T_1 of exponent p generated by the elements z_{i_1,...,i_K,u}, u ∈ W ∪ W_0, i_j ∈ {1, 2, 3}. For each element of L_1 ∪ L_2, we define an automorphism of T_1. The group Ḡ will be the semidirect product of T_1 and the group generated by these automorphisms. For simplicity we will denote automorphisms corresponding to letters from L_1 ∪ L_2 by the same letters.
Let us start with automorphisms a j , a ′ j . We have to define z a i i 1 ,...,i K ,u and z a ′ i i 1 ,...,i K ,u for every i 1 , ..., i K . First suppose that u does not contain A j . To simplify the notation we shall denote the vector (i 1 , . . . , i K ) by i, and the standard unit vectors by e l , l = 1, ..., K. We shall write z i,u instead of z i 1 ,...,i K ,u . The j-th coordinate of i is denoted by i j .
If u contains letter A j , then let z It is easy to prove that a j is an automorphism by constructing the automorphism a −1 j . If we apply a −1 j to the third equality in (13), we will obtain the formula for z provided i j = 3: The automorphismã j is defined similarly. If u contains A 0 , then zã j If u does not contain A 0 but contains A j , i.e. u = vA j for some v, then Finally the automorphisms corresponding to A j , j = 0, ..., K, are defined as follows: The following lemma is obtained by a straightforward application of the definition of the automorphisms above and the definition of the operation * . This lemma implies thatḠ satisfies G8 if we replace x u by z 1,u (since the corresponding relations hold in S(M )).
We defineḠ as the semidirect product of T 1 and the subgroup of Aut(T 1 ) generated by the automorphisms corresponding to the elements from L 1 ∪ L 2 . From the definition of the automorphisms and Lemma 4.9, it follows thatḠ is generated by the elements z 1,u , u ∈ U , where 1 is the vector (1, 1, . . . , 1) and the automorphisms corresponding to elements of L 1 ∪ L 2 . It is easy to check that all the relations G1-G8 hold inḠ, therefore Lemma 4.10. The map that sends every a-or A-letter to itself and every x u to z 1,u extends to a homomorphism φ from G toḠ. Proof. It is easy to see that we only need to define pre-images x i,w of elements z i,u ∈Ḡ, w ∈ W ∪ W 0 . By the definition of φ, we have φ(x u ) = z 1,u for every u ∈ U so we define x 1,u = x u . The other preimages are defined by induction on the length of w and the sum of i j .
Suppose w ∈ W ∪ W 0 does not contain A j and i j = 1, i ′ is arbitrary. Then we define: We also have x i,w * A j = x i,wA j for any i.
It is easy to see that for every i and w ∈ W ∪ W 0 , we have φ(x i,w ) = z i,w . This proves the lemma.
InḠ(M ), consider the set P of elements (14) z where α i ∈ {0, 1} and the set P 0 of elements We shall need a few more properties of the group G(M ). is a product of one or several elements of the form x i ′ ,w ′ such that every letter a j occurs in w ′ at least as many times as in w (in particular if for some R > 0, w belongs to the ideal V R defined in Lemma 3.6, then w ′ ∈ V R . Proof. For y ∈ ∪M j , j ≥ 1, this follows from the way x i,u are constructed. For y ∈ M 0 , one needs to use G2, G5 c), and G6.
The proof of Lemma 4.12 actually gives the following.
Lemma 4.13. If v is a word in a- and A-letters (i.e. over L_1 ∪ L_2), then x_{i,u}^v is a product in G of elements x_{j,w} as in Lemma 4.12 where the length of each w does not exceed the length of v (hence the total number of different x_{j,w} occurring in this product is polynomial in terms of |v|).
Lemma 4.14. The normal subgroup T generated by the elements x_u, u ∈ U, in G = G(M) is the direct product of cyclic subgroups generated by the elements x_{i,w}, i ∈ {1, 2, 3}^{{1,...,K}}, w ∈ W ∪ W_0.
Proof. By Lemma 4.12 elements x i,w span T . We defined elements x i,w , w ∈ W in such a way that they are pre-images of the corresponding elements inḠ under φ. Thus the elements x i,w , i ∈ {1, 2, 3} {1,...,K} , w ∈ W ∪ W 0 are linearly independent since their images under φ are linearly independent in T 1 .

4.2. A finitely presented solvable group with undecidable word problem. By Theorem 2.6, there exists a 2-glass Minsky machine which computes a non-recursive partial function. The corresponding group G(M) has undecidable word problem and belongs to the variety A_p^2 A ∩ ZN_3 A by Theorem 4.3. Hence we obtain the following:
Theorem 4.16 (Kharlampovich [28]). There exists a finitely presented group with undecidable word problem that belongs to the variety A_p^2 A ∩ ZN_3 A.
Theorem 4.17. If a Minsky machine M is sym-universally halting then the group G(M ) is residually finite. Its word problem is at least as hard as the halting problem for M .
Proof. Let M be a sym-universally halting Minsky machine. Let w ≠ 1 be an element of G(M). We use the notation from the definition of G(M). There exists a natural homomorphism ζ from G(M) to the metabelian group H_1^{H_2} ⋊ H_2 which kills all elements from T. Since every finitely generated metabelian group is residually finite, we can assume that ζ(w) = 1. Hence w ∈ T. By Lemma 4.14, w is a product of elements of the form (16) x_{i,u}, i ∈ {1, 2, 3}^{{1,...,K}}, u ∈ W ∪ W_0.
Hence w = w_0 w_1 where w_0 (resp. w_1) is a product of elements (16) with u ∈ W_0 (resp. W). Suppose that w_1 is not 1. Let T′ be the subgroup of G(M) generated by elements (16)
Finally suppose that w_1 = 1. Let u_1, ..., u_l be the elements from W_0 that appear in the representation of w as a product of elements (16). Let E be the set of words that are equal to one of the u_j in S(M). Since M is sym-universally halting, E is finite. Let D be the maximal length of a word in E. Let, as above, Y_D be the ideal in Š consisting of 0 and all elements where one of the a-letters appears at least D times. Let Z_D be the set of non-zero elements of S(M) that are images of words from Y_D under the natural homomorphism Š → S(M). Then Z_D does not contain u_1, ..., u_l. Consider the subgroup F of T spanned by all elements (16) with u ∈ Z_D ∪ Y_D. From Lemma 4.12, it follows that F is a normal subgroup of G(M) of finite index in T. Since Z_D does not contain u_1, ..., u_l, the subgroup F does not contain w. The factor-group G(M)/F is a semidirect product of a finite group and the metabelian group H_1^{H_2} ⋊ H_2, and we can complete the proof as above.
Theorem 4.18. For every recursive function f, there is a residually finite finitely presented solvable of class 3 group G with Dehn function greater than f. In addition, one can assume that the word problem in G is at least as hard as the membership problem in a given recursive set of natural numbers Z or as easy as polynomial time.
Proof. The statement follows from Theorems 4.17 and 3.7.

4.4. Residually finite finitely presented group with large depth function.
Theorem 4.19. For every recursive function f there exists a finitely presented residually finite group G from A_p^2 A ∩ ZN_3 A such that ρ_G(n) > f(n) for all n. In addition, we can assume that the word problem in G is as hard as the membership problem for any prescribed recursive set of natural numbers.
Proof. Consider the Minsky machine M n constructed in the proof of Theorem 3.9. Then as in the proof of Theorem 4.18, one can prove that G(M n ) is residually finite. The fact that ρ G (n) > f (n) is proved the same way as in the proof of Theorem 3.9 (one only needs to replace the product by operation * everywhere in that proof).

Distortion of subgroups closed in the pro-finite topology
Let us generalize the Mikhailova construction [43]. Let G be a finitely generated group generated by a finite set X, N ≤ G a normal subgroup, generated as a normal subgroup by a finite set R = {r_1, ..., r_k}, and φ : G → G/N the canonical epimorphism. We may assume that both sets X and R are symmetric, i.e., X = X^{-1} and R = R^{-1}. The set E(G, N) = {(u, v) ∈ G × G | φ(u) = φ(v)} is a subgroup of G × G, called the equalizer of (φ, φ).
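A toy instance may help fix the definition of the equalizer as the set of pairs with equal images under φ. Take G = Z (written additively), X = {1, −1} and N = mZ, so that φ is reduction modulo m. In Mikhailova's construction the equalizer is generated by the diagonal pairs (x, x), x ∈ X, together with the pairs (r, 1), r ∈ R; in this example these are (1, 1) and (m, 0). The Python sketch below, with hypothetical names `in_equalizer` and `decompose`, tests membership and writes a pair in terms of those two generators.

```python
from typing import Tuple


def in_equalizer(u: int, v: int, m: int) -> bool:
    """(u, v) lies in E(Z, mZ) iff u and v have the same image in Z/m."""
    return (u - v) % m == 0


def decompose(u: int, v: int, m: int) -> Tuple[int, int]:
    """Return (d, k) with (u, v) = d*(1, 1) + k*(m, 0).

    d counts the diagonal generator (1, 1); k counts the relator pair (m, 0).
    """
    if not in_equalizer(u, v, m):
        raise ValueError("pair is not in the equalizer")
    return v, (u - v) // m
```

For instance, `decompose(7, 1, 3)` returns `(1, 2)`, that is, (7, 1) = 1·(1, 1) + 2·(3, 0). In this Abelian toy case the number k of relator pairs grows only linearly, so the example shows the bookkeeping but not the large distortion obtained later in this section.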
In the following lemma we summarize the main components of Mikhailova's argument (though in a much more general situation). The proof is easy and we leave it to the reader.
In particular, the distortion of E(G, N ) in G × G is at least as high as the Dehn function of G/N relative to N .
Let P be a class of finite groups closed under direct products and subgroups. Recall that the pro-P topology on a group G has as a base of neighborhoods of the identity the set of all normal subgroups N with G/N ∈ P.
Lemma 5.2. Let P be a class of finite groups closed under direct products and subgroups. In the notation above if the group G/N is residually P then the subgroup E(G, N ) is closed in the pro-P topology on G × G.
Proof. Suppose (u, v) ∈ G × G but (u, v) ∉ E(G, N), so φ(u) ≠ φ(v). Since G/N is residually P there is a homomorphism η : G/N → K onto a finite group K ∈ P such that ηφ(u) ≠ ηφ(v) in K. Therefore the image of the pair (u, v) under ηφ is not in the image of the subgroup E(G, N) in K × K. Hence the subgroup E(G, N) is closed in the pro-P topology on G × G.
The same argument gives the following Lemma 5.3. Under the assumptions of Lemma 5.2, the relative depth function ρ E(G,N ) is at least as large as the depth function of G/N , the time complexities of the "yes" and "no" parts of the membership problem for E(G, N ) are as high as the time complexities of the "yes" and "no" parts of the word problem in G/N .

Lemma 5.3 and Theorem 4.18 imply
Remark 5.4. The converse of Lemma 5.2 also holds, namely, if E(G, N ) is closed in the pro-P-topology, then G/N is residually P. We are not using this remark below so we leave it as an (easy) exercise.
Theorem 5.5. For any recursive function f (n) there is a finitely generated subgroup H ≤ F 2 × F 2 such that H is closed in the pro-finite topology on F 2 × F 2 and has distortion at least f (n).
Proof. Let G = X | R be a finitely presented residually finite group with Dehn function at least f (n) from Theorem 4.18. If N is the normal closure of R in F (X) then the subgroup H = E(F (X), N ) ≤ F (X) × F (X) satisfies all the requirements of the theorem. Now one can embed the free group F (X) into F 2 in such a way that the pro-finite topology induced on the image of F (X) from F 2 is precisely the pro-finite topology on F (X). Indeed, there is a finite index subgroup H of F 2 of rank |X|, the induced topology on H is the profinite topology on H. It follows that the pro-finite topology on the subgroup F |X| of F 2 is precisely the topology induced by the pro-finite topology from F 2 , as required.
Applying the same argument to the free solvable groups S 3 (X) of class 3 with generating set X, one gets the following result.
Theorem 5.6. For any recursive function f (n) there is a finite set X and a finitely generated subgroup H ≤ S 3 (X) × S 3 (X) such that H is closed in the pro-finite topology on S 3 (X) × S 3 (X) and its distortion function, its relative depth function, and the time complexities of both the "yes" and "no" parts of its membership problem are all at least f (n).

Universal theories of sets of finite solvable groups
In this section we will prove the following result. For the class of all finite groups it was proved by Slobodskoi [60] (the idea of Slobodskoi's proof came from Gurevich's paper [23], where the same result was proved for semigroups).
Theorem 6.1. The universal theories of the class of finite groups from A 2 p A ∩ ZN 5 A and the class of all periodic groups are recursively inseparable. In particular, the universal theory of any set of finite groups containing all finite solvable groups of class 3 is undecidable.
Proof. It is well known [23] that there exists a Turing machine for which the set of input configurations accepted by the machine and the set of input configurations starting with which the machine never stops are recursively inseparable. Let M be a 2-glass Minsky machine with the same property.
Consider the 4-glass Minsky machine M n described in the proof of Theorem 3.9. Let S ′ (M n ) be the semigroup given by the same defining relations as S(M n ), except that the relation q 0 = 0 is replaced by the relations q i A 3 A 4 = 0 for every i. This does not affect the proof of Theorem 4.3.
Let G ′ (M n ) be the group corresponding to S ′ (M n ) in the same way G(M n ) corresponds to S(M n ). Then G ′ (M n ) belongs to A 2 p A ∩ ZN 5 A and simulates M n as described in Theorem 4.3. Let R be the (finite) set of defining relations of G ′ (M n ). Let X be the set of numbers ε such that M n accepts the configuration (ε, 0, 1, 0). Let X ′ be the set of numbers ε such that M n works infinitely long starting with the configuration (ε, 0, 1, 0). Then X and X ′