Asymptotic invariants, complexity of groups and related problems

We survey results about computational complexity of the word problem in groups, Dehn functions of groups and related problems.


Introduction
The word and conjugacy problems are the most classical algorithmic problems for groups, going back to the work of Dehn and Tietze at the beginning of the 20th century. These very basic problems are interesting in themselves and because of their applications to other areas of mathematics, especially topology (where the analogs of these problems are the problems of existence of a pointed or free homotopy between loops). There are very many papers devoted to these problems for various classes of groups. Different aspects of these problems are discussed in many books and surveys (see, for example, [24,35,50,72,78,91,137,147,162,181,187,207,208]). Still, many new results and ideas have appeared since the last of these surveys was published. We can mention, for example, the construction of finitely presented groups with polynomial-non-recursive and even quadratic-non-recursive Dehn functions [182], finding a nilpotent finitely generated group with Dehn function not of the form $n^\alpha$ for any $\alpha$ [235], the proof that all groups $SL_n(\mathbb{Z})$, $n \ge 5$, the R. Thompson group $F$, and extensions of finitely generated free groups by cyclic groups have quadratic Dehn functions [239,111,36], the description of all FFFL (space) functions of finitely presented groups and the algebraic characterization of groups with PSPACE word problem [183], solving the isomorphism problem for arbitrary hyperbolic groups [65] and many others. Several new methods are used in the proofs of these results, and our goal in this survey is to give as gentle an introduction as possible to these results and methods. To this end, we often present not the results in their full generality but their easier to explain approximations. In this regard, this survey is similar to our survey [137] with Olga Kharlampovich.
Another feature of this survey is the emphasis on "related problems". For example, we consider hyperbolic groups (Section 3.1). This is the class of groups with the smallest possible (linear) Dehn functions. The word problem in a hyperbolic group can be solved in linear time (in fact by a real time Turing machine [123]). We mention that this class is very large: almost all finitely presented groups are hyperbolic. This leads us to the discussion of various probabilistic models used to clarify the words "almost all", to small cancelation conditions (including coarse small cancelation conditions over hyperbolic groups), to different ways of constructing hyperbolic groups, the Gromov-Olshanskii theory of quotients of hyperbolic groups and various combination theorems, to various monsters constructed as limits of hyperbolic groups and to Gromov's random groups. We also discuss various weakenings of the linear isoperimetric inequality, including Wenger's result from [234] and Cartan-Hadamard local-to-global type theorems of Gromov [98] and others. Similarly, when we consider groups with quadratic Dehn functions (Section 4), we describe the very diverse zoo of examples of these groups and present ideas of the proofs of results from [239], [111] and others. Then we notice that all known groups with quadratic Dehn functions (except for automatic groups) are known to have solvable conjugacy problem, although the solvability of the conjugacy problem was proved by very different methods in different classes of groups. So we formulate and discuss a general problem (due to Rips).
Most of the paper is devoted to the Dehn functions, which measure the time complexity of the word problem. One can argue that after the growth function (first introduced by A. S. Švarc in [226], and later by Milnor [164]), which counts the number of different elements of the group that can be represented by products of generators of length ≤ n, this is possibly the most important and basic geometric invariant of a group. We also discuss the space complexity and the recent results of Olshanskii [183]. We present an algebraic characterization of groups with word problem in NP from [22] and of groups with word problem in PSPACE from [183], and discuss examples of groups with NP-complete [210] and coNP-complete [21] word problems.
Note that we could not survey all aspects of complexity of the word and conjugacy problems in groups. For example we do not talk about the average case complexity, generic solutions of algorithmic problems, search problems and other complexity related issues used in group based cryptography [168]. Fortunately, there is a nice recent survey by Shpilrain [219] where at least some of these topics are discussed.
Acknowledgement. I am grateful to Efim Zelmanov who inspired me to write this survey. Several people contributed with suggestions, comments and even pieces of text. I am especially grateful to Martin Bridson, François Dahmani, Daniel Groves, Victor Guba, Sergei Ivanov, Bruce Kleiner, Igor Lysënok, Alexander Olshanskii, Denis Osin, Tim Riley, Stefan Wenger and Robert Young.

Algorithmic problems in groups
Let X be a set. A word over X is a sequence of elements of X. A group word over X is a sequence of elements of X and their inverses, i.e. symbols $x^{-1}$, x ∈ X. The length of a word W is denoted by |W|. A group word is called reduced if it does not contain subwords of the form $xx^{-1}$ and $x^{-1}x$, x ∈ X. Every group word can be made reduced by removing all subwords of these forms. The set F(X) of all reduced group words over X is equipped with a binary operation: the product UV of two reduced group words U, V is the result of reducing the concatenation of U and V. With this operation, the set F(X) turns into a group which is called the free group over X. The identity element of that group is the empty word, denoted by ∅. For every group G, any map X → G extends uniquely to a homomorphism F(X) → G: any word is mapped to the product of the images of its letters. In particular, every group G generated by at most |X| elements is a homomorphic image of F(X) so that the image of X generates G. Let φ be one of these (surjective) homomorphisms.
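The reduction procedure and the product in F(X) can be sketched in a few lines. The encoding below is an assumption made only for illustration: a generator is a lowercase letter and its inverse is the corresponding uppercase letter, so that a group word is an ordinary string; the function names are likewise hypothetical.

```python
def inverse(word: str) -> str:
    """Return the formal inverse: reverse the word and invert every letter."""
    return word[::-1].swapcase()


def reduce_word(word: str) -> str:
    """Freely reduce a group word by cancelling subwords x x^-1 and x^-1 x."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()          # the last kept letter cancels with this one
        else:
            stack.append(letter)
    return "".join(stack)


def multiply(u: str, v: str) -> str:
    """Product in F(X): reduce the concatenation of two reduced words."""
    return reduce_word(u + v)


print(repr(multiply("ab", "Ba")))             # 'aa'
print(repr(multiply("abA", inverse("abA"))))  # '' (the empty word)
```

The stack makes the reduction run in time linear in the length of the word, and the result does not depend on the order in which cancellations are performed.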
The word problem in G (related to φ) is the following.
Problem 2.1 (The word problem). Given a word w ∈ F(X), decide whether φ(w) = 1 in G.
The conjugacy problem in G (related to φ) is the following.
Problem 2.2 (The conjugacy problem). Given two words u, v ∈ F(X), decide whether φ(u) and φ(v) are conjugate in G.
In principle, the existence of an algorithm to solve Problem 2.1 or 2.2 depends not only on G but also on φ. There is, however, an important case when φ does not matter. Suppose that X is finite. Then G = φ(F(X)) is called finitely generated. Let N be the kernel of φ. It is a normal subgroup of F(X). Suppose that N is generated as a normal subgroup by a finite set R ("generated as a normal subgroup" means that every element of N is a product of conjugates of elements of R and their inverses). Then we say that G has a finite presentation $\langle X \mid R \rangle$. For finitely presented groups the solvability of the word or conjugacy problem (i.e. the existence of the needed algorithm) does not depend on the choice of φ or even on the choice of the finite set X (as long as φ is surjective).
Note that the conjugacy problem is stronger than the word problem, that is if the conjugacy problem is solvable in G, then the word problem is also solvable. Indeed, only the identity element can be a conjugate of the identity element.

Van Kampen diagrams.
2.1.A. The definition. Let G be a finitely presented group given by a finite set of generators X and a finite set of relators R, let φ be the homomorphism from F(X) onto G, and let N be the kernel of this homomorphism. Then the words in N are precisely the products of generators of G and their inverses that are equal to 1 in G. The word problem for G is then the membership problem in N for the elements of F(X). Note that if the product uv is in N then vu is also in N (as a conjugate of uv). The word vu is called a cyclic shift of uv. We shall always assume (without loss of generality) that R consists of cyclically reduced words, that is, group words all of whose cyclic shifts are reduced. Since N is generated by R as a normal subgroup, a reduced group word W from F(X) is in N if and only if W can be represented in the free group as a product of conjugates of elements of R and their inverses:
$$W = s_1 r_1 s_1^{-1} \cdots s_m r_m s_m^{-1}, \qquad r_i \in R^{\pm 1},\ s_i \in F(X). \tag{1}$$
The minimal number m over all representations (1) of W is called the area of W for the reasons explained below. For every representation (1), we can draw a planar diagram, a bouquet of "lollipops", which is a planar labeled graph. Each "lollipop" corresponds to one of the factors $s_i r_i s_i^{-1}$; it has a stem, a path labeled by $s_i$ (i.e. the stem is subdivided into edges labeled by the letters of $s_i$), and a candy, a cycle labeled by $r_i$ (see Figure 1).
Going counterclockwise around the "lollipop" starting and ending at the tip of the stem, we read $s_i r_i s_i^{-1}$. Thus going counterclockwise around the diagram which is the bouquet of "lollipops", we read the word which is the right hand side of (1).
In order to make the word W from this word, we need to reduce the boundary of the bouquet of "lollipops" (the boundary is traced counterclockwise): every time we see a pair of consecutive edges on the boundary of the diagram which have the same label and the same initial or terminal vertex (see Figure 2), we identify these two edges (if the edges have both vertices in common, we identify the two edges and remove the whole subgraph bounded by them on the plane). This amounts to removing subwords $xx^{-1}$ and $x^{-1}x$ from the right hand side of (1). The resulting picture is a van Kampen diagram for W over the presentation $\langle X \mid R \rangle$, that is, a planar graph with edges labeled by elements of X, the boundary of each cell (i.e. the closure of a bounded connected component of the plane minus the graph) labeled by words from $R^{\pm 1}$, and the boundary of the whole graph labeled by W (see [147] for a slightly different definition and [175] for another one). Figure 3 shows what a typical van Kampen diagram may look like. One can see that the cells may be of different sizes and shapes, a cell can touch itself, etc. It is also important that a diagram itself may not be an embedded disc: several pieces as in Figure 3 can be connected by paths to form a tree of discs (as the bouquet of lollipops in Figure 1). Nevertheless it is a planar graph, and there are various general methods to study van Kampen diagrams (the whole book [175] is essentially about the study of van Kampen diagrams used to construct groups with extreme properties such as infinite bounded torsion groups, Tarski monsters, etc.). Thus every word W that is equal to 1 in G is the boundary label of a van Kampen diagram over $\langle X \mid R \rangle$. This is one half of the so-called van Kampen lemma [147,175,35]. The converse statement (that the boundary label of every van Kampen diagram is equal to 1 in G) constitutes the second half. In order to prove it, we have to "undo" the construction of the van Kampen diagram above and, given a van Kampen diagram, produce an equality of the form (1).
The proof is by cutting a van Kampen diagram along the edges, to produce a tree of "lollipops". It is obvious that this can be done somehow. In [187], we presented a useful and economical way to cut a van Kampen diagram (so that every edge is used at most four times). Since cutting van Kampen diagrams into nice pieces is the main ingredient of many proofs involving van Kampen diagrams (see Section 4.6 below, for example), we reproduce the proof here.
Note that if a word W can be represented in the free group in the form $u_1 r_1 u_2 \ldots u_m r_m u_{m+1}$ where $u_1 u_2 \ldots u_{m+1} = 1$ in the free group, then W is equal (in the free group) to $(u_1 r_1 u_1^{-1})(u_1 u_2 r_2 u_2^{-1} u_1^{-1}) \cdots (u_1 \cdots u_m r_m u_m^{-1} \cdots u_1^{-1})$, and so W ∈ N. The converse statement is obvious.
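The passage from the form $u_1 r_1 u_2 \ldots u_m r_m u_{m+1}$ to a product of conjugates can be checked mechanically. The following is a minimal sketch under the assumption (ours, for illustration only) that generators are lowercase letters and their inverses are the corresponding uppercase letters; the function names are hypothetical.

```python
def inverse(word):
    return word[::-1].swapcase()


def reduce_word(word):
    """Free reduction with a stack."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()
        else:
            stack.append(letter)
    return "".join(stack)


def to_conjugates(us, rs):
    """From u_1..u_{m+1} and r_1..r_m build the pairs (u_1...u_i, r_i)."""
    assert len(us) == len(rs) + 1
    prefix, factors = "", []
    for u, r in zip(us, rs):
        prefix = reduce_word(prefix + u)   # u_1 u_2 ... u_i
        factors.append((prefix, r))
    return factors


def product_of_conjugates(factors):
    """Reduced form of the product of the words s r s^-1."""
    word = ""
    for s, r in factors:
        word = reduce_word(word + s + r + inverse(s))
    return word


us = ["ab", "B", "A"]              # u_1 u_2 u_3 = abBA = 1 in the free group
rs = ["abAB", "aBAb"]
interleaved = "".join(u + r for u, r in zip(us, rs)) + us[-1]
print(reduce_word(interleaved) == product_of_conjugates(to_conjugates(us, rs)))
```

The two sides agree only because $u_1 u_2 u_3$ is trivial in the free group; without that hypothesis the leftover factor $u_1 \cdots u_{m+1}$ remains on the right.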
Proposition 2.4 (The second half of the van Kampen lemma). Let $\Delta$ be a van Kampen diagram over a presentation $\langle X \mid R \rangle$ where $X = X^{-1}$ and R is closed under cyclic shifts and inverses. Let W be the boundary label of $\Delta$. Then W is equal in the free group to a word of the form $u_1 r_1 u_2 r_2 \ldots u_m r_m u_{m+1}$ where: (1) each $r_i$ is a cyclic shift of a word from $R^{\pm 1}$; (2) $u_1 u_2 \ldots u_{m+1} = 1$ in the free group; (3) $\sum_{i=1}^{m+1} |u_i| \le 4e$, where e is the number of edges of $\Delta$. In particular, W is in the normal subgroup N and is equal to 1 in G.
Proof. If $\Delta$ has an internal edge (i.e. an edge which belongs to the boundaries of two cells) then it has an internal edge f one of whose vertices belongs to the boundary. Let us cut $\Delta$ along f, leaving the second vertex of f untouched. We can repeat this operation until we get a diagram $\Delta_1$ which does not have internal edges. It is easy to see that the boundary label of $\Delta_1$ is equal to W in the free group. The number of edges of $\Delta_1$ which do not belong to contours of cells (let us call them edges of type 1) is the same as the number of such edges in $\Delta$, and the number of edges which belong to contours of cells in $\Delta_1$ (edges of type 2) is at most twice the number of such edges of $\Delta$ (we cut each edge from a contour of a cell at most once; after the cut we get two external edges instead of one internal edge).
Suppose that a cell $\Pi$ in $\Delta_1$ has more than one edge which has a common vertex with $\Pi$ but does not belong to the contour of $\Pi$. Take any point O on $\partial(\Pi)$ which belongs to one of the edges not on $\partial(\Pi)$. Let p be the boundary path of $\Delta_1$ starting at O and let q be the boundary path of $\Pi$ starting at O. Consider the path $qq^{-1}p$. The subpath $q^{-1}p$ bounds a subdiagram of $\Delta_1$ containing all cells but $\Pi$. Replace the path q in $qq^{-1}p$ by a loop $q'$ with the same label starting at O and lying inside the cell $\Pi$. Let the region inside $q'$ be a new cell $\Pi'$. Then the path $q'q^{-1}p$ bounds a diagram whose boundary label is equal to W in the free group F(X). Notice that $\Pi'$ has exactly one edge having a common vertex with $\Pi'$ and not belonging to the contour of $\Pi'$ (see Figure 4). Thus this operation reduces the number of cells $\Pi$ such that more than one edge of the diagram has a common vertex with $\Pi$ but does not belong to the contour of $\Pi$. After a number of such transformations we shall have a diagram $\Delta_2$ which has the form of a tree T with cells hanging like leaves (each has exactly one common vertex with the tree).
Figure 4. Cutting $\Pi'$ inside $\Pi$.
The number of edges of type 1 in $\Delta_2$ cannot be bigger than the number of all edges in $\Delta_1$, so it cannot be more than two times bigger than the total number of edges in $\Delta$.
The boundary label of $\Delta_2$ is equal to W in the free group, and it has the form $u_1 r_1 u_2 r_2 \ldots u_m r_m u_{m+1}$ where m is the number of cells in $\Delta$ and $u_1 u_2 \ldots u_{m+1}$ is the boundary label of a tree (traced counterclockwise), so $u_1 u_2 \ldots u_{m+1} = 1$ in the free group. The sum of the lengths of the $u_i$ is at most four times the number of edges in $\Delta$ because the word $u_1 u_2 \ldots u_{m+1}$ is written on the tree T, and when we travel along the tree, we pass through each edge twice.
2.1.B. The 0-cells. One can modify van Kampen diagrams to make all cells embedded discs (so that the boundary of a cell does not touch itself) and to make the whole diagram an embedded disc by introducing the so-called 0-cells [175], i.e. cells corresponding to the relations $1 \cdot 1 = 1$ and $1 \cdot a \cdot 1 \cdot a^{-1} = 1$ (see Figure 5). The edges labeled by 1 are called 0-edges. Indeed, adding these relations does not change the group. Now, for every cell in the van Kampen diagram (the cell is drawn with thick lines in Figure 6), we draw an embedded disc inside the cell, label its boundary by the same word which labels the boundary of the original cell, and connect every vertex of the boundary of the new cell with the corresponding vertex of the boundary of the original cell by a 0-edge. Thus we replace every cell of the original diagram by a new cell with the same boundary label and several 0-cells. The new van Kampen diagram will have all cells embedded discs. Similarly, if a diagram is not an embedded disc, say, it consists of two disc subdiagrams connected by a path which is a part of the boundary of the diagram, we can replace that path by a sequence of 0-cells. As a result, we can make the whole diagram an embedded disc. When we count the number of cells (edges) in a diagram we usually do not count 0-cells (0-edges). In particular, 0-edges do not affect the calculation of the length of a path in a van Kampen diagram. The 0-cells were introduced by Olshanskii [175] in order to make his proofs cleaner, but they proved to be a very useful technical tool. Let us mention just two applications. First, 0-cells were used in solving quadratic equations in free groups in [174] (see 2.3.B) and subsequent papers (there are no non-0-relations in the standard presentation of the free group). Second, a natural generalization of 0-cells was a crucial tool in constructing a finitely presented non-amenable group without free subgroups [188] (see 5.4).
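2.1.C. Van Kampen diagrams and tilings. An elementary school problem and its nonelementary solution. As an easy application of van Kampen diagrams, consider the following elementary problem.
Example 2.5. Let P be the 8 × 8 chess board with two opposite corner squares removed (see Figure 7). Then P cannot be tiled by dominos, i.e. by 1 × 2 rectangles.
Figure 7. The chess board with two squares removed and two dominos.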
Proof. The elementary solution of this problem is well known. Color squares of P in black and white in the usual (chessboard) way. Then the number of squares of one color (black or white) in P differs from the number of squares of the other color by 2. Since each domino covers exactly one white and exactly one black square, P cannot be tiled by dominos.
Here is a solution which, although less elementary, can be applied to many regions of the plane square grid for which the elementary proof above does not work; it also applies to regions of non-square (say, hexagonal) lattices on the plane. This solution first appeared in a paper by W. Thurston [227]. The ideas of that paper have many applications in several areas of mathematics from combinatorics to probability to mathematical physics.
Let us label horizontal edges pointed rightward in P by the letter a and vertical edges pointed upward by the letter b. Then the (counterclockwise) boundary of P has label $W = a^7 b^7 a^{-1} b a^{-7} b^{-7} a b^{-1}$. Every domino can be placed either vertically or horizontally. In the first case its boundary label is $a b^2 a^{-1} b^{-2}$, and in the second case its boundary label is $a^2 b a^{-2} b^{-1}$. Now consider the group G with the presentation $\langle a, b \mid a b^2 a^{-1} b^{-2} = 1,\ a^2 b a^{-2} b^{-1} = 1 \rangle$. Suppose that P can be tiled by the dominos. Then every such tiling turns P into a van Kampen diagram over the presentation of G. Hence, by the van Kampen lemma (more precisely by Proposition 2.4), the word W would be equal to 1 in G. But consider the 6-element symmetric group $S_3$ and two permutations α = (1, 2), β = (2, 3) in it. Clearly both relations of G hold if we replace a by α and b by β, since α and β have order 2. Hence the map a → α, b → β extends to a homomorphism $G \to S_3$. Note that $W(\alpha, \beta) = (\alpha\beta)^4 = \alpha\beta$ is the permutation (1, 3, 2), which is not trivial. Hence W is not equal to 1 in G, a contradiction.
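The computation in $S_3$ is easy to confirm by machine. The sketch below assumes, for illustration only, that a and b are written as lowercase letters and their inverses as uppercase letters, and that permutations act on the points 0, 1, 2 instead of 1, 2, 3. The last letter of the boundary word is taken to be $b^{-1}$ so that the boundary path closes up; since β is an involution, the image in $S_3$ is the same if b is used there instead.

```python
IDENTITY = (0, 1, 2)
ALPHA = (1, 0, 2)   # the transposition (1, 2), written on the points 0, 1, 2
BETA = (0, 2, 1)    # the transposition (2, 3)


def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[p[i]] for i in range(3))


def invert(p):
    inv = [0] * 3
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def image_in_s3(word):
    """Image of a word under a -> ALPHA, b -> BETA (uppercase = inverse)."""
    gens = {"a": ALPHA, "A": invert(ALPHA), "b": BETA, "B": invert(BETA)}
    result = IDENTITY
    for letter in word:
        result = compose(result, gens[letter])
    return result


def endpoint(word):
    """End point of the lattice path read along the word, starting at (0, 0)."""
    step = {"a": (1, 0), "A": (-1, 0), "b": (0, 1), "B": (0, -1)}
    x = y = 0
    for letter in word:
        dx, dy = step[letter]
        x, y = x + dx, y + dy
    return x, y


VERTICAL_DOMINO = "abbABB"     # a b^2 a^-1 b^-2
HORIZONTAL_DOMINO = "aabAAB"   # a^2 b a^-2 b^-1
W = "a" * 7 + "b" * 7 + "A" + "b" + "A" * 7 + "B" * 7 + "a" + "B"

relators_hold = all(image_in_s3(r) == IDENTITY
                    for r in (VERTICAL_DOMINO, HORIZONTAL_DOMINO))
w_image = image_in_s3(W)
print(relators_hold, endpoint(W), w_image != IDENTITY)
```

Which of the two 3-cycles appears as the image of W depends on the convention for composing permutations; the argument only needs the image to be non-trivial.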
This example shows a similarity between the word problem in groups and the tiling problem. Indeed, by the van Kampen lemma, a word W is equal to 1 in $G = \langle X \mid R \rangle$ if and only if we can tile a disc with boundary labeled by W by pieces whose boundaries are labeled by words from R. Nevertheless the word problem differs from tiling problems such as Example 2.5 because, as we have seen, when we draw a van Kampen diagram, we do not fix the shapes or sizes of the tiles, only their boundary labels. So one can view a general van Kampen diagram as a tiling of a disc by tiles made of soft rubber, while the traditional tiling problems such as Example 2.5 are about tiles made of hard plastic.
2.2. The computational complexity and algebraic systems.

2.2.A. Solvability of algorithmic problems: theory and practice. The solvability of an algorithmic problem does not mean that the problem can be solved in practice. First of all, the existence of an algorithm does not mean that it is readily available. For example, the word problem in every finite group is certainly decidable. But there is no algorithm that, given a finite set of relations $r_i = 1$, i = 1, . . . , n, and a relation r = 1 over a finite alphabet X, would decide whether every finite group generated by X where the relations $r_i = 1$ hold (i = 1, . . . , n) also satisfies r = 1. That is a result of Slobodskoj [221]. Thus there is no uniform algorithm solving the word problem in all finite groups at once. In general we usually ask for the existence of an algorithm solving a certain algorithmic problem but not for a procedure to find this algorithm. (In most concrete cases, if an algorithmic problem in a given group is decidable, we are able to write down an algorithm that solves the problem.) Another, more important, obstacle is that the algorithm can be too slow. For example, if a finitely presented group G is residually finite, then the word problem can be solved by the so-called McKinsey algorithm [156]: list all finite quotients of G (that is, consider all finite groups H one by one, consider every map φ from the set of generators of G into H, and check whether this map takes all $r_i$ to 1 and whether the images of the generators of G generate H; if yes, include the pair (H, φ) in the list of quotients), and at the same time list all the corollaries of the defining relations of G (say, list all products of conjugates of the $r_i$'s and their inverses). Every word r is either in the list of corollaries or is not trivial in one of the finite quotients. In the first case r = 1 in G, in the second case r ≠ 1 in G, and after some time we will surely know which of these options holds. It is clear that in general this algorithm is very slow.
Even for a short word r it would take a lot of time to decide, using this algorithm, if r = 1 in a given residually finite group.
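To give a feeling for the cost, here is a minimal sketch of one half of the procedure only: a bounded search for a finite quotient in which a given word survives. It is not the full McKinsey algorithm. As simplifying assumptions of ours, it looks only at maps into the symmetric groups $S_n$ with n up to a hypothetical bound `max_degree` (every finite group embeds in some $S_n$), it does not check that the images generate the target, and it omits the parallel enumeration of consequences of the relators. Generators are lowercase letters and inverses are uppercase letters.

```python
from itertools import permutations, product


def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[i] for i in p)


def invert(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def evaluate(word, images, n):
    """Image of a word when each generator is sent to a permutation."""
    result = tuple(range(n))
    for letter in word:
        g = images[letter.lower()]
        result = compose(result, invert(g) if letter.isupper() else g)
    return result


def separating_quotient(generators, relators, word, max_degree=4):
    """Find a map into some S_n killing all relators but not the word."""
    for n in range(1, max_degree + 1):
        identity = tuple(range(n))
        for choice in product(permutations(range(n)), repeat=len(generators)):
            images = dict(zip(generators, choice))
            if (all(evaluate(r, images, n) == identity for r in relators)
                    and evaluate(word, images, n) != identity):
                return n, images
    return None   # inconclusive: nothing found up to max_degree


# In <a, b | abAB> the generator a is non-trivial: it survives in S_2.
print(separating_quotient("ab", ["abAB"], "a"))
```

Already here the number of candidate maps is $(n!)^{|X|}$ for each n, which illustrates why the method is hopeless as a practical algorithm; a return value of `None` says nothing about whether the word is trivial.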

2.2.B. Computational complexity. Thus the next thing to do, after we find out that an algorithmic problem is decidable, is to find the computational complexity of this problem.
To make this more precise, let us present here some concepts from Computational Complexity Theory.
Any decision problem D may be considered as a membership problem for elements of some set $B_D$ in a subset $S_D$. For example, if D is the word problem for a group $G = \langle X \mid R \rangle$, then $B_D$ is the set of all reduced group words in the alphabet X, and $S_D$ is the set of products of conjugates of elements of R and their inverses in the free group over X (i.e. the normal subgroup of the free group over X generated by R).
With any element x in $B_D$ one associates a number which is called the size of this element. Usually the size is roughly the minimal space which is needed to write x down. For example, the size of a word is typically its length.
The size depends on the way we choose to represent the elements. For example, if x is a natural number then we can represent x as x units. Then the size of x will be equal to x. If we represent x as a sequence of binary digits then the size of x will be approximately $\log_2(x)$.
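The gap between the two representations is easy to see numerically. In the small sketch below the function names are hypothetical and chosen only for illustration.

```python
def unary_size(x: int) -> int:
    """Size of x written as x units."""
    return x


def binary_size(x: int) -> int:
    """Number of binary digits of x (for x >= 1)."""
    return x.bit_length()


for x in (8, 1000, 10**6):
    print(x, unary_size(x), binary_size(x))
```

An algorithm whose running time is proportional to x is therefore linear in the unary size of its input but exponential in the binary size.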
Algorithms may be realized by Turing machines (for a formal definition see below). We can assume that this machine is equipped with a voice synthesizer and can say two words, "Yes" and "No". An algorithm solving the decision problem D starts working with an element x of $B_D$ written on the tape of the machine. When it ends, it says "Yes" if $x \in S_D$ or "No" if $x \notin S_D$. There are also different kinds of algorithms that say "Yes" when $x \in S_D$ and work indefinitely long without saying anything if $x \notin S_D$. Note that if there is an algorithm that recognizes the "yes" part and there is an algorithm recognizing the "no" part of D, then there exists an algorithm solving D.
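The last remark rests on running the two recognizers in parallel, one step of each at a time. Here is a minimal sketch of this idea, under the assumption (ours) that a recognizer is modelled as a Python generator which yields False after each unsuccessful step and True once it accepts; the toy problem (perfect squares) and all names are hypothetical.

```python
from itertools import count


def decide(x, yes_recognizer, no_recognizer):
    """Alternate single steps of both recognizers until one of them accepts."""
    yes_run = yes_recognizer(x)
    no_run = no_recognizer(x)
    while True:
        if next(yes_run, False):
            return True
        if next(no_run, False):
            return False


def square_yes(x):
    """Accepts exactly the perfect squares; runs forever on other inputs."""
    for k in count():
        yield k * k == x


def square_no(x):
    """Accepts exactly the non-squares; runs forever on perfect squares."""
    for k in count():
        yield k * k < x < (k + 1) * (k + 1)


print(decide(16, square_yes, square_no), decide(15, square_yes, square_no))
```

Termination of `decide` relies on the hypothesis that for every input exactly one of the two recognizers eventually accepts; neither of them needs to halt on its own.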

2.2.C. Turing machines (a formal definition). Recall one of the many equivalent definitions of a Turing machine.
A multi-tape Turing machine has k tapes and k heads observing the tapes. One can view it as a quadruple $M = \langle I, Y, Q, \Theta \rangle$ where I is the input alphabet, Y is the tape alphabet, $Q = \sqcup Q_i$, $i = 1, \ldots, k$, is the set of states of the heads of the machine, and $\Theta$ is a set of commands. Some (recognizing) Turing machines also have two distinguished vectors of state letters: $s_1$ is the k-vector of start states, $s_0$ is the k-vector of accept states.
The leftmost square on every tape is always marked by α, the rightmost square is always marked by ω. The head is placed between two consecutive squares on the tape. A configuration of a tape of a Turing machine is a word αuqvω where q is the current state of the head of that tape, u is the word in Y to the left of the head and v is the word in Y to the right of the head, and so the word written on the entire tape is uv.
At every moment the head of each tape observes two letters on that tape: the last letter of u (or α) and the first letter of v (or ω).
A configuration U of a Turing machine is the word $U_1 U_2 \ldots U_k$ where $U_i$ is the configuration of tape i. Assuming that the Turing machine is recognizing, we can define input configurations and accepted (stop) configurations. An input configuration is a configuration where the word written on the first tape is in $I^+$, all other tapes are empty, the head on the first tape observes the right marker ω, and the states of all tapes form the start vector $s_1$. An accept (or stop) configuration is any configuration where the state vector is $s_0$, the accept vector of the machine. We shall always assume (as can be easily achieved) that in the accept configuration every tape is empty.
A command of a Turing machine is determined by the states of the heads and some of the 2k letters observed by the heads. As a result of a transition we replace some of these 2k letters by other letters, insert new squares in some of the tapes and may move the heads one square to the left (right) with respect to the corresponding tapes.
For example, in a one-tape machine every transition is of the following form: $uqv \to u'q'v'$, where $u, v, u', v'$ are letters (could be end markers) or empty words. The only constraint is that the result of applying the substitution $uqv \to u'q'v'$ to a configuration word must be a configuration word again; in particular, the end markers cannot be deleted or inserted. This command means that if the state of the head is q, u is written to the left of q and v is written to the right of q, then the machine must replace u by $u'$, q by $q'$ and v by $v'$.
For a general k-tape machine a command is a vector $[U_1 \to V_1, \ldots, U_k \to V_k]$ where $U_i \to V_i$ is a command of a 1-tape machine; the elementary commands (also called parts of the command) $U_i \to V_i$ are listed in the order of tape numbers. In order to execute this command, the machine first checks if every $U_i$ is a subword of the configuration of tape i (i = 1, . . . , k), and then replaces all $U_i$ by $V_i$. A computation is a sequence of configurations $C_1 \to \cdots \to C_n$ such that for every i = 1, . . . , n − 1 the machine passes from $C_i$ to $C_{i+1}$ by applying one of the commands from $\Theta$. A configuration C is said to be accepted by a machine M if there exists at least one computation which starts with C and ends with an accept configuration.
A word $u \in X^*$ is said to be accepted by the machine if the corresponding input configuration is accepted. The set of all accepted words over the alphabet X is called the language accepted (recognized) by the machine. Let $C = C_1 \to \cdots \to C_n$ be a computation of a machine M such that for every j = 1, . . . , n − 1 the configuration $C_{j+1}$ is obtained from $C_j$ by a command $\theta_j$ from $\Theta$. Then we call the word $\theta_1 \ldots \theta_{n-1}$ the history of this computation. The number n will be called the time (or length) of the computation. Let $p_i$ (i = 1, . . . , n) be the sum of the lengths of the configurations of the tapes in the configuration $C_i$. Then the maximum of all $p_i$ will be called the space of the computation.
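For a one-tape machine these notions can be made concrete with a small simulator. The sketch below makes several assumptions of ours for illustration: a configuration is a tuple of symbols, a command is a pair (left side, right side) of tuples, the hypothetical toy machine accepts the words $a^{2k}$, and the search for an accepting computation is breadth-first (so non-determinism is allowed and the shortest accepting computation is found).

```python
from collections import deque


def apply_command(config, command):
    """Apply the substitution lhs -> rhs to a configuration, if lhs occurs."""
    lhs, rhs = command
    k = len(lhs)
    for i in range(len(config) - k + 1):
        if config[i:i + k] == lhs:
            return config[:i] + rhs + config[i + k:]
    return None


def shortest_accepting_computation(start, commands, accept_state,
                                   max_configs=10_000):
    """Breadth-first search over configurations; returns C_1 -> ... -> C_n."""
    parents = {start: None}
    queue = deque([start])
    while queue and len(parents) <= max_configs:
        config = queue.popleft()
        if accept_state in config:
            computation = []
            while config is not None:
                computation.append(config)
                config = parents[config]
            return computation[::-1]
        for command in commands:
            nxt = apply_command(config, command)
            if nxt is not None and nxt not in parents:
                parents[nxt] = config
                queue.append(nxt)
    return None


# Toy machine: erase letters one by one, alternating between q1 and q2,
# and accept when the tape is empty in state q1.
COMMANDS = [
    (("a", "q1"), ("q2",)),
    (("a", "q2"), ("q1",)),
    (("α", "q1", "ω"), ("α", "q0", "ω")),
]

start = ("α", "a", "a", "a", "a", "q1", "ω")     # input configuration for aaaa
computation = shortest_accepting_computation(start, COMMANDS, "q0")
time = len(computation)                          # the number n of configurations
space = max(len(c) for c in computation)         # the longest configuration
print(time, space)
```

Since each configuration contains exactly one state letter, the left side of a command occurs in it at most once, so `apply_command` is unambiguous. The bound `max_configs` only guards the search against machines that do not halt.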
We do not only consider deterministic Turing machines, for example, we allow several transitions with the same left side.
Note that Turing machines can also be viewed as rewriting systems where the objects are the configurations and the substitution rules are the commands of the machine. This point of view allows one to use the tools of string rewriting theory: normal forms, confluence, etc. One can also view Turing machines as semigroups of partial transformations of the set of configurations generated by the transformations induced by the commands. This point of view allows one to explore dynamical properties of Turing machines (say, periodic computations). That is often needed when one tries to simulate a Turing machine by a group [188].
The time (space) function $t_A(n)$ (resp. $s_A(n)$) of a Turing machine A is the minimal number of steps of the machine (resp. number of cells on the tape visited by the machine) needed to decide that any element $x \in S_D$ of size ≤ n is indeed in $S_D$.
We can compare the time (space) functions of algorithms as follows. If $f, g : \mathbb{N} \to \mathbb{N}$, we write $f \prec g$ if for some constant C we have $f(n) \le Cg(Cn) + Cn + C$ for every $n \in \mathbb{N}$.
We say that f and g are equivalent if $f \prec g$ and $g \prec f$. Thus every sublinear function is equivalent to n, the functions $n^5$ and $2n^5$, $2^x$ and $3^x$ are equivalent, but the functions $n^{3.1}$ and $n^{3.2}$ are not.
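The inequality in the definition can be tested numerically for a given constant C. This is only a sanity check on finitely many values of n, not a proof of $f \prec g$; the helper name and the sample points are our assumptions.

```python
def dominated(f, g, C, ns):
    """Check f(n) <= C*g(C*n) + C*n + C at the sample points ns."""
    return all(f(n) <= C * g(C * n) + C * n + C for n in ns)


small = range(0, 201)

# 3^n <= 2 * 2^(2n) = 2 * 4^n, so C = 2 witnesses that 3^n is dominated by 2^n.
print(dominated(lambda n: 3 ** n, lambda n: 2 ** n, 2, small))

# 2n^5 is dominated by n^5 already with C = 2.
print(dominated(lambda n: 2 * n ** 5, lambda n: n ** 5, 2, small))

# For n^3.2 against n^3.1 the constant C = 10 fails once n is huge.
print(dominated(lambda n: n ** 3.2, lambda n: n ** 3.1, 10, [10 ** 50]))
```

For the last pair every fixed C fails for sufficiently large n, because $n^{0.1}$ is unbounded; the check above exhibits such an n for C = 10 only.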
The following relation is clear: $s_A(n) \prec t_A(n)$. Indeed, even if at every step the algorithm used a new tape cell, $s_A(n)$ would be only equivalent to $t_A(n)$ (since the number of cells used by the machine at every step is bounded). In reality $s_A(n)$ is almost always less than $t_A(n)$. On the other hand, $t_A(n)$ is bounded from above by an exponential function of $s_A(n)$ [83].
2.2.E. A very rarely accepting Turing machine. Turing machines, like groups, can have some extreme properties. It is well known, for example, that a Turing machine can have undecidable halting problem, that is, there is no algorithm that checks whether the machine will ever stop starting with a given input configuration. This is obviously equivalent to the property that the time and space functions of this Turing machine are bigger than any given recursive function (i.e. a function $\mathbb{N} \to \mathbb{N}$ whose graph and its complement can be recognized by a Turing machine). In Section 4.6 we shall need a Turing machine with an even stronger property. Let X be a recursively enumerable language in a 2-letter alphabet, i.e. a set of binary words recognized by a (non-deterministic) Turing machine M. If x ∈ X then the time of x (denoted time(x) or $\mathrm{time}_M(x)$) is, by definition, the minimal time of an accepting computation of M with input x. For any increasing function $h : \mathbb{N} \to \mathbb{N}$, a real number m is called h-good for M if for every w ∈ X, |w| < m implies h(time(w)) < m.
Theorem 2.6 (Olshanskii, Sapir [182]). There exists a Turing machine M recognizing a recursively enumerable non-recursive set X such that the set of all h-good numbers for M is infinite.
It is easy to observe that the set G of h-good numbers of M cannot be recursively enumerable. Moreover it cannot contain any infinite recursively enumerable subset. Indeed, if G contained an infinite recursively enumerable subset R, then for every input configuration C of M we would start the Turing machine enumerating R, and wait till we get a number r from R that is bigger than the size of C. Then to check if M halts on C, we would check all (finitely many) computations of M starting with C and having length less than log log r. The Turing machine M halts starting with C if one of these computations ends with the accept configuration. Hence the set X would be recursive. On the other hand, the complement of G is recursively enumerable because in order to verify that r is not an h-good number, one needs only to start M and wait till it accepts an input configuration C of size ≤ r but with time(C) > log log r. Thus G is an immune set and its complement is simple in the terminology of [154]. Note that the very existence of simple and immune sets was a non-trivial problem solved by Muchnik and Friedberg using the famous priority argument, answering a question of Post (see [154] and [222] for details). Our proof of Theorem 2.6 also uses the priority argument and Matiyasevich's solution of Hilbert's 10th problem [158].
2.2.F. The complexity classes. If there exists an algorithm A which solves the "yes" part of problem D and $t_A(n)$ is bounded from above by a polynomial in n, then we say that D can be solved in polynomial time. The solvability in exponential time, polynomial space, exponential space, etc. is defined similarly.
It is worth mentioning that if we modernize the Turing machine by, say, adding more tapes or heads, we won't change the complexity of the problem much. For example a nonpolynomial time (space) problem cannot become polynomial as a result of that. The class of all problems which can be solved in polynomial time is denoted by P.
If in the definition of the time complexity we replace deterministic Turing machines by non-deterministic Turing machines, then we obtain definitions of the solvability in non-deterministic polynomial time, non-deterministic exponential time, etc. Recall that a non-deterministic Turing machine is "more intelligent" than a deterministic one: it does not blindly obey the commands of the program but, at every step, guesses what the next step should be. Roughly speaking, a problem D can be solved in non-deterministic polynomial time if for every element $x \in B_D$ that belongs to $S_D$ there exists a proof of this fact (usually called a witness) whose length (size) is bounded by a polynomial in the size of x. The class of all problems which can be solved in polynomial time by a non-deterministic Turing machine is denoted by NP. It is not known whether P = NP. This is one of the central problems in Theoretical Computer Science.
In order to prove that a problem is solvable in polynomial time it is enough to find a polynomial time algorithm solving this problem.
We have mentioned that the complexity of the McKinsey algorithm presented in 2.2 is very high, but we do not know any finitely presented residually finite group with a really hard word problem. Thus we formulate

Problem 2.7. Is there a (time or space) complexity bound for the word problem in a finitely presented residually finite group? More precisely, what is the highest class in the time complexity hierarchy [83] to which the word problem of every finitely presented residually finite group belongs? An even bolder question: is the word problem of every finitely presented residually finite group in NP?

2.2.G. Reducing one problem to another. In order to prove that a problem D cannot be solved in polynomial (exponential, etc.) time one has to take another problem Q which is known to be "hard" and reduce it to D.
There are several kinds of reductions used in the Computer Science literature. One of them, the polynomial reduction in the sense of Karp [83], is the following. A reduction of a problem Q to a problem D is a function $\varphi$ from $B_Q$ to $B_D$ such that
• An element x from $B_Q$ belongs to $S_Q$ if and only if $\varphi(x)$ belongs to $S_D$.
• The element $\varphi(x)$ can be computed in polynomial time; in particular the size of $\varphi(x)$ is bounded by a polynomial in the size of x.

It is clear that if Q is "hard" and Q can be reduced to D then D is "hard" as well. Notice that, when we prove the undecidability of a problem D, we usually also reduce a problem Q known to be "hard" (which in this case means undecidable) to D. In order to reduce Q to D we find a similar mapping $\varphi$, but we do not care about the size of $\varphi(x)$. The "basic hard" problem is usually the halting problem of a Turing machine (or some other kind of machine such as Minsky machines). But it could be a different kind of undecidable problem such as Hilbert's 10th problem (see [137, Section 6] for details) or membership in the range of a general recursive function as in McKenzie-Thompson [155].
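To make the two conditions of a Karp reduction concrete, here is a minimal Python sketch. It is not taken from the text: the two problems (PARTITION and SUBSET-SUM), the function names and the brute-force deciders are assumptions chosen only for illustration. The map `phi` is computable in linear time, and an instance is a "yes" instance of PARTITION exactly when its image is a "yes" instance of SUBSET-SUM.

```python
from itertools import combinations, product

def subset_sum(instance):
    """Brute-force decider for SUBSET-SUM: does some sub-collection of nums sum to target?"""
    nums, target = instance
    return any(sum(c) == target
               for r in range(len(nums) + 1)
               for c in combinations(nums, r))

def partition(nums):
    """Brute-force decider for PARTITION: can nums be split into two parts of equal sum?"""
    total = sum(nums)
    return any(2 * sum(c) == total
               for r in range(len(nums) + 1)
               for c in combinations(nums, r))

def phi(nums):
    """Hypothetical reduction PARTITION -> SUBSET-SUM (positive integers assumed)."""
    total = sum(nums)
    if total % 2 == 1:
        # An odd total admits no partition: map to a fixed "no" instance.
        return ((1,), 2)
    return (tuple(nums), total // 2)

# x is a "yes" instance of PARTITION iff phi(x) is a "yes" instance of SUBSET-SUM.
for k in range(5):
    for nums in product(range(1, 5), repeat=k):
        assert partition(nums) == subset_sum(phi(nums))
```

The deciders are exponential on purpose; only `phi` has to be fast. If SUBSET-SUM had a polynomial time algorithm, composing it with `phi` would give one for PARTITION.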

2.2.H. An NP-complete problem. Many known problems A are such that A is in NP and every other NP-problem polynomially reduces to A. Such problems are called NP-complete.
One of these problems is the exact bin packing problem (Problem 2.8).

2.2.I. Complexity classes and algebraic systems. A deep connection between Computer Science and algebra was found by R. Fagin [79]. He proved, in particular, the following amazing model-theoretic characterization of classes of finite algebras whose membership problem can be solved in non-deterministic polynomial time. Recall that an algebraic system is by definition a set with certain operations and predicates of various arities [154]. For example, a linearly ordered group is an algebraic system with one binary operation (product), one unary operation (taking inverses), one nullary operation (the identity element) and one binary predicate giving the order relation.
Theorem 2.9 (Fagin [79]). The membership problem for an abstract (i.e. closed under isomorphisms) class of finite algebraic systems is in NP if and only if it is the class of all finite models of a second-order formula of the following type:

$$\exists Q_1 \ldots \exists Q_n\, \Theta,$$

where each $Q_i$ is a predicate, and $\Theta$ is a first-order formula.
Basically this theorem means that the membership problem of a class of finite algebraic systems is in NP if and only if we can describe the structure of algebraic systems of this class in terms of functions and relations. Since all known methods of studying the structure of algebraic systems are based on studying functions (homomorphisms, polynomial functions, etc.) and relations (congruences, orders, etc.) we can conclude that for the membership problem of a class of finite algebras being in NP is equivalent to admitting a reasonable structure description.
Note that a word in an alphabet with k letters can be considered as an algebraic system as well: a word is a linearly ordered finite set (say, $\{1, 2, \ldots, n\}$) with k unary predicates $L_x$ indexed by the letters of the alphabet such that $L_x(p)$ holds if the p-th letter in the word is x. The set of all words in the k-letter alphabet can be defined by obvious axioms involving these k + 1 predicates (the order predicate should satisfy the axioms of linear order, and for every $p = 1, 2, \ldots, n$ one and only one of the statements $L_x(p)$ holds). Reduced group words can also be defined using similar axioms, etc. Hence being able to solve the word problem in a group in non-deterministic polynomial time means, by Fagin's theorem, that we can find an algebraic description of words that are equal to 1 in the group.
Note that classes where the membership problem has other types of computational complexity have similar model theoretic characterizations. Classes with membership problem in P have been characterized by Immerman [127], Sazonov [214], [213] and Vardi [230]. Classes with exponential time membership problem were characterized by Fagin [79]. Classes with non-deterministic exponential time membership problem have been characterized by Fagin [79] and Jones and Selman [132]. For a detailed survey of these results see [80].
2.2.J. Van Kampen diagrams as witnesses. Now let again $G = \langle X \mid R \rangle$ be a finitely presented group. Suppose that a word w in X is equal to 1 in G. Then by the van Kampen lemma (Lemma 2.3), w labels the boundary of a van Kampen diagram $\Delta$ over the presentation of G. Let us consider that van Kampen diagram as a witness for the equality w = 1. Since a van Kampen diagram cannot be drawn on a tape of a Turing machine, we need to replace it by an equivalent object which is a word. Such an object is given by Proposition 2.4. Instead of $\Delta$, we should consider the (non-reduced) product from that proposition. By Proposition 2.4, the length of that word is, up to a constant multiple, the number of cells in $\Delta$ plus $|w|$. Since we identify functions that differ by a constant factor or linear summand, we can say that the size of $\Delta$ is the number of cells, i.e. the area of $\Delta$. Clearly, given a word of the form (2), i.e. a word subdivided into subwords $u_i$, $r_j$, it takes linear time (in the length of the word) to check whether it indeed satisfies the conditions of Proposition 2.4. Thus we obtain the following

Proposition 2.10 (See the proof of Theorem 1.1 [210]). Suppose that for every word w that is equal to 1 in G there exists a van Kampen diagram over the presentation of G with at most $f(|w|)$ cells and boundary label w. Then the "yes part" of the word problem of G can be solved in non-deterministic time at most $f(n)$.
In particular if f is bounded by a polynomial, then the "yes" part of the word problem of G is in NP.
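The witness check behind Proposition 2.10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: words are strings in which a lower-case letter is a generator and the matching upper-case letter is its inverse, and the witness is assumed to be a list of triples `(u, r, eps)` encoding a product of conjugates $u r^{\pm 1} u^{-1}$ of relators, as in the van Kampen lemma. The function names are hypothetical.

```python
def inverse(word):
    """Formal inverse: reverse the word and swap the case of every letter."""
    return word[::-1].swapcase()

def free_reduce(word):
    """Cancel all subwords x x^{-1} with a stack, in time linear in len(word)."""
    stack = []
    for ch in word:
        if stack and stack[-1] == ch.swapcase():
            stack.pop()
        else:
            stack.append(ch)
    return "".join(stack)

def verify_witness(w, relators, factors):
    """Check that w is freely equal to the product of the conjugates
    u * r^eps * u^{-1} for (u, r, eps) in factors, with every r a relator."""
    allowed = set(relators)
    product = ""
    for u, r, eps in factors:
        if r not in allowed or eps not in (1, -1):
            return False
        product += u + (r if eps == 1 else inverse(r)) + inverse(u)
    return free_reduce(product) == free_reduce(w)

# Z^2 = <a, b | abAB>: the word a^2 b a^-2 b^-1 is a product of two conjugates.
RELATORS = ["abAB"]
WITNESS = [("a", "abAB", 1), ("", "abAB", 1)]
assert verify_witness("aabAAB", RELATORS, WITNESS)
```

The check runs in time linear in the total length of the witness, so a polynomial bound on the number and size of the factors puts the "yes" part of the word problem in NP.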

2.3.A. The definition. Let $S = S_{g,n}$ be a (possibly non-oriented) surface of genus g with n boundary components. Let again $P = \langle X \mid R \rangle$ be a finite group presentation. A diagram on S is a tessellation of S by a labeled graph with all edges labeled by letters of X and all cells (regions) being discs with boundaries labeled by words from $R^{\pm 1}$. In fact, a diagram on a surface can be obtained from a usual (disc) van Kampen diagram as follows. Let $\Delta$ be a disc diagram whose boundary is a polygon where some pairs of sides are labeled by the same or mutually inverse words. If we identify these pairs of sides, we get a diagram on a surface (with possibly non-empty boundary). Conversely, given a diagram on a surface S, we can cut the surface by simple closed curves and arcs following edges of the diagram so that we obtain a polygon with some pairs of sides (corresponding to the cutting curves) labeled by the same or mutually inverse words. The result is an ordinary van Kampen diagram. For example, if we want to find out whether an element represented by a word u is a product of g commutators [...]

Theorem 2.11 (Olshanskii [174], see also Grigorchuk and Kurchanov [94], Grigorchuk and Lysënok [95], and Lysënok [149]). For every n, the solvability problem for quadratic equations in n variables in any free group is in P (the size of the input of the problem is the sum of lengths of the coefficients).
Note that the solvability of that problem was first proved by Comerford and Edmunds in [56].
Recently Kharlampovich, Lysënok, Myasnikov and Touikan [135] proved that the uniform solvability problem (when the number of variables is not fixed) is harder. They also use diagrams on surfaces and results from [174], plus some clever arguments reducing the NP-complete exact bin packing problem (Problem 2.8) to the problem of filling (tessellating) a surface with a van Kampen diagram over the presentation consisting of 0-cells.
Theorem 2.12 (Kharlampovich, Lysënok, Myasnikov, Touikan [135]). The (uniform) problem of solvability of quadratic equations in the free group is NP-complete (the input of that problem is an equation, the output is "yes" if the equation has a solution in a free group).
Note that the fact that solvability of quadratic equations over free monoids is NP-hard was proved before [135] by Diekert and Robson in [69]. It is still not known whether that (seemingly easier) problem is NP-complete. As Volker Diekert told us, the NP-completeness in the case of monoids would imply Theorem 2.12, and so the case of monoids is very interesting.
2.3.C. Quadratic equations and the Poincaré conjecture. Note that quadratic equations even without coefficients (i.e. homomorphisms from the surface group) are closely related to the Poincaré conjecture. There are several reformulations of the Poincaré conjecture in terms of homomorphisms of surface groups onto the direct product of two free groups (see Jaco [131], Stallings [225], Olshanskii [174]). Here is the reformulation from Hempel [119]. Since the Poincaré conjecture has now been proved by Perelman [197,198], we formulate that reformulation as a theorem.
Theorem 2.13 (Hempel [119], Perelman [197,198]). Let S be a closed oriented surface, $\pi_1(S)$ be its fundamental group. Suppose that $\varphi_1$ and $\varphi_2$ are two homomorphisms from $\pi_1(S)$ onto the direct product of two free groups $F_n \times F_n$. Then there is an automorphism $\alpha$ of $\pi_1(S)$ and an automorphism $\beta$ of $F_n \times F_n$ such that $\alpha\varphi_1\beta = \varphi_2$.
This theorem means that there exists essentially at most one "maximal" solution of the equation $[x_1, y_1] \cdots [x_g, y_g] = 1$ in $F_n \times F_n$ for every n.

2.3.D. A link between the Poincaré conjecture and NP. The proof of the Poincaré conjecture implies that certain algorithmic problems are in NP. For example, consider the following problem.

Problem 2.14. (Triviality of the fundamental group of a 3-manifold).
Input. A compact 3-manifold M without boundary represented as a union of tetrahedra.
Output. "Yes" if the fundamental group of M is trivial and "no" otherwise.
Note that the triviality problem is a part of the uniform word problem (see 2.2.A): given a presentation of a group we need to decide if each generator is equal to 1 in the group. Note also that, given a triangulation of M, we can easily find a presentation of $\pi_1(M)$ whose size is comparable to the size of M.
That Problem 2.14 is in NP follows from the fact that simply connected compact 3-manifolds are homeomorphic to the 3-sphere (the Poincaré conjecture) and the result of S.V. Ivanov [129] that the problem of recognizing a 3-sphere is in NP.

2.3.E. Other equations in free groups. Arbitrary equations in free groups correspond to diagrams on more complicated 2-dimensional complexes than surfaces and the solvability problem for them is much harder. A very non-trivial result of Makanin shows that this problem is decidable.

Theorem 2.15 (Makanin [151]). There exists an algorithm to check whether a system of equations over a free group has a solution.
Note that for some relatively easy groups the problem of solvability of systems of equations is undecidable (this is so for, say, a finitely generated nilpotent group, see 3.2.A).
Makanin's algorithm is very complicated and its complexity is known to be a tower of many exponents. Later, Makanin's algorithm was simplified and clarified by Razborov [201,202]. Razborov also gave a description of all solutions. Still the exact complexity of solving equations in free groups was an open problem till the following relatively recent result.
Theorem 2.16 (Diekert, Gutierrez, Hagenah [70]). The problem of solvability of an equation in a free group is in PSPACE, that is it can be solved by a deterministic Turing machine using polynomial amount of space (in terms of the size of the equation).
Other results from [70] show that PSPACE most probably cannot be improved because a very similar problem of deciding the existential theory of a free group (i.e. solving systems of equations and inequations of the form $w \ne 1$) turned out to be PSPACE-complete (the solvability of systems of equations and inequations in a free group was proved to be decidable by Makanin in [152]). See also 3.1.G below.
2.3.F. The conjugacy problem and annular diagrams. Note that the conjugacy problem is about solvability of the (quadratic) equations of the form $xux^{-1} = v$ where u, v are coefficients and x is an unknown. Thus by 2.3, we need to consider diagrams on an annulus. So let the surface S be an annulus (i.e. the surface $S_{0,2}$), P be a presentation of some group G, and u, v be two group words over the generating set of this presentation. Then u and v are conjugate in G if and only if there exists a diagram $\Delta$ on S over the presentation P with boundary labels u, v (that was first noticed by Schupp and appeared in [163], hence annular diagrams are sometimes called Schupp diagrams).

2.4.A. The isoperimetric functions. Thus with every finite presentation P of a group G we associate the following important function: $f(n)$ is the smallest number such that every word w that is equal to 1 in G, $|w| \le n$, labels the boundary of a van Kampen diagram over the presentation of G with at most $f(n)$ cells. This function is called the Dehn function of P. It is easy to see that the Dehn functions of different finite presentations of the same group are equivalent, hence we can talk about the Dehn function of a finitely presented group. Every function $g\colon \mathbb{N} \to \mathbb{N}$ with $f \prec g$ is called an isoperimetric function of G. The Dehn functions of groups were introduced (under different names) by the computer scientists Madlener and Otto [150] studying complexity of the word problem in groups, and by Gromov [98] as a geometric invariant of groups (see also Gersten [86] where the name "Dehn function" was introduced).
As a direct consequence of Proposition 2.10, we get

Theorem 2.17. Every Dehn function of a finitely presented group G is (equivalent to) the time function of a Turing machine solving (non-deterministically) the "yes" part of the word problem in G.

Hence we have
Theorem 2.18 (Madlener, Otto [150], Gersten [86]). The word problem in a finitely presented group is solvable if and only if the Dehn function (or one of the isoperimetric functions) is recursive.
Being a geometric invariant means, in particular, that if two finitely presented groups are quasi-isometric, then their Dehn functions are equivalent. Recall that if $M_1, M_2$ are metric spaces with distance functions $d_1, d_2$, then a partial function $\varphi\colon M_1 \to M_2$ is called a quasi-isometry if there exist constants $A \ge 1$, $B, C \ge 0$ such that

$$\frac{1}{A}\, d_1(x, y) - B \le d_2(\varphi(x), \varphi(y)) \le A\, d_1(x, y) + B \qquad (3)$$

for all $x, y \in M_1$ where $\varphi$ is defined, every point in $M_1$ is at distance at most C from the domain of $\varphi$, and every point in $M_2$ is at distance at most C from the range of $\varphi$. In this case we call $\varphi$ an (A, B, C)-quasi-isometry. If C = 0, we call $\varphi$ an (A, B)-quasi-isometry, and if B = C = 0, then $\varphi$ is an A-bi-Lipschitz map. Finally, if B = C = 0 and we only have the second of the two inequalities in (3), then $\varphi$ is an A-Lipschitz map.
We say that φ : M 1 → M 2 is a quasi-isometry (bi-Lipschitz map or Lipschitz map) from M 1 to M 2 if this map is a quasi-isometry (bi-Lipschitz map or Lipschitz map) between M 1 and φ(M 1 ).
The spaces M 1 and M 2 are called quasi-isometric if there exists a quasi-isometry between M 1 and M 2 . For example, any two Cayley graphs (see 2.8 for a definition) of a group with respect to two finite generating sets are quasi-isometric as metric spaces (the distance between two vertices is the length of a shortest path connecting them). Moreover any isomorphism between any subgroup of finite index in one group and a subgroup of finite index of another group induces a quasi-isometry of the Cayley graphs of these groups (that was a starting point of a proof of the celebrated Mostow rigidity theorem). Two finitely generated groups are called quasi-isometric if their Cayley graphs (with respect to finite generating sets) are quasi-isometric. It is not difficult to observe that two quasi-isometric groups have equivalent Dehn functions. In particular, if one of the two quasi-isometric groups has solvable word problem, the other group must also have solvable word problem (by Theorem 2.18).
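As a small numerical illustration of the statement about Cayley graphs, the following Python sketch compares two word metrics on the group $\mathbb{Z}$. The choice of the generating sets {1} and {2, 3}, the constants A = 3, B = 1 and all function names are assumptions made for this example only. Since word metrics are translation invariant, it is enough to compare distances from 0.

```python
from collections import deque

def word_lengths(generators, radius):
    """Breadth-first search in the Cayley graph of Z for the given generators.
    Returns {n: word length of n} for all |n| <= radius."""
    steps = sorted({s for g in generators for s in (g, -g)})
    # Z is abelian, so the letters of a shortest word for n can be reordered
    # so that all partial sums stay within max(steps) of the interval between
    # 0 and n; searching this slightly larger window is therefore enough.
    bound = radius + max(steps)
    dist = {0: 0}
    queue = deque([0])
    while queue:
        n = queue.popleft()
        for s in steps:
            m = n + s
            if abs(m) <= bound and m not in dist:
                dist[m] = dist[n] + 1
                queue.append(m)
    return {n: d for n, d in dist.items() if abs(n) <= radius}

def satisfies_qi_inequalities(d_from, d_to, A, B):
    """Check d_from/A - B <= d_to <= A*d_from + B for the identity map."""
    return all(d_from[n] / A - B <= d_to[n] <= A * d_from[n] + B
               for n in d_from)

RADIUS = 50
d1 = word_lengths([1], RADIUS)      # standard generator
d2 = word_lengths([2, 3], RADIUS)   # another finite generating set

# On this ball the identity map behaves like a (3, 1)-quasi-isometry both ways.
assert satisfies_qi_inequalities(d1, d2, 3, 1)
assert satisfies_qi_inequalities(d2, d1, 3, 1)
```

The check covers only a finite ball, so it illustrates the inequalities in (3) rather than proving them.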

2.4.B. The conjugacy problem and quasi-isometry. Note that the analogous statement is not true for the conjugacy problem: there exist two pairs of groups $A_1 > B_1$ and $B_2 > A_2$, each extension of index 2, such that $A_1, A_2$ have solvable conjugacy problem while $B_1, B_2$ have unsolvable conjugacy problem (see Collins-Miller [55]).

2.4.C. The isodiametric function. Another important function associated with van Kampen diagrams is the isodiametric function: for every word w that is equal to 1 in the group $G = \langle X \mid R \rangle$, let $di(w)$ be the minimal diameter of a van Kampen diagram with boundary label w. Then $di(n)$ is the maximal value of $di(w)$ for all w with $|w| \le n$, $w =_G 1$. The area of a van Kampen diagram with boundary label w gives the number of factors in a representation of w as a product of conjugates of elements of R and their inverses (by Proposition 2.10). The diameter is roughly the maximal size of a conjugator used in this representation. It is easy to see that the isodiametric function does not depend (up to equivalence) on the finite presentation of G. It is known (D. Cohen [52], Gersten [85], Birget [20], Gersten and Riley [90]) that for every finitely presented group G the Dehn function $f(n)$ and the isodiametric function $g(n)$ satisfy the following double exponential inequality:

$$f(n) \le a^{b^{g(n)}}$$

for some constants a, b. This immediately implies that the word problem in G is solvable if and only if the isodiametric function is recursive. It is still an open problem whether one can reduce the number of exponents to one.

Problem 2.19 (D. Cohen [52], also attributed to Stallings). Is it true that for every finitely presented group G there exists a constant a > 1 such that

$$f(n) \le a^{g(n)},$$

where f, g are, respectively, the Dehn function and the isodiametric function of G?
2.5. Filling Length functions and the space complexity. If $G = \langle X \mid R \rangle$ and w = 1 in G, then there is a sequence of transformations

$$w = w_0 \to w_1 \to \cdots \to w_k = 1 \qquad (4)$$

(ending with the empty word), where at every step we either insert a relation from $R^{\pm 1}$, or insert or delete a subword of the form $xx^{-1}$, $x \in X^{\pm 1}$ [150]. The area of w is equivalent to the minimal length of any such sequence. We can also estimate the space needed by a Turing machine in order to recognize words which are equal to 1 in G by looking at the sizes of the intermediate words $w_i$. The minimum over all sequences (4) of the maximal length of $w_i$ is called the filling length of w.
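The two quantities attached to such a sequence, its number of steps and the maximal length of an intermediate word, are easy to compute once the sequence is given. The Python sketch below checks that every step is one of the moves described above and returns both numbers. The string encoding (upper-case letters for inverses) and the function names are assumptions made for this illustration.

```python
def inverse(word):
    """Formal inverse: reverse the word and swap the case of every letter."""
    return word[::-1].swapcase()

def is_insertion(shorter, longer, pieces):
    """True if longer is obtained from shorter by inserting one word from pieces."""
    k = len(longer) - len(shorter)
    if k <= 0:
        return False
    return any(shorter[:i] == longer[:i]
               and shorter[i:] == longer[i + k:]
               and longer[i:i + k] in pieces
               for i in range(len(shorter) + 1))

def check_sequence(words, relators, generators):
    """Validate a sequence of words ending with the empty word.
    Returns (number of steps, maximal length of an intermediate word)."""
    if not words or words[-1] != "":
        raise ValueError("the sequence must end with the empty word")
    rel = set(relators) | {inverse(r) for r in relators}
    letters = set(generators) | {g.swapcase() for g in generators}
    trivial = {x + x.swapcase() for x in letters}
    for u, v in zip(words, words[1:]):
        inserted = is_insertion(u, v, rel | trivial)   # insert relator or x x^-1
        deleted = is_insertion(v, u, trivial)          # delete x x^-1
        if not (inserted or deleted):
            raise ValueError(f"illegal step {u!r} -> {v!r}")
    return len(words) - 1, max(len(w) for w in words)

# The commutator abAB in Z^2 = <a, b | abAB>, trivialized in five steps.
SEQUENCE = ["abAB", "abABbaBA", "abAaBA", "abBA", "aA", ""]
steps, max_length = check_sequence(SEQUENCE, ["abAB"], "ab")
assert (steps, max_length) == (5, 8)
```

Minimizing the first number over all valid sequences gives a quantity equivalent to the area of w, and minimizing the second gives its filling length; the sketch only evaluates one given sequence.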
One can follow Gromov [98] and define the filling length function FL(l) of a group using the filling length of words in the usual way. Gromov [99, Page 100] noticed that in many finitely presented groups FL(l) is at most a constant multiple of the isodiametric function of G, and asked whether this is so for all finitely presented groups. FL is probably the most obvious way to define the space function for a group, but it turned out not to be the most natural one. More precisely, unlike in the case of Dehn functions, different natural definitions of the space function give drastically different results. In particular, as suggested in Bridson-Riley [39], one can allow taking cyclic shifts of words and splitting a word $w_i$ into a pair of words $(u_i, v_i)$ each equal to 1 in the group. The resulting function is called the fragmenting free filling length function of a group, denoted FFFL(l) [39], or the space function of a group [183]. Both Gromov's filling length function FL and the Bridson-Riley fragmenting free filling length function FFFL have nice geometric interpretations in terms of transformations of van Kampen diagrams. Computing the filling length amounts to removing cells of the diagram one by one without changing the base point on the boundary and looking at the lengths of the boundaries of the resulting diagrams. When computing the fragmenting free filling length function of a group, we are also allowed to change the base point and to divide a diagram into two subdiagrams (and then take the sum of the lengths of the boundaries of the pieces). It is proved in [39] that these functions behave differently for some finitely presented groups G: for instance, the function FFFL can grow linearly while FL has exponential growth. Since cutting van Kampen diagrams is the main tool in studying them, the function FFFL seems to be the most natural group theory analog of the space function of a Turing machine (see Olshanskii [183]).
As in the case of Dehn functions, the function FFFL does not depend on the presentation of a group (up to equivalence), and for every finitely presented group G there exists a non-deterministic Turing machine recognizing the word problem in G whose space function is equivalent to FFFL (this "diagram eating" Turing machine is essentially described in [210, Section 3, Proof of Theorem 1.1]). Thus the function FFFL is in general closer to the space complexity function of the word problem of a finitely presented group than FL.
Note that similarly to Theorem 2.17, but easier, one can establish the following

Proposition 2.20 (Olshanskii [183]). The space function of a finitely presented group G is equivalent to the space function of a non-deterministic Turing machine. The language accepted by this machine coincides with the set of words equal to 1 in the group.

2.6. Two examples of Dehn functions.
2.6.A. Two concrete groups. We shall give two standard examples of group presentations where the Dehn function is computed explicitly.
Example 2.21. The Dehn function of $\mathbb{Z}^2 = \langle a, b \mid ab = ba \rangle$ is at most quadratic (in fact it is exactly quadratic by, say, 2.9 or 3.2.A, but we shall prove only the upper bound).
Example 2.22. The Dehn function of the Baumslag-Solitar group $BS(1, 2) = \langle a, b \mid bab^{-1} = a^2 \rangle$ is at least exponential (in fact it is exactly exponential by, say, J. Groves and S. Hermiller [104], but we shall prove only the lower bound).

2.6.B. HNN-extensions.
Both examples are HNN-extensions (of the cyclic group). Recall [147] that if G is a group, A, B are isomorphic subgroups of G, and $\varphi\colon A \to B$ is an isomorphism, then the HNN-extension of G with free letter t and associated subgroups A, B is the group

$$\mathrm{HNN}(G; A, B, \varphi) = \langle G, t \mid tat^{-1} = \varphi(a),\ a \in A \rangle.$$

That is, the presentation of that group is obtained by adding the letter t to the generating set, keeping the defining relations of G, and adding the HNN-relation $tat^{-1} = \varphi(a)$ for every $a \in A$. In fact it is enough to add the HNN-relation for every generator of A. Hence if G is finitely presented and A, B are finitely generated, then $\mathrm{HNN}(G; A, B, \varphi)$ is finitely presented. We shall also use multiple HNN-extensions. These correspond to a group G, a collection of pairs of isomorphic subgroups $A_i, B_i$, $i = 1, \ldots, n$, and isomorphisms $\varphi_i\colon A_i \to B_i$. The corresponding multiple HNN-extension of G has the following presentation:

$$\mathrm{HNN}(G; (A_i), (B_i), (t_i)) = \langle G, t_1, \ldots, t_n \mid t_i a t_i^{-1} = \varphi_i(a),\ a \in A_i,\ i = 1, \ldots, n \rangle.$$

A cell corresponding to an HNN-relation looks like a quadrangle (see Figure 8) with edges labeled by $t_i$ on the opposite sides pointing the same way; a and $\varphi_i(a)$ here are words representing the corresponding elements of $A_i$ and $B_i$ respectively.

Figure 8. An HNN-cell
If $\Delta$ is a van Kampen diagram over this presentation, and $\pi$ is an HNN-cell inside it, then each edge of $\pi$ labeled by $t_i$ must be an edge of another HNN-cell, hence HNN-cells with $t_i$-edges form $t_i$-bands (also called $t_i$-corridors), that is, sequences of cells $\pi_1, \ldots, \pi_k$ where every two consecutive cells share a $t_i$-edge. Bands are the main tools for studying van Kampen diagrams over multiple HNN-extensions.
The following basic lemma was essentially proved in [163].
Lemma 2.23. Let $\Delta$ be a diagram over an HNN-extension presentation. Suppose that $\Delta$ is minimal, i.e. has the smallest number of HNN-cells among all diagrams with the same boundary label. Then no $t_i$-band is an annulus (i.e. one of the $t_i$-edges of the first cell in the band does not coincide with any other $t_i$-edge of the band).
Proof. Indeed, suppose a $t_i$-band T is an annulus. Suppose for convenience that the $t_i$-edges of that band point inside the annulus (the other case is similar). We can also assume that the subdiagram $\Delta'$ bounded by the inner boundary component of T is smallest (under inclusion) among all subdiagrams bounded by inner boundaries of $t_j$-annuli. Then $\Delta'$ cannot contain any $t_j$-annuli. Since the boundary of $\Delta'$ does not contain any $t_j$-edges, it cannot contain any HNN-cells except for the cells from T. Hence, by the van Kampen lemma (more precisely by Proposition 2.4), the boundary label of $\Delta'$ is equal to 1 in G. Since $\varphi_i$ is an isomorphism, the label of the outer boundary component of T is also equal to 1 in G. Hence we can replace the annulus T together with $\Delta'$ by a van Kampen diagram over G (by Lemma 2.3), which decreases the number of HNN-cells and contradicts the minimality of $\Delta$.

Corollary 2.24. The natural homomorphism from G to the HNN-extension $\mathrm{HNN}(G; (A_i), (B_i), (t_i))$ is injective.

Proof. Suppose that the natural homomorphism from G to the HNN-extension has a word w in the kernel. Then by Lemma 2.3 there exists a van Kampen diagram $\Delta$ over the presentation of $\mathrm{HNN}(G; (A_i), (B_i), (t_i))$ with boundary label w. We can assume that $\Delta$ is minimal. By Lemma 2.23 it does not contain $t_i$-annuli. Hence every maximal $t_i$-band in $\Delta$ must start and end on the boundary of $\Delta$. But the boundary of $\Delta$ does not have $t_i$-edges. Hence $\Delta$ does not have HNN-cells, and it is a diagram over the presentation of G. By Proposition 2.4, w is 1 in G. Thus the kernel of the natural homomorphism is trivial.

Example 2.21. The group $\mathbb{Z}^2$ can be considered as an HNN-extension in two different ways: we can consider a or b as a free letter (indeed ab = ba is equivalent to $aba^{-1} = b$ and to $bab^{-1} = a$). Hence we can consider a-bands and b-bands. Let $\Delta$ be a van Kampen diagram over the presentation of $\mathbb{Z}^2$ with minimal number of cells among all diagrams with the same boundary label. It is easy to see (the proof is similar to the proof of Lemma 2.23) that every a-band can intersect every b-band in $\Delta$ only once.
Since there are no a- and b-annuli in $\Delta$ (by Lemma 2.23), every maximal a- and b-band starts and ends on the boundary of $\Delta$. Hence if l is the length of the boundary of $\Delta$, then there are at most l/2 maximal a-bands and at most l/2 maximal b-bands in $\Delta$. Since every cell in $\Delta$ is the intersection of an a-band and a b-band, the area of $\Delta$ is at most $l^2/4$ and the Dehn function is at most quadratic.
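The same quadratic bound can be observed computationally by a different and more elementary technique than band counting: rewrite the word to the normal form $a^i b^j$ by moving every a-letter to the left of every b-letter, one adjacent transposition at a time. Each transposition of a b-letter with an a-letter costs one application of a conjugate of the relator or of its inverse, so the number of transpositions bounds the area from above. The Python sketch below is such an illustration and not the band argument of the text; the encoding of words (upper-case letters for inverses) and the function names are assumptions.

```python
from itertools import product

def is_trivial_in_Z2(word):
    """A word in a, A = a^-1, b, B = b^-1 is 1 in Z^2 iff both exponent sums vanish."""
    return (word.count("a") == word.count("A")
            and word.count("b") == word.count("B"))

def commutation_swaps(word):
    """Number of adjacent transpositions needed to move all a-letters to the
    left of all b-letters: the number of pairs (b-letter, a-letter) in which
    the b-letter comes first."""
    swaps = 0
    b_letters_seen = 0
    for ch in word:
        if ch in "bB":
            b_letters_seen += 1
        elif ch in "aA":
            swaps += b_letters_seen
        else:
            raise ValueError(f"unexpected letter {ch!r}")
    return swaps

# With p a-letters and q b-letters, p + q = l, the count is at most p*q <= l^2/4.
for l in range(7):
    for letters in product("aAbB", repeat=l):
        w = "".join(letters)
        if is_trivial_in_Z2(w):
            assert 4 * commutation_swaps(w) <= l * l
```

After the sorting step a trivial word has the form $a^{\pm 1} \cdots a^{\pm 1} b^{\pm 1} \cdots b^{\pm 1}$ with zero exponent sums, so it freely reduces to the empty word at no extra cost in cells.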
Example 2.22. We are going to use the fact that the cyclic subgroup $\langle a \rangle$ is exponentially distorted in the Baumslag-Solitar group BS(1, 2): $b^n a b^{-n} = a^{2^n}$ (so the element $a^{2^n}$ has length linear in n in BS(1, 2) and exponential length in $\langle a \rangle$).
Consider a diagram $\Delta$ over the presentation of the Baumslag-Solitar group with boundary label $a b^{-n} a^{-1} b^{n} a^{-1} b^{-n} a b^{n}$ as on Figure 9. For simplicity suppose that n is odd. Again we can assume that $\Delta$ has the smallest number of cells among all diagrams with the same boundary label. Since b is a free letter, we can consider b-bands in $\Delta$. The length of the boundary of $\Delta$ is 4n + 4. Consider the b-band that starts at a b-edge on one of the 4 sides labeled by $b^n$. It must end on one of the other two sides labeled by $b^n$ where the direction of b-edges is opposite (we trace the boundary counterclockwise). Since b-bands do not intersect, the b-band B starting at the middle b-edge (here we use the fact that n is odd) must end at the middle b-edge of the other side of the diagram (see Figure 9). It is not difficult to show that the b-band can only be horizontal (so only one of the two possibilities on Figure 9 can occur). Let u be the label of the shorter side p of the b-band labeled by a power of a. Then the path p cuts off a subdiagram $\Delta'$ with boundary label $b^{(n-1)/2} a b^{-(n-1)/2} u^{-1}$ (up to a cyclic shift). By Lemma 2.3, that word must be equal to 1 in BS(1, 2). Note that $b^{(n-1)/2} a b^{-(n-1)/2} = a^{2^{(n-1)/2}}$ in that group. Hence $u = a^{2^{(n-1)/2}}$. By Corollary 2.24, u has length at least $2^{(n-1)/2}$. But that means that the length of B is at least $2^{(n-1)/2}$, hence the area of $\Delta$ is at least $2^{(n-1)/2}$ while the perimeter is linear in n. Since $\Delta$ is a minimal diagram, the Dehn function of BS(1, 2) is at least exponential.
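The distortion identity $b^n a b^{-n} = a^{2^n}$ can be checked by exact arithmetic in the standard representation of BS(1, 2) by affine maps of the rational line, $a\colon x \mapsto x + 1$ and $b\colon x \mapsto 2x$. The Python sketch below is an illustration under these assumptions; the encoding of an affine map $x \mapsto px + q$ as a pair `(p, q)` and all names are hypothetical, and only the fact that the representation respects the defining relation is used.

```python
from fractions import Fraction

IDENTITY = (Fraction(1), Fraction(0))
GENERATORS = {
    "a": (Fraction(1), Fraction(1)),   # a: x -> x + 1
    "b": (Fraction(2), Fraction(0)),   # b: x -> 2x
}

def mul(f, g):
    """Product f*g, acting as x -> f(g(x)), for maps encoded as pairs (p, q)."""
    return (f[0] * g[0], f[0] * g[1] + f[1])

def inv(f):
    """Inverse of x -> p*x + q, namely x -> x/p - q/p."""
    return (1 / f[0], -f[1] / f[0])

def evaluate(word):
    """Image of a word; upper-case letters denote inverses of generators."""
    result = IDENTITY
    for ch in word:
        g = GENERATORS[ch.lower()]
        result = mul(result, inv(g) if ch.isupper() else g)
    return result

# The defining relation b a b^-1 = a^2 holds in the representation.
assert evaluate("baB") == evaluate("aa")

# Distortion: the word b^n a b^-n of length 2n + 1 equals a^(2^n).
n = 10
assert evaluate("b" * n + "a" + "B" * n) == (1, 2 ** n)

# A commutator of a with a conjugate of a: a word of length 4n + 4 mapped to 1.
loop = "a" + "B" * n + "A" + "b" * n + "A" + "B" * n + "a" + "b" * n
assert len(loop) == 4 * n + 4 and evaluate(loop) == IDENTITY
```

The image of $a^k$ is the translation by k, so the second assertion shows that a word of length 2n + 1 represents the $2^n$-th power of a.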

2.7. Isoperimetric functions of geodesic metric spaces. Different area functions.
The notion of the area of a loop is well defined in the setting of Riemannian manifolds as well as in finitely presented groups (see for instance [37, Chapter 1, Section 8.1.4]). It also can be defined for geodesic metric spaces. In fact there are several kinds of definitions of area in that case. Below, we shall mention the definition of Bowditch [29], [28, Sections 2.3, 5], Gromov's definition of the coarse area function from [99, 5.F] and a definition using metric currents [5,234,235]. In all these definitions (X, dist) is a geodesic metric space. By a loop in X we shall always mean a Lipschitz (hence rectifiable) map from the unit circle $S^1$ to X. Let $\Omega$ be the set of all loops in X. Each definition of an area defines a function $A\colon \Omega \to \mathbb{R}_+$. Then we define the corresponding isoperimetric function as the function $A(l)$ such that $A(l) = \max\{A(\gamma) \mid \gamma \in \Omega, |\gamma| \le l\}$. Sometimes we shall call any function $f \succ A$ also an isoperimetric function of X.

2.7.A. The definition of Bowditch. For every curve $\alpha\colon [0, 1] \to X$, by $-\alpha$ we denote the curve $\alpha(1 - t)$ (that is, the same curve traced in the opposite direction). Let $\alpha_0, \alpha_1, \alpha_2$ be three Lipschitz curves $[0, 1] \to X$ connecting points $A, B \in X$. Then we can form three loops $\alpha_i \cup (-\alpha_{i+1})$ where i = 0, 1, 2, and +1 is understood modulo 3. Whenever three loops are obtained this way, one says that they form a θ-loop. Now let $\Omega$ be the set of all loops in X. (In fact it is enough to assume that $\Omega$ is a set of loops closed under the operation of cutting a loop into two loops by a geodesic segment connecting two points on the loop. Thus the area function can be defined even for some non-simply connected spaces.) Let A be a function from $\Omega$ to $\mathbb{R}_+$ satisfying the following two conditions.

$(B_1)$ [The triangle inequality for θ-loops.] For every three loops [...]

2.7.B. The coarse definition of area function by Gromov. A δ-filling of a loop $\gamma$ is a pair consisting of a triangulation of the planar unit disk $D^2$ and of an injective map $\psi$ from the set of vertices of the triangulation to X such that the restriction of $\psi$ to the points on $\partial D^2 = S^1$ coincides with $\gamma$. The image of the map $\psi$ is called a filling disk of $\gamma$. We can join the images of the vertices of each triangle of the triangulation of $D^2$ by geodesics. If two of these vertices are on $\partial D^2$, we replace the geodesic by the arc of $\gamma(S^1)$ connecting these points. Thus we obtain a finite number of triangles (some sides are geodesics, some sides are arcs in $\gamma(S^1)$). These triangles are called bricks. The length of a brick is the sum of the lengths of its sides. The maximum of the lengths of the bricks in a partition is called the mesh of the partition. The partition is called a δ-filling partition of $\gamma$ if its mesh is at most δ. The corresponding filling disk is called a δ-filling disk of $\gamma$. The δ-area of $\gamma$ is the minimal number of triangles in a triangulation associated to a δ-filling partition of $\gamma$. We denote it by $A_\delta(\gamma)$. If no δ-filling partition of the loop $\gamma$ exists, we put $A_\delta(\gamma) = \infty$. Given the definition of the δ-area of a loop, we can naturally define the δ-isoperimetric function $A_\delta(l)$ as the supremum of the δ-areas of all loops of length at most l. Note that $A_\delta(l)$ may be equal to $\infty$ even if $A_\delta(\gamma) < \infty$ for every $\gamma$. We shall always assume that our space X is µ-simply connected for some µ > 0, that is, $A_\mu(l) < \infty$ for every l. In this case all functions $A_\delta$, δ > µ, are equivalent (see [72]). Note that if two metric spaces X and Y are quasi-isometric and µ-simply connected, then their $A_\delta$-functions are equivalent for δ > µ.
Note that the coarse area function A_δ satisfies the Bowditch conditions (B_1), (B_2).

2.7.C. The metric currents definition. This definition was used to obtain remarkable results in [234,235]. Since it is more complicated than the other two definitions, we present only some of the ideas it is based on, referring the reader to [5,43]. First let M be an n-dimensional Riemannian manifold. Let D² be a 2-dimensional disc in R² equipped with the Euclidean coordinate system, and let f be a differentiable map from D² to M. Then for every x ∈ D² we have two derivatives f_x = (a_1, . . . , a_n) and f_y = (b_1, . . . , b_n), vectors in the tangent space T_{f(x)}(M) corresponding to the two basic directions in D². Then the Jacobian of f at x is the number $J(x) = \sqrt{\langle f_x, f_x\rangle\,\langle f_y, f_y\rangle - \langle f_x, f_y\rangle^2}$, that is, the area of the parallelogram spanned by f_x and f_y in T_{f(x)}(M) (see, for example, [43, Section 15]). Integrating J(x) over D² gives the area of the map f. If f is not differentiable but only Lipschitz, then by the classical Rademacher theorem (see [43]), f is differentiable almost everywhere with respect to the Lebesgue measure, so J(x) is defined for almost all x, and we can still integrate J(x) over D². So areas are defined for Lipschitz maps too. This allows one to define a Lipschitz area function L(n) in every Riemannian manifold M. Now if M is not a Riemannian manifold but just a metric space, and f is a Lipschitz map from D² to M, then in order to define an area of f we need to use, for example, the Hausdorff measure (one can use other measures as well, such as the inner Hausdorff measure of Busemann [43] or the mass* measure of Gromov [97, Section 4.1]). Recall that if E is a subset of M, then the 2-dimensional Hausdorff measure H²(E) of E is, up to a scalar multiple, the limit as ε → 0 of the infimum of the sums $\sum_i \mathrm{diam}(E_i)^2$ over all covers of E by countably many balls E_i of diameters ≤ ε. Then the area of the map f is the H²-measure of f(D²). Note that for Riemannian manifolds M this definition coincides (up to a constant factor) with the previous definition.
Unfortunately this definition behaves badly if we try to estimate area functions of limits of metric spaces, say, asymptotic cones: the area of a limit is not the limit of areas. In order to overcome that we need to further generalize the Lipschitz maps.
A singular Lipschitz chain is a finite formal combination $C = \sum_i m_i f_i$, where each f_i is a Lipschitz map from some region in R² to the metric space X. The area of a chain C is $\sum_i m_i\,\mathrm{Area}(f_i)$ (see [97]). The boundary of one Lipschitz map f : D² → X is the restriction of f to ∂D². The boundary of a linear combination is the linear combination of boundaries, which can be viewed as an element of the first homology group of X. Thus we can define a (homological) area of a loop γ as the smallest area of a singular Lipschitz chain with boundary γ.⁷ One can introduce a metric on the space of all singular Lipschitz chains from R² to X. It can be, say, the flat metric (the definition goes back to Whitney [236]), similar to the well-known Gromov-Hausdorff metric [103]. Then one can complete the space of singular Lipschitz chains with respect to this metric and obtain the space of 2-dimensional integral metric currents. The area of an integral metric current is the limit of areas of the corresponding singular Lipschitz chains. Thus one can view 2-dimensional integral metric currents as infinite linear combinations of Lipschitz maps from D² into X.

2.8. Cayley complexes and metric spaces. Let P = ⟨X | R⟩ be a presentation of a group G with X finite, X = X^{−1}. Consider the Cayley graph Γ = Γ_X(G), that is, the graph with G as the set of vertices and {(g, gx) | x ∈ X, g ∈ G} as the set of edges. If we label every edge (g, gx) of Γ by the corresponding element x ∈ X, we get a labeled Cayley graph. The graph is directed, but each edge (g, gx) has the inverse edge (gx, g). Thus when we travel from g to gx, we read x, and when we travel from gx to g, we read x^{−1}.
Every group word w in X labels a unique path in the labeled Cayley graph starting from any given vertex g ∈ G. It ends at the vertex gw. In particular, the equality w = 1 is true in G if and only if that path is a loop (for some, equivalently for every, starting vertex g). For every r ∈ R, let us consider all loops γ in Γ labeled by r and let us glue in a disc D_γ to Γ with ∂D_γ = γ. This way we obtain an (edge-labeled) CW-complex CΓ(P), the Cayley complex of G corresponding to the presentation P. Note that every simplicial loop γ in that complex (i.e. a loop consisting of edges of Γ) is labeled by a group word W in X which is equal to 1 in G. By the van Kampen lemma, there exists a van Kampen diagram over P with boundary label W. That van Kampen diagram is a tessellated disc D², and the labels of its edges give us a map from D² to the labeled complex CΓ(P) such that the image of ∂D² is γ. This immediately implies that
• if P is finite, then A_µ(l) for CΓ(P) is always finite for, say, µ = max{|r| | r ∈ R}, i.e. CΓ(P) is µ-simply connected;
• the Dehn function of G is equivalent to A_δ for every sufficiently large δ and satisfies Bowditch's conditions (B_1), (B_2) with 1/k equal to the square of the maximal length of a defining relator.
Notice the following important topological properties of the Cayley complex.
The group presentation is finite if and only if the corresponding Cayley complex is locally compact.
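The correspondence between words and paths described in 2.8 can be checked by hand on a small finite group. The following is only an illustrative sketch: the choice of the symmetric group S_3, the generator names a, b (with capital letters for inverses), and all function names are assumptions made for this example and do not come from the survey.

```python
from itertools import permutations

def compose(p, q):
    """Permutation 'first p, then q' on {0, ..., n-1}; permutations are tuples."""
    return tuple(q[i] for i in p)

def invert(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

# Hypothetical example group: S_3 generated by a transposition a and a 3-cycle b.
# Capital letters stand for inverse generators, so the generating set is symmetric.
GENERATORS = {"a": (1, 0, 2), "b": (1, 2, 0)}
GENERATORS.update({name.upper(): invert(p) for name, p in list(GENERATORS.items())})

# Vertices of the Cayley graph are the group elements;
# labeled edges are the triples (g, x, gx).
VERTICES = list(permutations(range(3)))
EDGES = [(g, label, compose(g, x)) for g in VERTICES for label, x in GENERATORS.items()]

def end_vertex(start, word):
    """Follow the path labeled by `word` from `start`; it ends at start * word."""
    g = start
    for letter in word:
        g = compose(g, GENERATORS[letter])
    return g

def is_trivial(word):
    """w = 1 in the group iff the path labeled by w is a loop at every vertex."""
    return all(end_vertex(g, word) == g for g in VERTICES)
```

For instance, `is_trivial("abab")` holds because ab is a transposition, while `is_trivial("ab")` fails. For an infinite group the same idea applies, but only finite balls of the Cayley graph can be built explicitly.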
2.9. The Dehn functions of groups acting on metric spaces. Geometric group theory studies groups acting by isometries on metric spaces. We say that the action of G on X is geometric if it is co-compact and properly discontinuous. Here co-compact means that there exists a compact subset Y ⊂ X such that X = G · Y; properly discontinuous means that for every point x ∈ X there exists a neighborhood U ∋ x such that g · U ∩ U = ∅ for every g ∈ G \ {1}. For example, the fundamental group of a compact Riemannian manifold M acts geometrically on the universal cover of M. It is easy to observe (see Gromov [99]) that if a finitely presented group G acts geometrically on a simply connected metric space with the coarse area function A_δ (see 2.7.B), then the Dehn function of G is equivalent to A_δ.

⁷ Note that a homological version of the Dehn function of groups can be defined in a similar way [86]. The homological Dehn function is easier to compute, and it bounds the ordinary Dehn function from below. It has been studied, for example, in [15].
For Riemannian manifolds, the fact that the Riemannian area function (see 2.7.C) is equivalent to A_δ is not obvious, although it is a part of "folklore". It was proved in full detail by Bridson in [35].
Theorem 2.25 (Bridson [35]). If a finitely presented group acts geometrically on a simply connected Riemannian manifold, then the Dehn function of the group and the corresponding filling function of the manifold are equivalent.
Since Z² acts geometrically on the Euclidean plane R² by translations, and, as we learned in Real Analysis (and the ancient Greeks knew without any Real Analysis), the area function of R² is l²/(4π), we finally deduce that the Dehn function of Z² is quadratic.

2.10. Asymptotic cones of metric spaces and groups.

2.10.A. The definition. Let X be a metric space with distance function dist, say, a Cayley graph of a finitely generated group. Consider a non-decreasing sequence of numbers d = (d_i) with lim d_i = ∞, called scaling constants, and the sequence of metric spaces X/d_i, where W/n denotes the space W with distance function dist/n (dist is the distance function in W). We want to define a limit of this sequence of spaces. One of the reasons is that we want to "forget" about local properties of X in favor of global properties. In particular, we want to study properties of configurations of points in X which are satisfied by configurations with arbitrarily large pairwise distances between points, and do not want to be disturbed by properties which only hold for configurations of small diameter. The limit space is supposed to have the global properties of X while the local properties may disappear. One way to define a limit was proposed by Gromov in [96]. The limit is called the Gromov-Hausdorff limit, but that definition can only be applied when the spaces X/d_i are uniformly locally compact, which is true in the case of Cayley graphs of groups of polynomial growth but not true in general. A much more universal definition (which coincides with Gromov's when his definition applies) was given by van den Dries and Wilkie [229]. The limit is called an asymptotic cone of X and is defined as follows. In fact we shall define a limit of any sequence of spaces (X_i, dist_i). As usual for definitions of boundaries and completions of different sorts, the limit will consist of sequences (x_i), x_i ∈ X_i. Thus consider the direct product Z = ΠX_i. The most obvious way to define a distance function on Z is coordinate-wise: $\mathrm{dist}((x_i), (y_i)) = \lim_{i\to\infty} \mathrm{dist}_i(x_i, y_i)$. The problem is that the limit may not exist (we ignore the issue that the limit can be ∞ for now).
To circumvent this problem, let us pick a non-principal ultrafilter on N, that is, a set ω of subsets of N which is closed under finite intersections and under taking supersets, does not contain finite subsets, and is such that for every set U ⊂ N either U or N \ U is in ω. The sets from ω are called big, the other sets are small. We can also view ω as a finitely additive measure on the set of all subsets of N which has only two values 0, 1, and ω(N) = 1 while ω(∅) = 0. If A_i is a sequence of statements and A_i holds for all i from a big subset, we say that A_i holds ω-almost surely.
To prove the existence of ultrafilters one needs the Axiom of Choice (or, more precisely, a slightly weaker hypothesis, the so-called Boolean prime ideal theorem [57]), so although we believe that ultrafilters exist, one cannot actually define an explicit ultrafilter using only the axioms of Zermelo and Fraenkel. Anyway, as soon as we fix an ultrafilter ω, we can modify the definition of the limit of a sequence of numbers, introducing the definition of the ω-limit. It is almost exactly the same as the standard calculus definition: if b_i is a sequence of real numbers, then $\lim_\omega b_i = b$ if and only if for every ε > 0 the set of indices {i : |b_i − b| < ε} is big. So the only difference with the standard definition of a limit is that we replaced the words "for almost all indices i" by "the set of indices i is big". The fact that ω is an ultrafilter immediately implies (exercise) that any sequence of numbers has a unique ω-limit (it may be equal to ±∞). For example, the limit of the sequence 1, 0, 1, 0, 1, . . . is 0 if the set of even numbers is big, and 1 otherwise. Now we have a "distance" function on the set ΠX_i: dist((x_i), (y_i)) = lim_ω dist_i(x_i, y_i). Clearly the triangle inequality is satisfied. Since this function can take infinite values, we consider a "connected component". Namely, pick a point o = (o_i) in ΠX_i, and let Π_b X_i be the set of all points at finite distance from o. The restriction of the "distance function" to Π_b X_i is a quasi-distance, because different points can be at distance 0. For example, if two points (x_i), (y_i) are such that x_i = y_i for all i from a big set, then the distance between these points is 0. The relation (x_i) ∼ (y_i) iff dist((x_i), (y_i)) = 0 is an equivalence relation, and the quotient Π_b X_i / ∼ is a metric space with the induced metric. That metric space is called the ω-limit of X_i corresponding to the observation point (o_i). In particular, the asymptotic cone of X corresponding to ω, the scaling constants (d_i) and the observation point (o_i), denoted Con_ω(X; (d_i), (o_i)), is the ω-limit of the spaces X/d_i.
Asymptotic cones of a group endowed with some metric (say, a finitely generated group with the word metric⁸ or a Lie group with a Riemannian metric) are asymptotic cones of the corresponding metric space.
2.10.B. Some properties of asymptotic cones. Here are some basic properties of asymptotic cones that can be found in [229], [99], [72] and other papers. Some of these properties are quite easy to prove, for other properties we provide references.
(1) If the metric space X is homogeneous (say, X is the Cayley graph of a group), then asymptotic cones of X do not depend on the choice of the observation point (so we shall always assume, if X is a Cayley graph, that the observation point is (e), and we shall omit it from the notation for an asymptotic cone of a group).
(2) If two metric spaces X and Y are quasi-isometric, then their asymptotic cones corresponding to the same ultrafilters, same scaling constants and appropriately chosen observation points are bi-Lipschitz equivalent, hence homeomorphic. In particular, the asymptotic cones (with the same parameters) corresponding to two Cayley graphs of the same finitely generated group but different finite generating sets are bi-Lipschitz equivalent.
(3) … acting transitively on any asymptotic cone Con_ω(G, (d_i)) by left multiplication.
(4) Any asymptotic cone is a complete metric space.
(5) …, or an A-Lipschitz path, then p is an A-bi-Lipschitz path or A-Lipschitz path respectively (i.e. an A-bi-Lipschitz or an A-Lipschitz map from an interval to X).
(6) In particular, the ω-limit of every sequence of geodesic paths in X is a geodesic path in C = Con_ω(X; (d_i), (o_i)) (it can be infinite, or a point, or empty). It is not true that every geodesic path in C is an ω-limit of geodesic paths in X, but if g is a geodesic in C, then for every ε > 0 there exists a piecewise geodesic path g′ with k = k(ε) geodesic pieces that is an ω-limit of k-piecewise geodesic paths in X, and g′ (resp. g) is in the ε-neighborhood of g (resp. g′) [74], i.e. the Hausdorff distance between g and g′ is at most ε.
(7) Suppose that C = Con_ω(X; (d_i), (o_i)) contains a simple geodesic triangle T. Then for every ε > 0 there exist a k = k(ε) and a sequence of geodesic k-gons Π_i in X such that Π = lim_ω Π_i is at Hausdorff distance at most ε from T, contains the midpoints of the three sides of T, and each Π_i is thick, that is, the middle third of each side of Π_i is far from the union of the other sides, and every vertex of Π_i is far from the vertices not adjacent to it (a precise definition of thick geodesic polygons can be found in [74]).
(8) An asymptotic cone of a finitely generated group may depend on the ultrafilter and on the scaling constants (d_i) [228,140,74]. If the Continuum Hypothesis is false, then there exists a group (a uniform lattice in SL_n(R)) whose set of pairwise non-homeomorphic asymptotic cones has cardinality $2^{2^{\aleph_0}}$. On the other hand, if the Continuum Hypothesis is true, then the set of all asymptotic cones of all countable metric spaces is of cardinality continuum [140], and there exists one finitely generated (and recursively presented) group with continuum pairwise non-π_1-equivalent (hence non-homeomorphic) asymptotic cones [74]. There are also finitely presented groups with 2 non-π_1-equivalent asymptotic cones [193].
2.11. Asymptotic cones and Dehn functions. We have mentioned that asymptotic cones of finitely generated groups capture asymptotic properties of groups, i.e. properties that manifest themselves on configurations of elements of arbitrarily large diameter. For example, when working with hyperbolic groups (that is, groups all of whose asymptotic cones are R-trees, see 3.1.A), it is convenient to remember that configurations of elements with large diameter in a hyperbolic group behave like points on a tree with similar pairwise distances.
In particular, asymptotic cones of a group reflect the properties of the Dehn function.
Theorem 2.27 (Gromov [99]). Suppose that all asymptotic cones of a group G are simply connected. Then G is finitely presented, has polynomial isoperimetric function and linear isodiametric function.
Proof. The proof of this remarkable statement is very easy. Indeed, if, say, the Dehn function f of G grows faster than any polynomial, then for every k ≥ 1, f satisfies the inequality f(n) ≥ kf(n/2) for infinitely many n. For each k let γ_k be a loop in the Cayley complex of G such that Area(γ_k) ≥ kf(|γ_k|/2). Let d_k be the length of γ_k. For some ultrafilter ω, consider the asymptotic cone C = Con_ω(G, (d_k)), and the ω-limit γ of γ_k in C. Then γ has length 1. Since C is simply connected by assumption, there exists a continuous map φ : D² → C whose restriction to ∂D² is γ. Since φ is continuous and D² is compact, φ is uniformly continuous. Hence there exists a decomposition of D² into, say, n triangles ∆_1, . . . , ∆_n such that the perimeter of each φ(∆_i) is at most 1/3. By Property 6 of asymptotic cones from 2.10.B, we can assume that the sides of all curvy triangles φ(∆_i), except for the sides contained in φ(∂D²), are ω-limits of geodesics from the Cayley graph of G. Therefore each φ(∆_i) is an ω-limit of some sequence of loops δ^i_m, m = 1, 2, . . ., in the Cayley graph. But that means that a disc bounded by the loop γ_m can be decomposed into n loops δ^i_m of length ≤ d_m/3 + o(d_m) for ω-almost all m. Hence Area(γ_m) ≤ nf(d_m/2) for ω-almost all m, which contradicts the inequality Area(γ_m) ≥ mf(d_m/2) as soon as m > n.
Riley [203] proved that under the assumptions of Theorem 2.27, the group also has linear FL function (and hence linear FFFL function as well).
The converse statement of Theorem 2.27 is not true. In [192], we exhibited a group (a multiple HNN-extension of a free group) that has cubic Dehn function, linear isodiametric function and non-simply connected asymptotic cones. The "cubic" in that statement can be improved to as low as n² log n by modifying the group using the method from [191]. It is impossible, though, to have an example with quadratic Dehn function.
Theorem 2.28 (Papasoglu [196]). All asymptotic cones of a finitely presented group having quadratic Dehn function are simply connected.
Like Theorem 2.27, this is a very important fact, but the idea of the proof is not that difficult. We take a van Kampen diagram ∆ over the presentation of our group G, and start removing layers of cells one by one from the outside in, peeling the diagram like an apple, so that each layer is 1-cell wide. The fact that the area is quadratic lets us estimate the number of layers and their lengths. This allows us to decompose the diagram into a constant number, say c, of subdiagrams whose perimeters are at most half of the perimeter of ∆. That, in turn, implies that every loop γ in any asymptotic cone of G can be decomposed into O(1) loops of length at most |γ|/2. This implies simple connectivity of the asymptotic cone.
An analysis of Gromov's proof of Theorem 2.27 allowed Papasoglu to obtain the following useful result.
Theorem 2.29 (Papasoglu [73]). Suppose that all asymptotic cones of G are simply connected and have Gromov coarse area function A_δ(l) bounded by l^c. Then the Dehn function of G is bounded by n^{c+ε} for every ε > 0.
That result was used by Druţu in [73] (see also 3.2.A and 4.1.A below). Note that Theorem 2.28 does not give metric properties of asymptotic cones, only the topological property of being simply connected. In a simply connected asymptotic cone even of a nice nilpotent group, a Lipschitz loop does not necessarily bound a Lipschitz 2-disc or even an integral 2-current (it can be deduced from [6] for the Heisenberg group H 3 , for other examples see Wenger [235], and 3.2.C). But for groups with quadratic Dehn functions, the asymptotic cones are much nicer.
Theorem 2.30 (Wenger [235]). Let G be a finitely presented group with quadratic Dehn function. Then every asymptotic cone of G admits a quadratic isoperimetric inequality in terms of 2-currents, that is, for some constant C, every Lipschitz loop γ bounds an integral 2-current of area ≤ C|γ|².

Isoperimetric functions of important groups
3.1. Hyperbolic groups and asymptotic cones.

3.1.A. The definition and a characterization.
There are several equivalent definitions of δ-hyperbolic (in other terminology, word hyperbolic) metric spaces and groups [98]. Rips' definition is the following.
Definition 3.1. A geodesic metric space X is δ-hyperbolic for some δ ≥ 0 if every geodesic triangle⁹ in X is δ-thin, that is, every side of it is in the closed δ-neighborhood of the union of the other two sides. A finitely generated group G is δ-hyperbolic if its Cayley graph is δ-hyperbolic. A geodesic metric space (a finitely generated group) is hyperbolic if it is δ-hyperbolic for some δ ≥ 0.
This property is obviously invariant under quasi-isometry, so, in particular, hyperbolicity of a finitely generated group does not depend on the choice of a (finite) generating set.
Note that if X is δ-hyperbolic, then for every real d > 0 the space X/d is (δ/d)-hyperbolic. This and Properties 6, 7 from 2.10.B immediately imply that every asymptotic cone of a hyperbolic geodesic metric space is 0-hyperbolic. Hence in every geodesic triangle in the asymptotic cone every side is contained in the union of the other two sides. This means that every asymptotic cone of a hyperbolic geodesic metric space is an R-tree. Thus all asymptotic cones of a hyperbolic group are complete R-trees.
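To get a feeling for the statement that trees are exactly the 0-hyperbolic spaces, one can compute a hyperbolicity constant of a small finite metric space. The sketch below is an illustration only, under stated assumptions: it uses Gromov's four-point condition on Gromov products rather than Rips' thin triangles from Definition 3.1 (for geodesic spaces the two conditions are equivalent up to a constant multiple of δ), and it works with the finite vertex set of a graph, which is not a geodesic space. All function names are hypothetical.

```python
from itertools import product

def graph_metric(n, edges):
    """All-pairs shortest path distances (Floyd-Warshall) in an unweighted graph."""
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v in edges:
        d[u][v] = d[v][u] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

def gromov_product(d, x, y, w):
    """Gromov product (x|y)_w = (d(x,w) + d(y,w) - d(x,y)) / 2."""
    return (d[x][w] + d[y][w] - d[x][y]) / 2

def four_point_delta(d):
    """Smallest delta with (x|y)_w >= min((x|z)_w, (y|z)_w) - delta for all x, y, z, w."""
    points = range(len(d))
    delta = 0.0
    for x, y, z, w in product(points, repeat=4):
        gap = min(gromov_product(d, x, z, w), gromov_product(d, y, z, w))
        gap -= gromov_product(d, x, y, w)
        delta = max(delta, gap)
    return delta

# A small tree and a cycle of length 8, both with unit edges.
tree = graph_metric(6, [(0, 1), (1, 2), (1, 3), (3, 4), (3, 5)])
cycle = graph_metric(8, [(i, (i + 1) % 8) for i in range(8)])
```

Here `four_point_delta(tree)` is 0, while `four_point_delta(cycle)` is positive and grows with the length of the cycle; rescaling all distances by 1/d rescales the constant by 1/d, which is the mechanism behind the passage to asymptotic cones above.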
The converse statement is also true. Thus we have

Theorem 3.2 (see [98,99]). A finitely generated group G is hyperbolic if and only if every asymptotic cone of G is an R-tree.
It turns out that the R-trees that can appear as asymptotic cones of groups are of three kinds only because of the following recent theorem of Sisto.

The classical (motivating) examples of hyperbolic groups are fundamental groups of compact negatively curved Riemannian manifolds. In particular, co-compact lattices in the Lie groups SO_{n,1}(R) are hyperbolic because these are fundamental groups of compact Riemannian manifolds of constant negative curvature. Hence if the genus g of a closed oriented surface S_g is at least 2, then the fundamental group of S_g is hyperbolic. It has the presentation ⟨a_1, b_1, . . . , a_g, b_g | [a_1, b_1] · · · [a_g, b_g]⟩ found by Dehn. It is easy to see that no two different cyclic shifts of the relator or its inverse have a common prefix of length more than 1. Thus this presentation satisfies the small cancelation condition C′(1/(4g−1)).

3.1.C. The small cancelation conditions.
Definition 3.6. Let λ > 0, and let P = ⟨X | R⟩ be a group presentation. We say that P satisfies the condition C′(λ) if the length of any common prefix of any two different cyclic shifts of words r_1, r_2 ∈ R^{±1} (such a common prefix is called a piece) is strictly less than λ min(|r_1|, |r_2|) (recall that all words in R are cyclically reduced by our assumption (see 2.1.A), so cyclic shifts of these words and their inverses are reduced words).
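For a finite presentation, Definition 3.6 can be verified by brute force. The following sketch is meant only as an illustration: the encoding of generators by lowercase letters and of their inverses by the corresponding capital letters, as well as the function names, are assumptions of this example. Exact rational arithmetic is used for λ so that the strict inequality is tested reliably.

```python
from fractions import Fraction
from itertools import combinations

def inverse(word):
    """Inverse of a group word; a capital letter denotes the inverse generator."""
    return word[::-1].swapcase()

def cyclic_shifts(word):
    return [word[i:] + word[:i] for i in range(len(word))]

def common_prefix_length(u, v):
    n = 0
    for x, y in zip(u, v):
        if x != y:
            break
        n += 1
    return n

def satisfies_c_prime(relators, lam):
    """Check C'(lam): every piece is strictly shorter than lam * min(|r1|, |r2|).

    The relators are assumed to be cyclically reduced words.
    """
    shifts = set()
    for r in relators:
        for s in (r, inverse(r)):
            shifts.update(cyclic_shifts(s))
    # Pairs of different cyclic shifts (different as words).
    for u, v in combinations(shifts, 2):
        if common_prefix_length(u, v) >= lam * min(len(u), len(v)):
            return False
    return True

# Genus 2 surface group: one relator [a, b][c, d] of length 8, all pieces of length 1.
SURFACE_GENUS_2 = ["abABcdCD"]
```

With this encoding, `satisfies_c_prime(SURFACE_GENUS_2, Fraction(1, 7))` holds, in agreement with the condition C′(1/(4g−1)) for g = 2, while the check with `Fraction(1, 8)` fails because a piece of length 1 is not strictly shorter than 8/8. The relator `"abAB"` of Z² fails already for λ = 1/6.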
The main property of C′(λ)-presentations is the following Greendlinger Lemma [147], proved by Dehn in the case of π_1(S_g), g ≥ 2.
Theorem 3.7 (See Lyndon, Schupp [147]). Let P = ⟨X | R⟩ be a group presentation satisfying C′(λ) with λ ≤ 1/6. Then every minimal van Kampen diagram ∆ over P either has no cells or has a cell π with ∂(π) = uv, where |u| > |v| and u is a subpath of ∂∆ (such a cell is called a Greendlinger cell).
Theorem 3.7 implies that in every group where the (finite) presentation P satisfies C′(1/6) the word problem can be solved by the following Dehn algorithm. Take a reduced word w. If a cyclic shift w′ of w contains more than a half of a cyclic shift r′ of a defining relator r or its inverse (r′ = uv, w′ = uw″, |u| > |v|), then replace uw″ by v^{−1}w″ and reduce the resulting word. The new word is equal to 1 modulo P if and only if w = 1 modulo P. Since v^{−1}w″ is shorter than w, we can continue until we either get the empty word (in which case we conclude that w = 1 modulo P) or we get a non-empty word for which the shortening is no longer possible (in which case we conclude that w ≠ 1 modulo P).
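The Dehn algorithm just described is short enough to write down in full. The sketch below is a naive illustration and not an optimized (linear time) implementation: it tries all cyclic shifts of the current word against all cyclic shifts of the relators and their inverses. The encoding of inverse generators by capital letters and all function names are assumptions of this example, and the answer is only guaranteed to be correct when the presentation is a Dehn presentation, for instance when it satisfies C′(1/6).

```python
def inverse(word):
    """Inverse of a group word; a capital letter denotes the inverse generator."""
    return word[::-1].swapcase()

def free_reduce(word):
    """Cancel all subwords x x^{-1} using a stack."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()
        else:
            stack.append(letter)
    return "".join(stack)

def cyclic_reduce(word):
    """Freely reduce, then strip matching first/last letters (a conjugation)."""
    word = free_reduce(word)
    while len(word) >= 2 and word[0] == word[-1].swapcase():
        word = word[1:-1]
    return word

def relator_shifts(relators):
    shifts = set()
    for r in relators:
        for s in (r, inverse(r)):
            shifts.update(s[i:] + s[:i] for i in range(len(s)))
    return shifts

def dehn_step(word, shifts):
    """Return a shorter word conjugate-equivalent to `word`, or None if stuck."""
    for i in range(len(word)):
        shifted = word[i:] + word[:i]              # cyclic shift w' of w
        for r in shifts:                            # cyclic shift r' = uv
            for k in range(min(len(r), len(shifted)), len(r) // 2, -1):
                if shifted.startswith(r[:k]):       # w' = u w'' with |u| > |v|
                    return cyclic_reduce(inverse(r[k:]) + shifted[k:])
    return None

def is_trivial(word, relators):
    """Dehn's algorithm: True iff the procedure reaches the empty word."""
    shifts = relator_shifts(relators)
    word = cyclic_reduce(word)
    while word:
        word = dehn_step(word, shifts)
        if word is None:
            return False
    return True

SURFACE_GENUS_2 = ["abABcdCD"]   # relator [a, b][c, d]
```

For example, `is_trivial("cabABcdCDC", SURFACE_GENUS_2)` holds (a conjugate of the relator), while `is_trivial("abAB", SURFACE_GENUS_2)` fails, since the commutator [a, b] is too short to contain more than half of the relator.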
Note that every group possessing a finite presentation for which Dehn's algorithm works (we shall call this a Dehn presentation) has linear Dehn function because removing Greendlinger cells from a minimal van Kampen diagram reduces the length of the boundary (hence the area of the diagram cannot exceed the perimeter).
3.1.D. The linear isoperimetric inequality. Gromov proved that every hyperbolic group has linear Dehn function and in fact possesses a finite Dehn presentation. Here is a relatively short proof of this fact using asymptotic cones.

Proof. We present here a proof that illustrates the usefulness of asymptotic cones. Let G be a hyperbolic group. We need the following statement first. A path γ : [0, l] → Γ in the Cayley graph Γ of a group G is called a bump if the length l of the path is at least 3 times bigger than the distance between its end points.

Lemma 3.9. There exists a constant C such that every bump of length l > C contains a sub-bump of length at most l/3.

Proof. Suppose that such a constant does not exist. Then for every n = 1, 2, . . . there exists a bump γ_n of length d_n > n which does not contain sub-bumps of length ≤ d_n/3. Pick an ultrafilter ω and consider the asymptotic cone C = Con_ω(G, (d_i)). The space C contains the ω-limit γ of the paths γ_i. Then by Property (5) of asymptotic cones from 2.10.B, γ is a path of length 1, and the distance between its endpoints is at most 1/3. Recall that C is an R-tree (Theorem 3.2). A path of length 1 on a tree with endpoints at distance ≤ 1/3 must have a subpath of arbitrarily small (but non-zero!) length which is a loop. Let α be such a loop of length at most 1/4, and let a be the beginning (= the end) point of that loop. Then a is a double point on γ: a = γ(p) = γ(p + ε) where ε ≤ 1/4. That means that the distance between γ_i(p) and γ_i(p + ε) is o(d_i), while the length of the subpath of γ_i between p and p + ε is O(d_i) and at most d_i/3 ω-almost surely. Therefore for infinitely many values of i the subpath γ_i(p, p + ε) is a bump of length at most d_i/3, a contradiction.

Now Theorem 3.8 can be deduced as follows. Let G be a hyperbolic group. Let C be the constant from Lemma 3.9. Let R be the (finite) set of all words in the generators X of G which are of length ≤ 2C and equal to 1 in G.
Lemma 3.9 immediately implies that every loop in the Cayley graph of G has a subpath that is a bump of length at most C (note that a loop is a bump too). The bump together with a geodesic connecting its endpoints is a loop of length at most C + C/3 < 2C. Therefore every group word in the alphabet X that is equal to 1 in G contains more than a half (in fact, at least 3/4) of a relator from R. This implies that ⟨X | R⟩ is a finite presentation of G satisfying the Dehn property.
The converse statement for Theorem 3.8 is also true and well-known:

3.1.E. Equations in hyperbolic groups. Several other algorithmic problems have nice solutions in hyperbolic groups too. For example, Theorem 2.11 about solvability of quadratic equations remains true if we replace "free" by "hyperbolic" there [174]. All solutions of quadratic equations without coefficients in a hyperbolic group can be described by the following

Theorem 3.13 (Lysënok [148]). Let S be a closed surface. For every hyperbolic group G there exist only finitely many homomorphisms of π_1(S) to G up to the action by the mapping class group of π_1(S) and the action of Aut(G) on G.
The proof of this theorem is relatively elementary compared with the similar looking Theorem 2.13.
An analog of Makanin's Theorem 2.15 for free groups is also true for torsion-free hyperbolic groups in general.
Theorem 3.14 (Rips, Sela [205]). There exists an algorithm to check whether a system of equations over a given torsion-free hyperbolic group has a solution.
3.1.F. The language theory point of view. A language is a set of words in a finite alphabet X. An automaton is a finite labeled directed graph (labels are taken from a finite alphabet X) with two distinguished subsets of vertices S (start) and F (finish) [50]. For example, any finite subgraph of a labeled Cayley graph becomes an automaton if we choose the start and finish vertices. A language L is recognized by an automaton M if it consists of all words that can be read on M starting from a start vertex and ending at a finish vertex. Such languages are called regular.
It is easy to see that the language of all group words in generators of a finitely generated group that are equal to 1 in that group (let us denote it by W(G)) is regular if and only if the group is finite. For a bigger class of context-free languages, i.e. languages recognized by more advanced pushdown automata (for the definition see [125]), the situation is already different, as shown by the following non-trivial theorem.

Recall that a group G is called inaccessible if it contains an infinite sequence of subgroups G > G_1 > G_2 > . . . and a sequence of actions by isometries on simplicial trees¹¹ T_i such that G_i is the stabilizer of a vertex in T_i and all stabilizers of edges are finite. An inaccessible finitely generated group was constructed by Dunwoody [75]. A group is accessible if it is not inaccessible. All finitely presented groups are accessible [76].
By Holt [123], the word problem in every hyperbolic group is real time recognizable, which, informally, means that it can be recognized by a Turing machine that reads its input at a constant rate and stops at the end of the input (regular and context-free languages are real time recognizable). This implies, in particular,

Theorem 3.16 (See, for example, Holt [123]). The word problem in every hyperbolic group can be solved in linear time by a (multi-tape) Turing machine.
Note that the fact that the Dehn function is linear directly implies only that the word problem can be recognized in linear time by a non-deterministic Turing machine. The proof of Theorem 3.16 uses the fact that every hyperbolic group has a Dehn presentation (Theorem 3.8). In [124], Holt and Rees used the fact that nilpotent and some relatively hyperbolic groups have so-called generalized Dehn presentations introduced earlier by Cannon, Goodman and Shapiro. They prove that groups from these classes also have real time recognizable word problem.
For more information about groups with word problem from some more complicated language classes see Gilman [91].
3.1.G. The isomorphism problem for hyperbolic groups. The isomorphism problem for hyperbolic groups¹² is decidable. This is a very strong theorem proved in the torsion-free case by Sela (see [217] for a weaker result, and [63] by Dahmani and D. Groves for the full proof) and in the general case by Dahmani and Guirardel [65].
Here we present a short description of the solution of isomorphism problem which is a somewhat modified text sent to us by François Dahmani.
Let G = ⟨X | r_1, . . . , r_m⟩ and G′ = ⟨X′ | r′_1, . . . , r′_n⟩ be two hyperbolic groups. Suppose first that every action of each of the groups G, G′ on a simplicial tree is trivial, i.e. fixes a point. Every homomorphism φ : G → G′ induces an action of G on the Cayley graph of G′: a •_φ g = φ(a)g. Let d_φ be the maximal number such that every vertex of the Cayley graph of G′ is moved by at least d_φ by some generator of G under this action. It is easy to check that if a sequence of homomorphisms φ_1, φ_2, . . . consists of homomorphisms that are pairwise non-conjugate in G′, then the sequence of numbers d_{φ_i} is unbounded, hence G acts non-trivially on the asymptotic cone of G′ corresponding to the scaling sequence (d_{φ_i}), which is an R-tree by Theorem 3.2. In fact one can deduce (as in Sela [217, Theorem 9.1], using Rips' theory of groups acting on R-trees and its generalizations) that in this case G acts non-trivially on a simplicial tree as well. Hence we can assume that any set of pairwise non-conjugate homomorphisms G → G′ and G′ → G is finite.
Therefore in order to decide whether $G$ and $G'$ are isomorphic we need to compute a finite list of homomorphisms containing a representative of each of the homomorphisms $G \to G'$ (up to conjugacy), and a similar finite list of homomorphisms $G' \to G$. Assuming that this can be done, one checks whether a composition of a homomorphism from the first (second) list with a homomorphism from the second (first) list is conjugate to the identity map, that is, whether the images of the generators of $G$ (of $G'$) under this composition are conjugate to the generators of $G$ (of $G'$) by the same conjugator. It is a system of quadratic equations (with one variable, the conjugator) and can be solved essentially as one quadratic equation (see 3.1.E), or using the general technique by Rips and Sela [205].
There is an obvious infinite procedure to write down these lists: check all maps $X \to G'$ and $X' \to G$ one by one, and verify which maps induce homomorphisms. The problem is to understand when to stop, that is, when a list contains representatives of all conjugacy classes of homomorphisms. For this, the key idea is to solve systems of equations. Indeed, morphisms $G \to G'$ are in 1:1 correspondence with solutions in $G'$ of the system of equations $r_1 = 1, \dots, r_m = 1$ in the unknowns $X$. The difficulty here is that we want not only that the solution be different from the recorded morphisms, but also that it is not conjugate to any of them. For that, we record only homomorphisms $G \to G'$ that are short, meaning that one cannot shorten the total length of the images of $x \in X$ by "obvious" conjugations in $G'$ (such as cyclic shifts). The key points are that, first, shortness can be interpreted as membership in a regular language (we view a homomorphism $\phi\colon G \to G'$ as one large word containing all words $\phi(x)$, $x \in X$, separated by special symbols); that membership is called a regular constraint, and, second, that the number of short homomorphisms is finite. The fact that some kind of shortness of a homomorphism can be recognized by an automaton is similar to Lemma 3.9 above, which characterizes words in a hyperbolic group without arbitrary bumps as words which do not contain bumps of at most constant length. It is easy to see that for every finite set of words $W$, the set of all words in a finite alphabet that do not contain subwords from $W$ is regular. Thus the certificate for completeness of our list of homomorphisms is unsolvability of the system of equations and inequations subject to the regular constraint described above. Thus one needs to generalize Makanin's Theorem 2.15 and Rips and Sela's Theorem 3.14 to systems of equations and inequations subject to regular constraints in not necessarily torsion-free groups. This has been done in a long series of papers including [152,215,70,62,64,143].
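The regularity claim used above (words avoiding a finite set $W$ of subwords form a regular language) can be made concrete with a small automaton. The following is a minimal sketch, assuming all forbidden words are non-empty; the names `build_dfa` and `accepts` are hypothetical and are not taken from the cited papers. States are the proper prefixes of forbidden words, and a single dead state is represented by `None`.

```python
def build_dfa(alphabet, forbidden):
    """Transition table of a DFA accepting words with no subword in `forbidden`."""
    # States: proper prefixes of forbidden words (the empty prefix is the start).
    prefixes = {""} | {w[:i] for w in forbidden for i in range(len(w))}
    delta = {}
    for s in prefixes:
        for a in alphabet:
            t = s + a
            if any(t.endswith(f) for f in forbidden):
                delta[(s, a)] = None  # dead state: a forbidden subword was read
            else:
                # Longest suffix of t that is again a proper prefix.
                delta[(s, a)] = next(
                    t[i:] for i in range(len(t) + 1) if t[i:] in prefixes
                )
    return delta


def accepts(delta, word):
    """Run the automaton; the dead state None is absorbing."""
    state = ""
    for a in word:
        state = delta[(state, a)]
        if state is None:
            return False
    return True


# Example: freely reduced words in the letters a, A = a^{-1}, b, B = b^{-1}.
delta = build_dfa("aAbB", ["aA", "Aa", "bB", "Bb"])
print(accepts(delta, "abAB"), accepts(delta, "abBa"))
```

The number of states is bounded by the total length of the words in $W$, which is the sense in which the constraint is "recognized by an automaton".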
If the groups G, G ′ may act on simplicial trees without fixed points, the strategy is to find a canonical action on a tree where the vertex stabilizers are in a sense the smallest possible for both G and G ′ (the so-called JSJ-decomposition introduced by Rips and Sela [204]), and then study vertex stabilizers as in the previous paragraph.

3.1.H. More examples of hyperbolic groups. As we have seen, every group having a presentation with the small cancelation condition $C'(1/6)$ is hyperbolic (see 3.1.D). Moreover almost every finitely presented group is hyperbolic [98]. That statement was proved in [98,11,176,102], but although the statements proved there have basically the same formulations, the results are different because of different meanings of the words "almost every", or, more precisely, different choices of the probabilistic model.

3.1.H1. Few relator model.
In the few relator model (also known as the asymptotic density model or the Arzhantseva-Olshanskii model, introduced by Gromov in [98] and Olshanskii in the late 80s), we choose an alphabet of generators $A$ and a number of relations $k$, and then for every $n > 1$ choose $k$ group words $r_1, \dots, r_k$ of length $\le n$ in the alphabet $A$, and consider the group given by the presentation $\langle A \mid r_1 = 1, \dots, r_k = 1\rangle$. Let $N(n)$ be the number of all presentations we get this way, and $N_h(n)$ be the number of presentations of hyperbolic groups among them. Then the quotient $N_h(n)/N(n)$ tends to 1 very rapidly with $n$ (see Arzhantseva-Olshanskii [11]). In fact, as proved in [11], in the few relator model, we can count even the number of presentations with small cancelation $C'(\lambda)$ for any fixed $\lambda > 0$, and the quotient will tend to 1 almost as rapidly. Note that in [11], Arzhantseva and Olshanskii proved that almost all groups having presentations with $m$ generators and $k$ relators have the property that all their $(m-1)$-generator subgroups are free (that strengthened a previous result of Guba about 2-generated subgroups of 1-related groups with at least 4 generators [108]). The same or similar models were used by Arzhantseva and Champetier [7,8,9,47,48], where other generic properties of groups in the few relator model were found.
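The counting statement about $C'(\lambda)$ can be explored numerically. The sketch below is an illustration only, under assumptions that differ slightly from the model in the text: relators are sampled as cyclically reduced words of length exactly $n$ (not $\le n$), a piece is taken to be a common prefix of two distinct elements of the symmetrized set, and all function names are hypothetical.

```python
import random


def inv(w):
    """Inverse of a word; a capital letter is the inverse of the small one."""
    return w[::-1].swapcase()


def random_cyclically_reduced(letters, n, rng):
    """Rejection-sample a uniformly random cyclically reduced word of length n."""
    while True:
        w = []
        for _ in range(n):
            allowed = [x for x in letters if not w or x != w[-1].swapcase()]
            w.append(rng.choice(allowed))
        if w[0] != w[-1].swapcase():
            return "".join(w)


def symmetrize(relators):
    """All cyclic shifts of the relators and of their inverses."""
    sym = set()
    for r in relators:
        for s in (r, inv(r)):
            for i in range(len(s)):
                sym.add(s[i:] + s[:i])
    return sym


def common_prefix_len(u, v):
    k = 0
    for a, b in zip(u, v):
        if a != b:
            break
        k += 1
    return k


def satisfies_c_prime(relators, lam):
    """Check |piece| < lam * |relator| for all pairs of distinct elements."""
    sym = sorted(symmetrize(relators))
    for i, u in enumerate(sym):
        for v in sym[i + 1:]:
            if common_prefix_len(u, v) >= lam * min(len(u), len(v)):
                return False
    return True


def estimate(m, k, n, lam, trials, seed=0):
    """Fraction of sampled k-relator presentations on m generators with C'(lam)."""
    rng = random.Random(seed)
    lower = "abcdefghijklmnopqrstuvwxyz"[:m]
    letters = list(lower + lower.upper())
    hits = 0
    for _ in range(trials):
        rels = [random_cyclically_reduced(letters, n, rng) for _ in range(k)]
        hits += satisfies_c_prime(rels, lam)
    return hits / trials


print(satisfies_c_prime(["abABcdCD"], 1 / 6))
for n in (20, 40, 80):
    print(n, estimate(m=2, k=2, n=n, lam=1 / 6, trials=50))
```

The pairwise comparison is quadratic in the size of the symmetrized set, which is adequate for small experiments; the estimates depend on the seed and are not a substitute for the asymptotic statement of [11].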

3.1.H2. Different lengths model.
If we allow the lengths of different relators to satisfy different length restrictions, we obtain a stronger theorem.
Theorem 3.17 (Formulated by Gromov [98], proved by Olshanskii [176]). Let $A$ be an alphabet, $|A| = k \ge 2$. Let $d \ge 0$ and let $(n_1, \dots, n_d)$ be an increasing sequence of positive integers. Let $N = N(k, n_1, \dots, n_d)$ be the number of group presentations $G = \langle A \mid r_1, \dots, r_d\rangle$ such that $r_1, \dots, r_d$ are reduced group words in the alphabet $A$ and the length of $r_j$ is $n_j$ for $j = 1, 2, \dots, d$. If $N_h$ is the number of infinite hyperbolic groups in this collection, then for every $\varepsilon > 0$ there exists $n = n(\varepsilon, d)$ such that for every choice of $k$ and of $n \le n_1 \le \dots \le n_d$ the quotient $N_h(k, n_1, \dots, n_d)/N(k, n_1, \dots, n_d)$ exceeds $1 - \varepsilon$.
The probabilistic model used in this theorem is very different from the few relator model. In particular, the probability of having a small cancelation presentation tends to 0 in that model (say, if we assume that n d ≈ exp n 1 ).

3.1.H3. Gromov's random groups model.
If we also allow varying the number of relations d, we obtain one of the probabilistic models used by Gromov in constructing his random groups [102] (see also Gromov [99] and Arzhantseva-Delzant [10]).
Here is the definition of Gromov's random groups model. Let $\Gamma = (V, E)$ be an oriented graph and $X$ be a finite set. A labelling of $\Gamma$ is a map $\alpha\colon E \to X \cup X^{-1}$. Let $W$ be the normal subgroup of the free group $\pi_1(\Gamma, p)$ (where $p \in V$) generated by all words that label a loop in $\Gamma$ based at $p$. Let $G_\alpha$ be the factor-group $\pi_1(\Gamma, p)/W$. This map obviously defines a homomorphism $\alpha_*$ from the free group $\pi_1(\Gamma, p)$ to $G_\alpha$, where $p$ is a vertex in $V$. The kernel of that map is generated (as a normal subgroup) by all words that label loops of $\Gamma$. Note that by construction there exists a natural homomorphism $\alpha_+$ from the graph $\Gamma$ to the Cayley graph of $G_\alpha$ induced by $\alpha$. The general questions to ask about $G_\alpha$ are whether $G_\alpha$ is infinite and hyperbolic, whether $\alpha_+$ is "almost an isometric embedding", and whether $\alpha_*$ is injective on a large enough ball in the Cayley graph of the free group $\pi_1(\Gamma, p)$. The set of all labellings $(X \cup X^{-1})^E$ has a natural uniform probability measure, and a random choice of a labelling $\alpha$ gives a random group $G_\alpha$.
In particular, if $\Gamma$ is a disjoint union of a finite number of circles of lengths $n_1, \dots, n_d$, we get a random model similar to the different lengths model, except that the number of relations may be large compared to their lengths. In fact, if the total number of possible labelings is $\exp(Cn)$, where $n$ is the maximal length of a cycle, for some constant $C$, and the $n_i$ do not differ very much from $n$, then $d$ can be chosen as large as $\exp(\alpha C n)$ for some $0 < \alpha < 1$. The number $\alpha$ is called the density. If the density is $> 1/2$, then the random group is trivial with probability close to 1 (assuming that $n \to \infty$), but if $\alpha < 1/2$, then the group is infinite and non-virtually cyclic hyperbolic with probability tending to 1 as $n \to \infty$.
It is important to note that, as with the case of small cancelation presentations, we can impose the random relations not only on the free group, but on any non-virtually cyclic hyperbolic group. Thus if $\Gamma_i$ is a sequence of graphs, we obtain a sequence of random non-virtually cyclic hyperbolic groups and surjective homomorphisms
$$G \to G_1 \to \dots \to G_n \to \dots \tag{5}$$
similarly to 3.1.J. Particularly strong results are obtained in [102] when the graph $\Gamma$ runs over an expander sequence of graphs, that is, $k$-regular finite graphs having the following property: the second highest eigenvalues of their adjacency matrices are bounded away from the highest eigenvalue$^{13}$. If the expander sequence is chosen well enough, the Cayley graph of the inductive limit $\bar G$ of the sequence (5) contains an "almost isometric" copy of the union of the $\Gamma_i$, and so $\bar G$ cannot be coarsely embedded into a Hilbert space, every action of $\bar G$ on a CAT(0) manifold has a global fixed point, etc. For more details see the book by Ollivier [172] and I. Kapovich and Schupp [133].
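The isoperimetric form of the expander condition (footnote 13: $|\partial M|/|M| > \lambda$ for all vertex sets $M$ with $|M| < |V_i|/2$) can be checked by brute force on very small graphs. The sketch below is only an illustration with hypothetical function names; it enumerates all subsets, so it is exponential in the number of vertices.

```python
from fractions import Fraction
from itertools import combinations


def edge_expansion(num_vertices, edges):
    """Minimum of |boundary(M)| / |M| over non-empty M with |M| < num_vertices / 2.

    Vertices are 0, ..., num_vertices - 1; edges is a list of pairs.
    Returns None if there is no admissible subset M.
    """
    best = None
    for size in range(1, (num_vertices + 1) // 2):
        for subset in combinations(range(num_vertices), size):
            inside = set(subset)
            # Edges with exactly one endpoint in M.
            boundary = sum((u in inside) != (v in inside) for u, v in edges)
            ratio = Fraction(boundary, size)
            if best is None or ratio < best:
                best = ratio
    return best


def cycle(n):
    """Edges of the 2-regular cycle graph on n vertices."""
    return [(i, (i + 1) % n) for i in range(n)]


def complete(n):
    """Edges of the complete graph on n vertices."""
    return list(combinations(range(n), 2))


for n in (6, 8, 10, 12):
    print(n, edge_expansion(n, cycle(n)))
print(edge_expansion(4, complete(4)))
```

For cycles the ratio is attained on an arc and decreases to 0 as the number of vertices grows, so the family of cycles is 2-regular but is not an expander sequence in the sense of the footnote.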
3.1.I. Almost all 1-related groups are residually finite. Note that for 1-related groups all three models give the same results. A random 1-relator group with at least 2 generators is infinite and hyperbolic. The following theorem says that these groups have some other nice properties as well at least when the number of generators is ≥ 3.
The proof employs facts from different areas of mathematics: properties of Brownian motions in $\mathbb{R}^k$ [61]; some dynamical properties of polynomial maps over finite fields (existence of periodic quasi-fixed points) [25] and $p$-adic completions of number fields [26]; existence of dense free subgroups in $p$-adic Lie groups [32]; the congruence extension property of subgroups of the free groups generated by sets of words satisfying some small cancelation condition [180]; the fact that every ascending HNN-extension of a free group is coherent [81], and a characterization of 1-related groups with 2 generators that are ascending HNN-extensions of free groups [41]. It also turned out that in Theorem 3.18 one can count not just presentations but groups up to isomorphism (which is obviously harder) due to a very strong result of I. Kapovich, Shpilrain, Schupp [134]. See [209] for more details.
Note that Theorem 3.18 gives many examples of residually finite hyperbolic groups because random 1-related groups are hyperbolic (and even have small cancelation presentations). But we still do not know if all hyperbolic (and even small cancelation) groups are residually finite [98,99]. The general belief is that the answer is "no", and there are many potential counterexamples (see, say, [130]), but it is very hard to show that a particular hyperbolic group is not residually finite because hyperbolic groups typically have many quotients.
3.1.J. Quotients of hyperbolic groups. One of the key properties of hyperbolic groups discovered by Gromov in [98] is that they behave similarly to free groups as far as the multitude of quotients is concerned.
For example, Delzant and Olshanskii independently proved the following.
Theorem 3.19 (Delzant [67], Olshanskii [180]). Every non-virtually cyclic hyperbolic group $G$ is SQ-universal, that is, every countable group embeds into a quotient of $G$.
A version of the following very useful theorem was stated in [98] in a slightly wrong form. It was corrected and proved in full generality in [179]. The formulation we give here is not the strongest one.
$^{13}$ Equivalently, an expander sequence of graphs can be defined as a sequence of $k$-regular graphs $\Gamma_i = (V_i, E_i)$ such that for every subset $M \subset V_i$ of vertices of $\Gamma_i$ with $|M| < |V_i|/2$, $|\partial M|/|M| > \lambda$ for some constant $\lambda > 0$ [144]. Here $\partial M$ is the set of edges with one vertex in $M$ and the other in $V_i \setminus M$.
$^{14}$ A group is called a residually (finite $p$-)group, $p$ a prime, if the intersection of all its subgroups whose indexes are powers of $p$ is trivial.
It is easy to deduce from Theorem 3.20 that every noncyclic torsion-free hyperbolic group has a quotient which is a torsion-free Tarski monster (i.e. it is torsion-free and all proper subgroups are cyclic); every non-virtually cyclic hyperbolic group has an infinite torsion quotient; every noncyclic torsion-free hyperbolic group has an infinite quotient all of whose proper subgroups are finite (see [98]). These groups with "extreme" properties are obtained as inductive limits of hyperbolic groups and surjective homomorphisms $G \to G_1 \to G_2 \to \dots$.
It is also easy to deduce that every two non-virtually cyclic hyperbolic groups $H_1$, $H_2$ have a common non-virtually cyclic quotient [98, 5.5.A] (consider the free product $G = H_1/E(H_1) * H_2/E(H_2)$, where $E(H)$ is the maximal finite normal subgroup of $H$). It was observed by Osin that this implies, by induction, that there exists a finitely generated non-virtually cyclic infinite group $G_\infty$ that is a common quotient of all non-virtually cyclic hyperbolic groups. One can easily see that $G_\infty$ has Kazhdan property (T), is generated by an element of order 2 and an element of order 3, and also by two elements of orders $p$ and $q$ for any two primes $p$, $q$ with $pq \ge 6$ (since $G_\infty$ is a quotient of the free product $\mathbb{Z}/p\mathbb{Z} * \mathbb{Z}/q\mathbb{Z}$). In particular $G_\infty$ is not residually finite, and in fact does not have any non-trivial finite homomorphic image $G_0$ (otherwise $G_0$ would contain an element of order $p > |G_0|$, which is impossible). Note that $G_\infty$ is not unique, and depends, in particular, on how we enumerate all hyperbolic groups. Still, since the set of all finite hyperbolic group presentations is recursively enumerable, we can assume, using [179], that $G_\infty$ has a recursive presentation.
Theorem 3.20 also allows one to construct a finitely generated infinite torsion group (and even with all proper subgroups cyclic), see [98]. It does not allow one to get a bound on the exponents of elements of that group. That was done in papers by Olshanskii [178] (odd exponents) and S.V. Ivanov and Olshanskii [130, Theorem A, Lemma 19] (even exponents) where the following remarkable theorem was proved. That result was the strongest in the very important series of results on the bounded Burnside problem including the celebrated Novikov-Adian solution for odd exponents [171,1], and the extremely involved solution of S.V. Ivanov for even exponents [128].
3.1.K. The small cancelation once more. The main ingredient in the proofs of Theorems 3.19, 3.20 and the results in [102] is small cancelation over hyperbolic groups. The following coarse version of small cancelation from Olshanskii-Osin-Sapir [184] is close to the ones used in papers [67,179,180].

$^{15}$ Recall that $G^k$ denotes the (normal) subgroup of $G$ generated by all $k$th powers of elements of $G$; hence $G/G^k$ has exponent dividing $k$.
Definition 3.22. Let $H$ be a group generated by a set $X$. Let $\mathcal{R}$ be a symmetrized set of reduced words in $X^{\pm 1}$ (that is, a set of words closed under taking cyclic shifts and inverses). For $\varepsilon > 0$, a subword $U$ of a word $R \in \mathcal{R}$ is called an $\varepsilon$-piece if there exists a word $R' \in \mathcal{R}$ such that: (1) $R \equiv UV$, $R' \equiv U'V'$ for some words $V$, $U'$, $V'$; (2) $U' = YUZ$ in $H$ for some words $Y$, $Z$ such that $\max\{|Y|, |Z|\} \le \varepsilon$; (3) $YRY^{-1} \ne R'$ in the group $H$. Note that if $U$ is an $\varepsilon$-piece, then $U'$ is an $\varepsilon$-piece as well.
Definition 3.23. Let $\varepsilon \ge 0$, $\mu \in (0, 1)$, and $\rho > 0$. We say that a symmetrized set $\mathcal{R}$ of words over the alphabet $X^{\pm 1}$ satisfies the small cancelation condition $C(\varepsilon, \mu, \rho)$ over a hyperbolic group $H = \langle X\rangle$, if
($C_1$) All words from $\mathcal{R}$ label geodesics in the Cayley graph of $H$ corresponding to $X$;
($C_2$) $|R| \ge \rho$ for any $R \in \mathcal{R}$;
($C_3$) The length of any $\varepsilon$-piece contained in any word $R \in \mathcal{R}$ is smaller than $\mu|R|$.
Figure 10 shows the difference between the usual small cancelation (see Definition 3.6) and the coarse version of it. The key feature of the coarse version of small cancelation is the coarse Greendlinger Lemma (see [184, Lemma 4.6]). The role of that lemma is very similar to the role of the usual Greendlinger Lemma (Theorem 3.7); see [179,180,184] for more details.
Figure 10. The standard and coarse definitions of a piece.
3.1.L. Combination theorems for hyperbolic groups. Another way to construct new hyperbolic groups is to use various combination theorems. Suppose that a finitely generated group $G$ acts by automorphisms on a simplicial tree $T$ and there are no invariant proper subtrees for this action. From the Bass-Serre theory [218], the group $G$ is a multiple HNN-extension of the stabilizer of a vertex with associated subgroups isomorphic to stabilizers of edges if and only if the action is transitive on vertices; and $G$ is an amalgamated product of stabilizers of two vertices with the stabilizer of an edge as an amalgamated subgroup if and only if the action has two orbits of vertices and one orbit of edges. In general, $G$ can be composed from the stabilizers of vertices using HNN-extensions and amalgamated products. A statement saying that some property $P$ holds for $G$ provided it holds for stabilizers of vertices of the action of $G$ on $T$ (plus some condition on the stabilizers of edges) is called a combination theorem. For example, the statement "if all stabilizers of vertices are torsion-free, then $G$ is torsion-free" is such a theorem.
A very general combination theorem for hyperbolic groups is proved by Bestvina and Feighn [17,18]. There are several other versions proved in [92,136,160] and other papers. Here we present the "greatest common divisor" of all these statements. It was proved in each of these papers, and reflects the nature of all combination theorems.
Theorem 3.24. If G is a hyperbolic group, A and B isomorphic virtually cyclic subgroups, φ : A → B is an isomorphism between these subgroups, then the HNN-extension H = HNN (G; A, B, φ) is hyperbolic if and only if for every g ∈ G the set A ∩ gBg −1 is finite.

3.1.M. The subquadratic isoperimetric inequality. The "if part" of Theorem 3.10 can be strengthened significantly: one can prove that even a quadratic isoperimetric inequality with small enough coefficient of the square implies hyperbolicity [98].
Theorem 3.25 (Olshanskii [177], Bowditch [29], Papasoglu [195]). There exists a universal constant $\upsilon$ such that the following holds. Let $G = \langle X \mid R\rangle$ be a group presentation, $p = \max\{|r| : r \in R\}$. If the Dehn function $f$ of $G$ satisfies $f(n) < \frac{\upsilon}{p^2} n^2$ for all sufficiently large $n$, then $G$ is hyperbolic.
One can actually estimate the universal constant $\upsilon$ by analyzing the proof of Bowditch [28], for example. The estimate would be far from optimal, though. Stefan Wenger informed us that one can take any $\upsilon < \frac{1}{2}$: it follows from his paper [234]. It is an interesting question to find the exact value of $\upsilon$ or at least a good approximation. Note that for the free Abelian group $\mathbb{Z}^2$ with the standard presentation $\langle x, y \mid [x, y] = 1\rangle$, the Dehn function is $f(n) = \frac{1}{16} n^2$ (it essentially follows from 2.6.C) and $p = 4$. So $\upsilon$ cannot be bigger than 1.
Bowditch [29] actually proved a much more general result about all geodesic metric spaces with area functions satisfying Conditions ($B_1$) and ($B_2$) from 2.7.A (the role of $1/p^2$ is played by the constant $k$ from ($B_2$)). Wenger's results in [234] are also much more general and apply to arbitrary geodesic metric spaces as well. In fact he proves that the isoperimetric inequality of the Euclidean plane is optimal in the sense that every geodesic metric space with a stronger isoperimetric inequality (for some natural area function) is hyperbolic. The crucial tool in his proof is metric currents as in 2.7.C.
Theorem 3.26. There are constants $C_1$, $C_2$, and $C_3$ with the following property. Let $\Gamma$ be the Cayley graph of a finitely presented group where the length of every defining relation is at most $d$ and every ball of …
An analogous result for metric spaces is also true (see [98]). The proof in [98] is based on the fact that Theorem 3.25 remains true even if we restrict the assumption to relatively short loops: if every relatively short loop in $\Gamma$ has quadratic area with sufficiently small coefficient, then the group is hyperbolic. This theorem and its variations are very important in the theory of hyperbolic groups. A version of it from [101] is used in the paper by Gromov and Delzant [68] (see also Coulon [60]) to provide possibly the most conceptually easy proof of the theorem of Novikov-Adian [171] and Olshanskii [178]. Another version of Theorem 3.26 for CAT(0) spaces is proved in Bridson and Haefliger [37]. Yet another version was recently proved by Shalom and Tao [216]: it significantly strengthens the polynomial growth result of Gromov [96] (mentioned in 3.2.A).
$^{16}$ Recall that the original Cartan-Hadamard theorem states that the universal cover of a Riemannian manifold of non-positive sectional curvature (a local property) is diffeomorphic to a Euclidean space (a global property).
Theorem 3.27 (Shalom, Tao [216]). For some constant $C$, the following holds for every finitely generated group $G$, and all $d > 0$. If there is some $R_0 > \exp(\exp(Cd^C))$ for which the number of elements in a ball of radius $R_0$ in a Cayley graph of $G$ is bounded by $R_0^d$, then $G$ has a finite index subgroup which is nilpotent (of nilpotency class $< C^d$).
Problem 3.28. It would be interesting to find a common cause of all these (and other) theorems of Cartan-Hadamard type. Perhaps a recent model theoretic approach of Hrushovski will give such a common cause (see [126, Section 7]).
It is observed in [184] that Theorem 3.26 almost immediately implies the following stronger version of Theorem 3.2.
Theorem 3.29 (See [184]). Let G be a finitely presented group with one asymptotic cone an R-tree. Then G is hyperbolic (and hence all asymptotic cones of G are R-trees).
Indeed, if one of the asymptotic cones $\mathrm{Con}^\omega(G, (d_i))$ is an $\mathbb{R}$-tree, then the ball in the Cayley graph of $G$ of radius $d_i$ is $o(d_i)$-hyperbolic for $\omega$-almost all $i$, and it remains to apply Theorem 3.26 since $d_i \to \infty$.
Theorem 3.29 is far from true for infinitely presented groups. The first example of a finitely generated infinitely presented (hence non-hyperbolic) group one of whose asymptotic cones is an $\mathbb{R}$-tree was constructed by Thomas and Velickovic in [228]. Their group is given by an infinite small cancelation presentation such that the lengths of relators form a very lacunary (sparse) sequence of numbers. As an almost direct application of Greendlinger's Lemma (see Theorem 3.7), one can prove that any cone $\mathrm{Con}^\omega(G, (d_i))$ is an $\mathbb{R}$-tree if $d_i$ is far from the lengths of relators, and is not an $\mathbb{R}$-tree if $d_i$ is the length of the relator number $i$. Accordingly, in [184], we introduced a class of groups some of whose asymptotic cones are $\mathbb{R}$-trees, and called these lacunary hyperbolic groups. It is easy to deduce from Theorem 3.26 that every lacunary hyperbolic group is an inductive limit of hyperbolic groups and surjective homomorphisms. A converse statement is also true if one imposes certain restrictions on the "injectivity radii" of these surjective homomorphisms. Tarski monsters, infinite finitely generated torsion groups, etc. can be constructed as lacunary hyperbolic groups. Surprisingly, some amenable (see 5.4.A) non-virtually cyclic groups turned out to be lacunary hyperbolic as well. For more details and results see [184].

3.2.A. The asymptotic cones and the c + 1-problem. Let $G$ be a nilpotent group of class $c$ (that is, the $c$th member $G_c$ of its lower central series is the identity; $G_n$ is defined by $G_0 = G$, $G_{n+1} = [G_n, G]$, $n = 0, 1, \dots$). Every finitely generated nilpotent group is finitely presented. Algorithmic properties of nilpotent groups have been studied extensively, and this class can be considered relatively "tame". Note that the word, conjugacy and even the isomorphism problem are solvable for nilpotent groups (Grunewald, Segal [106,107]). Still some natural algorithmic problems are known to be undecidable in some nilpotent groups. For example, there exists a finitely generated nilpotent group with undecidable endomorphic reducibility problem: given two elements of a group, find out if one of them is an endomorphic image of the other (Romankov [206]). In particular, for some finitely generated nilpotent group, the problem of solvability of systems of equations in that group is undecidable. Indeed, if $G = \langle x_1, \dots, x_m \mid r_1 = 1, \dots, r_n = 1\rangle$ is a finitely presented group, then the endomorphic reducibility of $u$ to $v$ is equivalent to the solvability of the following system of equations with respect to unknowns $w_1, \dots, w_m$:
$$r_1(w_1, \dots, w_m) = 1, \quad \dots, \quad r_n(w_1, \dots, w_m) = 1, \quad u(w_1, \dots, w_m) = v.$$
Among nilpotent groups, particularly nice ones are graded nilpotent groups (sometimes also called homogeneous). Recall that a nilpotent group is called graded if for every $x \in G_k \setminus G_{k+1}$, $y \in G_l \setminus G_{l+1}$, the commutator $[x, y]$ belongs to $G_{k+l} \setminus G_{k+l+1}$ or is equal to 1 (in a non-graded nilpotent group, the commutator $[x, y]$ can belong to any $G_n$, $n \ge k + l$). With every nilpotent group $G$, one can associate a graded nilpotent group $G_{\mathrm{gr}}$ as follows: $G_{\mathrm{gr}}$ is generated by the union of all sets $G_k/G_{k+1}$ subject to all possible relations of the form $[xG_k, yG_l] = [x, y]G_{k+l}$.
The growth function of a nilpotent group is always equivalent to a polynomial and its degree can be computed explicitly using the following formula:
Theorem 3.30 (Guivarc'h [115], Bass [13]). If $G$ is a finitely generated nilpotent group, and $\gamma_S(n)$ its growth function for some generating set $S$, then there are constants $A, B > 0$ such that $A n^d \le \gamma_S(n) \le B n^d$, where $d = \sum_{k \ge 0} (k+1)\,\mathrm{rk}(G_k/G_{k+1})$ and $\mathrm{rk}$ denotes the torsion-free rank.
In fact by the famous result of Gromov [96], groups having nilpotent subgroups of finite index are precisely the finitely generated groups whose growth functions are bounded by a polynomial. The paper [96] was the first paper where asymptotic cones of nilpotent groups were studied. It was proved later by Pansu [194] that all asymptotic cones of any nilpotent group are isometric to a graded Lie group canonically associated to $G$. Here is the description. Let $\mathrm{tor}(G)$ be the finite normal subgroup of $G$ generated by all elements of finite order (the fact that $\mathrm{tor}(G)$ is finite can be found, say, in [116]). The nilpotent group $\bar G = G/\mathrm{tor}(G)$ is without torsion, hence it can be embedded, according to Malcev [153], as a uniform (co-compact) lattice in a nilpotent Lie group. The corresponding graded nilpotent group $\hat G$ is also a Lie group. That group equipped with the so-called Carnot-Carathéodory metric$^{17}$ (see, for example, [100]) is isometric to all asymptotic cones of $G$. In particular, all asymptotic cones of a nilpotent group are simply connected, and by Theorem 2.27, every nilpotent group has polynomial isoperimetric function. That result was strengthened by Gersten [84]: he proved that $G$ admits a polynomial isoperimetric inequality of degree $2^h$, where $h$ is the Hirsch length of $G$, i.e. the sum of torsion-free ranks of all Abelian groups $G_i/G_{i+1}$. The degree was improved to $2 \cdot 3^c$ by Conner [58], and then to $2c$ by Hidber [120]. One can deduce the bound $c + 1 + \varepsilon$ for every $\varepsilon > 0$ by the following argument. Pittet [199] proved that a lattice in a simply connected graded nilpotent Lie group of class $c$ admits a polynomial coarse area function $A_\delta$ of degree $c + 1$. Hence by the result of Pansu [194], every asymptotic cone of any nilpotent group of class $c$ has isoperimetric function $n^{c+1}$. It remains to apply Theorem 2.29 of Papasoglu. It is not clear how to remove $\varepsilon$ from the estimate. The methods of Druţu [73] are based on the fact that the asymptotic cones in her situation were buildings. The first proof of the polynomial bound of degree $c + 1$ was outlined by Gromov in [99, 5.A5]. A complete combinatorial proof was found in [89].
$^{17}$ In order to define that metric, consider the tangent space $T_x$ at every point $x$ of $\hat G$. It is isomorphic to the Lie algebra $\mathfrak{g} = \hat G/\hat G_1 \oplus \hat G_1/\hat G_2 \oplus \dots$ (with the natural commutator product) corresponding to $\hat G$, and we can identify $T_x$ with $\mathfrak{g}$ in a smooth way with respect to $x$. The elements of the first summand, $\hat G/\hat G_1$, generate $\mathfrak{g}$. The corresponding directions in $\hat G$ are called horizontal. A horizontal path on $\hat G$ is any path that has horizontal direction at every point. Every two points of $\hat G$ are connected by a horizontal path, and the length of a shortest such path connecting two points is the Carnot-Carathéodory distance between these two points.
Theorem 3.31 (Gersten,Holt,Riley [89]). Every finitely generated nilpotent group of class c has isoperimetric function that is polynomial of degree c + 1.
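The numerical invariants of this subsection are easy to tabulate for small examples. The sketch below assumes that a group is described only by the list of torsion-free ranks of the factors $G_k/G_{k+1}$, $k = 0, \dots, c-1$ (in the indexing $G_0 = G$ used above), and that the growth degree is given by the Guivarc'h-Bass formula $d = \sum_{k \ge 0}(k+1)\,\mathrm{rk}(G_k/G_{k+1})$; the function names are hypothetical.

```python
def growth_degree(ranks):
    """Degree of polynomial growth: sum of (k + 1) * rk(G_k / G_{k+1})."""
    return sum((k + 1) * r for k, r in enumerate(ranks))


def hirsch_length(ranks):
    """Sum of the torsion-free ranks of all factors G_k / G_{k+1}."""
    return sum(ranks)


def isoperimetric_degree_bounds(ranks):
    """Successive upper bounds on the degree of an isoperimetric function.

    Assumes len(ranks) equals the nilpotency class c.
    """
    c = len(ranks)
    h = hirsch_length(ranks)
    return {
        "Gersten, 2^h": 2 ** h,
        "Conner, 2*3^c": 2 * 3 ** c,
        "Hidber, 2c": 2 * c,
        "Gersten-Holt-Riley, c+1": c + 1,
    }


# The 3-dimensional Heisenberg group: class 2, factors Z^2 and Z.
heisenberg = [2, 1]
print(growth_degree(heisenberg), hirsch_length(heisenberg))
print(isoperimetric_degree_bounds(heisenberg))

# The free Abelian group Z^2: class 1, one factor Z^2.
print(growth_degree([2]), isoperimetric_degree_bounds([2]))
```

For the Heisenberg group this gives growth degree 4 and the chain of bounds 8, 18, 4, 3, which shows how much sharper the bound $c + 1$ of Theorem 3.31 is than the earlier ones even in the smallest non-Abelian case.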
The isoperimetric inequality n c+1 is the best possible bound for some nilpotent groups of class c. For example if G is a free nilpotent group of class c with at least 2 generators, then its Dehn function is polynomial of degree c + 1 (see [15] or [87] for the lower bound and [199] for the upper bound, see also [38] for other examples of nilpotent groups with maximal possible Dehn functions). Moreover, the following general theorem is proved in [15,99]. In particular, Z 2 is the factor-group of the Heisenberg group H 3 by its derived subgroup which contains an infinite cyclic group. Hence we get another proof that the Dehn function of Z 2 is n 2 .
3.2.B. The Heisenberg groups. Nevertheless n c+1 is not the smallest isoperimetric function for many nilpotent groups: Theorem 3.33 (Allcock [3], Olshanskii-Sapir [185]). The 2n+1-dimensional integral Heisenberg groups admit quadratic isoperimetric functions for n > 1; these groups are all nilpotent of class 2. The proof of Theorem 3.33 from [185] is relatively simple but it has some features in common with very complicated computations of Dehn functions (say, [111,238,239]), so we provide some details here.
In order to describe the proof of Theorem 3.33, let $n = 2$, that is, let us consider the group $H^5 = \langle x_1, x_2, y_1, y_2 \mid [x_1, x_2] = [y_1, y_2] = c,\ c \text{ is central},\ [x_i, y_j] = 1\rangle$. Consider a word $w$ in the generators $x_1, x_2, y_1, y_2$. Suppose that $w = 1$ in $H^5$. We need to show that applying the relations of $H^5$ we can reduce the word to 1 so that the number of steps is $O(|w|^2)$. First notice that we can use the commutativity relations $[x_i, y_j] = 1$ and rewrite $w$ as $w_x w_y$, where $w_x$ is a word in $x_1, x_2$ and $w_y$ is a word in $y_1, y_2$. We need $O(|w|^2)$ steps to do that. Now the proof consists of several reductions.
Reduction 1. Note that if $u(x_1, x_2)$ is a product of commutators, then $u(x_1, x_2) = u(y_1, y_2)$ in $H^5$. We prove that using at most $O(|u|^2)$ steps we can transform $u(x_1, x_2)$ to $u(y_1, y_2)$. Therefore we can restrict ourselves to words $u(x_1, x_2)$ in two variables only (but we use the other two generators for auxiliary transformations).
Reduction 2. Every word $u(x_1, x_2)$ can be drawn on the $(x_1, x_2)$-planar square grid: $x_1$ labels horizontal edges, $x_2$ labels vertical edges (similar to 2.1.C). A word $u$ is a product of commutators if and only if it labels a loop. Then it is easy to see that in $H^5$, $u$ is equal to $c^a$ for some $a$ called the symplectic area of the region spanned by the curve. In particular $u$ equals 1 in $H^5$ if and only if its symplectic area is 0.
Reduction 3. Now let us define a symplectic area of any word $u(x_1, x_2)$, not necessarily a product of commutators. Let $k_i$, $i = 1, 2$, be the sum of exponents of $x_i$ in $u$. Then the word $C(u)$, obtained from $u$ by multiplying it by the powers of $x_1$ and $x_2$ that cancel the exponent sums $k_1$ and $k_2$, is a product of commutators. Then the symplectic area of $u$ is, by definition, the symplectic area of $C(u)$.
Reduction 4. A rectangular word $R(p, q)$, where $p, q > 0$ are integers, is, by definition, $[x_1^p, x_2^q]$. Its symplectic area is $pq$. Our goal is, for every word $u(x_1, x_2)$, to reduce the word $C(u)$ to a rectangular word of the same area in a number of steps quadratic in $|u|$. If $u = 1$ in $H^5$, then the rectangular word is empty, and we are done.
Reduction 5. We are using the "division in halves" trick shown on Figure 11. The word $u$ is represented as a product of two subwords $u_1$, $u_2$ with $|u_i| \approx |u|/2$. Note that the loop corresponding to the word $C(u)$ on Figure 11 is decomposed into three loops: the loops corresponding to $C(u_1)$, $C(u_2)$ and a rectangular word $R$ with sides $< |u|$. Then in order to reduce $C(u)$ to a rectangular word, we need to be able to reduce $C(u_1)$ to a rectangular word $R_1$, reduce $C(u_2)$ to a rectangular word $R_2$, and then reduce the product of three rectangular words $R_1 R R_2$ to a rectangular word. The first two tasks can be done arguing by induction on the length of the words. Hence it remains to consider products of rectangular words. That is done by cutting and pasting (using Reduction 1).
Figure 11. The division in halves trick.
Note that Reduction 1 is crucial in this proof. Here we use the fact that $H_5$ has two extra generators $y_1, y_2$ which we use as temporary storage. Say, in order to transform $u(x_1, x_2)v(x_1, x_2)$ to $v(x_1, x_2)u(x_1, x_2)$, where $u, v$ are products of commutators, we first transform $v(x_1, x_2)$ into $v(y_1, y_2)$, then use commutativity of generators to transform $u(x_1, x_2)v(y_1, y_2)$ to $v(y_1, y_2)u(x_1, x_2)$, and then transform $v(y_1, y_2)$ back into $v(x_1, x_2)$. This transformation is used in many places in the proof.
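The symplectic area from Reductions 2 and 4 can be illustrated with a small sketch. It assumes that a word is stored as a list of (generator, exponent) pairs, that the commutator $[a, b]$ is written as $a b a^{-1} b^{-1}$, and that the area is taken as the discrete integral of $x\,dy$ along the grid loop; these conventions, and the function names, are chosen here for illustration and may differ from those of [185] by a sign.

```python
def rectangular_word(p, q):
    """R(p, q) = [x1^p, x2^q], written as x1^p x2^q x1^-p x2^-q (assumed convention)."""
    return [(1, p), (2, q), (1, -p), (2, -q)]


def symplectic_area(word):
    """Signed area enclosed by the grid loop labelled by `word`.

    `word` is a list of (generator, exponent) pairs with generator 1 or 2.
    Generator 1 moves horizontally, generator 2 moves vertically.
    """
    x = y = 0
    area = 0
    for gen, exp in word:
        if gen == 1:
            x += exp             # horizontal move: contributes nothing to the integral of x dy
        elif gen == 2:
            area += x * exp      # vertical move of length exp at horizontal position x
            y += exp
        else:
            raise ValueError("generator must be 1 or 2")
    if (x, y) != (0, 0):
        raise ValueError("word does not label a loop (it is not a product of commutators)")
    return area


print(symplectic_area(rectangular_word(3, 4)))
print(symplectic_area(rectangular_word(2, 3) + rectangular_word(1, 5)))
```

Since each rectangular word returns to the origin, the area of a product of rectangular words is the sum of the areas, which is the bookkeeping behind the cutting and pasting in Reduction 5.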

3.2.C.
Other central products. The central powers of nilpotent groups of class 2 have very small Dehn functions. The following result is mentioned in our paper with A. Olshanskii [185]. Its proof is essentially the same as the proof for the Heisenberg group $H_{2n+1}$. Another proof is given in R. Young [237]. Young's approach is in a sense a combination of approaches from [3] and [185]. We were unable to show that the central square of every nilpotent group of class 2 has quadratic isoperimetric function. More concretely, let $FN(2, 10)$ be the free nilpotent group of class 2 generated by $a_1, b_1, \ldots, a_5, b_5$ (so that $[[x, y], z] = 1$, where $x, y, z \in \{a_1, \ldots, a_5, b_1, \ldots, b_5\}$). Let $G_{10}$ be the factor-group of $FN(2, 10)$ by the central cyclic subgroup generated by $[a_1, b_1] \cdots [a_5, b_5]$. Then we were not able to prove that the central square of $G_{10}$ has quadratic Dehn function (this example and the question can be found in [237, Section 5]).

3.2.D.
Wenger's result. Recently S. Wenger [235] showed that in fact central squares of nilpotent groups of class 2 often do not admit a quadratic isoperimetric inequality. In particular, he proved
Theorem 3.36 (Wenger [235]). The Dehn function of the central square of $G_{10}$ is strictly greater than quadratic.
The idea of the proof is the following. Let $H$ be the central square of $G_{10}$. Suppose that $H$ has quadratic Dehn function. Then by Theorem 2.30, in the asymptotic cone of $H$, every Lipschitz loop $\gamma$ must bound an integral 2-current of area at most $c|\gamma|^2$ for some constant $c$. Wenger gets a contradiction by finding a loop that does not bound an integral 2-current at all. In the proof, he essentially uses Pansu's description of asymptotic cones of nilpotent groups as nilpotent Lie groups with a Carnot-Carathéodory metric (see [194] and 3.2.A). Assuming that his loop bounds an integral 2-current $\zeta$, he approximates $\zeta$ by singular Lipschitz chains. He then uses the fact that every Lipschitz map from a disc $D^2$ to a Carnot-Carathéodory space is differentiable almost everywhere (Pansu [194]). Thus Wenger obtained the first example of a finitely generated nilpotent group whose Dehn function is not equivalent to a polynomial (compare with Guivarc'h-Bass' Theorem 3.30 about the growth functions).
3.2.E. Other nilpotent groups. Note also that R. Young [238] constructed nilpotent groups of arbitrary class with quadratic Dehn functions.
3.3. Semihyperbolic groups. Semihyperbolic groups were defined by Gromov [98] and with more precision by Alonso and Bridson [4]. Let $X$ be a metric space. Let $A \ge 1$, $B \ge 0$ be two numbers. A discrete $(A, B)$-path in $X$ is any $(A, B)$-quasi-isometry $f$ from an interval of natural numbers $[0, T_f]$ into $X$. We extend $f$ to the whole of $\mathbb{N}$ by $f(n) = f(T_f)$ for $n \ge T_f$. Suppose that for some $A, B$ we can choose a discrete $(A, B)$-path $p_{a,b}$ between any pair of points $a, b \in X$ in such a way that for every four points $a, b, a', b'$ and every $t \in \mathbb{N}$ we have the combing condition:
(6) $\mathrm{dist}(p_{a,b}(t), p_{a',b'}(t)) \le C_1 \max(\mathrm{dist}(a, a'), \mathrm{dist}(b, b')) + C_2$
for some constants $C_1, C_2$. Then we say that $X$ is semihyperbolic. It is easy to prove that every hyperbolic metric space is semihyperbolic (exercise: prove that using asymptotic cones) and that a metric space that is quasi-isometric to a semihyperbolic space is itself semihyperbolic. A finitely generated group $G$ is called semihyperbolic if its Cayley graph with respect to some (equivalently, any) finite generating set is semihyperbolic and $p_{ga,gb} = g p_{a,b}$ for every $g, a, b \in G$. In that case we can consider the language of words labeling the paths $p_{1,b}$ for all $b \in G$. If that language is regular, then the semihyperbolic group is called bi-automatic. If the language is regular, but we only assume that the combing property (6) holds provided $a = a'$, then the group is called automatic [78].

3.3.A.
CAT(0)-spaces. Let $X$ be a geodesic metric space, $ABC$ be a geodesic triangle in $X$ with side lengths $p, q, r$. We can construct the triangle $A'B'C'$ in the Euclidean space $\mathbb{R}^2$ with the same side lengths. If $a, b$ are points on the sides $AB$, $BC$ of the triangle $ABC$, let $a', b'$ be the corresponding points on the sides $A'B'$ and $B'C'$ of $A'B'C'$. If for every choice of the triangle $ABC$ and points $a, b$ we have $\mathrm{dist}(a, b) \le \mathrm{dist}(a', b')$, then $X$ is called a CAT(0) space [99,37]. In every CAT(0) space there exists a unique geodesic $p_{x,y}$ connecting any two points $x, y$. These geodesics satisfy the combing property (6) (see [37]). Hence every CAT(0) metric space is semihyperbolic. The class of CAT(0) spaces includes all hyperbolic spaces, complete simply connected Riemannian manifolds all of whose sectional curvatures are non-positive (that is, Hadamard manifolds), including symmetric spaces of non-compact type, and Euclidean buildings (see [42]). Every group acting properly discretely and co-compactly on a CAT(0)-space is semihyperbolic. This implies that many non-hyperbolic groups are semihyperbolic. For example:
(1) All Coxeter groups [4] (the fact that these groups are automatic was proved in [40]).
(2) Right angled Artin groups (footnote 18) and many other Artin groups (see [49]), including the direct product of two free groups (see 2.3.C).
(3) The mapping class group (footnote 19) of a surface with genus $g$ and $n$ punctures provided $3g - 3 + n \ge 2$ (the fact that these groups are automatic was proved by Mosher [165], the bi-automaticity was proved by Hamenstädt [117]).
(4) All co-compact lattices in $SL_n(\mathbb{R})$ (and many other Lie groups) because the corresponding symmetric space $SL_n(\mathbb{R})/SO_n(\mathbb{R})$ is CAT(0).
(5) All lattices in $SO_{n,1}(\mathbb{R})$ [37].
3.3.B. Semihyperbolic metric spaces have quadratic isoperimetric inequality. Indeed, consider any loop $\gamma$ of length $l$ and a point $a$ on that loop. Subdivide the loop into a sequence of subarcs of length $\delta$ (for some constant $\delta$) by points $b_1, \ldots, b_n$, and connect $a$ with $b_1, \ldots, b_n$ by the $(A, B)$-quasi-geodesics $p_{a,b_i}$. The length of each of these quasi-geodesics is at most $O(l)$. Then by the combing property (6) for every $t \in \mathbb{N}$ the distance between $p_{a,b_i}(t)$ and $p_{a,b_{i+1}}(t)$ is bounded by some constant $C = C(\delta)$. Thus we obtain a collection of points $p_{a,b_i}(t)$, $t \in \mathbb{N}$, containing the points $b_1, \ldots, b_n$, having mesh $O(1)$. The number of points in that set is at most $O(l^2)$. Hence the Gromov coarse area function $A_\delta(l)$ is bounded by $O(l^2)$ for a sufficiently large $\delta$.
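The count behind this estimate can be spelled out as follows. The symbol $T$ (the largest parameter needed by any of the chosen paths) is introduced here only for illustration, and the constants are not optimized; this is a sketch of the bookkeeping, not a replacement for the argument above.

```latex
\[
  n \le \frac{l}{\delta} + 1, \qquad
  T := \max_{1 \le i \le n} T_{p_{a,b_i}} = O(l), \qquad
  \#\bigl\{\, p_{a,b_i}(t) : 1 \le i \le n,\ 0 \le t \le T \,\bigr\}
  \le n\,(T+1) = O(l^2).
\]
\[
  \operatorname{dist}\bigl(p_{a,b_i}(t),\, p_{a,b_i}(t+1)\bigr) \le A + B, \qquad
  \operatorname{dist}\bigl(p_{a,b_i}(t),\, p_{a,b_{i+1}}(t)\bigr) \le C(\delta).
\]
```

So the loop $\gamma$ is subdivided into $O(l^2)$ quadrilaterals with vertices $p_{a,b_i}(t)$, $p_{a,b_i}(t+1)$, $p_{a,b_{i+1}}(t+1)$, $p_{a,b_{i+1}}(t)$, each of perimeter at most $2(A + B) + 2C(\delta)$.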
A very similar proof shows that automatic groups have quadratic isoperimetric functions too.
Proof. Let $G$ be a semihyperbolic group, $u, v \in G$. Suppose that $x^{-1}ux = v$ for some $x \in G$, and the length $|x|$ of $x$ is the smallest possible. Clearly it is enough to bound $|x|$ by some recursive function of $|u| + |v|$. We show that one can take an exponential function $\exp(c(|u| + |v|))$ for some constant $c$. Indeed, let $x = a_1 a_2 \ldots a_m$ be a product of generators and their inverses. Note that in the Cayley graph of $G$ we have two paths $p_1, p_2$ labeled by $x$, $p_1$ connecting 1 and $x$, and $p_2$ connecting $u$ and $ux$. Since $xv = ux$, the initial points of $p_1$ and $p_2$ are at distance $|u|$ from each other and the terminal points are at distance $|v|$ from each other. Then, by the combing property (6), there exist paths $q_i$, $i = 1, \ldots, m$, connecting $a_1 \ldots a_i$ with $u a_1 \ldots a_i$, of length $\le C(|u| + |v|)$ for some constant $C$. If $m$ is larger than the number of different words in the generators of $G$ of length at most $C(|u| + |v|)$ (which is at most $\exp(c(|u| + |v|))$ for some constant $c$), then two of these paths, say $q_i$, $q_{i+j}$, have the same labels. Then let $y = a_1 \ldots a_i a_{i+j+1} \ldots a_m$. It is easy to see that $y^{-1}uy = v$, which contradicts the minimality of $x$.
18 A right angled Artin group is a group given by a presentation of the form $\langle a_1, \ldots, a_n \mid a_i a_j = a_j a_i$ for some $i, j \in \{1, \ldots, n\} \rangle$.
19 The mapping class group of a surface $S$ is the group of all isotopy classes of orientation preserving self-homeomorphisms of $S$. This group is finitely presented for every $S$.
Geometrically this proof can be described as follows. Cut a minimal annular diagram $\Delta$ by a simple path $p$ connecting the boundary components. Using the combing property (6), for every vertex of $p$ we can find a (non-null-homotopic) loop in $\Delta$ based at that point and having length at most $C$ times the sum of lengths of the boundary components. We can assume that these loops do not cross each other. If $p$ is too long, two of these loops will have the same label. Cutting off the annular subdiagram bounded by these two loops from $\Delta$, we obtain another, smaller, annular van Kampen diagram with the same boundary labels (see also the proof of Lemma 4.8 below).

Dehn functions and algorithmic properties of groups
4.1. The zoo of groups with quadratic Dehn functions. We have already mentioned that automatic and semihyperbolic groups have quadratic isoperimetric functions. In fact the set of finitely presented groups with quadratic isoperimetric functions is quite diverse.
[…] finitely generated nilpotent group of class 2, some finitely generated nilpotent groups of arbitrary nilpotency class (Allcock [3], Olshanskii-Sapir [185], Young [237,238]).
(4) All extensions of finitely generated free groups by cyclic groups (Bridson, Groves [36]).
(5) The Stallings group [224] (the kernel of the homomorphism from the direct product of three free groups of rank 2 to $\mathbb{Z}$ that sends all generators to the generator of $\mathbb{Z}$) (Dison, Elder, Riley, Young [71]) (footnote 21).
(6) The groups $SL(n, \mathbb{Z})$, $n \ge 5$ (Young [239]).
(7) The R. Thompson group $F$ (Guba [111]) (footnote 22).
The proofs of the quadratic isoperimetric inequality for these classes of groups are quite complicated and quite different.
4.1.A. Druţu's groups from Part (1) of the theorem are uniform lattices in solvable Lie groups of class 2. She proves the quadratic isoperimetric inequality for these Lie groups using asymptotic cones (these are horoballs in direct products of Euclidean buildings) and Theorem 2.29.
4.1.B. Nilpotent groups from part (2) have been discussed already in Section 3.2.
20 A group $G$ is called polycyclic if it has a series of subnormal subgroups $G = G_0 ⊳ G_1 ⊳ \ldots ⊳ G_n = \{1\}$ with cyclic factors $G_i/G_{i+1}$.
21 Note that Stallings' group was the first example of a finitely presented group which is not of type $FP_3$. Thus by [71] and previous results of Riley [203] and Papasoglu [196] this group is the first example of a finitely presented group with simply connected but not 2-connected asymptotic cones.
22 The R. Thompson group $F$ consists of all piecewise-linear increasing bijective functions $[0, 1] \to [0, 1]$ whose derivatives have finitely many dyadic break points and the slopes of linear pieces are powers of 2. The operation is composition of functions. That remarkable group has a presentation with two generators and two defining relations [46].
4.1.C. The proof for free-by-cyclic groups is very long and complicated. Here is a short description of that proof sent to us by Daniel Groves.
Let $G = F_n \rtimes_\varphi \mathbb{Z}$ be a free-by-cyclic group, with its natural HNN-presentation (stable letter $t$, and one relation for each free generator $x_i$ expressing that conjugation by $t$ acts as $\varphi$). Let $\Delta$ be a minimal diagram over this presentation. Consider the maximal $t$-bands in $\Delta$. By Lemma 2.23, every maximal $t$-band starts and ends on the boundary of $\Delta$. Every HNN-cell has a top side (corresponding to a letter $x_i$) and a bottom side (corresponding to $\varphi(x_i)$). Passing from the bottom to the top corresponds to applying the automorphism $\varphi$.
[…], so its Dehn function is linear. It was proved in [78] that $SL_3(\mathbb{Z})$ has exponential Dehn function. The fact that the Dehn function of $SL_n(\mathbb{Z})$ is at most exponential for every $n$ was stated in Gromov [99, 5.A7] and fully proved by Leuzinger [142]. The book [78] contains a statement attributed to Thurston that for $n \ge 4$ the Dehn function is quadratic. Recently Robert Young proved in [239] that the Dehn function is quadratic for $n \ge 5$. The case $n = 4$ is still open.
To explain Young's proof, we need some preliminaries. Note that $SL_n(\mathbb{R})$ is quasi-isometric to the symmetric space $SL_n(\mathbb{R})/SO_n(\mathbb{R})$, and the word metric in $SL_n(\mathbb{Z})$ is quasi-isometric to the restriction of the Riemannian metric on $SL_n(\mathbb{R})$ to $SL_n(\mathbb{Z})$ by Lubotzky-Mozes-Raghunathan [145]. The action of $SL_n(\mathbb{Z})$ on the symmetric space $SL_n(\mathbb{R})/SO_n(\mathbb{R})$ is not co-compact. The typical case is when $n = 2$. In that case the symmetric space $SL_2(\mathbb{R})/SO_2(\mathbb{R})$ is just the hyperbolic plane $\mathbb{H}^2$, say the upper half of the complex plane with the hyperbolic metric. The group $SL_2(\mathbb{R})$ acts on $\mathbb{H}^2$ by the Möbius transformations:
$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot z = \dfrac{az + b}{cz + d}.$
Then the action of $SL_2(\mathbb{Z})$ on $\mathbb{H}^2$ has a fundamental domain which is an ideal triangle (with one vertex at infinity). Consider the set $H_\infty$ of all $z \in \mathbb{H}^2$ with imaginary part at least 1 (the number 1 can be replaced by any real number $> 0$). It is a horoball in $\mathbb{H}^2$. Note that $H_\infty$ is stable under the action of the subgroup of $SL_2(\mathbb{R})$ consisting of upper unitriangular matrices, that is, matrices of the form $\begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}$.
The orbit of $H_\infty$ under the action of the group $SL_2(\mathbb{Z})$ consists of horoballs too. The union of these horoballs is the thin part of the symmetric space $\mathbb{H}^2$. The complement of the thin part is the thick part. The thick part is clearly invariant under the action of $SL_2(\mathbb{Z})$ and the action is co-compact. A similar picture exists if $n \ge 3$. The symmetric space $SL_n(\mathbb{R})/SO_n(\mathbb{R})$ has a horoball $H_\infty$. The union of the horoballs $SL_n(\mathbb{Z}) \cdot H_\infty$ is called the thin part of the symmetric space, its complement is the thick part, and the action of $SL_n(\mathbb{Z})$ on the thick part is co-compact. Thus it is natural to consider "equivalent objects": the group $SL_n(\mathbb{Z})$ and the thick part of the corresponding symmetric space.
These preliminaries should be enough to understand a short description of the proof from [239] sent to us by Robert Young. Every word $w$ that is equal to 1 in $SL_n(\mathbb{Z})$ corresponds to a loop in the symmetric space $SL_n(\mathbb{R})/SO_n(\mathbb{R})$. The basic idea of the proof is to use fillings of loops in the symmetric space $SL_n(\mathbb{R})/SO_n(\mathbb{R})$ (which, being semihyperbolic, has quadratic area function) as templates for fillings of loops in the Cayley graph of $SL_n(\mathbb{Z})$. Fillings which lie in the thick part of $SL_n(\mathbb{R})/SO_n(\mathbb{R})$ correspond directly to fillings in $SL_n(\mathbb{Z})$, but in general, an optimal filling of a curve which is contained in the thick part may have to go deep into the thin part. The proof uses both geometric and algebraic points of view. Different horoballs in the thin part correspond to different parabolic subgroups of $SL_n(\mathbb{Z})$ (each of these subgroups is conjugate to some group of block upper triangular matrices), so one key step was to develop geometric techniques to cut the quadratic filling of a loop in the symmetric space into pieces (subdiscs) with each piece lying in one horoball.
The word corresponding to the boundary of each piece is a product of three words, each of which represents an element of the parabolic subgroup (these elements are determined by the geometry of the filling). Each of these three words can be written as a product of a block diagonal matrix and a unitriangular matrix. The unitriangular matrix can be written as a product of boundedly many elementary matrices (one non-zero off-diagonal entry) where the off-diagonal entry may be of exponential size. The elementary matrices can be replaced by relatively short (linear length) words in $SL_n(\mathbb{Z})$ (shortcuts) using the technique from Lubotzky-Mozes-Raghunathan [145]. For this purpose Young includes the elementary matrices in some solvable subgroups with quadratic Dehn functions where the cyclic subgroups generated by these elements are exponentially distorted. This is not very mysterious. We have already seen (see 2.6.C) that the cyclic group generated by $a$ is exponentially distorted in $BS(1, 2)$. The group $BS(1, 2)$ is metabelian and has the following matrix representation:
$b = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}, \quad a = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$
Note that $a$ is an elementary matrix in this representation. Of course as we have seen (Example 2.22), the Baumslag-Solitar group has exponential Dehn function. But Druţu [73], and later Cornulier and Tessera [59] have shown that metabelian groups of higher ranks can have quadratic Dehn functions.
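The exponential distortion of $\langle a \rangle$ in this matrix representation can be checked directly. The following sketch uses exact rational arithmetic; the helper names (`matmul`, `matpow`, `conjugate_a_by_b_power`) are hypothetical and chosen only for this illustration. It verifies that $b a b^{-1} = a^2$ and, more generally, that the word $b^n a b^{-n}$ of length $2n + 1$ equals $a^{2^n}$.

```python
from fractions import Fraction


def matmul(m, n):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def matpow(m, k):
    """k-th power of a 2x2 matrix for an integer k >= 0."""
    result = ((1, 0), (0, 1))
    for _ in range(k):
        result = matmul(result, m)
    return result


A = ((1, 1), (0, 1))                    # the elementary matrix a
B = ((2, 0), (0, 1))                    # the diagonal matrix b
B_INV = ((Fraction(1, 2), 0), (0, 1))   # inverse of b over the rationals


def conjugate_a_by_b_power(n):
    """Return the matrix of the word b^n a b^-n."""
    return matmul(matmul(matpow(B, n), A), matpow(B_INV, n))


for n in range(1, 6):
    upper_right = conjugate_a_by_b_power(n)[0][1]
    print(f"word length {2 * n + 1}: b^{n} a b^-{n} = a^{upper_right}")
```

A word of linear length thus represents an elementary matrix with an exponentially large off-diagonal entry, which is the kind of shortcut described above.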
The key property of Young's solvable subgroups is the freedom of choosing them. This is similar to having two commuting copies of the Heisenberg group $H_3$ in $H_5$ in the proof from 3.2.B, and this is why Young's proof does not work in $SL_4(\mathbb{Z})$: there is simply not enough room to place several commuting solvable subgroups with quadratic Dehn functions. The shortcuts allow Young to reduce the problem of filling the original loop to the filling problem for loops in some subgroups of block-diagonal matrices (which are just direct products of matrix groups of smaller sizes). Repeatedly applying these geometric and combinatorial techniques breaks the original word into pieces which lie in smaller and smaller subgroups of $SL_n(\mathbb{Z})$ and ultimately yields a quadratic filling.
4.1.F. The result that the R. Thompson group $F$ has quadratic Dehn function [111] was quite unexpected. It was conjectured by Gersten [88] that the Dehn function of $F$ is exponential. It was proved to be at most $\exp(\log^2 n)$ in our paper with Guba [112]. For some time we thought that $\exp(\log^2 n)$ is equivalent to the Dehn function of $F$ because we had a concrete sequence of loops in the Cayley graph of $F$ for which we could not find fillings with smaller areas. A big breakthrough was achieved by Guba in [109] where a polynomial isoperimetric function was found. Still it was hard to believe that the Dehn function is quadratic because the polynomial fillings used in [109] seemed optimal.
Recall that the R. Thompson group $F$ has the infinite presentation
(7) $\langle x_0, x_1, x_2, \ldots \mid x_i^{-1} x_j x_i = x_{j+1},\ i < j \rangle.$
It is also generated by $x_0, x_1$ subject to 2 defining relations. It turned out that instead of dealing with a finite presentation of $F$ one is better off working with the infinite presentation obtained from (7) by considering only those relations where $j - i \le 5$. It is not difficult to prove that this (infinite) presentation defines $F$ and the quadratic isoperimetric inequality for this presentation is equivalent to a quadratic inequality for a finite presentation of $F$. (In defining an isoperimetric inequality for the infinite presentation, one needs to modify the notion of length of a word; Guba uses a notion of complexity of a word which is the length plus the difference between the highest and the lowest indices of letters occurring in the word.)
Guba's proof is in a sense similar to the proof from [185] outlined in 3.2.B. He also introduces normal forms of elements in $F$ (in the case of $H_5$ these were the rectangular words $R(p, q)$), and considers the process of reducing a word to its normal form rather than reducing a word that is equal to 1 in $F$ to the empty word. Using a trick similar to the "division in halves trick" from 3.2.B, Guba reduces the problem to finding the filling area of words of the form $uv$ where $u, v$ are normal forms. These are called the triangular words in [111]. He then transforms these words to words of different shapes (rectangular, vertical, horizontal). In order to do that Guba uses various elaborate cutting and pasting techniques. Some of these amount to dividing a word into three parts which are of approximately (but not quite) equal lengths. Instead of three pieces as in Figure 11, Guba gets 8 pieces, and a recurrent formula for the isoperimetric function $f(n)$ in which $f(n/3)$ appears with coefficient 8. The fact that this coefficient is smaller than $9 = 3^2$ plays a crucial role.
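To see why a coefficient below 9 matters, consider the following toy computation. It assumes a hypothetical recurrence $f(n) = a \cdot f(n/3) + n^2$ with $f(1) = 1$; the additive term $n^2$ and the initial value are assumptions made for this sketch and are not taken from [111]. The function (whose name is also hypothetical) returns $f(3^k)/9^k$, that is, $f(n)/n^2$ for $n = 3^k$.

```python
from fractions import Fraction


def normalized_filling(a, k):
    """Return f(3^k) / 9^k, where f(1) = 1 and f(n) = a * f(n / 3) + n^2.

    Dividing the recurrence by 9^k gives r_k = (a / 9) * r_{k-1} + 1 with r_0 = 1.
    """
    ratio = Fraction(1)
    for _ in range(k):
        ratio = Fraction(a, 9) * ratio + 1
    return ratio


for k in (1, 5, 20, 40):
    print(k, float(normalized_filling(8, k)), float(normalized_filling(9, k)))
```

With $a = 8$ the ratio $f(n)/n^2$ stays below 9, so $f$ is quadratic under these assumptions, while with $a = 9$ the ratio equals $k + 1 = \log_3 n + 1$ and $f$ grows like $n^2 \log n$.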
It turns out that the reduction procedure described in [111] does not always decrease the complexity of a word. In order to make increases rare enough, one needs to cut the words into three parts of not quite equal lengths, and the required freedom is allowed by the fact that $8 < 9$.
4.1.G. Rips' question. Note that the following problem formulated by Rips in 1994 is still open.

Problem 4.2 (Rips). Is F automatic?
It is known [112] that F admits a regular language of (unique) normal forms. But that set of normal forms does not satisfy the combing property (6).
Also note that in [110], Guba proved that the simple R. Thompson groups $T$ and $V$ [46] have polynomial isoperimetric functions (these groups are finitely presented by [121]). Exact Dehn functions of these groups are unknown. Simple finitely presented groups with quadratic Dehn functions (and even bi-automatic) are known (see Burger and Mozes [44,45]).
For polycyclic (in particular, nilpotent) groups, solvability of the conjugacy problem was proved by Formanek [82] using representation of these groups by matrices.

4.2.B.
For finitely presented metabelian groups, the solvability of the conjugacy problem was proved by Noskov [170] using the Magnus representation of metabelian groups by matrices over commutative rings, and some very non-trivial commutative algebra (finding groups of units of commutative rings, in particular).
4.2.C. For free-by-cyclic groups it was proved by Bogopolski, Martino, Maslakova and Ventura [23] using some deep properties of the dynamics of free group automorphisms, in particular, Bestvina-Feighn-Handel train tracks [19].

4.2.D.
For the Stallings group, the solvability of the conjugacy problem is proved by Bridson in [34]. He applies the following nice result (we weaken the formulation a little bit in order not to introduce new terminology).
Theorem 4.4 (Bridson [34]). Let G be a bi-automatic group and let N ⊳ G be a normal subgroup. If the membership problem for finitely generated subgroups of G/N is solvable, then N has a solvable conjugacy problem.
Recall that the input of the membership problem for finitely generated subgroups of a group $G$ is a tuple of elements of $G$ (given as words in generators) $u_1, \ldots, u_n, v$. The output is "yes" if $v$ belongs to the subgroup generated by $u_1, \ldots, u_n$, and "no" otherwise.

4.2.E.
For the group $SL(n, \mathbb{Z})$, the solvability of the conjugacy problem was proved independently by Grunewald [105] and Sarkisyan [212]. The diversity of methods used in these papers is reflected by the following paragraph taken from Roger Lyndon's review of [105] in Math Reviews: "The arguments, given explicitly, are long and intricate but appear for the most part to use hardly more than standard ideas from the theory of linear groups and their modules, from number theory, and, from the theory of groups, coset representations and the Reidemeister-Schreier theorem."
4.2.F. The conjugacy problem in the R. Thompson group $F$ was first solved by Guba and myself in [113] using the diagram group representation of $F$. More recently a different solution was found by Belk and Matucci [16].
4.2.G. Rips' problem. These results justify the following general problem which was formulated in 1994 as a conjecture (before most of the above results confirming this conjecture appeared):
The fact that this problem is difficult is shown by the diversity of methods used in proving solvability of the conjugacy problem mentioned above, and the following "quasi-proof" from our paper with A. Olshanskii [191].
In order to formulate the statement, we shall need the well known constructive version of a limit of a sequence of numbers. In fact we are going to use that definition only in the case when the limit is 0.
Definition 4.6. Let $g : \mathbb{N} \to \mathbb{R}$ be a function. We say that the constructive limit of $g(n)$ as $n \to \infty$ is 0 if for every integer $A > 0$ there exists $N = N(A)$ such that for every $n > N$, $|g(n)| \le 1/A$, and the function $N(A)$ is recursive. In that case we shall write $\lim^c_{n\to\infty} g(n) = 0$. It is easy to see that $\lim^c_{n\to\infty} g(n) = 0$ if and only if there exists an increasing recursive function $f(n)$ such that $g(k) \le \frac{1}{n}$ for every $k \ge f(n)$.
Quasi-Theorem 4.7. Let $d(n)$ be the Dehn function of a finite group presentation $P$. Suppose that $\lim^c_{n\to\infty} \frac{d(n)}{n^2 \log n} = 0$. Then $P$ has decidable conjugacy problem.
Proof. We shall need the following lemma.
Lemma 4.8. Let ∆ be a minimal annular diagram with boundary components p, p ′ over a finite group presentation P . Let x be a shortest path connecting p and p ′ . Then the area of ∆ is at least C|x| log |x| for some constant C depending on P .
Proof. Consider the following construction. Let $p_0 = p$ (considered as a cyclic path) be the inner contour of the diagram $\Delta$. Suppose that we have constructed a cyclic path $p_i$ surrounding the hole of the diagram in $\Delta$ such that $p_i$ does not have common vertices with $p'$. Let $K_i$ be the annulus bounded by $p_0$ and $p_i$. Let $M_{i+1}$ be the set of cells of $\Delta$ outside $K_i$ that have common vertices with $p_i$. Then let $K_{i+1}$ be the minimal annular subdiagram of $\Delta$ with simple contours that contains $K_i$ and all cells from $M_{i+1}$. Let $p_{i+1}$ be the outer contour of $K_{i+1}$ (the inner contour of $K_{i+1}$ is $p = p_0$).
It follows that every edge of the path $p_{i+1}$ belongs to the contour of one of the cells of $M_{i+1}$. Hence every vertex of $p_{i+1}$ can be connected with a vertex of $p_i$ by a path such that (0) the length of the path is bounded by a constant, […]
From (1), it follows that the number of subdiagrams $K_i$ is $O(|x|)$. Furthermore, more than a half of the paths $p_i$ have length at least $\log_c |x|$ where $c$ is, say, four times the number of letters in the alphabet of the presentation $P$. Indeed, otherwise we would have two paths $p_i$ and $p_j$, $i \ne j$, with the same labels, and we could remove the annular subdiagram between $p_i$ and $p_j$, reducing the area of $\Delta$ (that would contradict the minimality of $\Delta$); see an analogous trick in the proof of Theorem 3.38 above.
From (2), it follows that at least half of the subsets $M_i$ contain at least $O(\log |x|)$ cells each. Since these sets do not intersect, the number of cells in $\Delta$ is at least $O(|x| \log |x|)$.
Now the "quasi-proof" of the Quasi-Theorem 4.7 proceeds as follows. Suppose that $P$ is a finite presentation with undecidable conjugacy problem. Suppose that the constructive limit of $\frac{d(n)}{n^2 \log n}$ is 0. Then, in particular, $d(n)$ is bounded from above by a recursive function, and $P$ has solvable word problem.
Note that if an annular diagram $\Delta$ with boundary labels $u$ and $v$ has a simple path $x$ with label $t$ connecting the boundary components, then we can cut $\Delta$ along $x$ and obtain a disc van Kampen diagram with boundary label $t^{\pm 1} u t^{\mp 1} v^{-1}$. So if $|t|$ is recursively bounded in terms of $|u|$ and $|v|$ (for every $u$ and $v$ that are conjugate modulo $P$) then the conjugacy of $u$ and $v$ can be algorithmically verified.
Pick an increasing recursive function $f(n)$ with $\frac{d(3k)}{Ck^2 \log k} < \frac{1}{n}$ for every $k > f(n)$, where $C$ is the constant from Lemma 4.8 (as in Definition 4.6). Since the conjugacy problem for $P$ is undecidable, there exists a minimal annular diagram $\Delta$ with contours $p, p'$ such that any path in $\Delta$ connecting $p$ and $p'$ has length at least $f(|p| + |p'|)$. Let $n = |p| + |p'|$. Let $x$ be a shortest path connecting $p$ and $p'$. Thus
(8) $|x| \ge f(n)$,
and so
(9) $\frac{d(3|x|)}{C|x|^2 \log |x|} < \frac{1}{n}$.
Since $x$ is a shortest path connecting $p$ and $p'$, $x$ is simple. Let us cut $\Delta$ along $x$ and obtain a disc diagram $\Gamma$ with boundary label $z u z^{-1} v^{-1}$ where $z$ is the label of $x^{\pm 1}$. By Lemma 4.8, the area of $\Gamma$ is at least $C|x| \log |x|$. Now we can take an integer $m$ between $\frac{|x|}{n} - 1$ and $\frac{|x|}{n}$. We attach $m$ copies of $\Gamma$ consecutively to each other along the sides labeled by $z$ to get a van Kampen diagram $\Pi$ with boundary label $z u^m z^{-1} v^{-m}$. The perimeter $r$ of $\Pi$ is between $2|x|$ and $3|x|$, and the area is $m$ times the area of $\Delta$. So, by Lemma 4.8, the area of $\Pi$ is at least $\frac{C|x|^2 \log |x|}{n}$. By (9), we can deduce that the area of $\Pi$ is bigger than $d(r)$. This contradicts the definition of Dehn function of a group presentation.
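The arithmetic in the last step can be written out as follows. This is only a sketch of the estimates, assuming that $|u| + |v| = n$, that the Dehn function $d$ is non-decreasing, and that the inequality defining $f$ applies to $k = |x|$; lower-order terms are kept explicit.

```latex
\[
  r = |z u^m z^{-1} v^{-m}| = 2|x| + m(|u| + |v|) = 2|x| + mn,
  \qquad
  \frac{|x|}{n} - 1 \le m \le \frac{|x|}{n}
  \;\Longrightarrow\;
  2|x| \le r \le 3|x| .
\]
\[
  \operatorname{Area}(\Pi) = m \cdot \operatorname{Area}(\Gamma)
  \ge \Bigl(\frac{|x|}{n} - 1\Bigr) C |x| \log |x|
  = \frac{C |x|^2 \log |x|}{n} - C |x| \log |x| .
\]
\[
  d(r) \le d(3|x|) < \frac{C |x|^2 \log |x|}{n} .
\]
```

Up to the lower-order term $C|x| \log |x|$, the area of $\Pi$ therefore exceeds $d(r)$, which is the comparison used in the quasi-proof.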
Remark 4.9. The only gap in the preceding argument is contained in the last phrase. We cannot guarantee that $\Pi$ has minimal area among all diagrams with the same boundary label, and, in principle, the area of a minimal diagram with this boundary label may be even quadratic in terms of the perimeter $r$. Still we do not know any groups for which this proof does not work. Note that we do not need $\Pi$ to be minimal: only that the minimal diagram with the same boundary label does not have too few cells compared to $\Pi$. Also note that $\Pi$ cannot be "obviously" non-minimal. For example, it is easy to see that $\Pi$ cannot contain two cells that share an edge and are mirror images of each other (if such a pair of cells existed, we would replace their union by a diagram with no cells, reducing the number of cells). Also we have freedom of choosing $u$, $v$, $\Delta$ and $x$. We do not need $x$ to be a minimal length path connecting the boundary components of $\Delta$. We only need that the area of $\Delta$ exceeds $O(|x| \log |x|)$ divided by a recursive function in $n$ (depending only on the presentation). In addition, the number $m$ should only be $O\left(\frac{|x|}{|u| + |v|}\right)$. Thus Quasi-theorem 4.7 seems true for a very large class of groups and possibly for all finitely presented groups.

4.2.H.
The sharp bound of $n^2 \log n$. It is shown in [191] that Quasi-theorem 4.7 is true for groups that are multiple HNN-extensions of free groups (for the definition see 2.6.B).
Theorem 4.10 (See [191]). Let $d(n)$ be the Dehn function of a multiple HNN-extension of a free group. Suppose that $\lim^c_{n\to\infty} \frac{d(n)}{n^2 \log n} = 0$. Then this group has decidable conjugacy problem.
The bound $n^2 \log n$ in Theorem 4.10 turned out to be sharp.
Theorem 4.11 (See [191]). There exists a finitely presented group that
• is a multiple HNN-extension of a free group,
• has undecidable conjugacy problem,
• has Dehn function $n^2 \log n$.
It is pointed out in [191] and [36] that Theorem 4.10 and the results of [36] give an alternative proof of solvability of the conjugacy problem in free-by-cyclic groups (originally proved in [23]).
4.2.I. The conjugacy problem in automatic groups. It is still unknown whether the conjugacy problem is decidable in every automatic group although it is decidable in every bi-automatic group (by Theorem 3.38 since these groups are semihyperbolic).
Problem 4.12. Can the quasi-proof above and, specifically, Remark 4.9 be applied to prove that automatic groups have decidable conjugacy problem? Is it possible to apply the quasi-proof to groups from Theorem 4.1 in order to obtain a unified proof of the solvability of the conjugacy problem for these groups?
4.3. A description of isoperimetric functions $\ge n^4$. It is most probable that the class of Dehn functions $\ge n^2 \log n$ is as wide as the class of time functions of Turing machines with the same restriction. The next theorem confirms that conjecture in the case of Dehn functions $\ge n^4$.

4.3.A. Superadditivity.
We call a function $f$ superadditive if for all natural numbers $m, n$ we have $f(m + n) \ge f(m) + f(n)$. The problem of whether all Dehn functions of finitely presented groups are superadditive (up to equivalence) is still open. We conjectured in [210] that the answer is "yes". In [114] Guba and I proved that every free product $G * H$, where $G$ and $H$ are non-trivial finitely presented groups, has a superadditive Dehn function. For example, for every group $G$ the free product $G * \mathbb{Z}$ has a superadditive Dehn function (here $\mathbb{Z}$ is the infinite cyclic group). In fact the Dehn function of $G * \mathbb{Z}$ is the superadditive closure of the Dehn function of $G$, i.e. the smallest superadditive function exceeding the Dehn function of $G$. Thus if there was an example of a finitely presented group $G$ with non-superadditive Dehn function, it would be also an example of a group $G$ whose Dehn function is not equivalent to the Dehn function of $G * \mathbb{Z}$. Note that a presentation of $G * \mathbb{Z}$ is obtained from the presentation of […]
2. For every non-deterministic Turing machine $T$ with time function $T(n)$ such that $T^4(n)$ is superadditive there exists a finitely presented group $G$ with Dehn function $T^4(n)$ and isodiametric function $T^3(n)$ such that the problem recognized by $T$ reduces polynomially to the word problem in $G$ (see 2.2.G).

Note that we do not know if
• Every time function of a non-deterministic Turing machine is equivalent to a superadditive function.
• The square root of a time function of a non-deterministic Turing machine is equivalent to the time function of a non-deterministic Turing machine.
If one or both of these statements were true, it would be (obviously) possible to simplify the formulation of Theorem 4.13.

4.4.A. NP-complete groups.
Now let us take a non-deterministic Turing machine $M$ recognizing some NP-complete problem, say, the exact bin packing Problem 2.8. Let $G$ be the group from Theorem 4.13 that corresponds to $M$. Then $G$ has polynomial Dehn function, hence its word problem is in NP (by 2.2.J). On the other hand, since the problem recognized by $M$ polynomially reduces to the word problem in $G$, the word problem in $G$ is NP-complete. Hence we get
Corollary 4.14 (see [210, Corollary 1.1]). There exists a finitely presented group with NP-complete word problem.

4.4.B. CoNP-complete groups.
Theorem 4.13 only treats the "yes" part of the word problem. Van Kampen diagrams are not witnesses of the "no" part (i.e. inequalities $w \ne 1$). Nevertheless we have the following
Theorem 4.15 (Birget [21]). There exists a finitely presented group $G$ whose word problem is coNP-complete.
Birget reduces a known coNP-complete problem, namely the circuit equivalence problem (see footnote 23), to the word problem in a concrete finitely presented subgroup of the Thompson-Higman group $V$ [46] (denoted by $G_{2,1}$ in Higman [121]).
That group, like all Thompson groups, consists of (partial) transformations of the set of infinite binary words. One of the simple key ideas is that if $G$ is a group of transformations of a set $X$, then a witness to the inequality $f \ne g$ for $f, g \in G$ is just an element $x \in X$ such that $f(x) \ne g(x)$. We have mentioned already in 2.2.G that the group $V$ has been used before in constructing finitely presented groups with undecidable word problem by McKenzie-Thompson [155]. They used $V$ to simulate a construction of arbitrary recursive functions from simple building blocks (somewhat similar to boolean circuits).
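The witness idea can be illustrated on a much smaller scale. The sketch below uses permutations of a four-element set instead of the Thompson-Higman group, so it only shows the shape of the argument: a single point where two transformations differ certifies that they are different, and checking such a certificate is cheap. The names `compose` and `find_witness` are hypothetical helpers introduced for this example.

```python
def compose(*maps):
    """Compose transformations given as tuples, applying the leftmost first."""
    def apply(x):
        for p in maps:
            x = p[x]
        return x
    return tuple(apply(x) for x in range(len(maps[0])))


def find_witness(f, g):
    """Return some x with f(x) != g(x), or None if f and g agree everywhere."""
    for x in range(len(f)):
        if f[x] != g[x]:
            return x
    return None


s = (1, 0, 2, 3)  # swaps 0 and 1
t = (0, 2, 1, 3)  # swaps 1 and 2

st = compose(s, t)
ts = compose(t, s)
x = find_witness(st, ts)
print(x, st[x], ts[x])  # 0 2 1
```

Verifying the certificate `x` needs only one evaluation of each side, which is the feature that places the "no" part of such a word problem in NP.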

4.5. The isoperimetric spectrum.
The isoperimetric spectrum consists of all numbers $\alpha \ge 1$ such that $n^\alpha$ is equivalent to the Dehn function of a finitely presented group. By Theorem 3.25, the intersection of the isoperimetric spectrum with the open interval $(1, 2)$ is empty. The next theorem gives an almost complete description of all numbers in the isoperimetric spectrum that are $\ge 4$.
Theorem 4.16 (Sapir [210]). The isoperimetric spectrum contains all numbers $\alpha \ge 4$ whose $n$-th binary approximation (see footnote 24) can be computed by a deterministic Turing machine in time less than $2^{2^n}$. On the other hand, Theorem 2.17 implies that if $\alpha$ is in the isoperimetric spectrum then the $n$-th binary approximation of $\alpha$ can be computed in time $\le 2^{2^{2^n}}$.
The difference in the number of 2's in these expressions is the difference between P and NP in Computer Science (if P = NP then there should be two 2's in both expressions).
All "constructible" numbers (rational numbers, algebraic numbers, values of relatively easily computable analytic functions at rational points such as $e + 2$, $\pi + 1$, $10 \sin \frac{3}{4}$, etc.) which are $\ge 4$ satisfy the condition of Theorem 4.16. In fact, these and most other familiar numbers can be computed by very fast quasilinear algorithms (see J. Borwein and P. Borwein [27]).
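As a small illustration of what an $n$-th binary approximation looks like for an algebraic number, the following sketch computes a dyadic rational $a/2^n$ within $2^{-n}$ of $\sqrt{k}$ using exact integer arithmetic. The function name `dyadic_sqrt` and the choice $\alpha = \sqrt{17}$ (which is $\ge 4$) are our own assumptions for the example; the sketch says nothing about the Turing machine time bounds in Theorem 4.16.

```python
from fractions import Fraction
from math import isqrt


def dyadic_sqrt(k, n):
    """Return the dyadic rational a / 2**n with a = floor(sqrt(k) * 2**n)."""
    # floor(sqrt(k * 4**n)) equals floor(sqrt(k) * 2**n), computed exactly.
    a = isqrt(k * 4 ** n)
    return Fraction(a, 2 ** n)


n = 4
beta = dyadic_sqrt(17, n)
print(beta)  # 65/16

# Exact check that 0 <= sqrt(17) - beta < 2**(-n), without floating point.
assert beta ** 2 <= 17 < (beta + Fraction(1, 2 ** n)) ** 2
```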
The intersection of the isoperimetric spectrum with the interval $[2, 4]$ is less well understood, although it is known that it is dense and contains all rational and many transcendental numbers from that interval [33,30,31]. The advantage of the snowflake construction and its modifications employed in these papers is that one obtains explicit and often quite short presentations of groups with prescribed Dehn functions of the form $n^\alpha$, while the groups from [210] are obtained by simulating Turing machines computing binary approximations of $\alpha$ (and so the numbers of generators and relators are usually quite large, and writing explicit presentations is not a feasible task).
23 A boolean circuit [232] is a "device" built of elementary circuits connected by wires; each elementary circuit represents a basic operation from logic: AND, OR, NOT and FORK. Every circuit has input wires and output wires. Every time we send signals 0 or 1 through input wires, we get a combination of signals 0 or 1 coming through the output wires. Hence every circuit represents a boolean function. The circuit equivalence problem asks, given two (acyclic) boolean circuits, do they have the same input-output function?
24 That is, a rational dyadic number $\beta = a/2^n$ which is within $O(2^{-n})$ of $\alpha$.
4.6. Groups with polynomial-non-recursive and quadratic-non-recursive Dehn functions.
Let $F_{low}$ and $F_{high}$ be two classes of increasing functions $\mathbb{N} \to \mathbb{N}$. We say that a function $h : \mathbb{N} \to \mathbb{N}$ is an $F_{low}$-non-$F_{high}$-function if for some $f \in F_{low}$, $h(n) \le f(n)$ for infinitely many $n$, and for every $g \in F_{high}$, $h(n) > g(n)$ for infinitely many $n$. For example, a function $h$ is polynomial-non-recursive if $h$ is not bounded above by any recursive function, but is smaller than some fixed polynomial $f(n)$ for infinitely many values of $n$. Similarly we can define the class of functions which are quadratic-non-recursive, etc.
Proof. We shall sketch a proof showing how to deduce Theorem 4.17 from the results of [210]. It shows a general strategy of estimating Dehn functions of groups from [210] corresponding to the so called S-machines (see Section 6). Let M be the Turing machine from Theorem 2.6, that is M has undecidable halting problem and infinitely many h-good numbers where h(n) = exp exp n. In [210], it is shown how for every Turing machine M , one can construct a group G = G(M ). The fact that M has undecidable halting problem, and [210,Propositions 4.1,12.1] imply that G has undecidable word problem. The presentation of G from [210] shows that G is obtained from some special multiple HNN-extension of a free group (called an S-machine) with free letters θ 1 , . . . , θ m by imposing one relation which is called the hub (see 6.3).
To estimate the Dehn function of G, we need to estimate the areas of van Kampen diagrams over the presentation of G from [210]. Let ∆ be a van Kampen diagram over that presentation with boundary label w of length l. We assume that ∆ is minimal.
The diagram $E_1$ and for every $i$ the diagram $\Gamma_i$ contain no hubs. So by [210, Lemma 8.1], for some constant $c_1$. We also know that $|\partial(E_1)| \le |w|$ and by [210, Lemma 11.21] $|\partial(\Gamma_i)| \le c_2 |\partial(E_{i+1})| \le c_2 |w|$, for $i = 1, \dots, s - 1$ and some constant $c_2$. Therefore for some constant $c_3$. It remains to estimate the area of $\Pi_i$. Let $\Pi$ be one of the $\Pi_i$, $i = 1, \dots, s - 1$. Let $W$ be the word written on the boundary of $\Pi$. Note that $|W| \le c_5 l$ for some constant $c_5$ because the union of $E_i$, $\Gamma_i$, $\Pi_i$ is $E_{i+1}$ by property (S3) and the estimates of the lengths $|\partial(E_i)|$, $|\partial(\Gamma_i)|$ above.
We can assume that $\Pi$ is a minimal diagram. Then by [210, Proposition 4.1] some subword $w$ of the boundary label of $\Pi$ is an accepted configuration of $M$ and the area of $\Pi$ is bounded from above by $c_6\,\mathrm{time}(w)^2$ for some constant $c_6$. Moreover $\mathrm{time}(w)$ is either bounded by $c_7 |w|$ or does not exceed $c_8\,\mathrm{time}(w')$ for some input configuration $w'$ with $|w'| \le |w|$ and some constants $c_7, c_8$. Now suppose $\frac{b}{2c_5} < l < \frac{b}{c_5}$ for some $h$-good number $b \gg 1$. Then $|w'| \le |w| < b$. Hence $\mathrm{time}(w') < \log b \le l$. Thus the area of $\Pi$ is bounded by $c_9 l^2$ for some constant $c_9$. Thus the area of $\Delta$ is bounded by some polynomial in $l$.
Since the set of $h$-good numbers of $M$ is infinite by the choice of $M$, the group $G$ has polynomial-non-recursive Dehn function.
4.6.B. Quadratic-non-recursive.
The following theorem, which is much stronger than Theorem 4.17, was proved in [182].
Compared with 4.6.A, the proof of Theorem 4.18 requires significantly new ideas. It is easy to see that the proof from 4.6.A does not give a quadratic upper bound for infinitely many values of $n$ (for instance, the S-machines usually have cubic Dehn functions [210], so one cannot hope to bound the area of even $E_1$ by a quadratic function). The proof of Theorem 4.18 involves
• A new construction of an S-machine simulating the Turing machine from Theorem 2.6.
• A new way of decomposing a diagram over the presentation of the S-machine plus the hub. Several parameters of diagrams and several cutting and pasting operations on diagrams are introduced. One of the parameters is the area, another one is the perimeter, and there are several others. Every cutting operation reduces a certain expression involving the parameters, which makes the induction possible.

5.1. A characterization of groups with word problem in NP and other time complexity classes.
The fundamental result of Higman [122] gave an algebraic characterization of groups with recursively enumerable word problem (i.e. recursively enumerable set of words that are equal to 1 in the group).
Theorem 5.1 (Higman [122]). The word problem in a finitely generated group G is recursively enumerable if and only if G is a subgroup of a finitely presented group.
There are several different proofs of Theorem 5.1. See, for example, Rotman's book [207] and Manin's book [157] (where you can also find an interesting discussion of the philosophical significance of that theorem). See also 6.4 below.
After [122], there were several results showing that embedding into a finitely presented group can preserve or even improve the algorithmic properties of the group. First results have been obtained by Clapham [51] and Valiev [231] (see [189] for the history of these results): they proved that the solvability (even r.e. degree) of the word problem and the level in the polynomial hierarchy of the word problem is preserved under some versions of Higman embedding.
By Theorem 2.18 the Dehn function of a finitely presented group is recursive if and only if the group has decidable word problem. Moreover by Proposition 2.10 for every finitely presented group G with Dehn function T (n) there exists a nondeterministic Turing machine M (G) which solves the word problem in G and has time function equivalent to T (n). Roughly speaking, this Turing machine takes a word w over the generators of G and just inserts relators of G. It stops and accepts w when it gets 1. Clearly this machine solves the word problem in every finitely generated subgroup of G as well. Therefore if a finitely generated group G is a subgroup of a finitely presented group with polynomial isoperimetric function then the word problem in G is in NP (i.e. it can be solved by a non-deterministic Turing machine with polynomial time function).
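The relator-inserting machine described above is non-deterministic. A minimal deterministic sketch of the same idea is a breadth-first search over all words reachable by at most a given number of relator insertions followed by free reduction. This is our own brute-force illustration under stated assumptions: generators are lower-case letters, inverses are the corresponding upper-case letters, and the names `free_reduce`, `inv` and `is_trivial` are hypothetical. It is exponentially slower than the machine $M(G)$ and only meant to show what "inserting relators until we get 1" means.

```python
from collections import deque


def inv(word):
    """Inverse of a word; an upper-case letter denotes the inverse generator."""
    return word[::-1].swapcase()


def free_reduce(word):
    """Freely reduce a word by cancelling adjacent pairs such as 'aA' or 'Aa'."""
    out = []
    for ch in word:
        if out and out[-1] == ch.swapcase():
            out.pop()
        else:
            out.append(ch)
    return "".join(out)


def is_trivial(word, relators, max_steps):
    """Search for a derivation word -> 1 using at most max_steps insertions."""
    moves = {r for r in relators} | {inv(r) for r in relators}
    start = free_reduce(word)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        w, steps = queue.popleft()
        if w == "":
            return True          # the empty word represents 1
        if steps == max_steps:
            continue             # insertion budget exhausted on this branch
        for i in range(len(w) + 1):
            for r in moves:
                nxt = free_reduce(w[:i] + r + w[i:])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))
    return False


# Z^2 = <a, b | abAB>: the word a^2 b a^-2 b^-1 needs two insertions.
print(is_trivial("aabAAB", ["abAB"], 2))  # True
print(is_trivial("ab", ["abAB"], 2))      # False
```

A return value of `False` only says that no derivation exists within the given budget; the budget plays the role of the isoperimetric bound.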
The drawback is that the word problem in a group $G$ can be easy to solve but the Dehn function of $G$ can be huge. A typical example is the Baumslag-Solitar group $BS(1, 2) = \langle a, b \mid bab^{-1} = a^2 \rangle$. This group has exponential Dehn function (see Example 2.22). But we have mentioned in 4.1.E that this group is a subgroup of $GL(2, \mathbb{Q})$, so the word problem in $BS(1, 2)$ can be solved in at most quadratic time: it is easy to see that the word problem of every finitely generated group of matrices over the field of rational numbers can be solved in at most quadratic time by a deterministic Turing machine. In fact it is possible to solve the word problem there in time $n (\log n)^2 (\log \log n)$ (this follows from the fact that the product of two $n$-digit numbers can be computed in time $n \log n \log \log n$ [141]).
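The matrix approach to the word problem in $BS(1, 2)$ can be sketched as follows. We assume the standard representation $a \mapsto \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, $b \mapsto \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$, which satisfies $bab^{-1} = a^2$ and is known to be faithful; the text above only states that such an embedding into $GL(2, \mathbb{Q})$ exists, so the specific matrices are our choice. Upper-case letters denote inverses, and the function names are hypothetical.

```python
from fractions import Fraction

# Images of the generators in GL(2, Q); upper case denotes inverses.
GENS = {
    "a": ((1, 1), (0, 1)),
    "A": ((1, -1), (0, 1)),
    "b": ((2, 0), (0, 1)),
    "B": ((Fraction(1, 2), 0), (0, 1)),
}
IDENTITY = ((1, 0), (0, 1))


def mul(x, y):
    """Multiply two 2x2 matrices with exact rational arithmetic."""
    return tuple(
        tuple(sum(Fraction(x[i][k]) * y[k][j] for k in range(2))
              for j in range(2))
        for i in range(2)
    )


def evaluate(word):
    """Multiply out the matrices of the letters of the word, left to right."""
    m = IDENTITY
    for ch in word:
        m = mul(m, GENS[ch])
    return m


def equals_one(word):
    """Decide whether the word represents 1, given faithfulness of the map."""
    return evaluate(word) == IDENTITY


print(equals_one("baBAA"))  # True: b a b^-1 a^-2 is the defining relator
print(equals_one("abAB"))   # False: a and b do not commute
```

The number of matrix multiplications is linear in the length of the word; the remaining cost comes from the growth of the numerators and denominators.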
Nevertheless, the following theorem shows that every finitely generated group G with word problem in NP can be embedded into a finitely presented group H with polynomial isoperimetric function. Thus if we can solve the word problem in G using a very clever and fast Turing machine, then we can use the simple-minded but almost as fast Turing machine M (H) to solve the word problem in G.
Theorem 5.2 (Birget, Olshanskii, Rips, Sapir [22]). Let $G$ be a finitely generated group with word problem solvable by a non-deterministic Turing machine with time function $\le T(n)$ such that $T(n)^4$ is superadditive (that is, $T(m + n)^4 \ge T(m)^4 + T(n)^4$ for every $m, n$). Then $G$ can be embedded into a finitely presented group $H$ with isoperimetric function equivalent to $n^2 T(n^2)^4$. In particular, the word problem of a finitely generated group is in NP if and only if this group is a subgroup of a finitely presented group with polynomial isoperimetric function.
For matrix groups over the rationals, Theorem 5.2 implies that every such group embeds into a finitely presented group with Dehn function at most $n^{10+\varepsilon}$ for every $\varepsilon > 0$. Young's result from [239] (see 4.1.E) implies that for matrix groups over $\mathbb{Z}$ one can reduce the exponent $10 + \varepsilon$ to 2.
Problem 5.3. Is it true that every finitely generated matrix group over a field (in particular, the field of rational numbers) embeds into a finitely presented group with quadratic Dehn function?
Theorem 5.2 immediately implies
Corollary 5.4. The word problem of a finitely generated group $G$ is in NP if and only if $G$ embeds into a finitely presented group with polynomial isoperimetric function.
By Fagin's Theorem 2.9 this corollary gives an algebraic characterization of groups where the word problem admits an algebraic description.
Using Corollary 5.4 (and the proof of Theorem 5.2), in order to embed a finitely generated group with word problem in NP into a finitely presented group with polynomial isoperimetric function, one first needs to construct a Turing machine which solves the word problem, then convert it into an S-machine, then convert the S-machine into a group. As a result the group we construct will have a relatively complicated set of relations. In some particular cases like the free Burnside groups $B_{m,n}$, the free solvable groups, etc., it is possible to modify this construction and get simple presentations of groups with polynomial isoperimetric functions where these groups embed [186]. For the Baumslag-Solitar group $BS(1, 2)$ it was done in [186] too, and a much simpler presentation was found by Cornulier and Tessera [59] (see also Theorem 4.1, part (2)).
Theorem 5.2 is also interesting from the point of view of logic. One can consider a group as a logical system where the defining relations are axioms, and the inference rules are the steps of constructing van Kampen diagrams. The van Kampen diagrams are then proofs of their boundary labels. Then the Dehn function becomes the syntactic complexity of proofs. The computational complexity of the word problem is the complexity in the metaworld of the group theory. The embedding of groups becomes a conservative extension of theories.
With this vocabulary, the result means that the complexity of proofs in the outer world of groups (that is, in its metamathematical semantics) after appropriate conservative extension to a finitely axiomatized theory becomes the complexity of proofs in the syntactical sense (up to a polynomial correction).
In this formulation, one can ask a similar question for any logical system. Several results of this type for general logical systems can be found in [139].
By Theorem 2.27 the class of groups with polynomial isoperimetric functions is a subclass of the class of groups with simply connected asymptotic cones. Hence we formulate
Problem 5.5. Does every finitely generated group with word problem in NP embed into a group all of whose asymptotic cones are simply connected? Equivalently (by Theorem 5.2): does every finitely presented group with polynomial isoperimetric function embed into a group all of whose asymptotic cones are simply connected?
Note that by Theorem 2.27, groups with simply connected asymptotic cones have linear isodiametric functions. On the other hand, for every $k \ge 4$ there are groups with Dehn function $n^k$ and isodiametric function $n^{\frac{3}{4}k}$ by [210]. Thus if the embedding as in Problem 5.5 exists, it must distort lengths at least polynomially. The embeddings in [22] distort lengths linearly (see [22]), so a quite different construction should be used.

5.2. The space functions of Turing machines and the filling functions of groups.
The results about space functions of Turing machines and the FFFL functions of groups [183] are similar to the results about time functions and isoperimetric functions obtained in [210,22]. On the one hand the situation is simpler because it is easier to control the space function than the time function of a machine. Also the issues with superadditivity do not appear in this situation. Indeed, if we want to check whether two words are equal to 1 using a Turing machine, then the time spent will be the sum of the times needed to check each word (hence we need superadditivity of the Dehn function), but the storage space is approximately the maximum of the spaces needed for each word, because after we have proved that one of these words is 1, we can clean the tape and start proving that the other word is equal to 1 (so superadditivity is not needed). On the other hand, the descriptions are more finalized, which requires more detailed consideration of the geometry of van Kampen diagrams.
If in [210,22], for example, we just cut the diagram into pieces and estimate the number and the areas of the pieces, here one needs to carefully examine the lengths of the cuts. Also, working with time and Dehn functions, we could modify Turing and S-machines by adding artificial tapes which can potentially be very long, but serve as a "temporary storage space". Dealing with space and FFFL functions, we cannot do that, which adds to the complexity of the situation.
Nevertheless the following complete description of FFFL functions of groups was obtained by Olshanskii.
Theorem 5.6 (Olshanskii [183]). Every space function f (n) of a Turing machine is equivalent (as in 2.2.D) to the FFFL function of some finitely presented group.
Since space functions of Turing machines can be "arbitrarily complicated" (similar to time functions), we obtain finitely presented groups with "arbitrary complicated" FFFL functions.
Also, similarly to the isoperimetric spectrum, one can define the space spectrum of finitely presented groups as the set of all $\alpha$ such that $n^\alpha$ is equivalent to the FFFL function of a finitely presented group. Since the problem with non-determinism vs determinism is not an issue, we have a complete description of the space spectrum.
Theorem 5.7 (Olshanskii [183], compare with Theorem 4.16). For a real number $\alpha \ge 1$, the function $n^\alpha$ is equivalent to the FFFL function of a finitely presented group if and only if the $n$-th binary approximation of $\alpha$ is computable by a deterministic Turing machine using space $\le 2^{2^n}$.
An analog of Theorem 5.2 was also obtained in [183]. Note that by Proposition 2.20, the space complexity of the word problem in a finitely presented group $G$ does not exceed the FFFL function of $G$. It follows from [150], [53] that the converse statement fails. An easier example is given by the Baumslag 1-relator group from [14], $G = \langle a, b \mid (aba^{-1}) b (aba^{-1})^{-1} = b^2 \rangle$. Its FFFL function is not bounded from above by any multi-exponential function (see Gersten [86] and Platonov [200]) while the space complexity of the word problem for $G$ is polynomial (see A. G. Miasnikov, A. Ushakov, and Dong Wook Won [167]).
Theorem 5.8 (Olshanskii [183]). Let G be a finitely generated group such that the word problem for G is decidable by a deterministic Turing machine with space function f (n). Then G embeds into a finitely presented group with FFFL function equivalent to f (n).
Similarly to Corollary 5.4 one immediately deduces
Corollary 5.9 (Olshanskii [183]). The word problem of a finitely generated group $G$ belongs to PSPACE if and only if $G$ embeds into a finitely presented group with polynomial FFFL function.

5.3. Collins' problem.
The conjugacy problem turned out to be much harder to preserve under embeddings. We have already mentioned the results of Collins and Miller [55] and Gorjaga and Kirkinskiȋ [93]: even subgroups of index 2 of finitely presented groups do not inherit solvability or unsolvability of the conjugacy problem (see 2.4.B).
In 1976 D. Collins [138] posed the following question (problem 5.22): Does there exist a version of the Higman embedding theorem in which solvability of the conjugacy problem is preserved? The solution is given by the following two theorems.
Theorem 5.10 (Olshanskii, Sapir [189]). A finitely generated group H has solvable conjugacy problem if and only if it is Frattini embedded into a finitely presented group G with solvable conjugacy problem.
Theorem 5.11 (Olshanskii,Sapir [190]). Every countable recursively presented group with solvable word and power problems is embeddable into a finitely presented group with solvable conjugacy and power problem.
Recall that a subgroup $H$ of a group $G$ is Frattini embedded in $G$ if every two elements of $H$ that are conjugate in $G$ are also conjugate inside $H$. It is clear that if $H$ is Frattini embedded into $G$ and $G$ has solvable conjugacy problem, then $H$ also has solvable conjugacy problem. We say that $G$ has solvable power problem if there exists an algorithm which, given $u, v$ in $G$, says if $v = u^n$ for some $n \ne 0$.
Theorem 5.11 is a relatively easy application of Theorem 5.10. In the same Problem 5.22 of [138] Collins asked whether a version of Higman embedding can preserve the recursively enumerable hierarchy (the recursive sets are at the bottom of that hierarchy). The affirmative answer is also given in [189].
The construction in [189] is much more complicated than in [22] or [183]. First we embed $H$ into a finitely presented group $H_1$ preserving the solvability of the word problem (say, one can use [22]). Then we use the Miller S-machine $M(H_1)$ (see 6.1.A) to solve the word problem in $H$. In order to overcome technical difficulties, we needed certain parts of words appearing in the computation to be always positive (see footnote 25). The standard positivity checkers do not work because they are S-machines as well, and can insert negative letters. So we used some ideas from the original Boone-Novikov proofs. That required introducing new generators, $x$-letters (in addition to the $a$-, $q$-, and $\theta$-letters in S-machines) and Baumslag-Solitar relations (as in Example 2.22). In addition, to analyze the conjugacy problem in $G$, we had to consider annular diagrams, which are more complicated than van Kampen disc diagrams. Different types of annular diagrams (spirals, rolls, etc.) required different treatment.
We do not have any reduction of the complexity of the conjugacy problem in H to the complexity of the conjugacy problem in G. In particular, solving the conjugacy problem in G, in some cases required solving systems of equations in free groups (i.e. the Makanin-Razborov algorithm, see 2.3.E).

5.4. A finitely presented non-amenable group without free subgroups.
One of the most important applications of our versions of Higman embeddings so far was the construction of a finitely presented counterexample to the von Neumann problem, i.e. a finitely presented non-amenable group without non-Abelian free subgroups [188].

5.4.A. Short history of the problem.
Suppose that a group $G$ acts on a metric space $X$ by isometries. We say that the action is paradoxical if $X$ can be decomposed as a disjoint union $A_1 \sqcup \dots \sqcup A_m \sqcup B_1 \sqcup \dots \sqcup B_n$ ($m, n > 1$) such that for some $g_1, \dots, g_m, h_1, \dots, h_n \in G$ we have $X = \bigsqcup_i g_i A_i = \bigsqcup_j h_j B_j$. Hausdorff [118] proved that the action of $SO(3)$ on the unit sphere minus a countable set of points is paradoxical: one can subdivide the 2-sphere minus a countable set of points into 3 parts $A, B, C$, such that the union of any two of these parts can be obtained by rotating the third part (hence we can cut a sphere into a finite number of pieces, rotate these pieces and obtain two spheres of the same radius). Banach and Tarski [12] generalized Hausdorff's result and proved what is now known as the Banach-Tarski paradox. Von Neumann [233] noticed that the cause of the paradoxes is the structure of the group $G$. In particular the cause of the Hausdorff paradox is that $SO_3(\mathbb{R})$ contains a non-Abelian free subgroup. He called the groups that cannot act paradoxically amenable. Von Neumann showed that the class of amenable groups contains Abelian groups, finite groups and is closed under taking subgroups, extensions, and infinite unions of increasing sequences of groups. Day [66] and Specht [223] showed that this class is closed under homomorphic images. The class of groups without non-Abelian free subgroups is also closed under these operations and contains Abelian and finite groups.
The problem of existence of non-amenable groups without non-Abelian free subgroups probably goes back to von Neumann and became known as the "von Neumann problem" in the fifties.
The first counterexamples to the von Neumann problem were constructed by Olshanskii [173]. He proved that the Tarski monsters, both torsion-free and torsion (see [175]), are not amenable. Later Adian [2] showed that the non-cyclic free Burnside group of odd exponent $n \ge 665$ with at least two generators, that is, the group given by the presentation $\langle a_1, \dots, a_m \mid u^n = 1$ for all group words $u$ in the alphabet $\{a_1, \dots, a_m\} \rangle$, is not amenable. Recently Lück and Osin [146] constructed the first example of a residually finite non-amenable torsion group.
All these examples are not finitely presented. For the Tarski monsters and Burnside groups this is because these groups are not hyperbolic but are inductive limits of hyperbolic groups (see 3.1.J), a similar argument applies to the groups of Lück and Osin. The question about existence of finitely presented counterexample to von Neumann's problem was explicitly formulated by Grigorchuk in [138] and by J. Cohen in [54]. There exists one finitely presented group without non-Abelian free subgroups for which the problem of amenability is non-trivial: it is the R.Thompson's group F (for the definition of F look in Section 2.1.A). The question of whether F is not amenable was formulated by R. Thompson in the beginning of the 70s (unpublished), and, in print, by R. Geoghegan in 1979. A considerable amount of work has been done to answer this question but it is still open.

5.4.B. The result.
Together with A. Olshanskii, we proved the following theorem.
Theorem 5.12 (Olshanskii, Sapir [188]). For every sufficiently large odd n, there exists a finitely presented group G which satisfies the following conditions.
(1) G is an HNN-extension of a finitely generated infinite group of exponent n.
(2) G is an extension of a group of exponent n by an infinite cyclic group.
(3) G contains a subgroup isomorphic to the free Burnside group of exponent n with 2 generators.
(4) G is a non-amenable finitely presented group without free non-cyclic subgroups.
By a theorem of Adian [2], part (3) implies that $G$ is not amenable. Thus parts (1) and (3) imply part (4).
5.4.C. The proof.
Let us present the main ideas of our construction. We first embed the free Burnside group $B(m, n) = B$ of odd exponent $n \gg 1$ with $m > 1$ generators $\{b_1, \dots, b_m\} = B$ into a finitely presented group $G' = \langle C \mid R \rangle$ where $B \subset C$. This is done as in [22,187] using an explicitly constructed S-machine. Then we take a copy $A = \{a_1, \dots, a_m\}$ of the set $B$, and a new generator $t$, and consider the group given by generators $C \cup A \cup \{t\}$ and the following three sets of relations.
(1) the set $R$ of the relations of the finitely presented group $G'$;
(2) ($u$-relations) $y = u_y$, where $u_y$, $y \in C$, is a certain word in $A$ satisfying the small cancelation condition $C'(\lambda)$ for a very small $\lambda$;
(3) (HNN-relations) $t^{-1} a_i t = b_i$, $i = 1, \dots, m$; these relations make $A$ a conjugate of its subgroup of exponent $n$ (of course, the group $A$ gets factorized).
The resulting group $G$ is obviously generated by the set $A \cup \{t\}$ and is an HNN-extension of its subgroup $A$ with the stable letter $t$. Every element in $A$ is a conjugate of an element of $B$, so $A$ is an $m$-generated group of exponent $n$. This immediately implies that $G$ is an extension of a group of exponent $n$ (the union of the increasing sequence of subgroups $t^s A t^{-s}$, $s = 1, 2, \dots$) by a cyclic group. It turns out that $A$ contains a copy of the free Burnside group $B(2, n)$. Thus the group is non-amenable by Adian [2] (see [188] and [208] for details).

6. Methods: S-machines and related constructions
Several theorems discussed in this survey have been proved using S-machines, a natural blend of Turing machines and groups (see Theorems 4.11, 4.13, 4.16, 4.17, 4.18, 5.2, 5.6, 5.8, 5.10, 5.12). Here we shall present a short introduction to S-machines.
6.1. The definition.
There are several ways to look at S-machines introduced in [210]. One can view them as a version of Turing machines (see 2.2.C), as rewriting systems (as in [210]), as groups that are certain multiple HNN-extensions of free groups (see [191]) or as semigroups of partial 1:1-transformations of the set of admissible words as in [188]. For different applications, one needs different points of view. Probably the easiest way to introduce S-machines is by defining them as multiple HNN-extensions of a free group.
6.1.A. The Miller machine.
Let us start with an example that we call the Miller machine. This important example is due to C. Miller [161]. Let $G = \langle X \mid R \rangle$ be a finitely presented group. The Miller machine is the group $M(G)$ generated by $X \cup \{q\} \cup \{\theta_x \mid x \in X\} \cup \{\theta_r \mid r \in R\}$ subject to the following relations: $\theta x = x \theta$, $\theta_x x q = q x \theta_x$, $\theta_r q = q r \theta_r$, where $\theta$ is any letter in $\Theta = \{\theta_x \mid x \in X\} \cup \{\theta_r \mid r \in R\}$. Clearly, this is an HNN-extension of the free group $\langle X, q \rangle$ with free letters $\theta \in \Theta$. The main feature of $M(G)$ discovered by Miller is that $M(G)$ has undecidable conjugacy problem provided $G$ has undecidable word problem. In fact it is easy to see that $qw$ is conjugate to $q$ in $M(G)$ if and only if $w = 1$ in $G$.
To see that M (G) can be viewed as a version of a Turing machine, consider any word uqv where u, v are words in X ∪ X −1 . If we conjugate uqv by θ r , we get the word uqrv because θ r q = qrθ r and θ r commutes with u and v (here and below we do not distinguish words that are freely equal). Hence conjugation by θ r amounts to executing a command [q → qr]. Similarly, conjugation by θ x amounts to executing a command [q → x −1 qx]. If u ends with x, then executing this command means moving q one letter to the left. Thus conjugating words of the form uqv by θ's and their inverses, we can move the "head" q to the left and to the right, and insert relations from R.
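The two kinds of commands of the Miller machine can be simulated directly on words of the form $uqv$. In the sketch below a configuration is stored as the pair $(u, v)$, generators are lower-case letters and their inverses are upper-case letters; these encodings and the function names are our own assumptions for illustration. The example runs a computation $q\,a r a^{-1} \to \dots \to q$ for the relator $r = aba^{-1}b^{-1}$ of $\mathbb{Z}^2$.

```python
def inv(word):
    """Inverse of a word; an upper-case letter denotes the inverse generator."""
    return word[::-1].swapcase()


def free_reduce(word):
    """Freely reduce a word by cancelling adjacent inverse pairs."""
    out = []
    for ch in word:
        if out and out[-1] == ch.swapcase():
            out.pop()
        else:
            out.append(ch)
    return "".join(out)


def apply_theta_r(config, r, inverse=False):
    """Command [q -> q r], or [q -> q r^-1] when inverse is True."""
    u, v = config
    r = inv(r) if inverse else r
    return (u, free_reduce(r + v))


def apply_theta_x(config, x, inverse=False):
    """Command [q -> x^-1 q x], or [q -> x q x^-1] when inverse is True."""
    u, v = config
    if inverse:
        return (free_reduce(u + x), free_reduce(inv(x) + v))
    return (free_reduce(u + inv(x)), free_reduce(x + v))


r = "abAB"                         # relator of Z^2 = <a, b | abAB>
config = ("", "a" + r + "A")       # the word q * (a r a^-1)
history = [config]

config = apply_theta_x(config, "a", inverse=True)   # move q right past a
history.append(config)
config = apply_theta_r(config, r, inverse=True)     # delete the relator
history.append(config)
config = apply_theta_x(config, "a")                 # move q back to the left
history.append(config)

for u, v in history:
    print(u + "q" + v)
```

The last configuration is the word $q$, in agreement with the fact that $a r a^{-1} = 1$ in the group; the sequence of commands used is the history of the computation.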
The work of the Miller machine $M(G)$ can be drawn in the form of a van Kampen diagram (see Figure 12) that we call a trapezium. It consists of horizontal $\theta$-bands. The bottom side of the boundary of the trapezium is labeled by the first word in the computation ($uqv$), the top side is labeled by the last word in the computation ($q$), the left and the right sides are labeled by the history of computation, the sequence of $\theta$'s and their inverses corresponding to the commands used in the computation $uqv \to \dots \to q$. The words written on the top and bottom sides of the $\theta$-bands are the intermediate words in the computation. We shall always assume that they are freely reduced. Viewed as a Turing machine, the Miller machine has one tape and one state letter.
6.1.B. The formal definition.
Like Turing machines (see 2.2.C), general S-machines can have many tapes and many state letters. Here is a formal definition. Let $F(Q, Y)$ be the free group generated by two sets of letters $Q = \sqcup_{i=1}^{N} Q_i$ and $Y = \sqcup_{i=1}^{N-1} Y_i$ where the $Q_i$ are disjoint and non-empty (below we always assume that $Q_{N+1} = Q_1$, and $Y_N = Y_0 = \emptyset$).
The set Q is called the set of q-letters, the set Y is called the set of a-letters.
In order to define an HNN-extension, we consider also a collection $\Theta$ of $N$-tuples of $\theta$-letters. Elements of $\Theta$ are called rules. The components of $\theta$ are called brothers $\theta_1, \dots, \theta_N$. We always assume that all brothers are different. We set $\theta_{N+1} = \theta_1$.
With every $\theta \in \Theta$, we associate two sequences of elements in $F(Q, Y)$. The words $U_i$, $V_i$ satisfy the following restriction:
(*) For every $i = 1, \dots, N$, the words $U_i$ and $V_i$ have the form $U_i = v_{i-1} k_i u_i$, $V_i = v'_{i-1} k'_i u'_i$, where $k_i, k'_i \in Q_i$, $u_i$ and $u'_i$ are words in the alphabet $Y_i^{\pm 1}$, $v_{i-1}$ and $v'_{i-1}$ are words in the alphabet $Y_{i-1}^{\pm 1}$.
Now we are ready to define an S-machine $S$ by generators and relations. The generating set $X$ of the S-machine $S$ consists of all $q$-, $a$- and $\theta$-letters. The relations are: . . , $s$, $\theta_j a = a \theta_j$ for all $a \in Y_j(\theta)$.
Every S-rule θ = [U_1 → V_1, . . . , U_s → V_s] has an inverse θ^{-1} = [V_1 → U_1, . . . , V_s → U_s]; we set Y_i(θ^{-1}) = Y_i(θ).
Remark 6.1. Every S-machine is indeed an HNN-extension of the free group F(Y, Q) with finitely generated associated subgroups. The free letters are θ_1 for every θ ∈ Θ. We leave it as an exercise to find the associated subgroups.
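As a data structure, a rule and its inverse can be recorded as follows. This is a minimal Python sketch; the class name SRule and its fields are hypothetical and only mirror the notation [U_1 → V_1, . . . , U_s → V_s] and Y_i(θ) used above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Word = Tuple[Tuple[str, int], ...]   # letters as (name, exponent +1 or -1)


@dataclass(frozen=True)
class SRule:
    U: Tuple[Word, ...]              # left-hand sides  U_1, ..., U_s
    V: Tuple[Word, ...]              # right-hand sides V_1, ..., V_s
    Y: Tuple[FrozenSet[str], ...]    # the sets Y_i(theta) of tape letters

    def inverse(self) -> "SRule":
        """Swap U_i and V_i; the sets Y_i are unchanged."""
        return SRule(U=self.V, V=self.U, Y=self.Y)


# A one-component rule [q -> q a], chosen only for illustration.
q, a = ("q", 1), ("a", 1)
theta = SRule(U=((q,),), V=((q, a),), Y=(frozenset({"a"}),))
assert theta.inverse().U == theta.V
assert theta.inverse().inverse() == theta
```

Taking the inverse twice returns the original rule, which reflects the symmetry of S-machines.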
6.1.C. Turing machines as S-machines. Every Turing machine T (see the definition in 2.2.C) can be considered as an S-machine S′(T) in the natural way: the generators of the free group are all tape letters and all state letters, and the commands of the Turing machine are interpreted as rules of the S-machine. The main problem with this conversion is that there is much more freedom in applying S-rules than in executing the corresponding commands of the Turing machine. Indeed, a Turing machine is in general not symmetric (i.e. if [U → V] is a command of the Turing machine then [V → U] is usually not), while every S-machine is symmetric. Another, more important, difference is that Turing machines work only with positive words, while S-machines work with arbitrary group words. This means, for example, that the rules [aqb → cqd] and [q → a^{-1}cqdb^{-1}] are equivalent (the corresponding group presentations are equivalent). Hence an S-machine is blind: it decides which rule to apply based solely on the state letters of the configuration and does not "see" the content of the tape (unlike a Turing machine, which takes into account the tape letters observed by the head). Hence the language accepted by S′(T) is usually much bigger than the language accepted by T.
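The equivalence of the two rules can be checked directly. The computation below is a sketch in the one-tape situation of the Miller machine, with a single rule letter θ. It assumes that θ commutes with the tape letters a and b, and it uses the convention that a rule [U → V] corresponds to the relation θ U θ^{-1} = V.

```latex
\begin{align*}
\theta\,(a q b)\,\theta^{-1} = c q d
  &\iff a\,(\theta q \theta^{-1})\,b = c q d
     && \text{since } \theta a = a\theta,\ \theta b = b\theta \\
  &\iff \theta q \theta^{-1} = a^{-1} c\, q\, d\, b^{-1}.
\end{align*}
```

The last relation is the one attached to the rule [q → a^{-1}cqdb^{-1}], so under these assumptions the two presentations define the same group.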
Nevertheless, it can be proved that if T is symmetric, and a computation w 1 → w 2 → . . . of the S-machine S ′ (T ) involves only positive words, then that is a computation of T .
This leads to the following idea of converting any Turing machine T to an S-machine S(T ). First we construct a symmetric Turing machine T ′ that is equivalent to T (recognizes the same language). That is a fairly standard Computer Science trick (see [210]).
The second step is to compose the S-machine S′(T′) with a machine that checks positivity of a word. That machine starts working after every step of S′(T′). That is, if an application of a rule of S′(T′) gives a non-positive (reduced) word, then the checking machine does not allow the machine S′(T′) to proceed to the next step.
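In terms of computations, the effect of the checking machine is that of a filter. The Python sketch below shows only this filter, under the assumption that words are lists of (letter, exponent) pairs and that rules are functions on words; the actual checking machines are themselves S-machines and are not modelled here. All names are hypothetical.

```python
from typing import Callable, List, Tuple

Word = List[Tuple[str, int]]          # letters as (name, exponent +1 or -1)
Rule = Callable[[Word], Word]


def free_reduce(word: Word) -> Word:
    """Cancel adjacent pairs x x^-1 and x^-1 x until none remain."""
    out: Word = []
    for name, exp in word:
        if out and out[-1] == (name, -exp):
            out.pop()
        else:
            out.append((name, exp))
    return out


def is_positive(word: Word) -> bool:
    """True if the freely reduced form contains no inverse letters."""
    return all(exp > 0 for _, exp in free_reduce(word))


def guarded_run(start: Word, rules: List[Rule]) -> List[Word]:
    """Apply the rules in order, checking positivity after every step.

    The run stops at the first step whose reduced result is not positive;
    that result is not recorded.
    """
    word = free_reduce(start)
    history = [word]
    for rule in rules:
        nxt = free_reduce(rule(word))
        if not is_positive(nxt):
            break
        word = nxt
        history.append(word)
    return history


steps: List[Rule] = [
    lambda w: w + [("a", 1)],         # stays positive
    lambda w: [("b", -1)] + w,        # creates an inverse letter
    lambda w: w + [("c", 1)],         # never reached
]
print(len(guarded_run([("q", 1)], steps)))
```

In this example the run is cut off at the second step, so only the start word and the result of the first step are recorded.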
There are several checking machines. One of them, the adding machine, is very simple but its time function is exponential (see [191]). Another one is very complicated but has a quadratic time function (it was first constructed in [210] and used in [210,22]).
6.2. The conjugacy problem. If a Turing machine M has undecidable halting problem, then the S-machine S(M′) (where M′ is a symmetric Turing machine that is equivalent to M) has undecidable conjugacy problem [210]. It is easy to see that the Dehn function of any S-machine is at most cubic. The adding machine used for checking positivity of admissible words slows the machine down exponentially. A nice consequence is that the Dehn function of the resulting S-machine is n^2 log n [191], which is the smallest possible for multiple HNN-extensions of free groups having undecidable conjugacy problem by 4.2.H. Note that although the upper bound of n^2 log n is easy to anticipate, its proof involves very non-trivial combinatorics of van Kampen diagrams (some Vassiliev-type invariants of chord diagrams are used, see [191]).
6.3. The word problem. To obtain a group with undecidable word problem and, more generally, with a prescribed Dehn function, we do the following. Take an S-machine S with many tapes. To make the number of tapes large enough, just glue copies of the same S-machine side by side so that the trapezia of the new S-machine are glued from copies of trapezia of the old S-machine as in 4.2.G. Let h be the accept word of the S-machine (all tapes are empty, all state letters are in the accept state). Let the presentation of the group G(S) be obtained from the presentation of S by adding one hub relation h = 1. Now if W is an admissible word accepted by S, we can take the corresponding trapezium ∆, identify its left and right sides (labeled by the history of computation) to obtain an annulus with W on the outside boundary component and h on the inside boundary component, then glue the cell corresponding to the hub relation to the inside boundary component to obtain a disc ∆′. This van Kampen diagram shows that if W is accepted by S, it is equal to 1 in G(S). The fact that if W = 1 in G(S) then W is accepted by S is proved by a small cancelation argument using the fact that the number of tapes is large. Thus, for example, if S has undecidable halting problem, then G(S) has undecidable word problem. Proving that the Dehn function of G(S) is polynomial in terms of the time function of S is much harder [210] and requires the snowman decomposition discussed in 4.6.A.
6.4. The embedding. Let now P = ⟨X | R⟩ be a finitely generated group with recursively enumerable set of relators R. Take an S-machine S that recognizes the language R. We can assume that S has sufficiently many tapes, and in tape 1 (between the state letters k_1, k_2) we have the input word in the input configuration.
Consider now another S-machine S′ which is a copy of S (with copies of the state letters) but which does not do anything in the input sector, so during the work of S′ the input sector is always empty. We can also assume that the state letters of the input configurations in S, S′ do not appear in the middle of (reduced) computations, that the input state letters of S and S′ coincide, and that all other state letters are different. Now let H be the amalgamated product of G(S) and G(S′) with the amalgamated subgroup generated by the input state letters and all tape letters. Let W(u) be the input configuration of S corresponding to the input word u. It has the subword u between the state letters k_1 and k_2. Let W(u)′ be the word W(u) with this occurrence of u deleted. If u ∈ R, then W(u) = 1 in G(S) and W(u)′ = 1 in G(S′). Hence u = 1 in H. Therefore every u ∈ R is equal to 1 in H and there exists a homomorphism from P to H. It can be proved that this homomorphism is injective. This proves the Higman embedding theorem. Proving that H has an isoperimetric function that depends polynomially on the time function of S is of course more complicated (see [187]). Again it involves cutting diagrams into pieces which are treated separately.
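The step from the two relations to u = 1 in H is a short computation. As a sketch, write W(u) = A k_1 u k_2 B, where A and B are hypothetical names for the parts of the input configuration to the left of k_1 and to the right of k_2, so that the word with u deleted is A k_1 k_2 B. Both words are written in the generators of the amalgamated subgroup, so both relations can be used in H.

```latex
\begin{align*}
A k_1 k_2 B = 1 &\implies k_2 B = (A k_1)^{-1}, \\
A k_1\, u\, k_2 B = 1 &\implies (A k_1)\, u\, (A k_1)^{-1} = 1 \implies u = 1 .
\end{align*}
```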

Open problems
Here we collect the open problems mentioned in the text. It may be more convenient for the reader to have all the problems in one place.
[83] where the word problem in every finitely presented residually finite group belongs? An even bolder question: is the word problem of every finitely presented residually finite group in NP?
Problem 7.2 (D. Cohen [52], also attributed to Stallings, see 2.4.C). Is it true that for every finitely presented group G there exists a constant a > 1 such that f(n) ≤ a^{g(n)}, where f, g are, respectively, the Dehn function and the isodiametric function of G?
Can Remark 4.9 be applied to prove that automatic groups have decidable conjugacy problem?
Problem 7.9 (See 5.1). Is it true that every finitely generated matrix group over a field (in particular, the field of rational numbers) embeds into a finitely presented group with quadratic Dehn function?
Problem 7.10 (See 5.1). Does every finitely generated group with word problem in NP embed into a group all of whose asymptotic cones are simply connected? Equivalently (by Theorem 5.2): does every finitely presented group with polynomial isoperimetric function embed into a group all of whose asymptotic cones are simply connected?
About the list of references. After each reference in the bibliography we list the numbers of the sections where the reference is cited. Each of the section numbers is the highest section number preceding the reference. For example, section number 4.1 means that the reference appears in Section 4.1, before 4.