Parallel algorithms for power circuits and the word problem of the Baumslag group

Power circuits were introduced in 2012 by Myasnikov, Ushakov and Won as a data structure for non-elementarily compressed integers supporting the arithmetic operations addition and (x, y) ↦ x · 2^y. The same authors applied power circuits to give a polynomial-time solution to the word problem of the Baumslag group, which has a non-elementary Dehn function. In this work, we examine power circuits and the word problem of the Baumslag group under parallel complexity aspects. In particular, we establish that the word problem of the Baumslag group can be solved in NC, even though one of the essential steps is to compare two integers given by power circuits and this, in general, is shown to be P-complete. The key observation is that the depth of the occurring power circuits is logarithmic and such power circuits can be compared in NC.


Introduction
The word problem of a finitely generated group G is as follows: does a given word over the generators of G represent the identity of G? It was first studied by Dehn as one of the basic algorithmic problems in group theory [10]. Already in the 1950s, Novikov and Boone succeeded in constructing finitely presented groups with an undecidable word problem [7,32]. Nevertheless, many natural classes of groups have an (efficiently) decidable word problem, most prominently the class of linear groups (groups embeddable into a matrix group over some field): their word problem is in LOGSPACE [21,37] and hence, in particular, in NC, i.e., decidable by Boolean circuits of polynomial size and polylogarithmic depth (or, equivalently, decidable in polylogarithmic time using polynomially many processors).
There are various other results on word problems of groups in small parallel complexity classes defined by circuits. For example, the word problems of solvable linear groups are even in TC^0 (constant depth with threshold gates) [18], and the word problems of Baumslag-Solitar groups and of right-angled Artin groups are AC^0-Turing-reducible to the word problem of a non-abelian free group [41,17]. Moreover, Thompson's groups are co-context-free [20] and hyperbolic groups have word problem in LOGCFL [22]. All these complexity classes are contained within NC. On the other hand, there are also finitely presented groups with a decidable word problem but with arbitrarily high complexity [35].
A mysterious class of groups from this point of view are one-relator groups, i.e., groups that can be written as a free group modulo a normal subgroup generated by a single element (relator). Magnus [25] showed that one-relator groups have a decidable word problem; his algorithm is called the Magnus breakdown procedure (see also [24,26]). Nevertheless, the complexity remains an open problem: although it is not even clear whether the word problems of one-relator groups are solvable in elementary time, in [5] the question is raised whether they are actually decidable in polynomial time.
In 1969 Gilbert Baumslag defined the group G_{1,2} = ⟨a, b | bab^{-1} a = a^2 bab^{-1}⟩ as an example of a one-relator group which enjoys certain remarkable properties. It is infinite and non-abelian, but all its finite quotients are cyclic and, thus, it is not residually finite [6]. Moreover, Gersten showed that the Dehn function of G_{1,2} is non-elementary [14], and Platonov [33] made this more precise by proving that it is (roughly) τ(log n), where τ(0) = 1 and τ(i + 1) = 2^{τ(i)} for i ≥ 0 is the tower function (note that he calls the group the Baumslag-Gersten group). Since the Dehn function gives an upper bound on the complexity of the word problem, the Baumslag group was a candidate for a group with a very difficult word problem. Indeed, when applying the Magnus breakdown procedure to an input word of length n, one obtains as intermediate results words of the form v_1^{x_1} ··· v_m^{x_m} where v_i ∈ {a, b, bab^{-1}}, x_i ∈ Z, and m ≤ n. The issue is that the x_i might grow up to τ(log n); hence, this algorithm has non-elementary running time. However, as foreseen by the above-mentioned conjecture, Myasnikov, Ushakov and Won succeeded in showing that the word problem of G_{1,2} is, indeed, decidable in polynomial time [29]. Their crucial contribution was to introduce so-called power circuits in [30] for compressing the x_i in the description above.
Roughly speaking, a power circuit is a directed acyclic graph (a dag) where the edges are labelled by ±1. One can define an evaluation of a vertex P as two raised to the power of the (signed) sum of the evaluations of the successors of P. Note that this way the value τ(n) of the tower function can be represented by an (n + 1)-vertex power circuit; thus, power circuits allow for a non-elementary compression. The crucial feature for the application to the Baumslag group is that power circuits not only efficiently support the operations +, −, and (x, y) ↦ x · 2^y, but also the test whether x = y or x < y for two integers represented by power circuits can be done in polynomial time. The main technical part of the comparison algorithm is the so-called reduction process, which computes a certain normal form for power circuits.
Based on these striking results, Diekert, Laun and Ushakov [11] improved the algorithm for power circuit reduction and managed to decrease the running time for the word problem of the Baumslag group from O(n^7) down to O(n^3). They also describe a polynomial-time algorithm for the word problem of the famous Higman group H_4 [15]. In [31] these algorithms have been implemented in C++. Subsequently, more applications of power circuits to these groups emerged: in [19] a polynomial-time solution to the word problem in generalized Baumslag and Higman groups is given, in [12] the conjugacy problem of the Baumslag group is shown to be strongly generically in P, and in [3] the same is done for the conjugacy problem of the Higman group. Here "generically" roughly means that the algorithm works for most inputs (for details on the concept of generic complexity, see [16]).
Other examples where compression techniques lead to efficient algorithms in group theory can be found, e.g., in [13] or [23, Theorems 4.6, 4.8 and 4.9]. Finally, notice that in [28] the word search problem for the Baumslag group has been examined using parametrized complexity.
Contribution. The aim of this work is to analyze power circuits and the word problem of the Baumslag group from the viewpoint of parallel (circuit) complexity. To do so, we first examine so-called compact representations of integers and show that ordinary binary representations can be converted into compact representations by constant-depth circuits (i.e., in AC^0; see Section 3). We apply this result in the power circuit reduction process, which is the main technical contribution of this paper. While [30,11] give only polynomial-time algorithms, we present a more refined method and analyze it in terms of parametrized circuit complexity. The parameter here is the depth D of the power circuit. More precisely, we present threshold circuits of depth O(D) for power circuit reduction, implying our first main result:
Proposition A. The problem of comparing two integers given by power circuits of logarithmic depth is in TC^1 (decidable by logarithmic-depth, polynomial-size threshold circuits).
We then analyze the word problem of the Baumslag group carefully. A crucial step is to show that all appearing power circuits have logarithmic depth. Using Proposition A, we succeed in describing a TC^1 algorithm for computing the Britton reduction of uv given that u and v are already Britton-reduced (Britton reductions are the basic step in the Magnus breakdown procedure; see Section 5 for a definition). This leads to the following result:
Theorem B. The word problem of the Baumslag group G_{1,2} is in TC^2.
In the final part of the paper we prove lower bounds on comparison in power circuits and, thus, on power circuit reduction. In particular, this emphasizes the relevance of Proposition A and shows that our parametrized analysis of power circuit reduction is essentially the best one can hope for. Moreover, Theorem C highlights the importance of the logarithmic depth bound for the power circuits appearing during the proof of Theorem B.
Theorem C. The problem of comparing two integers given by power circuits is P-complete.
Power circuits can be seen in the broader context of arithmetic circuits and arithmetic complexity. Thus, results on power circuits also give further insight into these arithmetic circuits. Notice that the corresponding logic over natural numbers with addition and 2^x has been shown to be decidable by Semënov [36]. In Proposition 30 we show that, indeed, for every power circuit with a marking M there is an arithmetic circuit of polynomial size with +-, −-, and 2^x-gates evaluating to the same number and vice versa. Moreover, the transformation between these two models can be done efficiently.
This work is the full and extended version of the conference publication [27]. Besides giving full proofs of all results, here we explore the connections between power circuits and arithmetic circuits with +-, −-, and 2^x-gates, and in Theorem 53 we give a refined variant of Theorem C which also yields hardness results for power circuits of logarithmic depth.

Notation and preliminaries
General notions. We use standard O-notation for functions from N to non-negative reals R_{≥0}, see e.g. [9]. Throughout, the logarithm log is with respect to base two. The tower function τ : N → N is defined by τ(0) = 1 and τ(i + 1) = 2^{τ(i)} for i ≥ 0. It is primitive recursive, but τ(6) written in binary cannot be stored in the memory of any conceivable real-world computer. Moreover, we set log We denote the support of a function f : Let Σ be a set. The set of all words over Σ is denoted by Σ^* = ⋃_{n∈N} Σ^n. The length of a word w ∈ Σ^* is denoted by |w|. A dag is a directed acyclic graph. For a dag Γ we write depth(Γ) for its depth, which is the length (number of edges) of a longest path in Γ.

Complexity
We assume the reader to be familiar with the complexity classes LOGSPACE and P (polynomial time); see e.g. [2] for details. Most of the time, however, we use circuit complexity within NC.
Throughout, we assume that languages L (resp. inputs to functions f) are encoded over the binary alphabet {0, 1}. Let k ∈ N. A language L (resp. function f) is in AC^k if there is a family of polynomial-size Boolean circuits of depth O(log^k n) (where n is the input length) deciding L (resp. computing f). More precisely, a Boolean circuit is a dag (directed acyclic graph) where the vertices are either input gates x_1, ..., x_n, or Not, And, or Or gates. There are one or more designated output gates (for computing functions there is more than one output gate; in this case they are numbered from 1 to m). All gates may have unbounded fan-in (i.e., there is no bound on the number of incoming wires). A language L ⊆ {0, 1}^* belongs to AC^k if there exists a family (C_n)_{n∈N} of Boolean circuits such that x ∈ L ∩ {0, 1}^n if and only if the output gate of C_n evaluates to 1 when assigning x = x_1 ··· x_n to the input gates. Moreover, C_n may contain at most n^{O(1)} gates and have depth O(log^k n). Here, the depth of a circuit is the length of the longest path from an input gate to an output gate. AC^k-computable functions are defined likewise.
The class TC^k is defined analogously with the difference that also Majority gates are allowed (a Majority gate outputs 1 if its input contains more 1s than 0s). Moreover, NC = ⋃_{k≥0} TC^k = ⋃_{k≥0} AC^k. For more details on circuits we refer to [39]. Our algorithms (or circuits) rely on two basic building blocks which can be done in TC^0:
Example 1. Iterated addition is the following problem:
Input: n numbers A_1, ..., A_n, each having n bits.
Output: The sum A_1 + ··· + A_n.
This is well-known to be in TC^0.
Example 2. Let (k_1, v_1), ..., (k_n, v_n) be a list of n key-value pairs (k_i, v_i) equipped with a total order on the keys k_i such that it can be decided in TC^0 whether k_i < k_j. Then the problem of sorting the list according to the keys is in TC^0: the desired output is a list (k_{π(1)}, v_{π(1)}), ..., (k_{π(n)}, v_{π(n)}) for some permutation π such that k_{π(i)} ≤ k_{π(j)} for all i < j.
We briefly describe a circuit family to do so: The first layer compares all pairs of keys k_i, k_j in parallel. For all i and j the next layer computes a Boolean value P(i, j) which is true if and only if |{ℓ | k_ℓ < k_i}| = j. The latter is computed by iterated addition. As a final step the j-th output pair is set to (k_i, v_i) if and only if P(i, j) is true.
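The three layers of this sorting circuit can be mimicked sequentially. The following is only a minimal sketch: the function name `rank_sort`, the zero-based output positions and the tie-breaking by input index (so that equal keys still receive distinct ranks) are assumptions made for illustration, and a Python loop of course says nothing about circuit depth.

```python
def rank_sort(pairs):
    """Sort key-value pairs by computing the output position of every item independently."""
    n = len(pairs)
    # Layer 1: compare all pairs of keys. Ties are broken by the input index
    # (an assumption of this sketch, so that ranks are pairwise distinct).
    less = [[(pairs[l][0], l) < (pairs[i][0], i) for l in range(n)]
            for i in range(n)]
    # Layer 2: the rank of item i is the number of items preceding it;
    # in the circuit this count is an iterated addition of n bits.
    rank = [sum(row) for row in less]
    # Layer 3: output position j receives the unique item i with rank[i] == j.
    out = [None] * n
    for i in range(n):
        out[rank[i]] = pairs[i]
    return out


print(rank_sort([(3, "c"), (1, "a"), (2, "b"), (1, "z")]))
```

Every entry of `less`, every rank and every output position depends only on the input, which is what allows the three layers to be evaluated in constant depth with threshold gates.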
Remark 3. The class NC is contained in P if we consider uniform circuits. A family of circuits is called LOGSPACE-uniform (or simply uniform) if the function 1^n ↦ C_n is computable in LOGSPACE (where 1^n is the string consisting of n ones and C_n is given in some reasonable encoding). Be aware that for classes below LOGSPACE usually even stronger uniformity conditions are imposed. In order not to overload the presentation, we state all our results in the non-uniform case throughout; all uniformity considerations are left to the reader.

Parametrized circuit complexity.
In our work we also need some parametrized version of the classes TC^k, which we call depth-parametrized TC^k. Let par : {0, 1}^* → N (called the parameter). Consider a family of circuits (C_{n,D})_{n,D∈N} such that C_{n,D} contains at most n^{O(1)} gates (independently of D) and has depth O(D · log^k n). A language L is said to be accepted by this circuit family if for all n and D and all x ∈ {0, 1}^n with par(x) ≤ D we have x ∈ L if and only if C_{n,D} evaluates to 1 on input x. Similarly, f : {0, 1}^* → {0, 1}^* is computed by (C_{n,D})_{n,D∈N} if for all n and D and all x ∈ {0, 1}^n with par(x) ≤ D the circuit C_{n,D} evaluates to f(x) on input x. We define DepParaTC^k as the class of languages (resp. functions) for which there are such parametrizations par : {0, 1}^* → N and families of circuits (C_{n,D})_{n,D≥0}. Note that this is not a standard definition, but it perfectly fits our purposes.
Proof. Let w ∈ {0, 1}^n be some input. First decide whether par(w) ≤ log n (by the hypothesis this is in TC k+ ). If yes, the circuit C n,C• log n can be used to decide whether w ∈ L. Clearly, the combined circuit has polynomial size. Its depth is O(log Hence, we have obtained a TC k+ circuit.
We introduce these parametrized TC^k classes because later, for computing reduced power circuits, we apply a non-constant number of TC^0 computations f one after another. The number of these computations is the depth of the power circuit. The crucial step is to show that after any number of applications of f, the output is still polynomially bounded. Putting things together, we obtain a DepParaTC^0 computation parametrized by the depth of the power circuit. Let us formalize this idea: Denote the i-fold composition of f by f^{(i)} (i.e., f^{(0)} is the identity function and f^{(i+1)} = f ∘ f^{(i)}). In order to allow circuits to compute functions having outputs of different lengths for inputs of the same length, we can assume that each output gate also carries an enable bit (or, equivalently, we can think that there is an additional padding symbol in the output alphabet).
Lemma 5. Let f : {0, 1}^* → {0, 1}^* be TC^k-computable such that for all x ∈ {0, 1}^* there is some ω_x ≤ |x| with f^{(ω_x)}(x) = f^{(ω_x+1)}(x). Further, assume that there is some polynomial p such that for all x ∈ {0, 1}^* and for all i ∈ N we have |f^{(i)}(x)| ≤ p(|x|).
Proof.Let (C n ) n∈N be the family of TC k circuits computing f .We construct a new family of circuits (C n,ω ) n,ω∈N .Let Cm be a circuit consisting of C i for all i ∈ [0 .. m] in parallel.We can compose Cp(n) • C n by feeding the outputs of C n into the C i (as part of Cp(n) ) with the appropriate number of input bits.By iterating this, we obtain a circuit Cp(n) . By the hypothesis of the lemma, we can assume ω ≤ n, so this circuit contains at most n
We have ε(Λ_P) = log_2(ε(P)), i.e., the marking Λ_P plays the role of a logarithm. Note that leaves (nodes of out-degree 0) evaluate to 1 and every node evaluates to a positive real number. However, we are only interested in the case that all nodes evaluate to integers:
Definition 6. A power circuit is a pair (Γ, δ) with δ : Γ × Γ → {−1, 0, +1} such that (Γ, σ(δ)) is a dag and all nodes evaluate to some positive natural number in 2^N.
The size of a power circuit is the number of nodes |Γ|. By abuse of language, we also simply call Γ a power circuit and suppress δ whenever it is clear. If M is a marking on Γ and S ⊆ Γ, we write M|_S for the restriction of M to S. Let (Γ′, δ′) be a power circuit, Γ ⊆ Γ′, δ = δ′|_{Γ×Γ}, and δ′|_{Γ×(Γ′\Γ)} = 0. Then (Γ, δ) itself is a power circuit. We call it a sub-power circuit and denote this by (Γ, δ) ≤ (Γ′, δ′) or, if δ is clear, by Γ ≤ Γ′.
If M is a marking on S ⊆ Γ, we extend M to Γ by setting M(P) = 0 for P ∈ Γ \ S. With this convention, every marking on Γ also can be seen as a marking on Γ′ if Γ ≤ Γ′.
Example 7. A power circuit of size n + 1 can realize τ(n) since a directed path of n + 1 nodes represents τ(n) as the evaluation of the last node. For instance, a directed path of 6 nodes realizes τ(5): its nodes evaluate to 1, 2, 4, 16, 65536 and 2^{65536}.
Example 8. We can represent every integer in the range [−(2^n − 1), 2^n − 1] as the evaluation of some marking in a power circuit with node set {P_0, ..., P_{n−1}} with ε(P_i) = 2^i for i ∈ [n]. Thus, we can convert the binary notation of an n-bit integer into a power circuit with n vertices, O(n log n) edges (each successor marking requires at most log n + 1 edges) and depth at most log* n. For an example of a marking representing the integer 23, see Figure 1.
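To make Examples 7 and 8 concrete, here is a small evaluation sketch. The dictionary encoding (`succ[P]` maps each successor Q of P to δ(P, Q) ∈ {+1, −1}) and the helper names `evaluate`, `tower_chain` and `binary_basis` are assumptions chosen for illustration; they are not the data structure used in [11] or [30].

```python
def evaluate(succ, marking):
    """Return ε(M) = Σ_P M(P)·ε(P), where ε(P) = 2^{ε(Λ_P)} and Λ_P = succ[P]."""
    cache = {}

    def node_value(p):
        if p not in cache:
            exponent = sum(sign * node_value(q) for q, sign in succ[p].items())
            assert exponent >= 0, "node would not evaluate to an integer"
            cache[p] = 2 ** exponent
        return cache[p]

    return sum(sign * node_value(p) for p, sign in marking.items())


def tower_chain(n):
    """Directed path of n + 1 nodes: node i has the single successor i - 1."""
    return {i: ({i - 1: 1} if i > 0 else {}) for i in range(n + 1)}


def binary_basis(n):
    """Nodes 0..n-1 with ε(i) = 2^i: the successors of i are the set bits of i."""
    return {i: {j: 1 for j in range(i.bit_length()) if (i >> j) & 1}
            for i in range(n)}


print(evaluate(tower_chain(4), {4: 1}))                 # τ(4) from 5 nodes
print(evaluate(binary_basis(6), {0: 1, 1: 1, 2: 1, 4: 1}))   # 1 + 2 + 4 + 16
print(evaluate(binary_basis(6), {5: 1, 3: -1, 0: -1}))       # 32 - 8 - 1
```

The last two calls show two different markings with the same value on the same circuit, which is why a normal form (compact markings in a reduced circuit) is needed before markings can be compared digit by digit.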
A reduced power circuit of size n is a power circuit (Γ, δ) with Γ given as a sorted list Γ = (P_0, ..., P_{n−1}) such that all successor markings are compact and ε(P_i) < ε(P_j) whenever i < j. In particular, all nodes have pairwise distinct evaluations.
It turns out to be crucial that the nodes in Γ are sorted by their values. Still, sometimes it is convenient to treat Γ as a set; we write P ∈ Γ or S ⊆ Γ with the obvious meaning. Whenever convenient we assume that ε(Λ_{P_i}) = ∞ for i ≥ n.
Notice that in [11] the data structure for a reduced power circuit also contains a bit-vector indicating which nodes have successor markings differing by one -we will compute this information on-the-fly whenever needed.For more details on power circuits see [11,30].

Compact signed-digit representations
In this section we will show that for every binary number we can efficiently calculate a so-called unique compact representation. This will be a crucial tool for the power circuit reduction process.

Definition 11. (i)
(i.e., no two successive digits are non-zero). A non-negative binary number is the special case of a signed-digit representation where all b_i are 0 or 1 (note that, in general, they are not compact). Also negative binary numbers can be seen as special cases of signed-digit representations, though the precise form depends on the representation: a negative number given in two's complement is a signed-digit representation where the most significant digit is a −1 and the other non-zero digits are 1s; a negative signed-magnitude representation can be viewed as a signed-digit representation where all non-zero digits are −1s. In particular, every integer k can be represented as a signed-digit representation. However, in general, a signed-digit representation for an integer k is not unique. Still, we will prove in this section that each integer k, indeed, has a unique compact signed-digit representation (see also [30]).
Note that by setting b_i = 0 for i ≥ m, one can extend every signed-digit representation B = (b_0, ..., b_{m−1}) to an arbitrarily long or infinite sequence. By doing so, val(B) and the digit-length of B do not change.
Computing compact signed-digit representations. In the following we will, amongst other things, show that for every binary number A there exists such a compact signed-digit representation B of A and that B is unique with this property. We start with the existence and the complexity of calculating B. While in [30, Section 2.1] a linear-time algorithm for calculating B has been given, we aim for optimizing the parallel complexity.
Theorem 12. The following is in AC^0:
Input: A binary number A.
Output: A compact signed-digit representation of A.
Notice that Theorem 12 implies that every integer has a compact signed-digit representation. Moreover, be aware that the theorem is only true if we choose suitable encodings; in particular, we assume that the three values −1, 0, 1 are all encoded using two bits.
Proof. Let A = (a_0, ..., a_{m−1}) be a binary number. For i ≥ m we set a_i = 0. We view the a_i as Boolean variables and aim for constructing (almost) Boolean formulas for the compact representation. Since the digits of a compact representation are from the set {−1, 0, 1}, we treat the Boolean values 0, 1 as a subset of the integers and we will mix Boolean operations (∧, ∨, ⊕) with arithmetic operations (+, ·). Here ⊕ denotes the exclusive or, which is addition modulo two.
For i ≥ 0 we define
Remark 13. It is clear that the c_i can be computed in AC^0 and so the same holds for the b_i. This implies that on input of A = (a_0, ..., a_{m−1}), one can compute B in AC^0. By the very definition as a Boolean formula, it is clear that it is actually in uniform AC^0 (see Remark 3). Thus, in order to prove Theorem 12, it remains to show that B = (b_0, ..., b_m) is compact and that val(B) = val(A).
Claim 14. The c_i satisfy the following recurrence: c_0 = 0 and c_i = (a_i ∧ a_{i−1}) ∨ (c_{i−1} ∧ (a_i ∨ a_{i−1})) for i ≥ 1.
Proof. For i = 0, the claim holds because the empty disjunction is equal to 0. Now we assume that i ≥ 1 and that the recurrence holds for i − 1. We set X_j = a_j ∧ a_{j−1} and Y_j = a_j ∨ a_{j−1}. Then we obtain
This proves the claim.
Claim 15. Let a_i, b_i and c_i be as above. Then for all k ≥ 0 we have a_k + c_k = 2 · c_{k+1} + b_k.
Proof. By Claim 14 we have c_{k+1} = (a_{k+1} ∧ a_k) ∨ (c_k ∧ (a_{k+1} ∨ a_k)). Thus, we can express both b_k and c_{k+1} in terms of a_k, a_{k+1} and c_k. This leads us to the following table: If we now take the values in the table as integer values and put them into the above equation, we see that the equation holds in all cases.
Claim 16. Let a_i, b_i and c_i be as above. Then for all k ≥ 0 we have ∑_{i=0}^{k} a_i · 2^i = 2^{k+1} · c_{k+1} + ∑_{i=0}^{k} b_i · 2^i.
Proof. We use induction on k. Since c_0 = 0 we have a_0 = 2 · c_1 + b_0 by Claim 15. Therefore, the equation holds for k = 0. Now let k ≥ 0. Then we obtain ∑_{i=0}^{k+1} a_i · 2^i = 2^{k+1} · (a_{k+1} + c_{k+1}) + ∑_{i=0}^{k} b_i · 2^i = 2^{k+1} · (2 · c_{k+2} + b_{k+1}) + ∑_{i=0}^{k} b_i · 2^i = 2^{k+2} · c_{k+2} + ∑_{i=0}^{k+1} b_i · 2^i, where the first equality is the induction hypothesis and the second one is Claim 15. This proves the claim.
Proof. We have to make sure that there is no i ∈ [m] such that b_i ≠ 0 and b_{i+1} ≠ 0. In order to do so, we express b_i and b_{i+1} in terms of a_i, a_{i+1} and c_i. Notice that b_{i+1} is not fully determined by a_i, a_{i+1} and c_i. Still these three values tell us whether b_{i+1} is zero or not. This leads us to the following table, which shows that B is, indeed, compact:
Now we are ready to finish the proof of Theorem 12. Let A = (a_0, ..., a_{m−1}) and B = (b_0, ..., b_{m−1}, b_m) be as above. By Claim 17, B is compact. Moreover, we have val(A) = val(B) by Claim 16 applied with k = m, since a_m = 0 and c_{m+1} = 0. Therefore, B is a compact signed-digit representation for A as claimed in Theorem 12. By Remark 13, it can be computed in AC^0.
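The recurrence c_{k+1} = (a_{k+1} ∧ a_k) ∨ (c_k ∧ (a_{k+1} ∨ a_k)) from the proof of Claim 15 can be tried out directly. The sketch below is sequential (so it does not reflect the AC^0 bound) and it assumes that the digits are given by b_k = a_k + c_k − 2 · c_{k+1}, which generalizes the relation a_0 = 2 · c_1 + b_0 used in the proof of Claim 16; the function names are hypothetical.

```python
def compact(bits):
    """bits = (a_0, ..., a_{m-1}), least significant first; returns (b_0, ..., b_m)."""
    m = len(bits)
    a = list(bits) + [0, 0]            # a_i = 0 for i >= m
    c = [0] * (m + 2)                  # c_0 = 0
    for k in range(m + 1):
        # carry recurrence: the majority of a_k, a_{k+1} and c_k
        c[k + 1] = (a[k + 1] & a[k]) | (c[k] & (a[k + 1] | a[k]))
    # assumed digit rule: b_k = a_k + c_k - 2*c_{k+1}, always in {-1, 0, 1}
    return [a[k] + c[k] - 2 * c[k + 1] for k in range(m + 1)]


def val(digits):
    """Value of a signed-digit representation, least significant digit first."""
    return sum(d * 2 ** i for i, d in enumerate(digits))


def is_compact(digits):
    """True if no two successive digits are non-zero."""
    return all(x == 0 or y == 0 for x, y in zip(digits, digits[1:]))


print(compact([1, 1, 1]))   # digits of 7 = 8 - 1, least significant first
```

For the input 7 = (1, 1, 1) the three ones collapse into a single −1 at position 0 and a single 1 at position 3, which illustrates why the output needs one digit more than the input.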

Uniqueness of compact signed-digit representations.
The following lemmas are crucial tools both for proving uniqueness of compact representations and for the power circuit reduction process, which we describe later.In [30, Section 2.1] similar statements can be found.

Lemma 18.
Let A be a compact signed-digit representation and let B = (b_0, ..., b_{n−1}) be a compact signed-digit representation of digit-length n such that b_i = (n − i) mod 2 (i.e., b_{n−1} = 1 and then B alternates between 0 and 1). Then we have
Proof. First, we want to calculate val(B). If n is even, then showing that in any case val(B) = ⌊2^{n+1}/3⌋.
In order to see (ii), we denote A = (a 0 , . . ., a n−1 ).If val(A) ≤ 0, then clearly val(A) ≤ val(B).Hence, assume that the digit-length of A is at most n and consider the following operations: , then set a i = 1 and a i−1 = 0 (technically, this rule subsumes the previous rule).

If a
Let A′ be the number we obtained after applying at least one of the above operations to A (if this is possible). Then A′ is also a compact signed-digit representation, the digit-length of A′ is at most n, and val(A) < val(A′). Moreover, if A ≠ B, then we can always apply one of these rules. This shows that val(A) ≤ val(B).
On the other hand, assume that the digit-length of A is m with m ≥ n + 1. First, assume that a_{m−1} = 1 and set A′ = (a_0, ..., a_{m−3}). Then, since A is compact, we have a_{m−2} = 0 and, hence, val(A) = 2^{m−1} + val(A′). By the previous implication and part (i), we know
Proof. Notice that (i) is an immediate consequence of (ii). In order to see (ii), observe that it suffices to show only one implication. Let A = (a_0, ..., a_{i_0}) and B = (b_0, ..., b_{i_0}) and assume that 0 = a_{i_0} < b_{i_0} = 1 (the cases involving the value −1 follow with the same argument). Now, A and B are compact signed-digit representations, so by Lemma 18,
From this lemma together with Theorem 12 it follows that each k ∈ Z can be uniquely represented by a compact signed-digit representation CR(k). Likewise, for a signed-digit representation A, we write CR(A) for its compact signed-digit representation.

Corollary 20.
The following problems are in AC^0:
(ii) Input: Signed-digit representations A and B. Output: The compact signed-digit representation of val(A) + val(B).
Proof. Given a signed-digit representation A = (a_0, ..., a_{m−1}), we can split it into two non-negative binary numbers B, C such that val(A) = val(B) − val(C) (i.e., b_i = max{0, a_i} and c_i = −min{0, a_i}). From these binary numbers we can compute the difference in AC^0 and then make the result compact using Theorem 12. To see (ii), we proceed in exactly the same way. For comparing two signed-digit representations, we compute their compact representations using part (i) and then compare them in AC^0 by evaluating the condition in Lemma 19.
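The steps of this proof can be sketched sequentially as follows. Two things are assumptions of the sketch rather than statements of the paper: `to_compact` uses the textbook non-adjacent-form digit rule on Python integers instead of the circuit of Theorem 12, and `compare_compact` reads the condition of Lemma 19 as "the most significant position where two compact representations differ decides the order". All function names are hypothetical.

```python
def val(digits):
    """Value of a signed-digit representation, least significant digit first."""
    return sum(d * 2 ** i for i, d in enumerate(digits))


def split(A):
    """Split A into non-negative binary numbers B, C with val(A) = val(B) - val(C)."""
    return [max(0, a) for a in A], [-min(0, a) for a in A]


def to_compact(n):
    """Compact signed-digit representation of the integer n (textbook digit rule)."""
    digits = []
    while n != 0:
        d = 0 if n % 2 == 0 else 2 - (n % 4)   # +1 if n = 1 mod 4, -1 if n = 3 mod 4
        digits.append(d)
        n = (n - d) // 2
    return digits


def compact_of(A):
    """Compact representation of an arbitrary signed-digit representation A."""
    B, C = split(A)
    return to_compact(val(B) - val(C))


def compare_compact(A, B):
    """Return -1, 0 or 1; decided by the most significant differing digit."""
    n = max(len(A), len(B))
    A = A + [0] * (n - len(A))
    B = B + [0] * (n - len(B))
    for i in reversed(range(n)):
        if A[i] != B[i]:
            return -1 if A[i] < B[i] else 1
    return 0


print(compact_of([1, -1, 1]))   # 1 - 2 + 4 = 3 = 4 - 1
print(compare_compact(to_compact(23), to_compact(-5)))
```

Note that the digit-wise comparison is only valid after both inputs have been made compact; on arbitrary signed-digit representations the most significant differing digit does not determine the order.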

Operations on power circuits

Basic operations
Before we consider the computation of reduced power circuits, which is our main result in this section, let us introduce some more notation on power circuits and recall the basic operations from [30,11] under circuit complexity aspects.
As we do for Γ, we treat a chain both as a sorted list and as a set.
(ii) We call a chain C maximal if it cannot be extended in either direction. We denote the set of all maximal chains by C_Γ. As a set, a reduced power circuit is the disjoint union of its maximal chains.
(iii) Let M be a marking in the reduced power circuit (Γ, δ), let C = (P_i, ..., P_{i+ℓ−1}) ∈ C_Γ, and define a_j = M(P_{i+j}) for j ∈ [ℓ]. Then we write digit_C(M) = (a_0, ..., a_{ℓ−1}).
(iv) There is a unique maximal chain C_0 containing the node P_0 of value 1. We call C_0 the initial maximal chain of Γ and denote it by C_0 = C_0(Γ).
For an example of a power circuit with three maximal chains, see Figure 2.
This power circuit is an example of a reduced power circuit with three maximal chains: the first one consists of the nodes of values 1, 2, 4, 8, the next one is formed by the nodes of values 2^8 and 2^9, and the node of value 2^{2^9} is a maximal chain of length 1.
We will show how to computationally find the maximal chains in Corollary 27. The following facts are clear from the definition of maximal chains:
Fact 22. Let (Γ, δ) be a reduced power circuit and let M be a marking on Γ. Then the following holds:
Lemma 23. Let (Γ, δ) be a reduced power circuit. Let L and M be compact markings in Γ . Therefore, by the assumption However, this follows immediately from the fact that This finishes the proof of the lemma.

Comparison of markings.
Lemma 24. Given a reduced power circuit (Γ, δ) and a node P ∈ Γ, one can decide in AC^0 whether P ∈ C_0.
Remark 25. Since membership in AC^0 often highly depends on the encoding of the input, in the following we always assume that power circuits are given in a suitable way. In particular, we may assume that an n-node power circuit is given by the n × n matrix representing δ where each entry from {0, ±1} is encoded using two bits. Moreover, in order to represent power circuits with fewer nodes within the same data structure, we can allow one deleted bit for every row and column of the matrix. Markings can be encoded the same way by a sequence of n symbols from {0, ±1}. Moreover, if the power circuit is reduced, we also assume that the matrix representing δ is already in the sorted order (in particular, the ordering is not given by some separate data structure).
In the following, we do not further consider these encoding issues. Moreover, as soon as we are dealing with TC^0 circuits, there is a lot of freedom in how to encode inputs.
Proof of Lemma 24. Let Γ = (P_0, ..., P_{n−1}). For each i we define a signed-digit representation A_i = (a_{i,0}, ..., a_{i,n−1}) by a_{i,j} = Λ_{P_i}(P_j). These signed-digit representations might not be compact, but, if P_i ∈ C_0, then A_i is compact (this is because, by Remark 10, P_i has only successors in C_0). Using Corollary 20, we can compute the maximal i_max such that A_{i_max} is compact and for all i < i_max also A_i is compact and val(A_{i+1}) = val(A_i) + 1 = i + 1 (checking whether A_i is compact, clearly, can be done in AC^0). By a straightforward induction, we obtain that for all i ≤ i_max we have val(A_i) = ε(Λ_{P_i}) and P_i ∈ C_0. On the other hand, clearly, P_{i_max+1} ∉ C_0. Hence, we have computed C_0.
Proof.Let us choose ≤ as (the other cases follow from this case in a straightforward way).
Let Γ = (P 0 , . . ., P n−1 ).By Lemma 19 (a) we can check in . Now, i 0 can be found in AC 0 and, hence, the whole check is in AC 0 .This proves part (a).
For part (b) we first check whether The markings M|_{Γ\C_0} and L|_{Γ\C_0} are still compact markings in a reduced power circuit, and so we are able to decide in AC^0 if that equality holds by part (a). So it remains to check if ε(L|_{C_0}) ≤ ε(M|_{C_0}) + k. This amounts to an addition and a comparison of signed-digit representations of digit-length at most |C_0| + 1 (according to Lemma 18), which both can be done in AC^0 (see Corollary 20). Thus, ε(L) ≤ ε(M) + k can be checked in AC^0.
Corollary 27. We can decide in AC^0, given a reduced power circuit (Γ, δ) and nodes P, Q ∈ Γ, whether P and Q belong to the same maximal chain of Γ.
Proof. Let P = P_i and Q = P_j with i < j. Then P and Q belong to the same maximal chain if and only if ε(Λ_{P_{ℓ+1}}) = ε(Λ_{P_ℓ}) + 1 for all ℓ ∈ [i .. j − 1]. The latter can be checked in AC^0 using Proposition 26.
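The criterion from this proof is easy to illustrate on explicit numbers. The sketch below assumes that the exponents ε(Λ_{P_i}) of the sorted node list are already available as Python integers, which sidesteps the actual difficulty (comparing successor markings inside the circuit); the function names are hypothetical.

```python
def maximal_chains(exponents):
    """exponents[i] = ε(Λ_{P_i}) for the sorted node list; returns chains as index lists."""
    chains = []
    for i, e in enumerate(exponents):
        if chains and e == exponents[i - 1] + 1:
            chains[-1].append(i)      # P_i continues the chain of P_{i-1}
        else:
            chains.append([i])        # P_i starts a new maximal chain
    return chains


def same_chain(exponents, i, j):
    """Criterion of Corollary 27: all consecutive exponents between i and j differ by 1."""
    i, j = min(i, j), max(i, j)
    return all(exponents[l + 1] == exponents[l] + 1 for l in range(i, j))


# Nodes of values 1, 2, 4, 8, 2^8, 2^9 and 2^(2^9), given by their exponents.
print(maximal_chains([0, 1, 2, 3, 8, 9, 512]))
```

The example input corresponds to a reduced power circuit whose nodes have the values 1, 2, 4, 8, 2^8, 2^9 and 2^{2^9}, so it splits into three maximal chains of lengths 4, 2 and 1.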

Calculations with markings.
Lemma 28. The following problems are all in TC^0:
(a) Input: A power circuit (Π, δ_Π) together with markings K and L. Output: A power circuit (Π′, δ_{Π′}) with a marking M such that ε(M) = ε(K) + ε(L).
(b) Input: A power circuit (Π, δ_Π) together with a marking L. Output: A marking M on Π such that ε(M) = −ε(L).
(c) Input: A power circuit (Π, δ_Π) together with markings K and L such that ε(L) ≥ 0. Output: A power circuit (Π′, δ_{Π′}) with a marking M such that ε(M) = ε(K) · 2^{ε(L)}.
The proof of this lemma uses the following construction (see also [11]):
Definition 29. Let (Π, δ) be a power circuit and let M be a marking on Π.
(a) Let P ∈ Π. We define a new power circuit Π ∪ {Clone(P)} where Clone(P) is a new node with Λ_{Clone(P)} = Λ_P.
(b) We define a marking Clone(M) as follows: First we clone all the nodes in σ(M).
It is clear that the problem of computing, given a power circuit (Π, δ) and a marking M, a new power circuit (Π′, δ′) containing Clone(M) is in TC^0, and even in AC^0 when defining the underlying data structure properly. Notice that |Π′| ≤ 2 · |Π| and depth(Π′) = depth(Π).
Part (a): First, we clone the marking K, leading to a power circuit (Π′, δ_{Π′}) of size at most 2 · |Π|. Now Clone(K) and L certainly have disjoint supports. Then we define M(P) = Clone(K)(P) + L(P) for all P ∈ Π′. Clearly, ε(M) = ε(K) + ε(L), and M can be output in TC^0. To show (b), we set M(P) = −L(P) for all P ∈ Π. As for (a), to define M(P) we only have to look up L(P) and change the sign, and we do not have to create any new nodes or edges, so this can be done even in AC^0. To (c): To obtain M, we follow a similar approach as described in [11, Section 2]. We first clone the markings K and L and so obtain markings Clone(K) and Clone(L). At this point, the size of Π increased by a factor of at most three.
Next we create new edges from every node P ∈ σ(Clone(K)) to every node Q ∈ σ(Clone(L)) such that δ(P, Q) = Clone(L)(Q).This operation does not change the size of the power circuit, but it increases the depth by at most 1 since there are no incoming edges to nodes in σ(Clone(K).Then the marking Clone(K) is the marking we search for.
Notice that the construction in (c) also yields ε(M ) = ε(K) • 2 ε(L) in the case that ε(L) < 0. However, then the resulting graph might not be a power circuit anymore since it might have nodes of non-integral evaluation.Note that [11] is not very precise here: it is actually not sufficient that ε(K) • 2 ε(L) ∈ Z in order to assure that there are no nodes of non-integral evaluation.
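The three constructions in the proof of Lemma 28 can be illustrated by a small sequential sketch. It is not the TC^0 circuit from the lemma, only a direct transcription of the cloning idea; the class `PowerCircuit` and its method names are hypothetical, and markings are stored as dictionaries from nodes to signs.

```python
from itertools import count


class PowerCircuit:
    """Illustrative power circuit: succ[P] is the successor marking of P,
    stored as a dict mapping a node Q to the sign delta(P, Q) in {+1, -1}."""

    def __init__(self):
        self.succ = {}
        self._fresh = count()

    def new_node(self, marking=None):
        p = next(self._fresh)
        self.succ[p] = dict(marking or {})
        return p

    def eval_node(self, p):
        # eps(P) = 2 ** eps(Lambda_P); a leaf has the empty marking and value 1.
        e = self.eval_marking(self.succ[p])
        if e < 0:
            raise ValueError("node of non-integral evaluation")
        return 2 ** e

    def eval_marking(self, m):
        return sum(sign * self.eval_node(p) for p, sign in m.items())

    def clone(self, m):
        # Clone(M): fresh copies of the marked nodes with the same successor markings.
        return {self.new_node(self.succ[p]): sign for p, sign in m.items()}

    def add(self, k, l):
        # Part (a): Clone(K) and L have disjoint supports, so the union is a marking.
        m = self.clone(k)
        m.update(l)
        return m

    def negate(self, l):
        # Part (b): only the signs change, no new nodes or edges.
        return {p: -sign for p, sign in l.items()}

    def mul_pow2(self, k, l):
        # Part (c), assuming eps(L) >= 0: add edges from every node of Clone(K)
        # to every node of Clone(L), labelled with the sign given by Clone(L).
        ck, cl = self.clone(k), self.clone(l)
        for p in ck:
            self.succ[p].update(cl)
        return ck


pc = PowerCircuit()
one = pc.new_node()             # value 1
two = pc.new_node({one: 1})     # value 2
four = pc.new_node({two: 1})    # value 4
K = {four: 1, one: -1}          # evaluates to 3
L = {two: 1, one: 1}            # evaluates to 3
print(pc.eval_marking(pc.mul_pow2(K, L)))
```

Cloning matters in `mul_pow2`: the cloned nodes have no incoming edges, so changing their successor markings cannot alter the value of any other node.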

Relation to arithmetic circuits with + and 2 x gates.
Before we proceed to the power circuit reduction, our main result on power circuits, let us elaborate on the relation of power circuits to more general arithmetic circuits. A (constant) $(0, +, -, 2^x)$-circuit is a dag where each node is either a 0-gate (i.e., a constant gate), a +-gate, a −-gate or a $2^x$-gate. 0-gates have zero inputs, +-gates have two inputs, and −-gates and $2^x$-gates have one input. There is one designated output gate. The evaluation $\mathrm{eval}(C)$ of such a circuit $C$ is defined in a straightforward way (as a real number; in general, it might not be an integer). The $2^x$-depth of a circuit $C$, denoted by $\mathrm{depth}_{2^x}(C)$, is the maximal number of $2^x$-gates on any path in the circuit.
If the input of every $2^x$-gate of $C$ is non-negative, then $(\Pi, \delta)$ is, indeed, a power circuit, i.e., all nodes evaluate to positive integers.
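As a concrete reading of these definitions, the following sketch evaluates a small $(0, +, -, 2^x)$-circuit and computes its $2^x$-depth. The dictionary encoding and the names `evaluate` and `exp_depth` are assumptions made for illustration; to stay exact, the sketch only handles circuits in which every $2^x$-gate receives an integer input, and it measures the depth along paths ending in the given gate.

```python
from fractions import Fraction


def evaluate(circuit, out):
    """circuit maps a gate name to ('0',), ('+', g, h), ('-', g) or ('exp', g)."""
    cache = {}

    def val(g):
        if g not in cache:
            kind, *args = circuit[g]
            if kind == "0":
                cache[g] = Fraction(0)
            elif kind == "+":
                cache[g] = val(args[0]) + val(args[1])
            elif kind == "-":
                cache[g] = -val(args[0])
            else:  # 2^x-gate
                x = val(args[0])
                if x.denominator != 1:
                    raise ValueError("2^x-gate with non-integral input")
                cache[g] = Fraction(2) ** x.numerator
        return cache[g]

    return val(out)


def exp_depth(circuit, out):
    """Maximal number of 2^x-gates on a path ending in the gate `out`."""
    cache = {}

    def depth(g):
        if g not in cache:
            kind, *args = circuit[g]
            below = max((depth(a) for a in args), default=0)
            cache[g] = below + (1 if kind == "exp" else 0)
        return cache[g]

    return depth(out)


example = {
    "z": ("0",),
    "one": ("exp", "z"),            # 2^0 = 1
    "two": ("+", "one", "one"),     # 2
    "four": ("exp", "two"),         # 4
    "neg": ("-", "one"),            # -1
    "three": ("+", "four", "neg"),  # 3
    "out": ("exp", "three"),        # 8
}
print(evaluate(example, "out"), exp_depth(example, "out"))
```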
Proof.In order to transform (Π, δ) with a marking M into a (0, +, −, 2 x )-circuit C, we proceed as follows: we create one 0-gate; for every leaf (node of out-degree zero) of Π we create a 2 x -gate with input coming from the 0-gate, for every other node of Π, we create a 2 x -gate whose input we describe next.For every marking (both M and the successor markings Λ P ) we create a tree of +-gates (possibly with some −-gates) of logarithmic depth where the leaves correspond to some of the (already created) 2 x -or 0-gates and the last +-gate (i.e., the root) evaluates to ε(M ) (resp.ε(Λ p )). Now, the 2 x -gate corresponding to a node P ∈ Π receives its input from the +-gate corresponding to Λ P .It is straightforward that this construction can be done in TC 0 .
Clearly, this process introduces at most one +-gate and one −-gate for every pair in the support of $\delta$ and every node in the support of $M$. So we have at most $2|\sigma(\delta)| + 2|\Pi|$ many +- and −-gates. Since there is one 0-gate and $|\Pi|$ many $2^x$-gates, the total number of gates is at most $2|\sigma(\delta)| + 3|\Pi| + 1$. It is clear that $\mathrm{depth}_{2^x}(C) = \mathrm{depth}(\Pi) + 1$ (note that the depth increases by one because leaves of $\Pi$ are replaced by $2^x$-gates with input from the 0-gate). Moreover, the depth of each +-tree is bounded by $\log(|\Pi|)$; introducing the −-gates and connecting to the $2^x$-gates increases the depth further by 2 (note that for the $2^x$-gates with single input from a 0-gate this is a huge over-estimate). Since $\mathrm{depth}_{2^x}(C) = \mathrm{depth}(\Pi) + 1$ and we have one additional +-tree for the marking $M$, the total depth is at most $(\mathrm{depth}(\Pi) + 2) \cdot (\log(|\Pi|) + 2)$.

Now consider a $(0, +, -, 2^x)$-circuit $C$ with $n$ gates. Let $h_1, \ldots, h_k$ be the $2^x$-gates of $C$. As a first step, we replace each $2^x$-gate $h_i$ by an "input" gate $X_i$ and cut its incoming wire. Thus, we obtain an arithmetic circuit over $\mathbb{Z}$ with +- and −-gates. Each +- or −-gate $g$ computes a linear combination $\sum_{i=1}^{k} a_{g,i} X_i$ with $a_{g,i} \in \mathbb{Z}$. By [38, Theorem 21], the $a_{g,i}$ can be computed in GapL, and hence, in NC^2 ([1, Theorem 4.1]). Notice that $|a_{g,i}| < 2^n$ for all $g$ and $i$.

Now, to construct the power circuit $(\Pi, \delta)$, we proceed as follows: we start with $n$ singleton nodes. For each $2^x$-gate $h_j$ in $C$ we construct nodes $Q_{j,0}, \ldots, Q_{j,n-1}$. The aim is to define $\Lambda_{Q_{j,\ell}}$ such that $\varepsilon(Q_{j,\ell}) = \mathrm{eval}(h_j) \cdot 2^{\ell}$; in particular, $\varepsilon(Q_{j,0}) = \mathrm{eval}(h_j)$ and $C_j = (Q_{j,0}, \ldots, Q_{j,n-1})$ is a chain (by a slight abuse of the notation of Definition 21 since now the power circuit is not reduced). Let $h_j$ be some $2^x$-gate and $g$ the gate from which $h_j$ receives its input. If $g$ is a 0-gate, we define $a_{j,i} = 0$ for all $i \in [1\,..\,k]$; if $g$ is a $2^x$-gate $h_m$, we set $a_{j,m} = 1$ and $a_{j,i} = 0$ for all $i \ne m$. Otherwise, $g$ is a +- or −-gate. In this case we define $a_{j,i} = a_{g,i}$ where $a_{g,i} \in \mathbb{Z}$ is as above. Then for all $\ell \in [n]$ we define $\Lambda_{Q_{j,\ell}}$ on each chain $C_i$ such that $\mathrm{digit}_{C_i}(\Lambda_{Q_{j,\ell}})$ is the binary representation of $a_{j,i}$ (notice that $a_{j,i}$ requires only $n$ bits and all the chains $C_i$ for different $i$ are disjoint, so this is well-defined). Moreover, we add a $+1$-edge from $Q_{j,\ell}$ to $\ell$ many of the singleton nodes. We do this for all $2^x$-gates in parallel. By induction we see that, indeed, $\varepsilon(Q_{j,\ell}) = \mathrm{eval}(h_j) \cdot 2^{\ell}$.

If the output gate of $C$ is a $2^x$-gate $h_j$, we obtain a marking evaluating to the same value by simply marking $Q_{j,0}$ with one; if the output gate is a +- or −-gate, we obtain a corresponding marking in the same fashion as for the $\Lambda_{Q_{j,0}}$ described above.

Clearly, the whole computation can also be done in NC^2. The bound $|\Pi| \le n^2 + n$ is straightforward: we introduced at most $n$ singleton nodes and then, for each of the at most $n$ many $2^x$-gates, we introduced $n$ additional nodes. The bound on the depth holds because we can have an edge from $Q_{j,\ell}$ to $Q_{j',\ell'}$ only if $h_j$ and $h_{j'}$ are connected by a path in $C$. Adding the edges from $Q_{j,\ell}$ to the singleton nodes only increases its depth if the depth without these edges was zero, i.e., if $h_j$ is a $2^x$-gate whose input is a sum of 0-gates. However, we already counted the depth of such $2^x$-gates as one, so also in this case the depth does not increase.

Power circuit reduction
While compact markings on a reduced power circuit yield unique representations of integers, in an arbitrary power circuit $(\Pi, \delta_\Pi)$ we can have two markings $L$ and $M$ such that $L \ne M$ but $\varepsilon(L) = \varepsilon(M)$. Therefore, given an arbitrary power circuit, we wish to produce a reduced power circuit for comparing markings. This is done by the following theorem, which is our main technical result on power circuits.

Theorem 31. The following is in DepParaTC^0 parametrized by $\mathrm{depth}(\Pi)$: Input: A power circuit $(\Pi, \delta_\Pi)$ together with a marking $M$ on $\Pi$. Output: A reduced power circuit $(\Gamma, \delta)$ together with a compact marking $\tilde{M}$ on $\Gamma$ such that $\varepsilon(\tilde{M}) = \varepsilon(M)$.
For a power circuit (Π, δ Π ) with a marking M we call the power circuit (Γ, δ) together with the marking M obtained by Theorem 31 the reduced form of Π.The proof of Theorem 31 consists of several steps, which we introduce on the next pages.The high-level idea is as follows: Like in [30,11], we keep the invariant that there is an already reduced part and a non-reduced part (initially the non-reduced part is Π).The main difference is that in one iteration we insert all the nodes of the non-reduced part that have only successors in the reduced part into the reduced part.Each iteration can be done in TC 0 ; after depth(Π) + 1 iterations we obtain a reduced power circuit.
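The layer-by-layer schedule of this high-level idea can be sketched as follows. The sketch only computes which nodes leave the non-reduced part in each iteration; it does not perform UpdateNodes, ExtendChains or UpdateMarkings. It assumes that the set inserted in one iteration consists of exactly those nodes whose successors all lie outside the current non-reduced part; the function name `reduction_layers` is hypothetical.

```python
def reduction_layers(succ):
    """succ maps every node of the non-reduced part to the set of its successors.

    Returns the list of layers that are moved into the reduced part,
    one layer per iteration.
    """
    remaining = set(succ)
    layers = []
    while remaining:
        # Nodes all of whose successors have already left the non-reduced part.
        layer = {p for p in remaining if not (set(succ[p]) & remaining)}
        if not layer:
            raise ValueError("the successor relation contains a cycle")
        layers.append(layer)
        remaining -= layer
    return layers


# A dag of depth 2: c -> b -> a, c -> a, and an isolated leaf d.
dag = {"a": set(), "b": {"a"}, "c": {"a", "b"}, "d": set()}
print(reduction_layers(dag))
```

For a dag of depth $d$ the loop runs $d + 1$ times, which matches the number of iterations stated above.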
Insertion of new nodes. The following procedure, called InsertNodes, is a basic tool for the reduction process. Let $(\Gamma, \delta)$ be a reduced power circuit and $I$ be a set of nodes with $\Gamma \cap I = \emptyset$. Assume that for every $P \in I$ there exists a marking $\Lambda_P : \Gamma \to \{-1, 0, 1\}$ satisfying: $\varepsilon(\Lambda_P) \ge 0$ for all $P \in I$, $\Lambda_P$ is compact for all $P \in I$, and no two nodes of $\Gamma \cup I$ have the same value.

We wish to add $I$ to the reduced power circuit $(\Gamma, \delta)$. For this, we set $\Gamma' = \Gamma \cup I$ and define $\delta' : \Gamma' \times \Gamma' \to \{-1, 0, 1\}$ in the obvious way: $\delta'|_{\Gamma \times \Gamma} = \delta$, $\delta'|_{\Gamma' \times I} = 0$ and $\delta'(P, Q) = \Lambda_P(Q)$ for $(P, Q) \in I \times \Gamma$. Now, $(\Gamma', \delta')$ is a power circuit with $(\Gamma, \delta) \le (\Gamma', \delta')$ and for every $P \in I$ the map $\Lambda_P$ is the successor marking of $P$. Moreover, each node of $\Gamma'$ has a unique value.

In order to obtain a reduced power circuit, we need to sort the nodes in $\Gamma'$ according to their values: Since for every node $P \in \Gamma'$ the marking $\Lambda_P$ is a compact marking on the reduced power circuit $\Gamma$, by Proposition 26, for $P, Q \in \Gamma'$ we are able to decide in AC^0 whether $\varepsilon(\Lambda_Q) \le \varepsilon(\Lambda_P)$. Therefore, by Example 2 we can sort $\Gamma'$ according to the values of the nodes in TC^0 and, hence, assume that $\Gamma' = (P_0, \ldots, P_{|\Gamma'|-1})$ is in increasing order.
Observe A power circuit (Γ, δ) and a set I with the properties described above.Output: A reduced power circuit (Γ , δ ) such that (Γ, δ) ≤ (Γ , δ ) and such that for every P ∈ I there is a node Q in Γ with ΛQ = ΛP .In addition, The three steps of the reduction process.The reduction process for a power circuit (Π, δ Π ) with a marking M consists of several iterations.Each iteration starts with a power circuit The aim of one iteration is to integrate the vertices Min(Ξ i ) ⊆ Ξ i into Γ i where Min(Ξ i ) is defined by and to update the marking M i accordingly.Each iteration consists of the three steps UpdateNodes, ExtendChains, and UpdateMarkings, which can be done in TC 0 .We have Ξ i+1 = Ξ i \ Min(Ξ i ).Thus, the full reduction process consists of depth(Π) + 1 many TC 0 computations.
Let us now describe these three steps in detail and also show that they can be done in TC 0 .After that we present the full algorithm for power circuit reduction.

Lemma 33 (UpdateNodes). The following problem is in TC 0 :
Input: A power circuit (Γ ∪ Ξ, δ) as above.Output: for every node Q ∈ Min(Ξ) there exists a node P ∈ Γ with ε(P For the proof, we define the following equivalence relation ∼ ε on Γ ∪ Min(Ξ): For P ∈ Γ ∪ Min(Ξ) we write [P ] ε for the equivalence class containing P .
Proof. Consider the equivalence relation $\sim_\varepsilon$ as defined above on $\Gamma \cup \mathrm{Min}(\Xi)$. Define a set $I \subseteq \mathrm{Min}(\Xi)$ by taking one representative of each $\sim_\varepsilon$-class not containing a node of $\Gamma$. Such a set $I$ can be computed in TC^0: Clearly, $\mathrm{Min}(\Xi)$ can be computed in TC^0. The $\sim_\varepsilon$-classes can be computed in AC^0 by Proposition 26. Finally, for defining $I$ one has to pick representatives. For example, for every $\sim_\varepsilon$-class which does not contain a node of $\Gamma$ one can pick the first node in the input which belongs to this class. These representatives can also be found in TC^0. Now, we can apply Lemma 32 to insert $I$ into $\Gamma$ in TC^0. This yields our power circuit $(\Gamma', \delta')$. The size bounds now follow immediately from those in Lemma 32 (notice that $|I| \le |\mathrm{Min}(\Xi)|$).

Lemma 34 (ExtendChains).
The following problem is in TC 0 : 3 (where, as before, C0 = C0(Γ ) is the initial maximal chain of Γ ) Output: A reduced power circuit (Γ , δ ) such that (Γ , δ ) ≤ (Γ , δ ), and Proof.First assume that |C 0 | = 1.Then |Γ | = 1 and µ ≤ 1.If µ = 1, then just one node has to be created, namely the one of value 2 and we are done.Thus, in the following we can assume that |C 0 | ≥ 2. Now, the proof of Lemma 34 consists of two steps: first, we extend only the chain C 0 to some longer (and long enough) chain in order to make sure that the values of the (compact) successor markings of the nodes we wish to introduce can be represented within the power circuit; only afterwards we add the new nodes as described in the lemma.
Step 1: We first want to extend the chain C 0 to the chain C0 of minimal length such that C0 is a maximal chain, C 0 ⊆ C0 , and the last node of C0 is not already present in Γ .The resulting power circuit will be denoted by Γ.We define Here, we use the convention that P |Γ | has value infinity, so i 0 indeed exists.Furthermore, we define Thus, in order to obtain Γ, we need to insert a new node between P i and P i+1 into Γ for each i ∈ I (resp.one node above P i0 ).Since the successor markings of these new nodes might point to some of the other new nodes, we cannot apply Lemma 32 as a black-box.Instead, we need to take some more care: the rough idea is that, first, we compute all positions I where new nodes need to be introduced (I is as defined above), then we compute compact signed-digit representations for the respective successor markings, and, finally, we introduce these new nodes all at once knowing that all nodes where the successor markings point to are also introduced at the same time.In order to map the positions of nodes in Γ to positions of nodes in Γ, we introduce a function λ : Observe that λ(i) = i for i ∈ [|C 0 |], and λ(i + 1) = λ(i) + 2 for i ∈ I, and λ(j) = j + |I| for j ≥ i 0 + 1.
For every i ∈ I we define the successor marking of the node Be aware that, since Q i ∈ C0 , also the successor marking of Q i (of value ε(Λ Pi ) + 1) can be represented using only the nodes from C0 (see Remark 10), so this is, indeed, a meaningful definition (be aware that to represent ε(Λ Pi ) + 1, we might need some of the additional nodes Q i , but never a node that is not part of the chain C0 ).Clearly, this yields ε(Λ Qi ) = ε(Λ Pi ) + 1 as desired.We obtain a reduced power circuit ( Γ, δ) with (Γ , δ ) ≤ ( Γ, δ) where the map δ : Γ → {−1, 0, 1} is defined by the successor markings.Moreover, C0 ⊆ Γ has the required properties.
It remains to show that Γ can be computed in TC 0 : As |C 0 | ≥ 2, according to Proposition 26, we are able to decide in AC 0 whether the markings Λ Pi and Λ Pi+1 differ by 1, 2, or more than 2 -for all i ∈ [|Γ |] in parallel.Now, i 0 can be determined in TC 0 via its definition as above.Likewise I and the function λ can be computed in TC 0 .By Corollary 20, CR (ε(Λ Pi ) + 1) for i ∈ I can be computed in AC 0 (since showing that altogether Γ can be computed in TC 0 . Step 2: The second step is to add nodes above each chain of Γ as required in the Lemma.The outcome will be denoted by (Γ , δ ).We start by defining In order to obtain (Γ , δ ) from ( Γ, δ), for every i ∈ [| Γ|] and every h ∈ [1 .. d i ] we have to insert a node R (i,h) such that Observe that the numbers d i can be computed in TC 0 : since by Proposition 26, we can check in AC 0 whether ε(Λ If the respective inequality holds, we obtain by Lemma 23 that ε(Λ For the latter we have signed-digit representations of digit-length at most | C0 |.Hence, this difference can be computed in TC 0 .Since P| C0|−1 ∈ Γ and in Step 1 we have not introduced any vertex above P| C0|−1 , we know that P| C0|−1 is not marked by Λ P for any P ∈ Γ.Therefore, for all i ∈ [| Γ|] we have and, hence, by Lemma 18, ε(Λ Pi | C0 ) + h can be represented as a compact marking using only nodes from C0 for every h ∈ [1 .. d i ].Thus, for every d i = 0 and every h ∈ [1 .. d i ] we define a successor marking of According to Lemma 32 we are able to construct in TC 0 a reduced power circuit (Γ , δ ) such that ( Γ, δ) ≤ (Γ , δ ) and such that for each R ∈ I there exists a node Considering the size of Γ , observe that during the whole construction, for every node P i ∈ Γ we create at most µ new nodes between P i and P i+1 .
Moreover, we only create new nodes between P i and P i+1 if P i is the last node of a maximal chain of Γ .Furthermore, notice that the only node of Γ above which we have introduced new nodes in both Step 1 and Step 2 is the second largest node of C0 : in Step 1 we have created one new node and in Step 2 we have created at most µ − 1 new nodes above it.Thus, for every chain of Γ we have introduced at most µ new nodes.Thus, Finally, the new nodes we create only prolongate the already existing chains, so we do not create any new chains.This finishes the proof of the lemma.
In the following, $(\Gamma', \delta')$ denotes the power circuit obtained by UpdateNodes when starting with $(\Gamma \cup \Xi, \delta)$, and $(\Gamma'', \delta'')$ denotes the power circuit obtained by ExtendChains with $\mu = \log(|\mathrm{Min}(\Xi)|) + 1$ on input of the power circuit $(\Gamma', \delta')$ (observe that, by the assumption $|C_0(\Gamma)| \ge \log(|\Xi|) + 1$, the condition on $\mu$ in Lemma 34 is satisfied). The value of $\mu$ is chosen to make sure that in the following lemma the markings can be made compact. Indeed, if $\mathrm{Min}(\Xi) = \{P_1, \ldots, P_k\}$ and all $P_i$ have the same evaluation and are marked with 1 by $M$, then we might need a node of value $2^{\mu} \cdot \varepsilon(P_1)$ in order to make $M$ compact.

Lemma 35 (UpdateMarkings).
The following problem is in TC 0 : The power circuit (Γ , δ ) as a result of ExtendChains with µ = log(|Min(Ξ)|) + 1 and a marking M on Γ ∪ Ξ. Output: Proof.Consider again the equivalence relation ∼ ε as defined above on Γ ∪ Min(Ξ).For the equivalence class of a node P ∈ Γ ∪ Min(Ξ) we write [P ] ε .We will define the marking M on Γ by defining it on each maximal chain.Recall that we can view M as a marking on Γ ∪ Ξ by defining M (P ) = 0 if P ∈ Γ ∪ Ξ.Let C = (P i , . . ., P i+h−1 ) ∈ C Γ be a maximal chain of length h and let We wish to find a compact marking MC with support contained in C ⊆ Γ and evaluation ε( MC ) = ε(M | S ).First define the integer Then we have Thus, defining MC by digit C ( MC ) = CR(Z M,C ) gives our desired marking.However, be aware that, for this, we have to show that the digit-length of CR(Z M,C ) is at most |C| = h.Let k be maximal such that P i+k ∈ Γ .Then, in particular, no node in S with higher evaluation than P i+k is marked by M .Moreover, by the properties of Thus, by Lemma 18, the digit-length of CR(Z M,C ) is at most k + log(|Min(Ξ)|) + 2 ≤ h.By Corollary 27, the maximal chains can be determined in TC 0 .Now, for every maximal chain C the (binary) number Z M,C can be computed in TC 0 using iterated addition and made be compact in AC 0 using Theorem 12. Thus, the marking MC can be computed in TC 0 .The marking M as desired in the lemma is simply defined by all the markings MC can be computed in parallel.
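The step of collecting all marks along a chain into one integer and then making the result compact can be illustrated as follows. This is a sequential sketch, not the TC^0 and AC^0 procedures referred to in the proof. It assumes that a signed-digit representation is compact when no two adjacent digits are non-zero, and that position $i$ of a chain carries the weight $2^i$ relative to the start of the chain; the names `compact_digits` and `merge_on_chain` are hypothetical.

```python
def compact_digits(z):
    """Signed-digit representation of z, least significant digit first,
    with digits in {-1, 0, 1} and no two adjacent non-zero digits."""
    digits = []
    while z != 0:
        if z % 2:
            d = 2 - (z % 4)  # +1 if z = 1 mod 4, -1 if z = 3 mod 4
            z -= d
        else:
            d = 0
        digits.append(d)
        z //= 2
    return digits


def merge_on_chain(weights):
    """weights[i] is the sum of the marks on all nodes whose value equals
    that of the i-th chain node; the result is the compact digit sequence
    of the combined integer."""
    z = sum(w * 2 ** i for i, w in enumerate(weights))
    return compact_digits(z)


# Three nodes of the same value, each marked with 1, combine to 3 = 4 - 1,
# so two further positions of the chain are needed.
print(merge_on_chain([3]))
```

The example shows why the chains have to be extended before the markings are updated: combining $k$ equal marks can push the highest non-zero digit roughly $\log k$ positions upwards.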
Proof of Theorem 31. Now we are ready to describe the full reduction process based on the three steps described above. We aim for a DepParaTC^0 circuit where the input is parametrized by the depth of the power circuit. The input is some arbitrary power circuit $(\Pi, \delta_\Pi)$ together with a marking $M$ on $\Pi$. We start with some initial reduced power circuit $(\Gamma_0, \tilde{\delta}_0)$ and some non-reduced part $\Xi_0 = \Pi$ and successively apply the three steps to obtain power circuits $(\Gamma_i \cup \Xi_i, \delta_i)$ and markings $M_i$ for $i = 0, 1, \ldots$ while maintaining a set of invariants.

We first construct the initial reduced power circuit $(\Gamma_0, \tilde{\delta}_0)$, which consists exactly of a chain of length $\ell = \log(|\Pi|) + 1$. This can be done as follows: Let $\Gamma_0 = (P_0, \ldots, P_{\ell-1}) = C_0$ and define successor markings by $\mathrm{digit}_{C_0}(\Lambda_{P_i}) = \mathrm{CR}(i)$ for $i \in [\ell]$. This defines $\tilde{\delta}_0$. Now we set $\Xi_0 = \Pi$ and we define $\delta_0 : (\Gamma_0 \cup \Xi_0) \times (\Gamma_0 \cup \Xi_0) \to \{-1, 0, 1\}$ by $\delta_0|_{\Gamma_0 \times \Gamma_0} = \tilde{\delta}_0$, $\delta_0|_{\Xi_0 \times \Xi_0} = \delta_\Pi$ and $\delta_0 = 0$ otherwise. We extend the marking $M$ to $\Gamma_0$ by setting $M(P) = 0$ for all $P \in \Gamma_0$. So we obtain a power circuit of the form $(\Gamma_0 \cup \Xi_0, \delta_0)$ with the properties described above.

Now let the power circuit $(\Gamma_i \cup \Xi_i, \delta_i)$ together with the marking $M_i$ be the input for the $(i+1)$-th iteration, meeting the invariants. We write $\tilde{\delta}_i = \delta_i|_{\Gamma_i \times \Gamma_i}$. Now we apply the three steps from above:

1. Using UpdateNodes (Lemma 33) we compute a reduced power circuit $(\Gamma'_i, \delta'_i)$ with $(\Gamma_i, \tilde{\delta}_i) \le (\Gamma'_i, \delta'_i)$ such that for every $P \in \mathrm{Min}(\Xi_i)$ there is some $Q \in \Gamma'_i$ with $\varepsilon(Q) = \varepsilon(P)$.

Using
3 in Lemma 34 is satisfied.
The result of this step is denoted by (Γ i , δ i ). 3. We apply UpdateMarkings (Lemma 35) to obtain markings M i and ΛP for P ∈ Observe that these markings restricted to Γ i are compact.

Each iteration ends by setting Γ
Finally, δ i+1 is defined as δ i on Γ i+1 and via the successor markings ΛP for P ∈ Ξ i+1 .
After exactly depth(Π) + 1 iterations we reach Ξ d+1 = Ξ d \ Min(Ξ d ) = ∅ where d = depth(Π).In this case we do not change the resulting power circuit any further.It is clear from Lemma 33, Lemma 34 and Lemma 35 that throughout the above-mentioned invariants are maintained.Thus, (Γ, δ) = (Γ d+1 , δ d+1 ) is a reduced power circuit and for every node Again by Lemma 33 and Lemma 34 we have (by Lemma 34) (by ( 1)) Since |Γ 0 | = log(|Π|) + 1, we obtain by induction that Let D ∈ N and assume that depth(Π) ≤ D. By Lemma 33, Lemma 34 and Lemma 35 each iteration of the three steps above can be done in TC 0 .Notice here that the construction of the markings M i and ΛP during UpdateMarkings can be done in parallel -so it is in TC 0 , although Lemma 35 is stated only for a single marking.Now, the crucial observation is that, due to Claim 36, the input size for each iteration is polynomial in the original input size of (Π, δ Π ).Therefore, we can compose the individual iterations and obtain a circuit of polynomial size and depth bounded by O(D) as described in Lemma 5. Thus, we have described a DepParaTC 0 circuit (parametrized by depth(Π)) for the problem of computing a reduced form for (Π, δ Π ).This completes the proof of Theorem 31.
Remark 37. (1) While Theorem 31 is only stated for one input marking, the construction works within the same complexity bounds for any number of markings on $(\Pi, \delta_\Pi)$ since during UpdateMarkings these can all be updated in parallel. (2) Moreover, note that for every maximal chain $C \in \mathcal{C}_\Gamma$ there exists a node $Q \in \Pi$ (i.e., in the original power circuit) such that $\varepsilon(Q) = \varepsilon(\mathrm{start}(C))$. This is because new chains are only created during UpdateNodes; the other steps only extend already existing chains. (3) Further observe that $|\sigma(\tilde{M})| \le |\sigma(M)|$. Looking at the construction of $\tilde{M}$ we see that we first make sure that $M$ does not mark two nodes of the same value, then we make the marking compact. Neither operation increases the number of nodes in the support of the marking.
Example 38.In Figure 3 we illustrate what happens in the steps UpdateNodes, ExtendChains and UpdateMarkings during the reduction process.Picture a) shows our starting situation.In b) we already inserted the nodes of value 2 3 and 2 32 into the reduced part.Now the reduced part consists of three chains: one starting at the node of value 1 and the nodes 2 5 and 2 32 as chains of length 1.Because |Min(Ξ)| = 3, we have to extend each chain by three nodes or until two chains merge.So in c) we obtain two chains, one from 1 to 2 8 and the one from 2 32 to 2 35 .In d) we then updated the markings and discarded the nodes from Min(Ξ).
Example 39.In Figure 4 we give an example of the complete power circuit reduction process by showing the result after each iteration.We start with a non-reduced power circuit of depth 2 in a).This power circuit has size 5, so we first construct the starting chain of length 4 in b).Part c) and d) show the result after inserting layer 0 and layer 1, respectively.In e) we finally inserted all layers and thus have constructed the reduced power circuit.
For comparing two markings $L$ and $M$ on an arbitrary power circuit, we can proceed as follows: first compute the difference (Lemma 28), then reduce the power circuit (Theorem 31) and, finally, compare the resulting compact marking with zero (Proposition 26). This shows:

Corollary 40. The following is in DepParaTC^0 parametrized by $\mathrm{depth}(\Pi)$: Input: A power circuit $(\Pi, \delta_\Pi)$ with markings $L$ and $M$. Question: Is $\varepsilon(L) \le \varepsilon(M)$?

Remark 41. By Corollary 40, comparing two numbers $m_1$ and $m_2$ represented by a $(0, +, -, 2^x)$-circuit $C$ can be done in DepParaTC^0 parametrized by $\mathrm{depth}_{2^x}(C) + \log^2 |C|$ if the input of every $2^x$-gate of $C$ is non-negative: by Proposition 30 we can find a power circuit $(\Pi, \delta)$ with $\mathrm{depth}(\Pi) \le \mathrm{depth}_{2^x}(C)$ and markings $M_1$ and $M_2$ evaluating to $m_1$ and $m_2$ in NC^2 ⊆ TC^2. It remains to compare $\varepsilon(M_1)$ and $\varepsilon(M_2)$.

Operations with floating point numbers
In the following, we want to represent a number r ∈ Z[1/2] using markings in a power circuit.For this, we use a floating point representation.Observe that for each such r ∈ Z[1/2] \ {0} there exist unique u, e ∈ Z with u odd such that r = u • 2 e .Lemma 42.The following problem is in DepParaTC 0 parametrized by depth(Π): In addition, Proof.First note that we are searching for a marking representing the maximal e ∈ Z with 2 e | ε(M ).For finding e, we need the compact representation of M .Therefore, we construct the reduced form (Γ, δ) of Π and a compact marking M on Γ such that ε( M ) = ε(M ).According to Theorem 31 this is possible in DepParaTC 0 .Now we have We assume that the Q i are ordered according to their value, i.e., ε(Q i ) < ε(Q j ) for i < j.Hence, e = ε(Λ Q1 ).
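The floating point decomposition $r = u \cdot 2^e$ with $u$ odd can be made concrete on ordinary rationals. The sketch below works on explicit numbers rather than on markings, so it only illustrates what Lemma 42 computes, not how; the function name `float_repr` is hypothetical.

```python
from fractions import Fraction


def float_repr(r):
    """Return (u, e) with r == u * 2**e and u odd, for a non-zero r in Z[1/2]."""
    r = Fraction(r)
    if r == 0:
        raise ValueError("zero has no representation with odd u")
    den = r.denominator
    if den & (den - 1):
        raise ValueError("denominator is not a power of two")
    # For a reduced fraction with denominator 2**k, k > 0, the numerator is odd.
    u, e = r.numerator, -(den.bit_length() - 1)
    while u % 2 == 0:
        u //= 2
        e += 1
    return u, e


print(float_repr(12), float_repr(Fraction(5, 8)), float_repr(Fraction(-3, 4)))
```

In the proof, $e$ is read off from the lowest marked node of the compact marking; in the sketch the same number appears as the exponent of the largest power of two dividing $r$.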
Before we can define the markings $U$ and $E$, we have to introduce some new nodes. First we add $\log(|\Gamma|)$ new nodes to $\Pi$, each of value 1 (i.e., with empty successor marking). Then for each $j \in [0\,..\,\log(|\Gamma|)]$ we create a node of value $2^j$ and depth 1 in the following way: the successor marking of such a node marks exactly $j$ nodes of value 1 with $+1$ and all the other nodes with 0.

Figure 4. The complete process of power circuit reduction: inserting layer after layer. For an explanation of the colors, see Figure 3.
In order to define U , we aim for adding a node We proceed as follows: For each i ∈ [1 .. k], let C i ∈ C Γ denote the maximal chain to which Q i belongs.Note that for different i these chains could be equal.By Remark 37, we know that there exist nodes R To find the nodes R i , we can for example remember the equivalence classes we obtain during the reduction process.Now there exist We can find m i as the difference of the indices of Q i and start(C i ) in the sorted order of Γ, and so we can find all the m i in AC 0 .Note that the binary representation of m i uses at most log(|Γ|) + 1 bits.We define markings M i on the newly defined nodes of depth 1 using the binary representation of m i such that ε(M i ) = m i for i ∈ [1 .. k].Now we are ready to define the marking E by E = Λ R1 + M 1 .We now want to define a marking U , with ε(U . Because E and M i could have supports with non-trivial intersection (as well as E and Λ Ri ), we have to clone the nodes in σ(E) for the addition.Then the marking Regarding the size of Π, observe that to define the markings M i , we insert 2 Considering the depth, when inserting the new nodes of depth 1, the depth only increases if depth(Π) = 0.When inserting a node S i the depth increases only if depth(Π) ≤ 1. Definition 43.A power circuit representation of r ∈ Z[1/2] consists of a power circuit (Π, δ Π ) together with a pair of markings (U, E) on Π such that ε(U ) is either zero or odd and

Lemma 44. (a)
The following problems are in TC 0 :

Input:
A power circuit representation for r ∈ Z[1/2] over a power circuit (Π, δ Π ) and a marking M on Π.

Input:
Power circuit representations for $r, s \in \mathbb{Z}[1/2]$ over a power circuit $(\Pi, \delta_\Pi)$ such that $r/s$ is a power of two.

Output:
A marking $L$ in a power circuit $(\tilde{\Pi}, \delta_{\tilde{\Pi}})$ such that $\varepsilon(L) = \log(r/s)$.

(b)
The following problems are in DepParaTC 0 parametrized by depth(Π): Input: A power circuit (Π, δ Π ) and a marking M on Π.

Output:
A power circuit representation of r + s over a power circuit ( Π, δΠ).
Proof.During the whole proof, let U, V, E, F be markings in Π such that ε(U (M ) .According to Lemma 28 point (a), the marking E + M can be obtained in TC 0 as marking in a power circuit ( Π, δ Π) that satisfies all the required properties.The computation of −r is clear by Lemma 28.

Part (a):
If $r/s$ is a power of two, we know that $\varepsilon(U) = \varepsilon(V)$, and so $r/s = 2^{\varepsilon(E) - \varepsilon(F)}$. Now again Lemma 28 finishes the proof of part (a).

Part (b):
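The first point is due to Lemma 42. For the addition, first observe that $r + s = \bigl(\varepsilon(U) + \varepsilon(V) \cdot 2^{\varepsilon(F) - \varepsilon(E)}\bigr) \cdot 2^{\varepsilon(E)}$. We can decide in DepParaTC^0 whether $\varepsilon(E) \le \varepsilon(F)$ using Corollary 40. W.l.o.g. let $\varepsilon(E) \le \varepsilon(F)$ (otherwise we switch the roles of $r$ and $s$). Next, we construct a marking evaluating to $\varepsilon(U) + \varepsilon(V) \cdot 2^{\varepsilon(F) - \varepsilon(E)}$ using Lemma 28 and bring the result into the required form using Lemma 42.

To determine the sign of $r$, we just have to determine the sign of $\varepsilon(U)$. According to Corollary 40 this is possible in DepParaTC^0. To decide whether $r \in \mathbb{Z}$, since $\varepsilon(U)$ is odd, we just need to decide whether $\varepsilon(E) \ge 0$. Again, this can be done using Corollary 40. In the affirmative case, we just have to apply Lemma 28 point (c) to produce the desired output.

The word problem of the Baumslag group

Before we start solving the word problem of the Baumslag group, let us fix our notation from group theory.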

Group presentations.
A group $G$ is finitely generated if there is some finite set $\Sigma$ and a surjective monoid homomorphism $\eta : \Sigma^* \to G$ (called a presentation). Usually, we do not write the homomorphism $\eta$ and treat words over $\Sigma$ both as words and as their images under $\eta$. We write $v =_G w$ with the meaning that $\eta(v) = \eta(w)$. If $\Sigma = S \cup S^{-1}$ where $S^{-1}$ is some disjoint set of formal inverses and $R \subseteq \Sigma^* \times \Sigma^*$ is some set of relations, we write $\langle \Sigma \mid R \rangle$ for the group $\Sigma^* / C(R)$ where $C(R)$ is the congruence generated by $R$ together with the relations $a a^{-1} = a^{-1} a = 1$ for $a \in S$.

The word problem for a fixed group $G$ with presentation $\eta : \Sigma^* \to G$ is as follows: Input: A word $w \in \Sigma^*$. Question: Is $w =_G 1$?
For further background on group theory, we refer to [24].
The Baumslag-Solitar group. The Baumslag-Solitar group is defined by $\mathrm{BS}_{1,2} = \langle a, t \mid t a t^{-1} = a^2 \rangle$. We have $\mathrm{BS}_{1,2} \cong \mathbb{Z}[1/2] \rtimes \mathbb{Z}$ via the isomorphism $a \mapsto (1, 0)$ and $t \mapsto (0, 1)$. Recall that the Baumslag group is $G_{1,2} = \langle a, t, b \mid t a t^{-1} = a^2,\ b a b^{-1} = t \rangle$. Indeed, due to $b a b^{-1} = t$, we can remove $t$ and we obtain exactly the presentation $\langle a, b \mid b a b^{-1} a = a^2 b a b^{-1} \rangle$. Moreover, $\mathrm{BS}_{1,2}$ is a subgroup of $G_{1,2}$ via the canonical embedding and we have $b (q, 0) b^{-1} = (0, q)$, so a conjugation by $b$ "flips" the two components of the semidirect product if possible. Henceforth, we will use the alphabet $\Sigma = \{1, a, a^{-1}, t, t^{-1}, b, b^{-1}\}$ to represent elements of $G_{1,2}$ (the letter 1 represents the group identity; it is there for padding reasons).
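The arithmetic in $\mathbb{Z}[1/2] \rtimes \mathbb{Z}$ can be checked on explicit pairs. The sketch below assumes the multiplication rule $(r, m)(s, n) = (r + 2^m s, m + n)$, which is the one compatible with $a \mapsto (1, 0)$, $t \mapsto (0, 1)$ and the relation $t a t^{-1} = a^2$; the function names are hypothetical.

```python
from fractions import Fraction


def mul(x, y):
    """Product in Z[1/2] x| Z, assuming (r, m)(s, n) = (r + 2**m * s, m + n)."""
    (r, m), (s, n) = x, y
    return (r + s * Fraction(2) ** m, m + n)


def inv(x):
    """Inverse of (r, m): the pair (-r * 2**(-m), -m)."""
    r, m = x
    return (-r * Fraction(2) ** (-m), -m)


A = (Fraction(1), 0)  # image of the generator a
T = (Fraction(0), 1)  # image of the generator t

# Conjugating a by t doubles the first component: t a t^-1 = a^2.
print(mul(mul(T, A), inv(T)) == mul(A, A))
```

Elements are stored with explicit rationals here; the point of the power circuit representation is precisely to avoid writing these numbers out when they become huge.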
Britton reductions.Britton reductions are a standard way to solve the word problem in HNN extensions.Here we define them for the special case of G 1,2 .Let with β i ∈ b, b −1 and (s i , n i ) ∈ BS 1,2 for all i (i.e., w does not have two successive letters from BS 1,2 ) and there is no factor of the form b(q, 0)b −1 or b −1 (0, k)b with q, k ∈ Z.If w is not Britton-reduced, one can apply one of the rules in order to obtain a shorter word representing the same group element.The following lemma is well-known (see also [24, Section IV.2]).
Example 46. Define words $w_0 = t$ and $w_{n+1} = b\, w_n\, a\, w_n^{-1}\, b^{-1}$ for $n \ge 0$. Then we have $w_n =_{G_{1,2}} t^{\tau(n)}$. While the length of the word $w_n$ is only exponential in $n$, the length of its Britton-reduced form is $\tau(n)$.
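The words of Example 46 can be generated and reduced by a naive sequential procedure for small $n$. The sketch assumes the multiplication rule $(r, m)(s, n) = (r + 2^m s, m + n)$ and the two rewriting steps $b (q, 0) b^{-1} \to (0, q)$ and $b^{-1} (0, k) b \to (k, 0)$ for $q, k \in \mathbb{Z}$; upper-case letters stand for inverses, and all function names are hypothetical. Numbers are stored in binary, so this is not the power circuit algorithm of this paper.

```python
from fractions import Fraction

ONE = (Fraction(0), 0)
GEN = {"a": (Fraction(1), 0), "A": (Fraction(-1), 0),
       "t": (Fraction(0), 1), "T": (Fraction(0), -1)}


def mul(x, y):
    (r, m), (s, n) = x, y
    return (r + s * Fraction(2) ** m, m + n)


def word(n):
    """w_0 = t and w_{k+1} = b w_k a w_k^{-1} b^{-1}; 'B' stands for b^{-1}."""
    w = "t"
    for _ in range(n):
        w = "b" + w + "a" + w[::-1].swapcase() + "B"
    return w


def tower(n):
    v = 1
    for _ in range(n):
        v = 2 ** v
    return v


def britton_reduce(w):
    """Returns a list alternating elements of BS_{1,2} and letters b / B."""
    stack = [ONE]
    for c in w:
        if c in GEN:
            stack[-1] = mul(stack[-1], GEN[c])
            continue
        r, m = stack[-1]
        flipped = None
        if len(stack) >= 3:
            if stack[-2] == "b" and c == "B" and m == 0 and r.denominator == 1:
                flipped = (Fraction(0), int(r))   # b (q, 0) b^-1 -> (0, q)
            elif stack[-2] == "B" and c == "b" and r == 0:
                flipped = (Fraction(m), 0)        # b^-1 (0, k) b -> (k, 0)
        if flipped is None:
            stack += [c, ONE]
        else:
            del stack[-2:]
            stack[-1] = mul(stack[-1], flipped)
    return stack


for n in range(4):
    print(n, len(word(n)), britton_reduce(word(n)))
```

The input words have length $2^{n+2} - 3$, while the exponent of $t$ in the reduced form grows like the tower function; this is why the numbers cannot be written out in unary or binary for larger $n$.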

Conditions for Britton reductions
The idea to obtain a parallel algorithm for the word problem is to compute a Britton reduction of uv given that both u and v are Britton-reduced.For this, we have to find a maximal suffix of u which cancels with a prefix of v.The following lemma is our main tool for finding the longest canceling suffix.It is important to note that for all suffixes the conditions can be checked in parallel.
Then w ∈ BS 1,2 if and only if the respective condition in the following table is satisfied.Moreover, if w ∈ BS 1,2 , then w = G1,2 ŵ according to the last column of the table.
Notice that in the case β 1 = b −1 and β 2 = b, we have r = 0 and s = 0.
Finally, if Let us fix the following notation for elements v, w ∈ G 1,2 written as words over ∆: with (r j , m j ), (s j , n j ) ∈ Z[1/2] Z and β j , βj ∈ b, b −1 .We define Notice that as an immediate consequence of Britton's Lemma we obtain that, if u and v as in (3) are Britton-reduced and uv[i, i] ∈ BS 1,2 for some i, then also uv[j, j] ∈ BS 1,2 for all j ≤ i.Moreover, uv is Britton-reduced if and only if For ∈ N let X denote some set of variables.Denote by PowExp(X ) the set of expressions which can be made up from the variables X using the operations +, −, (r, s) → r • 2 s if s ∈ Z (and undefined otherwise), and (r, s) → log(r/s) if log(r/s) ∈ Z (and undefined otherwise).
Lemma 49.For every β ∈ {b, b −1 , ⊥} 4 there are expressions θ β , ξ β , ϕ β , ψ β ∈ PowExp(X 12 ) such that the following holds: Let u, v ∈ G 1,2 as in (3) be Britton-reduced and assume that uv Be aware that here we have to read the set V i of cardinality (at most) 12 as assignment to the variables X 12 .In particular, given that uv[i − 1, i − 1] ∈ BS 1,2 , one can decide whether uv[i, i] ∈ BS 1,2 by looking at only constantly many letters of uv -this is the crucial observation we shall be using for describing an NC algorithm for the word problem of G 1,2 (see Lemma 50 below).
Proof.W. l.o. g. i ≥ 4. We follow the approach of Example 48.By assumption we know that there exist q, k ∈ Z such that uv[i − 1, i − 1] = G1,2 (q, k) ∈ BS 1,2 .According to the conditions in Lemma 47, to show Lemma 49 it suffices to find expressions ϕ β (V i ), ψ β (V i ) for q and k respectively.If (β i−1 , β i−2 ) = (b, b −1 ), this follows directly from the rightmost column in Lemma 47.Otherwise, we know that (β i−2 , β i−3 ) = (b, b −1 ) and so we obtain the expressions for q and k by applying Lemma 47 to uv ).This proves the lemma.

The algorithm
A power circuit representation of $u \in G_{1,2}$ written as in (3) consists of the sequence $B = (\beta_h, \ldots, \beta_1)$ and a power circuit $(\Pi, \delta_\Pi)$ with markings $U_i, E_i, M_i$ for $i \in [0..h]$ such that $(U_i, E_i)$ is a power circuit representation of $r_i$ (see Definition 43) and $m_i = \varepsilon(M_i)$.
Lemma 50. The following problem is in $\mathsf{DepParaTC}^0$ parametrized by $\max_i \operatorname{depth}(\Pi_i)$:

Input: Britton-reduced power circuit representations of $u, v \in G_{1,2}$ over power circuits $\Pi_1, \Pi_2$.

Output: A Britton-reduced power circuit representation of $w \in G_{1,2}$ over a power circuit $\Pi$ such that w =G

Before showing Theorem B, we prove the following slightly more general result. Recall that $\Sigma = \{1, a, a^{-1}, t, t^{-1}, b, b^{-1}\}$.
Theorem 51. The following problem is in $\mathsf{TC}^2$:

Input: A word $w \in \Sigma^*$.

Output: A power circuit representation for a Britton-reduced word $w_{\mathrm{red}} \in \Delta^*$ such that $w =_{G_{1,2}} w_{\mathrm{red}}$ and the underlying power circuit has depth $O(\log |w|)$.
Proof. Let $w = w_1 \cdots w_n$ with $w_j \in \Sigma$ be some input. Since we can pad with the letter 1, we can assume $n = 2^m$ for some $m \in \mathbb{N}$. The idea of the proof is simple: first, we transform each letter $w_j$ into a power circuit representation. After that, the first layer computes the Britton reduction of two-letter words using Lemma 50, the next layer always takes two of these Britton-reduced words and joins them to a new Britton-reduced word, and so on. After $m = \log n$ layers we have obtained a single Britton-reduced word. By the bound in Lemma 50, the size of the resulting power circuits stays polynomial in $n$ and their depth stays in $O(\log n)$. In particular, each application of Lemma 50 is in $\mathsf{TC}^1$ and, hence, the whole computation is in $\mathsf{TC}^2$.

Let us detail this high-level description a bit further: For $j \in [1..n]$ we set $w_j = w^{(1)}_j$. Now for each word w
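The layered scheme from the proof of Theorem 51 can be illustrated by the following minimal sketch. It is not the algorithm of the paper: instead of power circuit representations and Lemma 50, it multiplies elements of $\mathrm{BS}_{1,2}$ written as pairs $(r, m) \in \mathbb{Z}[1/2] \rtimes \mathbb{Z}$ with the rule $(r, m) \cdot (s, n) = (r + 2^m s, m + n)$, using exact fractions. Only the balanced-tree structure with $\log n$ rounds is shown. All function names and the letter encoding (`A`, `T` for $a^{-1}$, $t^{-1}$, with $a = (1, 0)$ and $t = (0, 1)$) are assumptions made for this illustration.

```python
from fractions import Fraction

# Elements of Z[1/2] x| Z are pairs (r, m); this is a toy stand-in for
# Britton-reduced power circuit representations (assumption of this sketch).
IDENTITY = (Fraction(0), 0)


def mul(x, y):
    """(r, m) * (s, n) = (r + 2^m * s, m + n)."""
    r, m = x
    s, n = y
    return (r + s * Fraction(2) ** m, m + n)


def inv(x):
    """(r, m)^(-1) = (-r * 2^(-m), -m)."""
    r, m = x
    return (-r * Fraction(2) ** (-m), -m)


# Hypothetical letter encoding: a = (1, 0), t = (0, 1), capitals are inverses.
a, t = (Fraction(1), 0), (Fraction(0), 1)
LETTERS = {"1": IDENTITY, "a": a, "A": inv(a), "t": t, "T": inv(t)}


def balanced_product(elements):
    """Multiply a sequence in a balanced binary tree.

    Returns the product and the number of rounds. All products within one
    round are independent of each other, which is what makes the scheme
    suitable for parallel evaluation.
    """
    layer = list(elements)
    size = 1
    while size < len(layer):
        size *= 2
    layer += [IDENTITY] * (size - len(layer))  # pad with the letter 1
    rounds = 0
    while len(layer) > 1:
        layer = [mul(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
        rounds += 1
    return layer[0], rounds


def reduce_word(word):
    return balanced_product(LETTERS[c] for c in word)


# t a t^-1 equals a^2 in BS(1,2); three letters are padded to four.
value, rounds = reduce_word("taT")
assert value == mul(a, a) and rounds == 2
```

In the actual proof each call to `mul` is replaced by an application of Lemma 50, and the number of rounds corresponds to the $m = \log n$ layers.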

Hardness of comparison in power circuits
The main result of this section shows how Boolean circuits can be simulated by power circuits. This leads to P-completeness of the comparison problem for power circuits (Theorem C). In this section we consider functions computable in DLOGTIME-uniform $\mathsf{AC}^0$. The reader unfamiliar with the precise definitions might simply think of LOGSPACE-computable functions. We start by introducing some normalization steps for Boolean circuits. For a Boolean circuit $C$ with input gates $x_1, \ldots, x_n$ and some $a \in \{0, 1\}^n$ we write $\mathrm{eval}_a(C)$ for the evaluation of the output gate of $C$ when assigning $a$ to the inputs.
Elimination of And-gates. By de Morgan's rule we can simulate each And-gate by a circuit of depth 3 using an Or-gate and Not-gates. So for each AC-circuit $C$ of depth $D$ there is an equivalent Boolean circuit $C'$ of depth at most $3D$ using only Or- and Not-gates such that $\mathrm{eval}_a(C) = \mathrm{eval}_a(C')$ for all $a \in \{0, 1\}^n$. Moreover, the circuit $C'$ clearly can be computed in DLOGTIME-uniform $\mathsf{AC}^0$.

Layered circuits. A circuit is called layered if we can assign a level number to each gate such that input gates are on level 0 and gates on level $k$ only receive inputs from level $k - 1$.
Given an arbitrary AC-circuit $C$ of depth $D$, we can construct a layered AC-circuit $C'$ of depth $D$ as follows: We first make $D + 1$ copies of all gates of $C$ numbered from 0 to $D$. The input gates of $C'$ are the input gates in copy 0. Then we introduce wires between the gates as in the original circuit, but only between copy $i$ and copy $i + 1$. Moreover, for $k \ge 1$ we replace an input gate in copy $k$ by a fan-in one Or-gate which receives its input from the corresponding input gate in copy $k - 1$. The output gate of $C'$ is the output gate in copy $D$. So we obtain a layered AC-circuit of depth $D$ and size $(D + 1) \cdot |C|$. Because the paths that connect input gates with output gates are the same in both circuits, the new circuit evaluates to 1 if and only if this is the case for $C$.
Notice that we can also perform this construction if $D$ is not the exact depth but only an upper bound. Moreover, if $D$ is given in the input, the construction can be computed in DLOGTIME-uniform $\mathsf{AC}^0$. Also note that if $C$ uses only Or- and Not-gates, then the layered circuit will also use only those gates.
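The two normalization steps above can be sketched as follows. This is a minimal sequential illustration, not the DLOGTIME-uniform $\mathsf{AC}^0$ construction itself. The circuit encoding (a dictionary mapping a gate name to its kind and list of inputs), the function names, and the fresh gate names `(name, "neg", j)` and `(name, "or")` are assumptions of this sketch. Gates in copy 0 that are not input gates receive no wires and are never reached from the output, so the evaluator below visits gates lazily, starting at the output gate.

```python
from itertools import product


def eliminate_and(gates):
    """Replace each And-gate by Not(Or(Not u_1, ..., Not u_h)) (de Morgan)."""
    result = {}
    for name, (kind, inputs) in gates.items():
        if kind != "and":
            result[name] = (kind, list(inputs))
            continue
        negated = []
        for j, u in enumerate(inputs):
            neg = (name, "neg", j)          # hypothetical fresh gate name
            result[neg] = ("not", [u])
            negated.append(neg)
        result[(name, "or")] = ("or", negated)
        result[name] = ("not", [(name, "or")])
    return result


def layer_circuit(gates, output, depth_bound):
    """Make depth_bound + 1 copies; wires only go from copy k to copy k - 1."""
    layered = {}
    for k in range(depth_bound + 1):
        for name, (kind, inputs) in gates.items():
            if kind == "in":
                # Input gates in copy k >= 1 become fan-in one Or-gates.
                layered[(name, k)] = ("in", []) if k == 0 else ("or", [(name, k - 1)])
            else:
                wires = [(u, k - 1) for u in inputs] if k > 0 else []
                layered[(name, k)] = (kind, wires)
    return layered, (output, depth_bound)


def evaluate(gates, output, assignment):
    """Evaluate the output gate; `assignment` maps input gates to booleans."""
    cache = dict(assignment)

    def value(g):
        if g not in cache:
            kind, inputs = gates[g]
            vals = [value(u) for u in inputs]
            if kind == "or":
                cache[g] = any(vals)
            elif kind == "and":
                cache[g] = all(vals)
            elif kind == "not":
                cache[g] = not vals[0]
            else:
                raise ValueError(f"input gate {g!r} has no assigned value")
        return cache[g]

    return value(output)


# Example circuit of depth D = 2 computing (x1 and x2) or (not x1).
circuit = {
    "x1": ("in", []), "x2": ("in", []),
    "g1": ("and", ["x1", "x2"]),
    "g2": ("not", ["x1"]),
    "g3": ("or", ["g1", "g2"]),
}
D = 2
or_not = eliminate_and(circuit)
layered, out = layer_circuit(or_not, "g3", 3 * D)  # 3D is only an upper bound
for a1, a2 in product([False, True], repeat=2):
    expected = evaluate(circuit, "g3", {"x1": a1, "x2": a2})
    actual = evaluate(layered, out, {("x1", 0): a1, ("x2", 0): a2})
    assert expected == actual
```

The layered circuit has $(3D + 1)$ times as many gates as the Or/Not circuit, matching the size bound $(D + 1) \cdot |C|$ when the depth bound is $3D$.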
Theorem 53. Let $C$ be a layered AC-circuit made of unbounded fan-in Or-gates and Not-gates of size $L$ and depth $D$ with input gates $x_1, \ldots, x_n$. There exists a power circuit $(\Gamma, \delta)$ with special vertices $V_1, \ldots, V_n$ and , $A$, and $B$ satisfying the following properties: For $a \in \{0, 1\}^n$ we define a graph $(\Gamma, \delta_a)$ where δ a (V i , ) = a i and on all other nodes $\delta_a$ agrees with $\delta$. Then for all $a \in \{0, 1\}^n$ we have: $(\Gamma, \delta_a)$ is a power circuit; depth(Γ, δ a ) = 2D + + 1 and |Γ| ≤ 3(L + D) + + 3; $P \ne Q \Rightarrow \varepsilon(P) \ne \varepsilon(Q)$ for all nodes $P, Q \in \Gamma$; and $\varepsilon(A) \le \varepsilon(B)$ if and only if $\mathrm{eval}_a(C) = 1$. Moreover, given $C$, the power circuit $(\Gamma, \delta)$ can be computed in LOGSPACE.
Note that if each gate actually "knows" its layer (i.e., if the layer number is part of the gate number), $(\Gamma, \delta)$ can be computed even in DLOGTIME-uniform $\mathsf{AC}^0$.
Corollary 54. Let $k \ge 1$. The following problem is in $\mathsf{TC}^k$ and it is hard for $\mathsf{AC}^k$ under $\mathsf{AC}^0$-Turing reductions:

For every input gate $x_i$ we create a node $V_i$ with δ(V i , X i ) = δ(V i , T −1 ) = 1. Notice that 2 L ≤ 2 τ (log * L) = τ ( − 2), so X i and T −1 are guaranteed to be distinct nodes and, thus, $\delta$ is well-defined. We write $P_{g_i}$ as an alias for $V_i$. Note that by the definition of $\delta_a$ in the theorem we have δ a (V i , ) = a i .

For every Or-gate $g_i$ with incoming edges from gates $u_1, \ldots, u_h$ we create nodes $P_{g_i}, Q_{g_i} \in \Gamma$ and set $\delta(P_{g_i}, Q_{g_i}) = 1$ and $\delta(Q_{g_i}, P_{u_j}) = 1$ for $1 \le j \le h$. In addition, we set $\delta(Q_{g_i}, X_i) = \delta(P_{g_i}, X_i) = 1$.

For every Not-gate $g_i$ on level $k$ with incoming edge from gate $u$, we create a node $P_{g_i} \in \Gamma$ and set $\delta(P_{g_i}, R_k) = \delta(P_{g_i}, S_k) = 1$ and $\delta(P_{g_i}, P_u) = -1$. In addition, we set $\delta(P_{g_i}, X_i) = 1$.

Finally, we set A = T 2D+1+ and $B = P_g$ where $g$ is the output gate. Thus, once we have proved (6), we know that $\varepsilon(A) \le \varepsilon(B)$ if and only if $\mathrm{eval}_a(C) = 1$.

Size and depth bounds:
Observe that for every gate $g_i$ of $C$ we introduce at most two nodes $P_{g_i}$ and $Q_{g_i}$ plus the node $X_i$ in $\Gamma$. So, by adding the number of nodes $T_i$ and $S_i$ plus $X_0$, we obtain that |Γ| ≤ 3(L + D) + + 3.
In the following we define the depth of a node as the length of a longest path starting from it (i.e., the depth of $\Gamma$ is the maximal depth of its nodes). Each node $X_i$ for $i \in [0..L]$ has depth at most (see also Example 8). Each of the nodes $T_k$ has depth $k$, so here we obtain nodes of depth at most 2D + + 1. The node $S_k$ only points to the node $R_{k-1}$ and to $X_0$, so it also has depth 2k − 1 + with $k \le D$.
By induction we see that if $g$ is a gate on level $k$, then the depth of $P_g$ is at most 2k + + 1. Therefore, for each $a \in \{0, 1\}^n$ the depth of $(\Gamma, \delta_a)$ is exactly 2D + + 1.
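The notion of depth used here (length of a longest path starting from a node) can be made concrete with a small sketch. It assumes the usual evaluation of power circuits, $\varepsilon(P) = \sum_Q \delta(P, Q) \cdot 2^{\varepsilon(Q)}$, so that a node without successors has value 0. The dictionary encoding of $\delta$, the function name, and the example circuit are assumptions made for illustration and are not the circuit $(\Gamma, \delta)$ constructed in the proof.

```python
def evaluate_nodes(delta):
    """Compute the value eps(P) and the depth of every node.

    `delta` maps a node P to a dictionary {Q: sign} with sign in {+1, -1};
    missing entries mean delta(P, Q) = 0. The graph must be acyclic.
    """
    eps, depth = {}, {}

    def visit(p):
        if p in eps:
            return
        successors = delta.get(p, {})
        for q in successors:
            visit(q)
            if eps[q] < 0:
                raise ValueError(f"successor {q!r} has a negative value")
        # Value: signed sum of powers of two of the successor values.
        eps[p] = sum(sign * 2 ** eps[q] for q, sign in successors.items())
        # Depth: longest path starting at p; nodes without successors get 0.
        depth[p] = 1 + max((depth[q] for q in successors), default=-1)

    for p in delta:
        visit(p)
    return eps, depth


# A small chain X0 <- X1 <- X2 <- X3 and a node P = 2^eps(X3) - 2^eps(X1).
delta = {
    "X0": {},
    "X1": {"X0": 1},
    "X2": {"X1": 1},
    "X3": {"X2": 1},
    "P": {"X3": 1, "X1": -1},
}
eps, depth = evaluate_nodes(delta)
assert eps["X3"] == 4 and eps["P"] == 2 ** 4 - 2 ** 1
assert depth["P"] == 4 and max(depth.values()) == 4
```

Evaluating values explicitly is only feasible for tiny examples, since the numbers grow as towers of exponentials in the depth. The depth computation, in contrast, is a plain longest-path computation on the underlying acyclic graph.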

Complexity of the construction:
The whole construction can be done in LOGSPACE: We can compute the level of each gate and the depth of the circuit in LOGSPACE just by following any path from the gate to the input gates.Since the circuit is layered this always gives the same result.
The actual construction of $(\Gamma, \delta)$ is straightforward: We start by creating only the nodes, without the edges. First create the nodes $X_i$ for $i \in [0..L]$. Then we add the nodes T 0 , . . ., T 2D+1+ , and $S_1, \ldots, S_D$. Here we need to be careful and identify each node $T_i$ with $X_{\tau(i-1)}$ if it exists. Since $\log^*$ can be computed in LOGSPACE, both and this identification can be computed in LOGSPACE. Now it remains to create the nodes $P_g$ and (only for Or-gates) $Q_g$ corresponding to the gates, which also clearly can be done in LOGSPACE.
For every node the outgoing edges can be determined in a straightforward way from the definition.
Let us briefly outline that the construction is also in DLOGTIME-uniform $\mathsf{AC}^0$ given that each gate contains its level as part of the number. In this case, the depth of the circuit can be computed as the maximum over all the levels. Now, note that for a $\log n$-bit number $i$, the number $\log^* i$ can be computed in time $O(\log n)$. Hence, can be computed in DLOGTIME and the whole construction is straightforward in $\mathsf{AC}^0$. For the identification of the nodes $T_i$ with $X_{\tau(i-1)}$ we use again that $\log^* i$ can be computed in DLOGTIME. In particular, we can decide in DLOGTIME whether $T_i = X_j$ for given $i$ and $j$. Therefore, even in DLOGTIME-uniform $\mathsf{AC}^0$ we can hard-wire this identification (notice that these nodes only depend on the size $L$ but not on the actual circuit).
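For illustration, the tower function $\tau$ and the iterated logarithm $\log^*$ used in this identification can be written down directly. The sketch assumes the convention $\tau(0) = 1$, $\tau(i + 1) = 2^{\tau(i)}$ and $\log^* n = \min\{i : \tau(i) \ge n\}$, which is consistent with the inequality $L \le \tau(\log^* L)$ used above. The function names are hypothetical, and the code is a plain sequential computation, not a DLOGTIME procedure.

```python
def tower(i):
    """tau(0) = 1 and tau(i + 1) = 2 ** tau(i) (assumed convention)."""
    value = 1
    for _ in range(i):
        value = 2 ** value
    return value


def log_star(n):
    """Smallest i with tower(i) >= n."""
    i = 0
    while tower(i) < n:
        i += 1
    return i


def identified_x_index(i, size):
    """Index j = tau(i - 1) with T_i = X_j, or None if no such X_j exists.

    Only defined for i >= 1; X_j exists for j in [0 .. size].
    """
    value = 1
    for _ in range(i - 1):
        if value > size:
            return None          # later tower values are even larger
        value = 2 ** value
    return value if value <= size else None


assert [tower(i) for i in range(5)] == [1, 2, 4, 16, 65536]
assert [log_star(n) for n in (1, 2, 4, 5, 16, 17)] == [0, 1, 2, 3, 3, 4]
assert identified_x_index(3, 10) == 4 and identified_x_index(4, 10) is None
```

Since $\tau$ grows so quickly, only a handful of the nodes $T_i$ can coincide with some $X_j$, and which ones do depends only on $L$.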
Is the word problem of the Baumslag group with power circuit representations as input P-complete? (By Corollary 58 this holds for the subgroup membership problem for $\mathrm{BS}_{1,2}$ in $G_{1,2}$. Moreover, as a consequence of Proposition 59, this variant of the word problem is NL-hard.)

By Corollary 54, for every $k$ the comparison problem for power circuits of depth $\log^k n$ is in $\mathsf{TC}^k$ and hard for $\mathsf{AC}^k$ under $\mathsf{AC}^0$-Turing reductions. Thus, the question remains whether this problem is indeed complete for $\mathsf{TC}^k$ under $\mathsf{AC}^0$-Turing reductions.

Figure 3. The three steps of power circuit reduction. The already reduced part consists of blue nodes and $\mathrm{Min}(\Xi_i)$ is colored in cyan. The red signs indicate a marking. Three dots $\cdots$ in between two nodes mean that we omitted some nodes. A dashed edge means that we actually omitted the outgoing edges of the right node.
With initial chain Γ0.
is the set of dyadic fractions with addition as group operation. The multiplication in $\mathbb{Z}[1/2] \rtimes \mathbb{Z}$ is defined by $(r, m) \cdot (s, n) = (r + 2^m s, m + n)$. Inverses can be computed by the formula $(r, m)^{-1} = (-r \cdot 2^{-m}, -m)$. In the following we use $\mathrm{BS}_{1,2}$ and $\mathbb{Z}[1/2] \rtimes \mathbb{Z}$ as synonyms.

The Baumslag group. A convenient way to understand the Baumslag group $G_{1,2}$ is as an HNN extension of the Baumslag-Solitar group:

a power circuit such that $|\Pi^{(1)}_j| = 1$ for $j \in [1..n]$. Then we define markings $U_i$, $E_i$ and $M_i$ as follows: If $w^{(1)}_j$

Figure 5. Power circuit for Or-gates.