Factorization of Polynomials Given by Arithmetic Branching Programs

Given a multivariate polynomial computed by an arithmetic branching program (ABP) of size s, we show that all its factors can be computed by arithmetic branching programs of size poly(s). Kaltofen gave a similar result for polynomials computed by arithmetic circuits. The previously known best upper bound for ABP-factors was poly(s^{log s}).


Introduction
Polynomial factoring is a classical question in algebra. For factoring multivariate polynomials, we have to specify a model for representing polynomials. A standard model in algebraic complexity to represent polynomials is arithmetic circuits (aka straight-line programs). Other well-known models are arithmetic branching programs (ABPs), arithmetic formulas, dense representations, where the coefficients of all n-variate monomials of degree ≤ d are listed, or sparse representations, where only the nonzero coefficients are listed. Given a polynomial in some model, one can ask for efficient algorithms for computing its factors represented in the same model. This leads to the question whether the factors of a polynomial of size s in some model can always be represented in the same model with size poly(s).
Recently, Dutta, Saxena & Sinhababu (2018) and also Oliveira (2016) considered factoring in restricted models like formulas, ABPs, and small-depth circuits. They reduce polynomial factoring to approximating power series roots of the polynomial to be factored. Then, they use the well-known technique of Newton iteration for approximating the roots. Let x = (x_1, . . . , x_n). If p(x, y) is the given polynomial and q(x) is a root w.r.t. y, i.e., p(x, q(x)) = 0, then y − q(x) is a factor of p. Newton iteration repeatedly uses the recursive formula
y_{t+1} = y_t − p(x, y_t) / p'(x, y_t)
to approximate q, where p' is the derivative of p w.r.t. variable y.
If p is given as a circuit, the circuit for y_{t+1} is constructed from the circuit for y_t. For the circuit model, we can assume that p(x, y) has a single leaf node y where we feed in y_t. But for formulas and branching programs, we may have d many leaves labeled by y, where d is the degree of p in y. As we cannot reuse computations in formulas or branching programs, we have to make d copies of y_t in each round. This leads to a d^{log d} blowup in size. Oliveira (2016) used the idea of approximating roots via an approximator polynomial in the coefficients of the given polynomial. This gives a good upper bound on the size of factors of ABPs, formulas, and bounded-depth circuits under the assumption that the individual degrees of the variables in the input polynomial are bounded by a constant.
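To illustrate the root-lifting technique, here is a minimal sketch of Newton iteration on truncated power series, in Python with exact rational arithmetic. The representation and all function names are ours, not from the paper; the example lifts the root q(x) = sqrt(1 + x) of p(x, y) = y^2 − (1 + x).

```python
from fractions import Fraction as F

PREC = 8  # we compute power series modulo x^PREC

def series(*cs):
    """A truncated power series in x, as a list of PREC coefficients."""
    out = [F(c) for c in cs] + [F(0)] * PREC
    return out[:PREC]

def sadd(a, b):
    return [x + y for x, y in zip(a, b)]

def smul(a, b):
    c = [F(0)] * PREC
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if i + j < PREC:
                    c[i + j] += ai * bj
    return c

def sinv(a):
    """Inverse of a series with nonzero constant term, modulo x^PREC."""
    inv = [F(0)] * PREC
    inv[0] = 1 / a[0]
    for k in range(1, PREC):
        inv[k] = -sum(a[i] * inv[k - i] for i in range(1, k + 1)) / a[0]
    return inv

def peval(p, q):
    """Evaluate p (list of series, the coefficients of y^0, y^1, ...) at y = q."""
    res, pw = series(0), series(1)
    for c in p:
        res = sadd(res, smul(c, pw))
        pw = smul(pw, q)
    return res

def pderiv(p):
    """Derivative of p w.r.t. y."""
    return [smul(series(i), c) for i, c in enumerate(p[1:], start=1)]

def newton_root(p, y0):
    """Newton iteration y_{t+1} = y_t - p(y_t)/p'(y_t); y0 is a simple
    root of p(x=0, y), and each round doubles the precision in x."""
    q = series(y0)
    for _ in range(PREC.bit_length()):
        corr = smul(peval(p, q), sinv(peval(pderiv(p), q)))
        q = sadd(q, [-c for c in corr])
    return q

# p(x, y) = y^2 - (1 + x); its root w.r.t. y is q(x) = sqrt(1 + x)
p = [series(-1, -1), series(0), series(1)]
q = newton_root(p, 1)
```

Each round roughly doubles the number of correct coefficients, which is the quadratic convergence the lifting arguments exploit.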
Recently, Chou, Kumar & Solomon (2019a) proved closure of VP under factoring using Newton iteration for several variables for a system of polynomial equations. This approach also faces the same problem for the restricted models.
Instead of lifting roots, another classical technique for multivariate factoring is Hensel lifting, where factors modulo an ideal are lifted. Hensel lifting has a slow version, where the power of the ideal increases by one in each round. The other version, due to Zassenhaus (1969), is fast: the power of the ideal is doubled in each round.
Kaltofen's proofs use the slow version of Hensel lifting iteratively for d rounds, where d is the degree of the given polynomial (Kaltofen 1987, 1989). That leads to an exponential blowup in size in models where previous computations cannot be reused, as using a previous lift twice requires two copies each time. Kopparty, Saraf & Shpilka (2015) use the standard way of doing fast Hensel lifting for log d rounds, where in each round the lifted factors are kept monic. To achieve this, one has to compute a polynomial division with remainder. Implementing this version of Hensel lifting for ABPs or formulas seems to require making d copies of previous computations in each round. Thus, this way would lead to a d^{log d} size blowup.
Here, we use a classic version of fast Hensel lifting that needs log d rounds and, in each round, copies previous computations only a constant number of times. As mentioned earlier, we avoid maintaining monicness.
Though various versions of Hensel lifting (factorization lifting) and Newton iteration techniques (root lifting) are equivalent in the sense that one can be derived from the other (von zur Gathen 1984), it is interesting that the former gives a better factor size upper bound for the model of ABP.
Organization of the paper. In Section 2, we give some basic facts on algebra and the computational model of ABP that we use in this paper. In Section 3, we discuss the preprocessing steps and other lemmas that we use in the proof of our main result. In Section 4, we prove our main result (Theorem 4.1). In Section 5, we discuss how our proof can be converted to a randomized algorithm for computing the factors. We also highlight the steps where randomization is used and argue that they can be turned into PIT problems and thus solved deterministically with a PIT oracle. In Section 6, we discuss how to apply Theorem 4.1 to extend the classic hardness-to-derandomization result of Kabanets and Impagliazzo to the model of ABPs.

Preliminaries
We consider multivariate polynomials over a field F of characteristic 0. A polynomial p is called irreducible if it cannot be factored into the product of two non-constant polynomials. Polynomial p is called square-free if for every non-constant factor q, the polynomial q^2 is not a factor of p. By deg(p), we denote the total degree of p. Let x and z = (z_1, . . . , z_n) be variables and p(x, z) be an (n+1)-variate polynomial of degree d in x. Then, we can view p as a univariate polynomial in x with coefficients in F[z],
p(x, z) = p_0(z) + p_1(z) x + · · · + p_d(z) x^d.
By poly(n), we denote the class of polynomially bounded functions in n ∈ N.

Polynomial Identity Test (PIT).
There is a randomized algorithm based on the DeMillo-Lipton-Schwartz-Zippel lemma to find out whether a given polynomial is identically zero; see Chou et al. (2019b) and the references therein for more details and the history of this theorem.
Theorem 2.1 (Polynomial Identity Test). Let p(x) be an n-variate nonzero polynomial of total degree d. Let S ⊆ F be a finite set. For α ∈ S^n picked independently and uniformly at random,
Pr[ p(α) = 0 ] ≤ d / |S|.
Since we assume the field F to have characteristic 0, we can choose the set S in Theorem 2.1 large enough, for example |S| = 2d, to keep the probability Pr[ p(α) = 0 ] small. In the case of finite fields, we may have to work over a field extension so that the field is large enough.
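As an illustration, the following sketch turns Theorem 2.1 into a randomized zero-test for a black-box polynomial. The helper names and the repetition strategy are ours, not from the paper.

```python
import random

def pit(poly, n, d, trials=10, seed=0):
    """Randomized identity test for a black-box polynomial in n variables
    of total degree <= d. Each trial evaluates at a random point of S^n
    with |S| = 2d, so a nonzero polynomial evaluates to 0 with
    probability at most 1/2 per trial (Theorem 2.1)."""
    rng = random.Random(seed)
    S = range(1, 2 * d + 1)
    for _ in range(trials):
        alpha = [rng.choice(S) for _ in range(n)]
        if poly(*alpha) != 0:
            return False   # certainly not the zero polynomial
    return True            # zero with high confidence

# (x + y)^2 - x^2 - 2xy - y^2 is identically zero; dropping 2xy is not
zero = lambda x, y: (x + y) ** 2 - x ** 2 - 2 * x * y - y ** 2
nonzero = lambda x, y: (x + y) ** 2 - x ** 2 - y ** 2
```

A "False" answer is always correct; a "True" answer errs with probability at most 2^{-trials}.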

Rings and ideals.
Let R be a commutative ring with identity. A set I ⊆ R is an ideal of R, if it is closed under addition and under scalar multiplications by elements of R. That is, for every r ∈ R and every a ∈ I, the product ar is in I. Two elements r, s ∈ R are congruent modulo I, if r − s ∈ I. This is denoted as r ≡ s (mod I).
For a set S ⊆ R, the ideal generated by S is denoted by ⟨S⟩. It consists of all elements of R that can be written as a finite sum Σ_{i=1}^{ℓ} r_i s_i, for some ℓ ≥ 1, r_i ∈ R, and s_i ∈ S, for i = 1, . . . , ℓ. It is easy to check that ⟨S⟩ is indeed an ideal of R.
For an ideal I of R and m ≥ 1, the mth power of I is the ideal I^m generated by the elements of the form s = a_1 a_2 · · · a_m, where a_i ∈ I, for i = 1, . . . , m.
We will apply these notions in the following setting. The ring R will be the polynomial ring R = K[x, y], for two variables x, y and another polynomial ring K = F[z], where F is a field and z is a tuple of variables. Note that K[x, y] is commutative and has an identity. As ideal, we consider I = ⟨y⟩, the ideal generated by the polynomial y ∈ K[x, y]. Then, I contains all polynomials that have factor y. Similarly, the mth power of I contains all polynomials that have factor y^m.

Computational models.
An arithmetic circuit is a directed acyclic graph whose leaf nodes are labeled by the variables x_1, . . . , x_n and by constants from the underlying field. The other nodes are labeled by sum gates or product gates. A node labeled by a variable or constant computes the same. A node labeled by sum or product computes the sum or product of the polynomials computed by the nodes connected by incoming edges. The size of an arithmetic circuit is the total number of its edges. An arithmetic formula is a special kind of arithmetic circuit: it has the structure of a directed tree, so every node in a formula has out-degree at most one. As computations cannot be reused in a formula, it is considered to be weaker than circuits.
An arithmetic branching program (ABP) is a layered directed acyclic graph with a single source node and a single sink node. Each edge of an ABP is labeled by a variable or a constant from the field. The weight of a path from the source to the sink is the product of the labels of the edges on the path. The polynomial f(x_1, . . . , x_n) computed by the ABP is the sum of the weights of all paths from the source to the sink.
The size of an ABP is the number of its edges. The size of the smallest ABP computing f is denoted by size ABP (f ). The degree of a polynomial computed by an ABP of size s is at most s.
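The path-weight semantics can be made concrete with a small evaluator. The edge-list representation (assumed to be given in topological order) is our own choice for illustration and not from the paper.

```python
from collections import defaultdict

def eval_abp(edges, source, sink, point):
    """Evaluate the polynomial computed by an ABP at a point.
    edges: list of (u, v, label) in topological order (an assumption of
    this sketch); label is a variable name (str) or a field constant.
    value[v] accumulates the sum over all source-v paths of the product
    of the edge labels along each path."""
    value = defaultdict(int)
    value[source] = 1
    for u, v, label in edges:
        w = point[label] if isinstance(label, str) else label
        value[v] += value[u] * w
    return value[sink]

# Two-layer ABP computing (x1 + x2) * (x1 + 3):
edges = [('s', 'm', 'x1'), ('s', 'm', 'x2'),
         ('m', 't', 'x1'), ('m', 't', 3)]
```

The dynamic program visits every edge once, which is why evaluation is linear in the ABP size.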
Instead of arithmetic, the above models are also called algebraic circuits, algebraic formulas, and algebraic branching programs, respectively.

Properties of ABPs.
Univariate polynomials have small ABPs. Let p(x) be a univariate polynomial of degree d over any field. It can be computed by an ABP of size 2d + 1, actually even by a formula of that size.
For univariate polynomials p(x), q(x) over any field, the extended Euclidean algorithm computes the GCD h = gcd(p, q) and also the Bézout coefficients, polynomials a, b such that ap + bq = h, where deg(a) < deg(q) and deg(b) < deg(p). Let p have the larger degree, d = deg(p) ≥ deg(q). Then, clearly deg(h), deg(a), deg(b) ≤ d, and consequently, all these polynomials p, q, h, a, b have ABP-size at most 2d + 1.
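The following is a minimal sketch of the extended Euclidean algorithm on univariate polynomials with exact rational coefficients. The representation (coefficient lists, lowest degree first) and all names are ours.

```python
from fractions import Fraction as F

def pdeg(p):            # degree; the zero polynomial [] has degree -1
    return len(p) - 1

def pscale(p, c):
    return [ci * c for ci in p]

def psub(p, q):         # p - q, with trailing zeros removed
    n = max(len(p), len(q))
    p = p + [F(0)] * (n - len(p))
    q = q + [F(0)] * (n - len(q))
    r = [x - y for x, y in zip(p, q)]
    while r and r[-1] == 0:
        r.pop()
    return r

def pmul(p, q):
    r = [F(0)] * max(0, len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            r[i + j] += x * y
    return r

def pdivmod(p, q):      # division with remainder, q must be nonzero
    quo, rem = [F(0)] * max(1, len(p) - len(q) + 1), p[:]
    while pdeg(rem) >= pdeg(q):
        c, k = rem[-1] / q[-1], pdeg(rem) - pdeg(q)
        quo[k] = c
        rem = psub(rem, pmul([F(0)] * k + [c], q))
    return quo, rem

def ext_gcd(p, q):
    """Extended Euclidean algorithm: returns (h, a, b) with a p + b q = h,
    where h is a greatest common divisor of p and q (not normalized)."""
    r0, r1 = p[:], q[:]
    a0, a1 = [F(1)], []
    b0, b1 = [], [F(1)]
    while r1:
        quo, rem = pdivmod(r0, r1)
        r0, r1 = r1, rem
        a0, a1 = a1, psub(a0, pmul(quo, a1))
        b0, b1 = b1, psub(b0, pmul(quo, b1))
    return r0, a0, b0
```

The loop maintains the invariant a0·p + b0·q = r0, which yields the Bézout coefficients when the remainder sequence terminates.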
For the sum of two ABPs B_p, B_q, one can put B_p and B_q in parallel by merging the two source nodes of B_p and B_q into one new source node, and similarly for the two sink nodes. For the product, one can put B_p and B_q in series by merging the sink of B_p with the source of B_q. Another basic operation is substitution. Let p(x_1, . . . , x_n) and q_1(x), . . . , q_n(x) be polynomials with size_ABP(q_i) ≤ s, for i = 1, . . . , n. Then, we have size_ABP(p(q_1, . . . , q_n)) ≤ s · size_ABP(p).
To get an ABP for p(q 1 (x), . . . , q n (x)), replace an edge labeled x i in the ABP B p for p by the ABP B q i for q i .
It is known that the determinant of a symbolic matrix of dimension n can be computed by an ABP of size poly(n) (Berkowitz 1984;Chistov 1985;Csanky 1976;Mahajan & Vinay 1999). By substitution, the entries of the matrix can itself be polynomials computed by ABPs.
Resultant, subresultant, and GCD. Given two polynomials p(x, y) and q(x, y) in variables x and y = (y 1 , . . . , y n ), consider them as polynomials in x with coefficients in F[y]. The resultant of p and q w.r.t. x, denoted by Res x (p, q), is the determinant of the Sylvester matrix of p and q. For the definition of the Sylvester matrix, see von zur Gathen & Gerhard (2013, Section 6.3). Note that Res x (p, q) is a polynomial in F[y]. When we have ABPs for the coefficients of p and q, there is a poly-size ABP that computes the resultant Res x (p, q).
Basic properties of the resultant are that it can be represented as a linear combination of p and q and that it provides information about the GCD of p and q.
Lemma 2.2 (Resultant and GCD). Let p(x, y) and q(x, y) be polynomials of degree ≤ d and h = gcd(p, q). Then
(i) there are polynomials a, b such that ap + bq = Res_x(p, q),
(ii) deg(Res_x(p, q)) ≤ 2d^2,
(iii) Res_x(p, q) = 0 if and only if deg_x(h) ≥ 1.
See von zur Gathen & Gerhard (2013, Section 6.3) for a proof of the lemma. Note that by item (iii), we can find out whether p and q have a non-trivial GCD via a polynomial identity test (PIT). In the contrapositive, when Res_x(p, q) ≠ 0, we have that deg_x(h) = 0, and hence, h is a polynomial just in y. It is known that when p and q are additionally monic in x, then actually h = gcd(p, q) = 1.
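As an illustration of the resultant criterion, the following sketch builds the Sylvester matrix and tests for a common factor via its determinant. The conventions (coefficients listed highest degree first, Gaussian elimination over rationals) are our own.

```python
from fractions import Fraction as F

def sylvester(p, q):
    """Sylvester matrix of p and q, coefficients listed highest degree
    first; for deg(p) = m and deg(q) = n, the matrix is (m+n) x (m+n)."""
    m, n = len(p) - 1, len(q) - 1
    N = m + n
    rows = []
    for i in range(n):   # n shifted copies of p
        rows.append([F(0)] * i + [F(c) for c in p] + [F(0)] * (N - m - 1 - i))
    for i in range(m):   # m shifted copies of q
        rows.append([F(0)] * i + [F(c) for c in q] + [F(0)] * (N - n - 1 - i))
    return rows

def det(M):
    """Determinant by Gaussian elimination with exact fractions."""
    M = [row[:] for row in M]
    n, d = len(M), F(1)
    for j in range(n):
        piv = next((i for i in range(j, n) if M[i][j] != 0), None)
        if piv is None:
            return F(0)
        if piv != j:
            M[j], M[piv] = M[piv], M[j]
            d = -d
        d *= M[j][j]
        for i in range(j + 1, n):
            fct = M[i][j] / M[j][j]
            for k in range(j, n):
                M[i][k] -= fct * M[j][k]
    return d

def resultant(p, q):
    return det(sylvester(p, q))
```

A zero resultant signals a common factor, matching item (iii); for polynomials with coefficients in F[z], this determinant is exactly the polynomial one would hand to a PIT.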
Subresultants of p and q are polynomials S j (p, q) ∈ F[x, y] of x-degree j, computed from determinants of submatrices of the Sylvester matrix, for j = 0, 1, . . . , deg(q) ≤ deg(p). For the definition, see, for example, the textbook Geddes et al. (1992, Chapter 7). See also Sasaki & Suzuki (1992, Section 4) for a quick overview. Like for the resultant, when we have ABPs for the coefficients of p and q, there are poly-size ABPs that compute the subresultants.
The interest in subresultants comes from the fact that they can be used to get poly-size ABPs to compute the GCD. Let h = gcd(p, q) and d = deg x (h). Note first that d can be determined by several zero-tests of minors of the Sylvester matrix of p and q, see Geddes et al. (1992, Theorem 7.3). It follows from the fundamental theorem of polynomial remainder sequence, see Geddes et al. (1992, Theorem 7.4), that the dth subresultant S d (p, q) is a multiple of h: There is a k ∈ F[y] such that S d (p, q) = kh.
In our case, polynomials p and q will be monic in x. Hence, the GCD h will be monic in x too. Thus, the leading coefficient of S d (p, q) w.r.t. x is the above polynomial k ∈ F[y]. So if we divide S d (p, q) by its leading coefficient w.r.t. x, we recover the GCD h. For ABPs, the division can be done with small increase in size using Strassen's classic technique.
Lemma 2.3. Given two monic polynomials p, q ∈ F[x, z] computed by ABPs of size at most s, there is an efficient randomized algorithm that outputs an ABP of size poly(s) that computes their GCD.

Preprocessing steps and algebraic tool kit
Before we start the Hensel lifting process, a polynomial should fulfill certain properties that the input polynomial might not have. In this section, we describe transformations of a polynomial that achieve these properties such that ABPs can compute the transformation and its inverse, and factors of the polynomials are maintained.
We also explain how to compute homogeneous components and how to solve linear systems via ABPs. We show how to handle the special case when the given polynomial is just a power of an irreducible polynomial.

Computing homogeneous components and coefficients of a polynomial.
Let p(x, z) be a polynomial of degree d in variables x and z = (z_1, . . . , z_n). Let B_p be an ABP of size s that computes p. Write p as a polynomial in x with coefficients from F[z],
p(x, z) = Σ_{i=0}^{d} p_i(z) x^i.
We show that all the coefficients p_i(z) have ABPs of size poly(s, d).
The argument is similar to Strassen's homogenization technique for arithmetic circuits, an efficient way to compute all the homogeneous components of a polynomial. The same technique can be used for ABPs. See Saptharishi (2021, Lemma 5.2 and Remark). Here, we sketch the proof idea.
We split each node v of B_p into d + 1 nodes v_0, . . . , v_d, such that node v_i computes the coefficient of x^i of the polynomial computed by node v, for i = 0, 1, . . . , d. Consider an edge e between nodes u and v in B_p.
• If e is labeled with a constant c ∈ F or a variable z i , then we put an edge between u i and v i with label c or z i , respectively.
• If e is labeled with variable x, then we put an edge between u i and v i+1 with label 1.
The resulting ABP has one source node and d + 1 sink nodes. The ith sink node computes p i (z). For each edge of B p , we get either d or d + 1 edges in the new ABP. Hence, its size is bounded by s(d + 1).
The technique can easily be extended to constantly many variables. For two variables, consider p(x, y, z) = Σ_{i,j} p_{i,j}(z) x^i y^j. Then, from an ABP of size s for p, we get ABPs for the coefficients p_{i,j}(z) of size s(d + 1)^2, similarly as above.
In homogenization, we want to compute the homogeneous components of p, i.e., write p = Σ_{i=0}^{d} p_i, where p_i is a homogeneous polynomial of degree i. From an ABP B_p for p, we get ABPs for the p_i's similarly as above: in the definition of the new edges, only for a constant label do we put the edge from u_i to v_i. In case of any variable label z_j, we put the edge from u_i to v_{i+1} with label z_j. Then, the ith sink node computes p_i(z). The size is bounded by s(d + 1).
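The node-splitting construction can be sketched directly on an edge-list representation of an ABP. The representation and names are ours, and edges are assumed to be listed in topological order.

```python
from collections import defaultdict

def split_by_x_degree(edges, d):
    """Node-splitting construction: from an ABP for p(x, z), build an
    ABP whose sink copy i computes the coefficient p_i(z) of x^i.
    Each node v becomes copies (v, 0), ..., (v, d)."""
    new_edges = []
    for u, v, label in edges:
        for i in range(d + 1):
            if label == 'x':
                if i + 1 <= d:
                    # an x-edge raises the x-degree and contributes a 1
                    new_edges.append(((u, i), (v, i + 1), 1))
            else:
                # constants and z-variables keep the x-degree
                new_edges.append(((u, i), (v, i), label))
    return new_edges

def eval_coeff(edges, source, sink, d, i, zpoint):
    """Evaluate the coefficient p_i(z) at a point, by running the path
    DP on the split ABP (edges assumed in topological order)."""
    value = defaultdict(int)
    value[(source, 0)] = 1
    for u, v, label in split_by_x_degree(edges, d):
        w = zpoint[label] if isinstance(label, str) else label
        value[v] += value[u] * w
    return value[(sink, i)]

# ABP s -> m -> t computing (x + z)(x + 2z) = x^2 + 3xz + 2z^2,
# with x + 2z realized by the parallel edges z, z
edges = [('s', 'm', 'x'), ('s', 'm', 'z'),
         ('m', 't', 'x'), ('m', 't', 'z'), ('m', 't', 'z')]
```

Each original edge produces at most d + 1 copies, matching the s(d + 1) size bound from the text.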

Computing q from p = q^e.
A special case is when the given polynomial p(z) is a power of one irreducible polynomial q(z), i.e., p = q^e, for some e > 1. This case is handled separately. Kaltofen (1987) showed how to compute q for circuits, ABPs, and arithmetic formulas. Here, we give a short proof from Dutta (2018).
Proof. We may assume that p is nonzero; otherwise, the claim is trivial. We want to apply Newton's binomial theorem to compute q = p^{1/e}. For this, we need that p(0, . . . , 0) = 1. If this is not the case, we first transform p as follows:
1. If p(0, . . . , 0) = 0, let α = (α_1, . . . , α_n) be a point where p(α) ≠ 0. By Theorem 2.1, a random point α will work, with high probability. Now we shift the variables and work with the shifted polynomial p_1(z) = p(z + α).
Still, p_1(0, . . . , 0) might be different from 1. In this case, we also apply the next item to p_1.
2. If p(0, . . . , 0) = c for a constant c ∉ {0, 1}, we scale the polynomial and work with p_1(z) = p(z)/c, so that p_1(0, . . . , 0) = 1.
Note that both transformations are easily reversible. Hence, in the following we simply assume that p(0, . . . , 0) = 1. By Newton's binomial theorem, we have
(3.4) p^{1/e} = (1 + (p − 1))^{1/e} = Σ_{j≥0} (1/e choose j) (p − 1)^j.
Note that q = p^{1/e} is a polynomial of degree d_q = deg(q). Since p − 1 is constant-free, the terms (p − 1)^j on the right-hand side of (3.4) have degree > d_q, for j > d_q. Thus, (3.4) turns into a finite sum modulo the ideal ⟨z⟩^{d_q+1}, and we define
Q = Σ_{j=0}^{d_q} (1/e choose j) (p − 1)^j,
so that q ≡ Q (mod ⟨z⟩^{d_q+1}). Then, we clearly have size_ABP(Q) ≤ poly(s). To compute q = Q mod ⟨z⟩^{d_q+1}, we have to truncate the terms in Q of degree > d_q. This can be done by computing the homogeneous components of Q as described in Lemma 3.2. We conclude that size_ABP(q) ≤ poly(s).
Recall that the underlying field F has characteristic 0. Note that the above proof would not work when F has finite characteristic p that divides e.
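The truncated binomial series for q = p^{1/e} can be sketched as follows. The dense coefficient-list representation and all names are ours; the example is univariate for simplicity.

```python
from fractions import Fraction as F

def binom(r, j):
    """Generalized binomial coefficient C(r, j) for rational r."""
    out = F(1)
    for i in range(j):
        out *= (r - i) / (i + 1)
    return out

def ptrunc_mul(a, b, k):
    """Multiply dense univariate polynomials, truncating at degree k."""
    c = [F(0)] * (k + 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j <= k:
                c[i + j] += x * y
    return c

def eth_root(p, e, dq):
    """q = p^(1/e) via the binomial series sum_j C(1/e, j) (p - 1)^j,
    truncated at degree dq = deg(q); requires p(0) = 1, so that p - 1
    is constant-free and only j <= dq contributes below degree dq + 1."""
    u = [c - (1 if i == 0 else 0) for i, c in enumerate(p)]   # p - 1
    q = [F(0)] * (dq + 1)
    pw = [F(1)] + [F(0)] * dq        # (p - 1)^j, truncated
    for j in range(dq + 1):
        cj = binom(F(1, e), j)
        for i in range(dq + 1):
            q[i] += cj * pw[i]
        pw = ptrunc_mul(pw, u, dq)
    return q

# p = (1 + x)^2, e = 2: recover q = 1 + x
q = eth_root([F(1), F(2), F(1)], 2, 1)
```

The truncation step plays the role of extracting homogeneous components in the ABP construction.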

Reducing the multiplicity of a factor.
In the earlier works on bivariate and multivariate polynomial factoring, typically the problem is reduced to factoring a square-free polynomial. This is convenient at various places in the Hensel lifting process. The technique to reduce to the square-free case is via taking the GCD of the input polynomial and its derivative.
As we will see, we do not need the polynomial to be square-free. It suffices to have one irreducible factor with multiplicity one, and another coprime factor. Thereby, we avoid GCD computations and the approach stays feasible even for formulas.
Let p(z) be the given polynomial, for z = (z_1, . . . , z_n). The special case that p is a power of one irreducible polynomial was handled in Section 3.2. Hence, we may assume that p has at least two distinct irreducible factors. So, let p = q^e p_0, where q is irreducible and coprime to p_0.
Consider the derivative of p w.r.t. some variable on which q depends, say z_1. We get
(3.6) ∂p/∂z_1 = q^{e−1} ( e (∂q/∂z_1) p_0 + q (∂p_0/∂z_1) ).
Note that q does not divide the factor e (∂q/∂z_1) p_0 + q (∂p_0/∂z_1) in (3.6): since q is irreducible, depends on z_1, and the characteristic is 0, q does not divide ∂q/∂z_1, and q is coprime to p_0. Hence, the multiplicity of the factor q in ∂p/∂z_1 is reduced by one compared to p.
For the ABP-size, we write p as a polynomial in z_1, i.e., p(z) = Σ_{i=0}^{d} a_i z_1^i, where the coefficients a_i are polynomials in z_2, . . . , z_n. By Lemma 3.1, when p has an ABP of size s, the coefficients a_i can be computed by ABPs of size s' = s(d + 1). We observe that then the coefficients i·a_i of the derivative ∂p/∂z_1 have ABPs of size s' + 1. We repeat taking derivatives k = e − 1 times and get ∂^k p/∂z_1^k, which has the irreducible factor q with multiplicity one, as desired.
The coefficients of ∂^k p/∂z_1^k can be computed by ABPs of size s' + 1.
This yields an ABP of size poly(s) that computes ∂^k p/∂z_1^k.

Transforming to a monic polynomial.
Given a polynomial p(z) of degree d in variables z = (z_1, . . . , z_n) over the field F, there is a standard trick to make it monic in a new variable x by applying a linear transformation on the variables: for α = (α_1, . . . , α_n) ∈ F^n, let
p_α(x, z) = p(α_1 x + z_1, . . . , α_n x + z_n).
We denote the term z^β = z_1^{β_1} · · · z_n^{β_n} and write p = Σ_β c_β z^β. Then, the homogeneous component of degree d in p can be written as a_d(z) = Σ_{|β|=d} c_β z^β. Note that a_d is a nonzero polynomial.
For the transformed polynomial p_α, we have deg_x(p_α) = d and the coefficient of the leading x-term x^d is a_d(α) = Σ_{|β|=d} c_β α^β. By Theorem 2.1, when we pick α at random, a_d(α) will be a nonzero constant in F with high probability. In this case, we can divide p_α by a_d(α) and obtain a polynomial that is monic in x.
Lemma 3.7 (Transformation to monic). Let p(z) be a polynomial of total degree d. Let S ⊆ F be a finite set. For α ∈ S^n picked independently and uniformly at random, p_α(x, z) is monic in x of degree d (after division by the constant a_d(α)) with probability at least 1 − d/|S|.
Given an ABP of size s that computes p(z), we can construct another ABP of size 3s that computes p_α(x, z). For the new ABP, replace each edge labeled by z_i by a small ABP computing α_i x + z_i. For each old edge, this requires adding two new edges with labels α_i and x.
Note that we can derandomize the transformation with an oracle for ABP-PIT.
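The substitution z_i → α_i x + z_i can be illustrated on a sparse representation of polynomials. The dict-of-exponent-tuples representation and the example polynomial are ours.

```python
def pmul(a, b):
    """Multiply two sparse polynomials given as {exponent tuple: coeff}."""
    c = {}
    for ea, ca in a.items():
        for eb, cb in b.items():
            e = tuple(i + j for i, j in zip(ea, eb))
            c[e] = c.get(e, 0) + ca * cb
    return {e: v for e, v in c.items() if v != 0}

def padd(a, b):
    c = dict(a)
    for e, v in b.items():
        c[e] = c.get(e, 0) + v
    return {e: v for e, v in c.items() if v != 0}

def substitute(poly, alpha):
    """Apply z_i -> alpha_i * x + z_i. Input exponents range over
    (z_1, ..., z_n); output exponents over (x, z_1, ..., z_n)."""
    n = len(alpha)
    result = {}
    for expo, coeff in poly.items():
        term = {(0,) * (n + 1): coeff}
        for i, e in enumerate(expo):
            zi = [0] * (n + 1)
            zi[i + 1] = 1
            lin = {(1,) + (0,) * n: alpha[i], tuple(zi): 1}  # alpha_i x + z_i
            for _ in range(e):
                term = pmul(term, lin)
        result = padd(result, term)
    return result

# p(z1, z2) = z1^2 z2 + z2, total degree d = 3
p = {(2, 1): 1, (0, 1): 1}
alpha = (2, 3)
pa = substitute(p, alpha)
```

The coefficient of x^d in the result is the constant a_d(α); dividing by it gives the monic polynomial of Lemma 3.7.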

Handling the starting point of Hensel lifting.
After doing the above preprocessing steps on the given polynomial p(z), we call the transformed polynomial f(x, z). We can assume that f of degree d can be factorized as f = gh, where g and h are coprime and g is irreducible. We start Hensel lifting by factoring the univariate polynomial f(x, 0, . . . , 0) ≡ f(x, z) (mod ⟨z⟩). Clearly, we have the factorization f(x, 0, . . . , 0) = g(x, 0, . . . , 0) h(x, 0, . . . , 0), but these two factors might not be coprime. In this case, we do another transformation.
Remark 3.8. Although it would suffice for our purpose to start with two coprime factors, the transformation below produces one irreducible factor.
Let g_0 be an irreducible factor of the polynomial g(x, 0, . . . , 0). Then, we have g(x, 0, . . . , 0) = g_0(x) ĥ_0(x), for some univariate polynomial ĥ_0(x), and f(x, 0, . . . , 0) = g_0(x) h_0(x), for h_0 = ĥ_0 · h(x, 0, . . . , 0). We want that g_0 is coprime to ĥ_0 and to h_0. Directly, this might not be the case, because all factors of f(x, 0, . . . , 0) might have multiplicity > 1. However, we argue how to ensure this after a random shift α of f. That is, we consider the polynomial f(x, z + α).
1. First, we show how to achieve that g_0 is coprime to ĥ_0.
Since g is irreducible, it is also square-free, and hence, we get gcd(g, ∂g/∂x) = 1. The discriminant of g is the resultant r(z) = Res_x(g, ∂g/∂x). Polynomial r(z) is nonzero and of degree ≤ 2d^2, by Lemma 2.2. Hence, at a random point α ∈ [4d^2]^n, we have r(α) ≠ 0, with high probability. At such a point α, we have that g(x, α) is square-free. Therefore, g(x, z) is square-free modulo ⟨z − α⟩, or, equivalently, g(x, z + α) is square-free modulo ⟨z⟩. Hence, when we define g_0 and the complementary factor ĥ_0 = g(x, 0, . . . , 0)/g_0 from g(x, z + α) instead of g(x, z), they will be coprime.
2. Similarly, we can achieve that g_0 is coprime to h_0. By the first item, it now suffices to get g_0 coprime to h(x, 0, . . . , 0). Since g and h are coprime, the resultant Res_x(g, h) is a nonzero polynomial in z of degree ≤ 2d^2 by Lemma 2.2, and it stays nonzero at a random point α with high probability, so that g(x, z + α) and h(x, z + α) are coprime modulo ⟨z⟩.
Combining the two items, a random point α ∈ [4d 2 ] n will fulfill both properties with high probability. So instead of factoring f (x, z), we do a coordinate transformation z → z + α and factor f (x, z + α) instead. From these factors, we easily get the factors of f (x, z) by inverting the transformation.
Remark 3.9. (i) The construction maintains the monicness of f in x, since the shift z → z + α does not affect the leading coefficient in x. (ii) In the next section, we do another transformation on the input polynomial: we apply a map on the variables that maps x to x and z_i to yz_i, for a new variable y and i = 1, . . . , n. Then, we factorize the transformed polynomial modulo y. Note that in this case, going modulo y has the same effect as going modulo ⟨z⟩. So we can use the above argument to ensure that the starting condition for Hensel lifting is satisfied.

Reducing multivariate factoring to the bivariate case.
Factoring multivariate polynomials can be reduced to the case of bivariate polynomials, see Kopparty, Saraf & Shpilka (2015). Let x, y, and z = (z_1, . . . , z_n) be variables, and let f(x, z) be the given polynomial, monic in x. Define the transformed polynomial
f̂(x, y, z) = f(x, yz_1, . . . , yz_n).
The point now is to consider f̂ as a polynomial in F[z][x, y], that is, as a bivariate polynomial in x and y with coefficients in F[z]. We list some properties.
1. f̂(x, 0, z) = f(x, 0, . . . , 0) is a univariate polynomial in x.
2. The map z_i ↦ yz_i is a ring homomorphism, so a factorization f = gh yields the factorization f̂ = ĝĥ.
3. The map preserves monicness in x: if g is monic in x, then so is ĝ.
4. f̂(x, 1, z) = f(x, z).
5. Substituting y = 1 is again a ring homomorphism: (uv)(x, 1, z) = u(x, 1, z) v(x, 1, z).
By property 4, factors of f̂ yield factors of f. The following lemma shows that also the irreducibility of the factors is maintained.
Lemma 3.10. Let f be monic in x and g be a monic irreducible factor of f. Then, ĝ is a monic irreducible factor of f̂.
Proof. By the properties above, ĝ is a monic factor of f̂. We argue that ĝ is irreducible.
Let ĝ = uv be a factorization of ĝ. By item 5, this yields a factorization of g as g = u(x, 1, z) v(x, 1, z). Since g is irreducible, either u(x, 1, z) or v(x, 1, z) is constant. Because ĝ is monic in x, either u or v must be constant too.
Thus, to get an ABP for an irreducible factor g of f, we first show that there is an ABP for the irreducible factor ĝ of f̂. This yields an ABP for g by the substitution g = ĝ(x, 1, z).
Given an ABP B_f of size s for f, we get an ABP for f̂ by putting an edge labeled y in series with every edge labeled z_i in B_f, so that the new ABP computes yz_i at every place where B_f uses z_i. Hence, its size is at most 2s.

Solving linear systems.
Lemma 3.11. Let M be a k × m matrix whose entries are polynomials computed by ABPs of size ≤ s. If the linear system Mv = 0 has a nonzero solution, then it has a nonzero solution v whose entries are computed by ABPs of size poly(k, m, s).
Proof. After swapping rows of M, we ensure that the j × j submatrix M_j that consists of the first j rows and the first j columns has full rank, iteratively for j = 1, 2, . . . .
For j = 1, this means to find a nonzero entry in the first column and swap that row with the first row. If the first column is a zero column, then v = (1, 0, . . . , 0)^T is a solution and we are done.
To extend from j to j + 1, suppose we have ensured that M_j has full rank. Now we search for a row from row j + 1 on, such that after a swap with row j + 1, the submatrix M_{j+1} has full rank, i.e., its determinant is nonzero. This can be tested by Theorem 2.1. If no such row exists, then the process stops at j. If j = m, then M has full rank and the zero vector is the only solution. Otherwise, assume the above process stops with j < m. Now Cramer's rule can be used to find the unique solution u = (u_1, u_2, . . . , u_j)^T of the system
(3.12) M_j u = (m_{1,j+1}, m_{2,j+1}, . . . , m_{j,j+1})^T,
namely u_i = det(M_j^i)/det(M_j), where M_j^i is the matrix obtained by replacing the ith column of M_j by the vector (m_{1,j+1}, . . . , m_{j,j+1})^T. Now, define
v = ( det(M_j^1), . . . , det(M_j^j), −det(M_j), 0, . . . , 0 )^T.
Then, v is a nonzero solution to the original system. Note that the vector on the right-hand side of (3.12) might be the zero vector, in which case also u will be the zero vector. Still, v is nonzero because it has the nonzero entry −det(M_j) at position j + 1. The entries of v are determinants of matrices with entries computed by ABPs of size s. Hence, all the entries of v have ABPs of size poly(k, m, s).
Remark 3.13. The ABP in Lemma 3.11 can be constructed by a randomized algorithm in time poly(k, m, s). Randomization comes in by Theorem 2.1 to compute the matrices M j . In fact, the determinant polynomials we get for PIT can be computed by ABPs. Hence, the construction algorithm can be made deterministic with an oracle for ABP-PIT.
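The Cramer-based construction of a kernel vector from the proof can be sketched as follows, over exact rationals instead of ABP entries. All names and the plain Gaussian determinant are ours.

```python
from fractions import Fraction as F

def det(M):
    """Determinant by Gaussian elimination with exact fractions."""
    M = [row[:] for row in M]
    n, d = len(M), F(1)
    for j in range(n):
        piv = next((i for i in range(j, n) if M[i][j] != 0), None)
        if piv is None:
            return F(0)
        if piv != j:
            M[j], M[piv] = M[piv], M[j]
            d = -d
        d *= M[j][j]
        for i in range(j + 1, n):
            fct = M[i][j] / M[j][j]
            for k in range(j, n):
                M[i][k] -= fct * M[j][k]
    return d

def kernel_vector(M, m):
    """Follow the proof: find the largest j < m such that, after row
    swaps, the leading j x j submatrix M_j has full rank; then output
    v = (det M_j^1, ..., det M_j^j, -det M_j, 0, ..., 0).
    Returns None when M has full column rank (only the zero solution)."""
    rows = [row[:] for row in M]
    j = 0
    while j < m:
        found = False
        for r in range(j, len(rows)):
            rows[j], rows[r] = rows[r], rows[j]
            if det([row[:j + 1] for row in rows[:j + 1]]) != 0:
                found = True
                break
            rows[j], rows[r] = rows[r], rows[j]
        if not found:
            break
        j += 1
    if j == m:
        return None
    Mj = [row[:j] for row in rows[:j]]
    rhs = [row[j] for row in rows[:j]]
    v = [F(0)] * m
    for i in range(j):
        Mji = [row[:] for row in Mj]
        for t in range(j):
            Mji[t][i] = rhs[t]
        v[i] = det(Mji)
    v[j] = -det(Mj)
    return v
```

In the setting of Lemma 3.11, every entry of v is a determinant of a matrix of small ABPs, hence itself has a small ABP.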

Factors of arithmetic branching programs
In this section, we prove that ABPs are closed under factoring over fields of characteristic 0. Over fields of characteristic p, our proof fails if one of the irreducible factors has a multiplicity e that is divisible by p.
Theorem 4.1. Let p be a polynomial over a field F with characteristic 0. For all factors q of p, we have size ABP (q) ≤ poly(size ABP (p)) .
We prove Theorem 4.1 in the rest of this section. First, observe that it suffices to prove the poly(s) size upper bound for the irreducible factors of p. This also yields a poly(s) bound for all the factors, since every factor is a product of at most deg(p) ≤ s irreducible factors and ABP-sizes add up under multiplication.
In the case when p is irreducible, there is nothing to show, because the only factor, p, has a small ABP by assumption. The case p = q^e is proved in Section 3.2. So it remains to consider the general case p = p_1^{e_1} · · · p_m^{e_m}, for m ≥ 2, where p_1, . . . , p_m are the distinct irreducible factors of p. We want to prove an ABP-size upper bound for an irreducible factor, say p_1.
We start by several transformations on the input polynomial p(z), where z = (z_1, . . . , z_n).
1. As described in Section 3.3, taking the derivative w.r.t. some variable, say z_1, k = e_1 − 1 times, we get the polynomial ∂^k p/∂z_1^k, in which the factor p_1 has multiplicity 1.
2. Next, by Lemma 3.7, we transform p(z) to a polynomial p(x, z) that is monic in x, for a new variable x. Thereby, also the factors of p(z) are transformed, maintaining their irreducibility and multiplicity. The degree of p(x, z) is at most twice the degree of p(z).
3. At this point, we may have to shift the variables z as described in Section 3.5 to ensure the properties needed for starting Hensel lifting. This shift preserves the monicness and the irreducibility of the factors.
4. Finally, the transformation to a bivariate polynomial is explained in Section 3.6. This yields the polynomial P(x, y, z), with a new variable y, monic in x. We rewrite P as a polynomial in x and y with coefficients in the ring K = F[z] and call this representation f. That is, f(x, y) ∈ K[x, y]. By Lemma 3.10, the transformation maintains irreducible factors. Note also that by the definition of P, we have f(x, y) mod y = f(x, 0) = P(x, 0, z) = p(x, 0, . . . , 0), so that f(x, y) mod y is univariate.
The main part now is to factor f (x, y) ∈ K[x, y], say f = gh, where g ∈ K[x, y] is irreducible and coprime to h ∈ K[x, y], and f, g, h are monic in x and have x-degree ≥ 1. Let d be the total degree of f in x, y.
From the factor g of f , we will recover the factor p 1 of p by reversing the above transformations. We show that g can be computed by an ABP of size poly(s). It follows that the irreducible factor p 1 has an ABP of size poly(s).
The strategy is to first factor the univariate polynomial f mod y, and then apply Hensel lifting to get a factorization of f mod y t , for large enough t. Finally, from the lifted factors modulo y t , we compute the absolute factors of f .

Hensel lifting.
Hensel lifting is named after Kurt Hensel (Hensel 1899, 1904, 1908, 1918). The following definition and lemma present a method for lifting that is usually attributed to Hensel.
Definition 4.2. Let R be a commutative ring with identity and I an ideal of R. Let f, g, h, a, b ∈ R such that f ≡ gh (mod I) and ag + bh ≡ 1 (mod I). We call g', h' (together with a', b') a lift of g, h w.r.t. f and I, if
(i) f ≡ g'h' (mod I^2),
(ii) g' ≡ g (mod I) and h' ≡ h (mod I),
(iii) a'g' + b'h' ≡ 1 (mod I^2).
Lemma 4.4 (Hensel lifting). With the notation of Definition 4.2, there exists a lift g', h', a', b' of g, h w.r.t. f and I. There is also a certain uniqueness property: for any other lift g*, h*, there is a u ∈ I such that g* ≡ (1 + u)g' and h* ≡ (1 − u)h' (mod I^2). Note that for any u ∈ I, we have (1 + u)(1 − u) ≡ 1 (mod I^2).
Proof. We first show the existence part. Let
1. e = f − gh,
2. g' = g + be and h' = h + ae,
3. c = ag' + bh' − 1,
4. a' = a(1 − c) and b' = b(1 − c).
We verify that g', h' are a lift of g, h. Because f ≡ gh (mod I), we have e = f − gh ≡ 0 (mod I). In other words, e ∈ I. Hence, we get condition (ii), that g' ≡ g (mod I) and h' ≡ h (mod I).
Next, we show condition (i), that f ≡ g'h' (mod I^2). We have
f − g'h' = (f − gh) − (ag + bh)e − abe^2 = (1 − ag − bh)e − abe^2 ≡ 0 (mod I^2).
Note that e^2 ∈ I^2, and (1 − ag − bh)e ∈ I^2 because e ∈ I and 1 − (ag + bh) ∈ I.
To show condition (iii), we verify that a'g' + b'h' ≡ 1 (mod I^2). First, observe that c = ag' + bh' − 1 ≡ ag + bh − 1 ≡ 0 (mod I). Hence, c ∈ I, and we conclude that a' ≡ a (mod I) and b' ≡ b (mod I). Now,
a'g' + b'h' = (1 − c)(ag' + bh') = (1 − c)(1 + c) = 1 − c^2 ≡ 1 (mod I^2).
For the uniqueness part, let g*, h* be another lift of g, h. Let α = g* − g' and β = h* − h'. By Definition 4.2 (ii), we have g ≡ g' ≡ g* (mod I) and h ≡ h' ≡ h* (mod I), and therefore α, β ∈ I.
We first show
(4.5) βg' + αh' ≡ 0 (mod I^2).
Namely, f ≡ g*h* = (g' + α)(h' + β) ≡ g'h' + βg' + αh' (mod I^2), since αβ ∈ I^2, and also f ≡ g'h' (mod I^2); subtracting the two congruences yields (4.5). Now let u = a'α − b'β ∈ I. By (4.5) and because a'g' + b'h' ≡ 1 (mod I^2), we have
α ≡ α(a'g' + b'h') = a'αg' + b'αh' ≡ a'αg' − b'βg' = ug' (mod I^2),
and similarly β ≡ −uh' (mod I^2). Hence, g* = g' + α ≡ (1 + u)g' and h* = h' + β ≡ (1 − u)h' (mod I^2), as claimed.
For the ABP-size, recall that the size just adds up when doing additions or multiplications. Hence, when f, g, h, a, b have ABPs of size ≤ s and we construct ABPs for g', h', a', b' according to steps 1-4 in the above proof, we get ABPs of size O(s). In more detail, the reader may verify that the ABPs for g' and h' have size ≤ 4s and the ABPs for a' and b' have size ≤ 10s.
Similarly, with respect to the degree, when f, g, h, a, b have degree ≤ d, then g', h', a', b' have degree O(d). Namely, g', h' have degree ≤ 3d, and a', b' have degree ≤ 5d.
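One round of the lifting steps 1-4 can be sketched on bivariate polynomials represented as dictionaries of exponent pairs. The representation and names are ours; the example lifts the factorization x^2 − 1 = (x − 1)(x + 1) of f = x^2 − 1 − y from modulo y to modulo y^2.

```python
from fractions import Fraction as F

def pmul(a, b, t):
    """Multiply bivariate polynomials {(deg_x, deg_y): coeff}, mod y^t."""
    c = {}
    for (i, j), ca in a.items():
        for (k, l), cb in b.items():
            if j + l < t:
                c[(i + k, j + l)] = c.get((i + k, j + l), 0) + ca * cb
    return {k: v for k, v in c.items() if v != 0}

def padd(a, b, sign=1):
    c = dict(a)
    for k, v in b.items():
        c[k] = c.get(k, 0) + sign * v
    return {k: v for k, v in c.items() if v != 0}

def hensel_step(f, g, h, a, b, t):
    """One lifting round, from modulo y^t to modulo y^(2t):
    1. e = f - g h
    2. g' = g + b e,  h' = h + a e
    3. c = a g' + b h' - 1
    4. a' = a (1 - c),  b' = b (1 - c)"""
    T = 2 * t
    one = {(0, 0): F(1)}
    e = padd(f, pmul(g, h, T), sign=-1)
    g1 = padd(g, pmul(b, e, T))
    h1 = padd(h, pmul(a, e, T))
    c = padd(padd(pmul(a, g1, T), pmul(b, h1, T)), one, sign=-1)
    a1 = pmul(a, padd(one, c, sign=-1), T)
    b1 = pmul(b, padd(one, c, sign=-1), T)
    return g1, h1, a1, b1

# f = x^2 - 1 - y; modulo y it factors as (x - 1)(x + 1)
f = {(2, 0): F(1), (0, 0): F(-1), (0, 1): F(-1)}
g = {(1, 0): F(1), (0, 0): F(-1)}   # x - 1
h = {(1, 0): F(1), (0, 0): F(1)}    # x + 1
a = {(0, 0): F(-1, 2)}              # a g + b h = 1
b = {(0, 0): F(1, 2)}
g, h, a, b = hensel_step(f, g, h, a, b, 1)
```

After the step, f ≡ gh and ag + bh ≡ 1 hold modulo y^2, with g = x − 1 − y/2; iterating squares the modulus in each round, as in Section 4.2.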
Remark 4.6. In the monic version of Hensel lifting, there is a division in addition to the four steps from above. When we assume that g is monic, we can compute polynomials q and r such that g' − g = qg + r, where deg_x(r) < deg_x(g). Then, one can show that g'' = g + r and h'' = h'(1 + q) are a lift of g, h w.r.t. f, and g'' is again monic. Also the Bézout coefficients can be computed: for c = a g'' + b h'' − 1, let a'' = a(1 − c) and b'' = b(1 − c).
An advantage of the monic version is that the result is really unique (modulo I^2): there is no 1 + u factor between monic lifts. A disadvantage is the extra division, which would blow up the ABP-size too much.

Iterating Hensel lifting.
We apply Hensel lifting iteratively in the ring R = K[x, y], where K = F[z]. Let f ∈ K[x, y] be a polynomial of total degree d in x, y that can be factored into f = gh, where g ∈ K[x, y] is irreducible and coprime to h ∈ K[x, y], and f, g, h are monic in x and have x-degree ≥ 1.
To start the Hensel lifting procedure, we factor the univariate polynomial f(x, 0) = f mod y as f(x, 0) = g_0(x) h_0(x), where g_0 is a divisor of g mod y, coprime to h_0, and deg_x(g_0) ≥ 1. Recall that by the preprocessing in Section 3.5, we may assume that there is such a decomposition of f(x, 0).
By the Euclidean algorithm, there are polynomials a_0(x), b_0(x) such that a_0 g_0 + b_0 h_0 = 1. Hence, we have a_0 g_0 + b_0 h_0 ≡ 1 (mod I_0), for I_0 = ⟨y⟩, and we initiate Hensel lifting with f ≡ g_0 h_0 (mod I_0). The ABP-size of g_0, h_0, a_0, b_0 is bounded by the ABP-size of f, actually by deg_x(f), because we have univariate polynomials here.
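The initialization a_0 g_0 + b_0 h_0 = 1 is a standard extended-Euclid computation. A minimal sketch over Q (coefficient lists, lowest degree first; all names are ours, not the paper's):

```python
from fractions import Fraction

# Univariate polynomials over Q as coefficient lists, lowest degree first.
def trim(p):
    while p and p[-1] == 0:
        p.pop()
    return p

def padd(p, q):
    r = [Fraction(0)] * max(len(p), len(q))
    for i, c in enumerate(p):
        r[i] += c
    for i, c in enumerate(q):
        r[i] += c
    return trim(r)

def pscale(p, c):
    return trim([c * x for x in p])

def pmul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1) if p and q else []
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

def pdivmod(p, q):
    """Long division: p = quot*q + rem with deg(rem) < deg(q)."""
    p, quot = list(p), [Fraction(0)] * max(len(p) - len(q) + 1, 1)
    while len(p) >= len(q):
        c = p[-1] / q[-1]
        k = len(p) - len(q)
        quot[k] = c
        for i, b in enumerate(q):
            p[i + k] -= c * b
        trim(p)
    return trim(quot), p

def ext_gcd(p, q):
    """Extended Euclid: returns (gcd, a, b) with a*p + b*q = gcd, gcd monic."""
    r0, r1 = p, q
    a0, a1 = [Fraction(1)], []
    b0, b1 = [], [Fraction(1)]
    while r1:
        quot, rem = pdivmod(r0, r1)
        r0, r1 = r1, rem
        a0, a1 = a1, padd(a0, pscale(pmul(quot, a1), Fraction(-1)))
        b0, b1 = b1, padd(b0, pscale(pmul(quot, b1), Fraction(-1)))
    lc = r0[-1]
    return pscale(r0, 1 / lc), pscale(a0, 1 / lc), pscale(b0, 1 / lc)

# g0 = x and h0 = x + 1 are coprime, so we get a0*g0 + b0*h0 = 1.
g0 = [Fraction(0), Fraction(1)]
h0 = [Fraction(1), Fraction(1)]
gcd, a0c, b0c = ext_gcd(g0, h0)
```

On these inputs the sketch returns a_0 = −1 and b_0 = 1, matching (−1)·x + 1·(x + 1) = 1.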
We iteratively apply Hensel lifting to g_0, h_0 as described in the proof of Lemma 4.4. Each time, the ideal gets squared. For k ≥ 1, let I_k = I_0^(2^k). That is, we get polynomials g_k, h_k such that f ≡ g_k h_k (mod I_k), and g_k, h_k are a lift of g_{k-1}, h_{k-1} w.r.t. f and I_{k-1}. For the ABP-size of g_k and h_k, we observed at the end of Section 4.1 that the size increases by a constant factor in each iteration. Hence, when we start with size_ABP(f) = s, after k iterations, we get size_ABP(g_k), size_ABP(h_k) = s · 2^O(k).
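The precision-doubling effect of squaring the ideal can be seen in miniature with Newton iteration, the mechanism behind Hensel lifting (cf. the introduction). In this toy sketch (ours, not from the paper) we approximate the power-series root s(y) = sqrt(1 + y) of f(x, y) = x^2 − (1 + y); each round doubles the number of correct y-adic coefficients, mirroring I_k = I_0^(2^k):

```python
from fractions import Fraction

def ser_mul(p, q, m):
    # multiply two power series in y (coefficient lists), truncated mod y^m
    r = [Fraction(0)] * m
    for i, a in enumerate(p[:m]):
        for j, b in enumerate(q[:m - i]):
            r[i + j] += a * b
    return r

def ser_inv(p, m):
    # power-series inverse mod y^m (needs p[0] != 0); precision doubles per round
    inv = [Fraction(1) / p[0]]
    prec = 1
    while prec < m:
        prec = min(2 * prec, m)
        t = [-c for c in ser_mul(p, inv, prec)]
        t[0] += 2                     # t = 2 - p*inv
        inv = ser_mul(inv, t, prec)
    return inv

def sqrt_1_plus_y(m):
    # Newton iteration s <- (s + (1+y)/s) / 2 for s^2 = 1 + y, starting at s = 1.
    s = [Fraction(1)]
    prec = 1
    while prec < m:
        prec = min(2 * prec, m)       # the y-adic precision doubles each round
        s = s + [Fraction(0)] * (prec - len(s))
        one_plus_y = [Fraction(1), Fraction(1)] + [Fraction(0)] * (prec - 2)
        t = ser_mul(one_plus_y, ser_inv(s, prec), prec)
        s = [(a + b) / 2 for a, b in zip(s, t)]
    return s

s = sqrt_1_plus_y(8)
# f = x^2 - (1+y) factors as (x - s)(x + s) with ever-higher y-adic precision.
```

After three rounds, s = 1 + y/2 − y^2/8 + y^3/16 − 5y^4/128 + ... is correct modulo y^8.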
Similarly, the degree of the lifted polynomials increases by a constant factor in each iteration. We start with deg(f) = d. Then, after k iterations, we get deg(g_k), deg(h_k) = d · 2^O(k). The following lemma states that g_k divides g modulo I_k, for all k ≥ 0. In a sense, the g_k's approximate a factor of g modulo increasing powers of y.
Lemma 4.7. With the notation from above, for all k ≥ 0 and some polynomial h̄_k,
g ≡ g_k h̄_k (mod I_k).
Moreover, g_k, h̄_k are a lift of g_{k-1}, h̄_{k-1} w.r.t. g, for k ≥ 1, and deg_x(h̄_k mod I_k) ≤ deg_x(h_k).
Proof. The proof is by induction on k ≥ 0. For the base case, we have that g_0 divides g modulo I_0, as explained above. Thus, for some polynomial h̄_0 that is coprime to g_0, we have g ≡ g_0 h̄_0 (mod I_0). Hence, g_0 h_0 ≡ f = gh ≡ g_0 h̄_0 h (mod I_0), and canceling g_0 gives h_0 ≡ h h̄_0 (mod I_0). Note that h̄_0 might be just 1.
For the inductive step, assume that
(4.8) g ≡ g_{k-1} h̄_{k-1} (mod I_{k-1}) and h_{k-1} ≡ h h̄_{k-1} (mod I_{k-1}).
Let ĝ_k, ĥ_k be a lift of g_{k-1}, h̄_{k-1} w.r.t. g and I_{k-1}, so that in particular
(4.9) ĝ_k ĥ_k ≡ g (mod I_k).
We claim that then ĝ_k, h ĥ_k are a lift of g_{k-1}, h h̄_{k-1}, i.e., of g_{k-1}, h_{k-1} by (4.8), w.r.t. f. Proof. We check the three conditions for a lift in Definition 4.2. For the product condition (i), we have by (4.9) that ĝ_k · h ĥ_k = h · ĝ_k ĥ_k ≡ hg = f (mod I_k). For condition (ii), we have ĝ_k ≡ g_{k-1} (mod I_{k-1}) by assumption, and similarly h ĥ_k ≡ h h̄_{k-1} ≡ h_{k-1} (mod I_{k-1}). By Remark 3 after Definition 4.2, condition (iii) already follows now. This proves Claim 4.10.
Recall that also g_k, h_k are a lift of g_{k-1}, h_{k-1}. Hence, by the uniqueness property of Hensel lifting, there is a u ∈ I_{k-1} such that
(4.11) ĝ_k ≡ g_k (1 + u) (mod I_k) and h ĥ_k ≡ h_k (1 − u) (mod I_k).
Now observe that we can move the factor 1 + u: we have that g_k(1 + u), h ĥ_k are a lift of g_{k-1}, h_{k-1}, and then also g_k, h ĥ_k(1 + u) are a lift of g_{k-1}, h_{k-1}. Proof. We check the conditions for a lift in Definition 4.2. The first two of them are trivial: moving the factor 1 + u clearly does not change the product, and because u ∈ I_{k-1}, we still have the equality with the factors g_{k-1} and h_{k-1} modulo I_{k-1}, respectively. By the remark after Definition 4.2, the third condition already follows, but it is also easy to check now: let a, b ∈ R such that a g_k + b h_k ≡ 1 (mod I_k). It follows by Equation (4.11) that h ĥ_k(1 + u) ≡ h_k(1 − u)(1 + u) ≡ h_k (mod I_k), since u^2 ∈ I_k, and hence a g_k + b · h ĥ_k(1 + u) ≡ 1 (mod I_k). This proves Claim 4.12.
Define h̄_k = ĥ_k(1 + u). By (4.11), we have
(4.13) h h̄_k = h ĥ_k(1 + u) ≡ h_k(1 − u)(1 + u) ≡ h_k (mod I_k),
since u^2 ∈ I_k. By Claim 4.12, g_k, h h̄_k are a lift of g_{k-1}, h_{k-1} w.r.t. f. Hence, by condition (i),
(4.14) f ≡ g_k h h̄_k (mod I_k).
It follows from (4.14) that gh ≡ g_k h̄_k h (mod I_k). Now we want to cancel h in the last equation and conclude that
(4.15) g ≡ g_k h̄_k (mod I_k).
This we can do because h is monic in x and does not contain a factor y, i.e., h ∉ I_0. Moreover, since u ∈ I_{k-1}, we have h̄_k ≡ ĥ_k ≡ h̄_{k-1} (mod I_{k-1}). Hence, together with (4.13), we conclude that g_k, h̄_k are a lift of g_{k-1}, h̄_{k-1} w.r.t. g.
For the x-degree of h̄_k, consider the equation h_k ≡ h h̄_k (mod I_k). Since h is monic in x, the highest x-degree term in the product h · (h̄_k mod I_k) survives the modulo operation. Therefore, deg_x(h̄_k mod I_k) ≤ deg_x(h_k).

Factor reconstruction for ABP.
We show how to obtain the absolute factor g of f from the lifted factor. This is called the jump step in Sudan's lecture notes (Sudan 1998). The difference from the earlier presentations is that our lifted factor might not be monic.
Let f = gh, where f has total degree d, factor g is irreducible and coprime to h, and f, g, h are monic in x. In the previous section, we started with a factorization f ≡ g_0 h_0 (mod I_0), where g_0 is irreducible and coprime to h_0. Moreover, g ≡ g_0 h̄_0 (mod I_0), for some h̄_0 such that h_0 ≡ h h̄_0 (mod I_0).
Then, we apply Hensel lifting, say t times. We will see below that it suffices for our purpose to have 2^t ≥ 2d^2 + 1.
Hence, we define t = ⌈log(2d^2 + 1)⌉. By Lemma 4.7, we get a factorization f ≡ g_t h_t (mod I_t) such that
(4.16) g ≡ g_t h̄_t (mod I_t),
for some h̄_t such that h_t ≡ h h̄_t (mod I_t). Equation (4.16) gives us a relation between the known g_t and the unknown g, via the unknown h̄_t. We set up a linear system of equations to find a polynomial g̃ ∈ K[x, y] with the same x-degree as g, such that
(4.17) g̃ ≡ g_t h̃ (mod I_t),
for some polynomial h̃. We give some more details on the linear system next.
Details for setting up the linear system. Equation (4.17) can be used to set up a homogeneous system of linear equations. For the degree bounds of the polynomials, let d_x = deg_x(g) and d_y = deg_y(g). Let D_x = deg_x(g_t mod I_t) and D_y = 2^t − 1. Let
g̃ = r_{d_x,0} x^{d_x} + Σ_{i<d_x, j≤d_y} r_{i,j} x^i y^j and h̃ = Σ_{i≤D_x, j≤D_y} s_{i,j} x^i y^j.
Since we are working with g_t mod I_t, we do not consider the terms with powers y^k, where y^k ∈ I_t. Similarly, recall from Lemma 4.7 that deg_x(h̄_t mod I_t) ≤ deg_x(h_t) = D_x. Therefore, we set up coefficients for h̃ only up to x-degree D_x.
Since we have an ABP that computes g_t, there are ABPs for computing the coefficients c_{i,j} of g_t by Lemma 3.1. The coefficients r_{i,j}, s_{i,j} of g̃ and h̃ we treat as unknowns. Equation (4.17) now becomes
(4.18) r_{d_x,0} x^{d_x} + Σ_{i<d_x, j≤d_y} r_{i,j} x^i y^j ≡ (Σ_{i≤D_x, j≤D_y} c_{i,j} x^i y^j)(Σ_{i≤D_x, j≤D_y} s_{i,j} x^i y^j) (mod I_t).
Now we equate the coefficients of the monomials x^k y^l on both sides in (4.18), for all k, l that occur in (4.18) such that l ≤ D_y. By restricting the exponent of y to D_y, the (mod I_t)-operation is already implemented. The equations we get are now absolute equations, without modulo operations.
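For a concrete feel, here is a toy instance of the whole reconstruction, not from the paper: f = x^2 − 1 − y is irreducible over Q, its lifted factor modulo y^4 is g_t = x − sqrt(1 + y), and plain Gaussian elimination over Q stands in for the ABP-based solver of Lemma 3.11. All names are ours.

```python
from fractions import Fraction

F = Fraction
PREC = 4                                  # working modulo I_t = <y^4>, i.e. t = 2
s = [F(1), F(1, 2), F(-1, 8), F(1, 16)]   # sqrt(1+y) mod y^4

# Toy instance: f = x^2 - 1 - y, lifted factor g_t = x - s(y) (known mod y^4).
# Degree bounds: d_x = 2, d_y = 1, D_x = 1, D_y = 3.
r_idx = [(2, 0)] + [(i, j) for i in range(2) for j in range(2)]   # coeffs of gtilde
h_idx = [(i, j) for i in range(2) for j in range(PREC)]           # coeffs of htilde
nvars = len(r_idx) + len(h_idx)

# Homogeneous system: coefficient of x^k y^l in gtilde - g_t * htilde must vanish.
rows = {(k, l): [F(0)] * nvars for k in range(3) for l in range(PREC)}
for col, (i, j) in enumerate(r_idx):
    rows[(i, j)][col] += 1                # gtilde contributes r_{i,j}
for col0, (i, j) in enumerate(h_idx):
    col = len(r_idx) + col0
    rows[(i + 1, j)][col] -= 1            # -x * s_{i,j} x^i y^j
    for l, c in enumerate(s):
        if j + l < PREC:
            rows[(i, j + l)][col] += c    # +s(y) * s_{i,j} x^i y^j
M = [rows[key] for key in sorted(rows)]

# Gaussian elimination over Q; then read one nullspace vector off the RREF.
pivots, rank = {}, 0
for c in range(nvars):
    p = next((i for i in range(rank, len(M)) if M[i][c] != 0), None)
    if p is None:
        continue
    M[rank], M[p] = M[p], M[rank]
    M[rank] = [x / M[rank][c] for x in M[rank]]
    for i in range(len(M)):
        if i != rank and M[i][c] != 0:
            M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[rank])]
    pivots[c] = rank
    rank += 1
free = next(c for c in range(nvars) if c not in pivots)
v = [F(0)] * nvars
v[free] = F(1)
for c, rw in pivots.items():
    v[c] = -M[rw][free]

# Normalize by the leading coefficient r_{d_x,0} (as in Lemma 4.19) to recover g.
gtilde = {mono: v[k] for k, mono in enumerate(r_idx) if v[k] != 0}
g = {mono: c / gtilde[(2, 0)] for mono, c in gtilde.items()}
```

The nullspace here is one-dimensional, and after dividing by the leading coefficient the sketch recovers exactly g = x^2 − 1 − y.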
We get a homogeneous system of (D_x + D_x + 1)(D_y + 1) many equations in 1 + d_x(d_y + 1) + (D_x + 1)(D_y + 1) many unknowns r_{i,j} and s_{i,j}. This system can be expressed in the form Mv = 0, for a matrix M and unknown vector v. By Lemma 3.11, an ABP can efficiently compute a nonzero solution vector v of polynomials from F[z]. Note that by (4.16), a non-trivial solution is guaranteed to exist.
Obtaining g from g̃. Recall that g is monic, whereas we put leading coefficient r_{d_x,0} ∈ F[z] at x^{d_x} in g̃. The reason to do so is that we want the linear system to be homogeneous, which would not be the case if we fixed the coefficient to be 1. Hence, our solution g̃ might not be monic.
The following lemma shows that when we divide g̃ by its leading coefficient r_{d_x,0}, we get precisely g.
Lemma 4.19. Let g̃ be a solution of (4.18) with leading x-coefficient r_{d_x,0} ∈ F[z], for t = ⌈log(2d^2 + 1)⌉. Then, g̃ = r_{d_x,0} · g.
Proof. Consider the resultant r(y) = Res_x(g, g̃). We show that r(y) = 0. Then, it follows from Lemma 2.2 that g and g̃ share a common factor with positive x-degree. Since g is irreducible, it must be a divisor of g̃. As g is monic, we have g̃ = r_{d_x,0} g, as claimed.
As g and g̃ have the same x-degree, r_{d_x,0} is a polynomial in F[z].
To argue that r(y) = 0, recall from Lemma 2.2 that the resultant can be written as r(y) = ug + v g̃, for some polynomials u and v. Since deg(g), deg(g̃) ≤ d, we have deg(r) ≤ 2d^2. By (4.16) and (4.17), we have
r(y) = ug + v g̃ ≡ g_t (u h̄_t + v h̃) (mod I_t).
Consider g_t and w = u h̄_t + v h̃ as polynomials in y with coefficients in x. Let
g_t ≡ Σ_{i=0}^{D_y} c_i(x) y^i (mod I_t),
where c_i ∈ K[x], for i = 0, 1, . . . , D_y and D_y = 2^t − 1. By the properties of Hensel lifting, we have g_t ≡ g_0 (mod I_0), and therefore c_0(x) = g_0(x). Recall that g_0 is non-constant, deg(g_0) ≥ 1. Now consider w. Let j ≥ 0 be the least power of y that appears in w and let its coefficient be w_j(x). Suppose for the sake of contradiction that j < 2^t. Then, the least power of y in g_t w is also j, and its coefficient is g_0(x) w_j(x), which is a nonzero polynomial in x.
The monomials present in g_0(x) w_j(x) y^j cannot be canceled by other monomials in g_t w, because those have larger y-degree. It follows that g_t w mod I_t is not free of x. On the other hand, r(y) ≡ g_t w (mod I_t), and r(y) ∈ K[y] is a polynomial in which the variable x does not occur. This is a contradiction.
We conclude that j ≥ 2^t, which means that w ≡ 0 (mod I_t). Hence, we get r(y) ≡ 0 (mod I_t). Recall that deg_y(r) ≤ 2d^2 and 2^t > 2d^2. Hence, the (mod I_t)-operation has no effect here, and we can conclude that indeed r(y) = 0.
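The resultant used in this proof can be computed concretely as the determinant of the Sylvester matrix. The following sketch (our names, coefficients over Q) illustrates the property from Lemma 2.2 that Res_x vanishes exactly when the two polynomials share a factor:

```python
from fractions import Fraction

def sylvester(p, q):
    """Sylvester matrix of p, q (coefficient lists, lowest degree first) w.r.t. x."""
    m, n = len(p) - 1, len(q) - 1          # degrees
    size = m + n
    rows = []
    for i in range(n):                      # n shifted copies of p
        row = [Fraction(0)] * size
        for k, c in enumerate(p):
            row[i + (m - k)] = Fraction(c)  # highest coefficient first
        rows.append(row)
    for i in range(m):                      # m shifted copies of q
        row = [Fraction(0)] * size
        for k, c in enumerate(q):
            row[i + (n - k)] = Fraction(c)
        rows.append(row)
    return rows

def det(mat):
    # determinant by fraction-free-enough Gaussian elimination with sign tracking
    mat = [row[:] for row in mat]
    n, sign, d = len(mat), 1, Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if mat[i][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            mat[c], mat[p] = mat[p], mat[c]
            sign = -sign
        d *= mat[c][c]
        for i in range(c + 1, n):
            fac = mat[i][c] / mat[c][c]
            mat[i] = [a - fac * b for a, b in zip(mat[i], mat[c])]
    return sign * d

def resultant(p, q):
    return det(sylvester(p, q))

# x^2 - 3x + 2 = (x-1)(x-2) and x - 1 share the factor x - 1: resultant 0.
shared = resultant([2, -3, 1], [-1, 1])
# x^2 - 3x + 2 and x - 3 are coprime: nonzero resultant.
coprime = resultant([2, -3, 1], [-3, 1])
```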
The final division to obtain an ABP for g can be accomplished by adapting Strassen's division elimination to ABPs. The size increase is polynomial in the ABP-size of g̃.
Remark 4.20. The ABPs for g̃ and g can also be algorithmically constructed. One point to notice here is that when we set up the linear system above, we used the degrees d_x = deg_x(g) and d_y = deg_y(g) of g, which we actually do not have in hand at that point. However, one can simply try all possible values d_x, d_y ≤ d and discard wrong candidates later by a factor test.
To summarize the proof of Theorem 4.1: the preprocessing transformations yield a polynomial f of degree d_f ≤ 2d_p and size_ABP(f) = poly(s). Then, we do t = ⌈log(2d_f^2 + 1)⌉ iterations of Hensel lifting. The initial polynomials f_0, g_0, h_0 have ABP-size bounded by 2d_f. Hence, the polynomials after the last iteration have ABP-size bounded by 2^t poly(s) = poly(s, d_p) = poly(s).
From the lifted factor, we construct the actual factor of f . This step involves solving a linear system. We argued that the resulting polynomial g has ABP-size poly(s).
Finally, we reverse the transformations from the beginning and get a factor of p that has an ABP of size poly(s). This finishes the proof of Theorem 4.1. In the next section, we moreover reduce the factorization problem to PIT for ABPs, similar to what Kopparty, Saraf & Shpilka (2015) showed for arithmetic circuits.

Construction algorithm and reduction to PIT
Theorem 5.1. Given a polynomial p(z) computed by an ABP of size s, there is a randomized poly(s)-time algorithm to compute all its irreducible factors represented as ABPs. Moreover, the factorization problem reduces in poly(s)-time to PIT for ABPs.
We already mentioned for many steps that they are constructive. Here, we summarize the construction and fill the gaps.
Step 1: Transformation to monic. We modify the order of the steps as described in the proof of Theorem 4.1 and start with the transformation that makes the input polynomial p(z) monic in a new variable x, as described in Section 3.4. For ease of notation, we still call the polynomial p; it is now p(x, z), however.
Step 2: If p = q^e. Next, we consider the case of Section 3.2, i.e., when p = q^e, for some polynomial q and an integer e ≥ 2. Note that to prove Theorem 4.1, we could simply assume that q is irreducible and that we know e. Now, we also have to find out algorithmically whether this is the case. We try all possible values of e that are divisors of d, starting from the maximum possible value d in decreasing order. When we reach e = 1, we proceed to step 3.
For each e, we compute the polynomial q = p^(1/e) as described in Section 3.2. Then we have to check whether q is indeed a factor of p.
Recall that p is monic in x. Hence, a factor must be monic, too. So if q is not monic, we proceed to the next e. Otherwise, we test whether q is a divisor of p. This we can do by using the classical Euclidean univariate long-division algorithm; here, we need that p and q are monic polynomials in x. By Lemma 3.1, we can get the ABPs that compute the coefficients of the polynomials p and q w.r.t. x. Then, we can compute ABPs for the quotient and the remainder in randomized polynomial time using univariate long division. To test whether the remainder is zero, we need a PIT for ABPs.
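A minimal sketch of this divisibility test (here with rational coefficients for concreteness; in the algorithm, the coefficients are themselves computed by ABPs, and the final zero test of the remainder is a PIT). The function names are ours:

```python
from fractions import Fraction

def poly_divmod(p, q):
    """Long division of coefficient lists (lowest degree first); q must be monic."""
    assert q and q[-1] == 1, "divisor must be monic"
    rem = [Fraction(c) for c in p]
    quot = [Fraction(0)] * max(len(p) - len(q) + 1, 0)
    while len(rem) >= len(q):
        c = rem[-1]
        k = len(rem) - len(q)
        quot[k] = c
        for i, b in enumerate(q):
            rem[i + k] -= c * b
        while rem and rem[-1] == 0:
            rem.pop()
    return quot, rem

def divides(q, p):
    """Does the monic polynomial q divide p?  (Divisor iff remainder is zero.)"""
    _, rem = poly_divmod(p, q)
    return rem == []

# p = (x - 1)(x + 2) = x^2 + x - 2
p = [-2, 1, 1]
```

For example, `divides([-1, 1], p)` (dividing by x − 1) succeeds, while `divides([3, 1], p)` (dividing by x + 3) leaves a nonzero remainder.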
At this point, we have verified that p = q^e. Still, the polynomial q might not be irreducible. To check this, we try to factorize q. That is, we restart the algorithm on q. Since e is maximal such that p = q^e, the polynomial q itself cannot be of the form q = h^e', for some e' ≥ 2. Hence, for factoring q, we can directly go to step 3.
Step 3: Reducing the multiplicity of a factor. Now p is either an irreducible polynomial or a reducible polynomial of the form p = q_1^{e_1} · · · q_m^{e_m}, for monic square-free polynomials q_1, . . . , q_m, where 1 ≤ e_1 ≤ e_2 ≤ · · · ≤ e_m ≤ d, for some m ≥ 2.
If p is irreducible or all the irreducible factors have multiplicity one, then we proceed to step 4. Otherwise, to reduce the multiplicity, we take derivatives as described in Section 3.3. However, we do not know the exponents e_1, e_2, . . . , e_m. Therefore, we take derivatives of p of order e, for all 1 ≤ e ≤ d − 2, and factorize the corresponding derivatives of p. Note that for e = e_i − 1, the e-th derivative contains the square-free factor q_i. Recall that the derivative also contains other factors that are not factors of p; see equation (3.6) on page 16. However, we can always check factors via the division subroutine mentioned in step 2.
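The effect of taking derivatives of order e = e_i − 1 can be checked on a toy polynomial (ours, not from the paper): for p = (x − 1)^3 (x − 2), the factor x − 1 has multiplicity e_1 = 3, and the 2nd derivative contains it square-free, along with an extra factor that does not divide p.

```python
from fractions import Fraction

def deriv(p):
    # derivative of a coefficient list (lowest degree first)
    return [Fraction(i) * c for i, c in enumerate(p)][1:]

def eval_at(p, a):
    return sum(c * a**i for i, c in enumerate(p))

# p = (x - 1)^3 (x - 2) = x^4 - 5x^3 + 9x^2 - 7x + 2: factor x - 1 has e_1 = 3.
p = [2, -7, 9, -5, 1]
# For e = e_1 - 1 = 2, the 2nd derivative still vanishes at the root 1,
# i.e., it contains x - 1, but now only with multiplicity one.
p2 = deriv(deriv(p))    # 12x^2 - 30x + 18 = 6(2x - 3)(x - 1)
```

Note that 2x − 3 is one of the "other factors" mentioned above: it divides the derivative but not p itself, which is why the candidates must still pass the division check.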
Step 4: Preprocessing for Hensel lifting. We shift the variables as explained in Section 3.5 to fulfill the assumptions needed for Hensel lifting.
Step 5: Reduction to bivariate. We introduce a new variable y as explained in Section 3.6 and consider the resulting polynomial f(x, y) as bivariate with coefficients in F(z). For the current e from step 3, polynomial f contains all irreducible factors of p that have multiplicity e + 1.
Step 6: Hensel lifting. The univariate polynomial f(x, 0) can be represented densely. Note that its coefficients are over F. We can use any known univariate factorization algorithm, for example the famous LLL algorithm over the rationals. We try all the irreducible factors of multiplicity one of f(x, 0) as starting point g_0, as described in Section 4.2. Note that there are at most d irreducible factors of f(x, 0); hence, we can try all of them. We use the extended Euclidean algorithm to compute the polynomials a_0 and b_0 such that a_0 g_0 + b_0 h_0 = 1.
Then, we apply Hensel lifting and linear system solving as described in Sections 4.2 and 4.3. This yields a factor, say f_1, of f.
Step 7: Factor test. We reverse the preprocessing steps for f_1 to get a candidate factor, say q, of the original polynomial p. For some choices made in previous steps, q is some other polynomial and not a factor of p. But this we can check by using the division algorithm. If q has the same degree as the original polynomial p, we conclude that p is irreducible.
Summary. The algorithm described above will efficiently compute all the irreducible factors of the original polynomial p. The