IBP reduction coefficients made simple

We present an efficient method to shorten the analytic integration-by-parts (IBP) reduction coefficients of multi-loop Feynman integrals. For our approach, we develop an improved version of Leinartas' multivariate partial fraction algorithm, and provide a modern implementation based on the computer algebra system Singular. Furthermore, we observe that for an integral basis with uniform transcendental (UT) weights, the denominators of the IBP reduction coefficients with respect to the UT basis are either symbol letters or polynomials purely in the spacetime dimension $D$. With a UT basis, the partial fraction algorithm is more efficient, both in runtime and in the achieved size reduction. We show that in complicated examples admitting a UT basis, the size of the IBP reduction coefficients can be reduced by a factor as large as $\sim 100$. We observe that our algorithm also works well in settings without a UT basis.


Introduction
With the end of Large Hadron Collider (LHC) Run II and the upgrade to the HL-LHC [1,2], there is a pressing demand for high-precision physics computations. The computation of integration-by-parts (IBP) identities [3,4], which can be used to reduce a large number of Feynman integrals to a small set of master integrals, is a critical and often bottleneck step in the evaluation of multi-loop scattering amplitudes in precision physics.
Besides the development of computational techniques for IBP reduction, there is another problem that has received less attention in the literature. Frequently, after an analytic IBP reduction of complicated multi-loop Feynman integrals, we obtain reduction coefficients of enormous size, as rational functions of the spacetime parameter $D$ and the kinematic variables. Such huge coefficients are difficult to store, to transfer, and to use in analytic scattering amplitude computations, and are also very cumbersome for numerical evaluation. Thus, an important question arises: how do we simplify the analytic IBP reduction coefficients in practice?
One natural idea to make analytic IBP reduction coefficients shorter is to choose a "good" master integral basis. Early attempts tested different integral orderings in the Laporta algorithm, in order to obtain shorter reduction coefficients. However, it is difficult to dramatically shorten IBP reduction coefficients by simply changing the integral ordering. Recently, new methods were presented [44,45] to find a good master integral basis such that the dimensional parameter $D$ factorizes out in the final IBP reduction coefficients, which makes the reduction much easier. In ref. [33], a master integral basis with uniform transcendental (UT) weights [46,47] was suggested to shorten the size of IBP reduction coefficients.
In this paper, we propose a powerful method to reduce the byte size of analytic IBP reduction coefficients, based on our modern version of Leinartas' multivariate partial fraction algorithm [48,49]. Leinartas' algorithm has been used for solving the basis transformation matrix in Meyer's UT determination algorithm [50], and for the reconstruction and simplification of the planar two-loop five-parton pentagon function coefficients [51]. We develop an improved version of Leinartas' algorithm and implement it in a library for the open source computer algebra system Singular [52].
From the examples we have tested, this method can rewrite a huge rational function in IBP reduction coefficients as a much shorter sum of simpler rational functions.
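As a toy illustration of this effect (our own sketch in Python/sympy, not the paper's Singular implementation), partial fractioning rewrites a single rational function as a sum of terms with much simpler denominators; the function `f` below is an invented univariate example.

```python
import sympy as sp

x = sp.symbols('x')

# Invented univariate toy example: one rational function with a
# factored denominator, rewritten as a sum of simpler terms.
f = (x**3 + 2*x**2 - 7) / ((x - 1)**2 * (x + 2))
decomposed = sp.apart(f, x)

# Recombining the simple terms reproduces the original function.
assert sp.simplify(sp.together(decomposed) - f) == 0
```

The multivariate case treated in this paper requires Gröbner basis techniques instead of the univariate Euclidean algorithm underlying `apart`.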
The improvements to Leinartas' algorithm include an additional decomposition step between the first step (Nullstellensatz decomposition) and second step (algebraic dependence decomposition) of the algorithm which reduces the size of the denominators (and numerators) by doing a (multivariate) division with remainder by the denominator factors. Moreover, in addition to Leinartas' original algorithm, we add a third step in the algorithm, which implements a numerator decomposition as suggested in [50] and uses a syzygy computation to reduce the size of the decomposition expression. In particular in the case of examples arising from IBP reductions, due to the additional decomposition step and by reducing the size of the algebraic relations used, we were able to drastically reduce the runtime of the second step of Leinartas' algorithm, which relies on algebraic relations between the denominator factors. For this we make use of Singular's efficient algorithms for calculating Gröbner bases, syzygy modules and polynomial factorizations. We provide a detailed description of the algorithm in pseudocode.
As an algorithm based on partial fractioning, the size reduction ratio and the running time depend on the degrees of the irreducible denominator factors. We therefore combine our partial fractioning approach with the strategy of choosing a "good" master integral basis. In particular, as mentioned in ref. [33], we suggest that when a UT master integral basis exists for the integral family under consideration, it is advantageous to first reduce the Feynman integrals to the UT basis, and then run our partial fraction algorithm to shorten the size of the IBP coefficients. The reason is that, in the examples we have tested, for Feynman integrals with each $D_i$ defined as the square of a $\mathbb{Z}$-linear combination of loop and external momenta minus a mass term, the IBP reduction coefficients with respect to a UT basis have the following good properties:
• The spacetime dimension parameter $D$ factorizes out in the denominators of the reduction coefficients.
• Apart from the factors purely in $D$, the remaining factors in the denominators of the IBP reduction coefficients are (a subset of) the symbol letters.
Therefore, using a UT basis, we usually get much simpler irreducible factors in the denominators of the IBP reduction coefficients. This property makes the partial fractioning much faster, and the result is usually shorter than for the standard master integral choice. We tested various IBP reduction coefficients, from simple diagrams to complicated frontier diagrams. In some complicated examples, we observe that our partial fractioning algorithm, combined with the UT basis choice, dramatically shortens the coefficient size by a factor as large as 100. In Appendix B, we explicitly list one coefficient before and after the partial fraction decomposition, to give an impression of this dramatic size reduction.
We distribute the Singular code of our partial fraction implementation as an open source Singular library for download: https://github.com/Singular/Singular/tree/spielwiese/Singular/LIB/pfd.lib

This paper is organized as follows: In Section 2 we set up the notation and review the concepts of IBP reduction and master integrals. In Section 3, we present our improved version of Leinartas' algorithm to shorten IBP reduction coefficients. In Section 4, we provide several IBP reduction simplifications, and also emphasize the benefit of using UT bases in case they exist. In Section 5, we summarize our findings and discuss possible directions for future research. In the appendices, we provide a manual describing the use of our Singular library for multivariate partial fractioning, and an explicit example of the coefficient size reduction.

Integration-by-Parts Identities and master integrals
There are many algebraic relations between different Feynman integrals and it is very efficient to use these relations to obtain further Feynman integrals from the ones we already know. A very useful set of relations can be obtained via the integration-by-parts (IBP) identities, which relate different integrals of a given integral family.

Consider a Feynman integral with any number of loops,
$$I[\alpha_1, \dots, \alpha_n] = \int \prod_{i=1}^{L} \frac{\mathrm{d}^D l_i}{i \pi^{D/2}} \; \frac{1}{D_1^{\alpha_1} \cdots D_n^{\alpha_n}},$$
where $L$ is the number of loops, the $\alpha_i$ are integer indices, and the denominators $D_i$ are quadratic or linear functions of the external momenta $p_i$ and the loop momenta $l_i$. The standard IBP relation [3,4] is
$$0 = \int \prod_{i=1}^{L} \frac{\mathrm{d}^D l_i}{i \pi^{D/2}} \; \frac{\partial}{\partial l_m^{\mu}} \left( \frac{q_k^{\mu}}{D_1^{\alpha_1} \cdots D_n^{\alpha_n}} \right),$$
where $m = 1, \dots, L$ and $q_k$ is a linear combination of loop momenta and external momenta.
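To make the mechanism concrete, here is a minimal sketch (our own illustration, not from the paper): for the one-loop massive tadpole with propagator $P = l^2 - m^2$, the integrand identity behind the IBP relation $(D - 2a)\, I(a) - 2 a m^2\, I(a+1) = 0$ can be checked symbolically, e.g. with sympy.

```python
import sympy as sp

# One-loop massive tadpole sketch: P = l^2 - m2 is the propagator,
# a is the propagator power, D the spacetime dimension.
D, a, m2, P = sp.symbols('D a m2 P')

# Integrand-level divergence d/dl_mu (l^mu P^-a)
#   = D*P^-a - 2a*l^2*P^-(a+1),  with l^2 = P + m2.
divergence = D*P**(-a) - 2*a*(P + m2)*P**(-a - 1)

# Rearranged into pure propagator powers, this matches the
# coefficient pattern of the IBP relation (D - 2a) I(a) = 2a m2 I(a+1).
ibp_form = (D - 2*a)*P**(-a) - 2*a*m2*P**(-a - 1)

assert sp.simplify(divergence - ibp_form) == 0
```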
With the IBP identities, we can find a basis of a given integral family, whose elements are called master integrals (MIs). The finiteness of the number of master integrals was proven in ref. [53].

Differential equation and UT basis
Since the master integrals are functions of the scalar products of the external momenta, it is natural to consider their derivatives with respect to these scalar products. Introducing a vector $\vec{I} = (I_1, \dots, I_n)^T$, where the $I_i$ are the master integrals of the corresponding Feynman diagram, we can set up the differential equation
$$\partial_x \vec{I} = A \, \vec{I},$$
where $x$ is a kinematic variable and $A$ is an $n \times n$ matrix. Normally, every element of $A$ is a rational function of the spacetime dimension $D$ and the kinematic variables.
However, Johannes Henn showed that with a new choice of MIs, the differential equations can be simplified such that they can be solved easily order by order [46,47]. With suitable MIs, the differential equation can be written as
$$\mathrm{d}\, \vec{I}' = \epsilon \sum_k A_k \, \mathrm{d}\log(S_k) \; \vec{I}'. \tag{2.7}$$
This is called the canonical form of the differential equations; here we set $D = 4 - 2\epsilon$, each $A_k$ is a constant matrix, and the $S_k$ are functions of the Lorentz invariants, which are called symbol letters. With (2.7), the differential equations can be solved order by order in an $\epsilon$ expansion. The key property of these suitable master integrals can be described with the concept of the degree of transcendentality $T(f)$ of a function: $T(f)$ counts the number of iterated integrations needed to define the function $f$, and we require $T(f_1 f_2) = T(f_1) + T(f_2)$. From this we see
$$T(\mathrm{Li}_k(x)) = k, \quad T(\log x) = 1, \quad T(\zeta_n) = T(\mathrm{Li}_n(1)) = n, \tag{2.10}$$
$$T(\text{algebraic factors}) = 0, \quad T(\zeta_2) = T(\pi^2/6) = 2 \;\Rightarrow\; T(\pi) = 1.$$
If the function additionally satisfies (2.11), then $f$ is called a pure function. With this definition we see that if we multiply a pure function by an algebraic function of $x$, the result still has the same uniform transcendentality but is no longer a pure function, since the derivative also acts on the algebraic prefactor. Because of (2.7) and (2.9), the functions in $I'_k$ are all pure functions; hence $\vec{I}'$ is called a uniform transcendental (UT) basis.
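As a small consistency check of the weight counting (our own example, not from the paper), the derivative of the weight-2 function $\mathrm{Li}_2(z)$ is a weight-1 function times an algebraic prefactor, which can be verified with sympy:

```python
import sympy as sp

z = sp.symbols('z')

# d/dz Li_2(z) = -log(1 - z)/z: differentiation lowers the
# transcendental weight by one, up to the algebraic prefactor 1/z.
deriv = sp.expand_func(sp.diff(sp.polylog(2, z), z))
assert sp.simplify(deriv + sp.log(1 - z)/z) == 0
```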
There are many ways to construct a UT basis. For example, one can construct it via Fuchsia and epsilon, based on Lee's algorithm [54-56]. Meyer proposed the package CANONICA to find a transformation to UT integrals [50]. Moreover, by means of leading singularity analysis and the dlog ansatz, one can also construct a UT basis [57]. A UT basis can also be constructed via a Baikov analysis [58], and systematically via the dlog form in a general representation and intersection theory [59]. Recently, it was shown that the full UT basis can be generated from only one known UT integral [60].

Symbol of a transcendental function
In Section 2.2, we introduced the canonical form of the differential equations. In the case where the symbol letter alphabet can be written in terms of rational functions (in at least one variable), one can write the answer in terms of Goncharov polylogarithms (also called hyperlogarithms or multiple polylogarithms) [47,61]. The Goncharov polylogarithms can be defined iteratively as follows,
$$G(a_1, \dots, a_n; z) = \int_0^z \frac{\mathrm{d}t}{t - a_1} \, G(a_2, \dots, a_n; t), \quad a_i \in \mathbb{C}, \tag{2.14}$$
with
$$G(z) \equiv G(;z) = 1. \tag{2.15}$$
In the special cases where all the $a_i$ are equal, we define, using the obvious vector notation $\vec{a}_n = (a, \dots, a)$, $a \in \mathbb{C}$,
$$G(\vec{0}_n; z) = \frac{1}{n!} \log^n z, \qquad G(\vec{a}_n; z) = \frac{1}{n!} \log^n \left( 1 - \frac{z}{a} \right). \tag{2.16}$$
A Goncharov polylogarithm $T_k$ of transcendentality degree $k$ can be written as a linear combination (with rational coefficients) of $k$-fold iterated integrals of the form [62]
$$T_k = \int_a^b \mathrm{d}\log R_1(t) \circ \cdots \circ \mathrm{d}\log R_k(t),$$
where $a$ and $b$ are rational numbers, the $R_i(t)$ are rational functions with rational coefficients, and the iterated integrals are defined recursively by
$$\int_a^b \mathrm{d}\log R_1 \circ \cdots \circ \mathrm{d}\log R_k = \int_a^b \left( \int_a^t \mathrm{d}\log R_1 \circ \cdots \circ \mathrm{d}\log R_{k-1} \right) \mathrm{d}\log R_k(t). \tag{2.18}$$
In physics, the $\mathrm{d}\log R_k$ are just the ones appearing in eq. (2.7), with $R_k$ equal to $S_k$ in (2.8).
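The recursive definition (2.14) can be checked numerically; the sketch below (our own illustration, assuming mpmath is available) implements the recursion by nested quadrature and verifies the known identity $G(0,1;z) = -\mathrm{Li}_2(z)$. The naive recursion is only valid away from divergent cases such as a trailing index $a_n = 0$.

```python
import mpmath as mp

def G(a, z):
    # Numeric sketch of G(a_1,...,a_n; z) via the recursion
    # G(a_1,...,a_n; z) = int_0^z dt/(t - a_1) G(a_2,...,a_n; t),
    # with G(; z) = 1.  Not valid for a trailing index a_n = 0.
    if not a:
        return mp.mpf(1)
    return mp.quad(lambda t: G(a[1:], t) / (t - a[0]), [0, z])

z = mp.mpf('0.5')
# Known identity: G(0, 1; z) = -Li_2(z)
assert mp.almosteq(G([0, 1], z), -mp.polylog(2, z), rel_eps=1e-8)
```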
There is a useful quantity associated with $T_k$, called the symbol, which is an element of the $k$-fold tensor product of rational functions modulo constants [63], denoted by $\mathcal{S}$. The symbol of the function $T_k$ is
$$\mathcal{S}(T_k) = R_1 \otimes \cdots \otimes R_k,$$
which is why the $S_k$ in (2.8) are called symbol letters. The symbol has many further properties; see refs. [64,65] for a discussion.

Improved Leinartas' Algorithm and Modern Implementation
In this section we describe an algorithm based on the work of E. K. Leinartas [48,49] to reduce the size of rational functions by writing them as a sum of functions with "smaller" numerators and denominators.
The improvements to the algorithm described in the original paper by Leinartas lie mainly in an additional decomposition step described in Algorithm 4, which reduces the size of the numerators and denominators by doing a (multivariate) division with remainder by the denominator factors, as well as the changes discussed in Remark 3.13, which aim at improving the performance of the second decomposition step (Algorithm 2) by reducing the size of the required annihilating polynomials. In addition to Leinartas' original algorithm we also add, as suggested in [50], a numerator decomposition as the final step of the algorithm and use a syzygy module computation to reduce the size of the decomposition (see Algorithm 3 and Remark 3.11). Thus, while Leinartas' original algorithm calculates a decomposition satisfying only the first two conditions in Theorem 3.5, we add an additional condition. In our implementation, we make use of the computer algebra system Singular, which provides efficient algorithms for the calculation of Gröbner bases and syzygy modules as well as polynomial factorization.
To state more precisely what we mean by "smaller" numerators and denominators, we first need the following definitions; the goal is then an algorithmic proof of Theorem 3.5. In the following, let $K[x_1, \dots, x_d]$, or short $K[x]$, be the polynomial ring over some field $K$ in $d$ variables $x = (x_1, \dots, x_d)$, and let $\overline{K}$ denote the algebraic closure of $K$.

Definition 3.2 (monomial ordering). A monomial ordering on $K[x]$ is a total ordering ">" on the set $\{x^\alpha \mid \alpha \in \mathbb{N}^d\}$ of monomials (writing $x^\alpha$ for $x_1^{\alpha_1} \cdots x_d^{\alpha_d}$), such that > is compatible with multiplication, i.e. for all $\alpha, \beta, \gamma \in \mathbb{N}^d$, $x^\alpha > x^\beta$ implies $x^{\alpha + \gamma} > x^{\beta + \gamma}$. The ordering > is called global if it is a well-ordering, or equivalently if $1 < x_i$ for all $i = 1, \dots, d$. For $f \in K[x] \setminus \{0\}$, write $L(f)$ for its lead monomial, that is, the largest monomial of $f$ with respect to >.

Definition 3.3 (Gröbner basis). A Gröbner basis of an ideal $I \subseteq K[x]$ with respect to a given global monomial ordering is a finite subset $G \subseteq I$ such that the ideals generated by all lead monomials of $G$ and of $I$ coincide: $\langle L(g) \mid g \in G \rangle = \langle L(f) \mid f \in I \rangle$.

Definition 3.4 (division with remainder). After the choice of a (global) monomial ordering, there exists an algorithm (multivariate reduced division with remainder, see [66, §3 Theorem 3]) to determine for any polynomials $f, g_1, \dots, g_r \in K[x]$ a division expression
$$f = a_1 g_1 + \cdots + a_r g_r + r,$$
such that none of the lead monomials $L(a_i g_i)$ is bigger than $L(f)$ and no term of $r$ is divisible by any lead monomial $L(g_i)$. Call a polynomial $r$ with this property reduced with respect to $g_1, \dots, g_r$. In case $G = (g_1, \dots, g_r)$ is a Gröbner basis of an ideal $I \subseteq K[x]$, it can be shown [66, §6 Proposition 1] that the remainder $r$ only depends on the monomial ordering and $I$. In this case, call $r$ reduced with respect to $I$.
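The effect of the ordering choice can be illustrated with sympy's `LT` (our own toy polynomial): under a degree ordering such as grevlex, the lead monomial is one of maximal total degree, which is what keeps remainders in the division low-degree.

```python
import sympy as sp
from sympy import LT

x, y, z = sp.symbols('x y z')
f = x**2 + x*y**2 + z

# Under lex (x > y > z), x**2 leads; under grevlex the term of
# higher total degree, x*y**2, leads instead.
assert LT(f, x, y, z, order='lex') == x**2
assert LT(f, x, y, z, order='grevlex') == x*y**2
```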
Call $G$ a reduced Gröbner basis if every $g \in G$ is reduced with respect to $G \setminus \{g\}$.

Theorem 3.5. Let $f/g$ be a rational function with $f, g \in K[x]$ and denominator factorization $g = \prod_{i=1}^m q_i^{e_i}$ into irreducible factors $q_i$. Then $f/g$ can be written as a sum of rational functions
$$\frac{f}{g} = \sum_{S \subseteq \{1, \dots, m\}} \frac{f_S}{\prod_{i \in S} q_i^{b_{i,S}}},$$
where all nonzero summands satisfy the following conditions:
(1) the polynomials $\{q_i \mid i \in S\}$ have a common zero in $\overline{K}^d$;
(2) the polynomials $\{q_i \mid i \in S\}$ are algebraically independent;
(3) $f_S$ is reduced with respect to the ideal $\langle q_i \mid i \in S \rangle \subseteq K[x]$.

Note that (3) depends on the monomial ordering. In order to get numerator polynomials of low degree, a degree ordering (i.e. one comparing total degrees first) should be chosen. In our Singular implementation we use the graded reverse lexicographic ordering, defined by
$$x^\alpha >_{\text{grevlex}} x^\beta \;:\Leftrightarrow\; \deg x^\alpha > \deg x^\beta, \;\text{or}\; \deg x^\alpha = \deg x^\beta \;\text{and the last nonzero entry of } \alpha - \beta \text{ is negative.} \tag{3.5}$$
Furthermore, condition (2) ensures that at most $d$ different irreducible factors occur in each denominator of the decomposition, since it can be shown that any set of at least $d+1$ polynomials (in $d$ variables) is algebraically dependent. (This follows directly from the Jacobian criterion 3.7.)

In view of condition (1) in Theorem 3.5, the following corollary to Hilbert's weak Nullstellensatz can be used to eliminate factors from the denominators if the $q_i$ have no common zero.

Lemma 3.6. Let $f_1, \dots, f_m \in K[x]$ have no common zero in $\overline{K}^d$. Then there exist polynomials $h_1, \dots, h_m \in K[x]$ such that
$$1 = \sum_{i=1}^m h_i f_i.$$
Proof. This is exactly the weak Nullstellensatz [66, §4.1 Theorem 1], with the exception that we require $h_i \in K[x]$ rather than $\overline{K}[x]$. However, the equation $1 = \sum_{i=1}^m h_i f_i$ can be seen as a set of linear equations (with coefficients in $\overline{K}$) in the coefficients of the polynomials $h_i$, and by the weak Nullstellensatz we know that it is solvable over $\overline{K}$. But then it is solvable over $K$ as well, since all the coefficients in these linear equations lie in $K$. Hence we may assume $h_i \in K[x]$.

Given a rational function $f/g$ as in Theorem 3.5 for which the irreducible factors $q_i$ of $g$ have no common zero in $\overline{K}^d$, we know that $q_1^{e_1}, \dots, q_m^{e_m}$ have no common zero either, and if $(h_1, \dots, h_m)$ is a Nullstellensatz certificate, we can simply multiply $f$ by $1 = \sum_{k=1}^m h_k q_k^{e_k}$ to get a decomposition
$$\frac{f}{g} = \sum_{k=1}^m \frac{h_k f}{\prod_{i \neq k} q_i^{e_i}},$$
where each denominator contains only $m-1$ different irreducible factors.
To calculate this decomposition (Algorithm 1), we compute a reduced Gröbner basis $G$ of $\langle q_1^{e_1}, \dots, q_m^{e_m} \rangle$ as well as the transformation matrix $T$ from the original ideal generators $q_1^{e_1}, \dots, q_m^{e_m}$ to $G$. This can be done with Buchberger's algorithm for the computation of Gröbner bases, as implemented in the Singular function liftstd.
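A minimal univariate analogue of this step (our own sketch; the multivariate certificate in the paper comes from Singular's liftstd, while here the extended Euclidean algorithm via sympy's `gcdex` suffices):

```python
import sympy as sp

x = sp.symbols('x')
q1, q2 = x, x - 1          # irreducible factors with no common zero

# Extended Euclid yields a Nullstellensatz certificate h1*q1 + h2*q2 = 1.
h1, h2, one = sp.gcdex(q1, q2, x)
assert one == 1

# Multiplying f/(q1*q2) by 1 = h1*q1 + h2*q2 removes one factor
# from the denominator of each resulting term.
f = x**3 + 5
lhs = f / (q1 * q2)
rhs = f*h1/q2 + f*h2/q1
assert sp.simplify(lhs - rhs) == 0
```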

Algorithm 1 NSSdecompStep (Nullstellensatz decomposition step)
Input: rational function $f/g$ with denominator factorization $g = \prod_{i=1}^m q_i^{e_i}$
Output: set of rational functions with sum $f/g$
1: calculate the reduced Gröbner basis $G$ of $\langle q_1^{e_1}, \dots, q_m^{e_m} \rangle$ as well as the transformation matrix $T$ from the generators $q_i^{e_i}$ to $G$
2: if $G = \{c\}$ for some $c \in K$ (so $\langle q_1^{e_1}, \dots, q_m^{e_m} \rangle = \langle 1 \rangle$) then
3: read off a Nullstellensatz certificate $(h_1, \dots, h_m)$ from $T$ and return the decomposition $\{ h_k f / \prod_{i \neq k} q_i^{e_i} \mid k = 1, \dots, m \}$

Repeated application of Algorithm 1 yields a decomposition satisfying condition (1) in Theorem 3.5. For condition (2), assume $f/g$ is a rational function with $g = \prod_{i=1}^m q_i^{e_i}$ as in Theorem 3.5. There is a simple criterion to test for algebraic dependence: a set $\{f_1, \dots, f_m\} \subseteq K[x]$ is algebraically independent if and only if the Jacobian matrix $\left( \partial f_i / \partial x_j \right)_{i,j} \in K[x]^{m \times d}$ has rank $m$ over the field $K(x)$ of rational functions. (A proof can be found in [67].) Since the rows of the Jacobian matrix of $\{q_1^{e_1}, \dots, q_m^{e_m}\}$ are those of $\{q_1, \dots, q_m\}$ scaled by the nonzero factors $e_i q_i^{e_i - 1}$, the two Jacobian matrices have the same rank over $K(x)$.
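The Jacobian criterion is easy to try in sympy (our own sketch, using the linearly dependent triple $x_1$, $x_1 - x_5$, $x_5$ that also appears in the worked example later):

```python
import sympy as sp

x1, x5 = sp.symbols('x1 x5')

def jacobian_rank(polys, gens):
    # Rank of the Jacobian matrix over the rational function field K(x).
    return sp.Matrix([[sp.diff(p, g) for g in gens] for p in polys]).rank()

# Three polynomials in two variables are always algebraically dependent:
assert jacobian_rank([x1, x1 - x5, x5], [x1, x5]) == 2   # rank < 3

# Two of them are algebraically independent (full rank 2):
assert jacobian_rank([x1, x5], [x1, x5]) == 2
```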
If now the factors $q_1, \dots, q_m$ of the denominator $g$ are algebraically dependent, then so are $q_1^{e_1}, \dots, q_m^{e_m}$, and if $p \in K[y] = K[y_1, \dots, y_m]$ is an annihilating polynomial, we can write
$$p = c_\alpha y^\alpha + \sum_\beta c_\beta y^\beta \tag{3.9}$$
such that $c_\alpha y^\alpha$ is one of the terms of smallest degree (using multi-indices $\beta \in \mathbb{N}^m$, so $\deg(y^\beta) = |\beta| = \beta_1 + \cdots + \beta_m$). Writing $q$ for the vector $(q_1^{e_1}, \dots, q_m^{e_m})$, it holds that $0 = p(q)$, and hence
$$\frac{f}{g} = -\sum_\beta \frac{c_\beta}{c_\alpha} \, \frac{f \, q^{\beta - \alpha}}{g}. \tag{3.10}$$
Since $y^\alpha$ has minimal degree, for every $\beta$ occurring in Equation (3.10) it holds that $\beta_i \geq \alpha_i + 1$ for at least one index $i$; therefore the factor $q_i$ does not appear in the denominator of the corresponding term. So we obtain a sum of rational functions with at most $m-1$ different irreducible factors in their denominators, and thus, as with Algorithm 1, repeated application of this step leads to a decomposition satisfying condition (2) in Theorem 3.5. But in order to turn this into an algorithm, we need a way of computing annihilating polynomials. They can be obtained by elimination: if $G$ is a Gröbner basis of the ideal $\langle y_1 - f_1, \dots, y_m - f_m \rangle \subseteq K[x, y]$ with respect to an elimination ordering for $x_1, \dots, x_d$, then $G' = G \cap K[y]$ generates the ideal of all annihilating polynomials of $f_1, \dots, f_m$.

Proof. Let $p \in K[y]$ be an annihilating polynomial for $f_1, \dots, f_m$ and write $f = (f_1, \dots, f_m)$.
Define $\tilde p(x, y) = p(y_1 + f_1, \dots, y_m + f_m)$. Then $\tilde p(x, 0) = p(f) = 0$, and thus every term of $\tilde p$ must be divisible by some $y_i$, i.e. $\tilde p \in \langle y_1, \dots, y_m \rangle$. Substituting $y_i \mapsto y_i - f_i$ shows that $p \in \langle y_1 - f_1, \dots, y_m - f_m \rangle$, and by the elimination property of $G$, the polynomial $p$ lies in the ideal generated by $G' = G \cap K[y]$.

Algorithm 2 algDependDecompStep (algebraic dependence decomposition step)
Input: rational function $f/g$ with $g = \prod_{i=1}^m q_i^{e_i}$
Output: set of rational functions with sum $f/g$
1: if the Jacobian matrix of $(q_1, \dots, q_m)$ has rank $m$ over $K(x)$ then return $\{f/g\}$
2: calculate the reduced Gröbner basis $G$ of $\langle y_1 - q_1^{e_1}, \dots, y_m - q_m^{e_m} \rangle \subseteq K[x, y]$ with respect to an elimination ordering for $x_1, \dots, x_d$ (where $y = (y_1, \dots, y_m)$)
3: $G' = G \cap K[y]$
4: $p$ = some element of $G'$ (choose a "simple" one, e.g. one of smallest degree)
5: write $p = c_\alpha y^\alpha + \sum_\beta c_\beta y^\beta$ where $y^\alpha$ has minimal degree
6: return the decomposition given by Equation (3.10)

In order to calculate the rank of the Jacobian matrix in line 1 of Algorithm 2, we can test whether the syzygy module of the $K[x]$-module generated by the rows of the Jacobian matrix (that is, the module of all $K[x]$-linear relations of the rows) is zero (e.g. with the Singular command syz). Instead of calculating the rank of the Jacobian, we could also just check whether $G'$ is empty; however, the derivatives $\partial q_i / \partial x_j$ are in general of much lower degree than the $q_i^{e_i}$, so using the Jacobian criterion is cheaper, especially for small factors $q_i$. Also, if $d < m$, the criterion becomes trivial, since the rank is at most $d$.
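The elimination computation can be reproduced in sympy (our own sketch with the linearly dependent triple $x_1$, $x_1 - x_5$, $x_5$; the paper's implementation uses Singular):

```python
import sympy as sp

x1, x5, y1, y2, y3 = sp.symbols('x1 x5 y1 y2 y3')

# Annihilating polynomial of (q1, q2, q3) = (x1, x1 - x5, x5):
# eliminate the x-variables from <y1 - q1, y2 - q2, y3 - q3>
# using a lex ordering with x1 > x5 > y1 > y2 > y3.
G = sp.groebner([y1 - x1, y2 - (x1 - x5), y3 - x5],
                x1, x5, y1, y2, y3, order='lex')

# Basis elements free of x-variables annihilate (q1, q2, q3).
annihilators = [g for g in G.exprs if not g.free_symbols & {x1, x5}]
assert any(sp.expand(g - (y1 - y2 - y3)) == 0 for g in annihilators)
```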
The previous two strategies to decompose a rational function only decrease the size of the denominators, while leaving the numerator mostly untouched. To simplify the numerators as well, it makes sense to do a (reduced) division with remainder of the numerator $f$ by a Gröbner basis $G$ of the ideal $\langle q_1, \dots, q_m \rangle$ generated by all the irreducible factors in the denominator. This gives a division expression as in Definition 3.4. Rewriting it in terms of the ideal generators $q_1, \dots, q_m$, we get
$$f = \sum_{i=1}^m b_i q_i + r, \qquad \text{so} \qquad \frac{f}{g} = \frac{r}{g} + \sum_{i=1}^m \frac{b_i}{g / q_i}. \tag{3.12}$$
The first term, with numerator $r$, already fulfills condition (3) of Theorem 3.5, and all other terms have one irreducible factor $q_i$ fewer in their denominator. Thus repeated application of Algorithm 3 results in a decomposition satisfying (3).
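The core rewriting step can be sketched in sympy with `reduced` (our own toy example; here the factors $q_1 = x$, $q_2 = y$ already form a Gröbner basis, so no basis computation is needed):

```python
import sympy as sp
from sympy import reduced

x, y = sp.symbols('x y')
q1, q2 = x, y            # denominator factors; already a Groebner basis

f = x**2*y + 3*x + y**2 + 7
(b1, b2), r = reduced(f, [q1, q2], x, y, order='grevlex')

# f = b1*q1 + b2*q2 + r splits f/(q1*q2) into terms with one
# denominator factor fewer, plus a term with the reduced numerator r.
lhs = f / (q1 * q2)
rhs = b1/q2 + b2/q1 + r/(q1*q2)
assert sp.simplify(lhs - rhs) == 0
assert r == 7   # no term of r is divisible by LT(q1) = x or LT(q2) = y
```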
Algorithm 3 numeratorDecompStep (numerator decomposition step)
Input: rational function $f/g$ with $g = \prod_{i=1}^m q_i^{e_i}$
Output: set of rational functions with sum $f/g$
1: calculate the reduced Gröbner basis $G$ of $\langle q_1, \dots, q_m \rangle$ as well as the transformation matrix $T$ from the generators $q_i$ to $G$
2: divide $f$ by $G$ (reduced division with remainder) to get a division expression as in Equation (3.12)

Remark 3.11 (syzygy reduction). Before building the decomposition, the coefficient vector $(b_1, \dots, b_m)$ in Equation (3.12) can be replaced by its remainder after division by a Gröbner basis of the syzygy module of $(q_1, \dots, q_m)$. After doing this, the polynomials $b_k$ still satisfy Equation (3.12), since we just changed $(b_1, \dots, b_m)$ by an element $(s_1, \dots, s_m)$ of the syzygy module, and $\sum_{i=1}^m s_i q_i = 0$. This step is optional, but in practice we found that reducing the coefficients by the syzygy module can dramatically reduce the runtime of Algorithm 5. In Singular we can just apply the procedures syz and std to calculate a Gröbner basis of the syzygy module and reduce the coefficients $b_k$ with reduce.
Using all three decomposition techniques one after the other yields Algorithm 5 which calculates a partial fraction decomposition fulfilling conditions (1), (2) and (3), finally proving Theorem 3.5.
However, in practice the calculation of annihilating polynomials can be quite slow if the degrees of the polynomials $q_i^{e_i}$ get too big. Therefore, it is more efficient to do an additional "short" numerator decomposition before the algebraic dependence decomposition (see Algorithm 5), in order to simplify the denominators. For this we repeatedly apply Algorithm 4, which is identical to Algorithm 3 with the exception that whenever the remainder $r$ is nonzero, we return the input and do not decompose further, since the term corresponding to $r$ would not have a smaller denominator anyway. Note that this is only effective because most of the rational functions we are interested in (i.e. those arising from IBP reductions) have the property that the numerator is already contained in the ideal $\langle q_1, \dots, q_m \rangle$, so that the remainder $r$ becomes 0. Thus, while it is not needed in order to obtain a decomposition fulfilling the conditions of Theorem 3.5, the insertion of a short numerator decomposition (lines 6 to 8 in Algorithm 5) reduces runtimes.

Algorithm 4 shortNumeratorDecompStep (short numerator decomposition step)
Input: rational function $f/g$ with $g = \prod_{i=1}^m q_i^{e_i}$
Output: set of rational functions with sum $f/g$
1: calculate the reduced Gröbner basis $G$ of $\langle q_1, \dots, q_m \rangle$ as well as the transformation matrix $T$ from the generators $q_i$ to $G$
2: divide $f$ by $G$ (reduced division with remainder); if the remainder $r$ is nonzero, return $\{f/g\}$, otherwise decompose as in Algorithm 3

Lemma 3.12. Algorithm 5 terminates for any input $f/g$ and returns a partial fraction decomposition of $f/g$ satisfying all three conditions in Theorem 3.5.
Proof. As shown above, Algorithms 1 to 4 applied to any rational function $f/g$ always return a set of rational functions with sum $f/g$. So in Algorithm 5, at all times the elements of $D$ sum up to the input of the algorithm. It is also easy to see that if Algorithm 5 terminates, the returned decomposition $D$ indeed fulfils conditions (1) to (3): if no term of the decomposition is decomposed further when applying Algorithm 1 (i.e. the first while loop terminates), then by Lemma 3.6 the irreducible factors $q_i$ in each denominator have a common zero. Similarly, if a rational function is not decomposable by Algorithm 2, then by Lemmata 3.9 and 3.10 and Corollary 3.8 the $q_i$ are algebraically independent. And finally, if a rational function $f/g$ is not decomposed further in Algorithm 3, this means that $f = r$ in Equation (3.12), and thus the numerator is reduced with respect to the ideal generated by the factors $q_i$ in the denominator.
Also, Algorithms 2, 3 and 4 only ever change the denominators by removing irreducible factors (and changing their exponents), and thus preserve properties (1) and (2).

Algorithm 5 Partial fraction decomposition
Input: rational function $f/g$ where $f, g \in K[x_1, \dots, x_d]$
Output: partial fraction decomposition as a set of rational functions

It remains to show that all while loops terminate. As argued above, each term in the decomposition returned by Algorithm 1 has fewer different irreducible factors in its denominator than the input; thus, after applying NSSdecompStep to each element of $D$, all terms have at most $m-1$ different irreducible factors $q_i$, and the first loop terminates by induction on $m$. The same argument also works for algDependDecompStep. In numeratorDecompStep, each element of the returned decomposition has one irreducible factor fewer in its denominator than the input, with the exception of the term corresponding to the remainder $r$ in Equation (3.12), which has the same denominator as the input. However, terms of this form are not decomposed further (since in Equation (3.12) $r$ is already reduced with respect to $q_1, \dots, q_m$) and can thus be disregarded in the argument. Now, by induction on $\sum_{i=1}^m e_i$, the fourth loop terminates as well. A very similar argument works for shortNumeratorDecompStep.
Since the calculation of annihilating polynomials can still be quite slow for some of the more complicated rational functions, we make the following modification to the algorithm.

Remark 3.13 (simplified algDependDecompStep). In Algorithm 2 it is also possible to use an annihilating polynomial for $q_1, \dots, q_m$ rather than for $q_1^{e_1}, \dots, q_m^{e_m}$. Instead of Equation (3.10) we then get the decomposition
$$\frac{f}{g} = -\sum_\beta \frac{c_\beta}{c_\alpha} \, \frac{f \prod_i q_i^{\beta_i - \alpha_i}}{g},$$
where $p = c_\alpha y^\alpha + \sum_\beta c_\beta y^\beta$ is the annihilating polynomial and $c_\alpha y^\alpha$ a term of minimal degree as in Equation (3.9). Since the polynomials $q_i$ are of lower degree than $q_i^{e_i}$, this speeds up the calculation of annihilating polynomials, at the cost of needing more steps in the algebraic dependence decomposition in Algorithm 5, since the number of different irreducible denominator factors then does not decrease in every step. (If $\beta_i < \alpha_i + e_i$ for all $i$, it stays the same.) In fact, it is not at all clear that Algorithm 5 terminates with the simplified algDependDecompStep, and indeed this depends on the choice of $\alpha$. However, if $\alpha$ is chosen minimal with respect to the graded reverse lexicographic ordering $>_{\text{grevlex}}$ on $K[y]$ as defined in Equation (3.5), it can be shown that Algorithm 5 still terminates with a correct decomposition.

Proof. All we have to show is that the third while loop in Algorithm 5 terminates; the rest follows as in the proof of Lemma 3.12. For this, take any sequence $f_1/g_1, f_2/g_2, \dots$ of rational functions such that $f_{i+1}/g_{i+1}$ is one of the terms of algDependDecompStep($f_i/g_i$).
It is enough to show that in each such sequence eventually a rational function is reached which has fewer different factors in its denominator or already satisfies (2). Assume this is not the case. Then all the denominators $g_i$ have the same irreducible factors (with different exponents), and thus in each call of the simplified algDependDecompStep the same annihilating polynomial $p$ is chosen (assuming a deterministic implementation of the algorithm). Write
$$p = c_\alpha y^\alpha + \sum_{j=1}^{r+s} c_{\beta^{(j)}} y^{\beta^{(j)}}, \qquad r, s \in \mathbb{N}, \tag{3.14}$$
where $|\beta^{(j)}| = |\alpha|$ for $j \leq r$ and $|\beta^{(j)}| > |\alpha|$ for $j > r$. Since $y^\alpha$ is minimal with respect to $>_{\text{grevlex}}$, for each $j = 1, \dots, r$ there exists an index $k_j \in \{1, \dots, m\}$ such that $\beta^{(j)}_{k_j} < \alpha_{k_j}$ and $\beta^{(j)}_l = \alpha_l$ for all $l > k_j$. Without loss of generality we may assume $k_1 \geq \dots \geq k_r$. Each step of the sequence replaces $g_i$ by the denominator of the term corresponding to some multi-index $\beta^{(j_i)}$, with $j_i$ depending on $i$. Since we assumed that in $g_{i+1}$ no irreducible factor vanishes, the total exponent $\sum_l e^{(i)}_l$ of the denominator can only grow if $j_i > r$; if $j_i \leq r$, it stays the same. Thus the case $j_i > r$ can only occur for finitely many $i$, and after that it always holds that $j_i \leq r$; but then $|\beta^{(j_i)}| = |\alpha|$, and by Equations (3.15) and (3.17) the exponent of $q_{k_{j_i}}$ strictly decreases while the exponents of the $q_l$ with $l > k_{j_i}$ stay the same. Since the exponents $e^{(i)}_l$ have to stay positive and $k_1 \geq \dots \geq k_r$, in the sequence $\beta^{(j_1)}, \beta^{(j_2)}, \dots$ the multi-index $\beta^{(1)}$ can only appear finitely often, after that $\beta^{(2)}$ can only appear finitely often, and so on. But $f_1/g_1, f_2/g_2, \dots$ was an infinite sequence, a contradiction.
Thus in the sequence f 1 /g 1 , f 2 /g 2 , . . . the number of different irreducible factors in the denominators decreases until a rational function is reached, that satisfies condition (2) and the third while loop terminates after finitely many iterations.
Remark 3.14. The multivariate partial fractioning can be combined with the rational reconstruction scheme for commutative algebra developed in [69,70], as long as a consistent factorization pattern can be guaranteed.
Now apply Algorithm 5 using the graded reverse lexicographic ordering with x 1 > · · · > x 5 and employ the modification to algDependDecompStep discussed in Remark 3.13. For simplicity we omit the syzygy reduction step described in Remark 3.11.

Nullstellensatz decomposition
As with most entries of this IBP matrix, the denominator factors already have a common zero, namely the origin, since none of the factors $q_i$ has a constant term. Thus Algorithm 1 (NSSdecompStep) does nothing.

(simplified) algebraic dependence decomposition
For the first two terms the Jacobian matrices have full rank, and therefore Algorithm 2 (algDependDecompStep) does nothing. For the third term the Jacobian matrix has only rank 3, so $q_1, q_3, q_4, q_5$ are algebraically dependent. Indeed, it is obvious that $q_1 = x_1$, $q_4 = x_1 - x_5$ and $q_5 = x_5$ are even linearly dependent, and thus a possible annihilating polynomial for $q_1, q_3, q_4, q_5$ is
$$p = y_1 - y_3 - y_4 \in \mathbb{R}[y_1, y_2, y_3, y_4], \tag{3.25}$$
leading to the relation $q_1 - q_4 - q_5 = 0$. Now $\{q_3, q_4, q_5\}$ as well as $\{q_1, q_4, q_5\}$ are algebraically independent and we are done. Note that the exponent of $q_5$ increased and the total number of irreducible factors in the denominators stayed the same (4), but the number of different irreducible factors decreased from 4 to 3. Overall, all terms in the resulting decomposition now fulfil conditions (1) and (2) of Theorem 3.5.
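The resulting rewriting can be verified directly (a sympy check of the relation $q_1 = q_4 + q_5$ used above):

```python
import sympy as sp

x1, x5 = sp.symbols('x1 x5')
q1, q4, q5 = x1, x1 - x5, x5    # satisfy q1 = q4 + q5

# The linear relation turns 1/(q4*q5) into a sum in which q4 and q5
# no longer appear together in any denominator.
lhs = 1 / (q4 * q5)
rhs = (1/q1) * (1/q4 + 1/q5)
assert sp.simplify(lhs - rhs) == 0
```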

numerator decomposition
The first two numerators (2 and $-1$) are obviously reduced with respect to $\langle q_1, q_3, q_4 \rangle$ and $\langle q_1, q_2, q_4 \rangle$ respectively, so Algorithm 3 (numeratorDecompStep) does nothing for them. For the third and fourth terms, we compute the reduced Gröbner bases of $\langle q_3, q_4, q_5 \rangle$ and of $\langle q_1, q_3, q_5 \rangle$, divide the numerators with remainder, and rewrite the terms accordingly. Merging terms with the same denominator, we obtain the final decomposition, in which all terms satisfy conditions (1) to (3).
Example for syzygy reduction: had we performed the syzygy reduction step for the first step of the short numerator decomposition above, i.e. for the decomposition arising from the division expression (3.32), we would have to calculate a Gröbner basis of the syzygy module of the ideal $\langle q_1, q_2, q_3, q_4, q_5 \rangle$. Using Singular, we get the reduced Gröbner basis with respect to the graded reverse lexicographic ordering with priority to monomials (see [68, Definition 2.3.1]). The original relation (3.32) corresponds to the module element $(0, -2x_1 + 3x_4, 0, 0, x_2 + x_3 - x_4)$, and dividing by the Gröbner basis yields the remainder $(-2x_3, 3x_4 + 2x_5, -x_5, 0, 0)$. So we would use the corresponding relation instead of relation (3.32), leading to a different decomposition step. This simple example may not show why syzygy reduction should be an advantage, since here the decomposition seems to get longer; but, as mentioned above, for more complicated input functions we observe a significant improvement in the performance of the algorithm.

Examples
In the following, we discuss the application of the partial fraction decomposition to IBP reduction coefficients of Feynman integrals, in examples of varying complexity, both with and without the use of a UT basis.

A baby example: Nonplanar two-loop four-point with an external massive leg
In this subsection, we present a "baby" example, the one-mass crossed box, showing how partial fraction decomposition simplifies the IBP reduction coefficients. The physical kinematic conditions are that $p_1$, $p_2$ and $p_3$ are massless, while $p_4^2 = m^2$, $2p_1 \cdot p_2 = s$ and $2p_2 \cdot p_3 = t$. The parameters are thus $\epsilon = (4-D)/2$, $s$, $t$, and $m^2$.
We study the IBP reduction coefficients of integrals in the sector (1, 1, 1, 1, 1, 1, 1, 0, 0) with ISP degrees up to 5. This is a simple example; the IBP reduction can easily be done with LiteRed/FIRE6 [9, 15]. There are 29 master integrals. With LiteRed/FIRE6's master integral choice, the byte size of the IBP reduction coefficients is around 9.5 MB.
We discuss the coefficients in more detail, listing the irreducible denominator factors (poles) below. It is not surprising that there is a pole $-10m^2\epsilon - 6m^2 + 12s\epsilon + 8s$, with dependence on both $\epsilon$ and the kinematic parameters. There is also a nonlinear pole, $m^2 s - m^2 t - s^2 - st$, occurring in the list above.
We then convert the IBP reduction coefficients to a UT basis. It is easy to find the UT basis via leading singularity analysis or Wasser's dlog algorithm [57, 71]. The IBP reduction coefficients of the UT basis clearly have simpler poles. We find that the previously occurring factor $-10m^2\epsilon - 6m^2 + 12s\epsilon + 8s$, with mixed dependence on $\epsilon$ and the kinematic variables, is now absent. Furthermore, all the kinematic-dependent poles are symbol letters, as seen by a comparison with the canonical differential equation. The previously occurring denominator factor $m^2 s - m^2 t - s^2 - st$, which is not a symbol letter, is also absent. Note that in this example the size of the IBP coefficients with respect to the UT basis is around 9.0 MB. By converting to the UT basis, the byte size of the coefficients does not decrease much, but the denominator structure becomes much simpler.
We then apply our implementation of our partial fractioning algorithm to the IBP reduction coefficients, both with respect to the Laporta and the UT basis, to simplify the coefficients.
• After applying the algorithm, the IBP coefficients with respect to the Laporta basis (LiteRed/FIRE6) are shortened from 9.5 MB to 3.0 MB (2.7 MB in indexed form), a simplification by about a factor of 3.4.
• Converting to the UT basis and then applying the algorithm, the resulting coefficients are only 1.9 MB in size (1.5 MB in indexed form), a 6.5-fold size reduction with respect to the original Laporta basis.
This example indicates that our method works for both the Laporta basis and the UT basis, but the size reduction ratio is larger for the UT basis. Since this is a baby example, our method runs fast in both cases.

A cutting-edge example: Nonplanar two-loop five-point
In this section, we present a computationally cutting-edge example, the two-loop five-point nonplanar double pentagon. The diagram is shown in Figure 4.2.

(4.4)
where $p_{i \ldots j} = \sum_{k=i}^{j} p_k$. A UT basis for this diagram and its symbol form were found in refs. [58, 72], and the analytic expressions for the master integrals were obtained in ref. [58].
Relying on the module intersection IBP reduction method and its implementation in the Singular/GPI-Space framework for massively parallel computations, the analytic IBP reduction coefficients were calculated in ref. [33] for the integrals with ISP degree up to 4 in the sector (1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0). The size of the IBP reduction coefficients with respect to a Laporta basis is 2.4 GB (with all parameters analytic).
When reducing target integrals to the Laporta basis, we found some "mixed" denominator factors in the coefficients, which are mixtures of the spacetime parameter $\epsilon$ and kinematic variables; they are listed in the following. As we have observed in ref. [33], if we instead reduce the target integrals to the UT basis, the size of the coefficients is reduced to 712 MB. More importantly, the IBP reduction coefficients with respect to the UT basis have no "mixed" poles, and all kinematic denominators are symbol letters. The irreducible factors in the IBP reduction coefficients with respect to the UT basis are given in the following. We see that, except for the factors depending only on $\epsilon$, all other factors are symbol letters or (even) powers of symbol letters. (The symbol letters of all two-loop five-point massless topologies were obtained in ref. [73].) Note that the last factor above is the Gram determinant $G(1,2,3,4)$. Since $G(1,2,3,4) \propto (\epsilon_5)^2$ and $\epsilon_5$ is a symbol letter, the last factor $G(1,2,3,4)$ is a power of a symbol letter. In addition to the computation in ref. [33], we further checked the IBP reduction of the integrals with ISP degree up to 5 for the same diagram, and this pole structure property still holds. At this point, it is interesting to compare the pole structure in the UT basis with that of the same double pentagon diagram reduced with the basis choice of ref. [45]. In the basis of ref. [45], besides $G(1,2,3,4)$, there are six other nonlinear factors, which are not symbol letters.
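As an aside, the fact that a Gram determinant of four independent momenta is (minus) a perfect square can be checked directly. The sketch below uses arbitrary integer four-vectors (purely illustrative, not the physical momenta of this example) and the mostly-minus metric:

```python
import sympy as sp

# Hypothetical integer four-momenta p1..p4, one per row (illustration only)
P = sp.Matrix([
    [5, 1, 2, 3],
    [4, 2, 0, 1],
    [3, 1, 1, 0],
    [7, 2, 3, 1],
])
eta = sp.diag(1, -1, -1, -1)   # Minkowski metric, mostly-minus signature

# Gram determinant G(1,2,3,4) = det(p_i . p_j) = det(P eta P^T)
G = (P * eta * P.T).det()
assert G == -P.det()**2        # G is minus a perfect square of det(P)
```

Since $\det(P \eta P^T) = \det(P)^2 \det(\eta) = -\det(P)^2$, the Gram determinant is always minus the square of the determinant of the momentum components, consistent with it being the square of a single parity-odd object.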
Although changing the basis from Laporta to UT already reduces the size of the coefficients by about a factor of 3, the coefficients are still huge, with a size of 712 MB. We now apply our improved Leinartas' algorithm to shorten these coefficients with respect to the UT basis. The size of the coefficients is remarkably shortened to only 24 MB (19 MB in indexed form). Compared with the 2.4 GB IBP reduction file we started out with, the IBP reduction coefficients are simplified by a factor of over 100! As a comparison, without using the UT basis, our algorithm can also reduce the size of the IBP coefficients, from 2.4 GB to 864 MB. However, the reduction ratio is not as dramatic, and the running time is much longer.
In Appendix B we give a visual impression of how powerful our algorithm is: a 5-page-long coefficient is shortened to only 9 lines.

An elliptic example: two-loop four-point with a top quark loop and a pair of external massive legs
Our algorithm also works well for cases where no UT basis exists. In this subsection, we present an elliptic example, the double box diagram with a massive internal top-quark loop. The kinematic conditions are $p_1^2 = p_2^2 = 0$, $p_3^2 = m^2$, $2p_1 \cdot p_2 = s$, $2p_2 \cdot p_3 = t - m^2$ and $2p_1 \cdot p_3 = m^2 - s - t$. The parameters are $\epsilon$, $s$, $t$, $m^2$ and $m_t^2$. It is clear that there are fully massive sunset subdiagrams in this topology, and therefore a UT basis does not exist.
We have reduced the integrals in the sector (1, 1, 1, 1, 1, 1, 1, 0, 0) with ISP degree up to 5 to the Laporta basis, using FIRE6. The size of the resulting coefficients is 175 MB in total.
In applying our algorithm to shorten these coefficients, it is important to pull the nonlinear factors out and perform the partial fractioning over the linear factors only. After applying our algorithm, the size of the simplified coefficients is reduced to only 24 MB, a significant simplification by about a factor of 7 in byte size.
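The separation of nonlinear factors can be sketched as follows. The denominator below is a hypothetical toy input (reusing the nonlinear pole quoted in the one-mass example); the paper's actual implementation is in Singular:

```python
import sympy as sp

s, t, m2 = sp.symbols('s t m2')
# Hypothetical denominator mixing linear factors with the nonlinear
# irreducible pole m2*s - m2*t - s^2 - s*t
den = s * (s + t) * (m2*s - m2*t - s**2 - s*t)

content, factors = sp.factor_list(den)
linear, nonlinear = sp.sympify(content), sp.Integer(1)
for fac, mult in factors:
    if sp.total_degree(fac) <= 1:
        linear *= fac**mult       # fed into the partial fraction algorithm
    else:
        nonlinear *= fac**mult    # pulled out and left untouched

assert sp.expand(linear * nonlinear - den) == 0
```

The partial fraction decomposition is then applied to the reciprocal of the linear part only, with the nonlinear part carried along as an overall factor.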
This example indicates that although one should prefer a UT basis when doing partial fractioning, the algorithm is still powerful for diagrams for which no UT basis exists.

Performance of the algorithm
In this section, we summarize the computing resources used for our examples, and the reduction ratio in different formats.
In all examples, a Singular implementation of Algorithm 5 with the improvements described in Remarks 3.11 and 3.13 was used. Table 1 shows the resources used when applying the algorithm to all matrix entries one after the other, or in parallel using 32 cores. Due to the simple form of parallelism, the computation scales similarly up to the number of matrix entries.
When comparing the time taken by each decomposition step, we found that the short numerator decomposition (Algorithm 4) needs 60–95% of the total runtime. Especially for large numerators and denominator factors of small degree, partial fraction decomposition can drastically reduce the size of IBP matrices, as can be seen in Table 2. Since most of the irreducible factors in the denominators are linear, it makes sense to leave any nonlinear factors untouched in the algorithm. In the tables in this subsection, the symbol "⋄" means that we leave the nonlinear factors untouched in our partial fraction algorithm. The label "(Laporta)" or "(UT)" means that we are dealing with coefficients in a Laporta integral basis or a UT basis, respectively.
The use of a UT basis typically leads to a shorter runtime (Table 1) and also reduces the size of the output (Table 2). Finally, instead of writing out each denominator, we can store in the data structure just the indices $i$ and exponents $e_i$ of the irreducible factors $q_i$ appearing in each denominator, together with the list of all factors $q_i$; this also reduces the size (last column of Table 2). Interestingly, zipping both the input and output files in some examples leads to a further increase of the relative size reduction (see Table 3).
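A minimal sketch of such an indexed storage scheme (illustrative Python with toy data, not the library's actual data format):

```python
import sympy as sp

x, y = sp.symbols('x y')
# Global list of irreducible denominator factors, stored only once
factors = [x, x + y, x - y]

# Each term stores (numerator, [(factor_index, exponent), ...])
terms = [(sp.Integer(2), [(0, 1), (1, 2)]),   # 2/(x*(x+y)^2)
         (y,             [(2, 3)])]           # y/(x-y)^3

def expand_term(num, idx_exp):
    """Reconstruct a summand from its indexed denominator."""
    den = sp.prod(factors[i]**e for i, e in idx_exp)
    return num / den

total = sum(expand_term(n, ie) for n, ie in terms)
assert sp.simplify(total - (2/(x*(x + y)**2) + y/(x - y)**3)) == 0
```

Since the same few factors appear in many denominators, storing each factor once and referencing it by index avoids writing the (often long) polynomials repeatedly.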

Summary and Discussion
In this manuscript, we develop an improved Leinartas' algorithm for multivariate partial fractioning and present a modern implementation of this algorithm, to simplify the complicated analytic IBP reduction coefficients arising in multi-loop computations. We show that our algorithm works well to reduce the size of the IBP reduction coefficients, both for cases where a UT basis exists and for cases where it does not. We observe that, in the cases we studied, the IBP reduction coefficients in the UT basis have a simple structure: (1) the spacetime dimension parameter $D$ factors out in the denominators; (2) the remaining irreducible factors in the denominators are a subset of the symbol letters. Thus the UT basis usually provides a simpler denominator factor list, and our algorithm works particularly well, with a shorter running time and a higher reduction ratio.
In complicated examples, our algorithm achieves a dramatic size reduction of the IBP coefficients.
We expect that our algorithm will find broad applications in multi-loop IBP computations, yielding easier-to-use analytic reduction results and making numerical evaluations much faster.
We present a Singular library for our multivariate partial fraction algorithm. It can be used for simplifying IBP coefficients for general purposes. Furthermore, we expect that the partial fraction library can be used to simplify multi-loop integrands and the transcendental-function coefficients in scattering amplitudes, as in the partial fraction examples shown in [51]. We expect that this library can be combined with current finite-field and rational-reconstruction packages [9, 27, 29–31] for multi-loop scattering amplitude computations. Our library may also find applications in analytic computations outside scattering amplitudes, in broader research areas of theoretical physics.
Besides this partial fraction library, in the future we will also develop an arithmetic library to perform arithmetic operations on rational functions in partial fraction form, keeping the output in partial fraction form.
It would also be interesting to study the IBP reduction coefficients in a UT basis in detail. After partial fractioning, each term in a coefficient looks much simpler. It is then of theoretical interest to relate these terms to the leading singularities of Feynman integrals.

A Manual of the Partial Fractioning Singular Library
In this section, we give a short outline of how to use the features of the Singular library pfd.lib. Together with a complete documentation, it can be downloaded from https://raw.githubusercontent.com/Singular/Singular/spielwiese/Singular/LIB/pfd.lib and should be placed within the user's Singular search path. The latest release of Singular can be downloaded from the Singular website https://www.singular.uni-kl.de. The website also provides an online documentation of Singular and all libraries distributed with the release.
After starting up Singular, the library can be loaded by typing LIB "pfd.lib"; at the Singular prompt. The main algorithm for partial fraction decomposition can be accessed via the procedure pfd. This procedure takes as input two polynomials (numerator and denominator) and returns a partial fraction decomposition encoded as a list. The first entry is a list containing the denominator factors; the second entry is a list of summands, each of which is encoded as a list of numerator, indices of denominator factors and exponents of denominator factors. The decomposition can be displayed using the procedure displaypfd, and checked with the procedure checkpfd, which verifies whether a rational function (first argument) is mathematically equal to a decomposition returned by pfd (second argument) and returns a boolean (see Example A.1). An example of how to use a procedure can be displayed by typing example <name-of-procedure>; at the Singular prompt.
The second argument (the denominator polynomial) can alternatively be given in factorized form (as a list of a Singular ideal generated by irreducible non-constant polynomials and an intvec containing the exponents), in case the denominator factors are known to the user. As an example, > pfd(x+2*y, (x+y)^2*(x-y)^3); is equivalent to > pfd(x+2*y, list(ideal(x+y,x-y), intvec(2,3)));. Using the procedure pfdMat, we can calculate the decompositions of a matrix of rational functions. The computation is done in parallel, relying on the library parallel.lib. By default, pfdMat also calls checkpfd for each decomposition and ignores nonlinear denominator factors (as described in Section 4).
The input of pfdMat is the name of a .txt-file (as a Singular string), which contains the matrix as a list of lists (row by row) enclosed in the symbols "{" and "}", and separated by commas (see Example A.2). Each rational function has to be an expression of the form "a", "(a)/(b)", "(b)^(-n)" or "(a)*(b)^(-n)", where "n" stands for a positive integer and "a", "b" stand for arbitrary polynomials (using the operators "+", "-", "*", "^" and brackets "(",")"). A minus sign "-" followed by such an expression is also allowed. Note that the library also has options to use the Singular binary serialization data format .ssi for highly efficient input and output from within Singular.
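As an illustration of this input format (not part of the library), a small hypothetical Python helper could generate such file content from sympy expressions, converting powers to Singular's `^` notation:

```python
import sympy as sp

x, y = sp.symbols('x y')

def to_pfd_entry(expr):
    """Render a rational function in the (a)/(b) form described above,
    using '^' for powers as in Singular. Hypothetical helper."""
    num, den = sp.fraction(sp.together(expr))
    s = f"({sp.sstr(num)})/({sp.sstr(den)})" if den != 1 else f"({sp.sstr(num)})"
    return s.replace('**', '^')

matrix = [[x + 2*y, 1/(x - y)**2]]          # one row with two entries
text = "{" + ",".join(
    "{" + ",".join(to_pfd_entry(e) for e in row) + "}" for row in matrix
) + "}"
# text now holds a matrix in the "{{...},{...}}" layout expected by pfdMat
```

This only covers the plain-text `.txt` route; for large matrices, the `.ssi` binary format mentioned above is preferable.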
There are four optional arguments, which determine whether checkpfd should be applied (-1: exact test; 0: do not apply checkpfd; a positive integer: perform this number of probabilistic tests; default value -1), whether nonlinear factors should be extracted (1 or 0, default value 1), which additional output files should be created (an integer from 1 to 4, default value 1), and whether the algorithm should be run in parallel over all matrix entries (1 or 0, default value 1). The options must be specified in this order. The third optional argument (an integer from 1 to 4) controls the output files created:
1: The output, that is, the matrix containing the decompositions, is stored in a .txt-file in indexed form (as described in Section 4). The denominator factors are saved in a separate file, and a logfile is created, which records runtimes and memory usage.
2: Additionally, the decompositions are saved in non-indexed form.
3: Additional .ssi-files containing the input and output matrices as well as some intermediate results are created.
4: In addition to mode 3, for every rational function the result of pfd is immediately saved in a separate .ssi-file. (This creates a file for every matrix entry.)
For more details, refer to the documentation of the library. Before calling pfdMat, a polynomial ring must be defined (as in Example A.1) such that the variable names match the names used in the input file. Furthermore, with the command setcores(n); the number of processor cores used for the parallelization can be set to an integer n. By default, all cores are used.

B An Explicit Example of the Size Reduction
In this appendix, we explicitly show an IBP reduction coefficient before and after our partial fraction computation, to illustrate the size reduction.