An analytic result for the two-loop seven-point MHV amplitude in N=4 SYM

We describe a general algorithm which builds on several pieces of data available in the literature to construct explicit analytic formulas for two-loop MHV amplitudes in N=4 super-Yang-Mills theory. The non-classical part of an amplitude is built from $A_3$ cluster polylogarithm functions; classical polylogarithms with (negative) cluster X-coordinate arguments are added to complete the symbol of the amplitude; beyond-the-symbol terms proportional to $\pi^2$ are determined by comparison with the differential of the amplitude; and the overall additive constant is fixed by the collinear limit. We present an explicit formula for the seven-point amplitude $R_7^{(2)}$ as a sample application.


Introduction
This note is a natural continuation of the research program that has been pursued in the papers [1][2][3][4] and has been heavily guided by earlier mathematical work of Goncharov on both the structure of polylogarithm functions and on cluster algebras (see in particular [5] and [6]). The physics goal of our program is, narrowly, to understand the rich mathematical structure of two-loop amplitudes in N = 4 supersymmetric Yang-Mills (SYM) theory [7], and more broadly, to develop a toolkit of mathematical techniques useful for unlocking the structure of multi-loop amplitudes in general field theories. An example of the latter is the symbol calculus, which following [1] has become a very useful workhorse for dealing with the kinds of polylogarithm functions which are ubiquitous in multi-loop calculations, while the intimate connection between amplitudes and cluster algebras unearthed in [3] is a prime example of the very special structure exemplified by SYM theory in particular.
In this paper we tie together several threads which have run through the earlier work [1-4] but have not yet been fully wrapped up. Our immediate goal will be to construct an explicit analytic formula for the two-loop seven-point MHV amplitude $R_7^{(2)}$ in SYM theory¹. While it may be interesting in its own right, we do not view the formula itself as the primary result of this paper. Rather, our aim is first to review the various obstacles that arise in the pursuit of writing such analytic formulas, and then to bring together the relevant ideas and results from [1-4,13] to argue that the problem of constructing analytic formulas for $R_n^{(2)}$ for any desired n may be considered "solved" (modulo the availability of sufficient computer power, of course). By this we mean that we describe an algorithm which, building on the scaffolding provided by Caron-Huot's computation [14] of the symbol of $R_n^{(2)}$, may be used to construct an analytic formula for any desired n. The result for n = 6 appeared in [1], and we present a result for n = 7 here as a specific application of our algorithm. Numerical studies of $R_n^{(2)}$ have been carried out for n = 6 in [12,15,16] and for higher n in [17,18], and explicit formulas are known for the special case when all particles have momenta lying in a common $\mathbb{R}^{1,1}$ subspace of four-dimensional Minkowski space [19][20][21].
¹ More precisely, $R_n^{(L)}$ stands for the n-particle L-loop remainder function, after the infrared singularities of the amplitude have been subtracted in a now standard way following [8,9]. Dual conformal symmetry requires $R_n^{(L)}$ to vanish for n < 6 at any loop order [10,11], but a numerical study [12] established that $R_6^{(2)}$ is nonzero.
We do not address here the question of how the computational complexity of our algorithm scales with n because we hope that this will ultimately be an irrelevant question. As has happened often before in physics, and especially so in the study of SYM theory, we believe that once suitably packaged and digestible results accumulate for various relatively small values of n, the structure might become clear enough that one can extrapolate an all-n formula, which could subsequently be proven to be correct or at least could be checked to be consistent with all known properties of the true amplitudes.
Amplitudeology is a data-driven enterprise in which insights gleaned by analyzing the results of a seemingly difficult calculation have often revealed hidden structure that trivializes the original calculation and helps to make the next set of calculations simpler (or even just possible). We fully anticipate that the formula we obtain for $R_7^{(2)}$ will not be the simplest or "best" one possible, but we hope that the algorithm described in this paper will prove useful for generating new data for the amplitude community.
Section 2 contains some brief background material and definitions. Section 3 comments on the difficulties of integrating symbols in general, and on the tools we employ to overcome these difficulties. We also discuss the relation of our work to a complementary approach to similar problems which has been used by Dixon and collaborators to achieve several impressive results on multi-loop six-point amplitudes [22][23][24][25]. Section 4 outlines our general algorithm, while section 5 discusses its application to the specific case of $R_7^{(2)}$, culminating in the construction of a complete analytic formula for this amplitude, some properties of which are discussed in section 6.

Background
This section is a brief review of some of the more advanced mathematics that will appear throughout the rest of the paper, namely the coproduct δ and cluster algebras. For a more thorough introduction to these topics, see [2,4].
The space of polylogarithm functions modulo products is a Lie coalgebra with coproduct δ. The coproduct maps a polylogarithm function of weight 4 (the case of relevance to two-loop amplitudes) into two component spaces, $\Lambda^2 B_2$ and $B_3 \otimes \mathbb{C}^*$. Here $B_k$ refers to the Bloch group, which roughly speaking represents the space of classical weight-k polylogarithm functions modulo functional relations amongst the $\mathrm{Li}_k$ and modulo products of functions of lower weight. Elements of $B_k$ are linear combinations of objects denoted by $\{x\}_k$, which stands for the equivalence class containing the function $-\mathrm{Li}_k(-x)$. The $\Lambda^2 B_2$ component of the coproduct captures the obstruction to writing a function in terms of the classical polylogarithm functions $\mathrm{Li}_k$ [27,28]. The $B_3 \otimes \mathbb{C}^*$ component of the coproduct encapsulates all of the intrinsically weight-4 terms in a function.
Cluster algebras are generated by a preferred set of variables ("cluster coordinates") grouped into overlapping sets called clusters, which are related to each other by a transformation called mutation. The cluster algebra relevant for two-loop MHV scattering amplitudes in SYM theory is the Gr(4, n) Grassmannian cluster algebra, which is related to the kinematic configuration space for n particles, $\mathrm{Conf}_n(\mathbb{P}^3)$. These coordinates come in two flavors, A- and X-coordinates. An example of A-coordinates are the standard Plücker coordinates $\langle ijkl\rangle = \det(Z_i Z_j Z_k Z_l)$ (in terms of momentum-twistor variables [29]). Slightly more complicated examples that will appear later in this paper are of the type $\langle a(bc)(de)(fg)\rangle \equiv \langle abde\rangle\langle acfg\rangle - \langle abfg\rangle\langle acde\rangle$. Cluster X-coordinates are a special class of cross-ratios built from A-coordinates. These two topics, polylogarithms and cluster algebras, merge beautifully in the arena of SYM theory. Firstly, only cluster A-coordinates for Gr(4, n) appear in the symbol of $R_n^{(2)}$. Moreover, the coproduct of $R_7^{(2)}$ was calculated in [2], and it was noted that the arguments x of the elements $\{x\}_2$ and $\{x\}_3$ appearing in the coproduct are cluster X-coordinates of the Gr(4, 7) Grassmannian cluster algebra. Furthermore, it was noted that the function for $R_6^{(2)}$ obtained in [1] can be written purely in terms of classical polylogarithms $\mathrm{Li}_k$ with (negative) X-coordinates as arguments. In this paper we extend these connections to a general algorithm for constructing the function $R_n^{(2)}$.
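To make the bracket notation concrete, here is a minimal sketch of our own (not code from this paper; the function names are hypothetical). It evaluates $\langle ijkl\rangle$ exactly for seven momentum twistors placed on the moment curve $Z_i = (1, t_i, t_i^2, t_i^3)$, an assumed and convenient choice of configuration for which every ordered bracket is a positive Vandermonde determinant, and then evaluates the composite bracket $\langle a(bc)(de)(fg)\rangle$ defined above.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Exact determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = Fraction(0)
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def moment_curve(ts):
    """Hypothetical test configuration: Z_i = (1, t_i, t_i^2, t_i^3), i = 1..n.

    For t_1 < ... < t_n every ordered four-bracket is a Vandermonde
    determinant, hence positive.
    """
    return {i: [Fraction(1), t, t ** 2, t ** 3] for i, t in enumerate(ts, start=1)}


def bracket(Z, i, j, k, l):
    """Plücker coordinate <ijkl> = det(Z_i Z_j Z_k Z_l), with the Z's as columns."""
    cols = [Z[i], Z[j], Z[k], Z[l]]
    return det([[col[r] for col in cols] for r in range(4)])


def composite(Z, a, b, c, d, e, f, g):
    """<a(bc)(de)(fg)> = <abde><acfg> - <abfg><acde>."""
    return (bracket(Z, a, b, d, e) * bracket(Z, a, c, f, g)
            - bracket(Z, a, b, f, g) * bracket(Z, a, c, d, e))


Z = moment_curve([Fraction(t) for t in (1, 2, 3, 5, 7, 11, 13)])
print(all(bracket(Z, *q) > 0 for q in combinations(range(1, 8), 4)))  # True
print(composite(Z, 1, 2, 3, 4, 5, 6, 7))
```

Exact rational arithmetic is used so that identities among brackets, such as the three-term Plücker relations, hold exactly rather than to floating-point accuracy.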
Let us note that the Gr(4, n) cluster algebra has infinitely many A- and X-coordinates when n > 7, but we believe that this presents no obstruction to our algorithm, since it is evident from the result of [14] that only finitely many (in fact, precisely $\frac{3}{2}n(n-5)^2$) of the A-coordinates actually appear in the two-loop MHV amplitude $R_n^{(2)}$, and our experience has shown that the "most complicated part" of these amplitudes (see [4] for details) can be expressed in terms of the X-coordinates belonging to finitely many $A_3$ subalgebras of Gr(4, n). For the special cases n = 6, 7, we expect that the two-loop symbol alphabet (which already contains all available A-coordinates) will be sufficient to express all amplitudes (whether MHV or not) to all loop orders, but for n > 7 we know of no reason to exclude the possibility that the symbol alphabet could grow larger at higher loops (indeed, we expect it to become infinite for ten-point N$^3$MHV amplitudes starting already at two loops).
A salient feature of cluster X-coordinates is that they are positive when evaluated inside the positive Grassmannian, defined as the subset of the Euclidean domain where $\langle ijkl\rangle > 0$ whenever i < j < k < l. This is crucial because it allows us to impose analyticity inside the positive domain with relative ease (since $\mathrm{Li}_k(x)$ is smooth for x < 0), in particular without having to worry about branch cuts. It would be interesting to check the extension of our final formula to more general Euclidean kinematics, for which it would be necessary to specify where to take the branch cut of each $\mathrm{Li}_k(x)$ (as was done, for example, in [1] for n = 6). It would also be interesting to explore the analytic continuation to other regions outside the Euclidean domain, for example to make contact with work on the seven-point amplitude in the multi-Regge regime [30][31][32].
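As a small numerical illustration of why the positive domain is convenient (a sketch under our own assumptions, not code from this paper), the functions below evaluate $-\mathrm{Li}_2(-x)$ for x > 0 using only real arithmetic: the argument −x never reaches the branch cut of $\mathrm{Li}_2$ on [1, ∞), so no branch choice is needed. The names `li2_neg` and `bloch2` are hypothetical; the two branches rely on the standard Landen and inversion identities for $\mathrm{Li}_2$, and only $\mathrm{Li}_2$ is covered.

```python
import math


def _li2_series(w, terms=80):
    """Li_2(w) = sum_{m>=1} w^m / m^2; used here only for 0 <= w <= 1/2."""
    return sum(w ** m / m ** 2 for m in range(1, terms + 1))


def li2_neg(x):
    """Li_2(-x) for real x > 0, i.e. Li_2 on the negative real axis."""
    if x <= 0:
        raise ValueError("this sketch only covers x > 0")
    if x > 1:
        # Inversion: Li_2(-x) + Li_2(-1/x) = -pi^2/6 - (1/2) log^2 x
        return -math.pi ** 2 / 6 - 0.5 * math.log(x) ** 2 - li2_neg(1 / x)
    # Landen: Li_2(-x) = -Li_2(x/(1+x)) - (1/2) log^2(1+x), with x/(1+x) <= 1/2
    return -_li2_series(x / (1 + x)) - 0.5 * math.log1p(x) ** 2


def bloch2(x):
    """-Li_2(-x), the function representing {x}_2 in the convention of the text."""
    return -li2_neg(x)


for x in (1e-3, 0.5, 1.0, 2.0, 1e3):
    print(f"x = {x:g}: -Li_2(-x) = {bloch2(x):.12f}")
```

Both branches are real-analytic on x > 0 and agree at x = 1, so any sum of such terms with X-coordinate arguments is smooth and real-valued wherever all the arguments stay positive.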
Before we describe our algorithm we would first like to clarify the difficulties that our cluster algebraic approach allows us to overcome.

The Problem of Integrating Symbols
The problem of finding an explicit polylogarithm function whose symbol matches a given random (but integrable) symbol is hopeless; no algorithm exists in general. Fortunately, amplitudes in SYM theory do not have random symbols, nor do we expect them to be expressed in terms of completely random functions.
In such happier cases the problem can be tractable if the desired function may be expressed in terms of some class of generalized polylogarithm functions whose arguments are all drawn from some particular finite collection of well-behaved variables. Then the problem of integrating the symbol becomes simply one of linear algebra: one writes a general linear combination of the functions in the ansatz and chooses the coefficients to match the desired symbol. Ideally, the ansatz should be just big enough to contain the answer, and not too big: if the ansatz is highly overcomplete³, there can be considerable ambiguity in choosing a functional representative for the integrated symbol.
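The linear-algebra step can be sketched as follows. This is our own toy illustration, not the authors' code: a symbol is stored as a dictionary from words (tuples of letters) to rational coefficients, an ansatz is a dictionary of such symbols, and the coefficients are found by exact Gauss-Jordan elimination. A positive nullity signals an overcomplete ansatz, i.e. exactly the ambiguity described above. The letters a, b, c (with x = a/b and 1 + x = c/b) and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def word_sym(*terms):
    """Symbol from (coefficient, word) pairs; a word is a tuple of letters."""
    out = {}
    for coeff, word in terms:
        out[word] = out.get(word, 0) + Fraction(coeff)
    return {w: c for w, c in out.items() if c != 0}


def fit_symbol(target, ansatz):
    """Solve sum_f c_f * ansatz[f] == target exactly (Gauss-Jordan elimination).

    Returns (coefficients, nullity), or None if target is outside the span.
    Free coefficients (nullity > 0: overcomplete ansatz) are set to zero.
    """
    names = list(ansatz)
    words = sorted(set(target).union(*(ansatz[f] for f in names)))
    # Augmented matrix: one row per word, one column per ansatz function.
    rows = [[Fraction(ansatz[f].get(w, 0)) for f in names] + [Fraction(target.get(w, 0))]
            for w in words]
    pivots, r = [], 0
    for col in range(len(names)):
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue  # no pivot: this coefficient is free
        rows[r], rows[piv] = rows[piv], rows[r]
        p = rows[r][col]
        rows[r] = [v / p for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                m = rows[i][col]
                rows[i] = [u - m * v for u, v in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    if any(row[-1] != 0 for row in rows[r:]):
        return None  # inconsistent: the ansatz is too small
    coeffs = {f: Fraction(0) for f in names}
    for i, col in enumerate(pivots):
        coeffs[names[col]] = rows[i][-1]
    return coeffs, len(names) - len(pivots)


# Toy weight-2 example: -Li_2(-x) has symbol (1+x) (x) x = (c/b) (x) (a/b).
li2_x = word_sym((1, ("c", "a")), (-1, ("c", "b")), (-1, ("b", "a")), (1, ("b", "b")))
# -Li_2(-1/x) has symbol (1 + 1/x) (x) (1/x) = (c/a) (x) (b/a).
li2_inv = word_sym((1, ("c", "b")), (-1, ("c", "a")), (-1, ("a", "b")), (1, ("a", "a")))
ansatz = {"-Li2(-a/b)": li2_x}
for p, q in combinations_with_replacement("abc", 2):
    ansatz[f"log {p} log {q}"] = word_sym((1, (p, q)), (1, (q, p)))  # shuffle p, q

# Target: 2 * (-Li_2(-x)) - log a log c, given only through its symbol.
target = word_sym(*[(2 * v, w) for w, v in li2_x.items()], (-1, ("a", "c")), (-1, ("c", "a")))
print(fit_symbol(target, ansatz))
print(fit_symbol(target, {**ansatz, "-Li2(-b/a)": li2_inv})[1])
```

Adding $-\mathrm{Li}_2(-1/x)$ to the ansatz raises the nullity from 0 to 1: by the inversion identity its symbol is already a combination of $-\mathrm{Li}_2(-x)$ and products of logarithms, so the fitted representative is no longer unique.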
If one were merely interested in being able to obtain numerical values for SYM amplitudes, then such ambiguity would be of little concern. If the goal however is to unlock their mathematical structure, then it is desirable to have functional representations which manifest, to the extent possible, all of their known properties. From this point of view, any ambiguity in how to write an amplitude is seen as an inefficiency, a wasted opportunity.
In a series of papers [22][23][24][25], Dixon and collaborators have pursued one approach to this problem by studying "hexagon functions", defined as polylogarithm functions whose symbol can be expressed in terms of a certain 9-letter alphabet (in our terminology, the alphabet of A-coordinates for the Gr(4, 6) Grassmannian cluster algebra) and which have the appropriate analytic structure for scattering amplitudes (specifically, they must be analytic everywhere inside the Euclidean domain, with branch points on the boundary of the Euclidean domain where $\langle i\, i{+}1\, j\, j{+}1\rangle = 0$ for some i, j). By systematically classifying such hexagon functions through weight eight, and by using physical input about the near-collinear limit derived from the Wilson loop OPE approach [34][35][36][37][38] and from the multi-Regge limit [23,39-46], they have determined analytic expressions for the six-point NMHV amplitude at two loops, and the six-point MHV amplitude at three and four loops.
³ Some overcompleteness is inevitable in our approach due to $\mathrm{Li}_k$ identities involving configurations of points in projective space (see, for example, [27,28,33]), but such identities are rare when the arguments are restricted to be (negative) cluster X-coordinates. The only currently known non-trivial identities of this type are the 5-term $A_2$ identity (Abel's identity) for $\mathrm{Li}_2$ and the 40-term $D_4$ identity for $\mathrm{Li}_3$ [2].
It would be extremely interesting to pursue a similar approach for n > 6, by exploring, for example, the space of "heptagon functions". Our reluctance to take this route stems from the fact that the required symbol alphabet grows rapidly with n: as mentioned above, the symbol alphabet for $R_n^{(2)}$ has $\frac{3}{2}n(n-5)^2$ entries [14], so the space of weight-four symbols has dimension⁴ $O(n^{12})$.
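The counting behind this estimate is simple enough to spell out. The sketch below (our own; the helper names are hypothetical) evaluates the alphabet size quoted from [14] and the naive number of weight-four words, which is the crude upper bound referred to here, before any integrability or analyticity constraint is imposed.

```python
def alphabet_size(n):
    """Number of symbol letters of R_n^(2) quoted in the text: (3/2) n (n-5)^2.

    n (n-5)^2 is always even (either n or n-5 is even), so the division is exact.
    """
    return 3 * n * (n - 5) ** 2 // 2


def naive_weight4_words(n):
    """All length-four words in that alphabet: |alphabet|^4 ~ (81/16) n^12."""
    return alphabet_size(n) ** 4


for n in range(6, 11):
    print(n, alphabet_size(n), naive_weight4_words(n))
```

For n = 6 and n = 7 this gives the familiar 9- and 42-letter alphabets, and the word count already exceeds three million at n = 7.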
We have instead pursued the somewhat orthogonal approach of organizing our calculations not from left to right in the symbol, but rather in order of decreasing mathematical complexity of the functional constituents. At weight four, this means that we first focus our attention on the "non-classical" part of the amplitude: the $\Lambda^2 B_2$ component of its coproduct. The remaining purely classical pieces of an amplitude can be systematically computed in order from most to least complicated by following the procedure outlined in [1]. This approach has the disadvantage of leaving the analytic properties of amplitudes obscure, while it has the advantage of making manifest some remarkable mathematical properties, namely the relation to the cluster structure on the kinematic domain.
The very first step in this approach is the one most fraught with peril, as we now explain. The $\Lambda^2 B_2$ component of the coproduct of $R_n^{(2)}$ can be expressed [2,13] as a linear combination of various $\{x_i\}_2 \wedge \{x_j\}_2$, where the x's are drawn from the X-coordinates of the Gr(4, n) cluster algebra. Moreover, the x's always appear together in pairs satisfying $\{x_i, x_j\} = 0$ with respect to the natural Poisson structure on the kinematic domain $\mathrm{Conf}_n(\mathbb{P}^3)$; this implies that each pair of variables generates an $A_1 \times A_1$ subalgebra of the Gr(4, n) cluster algebra.
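As background for the Poisson-commutativity condition (a standard fact about cluster Poisson structures, stated here in the usual log-canonical normalization rather than taken from this paper): for two X-coordinates $x_i, x_j$ belonging to a common cluster with exchange matrix $b_{ij}$,

$$\{x_i, x_j\} = b_{ij}\, x_i\, x_j .$$

For instance, in an $A_3$ cluster with quiver $x_1 \to x_2 \to x_3$ one has $b_{13} = 0$, hence $\{x_1, x_3\} = 0$. Mutating at $x_1$ leaves $x_3$ untouched and vice versa, so the pair $(x_1, x_3)$ generates an $A_1 \times A_1$ subalgebra.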
For several years a guiding aim of this research program, strongly advocated by Goncharov, has been that it should be possible to write each amplitude under consideration as a linear combination of special functions associated with smaller building blocks ("atoms"). For example, it is well known that the function $L_{2,2}(x, y)$⁵ has the simple $\Lambda^2 B_2$ coproduct component $\{x\}_2 \wedge \{y\}_2$. Therefore one might be tempted to construct the non-classical part of a desired $R_n^{(2)}$ by writing down an appropriate linear combination of $L_{2,2}(x_i, x_j)$ functions; the difference between this object and $R_n^{(2)}$ must then be expressible in terms of the classical functions $\mathrm{Li}_k$ only.
⁴ This is rather too pessimistic; the analyticity condition cuts this down by one power of n, and the integrability condition no doubt cuts it down by some more powers of n.
⁵ We caution the reader that several variants of this function exist in the literature, beginning with [47], all of which differ from each other by the addition of terms proportional to $\mathrm{Li}_4$, or products of lower-weight $\mathrm{Li}_k$'s. In fact, even in this short paper we will momentarily use a second variant, $K_{2,2}$. All of these variants have the same $\Lambda^2 B_2$ coproduct component. The particular $L_{2,2}(x, y)$ used here may also be expressed as $L_{2,2}(x, y) = \frac{1}{2}\mathrm{Li}_{2,2}(x/y, -y) - (x \leftrightarrow y)$ in terms of the $\mathrm{Li}_{2,2}$ function.
The fatal flaw in this approach is that while $L_{2,2}(x_i, x_j)$ indeed has a simple coproduct, it is poorly adapted to applications where one wants to manifest cluster structure, because its symbol has some entries of the form $x_i - x_j$, which is never expressible as a product of cluster A-coordinates (and thus can never be an X-coordinate). Therefore one would have to considerably enlarge the symbol alphabet under consideration in order to fit all of the classical pieces of the amplitude left over by subtracting a linear combination of $L_{2,2}$'s. Just as bad, one would almost inevitably generate $\mathrm{Li}_k$ functions whose arguments range over the entire real line, greatly complicating the problem of arranging all of the branch cuts of the individual terms to conspire to cancel out everywhere in the positive domain.
So if we want to maintain a connection to the cluster structure (and, more practically, to avoid enormously complicating the calculation by being forced to clean up unwanted mess in the symbol), we should abandon the idea that each individual term $\{x_i\}_2 \wedge \{x_j\}_2$ may be thought of as an atom. The problem of identifying the smallest building block manifesting all of the known cluster properties of $R_n^{(2)}$ was solved (at least for a few of the simplest cluster algebras, and conjectured more generally) in [4]. The solution is a function associated to the $A_3$ cluster algebra, given in eqs. (3.2)-(3.4). The expression for $K_{2,2}$ given here differs from the one presented in [4] by the addition of terms proportional to products of logarithms, as well as the final $\mathrm{Li}_2\,\mathrm{Li}_2$ term, none of which affect the coproduct of $K_{2,2}$.
As long as the three $x_i$ generate an $A_3$ algebra $x_1 \to x_2 \to x_3$ (which could be a subalgebra of a larger algebra), the $A_3$ function accomplishes a remarkable feat:
• the $\Lambda^2 B_2$ component of its coproduct …;
• the $B_3 \otimes \mathbb{C}^*$ component of its coproduct can be written in terms of X-coordinates (the $\mathrm{Li}_4$ term in $K_{2,2}$ is crucial here);
• its symbol can be written entirely in terms of A-coordinates (here the $\mathrm{Li}_3 \log$ term is crucial);
• and it is smooth and real-valued everywhere inside the positive domain (i.e., as long as $x_1, x_2, x_3 > 0$), thanks to the terms which were added compared to [4].
The $\mathrm{Li}_2\,\mathrm{Li}_2$ term in (3.2) is completely innocuous and was chosen for inclusion because it was observed to nicely package together most of the $\mathrm{Li}_2\,\mathrm{Li}_2$ terms in the amplitude $R_7^{(2)}$. It would be very interesting to see whether a more optimal packaging of subleading terms could be obtained, whether for n = 7 or even for all n.
Working with $A_3$ functions, rather than the underlying individual $L_{2,2}$'s, therefore allows us to avoid having to enlarge the symbol alphabet beyond the set of cluster A-coordinates. Moreover, when expressing the classical contributions to an amplitude, we are able to restrict our attention to the functions $\mathrm{Li}_k(-x)$, which are smooth and real-valued throughout the positive domain as long as the arguments x are always taken from the set of cluster X-coordinates.
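To see concretely why this choice keeps the symbol alphabet closed, recall the standard result $\mathcal{S}(\mathrm{Li}_k(z)) = -(1-z)\otimes z\otimes\cdots\otimes z$, so that $-\mathrm{Li}_k(-x)$ has symbol $(1+x)\otimes x\otimes\cdots\otimes x$ (up to signs inside the entries). Whenever both x and 1 + x factor into A-coordinates, every symbol entry is an A-coordinate. The sketch below (ours; function names are hypothetical) expands this for a Gr(4, 6)-type cross-ratio chosen purely for illustration, whose 1 + x factorizes by a three-term Plücker relation, and checks that factorization numerically on an assumed moment-curve configuration. A letter such as $x_i - x_j$ generally admits no such factorization.

```python
from fractions import Fraction
from itertools import product
from math import prod


def expand(*factors):
    """Multilinearly expand f1 (x) f2 (x) ... where each factor is a monomial
    {A-coordinate: exponent}; returns {word: integer coefficient}."""
    out = {}
    for choice in product(*(list(f.items()) for f in factors)):
        word = tuple(letter for letter, _ in choice)
        coeff = 1
        for _, exponent in choice:
            coeff *= exponent
        out[word] = out.get(word, 0) + coeff
    return {w: c for w, c in out.items() if c != 0}


def symbol_bloch(k, x, one_plus_x):
    """Symbol of -Li_k(-x): (1+x) (x) x (x) ... (x) x with k-1 copies of x."""
    return expand(one_plus_x, *([x] * (k - 1)))


def moment_bracket(ts, label):
    """<ijkl> for Z_i = (1, t_i, t_i^2, t_i^3) with i<j<k<l: a Vandermonde product.

    Labels like "<1356>" use single-digit particle indices.
    """
    pts = [ts[int(ch) - 1] for ch in label.strip("<>")]
    return prod(pts[b] - pts[a] for a in range(4) for b in range(a + 1, 4))


def evaluate(mono, ts):
    """Numerical value of a monomial {bracket label: exponent}."""
    value = Fraction(1)
    for label, exponent in mono.items():
        value *= Fraction(moment_bracket(ts, label)) ** exponent
    return value


# x = <1256><3456>/(<2356><1456>); the Plücker relation
# <1356><2456> = <1256><3456> + <1456><2356> gives 1 + x below.
x = {"<1256>": 1, "<3456>": 1, "<2356>": -1, "<1456>": -1}
one_plus_x = {"<1356>": 1, "<2456>": 1, "<2356>": -1, "<1456>": -1}

ts = [Fraction(t) for t in (1, 2, 3, 5, 7, 11)]
print(evaluate(x, ts), evaluate(one_plus_x, ts))  # 1/2 3/2
for word, coeff in sorted(symbol_bloch(2, x, one_plus_x).items()):
    print(coeff, " (x) ".join(word))
```

All 16 entries of the weight-two symbol (and all 64 at weight three) are four-brackets, i.e. A-coordinates, with no new letters generated.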

The Algorithm for $R_n^{(2)}$
The algorithm naturally breaks into four steps. (1) As discussed in the previous section, we start by writing down a linear combination of $A_3$ cluster functions with the same $\Lambda^2 B_2$ content as the desired $R_n^{(2)}$. After subtracting this linear combination from the amplitude, we are left with a function which (2) we express in terms of the classical polylogarithms $\mathrm{Li}_k$ following the algorithm described in [1]. One minor difference with respect to [1] is that we prioritize the $\mathrm{Li}_4$ terms over those which can be written as products of lower-weight $\mathrm{Li}_k$'s, since only the former contribute to the $B_3 \otimes \mathbb{C}^*$ component of the coproduct. So, to be explicit, we proceed in the following order: $f_{A_3}$, $\mathrm{Li}_4$, $\mathrm{Li}_2\,\mathrm{Li}_2$, $\mathrm{Li}_2 \log\log$, $\mathrm{Li}_3 \log$, $\log\log\log\log$.
At this stage we have a function with the same symbol as the amplitude, so the difference is expected to be equal to $\pi^2$ times polylogarithm functions of weight two. We should not find any terms proportional to $i\pi$ times a function of weight three, since at each step we work with functions that are manifestly free of branch cuts in the positive domain. (3) The $O(\pi^2)$ terms can be found by comparison to the known [3,14] all-n formula for the differential $dR_n^{(2)}$ of the amplitude. (4) Finally, the overall additive constant in the amplitude can be determined by enforcing smoothness of the collinear limit $R_n^{(2)} \to R_{n-1}^{(2)}$.
We present here some details about the expression for $R_7^{(2)}$ generated by our algorithm. Some of the contributions, in particular the terms of the form $\mathrm{Li}_2 \log\log$ or $\log\log\log\log$, are too numerous to reasonably display in the text, so we refer the reader to the Mathematica file associated to this note for the full symbolic result⁷.
We begin by recalling the representation (5.1) of the non-classical pieces of $R_7^{(2)}$ in terms of $A_3$ functions. As we emphasized in [4], the difference between $R_7^{(2)}$ and (5.1) is a weight-four polynomial in the functions $\mathrm{Li}_k(-x)$ for k = 1, 2, 3, 4 (and $\pi^2$), with arguments x drawn from the 385 X-coordinates of the Gr(4, 7) cluster algebra.
The $B_3 \otimes \mathbb{C}^*$ component of the coproduct of $R_7^{(2)}$ was computed in [2]. We find that the $\mathrm{Li}_4$ terms of eq. (5.2) must be added to eq. (5.1) in order to correctly reproduce the full coproduct of the amplitude. At this stage we know that the difference between $R_7^{(2)}$ and eqs. (5.1) plus (5.2) is a sum of products of $\mathrm{Li}_k$ functions, each of weight strictly less than four. Following the procedure outlined in [1], we next determine the missing $\mathrm{Li}_2\,\mathrm{Li}_2$ terms (beyond the ones that we have already snuck in via eq. (3.2)).
⁷ In case of any discrepancy between formulas in the text and the Mathematica file, the latter is authoritative.
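We also determine the $\mathrm{Li}_3 \log$ terms in the same manner. The remaining $\mathrm{Li}_2 \log\log$ and $\log\log\log\log$ terms which must be added to the terms above to reproduce the symbol of $R_7^{(2)}$ are too numerous to display here and are recorded in the attached Mathematica file.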
Next we turn to the problem of fixing "beyond-the-symbol" terms, given by numerical constants (in this application, rational numbers times $\pi^k$) times functions of weight $4-k$.
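The integration step used for the $\pi^2$ terms rests on an elementary identity, recalled here as a worked example (a standard fact, not specific to this paper). For x > 0,

$$d\,\mathrm{Li}_2(-x) = -\log(1+x)\, d\log x, \qquad d\left[\log x_j \log x_k\right] = \log x_j\, d\log x_k + \log x_k\, d\log x_j .$$

If x and 1 + x are both monomials in A-coordinates, as they are for cluster X-coordinates, then a combination such as $-\pi^2 \log(1+x)\, d\log x$ expands into terms of the form $\pi^2 \log(a_1)\, d\log a_2$ and integrates to $\pi^2\,\mathrm{Li}_2(-x)$, while the second identity accounts for terms of the form $\pi^2 \log(x_j)\log(x_k)$.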
The terms proportional to $\pi^2$ may be deduced by computing the full differential of all of the terms we have accumulated so far, and subtracting the result from the known analytic formula for $dR_7^{(2)}$ [3,14]. The result is a linear combination (with rational coefficients) of terms like $\pi^2 \log(a_1)\, d\log a_2$ for various A-coordinates $a_1, a_2$. This can be integrated analytically to a linear combination of terms like $\pi^2\,\mathrm{Li}_2(-x_i)$ and $\pi^2 \log(x_j)\log(x_k)$, with all arguments being X-coordinates. In this manner we determine the $\pi^2\,\mathrm{Li}_2$ terms in our representation of $R_7^{(2)}$, which fixes it up to a single overall additive constant⁸. This constant, expected to be a rational number times $\pi^4$, can be determined by the requirement that $R_7^{(2)} \to R_6^{(2)}$ smoothly in the collinear limit. We choose to parameterize the $6 \parallel 7$ collinear limit following [14] by replacing $Z_7$ with a deformed twistor $Z_7(t)$, with α and β being arbitrary parameters, and then taking the limit $t \to 0$. As long as the starting point $(Z_1, \dots, Z_7)$ is inside the positive domain and α and β are chosen to be positive, there exists a finite $t_0 > 0$ such that $(Z_1, \dots, Z_6, Z_7(t))$ lies in the positive domain for all $0 < t < t_0$. Then the collinear limit
$$\lim_{t \to 0^+} R_7^{(2)}(Z_1, \dots, Z_6, Z_7(t)) = R_6^{(2)}(Z_1, \dots, Z_6), \qquad (5.7)$$
together with the known formula [1] for $R_6^{(2)}$, determines the overall additive constant in $R_7^{(2)}$. Each cross-ratio appearing in our formula for $R_7^{(2)}$ approaches either 0, ∞, or a finite value in the limit $t \to 0^+$, so it is a simple matter to compute the limit of the formula using the asymptotic behavior of the polylogarithm functions together with the asymptotic expansions (5.8)-(5.12) (valid when x, t and a are positive), where ∼ signifies the omission of terms which vanish as powers of t (or powers of t times powers of log t). We have taken the limit of $R_7^{(2)}$ …
⁸ There are no $\zeta(3)\log$ terms, since $dR_n^{(2)}$ is known [14] not to contain any terms proportional to $\zeta(3)$.
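For orientation, the standard inversion relations for $\mathrm{Li}_k(-x)$ with x > 0 give the leading behavior when a cross-ratio tends to ∞ as $t \to 0^+$. These are textbook identities, not necessarily in the exact form of the expansions used above; for an argument tending to 0 one simply has $\mathrm{Li}_k(-x) = -x + O(x^2)$:

$$\mathrm{Li}_2(-1/t) = -\frac{\pi^2}{6} - \frac{1}{2}\log^2 t + O(t),$$
$$\mathrm{Li}_3(-1/t) = \frac{\pi^2}{6}\log t + \frac{1}{6}\log^3 t + O(t),$$
$$\mathrm{Li}_4(-1/t) = -\frac{7\pi^4}{360} - \frac{\pi^2}{12}\log^2 t - \frac{1}{24}\log^4 t + O(t).$$

Each follows from the fact that $\mathrm{Li}_k(-x) + (-1)^k\,\mathrm{Li}_k(-1/x)$ is a polynomial in $\log x$ (and $\pi^2$), together with $\mathrm{Li}_k(-t) = O(t)$.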

Conclusion
We have described an algorithm for bootstrapping explicit analytic formulas for the two-loop n-point MHV remainder functions $R_n^{(2)}$ in SYM theory from known results in the literature for the symbol [14] and the differential [3] of $R_n^{(2)}$. The algorithm expresses these amplitudes as linear combinations of $A_3$ cluster polylogarithm functions [4] and (products of) classical polylogarithm functions $\mathrm{Li}_k(-x)$ with arguments x drawn from the set of X-coordinates [6] for the Gr(4, n) cluster algebra. Each building block utilized in the construction is manifestly real-valued and singularity-free inside the positive domain, ensuring that the generated formula for $R_n^{(2)}$ manifests this property as well. As a sample application of this algorithm we have constructed an explicit analytic representation for $R_7^{(2)}$. We would like to emphasize that we have put almost no effort into optimizing our result, instead opting to see what we get by treating this as nothing more than a "shift-enter" computation. Although we were somewhat surprised that the answer came out as compact as it did, we anticipate that our result for $R_7^{(2)}$ will not be the "best" representation possible, but we hope that it may provide a useful starting point for further exploration of the structure of this amplitude. In that sense we suspect our representation for $R_7^{(2)}$ may be more similar to the DDS formula [49,50] than to the GSVV formula [1] for $R_6^{(2)}$.
Let us end by speculating about some possible ways in which our representation for $R_7^{(2)}$ (and $R_n^{(2)}$ more generally) ought to be improved. As a general statement, it is our hope that amplitudes should admit natural functional representations which are as canonical as possible, and that any unexplained ambiguity in how to write an amplitude should be a cause for disappointment.
This is because our ultimate dream is that it should be possible to formulate a list of physical and mathematical constraints which determine SYM theory amplitudes uniquely, and any free parameter appearing in the representation of some amplitude represents a lost opportunity to manifest some otherwise hidden property it satisfies.
For example, we find it suboptimal that (as mentioned in [4]) the non-classical part of $R_7^{(2)}$ may be expressed in many different ways in terms of $A_3$ functions. It would be fantastic if one could identify some particular $A_3$ subalgebras inside the Gr(4, 7) cluster algebra (or Gr(4, n) more generally) which are for some reason preferred for expressing two-loop MHV amplitudes. Moreover, it would be nice if all of the classical terms tabulated in section 5 could be absorbed into an appropriate redefinition of the $A_3$ function given in eq. (3.2), so that the complete formula for $R_7^{(2)}$, or even for all $R_n^{(2)}$, could be written as a simple linear combination of suitably defined $A_3$ functions and nothing else. If this magic $A_3$ function were positive-valued inside the positive domain, it would furthermore manifest the conjectured [48] positivity of $R_n^{(2)}$ itself. It would be ideal if this could be done for all n in a way which manifests collinear limits, with the various $A_3$ functions appearing in $R_n^{(2)}$ tending either to zero or to $(n-1)$-point $A_3$ functions in the collinear limit. Finally, perhaps it is not the $A_3$ function but something else which is the right building block for realizing all of these dreams.