Rooting Out Letters: Octagonal Symbol Alphabets and Algebraic Number Theory

It is widely expected that NMHV amplitudes in planar, maximally supersymmetric Yang-Mills theory require symbol letters that are not rationally expressible in terms of momentum-twistor (or cluster) variables starting at two loops for eight particles. Recent advances in loop integration technology have made this an 'experimentally testable' hypothesis: compute the amplitude at some kinematic point, and see if algebraic symbol letters arise. We demonstrate the feasibility of such a test by directly integrating the most difficult of the two-loop topologies required. This integral, together with its rotated image, suffices to determine the simplest NMHV component amplitude: the unique component finite at this order. Although each of these integrals involves an algebraic symbol alphabet, the combination contributing to this amplitude is, surprisingly, rational. We describe the steps involved in this analysis, which requires several novel tricks of loop integration and also a considerable degree of algebraic number theory. We find dramatic and unusual simplifications, in which the two symbols initially expressed as almost ten million terms in over two thousand letters combine in a form that can be written in five thousand terms and twenty-five letters.


Introduction
The analytic structure and functional form of scattering amplitudes computed in (perturbative) quantum field theory continue to hold interesting surprises. Beyond leading order, amplitudes are typically transcendental functions, the simplest of which are known as generalized 'polylogarithms': iterated integrals over differential forms with exclusively simple (logarithmic) poles in each integration variable. Although wider classes of functions are known to be needed for most amplitudes (see e.g. [1][2][3][4][5][6][7][8][9][10][11][12]), polylogarithms are often sufficient at low loop order and particle multiplicity, and are by far the best understood. Much of this understanding has emerged in the context of 'symbology' [13,14], which exploits the coproduct and Hopf algebra structure of these functions [15][16][17][18][19]. (See e.g. [20] for an introduction to these ideas.) One of the key aspects of symbols is that they encode complete information about the (iterated) branch cut structure of polylogarithms in terms of an alphabet of primitive logarithmic branch-points called letters. Knowledge about the alphabets relevant for certain polylogarithmic amplitudes has allowed incredible reaches into perturbation theory, well beyond what would be possible through any known (e.g. Feynman) diagrammatic expansion. Examples of such triumphs include the recent determination of certain six-particle amplitudes in planar maximally supersymmetric (N = 4) Yang-Mills theory (sYM) through seven loops [21][22][23][24][25][26][27][28][29], and through four loops for seven particles [30][31][32].
A microcosm of progress in scattering amplitudes more broadly, these calculations have fueled and been fueled by concrete examples. One still mysterious aspect of most known examples in this theory is that their symbol alphabets are found to be generated by cluster mutations [33], rational transformations that define cluster algebras [34]. Such algebras naturally appear in the context of the positive Grassmannian geometry of on-shell scattering amplitudes [35], and seem to encode physical aspects of amplitudes such as the Steinmann relations [36][37][38][39]; they also encode further types of structure whose physical interpretation remains less clear [40][41][42].
Despite the intriguing role played by cluster algebras, it has long been known that even in planar sYM this story cannot be complete. Not only are non-polylogarithmic functions needed for most scattering amplitudes (at sufficiently high multiplicity or loop order), but even most polylogarithmic (N^{k≥2}MHV) amplitudes at one loop require symbol letters that are not rationally related to any known cluster algebra. These algebraic roots arise, for example, as Gram determinants in the analysis of Landau singularities (see e.g. [43][44][45][46]).
It is therefore natural to wonder what kinds of letters arise in this theory's MHV and NMHV amplitudes, which have been argued to be polylogarithmic to all orders [47]. The symbols of all two-loop MHV amplitudes (computed in [48]) involve only letters drawn from the coordinates of Grassmannian cluster algebras (which are related to canonical coordinates on the space of positive momentum-twistor variables) [33,40]. Similarly, the symbol of the two-loop seven-point NMHV amplitude (computed in [49]) is entirely composed of cluster coordinates. Whether or not this continues to hold beyond seven particles constitutes an important open question. In particular, in [45] it was suggested that square roots could appear in NMHV amplitudes at two loops (and in MHV amplitudes at three loops) starting for eight particles.
In this work, we probe the existence of these algebraic roots by directly computing a particular component of the eight-point two-loop NMHV amplitude. While we are not currently able to compute this component in full kinematics, it is sufficient to compute it analytically at a single (sufficiently generic) kinematic point. Note that it is, however, necessary to consider an entire amplitude, as it is well known that local integral representations can involve 'spurious' symbol letters (or even 'spurious' non-polylogarithmic parts -see e.g. [50,51]) that cancel between terms. Surprisingly, in the component under study, this is precisely what happens: the local integrals that contribute to the amplitude individually involve quadratic roots, but these roots cancel. This of course has no implications for whether square roots will appear in other NMHV component amplitudes.
We begin in section 2 by defining the particular component we will examine. In section 2.1, we describe a direct integration strategy that can be used to compute it at a kinematic point; while it is not linearly reducible in the conventional sense, we find the integral can be divided up into parts that can be integrated after respective rationalizing changes of the integration variables. The resulting functional form involves many spurious algebraic letters in addition to the expected ones, so algebraic identities are required to eliminate them at symbol level, as we describe in section 2.2. While the individual integrals contributing to this component contain quadratic roots, we show in section 2.3 that the component as a whole does not. We then conclude, discussing further questions and potential applications.
We also present two appendices. Appendix A discusses a nice basis of R-invariants for this amplitude, while appendix B reviews pertinent notions in algebraic number theory.

JHEP02(2020)025
We additionally include several pieces of supplementary material: the integrand of the integral we compute as Omega1357Integrand.m, expressions in multiple polylogarithms in Omega1357MPLs.m and Omega3571MPLs.m, and the simplified symbols we obtain as Omega1357Symbol.m and Omega3571Symbol.m. We also include a table of prime factorizations of the symbol letters conjectured in [45] for comparison with our results as PrimeFactorLetters.pdf.

2 The simplest NMHV octagon component amplitude
Explicit, prescriptive formulae for all two-loop n-point N^kMHV amplitude integrands for planar sYM, which we denote by A_n^{(k),2}, were given in [52] (see also [53]); these amplitudes are expressed in terms of a basis of dual-conformal Feynman integrands involving only local propagators. Each integral in this basis can be Feynman parameterized and conformally regulated as described in [54,55]. These integrals are not all yet known analytically.
Consider for example the local integrand representation of MHV amplitudes at two loops [56,57], eq. (2.1). Here, the 'N_1's indicate specific choices of loop-dependent numerators for these sets of (otherwise ordinary) Feynman propagators as defined in [52]. Among these terms is the integral Ω[1,3,5,7] [diagram of Ω[1,3,5,7] with labeled external legs], which was referred to as 'octagon K' in [46], where the particular challenges to its direct integration were described at some length (see also [58]). This integral is in fact the most difficult integral topology required for any eight-point amplitude at two loops for the simple reason that it is the only topology that depends on eight dual-momentum points.
(Equivalently, it is the only topology which depends on 9 conformal degrees of freedom.) In general, the ratio function will involve all of the terms in (2.1), including Ω[1,3,5,7], because the two-loop MHV amplitude is required to compute the ratio function. No analytic expression for Ω[1,3,5,7] is currently known, making the analysis of any octagon amplitude a considerable challenge. Luckily, the question regarding whether or not algebraic letters appear in an amplitude can be answered for individual components. (We give a less component-oriented motivation for this amplitude in appendix A.) Moreover, provided the kinematics are parameterized where ⟨ab⟩ := det(λ_a, λ_b) in terms of spinor variables with p_a =: λ_a λ̃_a, and where η_a is the fermionic component of the super momentum-twistor Z_a := (z_a, η_a) [59][60][61]. This component amplitude is singled out by the fact that it happens to vanish exactly at tree level and one loop (see e.g. [54,62,63]), rendering this two-loop amplitude infrared finite and equal to the ratio function.
Using the results of [52], it is easy to confirm that the two-loop component (2.3) in momentum-twistor variables is simply: where ⟨abcd⟩ := det(z_a, z_b, z_c, z_d). Notice that the sum of these integrals contributes to the MHV amplitude (2.1), while their difference is relevant to us here. The good news is that this component amplitude only requires one integral; the bad news is that it requires what is arguably the hardest eight-point integral at two loops.
Following [55], it is reasonably straightforward to Feynman parameterize (2.4) without breaking conformal invariance. We give this Feynman-parametric representation in the supplementary material, in Omega1357Integrand.m, expressed in terms of a particular momentum-twistor (cluster) coordinate chart (see [35,46] for context):
Footnote 1: Component fields of external supermultiplets are specified by their helicity and SU(4)_R charges, written in superscript and subscript, respectively.

where s jk := (1 + s_j + s_k + s_j s_k + t_k) and s i := (1 + s_i), introduced entirely for the sake of notational compression. Here, these coordinates correspond to the charts with s_2 := r_2(s_1), t_2 := r_2(t_1), etc., defined by sequential two-fold rotations r_2: z_i → z_{i+2}. As described in [46], any rational parameterization of momentum twistors will be free of square roots associated with six-dimensional Gramians, and any rational point in momentum-twistor space can be accessed rationally in any cluster coordinate chart. And so the question of whether or not algebraic letters arise can be answered at any rational point in momentum-twistor space. For the analysis described below, we chose to consider the (nearly symmetrical) point in kinematic space specified by the momentum-twistor matrix obtained from (2.5) by setting t_2 = 2 and all other coordinates (s_i, t_i, u) to 1. Landau analysis (see [45]) suggests that (2.2) may involve the roots associated with the four-dimensional Gramians: Our question, therefore, is whether or not the roots (2.10), or any others, arise as part of the symbol alphabet for the component (2.4). Answering this question turned out to require more cleverness and subtlety than expected. We shall now describe the concrete steps involved.

The principal obstruction to parametric integration is that I(α, β) is not linearly reducible in the sense of [64]. In particular, using compatibility-graph reduction [65] (as implemented for example in the package HyperInt [66]), one can readily find that at most two integrations can be carried out without introducing algebraic roots. For instance, upon integrating out β_1 and β_2 (in that order), further integration seems to be obstructed along every path. For example, the pathway in which α_1 is integrated next is obstructed by the existence of a quadratic polynomial Q_1(α_1) in the denominator, as this leads to a result that involves the square root of the discriminant of Q_1; this square root involves the remaining integration parameters, naïvely taking us out of the space of multiple polylogarithms. There is a similar obstruction with respect to α_4, due to a quadratic denominator factor Q_4(α_4). (The obstructions in α_2 and α_3 are given by three quadratic polynomials each.) Luckily, after integrating over β_1 and β_2, there are no terms that simultaneously depend on both quadratic factors Q_1(α_1) and Q_4(α_4). Thus, we may divide the terms according to whether or not Q_1(α_1) appears. Specifically, we split the result into two parts, I_A and I_B, with I_B consisting of all terms that involve Q_1(α_1), and I_A being all terms that do not depend on Q_1(α_1). To be clear, I_A consists of both those terms involving Q_4(α_4), and also those depending on neither quadratic factor. Note that I_A and I_B are separately finite. Before we describe further integrations, it is worth mentioning one potential subtlety. We will ultimately be interested in fixing the projective redundancy of different parts of the original integral in different ways. To do so, we must first reprojectivize these integrals by making the replacement α_i → α_i/(Σ_j α_j). This is done before we set any parameter to unity.
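The obstruction posed by a quadratic denominator can be seen in a generic one-variable integral. The following is only an illustrative sketch: the coefficients a, b, c stand in for polynomials in the remaining Feynman parameters and are not the actual coefficients of Q_1 or Q_4, and we assume a, b, c > 0 with Δ > 0 so that the integral converges and the logarithm is real.

```latex
\int_0^\infty \frac{d\alpha}{a\,\alpha^2 + b\,\alpha + c}
  = \frac{1}{\sqrt{\Delta}}\,
    \log\!\left(\frac{b + \sqrt{\Delta}}{b - \sqrt{\Delta}}\right),
\qquad \Delta := b^2 - 4ac .
% Check: a = 1, b = 3, c = 2 gives \Delta = 1 and the value \log 2,
% in agreement with \int_0^\infty d\alpha / ((\alpha+1)(\alpha+2)).
```

When a, b, c depend on the other integration variables, √Δ is generally not a rational function of them, which is why the next integration does not stay within multiple polylogarithms unless the root is first rationalized by a change of variables.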
Let us first consider the integration of I_A. Free of the quadratic obstruction Q_1(α_1), we can integrate over α_1 and subsequently α_2, leaving us with a one-fold projective integral. The α_2 integration, however, results in terms that involve square roots of two more irreducible quadratics q_1(α_3, α_4) and q_2(α_3, α_4). While the appearance of such factors would generally obstruct further integration, it turns out that no single term contains both roots. Thus, we can further divide I_A into three parts: I_{A_0}, which is free of any square roots, I_{A_1}, which involves only q_1(α_3, α_4), and I_{A_2}, which involves only q_2(α_3, α_4). After setting the projective variable α_4 = 1, we can then use a standard change of variables known as Euler substitution (see also [67]) to rationalize q_1(α_3, 1) and q_2(α_3, 1), respectively, in the latter two groups.
We can integrate each of the terms in I_B following a very similar strategy. Specifically, we first integrate out α_4 and then α_3, which results in terms that individually involve one (or neither) of a pair of square roots of different quadratic polynomials, q_1(α_1, α_2) and q_2(α_1, α_2). Splitting these pieces in the same way as for I_A, fixing α_2 = 1 and changing variables to rationalize each root, we can do the final integration.
Figure 1. Integration strategy for Ω[1,3,5,7]. Here, the final integrations are written in quotes to clarify that this step should be understood as integration after the changes of variables made to rationalize the quadratic roots; these changes depend on which roots exist, and so are different for different groups I_{A_i} and I_{B_i}.
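As a minimal sketch of how an Euler substitution rationalizes a square root, consider a monic quadratic q(x) = x^2 + b x + c. Setting sqrt(q(x)) = t − x makes both x and sqrt(q) rational functions of t. The coefficients below are hypothetical, chosen only for illustration; the quadratics q_1 and q_2 of the text are not reproduced here, and a non-monic quadratic would need a rescaling first.

```python
from fractions import Fraction


def euler_substitution(b, c, t):
    """First Euler substitution for the monic quadratic q(x) = x^2 + b*x + c.

    Setting sqrt(q(x)) = t - x yields x and sqrt(q(x)) as rational functions of t.
    Requires 2*t + b != 0.
    """
    b, c, t = Fraction(b), Fraction(c), Fraction(t)
    x = (t * t - c) / (2 * t + b)
    root = (t * t + b * t + c) / (2 * t + b)  # this equals t - x
    return x, root


def q(b, c, x):
    """Evaluate the quadratic q(x) = x^2 + b*x + c."""
    return x * x + b * x + c


# Hypothetical coefficients, for illustration only.
b, c = 3, 7
for t in (1, 2, Fraction(5, 3)):
    x, root = euler_substitution(b, c, t)
    assert root == t - x
    assert q(b, c, x) == root * root  # q(x(t)) is a perfect square in t
```

Because the arithmetic is exact (rational numbers throughout), the assertions confirm that the square root has been traded for a rational function of the new variable t.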
The steps involved in this divide-and-conquer strategy are summarized in figure 1. The result is a sum of terms, each expressed in terms of multiple polylogarithms depending on algebraic arguments of high degree (up to 16 in some cases). These expressions can be evaluated to arbitrarily high precision (for example, using GiNaC [68,69]) and have been checked to agree with the numerical (Monte Carlo) integration of the Feynman parametric integral (in Mathematica). We attach these results as Omega1357MPLs.m and Omega3571MPLs.m.
Unfortunately, as mentioned, the multiple polylogarithms that arise in this process depend on many algebraic roots. In addition to the expected roots from the Landau analysis at this kinematic point, √21 and √644801, we find that Ω[1,3,5,7] and Ω[3,5,7,1] each involve 85 distinct square roots, with only 12 in common between the two integrals. Each also involves irreducible roots of four distinct fourth-order polynomials, only one of which appears in both integrals. The vast majority of these algebraic roots are certain to be 'spurious': arising entirely through the change of variables introduced in the final stages of the integration strategy (required to rationalize the final integrations). To assess whether or not these roots (or any others) are truly spurious, we analyze the symbol of each integral.

Eliminating identities among 'spurious' algebraic letters
As described above, we are able to evaluate Ω[1,3,5,7] and Ω[3,5,7,1] as complicated expressions in terms of multiple polylogarithms, which we expect to satisfy many nontrivial relations. To investigate these relations, we take the symbol of each function (see footnote 4). Doing so, we obtain a pair of extremely complicated expressions, each involving a large number of spurious letters. Factoring each letter naïvely (including factoring any integer primes), Ω[1,3,5,7] has a symbol composed of 8,367,616 terms that involve 2,024 letters, while the symbol of Ω[3,5,7,1] contains 9,941,483 terms and 2,156 letters.
Footnote 4: It is sometimes colloquially stated that the symbol of a constant is zero; while this is true for the constants we most familiarly encounter (namely, the multiple zeta values), it is not true in general. One letter that is dropped in the symbol is 1 (which corresponds to log(1) = 0). We have also dropped all the roots of unity; if ζ^n = 1, then log(ζ) → (1/n) log(1) = 0. Allowing this type of transformation is called "working modulo n-torsion" in the mathematics literature.
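The naive factoring step can be sketched for the simplest case of rational letters. The snippet below is an illustration under stated assumptions, not the procedure used for the actual symbols: a symbol is modeled as a dictionary from tuples of letters to coefficients (a hypothetical data layout), each slot is expanded using log(ab) = log(a) + log(b), and signs are dropped because −1 is a root of unity.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def prime_factors(n):
    """Return {prime: exponent} for a positive integer n by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def letter_exponents(letter):
    """Prime exponents of a nonzero rational letter; the sign is dropped."""
    letter = Fraction(letter)
    exps = defaultdict(int)
    for p, e in prime_factors(abs(letter.numerator)).items():
        exps[p] += e
    for p, e in prime_factors(letter.denominator).items():
        exps[p] -= e
    return dict(exps)


def expand_symbol(symbol):
    """Expand {tuple_of_letters: coeff} into prime letters, slot by slot."""
    out = defaultdict(Fraction)
    for word, coeff in symbol.items():
        slots = [list(letter_exponents(a).items()) for a in word]
        # A letter equal to +-1 has no prime factors, so such a term vanishes.
        for choice in product(*slots):
            primes = tuple(p for p, _ in choice)
            weight = coeff
            for _, e in choice:
                weight *= e
            out[primes] += weight
    return {w: c for w, c in out.items() if c != 0}


# 6 (x) 4 - 2 * 3 (x) 2 - 2 * 2 (x) 2 vanishes once the letters are factored.
assert expand_symbol({(6, 4): 1, (3, 2): -2, (2, 2): -2}) == {}
```

The example at the end shows the kind of cancellation that only becomes visible after letters are written in a multiplicatively independent set; for algebraic letters, integer prime factorization has to be replaced by the number-field tools discussed in the text.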
Clearly, these symbols must be simplified. To do so, we want to find a set of multiplicatively independent letters S in terms of which both of these symbols can be expressed. Landau analysis suggests that the final alphabet S should be drawn from the union of the two algebraic number fields Q(√21) ∪ Q(√644801). However, our integration procedure yields a symbol with a much larger initial alphabet, involving for instance algebraic numbers up to degree 16. Finding algebraic relations between these complicated letters in order to reduce them to elements of S can be extremely difficult. To give the reader a sense of this complexity, we consider some examples. Let Our initial alphabet includes various roots of P_i, denoted σ*_{i,r} for r = 1, ..., 4. An example of the kind of roots that arise for us is given by those of the fourth-degree polynomial: Clearly, we expect the four roots of P_1 that arise in our symbol alphabet to be spurious. Therefore, we must find some way to demonstrate that they cancel. Actually, an alphabet merely involving σ*_{i,r} would not be so difficult. It turns out in our case that the most complicated letters we see are of the type ρ − σ*_{i,r}, where ρ can be an integer or a linear combination of up to two square roots. When there are two roots, one always belongs to K. Furthermore, when ρ = m + n√c with m, n ∈ K and c ∈ Z appears, then its conjugate ρ̄ = m − n√c also appears. There are two types of relations involving the roots σ*_{i,r} that turn out to be useful for us. The first type involves products ∏_{r=1}^{4} (ρ − σ*_{i,r}). These products are completely symmetric in the roots of P_i, so they belong to an extension of the field K by ρ; in particular, they can be written as linear combinations of square roots and integers. Actually, it turns out that products of certain pairs of roots of P_i also yield simple answers.
We believe it should be possible to explain the existence of these latter mysterious identities using Galois theory, but we have not performed this analysis.
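The first type of relation rests on the elementary fact that, for a monic polynomial P with roots σ_r, the fully symmetric product ∏_r (ρ − σ_r) equals P(ρ). The sketch below checks this numerically for a toy quartic, which is an assumption made for illustration and is not the polynomial P_1 of the text: the minimal polynomial of √2 + √3. It also shows a pair product collapsing to a quadratic over Q(√2), loosely analogous to the pair identities mentioned above.

```python
import math


def P(x):
    """Toy quartic x^4 - 10 x^2 + 1 with roots +-sqrt(2) +- sqrt(3)."""
    return x**4 - 10 * x**2 + 1


s2, s3 = math.sqrt(2), math.sqrt(3)
roots = [s2 + s3, s2 - s3, -s2 + s3, -s2 - s3]


def full_product(rho):
    """Product over all four roots of (rho - sigma_r)."""
    return math.prod(rho - s for s in roots)


def pair_product(rho):
    """Product over the two roots that share +sqrt(2)."""
    return (rho - roots[0]) * (rho - roots[1])


for rho in (7, 1 + 2 * math.sqrt(5)):
    # Symmetric in all roots: the quartic roots drop out entirely.
    assert math.isclose(full_product(rho), P(rho))
    # Symmetric in a pair: only sqrt(2) survives, (rho - sqrt(2))^2 - 3.
    assert math.isclose(pair_product(rho), rho**2 - 2 * s2 * rho - 1)
```

In this toy case the pair identity is explained by the subfield Q(√2) fixed by part of the Galois group; whether the same mechanism accounts for the identities found for the actual polynomials is not established here.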
The second type of identity involves products of the type (ρ − σ*_{i,r})(ρ̄ − σ*_{i,r}), where ρ̄ is one of the conjugates of ρ. Expanding out this product we obtain a degree-two polynomial in σ*_{i,r} with coefficients in K. Next, we search for exponents e_ρ corresponding to values of ρ such that, in the product of these letters raised to power e_ρ, the σ*_{i,r} cancels and the answer is of degree two. It turns out to be sufficient to bound the search so that |e_ρ| ≤ 2. The calculation of these products can be conveniently performed using SageMath [70], which uses Pari [71].
Let us be more concrete with an example of this second type of identity. For the polynomial P_1 given in (2.13), we find the letters (among many others involving σ*_{1,r}), where σ*_{1,r} is any root of P_1. It is not hard to verify that using SageMath (or even Mathematica).
Fortunately, the method described above turns out to be sufficient to find all required relations between the most complicated letters that appear in our initial symbols, allowing us to get rid of all higher-degree roots. However, many other potentially-spurious letters remain; in particular, there still exist linear combinations of up to two square roots, and square roots beyond the two physical ones in (2.10).
Footnote 5: To be more precise, two of these minimal polynomials are with coefficients in Z, one is with coefficients
For the letters containing square roots, we group them according to the algebraic number fields to which they belong and compute the factorization of the principal ideal they generate (see appendix B for more details). For this step we again use SageMath and Pari. Using this factorization, we can find multiplicative relations between these letters. Note that the integer prime factors we generated in the first step belong to each of these number fields, so their decomposition into prime ideals has to be computed as well.
This factorization also contains a unit part, which is a term belonging to the group of units of the various rings we consider. In some of the cases we encounter, the unit part is ±1, but in others it is non-trivial. We keep a list of all the units arising during the calculations in a given ring, and if two of them are identical we obtain a new identity by taking the ratio. In principle a more sophisticated approach is possible.
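The role of the unit part can be illustrated in Q(√21) with a few lines of exact arithmetic. This is a sketch under stated assumptions: the class `QuadraticNumber` and the example letters are hypothetical, and it only tests whether a ratio of two letters is a unit (an algebraic integer of norm ±1); it does not perform the ideal factorization that SageMath and Pari provide.

```python
from fractions import Fraction


class QuadraticNumber:
    """Element a + b*sqrt(d) of Q(sqrt(d)) with rational a and b."""

    def __init__(self, a, b, d=21):
        self.a, self.b, self.d = Fraction(a), Fraction(b), d

    def __mul__(self, other):
        return QuadraticNumber(
            self.a * other.a + self.d * self.b * other.b,
            self.a * other.b + self.b * other.a,
            self.d,
        )

    def __eq__(self, other):
        return (self.a, self.b, self.d) == (other.a, other.b, other.d)

    def norm(self):
        """Product of the element with its Galois conjugate."""
        return self.a**2 - self.d * self.b**2

    def trace(self):
        """Sum of the element with its Galois conjugate."""
        return 2 * self.a

    def inverse(self):
        n = self.norm()
        return QuadraticNumber(self.a / n, -self.b / n, self.d)

    def is_unit(self):
        """True if this is an algebraic integer (integral trace) of norm +-1."""
        return self.trace().denominator == 1 and self.norm() in (1, -1)


# (5 + sqrt(21))/2 has norm (25 - 21)/4 = 1, so it is a unit.
eps = QuadraticNumber(Fraction(5, 2), Fraction(1, 2))
assert eps.is_unit()

# Two hypothetical letters generating the same ideal differ by a unit.
x, y = QuadraticNumber(3, 0) * eps, QuadraticNumber(3, 0)
assert (x * y.inverse()).is_unit()

# 4 + sqrt(21) has norm -5 and is therefore not a unit.
assert not QuadraticNumber(4, 1).is_unit()
```

When two letters have the same prime-ideal factorization, their ratio is such a unit, and keeping track of the units encountered is what allows further multiplicative identities to be read off.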
Using these methods, we decompose our letters into a multiplicatively independent set S. Doing so, many of the spurious letters in our symbols combine cleanly into integer letters. Others cancel entirely, removing terms and causing other spurious letters to drop out. In the end, we find the symbol of each function simplifies dramatically. Expressing Ω[1,3,5,7] and Ω[3,5,7,1] in terms of a shared, multiplicatively independent symbol alphabet, we find only 35 letters are needed. These letters only involve the expected square roots: five involve √644801, two involve √21, and the rest are integer primes. Expressed in these letters, Ω[1,3,5,7] is 5316 terms long, while Ω[3,5,7,1] contains 5245 terms. We attach the symbol of each in supplementary material Omega1357Symbol.m and Omega3571Symbol.m, respectively.

Interestingly, some of the symbol letters that contain √21 and √644801 can be constructed simply in dual twistor space. Namely, out of eight points z_1, ..., z_8, we can form four skew lines (z_1, z_2), (z_3, z_4), (z_5, z_6), (z_7, z_8). These four skew lines have two transversals (lines that intersect all four of them). From the points of intersection on each of these transversals we can form a cross ratio. A similar construction can be carried out starting from the lines (z_2, z_3), ..., (z_8, z_1). Some of the cross ratios that can be formed in this way appear directly in our symbol expressions for Ω[1,3,5,7] and Ω[3,5,7,1].
The sum Ω[1,3,5,7] + Ω[3,5,7,1] contributes to the eight-point MHV amplitude. This sum is not free of square roots, and depends on all of the letters present in the two integrals. This observation is still consistent with the observed absence of square roots in the alphabet of the two-loop eight-point MHV amplitude because several other root-containing integrals contribute to this amplitude, including two other permutations of the integral we computed here. Other cancellations, much like those we observed, must be present in this combination.
We find that square-root letters are present in the second and third entries of Ω[1,3,5,7] and Ω[3,5,7,1], but not the first or fourth entry. This is as expected, as first entries should correspond to Mandelstam invariants while last entries are constrained by the Q̄ equation [49]. More specifically, first entries should be composed of four-brackets of the form ⟨i, i+1, j, j+1⟩. Examining our symbol, we find first entries of 2, 3, 5, 11, 13, and 31. Computing the expected first entries at our kinematic point, we find ⟨1,2,3,4⟩ = 1, ⟨1,2,4,5⟩ = 3, ⟨1,2,5,6⟩ = 5, ⟨1,2,6,7⟩ = 13, which indeed cover all observed first entries. We can also investigate whether the prime-number symbol entries we observe elsewhere in the symbol can originate from the entries predicted in [45]. We have attached this analysis as supplementary material, as PrimeFactorLetters.pdf, where we tabulate the prime factors of each of the predicted letters at this kinematic point. We find these factors span all of the letters that we observe. There are additional prime factors occurring in predicted letters in [45] that we do not observe; these are marked by an asterisk in our table.
In addition to these observations, we find that the two square roots √ 644801 and √ 21 do not appear together in the same term of the symbol: the symbol can be separated into terms depending on one root, terms depending on the other, and terms depending on neither.

Conclusions and outlook
In this work, we have computed a component of the two-loop eight-point NMHV amplitude in planar sYM at a specific kinematic point. We find that, while the individual integrals contributing to this amplitude do have letters depending on square roots of four-dimensional Gramians, these square roots cancel in the combination present in this component. In order to do this, we have employed an unusual direct integration strategy of breaking the integral into multiple integration pathways, and simplified our result from millions to thousands of terms using algebraic number theory.
This work shows that this particular component is free of square-root letters, but it does not establish that other components of the NMHV amplitude will not depend on such roots. In order to establish this, we would need to compute many more integrals, potentially of similar complexity. Alternatively, other methods may be able to compute these amplitudes much more efficiently, yielding a conclusive answer.
The use of symbol methods with square-root letters is still largely unexplored territory. While previous forays have involved heuristic or numerical elements (e.g. [72,73]), our use of factorization in prime ideals should yield a more canonical and complete analysis of the relations between algebraic letters, and we believe similar methods should be applicable elsewhere.
It is interesting to ask if the cancellation of square roots we observed could have been detected at an earlier stage. For the individual integrals, better integration methods may exist that would make these cancellations manifest earlier, or even avoid the introduction of spurious roots altogether. For the full component amplitude, one might hope that some analog of Landau analysis might be possible.

A A proposal for representing NMHV octagon amplitudes
In this appendix we describe a particular representation of eight-point NMHV amplitudes, analogous to the decomposition of hexagon and heptagon functions into specific bases. This is a bit outside the line of the main result in this work, but it does provide an independent logic behind why the particular component amplitude (2.4) plays a special role. In order to do this, we must first introduce and motivate a small amount of new notation that we promise will be worthwhile.

A.1 Notational preliminaries: NMHV Yangian invariants
The reader should be aware that NMHV amplitudes can be expressed in terms of so-called R-invariants that, when expressed in momentum-twistor space, are superfunctions defined by Especially at low multiplicity, we find it useful to label tree amplitudes by those among the ambient n twistors on which they do not depend. Because such notation, however convenient, is liable to cause confusion when several multiplicities are discussed, we propose to keep this information manifest in the way we write them. We denote these complements by

Notice that this would allow us to write A_n = A(1···n) =: ()^c_n (A.5), a notation that we cannot imagine ever actually using. More realistically, however, we should notice that in this notation the symbol for a single R-invariant would be multiplicity dependent. For example, One (BCFW) representation (among many) of the NMHV tree amplitude (A.2) would be, but as already mentioned, we will have little recourse to decompose tree amplitudes into smaller objects. This is in part because, while A(1···n) is in fact dihedrally-invariant in its indices, no formula of the form (A.7) will make this manifest. Equivalence between various dihedrally-related BCFW formulae (A.7) generates all the functional relations among R-invariants. In general, there are (n−1 choose 4) linearly independent n-point NMHV Yangian invariants.
At seven particles, for example, there are 15 linearly independent superfunctions into which any amplitude may be decomposed. Although 7 does not divide 15, most authors (see e.g. [31,32,74]) have chosen to write heptagon functions in terms of the cyclic seeds {(12)^c_7, (14)^c_7, A_7}, which generate 2 cyclic classes of length 7 and one cyclic singlet, A_7. That is, these authors have chosen to decompose all other 7-point superfunctions according to the 'elimination rules' generated cyclically by (13) Having used such eliminations, the heptagon ratio function can be written as (We believe that a better basis for heptagon amplitudes would have been generated by {(1)^c_7, (12)^c_7, A_7}, but this is not presently our concern.) Let us now describe a similar basis for eight-point NMHV Yangian invariants that is in a precise sense 'optimal'.

A.2 An optimal basis for octagonal NMHV amplitudes
Unlike for seven particles (which is somewhat anomalously nice), there is no easy way to select, from the 56 different R-invariants (7 cyclic classes), non-redundant classes spanning the 35 = (7 choose 4) independent superfunctions. The situation is not obviously much improved if we include the cyclic singlet A_8, or other lower-point tree-level amplitudes.
From this list, how are we to choose a basis of length 35? Of the cyclic classes generated by those in (A.10), all but two represent classes of length 8. The exceptions are A_8 and (15)^c_8 = A(234678), which forms a class of length 4. We are virtually forced to consider the inclusion of this length-4 class into our basis, as any other choice would lead to even greater redundancy.
Including A_8, the four cyclic images of (15)^c_8 = A(234678), and some other choice of four length-8 cyclic classes from among those generated by (A.10), we would have 37 superfunctions in all. In the best case, the two redundancies could be captured entirely by the length-four class (as 2 divides 4 nicely), with the rest independent. It turns out that there are 172 such choices available. The basis choice we describe presently is the one in which the 'elimination rules' of all other superfunctions (in the sense of (A.8)) involve the shortest expressions.
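The counting in this discussion can be checked by brute force. The sketch below assumes only the standard facts that an R-invariant depends on five of the eight twistors and that a six-point tree such as (15)^c_8 = A(234678) is labeled by the two twistors it omits; the helper names are hypothetical.

```python
from itertools import combinations
from math import comb

N = 8  # number of particles


def rotate(labels, k=1):
    """Cyclically shift a set of labels in {1, ..., N} by k."""
    return frozenset((i - 1 + k) % N + 1 for i in labels)


def cyclic_classes(subsets):
    """Partition a collection of label sets into orbits under cyclic rotation."""
    remaining = set(subsets)
    classes = []
    while remaining:
        seed = next(iter(remaining))
        orbit = {rotate(seed, k) for k in range(N)}
        classes.append(orbit)
        remaining -= orbit
    return classes


# R-invariants: one for each choice of 5 of the 8 twistors.
r_invariants = [frozenset(c) for c in combinations(range(1, N + 1), 5)]
r_classes = cyclic_classes(r_invariants)

# Six-point trees (ij)^c: labeled by the two omitted twistors.
six_point = [frozenset(c) for c in combinations(range(1, N + 1), 2)]
six_classes = cyclic_classes(six_point)

print(len(r_invariants), len(r_classes))    # 56 7
print(comb(N - 1, 4))                       # 35
print(sorted(len(c) for c in six_classes))  # [4, 8, 8, 8]
```

The single short orbit is the one containing (15)^c_8, which is why the length-4 class plays a distinguished role in the basis choice: 1 + 4 + 4·8 = 37 candidate superfunctions against 35 independent ones.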
The basis we propose can be defined first in terms of the 37 functions generated by the seeds There are a few things to note about these decompositions. As always, other superfunctions are eliminated according to rotations of (A.12). In addition, there are two aspects of (A.12) regarding d^0_i that deserve comment. First, note that the only superfunction from (A.11) whose decomposition involves d^0_i (except those of the d^0_i's) is (135)^c_8, indicated with a '*' in (A.12). The second aspect to notice about the elimination rules (A.12) is that the last two are for d^0_3 and d^0_4, which are generated by our initial seeds upon rotation. As evidenced by the simple fact that they have elimination rules (and also that 35 = 37 − 2), these two will not be basis elements. Moreover, it is easy to see that These, combined with the other basis elements in (A.11), non-redundantly span the space of 35 independent superfunctions in terms of four cyclic classes of length 8, one of length 2, and one of length 1. This is our proposed basis for eight-point NMHV amplitudes. In this basis, the eight-point NMHV ratio function may be represented as (As with seven points, please notice that we are adding all of these terms (8-fold-) cyclically. This has the admittedly unfortunate effect of causing V^{(0)}_f to be 1/8; it will also require us to account for the over-counting in V For reference, at one loop, these are easy to write explicitly [54,75]. They are given in (A.16). We have written these functions in terms of the 12 multiplicatively independent dual-conformally invariant cross-ratios, Notice that V_d is zero at one loop. At two loops, it is not hard to confirm that (A.18)

B Some notions of algebraic number theory
When working with symbols, it is valuable to be able to put them into a canonical form, for instance to decide whether two symbols are equal. As an example, many of the amplitudes that have been computed in planar sYM to date can be uniquely expressed in terms of a known set of Plücker coordinates. For more complicated amplitudes, a basis of symbol letters is not generally known. In such cases, we can simply factorize each symbol letter, as long as this factorization is unique. It is easy to see that factorization will give rise to a unique expression when all symbol letters are integers. However, this is not automatic once algebraic roots are introduced. Consider, for instance, the situation where √−5 appears in some letters. The number 9 then has two 'factorizations':
9 = 3 · 3 = (2 + √−5)(2 − √−5),
where the second factorization of 9 is possible when 9 is viewed as an element of Z[√−5]. By Z[√−5] we denote the set of numbers of the form a + b√−5 for a, b ∈ Z, with the obvious addition and multiplication.⁷ This set of numbers, with these operations, defines a ring.
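The two factorizations of 9 can be checked mechanically. The following is a minimal sketch, assuming elements a + b√−5 are stored as integer pairs; the class name `ZSqrtM5` and its fields are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZSqrtM5:
    """Element a + b*sqrt(-5) of Z[sqrt(-5)]; sqrt(-5) is a formal symbol r with r*r = -5."""
    a: int
    b: int

    def __add__(self, other):
        return ZSqrtM5(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b r)(c + d r) = (ac - 5bd) + (ad + bc) r
        return ZSqrtM5(self.a * other.a - 5 * self.b * other.b,
                       self.a * other.b + self.b * other.a)


three = ZSqrtM5(3, 0)
x = ZSqrtM5(2, 1)    # 2 + sqrt(-5)
y = ZSqrtM5(2, -1)   # 2 - sqrt(-5)
nine = ZSqrtM5(9, 0)

# Both products should reproduce 9.
print(three * three == nine, x * y == nine)
```

Treating √−5 as a formal symbol, rather than as a floating-point complex number, keeps the arithmetic exact.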
From the example above it looks like 9 can be factorized in two different ways, but perhaps unique factorization can still be salvaged if some of the factors can be further factored. It turns out that this is not what is happening here.
Before clarifying what is happening, we need to make a distinction between irreducible and prime elements of a ring R. First, we introduce the notion of unit. The elements of R which have multiplicative inverses are the units of R (denoted by U(R)). For the integers, the units are ±1. An element x ∈ R is irreducible if it is not a unit and cannot be written as a product of two elements of R, neither of which is a unit. Finally, a nonzero, non-unit element x ∈ R is prime if, whenever x divides ab for some a, b ∈ R, it divides a or b. For the integers there is no distinction between primes and irreducibles, but in general rings there is.
We now return to the above example: is 3 a prime in Z[√−5]? We can show that it is not. If it were prime, it would follow from the fact that 3 divides (2 + √−5)(2 − √−5) that it also divides either 2 + √−5 or 2 − √−5. But 3 divides a + b√−5 only if it divides both a and b, which is not the case here.
Is 3 irreducible instead? One can show that the units of Z[√−5] are ±1. It is then a simple exercise to show that 3 is indeed irreducible (just use the definition and show that there are no suitable solutions). So the hope that perhaps each of the terms in the factorization can be factorized further to a prime decomposition which is the same in the l.h.s. and r.h.s. is not fulfilled. We conclude that Z[√−5] is not a UFD (unique factorization domain).
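One standard route to these statements, which is not necessarily the one the text has in mind, uses the multiplicative norm N(a + b√−5) = a² + 5b². Units are exactly the elements of norm 1, and a proper factor of 3 would need norm 3. The sketch below searches for such elements by brute force; the function names are hypothetical.

```python
from math import isqrt


def norm(a, b):
    """Norm of a + b*sqrt(-5): its product with the conjugate a - b*sqrt(-5)."""
    return a * a + 5 * b * b


def elements_of_norm(n):
    """All integer pairs (a, b) with a^2 + 5 b^2 == n."""
    bound = isqrt(n)
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if norm(a, b) == n]


def divides(c, d, a, b):
    """True if c + d*sqrt(-5) divides a + b*sqrt(-5) in Z[sqrt(-5)] (c, d not both zero)."""
    n = norm(c, d)
    # (a + b r) * (c - d r) = (ac + 5bd) + (bc - ad) r must be divisible by n.
    return (a * c + 5 * b * d) % n == 0 and (b * c - a * d) % n == 0


units = elements_of_norm(1)               # elements of norm 1 are the units
proper_factors_of_3 = elements_of_norm(3) # N(3) = 9, so a proper factor has norm 3

print(units)
print(proper_factors_of_3)
print(divides(3, 0, 2, 1), divides(3, 0, 2, -1), divides(3, 0, 9, 0))
```

The empty list of norm-3 elements shows that 3 is irreducible, while the two failed divisibility checks show that 3 divides the product (2 + √−5)(2 − √−5) = 9 without dividing either factor, so it is not prime.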
For this reason, it may look like there is no way to achieve unique factorization. But if we enlarge our perspective a little, we can recover this desired property. We will now explain how to do this. The construction we will describe is possible for rings which are Dedekind domains. Let us start with the familiar case of integers. In this case, to a prime p we associate the set of all its multiples. This set has two important properties. First, it is closed under addition; second, multiplying any of its elements by any integer lands us back in the same set. This is just the definition of an ideal of the ring of integers Z. For the case of a prime we obtain a prime ideal, but the construction works in general. The set of multiples of p is denoted by (p). This is also called the ideal generated by p.
⁷ We should not think of √−5 as being a complex number, but rather as an abstract symbol whose property is that it squares to −5. In fact, Z[√−5] can be embedded in the complex numbers in two ways, by sending √−5 to each of the two square roots of −5 in C.
The notion of divisibility can be translated to the language of ideals: we say that a divides b if (b) ⊆ (a). It is easy to check that this corresponds to the usual notion of divisibility for the integers. Now that we have expressed divisibility in terms of ideals, we may consider ideals generated by more than one element. The ideals generated by one element, such as (p), are called principal ideals. An ideal generated by two elements a and b is denoted by (a, b); as a set, it contains the linear combinations ma + nb where m, n range over the ring. This satisfies all the properties of an ideal. Ideals can be multiplied; we have (p)(q) = (pq) and (a, b)(c, d) = (ac, ad, bc, bd), and the pattern continues in the obvious way for ideals with more generators. These ideals have some pretty obvious properties:
Using these rules we can compute the following products, which will be useful momentarily:
We also have (3, 1 + √−5)² = (2 − √−5), and likewise (3, 1 − √−5)² = (2 + √−5). Now that we have made the transition from elements of a ring to the principal ideals they generate, we can explain the change of perspective mentioned above. Instead of considering only principal ideals, we consider ideals generated by any number of generators. Indeed, now we can refine the factorization as follows:
To finish, we should show that the ideals appearing in this factorization are prime. We will not do this explicitly here. This works in general. The factorization is unique in the following sense: any nonzero ideal can be decomposed as a product of prime ideals in exactly one way, up to ordering. Finally, we have achieved unique factorization, but at the cost that the factors are some abstract, less familiar quantities.
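The ideal products in this example can be verified with elementary lattice arithmetic. The sketch below is an illustration under stated assumptions, not the procedure used in the paper. It treats an ideal of Z[√−5] as a rank-2 lattice in Z², spanned over Z by each generator g and by g·√−5, and reduces it to a canonical (Hermite-type) basis so that two ideals can be compared. All function names are hypothetical.

```python
from math import gcd


def mul(x, y):
    """Product of x = (a, b) and y = (c, d), each standing for a + b*sqrt(-5)."""
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)


def ideal(generators):
    """Canonical Z-basis ((g, t), (0, h)) of the ideal generated by the given elements.

    Assumes at least one generator is nonzero, so that the lattice has rank 2.
    """
    # As a Z-module, the ideal is spanned by each generator and its sqrt(-5) multiple.
    vectors = []
    for a, b in generators:
        vectors += [(a, b), (-5 * b, a)]
    pivot, h = (0, 0), 0
    for x, y in vectors:
        px, py = pivot
        while x != 0:                  # Euclid's algorithm on the first coordinates
            q = px // x
            px, py, x, y = x, y, px - q * x, py - q * y
        pivot = (px, py)
        h = gcd(h, y)                  # the leftover vector is (0, y)
    g, t = pivot
    if g < 0:
        g, t = -g, -t
    return ((g, t % h), (0, h))


def ideal_product(gens1, gens2):
    """Generators of the product ideal: all pairwise products of generators."""
    return [mul(x, y) for x in gens1 for y in gens2]


P = [(3, 0), (1, 1)]      # (3, 1 + sqrt(-5))
Pbar = [(3, 0), (1, -1)]  # (3, 1 - sqrt(-5))
P2 = ideal_product(P, P)
Pbar2 = ideal_product(Pbar, Pbar)

print(ideal(ideal_product(P, Pbar)) == ideal([(3, 0)]))    # P * Pbar = (3)
print(ideal(P2) == ideal([(2, -1)]))                       # P^2 = (2 - sqrt(-5))
print(ideal(Pbar2) == ideal([(2, 1)]))                     # Pbar^2 = (2 + sqrt(-5))
print(ideal(ideal_product(P2, Pbar2)) == ideal([(9, 0)]))  # P^2 * Pbar^2 = (9)
```

Under these conventions the check indicates that (3, 1 + √−5)² is the principal ideal (2 − √−5), with (2 + √−5) arising as the square of the conjugate ideal (3, 1 − √−5). Both factorizations of 9 then refine to the same product (9) = (3, 1 + √−5)² (3, 1 − √−5)².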
An algebraic number field is a finite extension of Q constructed as follows. Consider a root ρ of an irreducible degree-n polynomial with rational coefficients. Then, Q[ρ] is the ring formed by rational linear combinations of powers 0 through n − 1 of ρ (higher powers can be reduced). We also define K = Q(ρ) as the field generated by ρ (whose elements are ratios of elements of Q[ρ]). Inside K we find the algebraic integers O_K, which are the elements of K whose minimal polynomial is monic⁸ and has integer coefficients. It is a theorem that the ring of algebraic integers O_K of an algebraic number field K is a Dedekind domain, so its nonzero ideals factorize uniquely into prime ideals.
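As a worked instance of this definition, the following short derivation sketches why, for the running example K = Q(√−5), the algebraic integers are exactly the ring Z[√−5] used above. The symbols m, n and the polynomial name m_x are introduced here only for illustration.

```latex
% Illustration only: K = \mathbb{Q}(\sqrt{-5}), matching the running example.
Let $x = a + b\sqrt{-5}$ with $a, b \in \mathbb{Q}$ and $b \neq 0$. Its minimal polynomial is
\begin{equation*}
  m_x(t) = \bigl(t - a - b\sqrt{-5}\bigr)\bigl(t - a + b\sqrt{-5}\bigr)
         = t^2 - 2a\,t + \bigl(a^2 + 5b^2\bigr),
\end{equation*}
so $x \in \mathcal{O}_K$ if and only if $2a \in \mathbb{Z}$ and $a^2 + 5b^2 \in \mathbb{Z}$.
Writing $m = 2a$ and $n = 2b$, these conditions read
\begin{equation*}
  m \in \mathbb{Z}, \qquad m^2 + 5n^2 \in 4\mathbb{Z}.
\end{equation*}
Then $5n^2 \in \mathbb{Z}$ with $n$ rational forces $n \in \mathbb{Z}$, because $5$ is squarefree.
Since $5 \equiv 1 \pmod 4$ and squares are $0$ or $1$ modulo $4$,
\begin{equation*}
  m^2 + 5n^2 \equiv m^2 + n^2 \equiv 0 \pmod 4
  \quad\Longrightarrow\quad m \text{ and } n \text{ are both even},
\end{equation*}
hence $a, b \in \mathbb{Z}$ and $\mathcal{O}_K = \mathbb{Z}[\sqrt{-5}]$.
```

The case b = 0 is immediate, since a rational number is an algebraic integer only if it is an ordinary integer.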
Some of the letters we would like to factorize are not actually algebraic integers, so we cannot construct an ideal they generate inside O_K. Nevertheless, we can construct a fractional ideal instead, which is a slight generalization of the notion of ideal. We will not give a full definition here, but the reader who wants to have an intuition for what a fractional ideal is can think of (p/q) · Z as a fractional ideal of Z. In other words, we also allow denominators. Now the strategy for computing relations between several elements of a number field K should be clear. For each of these elements we compute the prime ideal decomposition of the principal fractional ideal it generates. The exponents form a matrix with integer entries whose rows are labeled by the elements of K and whose columns are labeled by the prime ideals. Every element of the left kernel of this matrix yields a multiplicative relation between the given elements of K, up to multiplication by a unit of O_K (the corresponding product of powers generates the unit ideal).
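The exponent-matrix strategy can be illustrated in the simplest setting K = Q, where the prime ideals are generated by ordinary primes and fractional ideals correspond to positive rationals. This is a toy sketch under that assumption, with hypothetical function names and made-up sample letters; in a genuine number field the factorization step would require prime ideal decomposition from a computer algebra system, and relations would hold only up to units.

```python
from fractions import Fraction
from math import gcd


def prime_exponents(x):
    """Exponents of the primes in a positive rational x, found by trial division."""
    exps = {}
    for part, sign in ((x.numerator, 1), (x.denominator, -1)):
        p = 2
        while part > 1:
            while part % p == 0:
                exps[p] = exps.get(p, 0) + sign
                part //= p
            p += 1
    return exps


def left_kernel(matrix):
    """Integer basis of {n : n . matrix = 0}, via row reduction of the transpose."""
    nrows, ncols = len(matrix), len(matrix[0])
    # Transpose: one equation per prime, one unknown per letter.
    eqs = [[Fraction(matrix[i][j]) for i in range(nrows)] for j in range(ncols)]
    pivots, r = [], 0
    for c in range(nrows):
        pr = next((i for i in range(r, len(eqs)) if eqs[i][c] != 0), None)
        if pr is None:
            continue
        eqs[r], eqs[pr] = eqs[pr], eqs[r]
        piv = eqs[r][c]
        eqs[r] = [v / piv for v in eqs[r]]
        for i in range(len(eqs)):
            if i != r and eqs[i][c] != 0:
                f = eqs[i][c]
                eqs[i] = [vi - f * vr for vi, vr in zip(eqs[i], eqs[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(nrows) if c not in pivots):
        vec = [Fraction(0)] * nrows
        vec[free] = Fraction(1)
        for row, pc in enumerate(pivots):
            vec[pc] = -eqs[row][free]
        # Clear denominators so that the exponents are integers.
        scale = 1
        for v in vec:
            scale = scale * v.denominator // gcd(scale, v.denominator)
        basis.append([int(v * scale) for v in vec])
    return basis


# Sample letters (made up for illustration); rows = letters, columns = primes.
letters = [Fraction(2, 3), Fraction(3, 5), Fraction(2, 5), Fraction(7)]
factored = [prime_exponents(x) for x in letters]
primes = sorted({p for f in factored for p in f})
matrix = [[f.get(p, 0) for p in primes] for f in factored]

relations = left_kernel(matrix)
for n in relations:
    product = Fraction(1)
    for x, e in zip(letters, n):
        product *= x ** e
    print(n, product)
```

Each printed exponent vector n encodes a relation of the form ∏ xᵢ^nᵢ = 1 among the sample letters; here the only such relation expresses that (2/3)(3/5) = 2/5.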
Historically, it was Kummer who started developing these ideas in connection with Fermat's conjecture. His ideas were refined and generalized by Dedekind, Hilbert, Noether and many others. A good reference and resource for the material described in this appendix is [76].
Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.