Efficient Calculation of Crossing Symmetric BCJ Tree Numerators

In this paper, we propose an improved method for directly calculating double-copy-compatible tree numerators in (super-)Yang-Mills and Yang-Mills-scalar theories. Our new scheme gets rid of any explicit dependence on reference orderings, restoring a form of crossing symmetry to the numerators. This in turn improves the computational efficiency of the algorithm, allowing us to go well beyond the number of external particles accessible with the reference order based methods. Motivated by a parallel study of one-loop BCJ numerators from forward limits, we explore the generalization to include a pair of fermions. To improve the accessiblity of the new algorithm, we provide a Mathematica package that implements the numerator construction. The structure of the computation also provides for a straightforward introduction of minimally-coupled massive particles potentially useful for future computations in both classical and quantum gravity.


Introduction
The efficient calculation of scattering amplitudes at high precision relies on the discovery of novel methods and structures, many of which are shared between disparate theories. While these structures are often highly obscured by traditional Lagrangian/Feynman diagram approaches, they tend to lead to new formulations of amplitudes in quantum field theories. One powerful structure in tree amplitudes is the Bern-Carrasco-Johansson (BCJ) color-kinematics duality [1][2][3], which allows a large class of theories to be constructed as the double copy of simpler theories. This duality has been used to push the envelop on accessible calculations in various theories of quantum gravity [4][5][6][7] by "squaring" (super-)Yang-Mills (sYM) theory. Additionally, BCJ-compatible trees have been used in the blossoming application of scattering amplitudes techniques to black hole physics in classical gravity [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23]. Unfortunately, most textbook methods for calculating tree amplitudes do not directly generate color-kinematics dual representations.
On the other hand, CHY formalism provides a way to compute local numerators in the DDM basis through eq. (1.1). Although the possibility was recognized shortly after the discovery of CHY [26], the explicit construction was highly nontrivial and came much later. 3 The computation for up to three gluons and two traces in Yang-Millsscalar (YMS) theory [57] was done in [58][59][60][61]. Shortly after, a recursive expansion for the single-trace sector of YMS was proposed based on gauge invariance [62] and a direct ansatz construction [63], which was soon proved and refined by expanding the CHY integrand recursively [64]. Follow-up works further extended the algorithm to pure Yang-Mills [65] and multi-trace sector [66]. A crucial feature common to all these constructions is that the DDM basis numerators are given by assigning kinematic factors to spanning trees on n vertices -connected graphs on n vertices with no cycles.
While the spanning tree construction is generic, the need for a reference ordering (RO) causes the numerators to be a function of more than just the particle ordering, and thus not manifest crossing symmetry, N RO (1, β, n) = N RO (1, 2, . . . , n) 2→β 2 ,3→β 3 ,...,n−1→β n−1 . (1.5) Although this is not in line with eq. (1.4), the full amplitude remains permutation invariant due to the nontrivial kernel of m(1, α, n|1, β, n), i.e. the BCJ amplitude relations. Crossing symmetry can be recovered by averaging over all the ROs (ROaverage). However, this prescription introduces a factorial computational complexity, which is highly undesired. We note that the mathematical origin of this RO dependence roots in the Laplace expansion of determinants and Pfaffians: when one performs the expansion along a chosen row, the manifest exchange symmetry between this row and the rest is broken. Different choices lead to different patterns of breaking the symmetry. Alternatively, the prescriptions presented in [62,[64][65][66] can be derived from conventional string theory [32,34,61] and differential operators [67]. The stringy 2 The crossing symmetry in 1 and n are due in part to the BCJ relations. Since the N are for half-ladders, one might expect that exchanging either 1 ↔ β 1 or n ↔ β n−2 in N (1, β, n) would yield a minus sign. However, the interplay between color and kinematic jacobis allows this property of the half-ladders to be broken or restored as needed. Of coursing, imposing these additional crossing symmetry will further eliminate the gauge freedom in these numerators. On the other hand, when 1 and n are a different species from the rest, crossing symmetry involving them is necessarily absent.
3 See Ref. [56] for a generic construction of non-local BCJ numerators using CHY integration rules.
interpretation of the RO is a priority list for breaking the subcycles formed by worldsheet variables using integration-by-parts relations. While the lack of crossing symmetry does not pose a problem when studying treelevel amplitudes, it can be challenging to deal with when using trees as part of loop calculations. The inability to relabel numerators prevents systematic approaches using graph isomorphisms that might generalize to loops. However, the CHY formalism has been extended to handle one-loop directly. This representation of loop amplitudes can be derived from ambitwistor string models based on a nodal Riemann sphere [37,[68][69][70], which in turn allows direct calculation from an (n+2)-point tree-level amplitude through a forward limit [71,72]. Following this approach, one can avoid the difficulty of dealing with contributions from different spin structures at one-loop level. The resultant loop integrands have the prominent feature that they contain linearized propagators rather than the standard Feynman-type ones. Moreover, kinematic numerators satisfying BCJ relations will automatically lead to such numerators at one-loop, which can be directly fed into a double-copy construction for gravity theories [73,74]. These method of obtaining BCJ-compatible one-loop numerators have been explored by various works in literature [73][74][75][76].
Thus, high multiplicity loops in many theories are accessible from high multiplicity trees via CHY and forward limits. The first technical difficulty to overcome towards this goal is the need to efficiently produce tree-level BCJ numerators with very high multiplicity. As one of the main goals of this paper, we develop an improved algorithm to arrive at crossing symmetric numerators without performing the RO-average. In other words, the new method gets rid of RO-dependence completely. We will first present the prescription for gluon amplitudes coupled to at most two fermions. 4 We then generalize it to multi-trace sector of YMS theory. Our algorithm will automatically respect the crossing symmetry among gluons and scalar traces, all of which are not manifest when a RO is required. We have checked that our algorithm can provide complete ten-point results within minutes on a laptop. In a forthcoming paper [78], we will feed these tree-level numerators into the forward-limit machinery to study one-loop numerators with various amount of supersymmetry.
Additionally, since our formulation is dimensionally agnostic, it is possible to embed the numerators in higher dimensions in such a way as to introduce mass to specific particles via dimensional compactification. In particular, it is straightforward to perform this embedding such that only particles 1 and n obtain a mass. These types of diagrams are related to those needed for the recent 3PM (third post-Minkowski) calculation of Ref. [21] as input to the unitarity cuts used for constructing the two-loop classical amplitude.
Our paper is organized as follows. In section 2, we review the general structure of the CHY representation of integrands. This section specifically focuses on the initial expansion of Pf into pure Pf weighted by kinematic and Parke-Taylor factors. We cover pure gluon, two-fermion, and multi-trace YMS. Section 3 begins by reviewing the spanning tree method for CHY (half-)integrand construction. It then presents our derivation of a recursive restructuring of the spanning tree calculation which implicitly performs an average over reference ordering, as well as providing some examples of the new technique. Finally, in section 4, we briefly discuss introducing masses to our numerators. Appendix A provides our explicit conventions for the CHY matrices and normalizations. Appendix B describes our Mathematica package that implements the spanning tree numerator construction.

CHY Integrands and Baseline Expansion
In the CHY representation, n-point tree-level color-ordered amplitudes are expressed as an integral over the moduli space of n-punctured Riemann sphere, where the delta functions in the measure localize the integral to the (n − 3)! solutions of the scattering equations [24][25][26], The color ordering is encoded in the Parke-Taylor factor PT(ρ), which is a weighttwo function under the SL(2, C) transformation σ i → aσ i +b cσ i +d with ad − bc = 1. The (half-)integrand I tree n depends on the kinematic data (momentum and polarization). It is constructed as an SL(2, C) weight-two function as well such that the full integration form is SL(2, C) invariant. The SL(2, C) gauge redundancy, combined with the where (σ 1 , σ n−1 , σ n ) = (0, 1, ∞). The normalization N contains the coupling constants of the theory, which we match against the explicit Lagrangian in appendix A.

(Super-)Yang-Mills
We first focus on Yang-Mills theory. The CHY integrand for pure gluon amplitudes is The full definition of the reduced Pfaffian Pf (Ψ) is given in appendix A. One can rewrite Pf (Ψ) as a baseline expansion on the support of the scattering equations [62,65,79], where the first summation is over all the bi-partitions of the set {2, 3, . . . , n − 1}, and the second summation is over all the permutations of the set B. 5 The matrix Ψ G is a submatrix of Ψ in which the rows and columns are restricted to the set G (see appendix A for more details). We note that both G and B can be empty. In particular, when G = ∅, Pf(∅) = 1. We call (1, ρ, n) a baseline and the baseline factor W gluon is given by is the linearized field strength for gluon i. Our nomenclature is inspired by the spanning tree expansion scheme for Pf , which will be discussed in section 3. From the ambitwistor string point of view, singling out gluon 1 and n amounts to assigning ghost picture −1 to the vertex operators in the RNS formalism [35]. The gauge invariance of these two particles are not manifest, but can be recovered by using the scattering equations.
Interestingly, the CHY integrand with exactly two external fermions takes a similar baseline expansion form. If we fix the fermions to be particle 1 and n, the only difference from eq. (2.5) is in the baseline factor [78], There exist two forms of the baseline factor W 2f , equivalent on the support of the scattering equations and gauge transformations. In D = 10, they are given as 6 where in the first equation, is an arbitrary gluon and the baseline factor vanishes if / ∈ B. Eq. (2.8b) has the benefit of being manifestly crossing symmetric and gauge invariant in all of the gluons. In both expressions, the gamma-matrix convention is {γ µ , γ ν } = 2η µν , and The Weyl-Majorana spinor χ i is the solution to the equation of motion / k i χ i = 0, and the spinor ξ i is related to χ i through / k i ξ i = χ i . The expressions for D < 10 can be obtained through a dimensional reduction. In particular, for D = 4 we can use where α,α = 1, 2 are left-and right-handed Weyl spinor indices, I = 1, 2, 3, 4 is the SU (4) R-symmetry index and the ⊕ is a formal sum over the representation spaces. In D = 6, we can similarly construct χ and ξ using 6D spinor helicity variables [80] where the spinors i A and i A are in the fundamental and anti-fundamental representation of SU (4), and a,ȧ = 1, 2 are the SU (2) × SU (2) little group indices. In both eqs. (2.10) and (2.11), q is a reference vector that obeys q · k i = 0 but otherwise is arbitrary. The final amplitudes do not depend on the choice of q.
The baseline expansion brings the pure gluon and two-fermion CHY integrand into a unified form. In order to obtain the DDM basis numerators N (1, β, n), we need a systematic way to expand Pf(Ψ G ). In section 3, we will discuss how this can be done algorithmically using spanning trees. 6 The two expressions originate from the two ways to assign ghost picture to the vertex operators in the RNS formalism of ambitwistor string: eq. (2.8a) comes from assigning ghost picture −1/2 to both fermion vertex operators and ghost picture −1 to the gluon vertex operator m, while eq. (2.8b) comes from assigning ghost picture −1/2 and −3/2 to the fermion vertex operators and ghost picture zero to all the gluons. We have found the two expressions useful for studying different aspects of forward limits, see Ref. [78] for details.

Yang-Mills-scalar
Both eqs. (2.5) and (2.7) can be interpreted as expanding the pure-gluon/two-fermion integrand in terms of those for single-trace YMS amplitudes. If G is the set of gluons and (1, τ, n) is the (second) color ordering of the bi-adjoint scalars, we can write the half-integrand as (2.12) which corresponds to choosing the baseline factor to be W (1, τ, n) = δ τ,ρ in eq. (2.5).
In fact, there exists a similar baseline expansion for the multi-trace YMS integrand [27], where G denotes the set of gluons and {τ 1 , . . . , τ m } the scalar color traces. Since in many of our future manipulations, scalar traces are treated as a single object and have similar behavior as a gluon, we thus define for convenience the set where each entry t i is either a scalar trace or a single gluon. The matrix Π is 2N dimensional, in which the entries are labeled by gluons and scalar traces. The explicit form of Π is given in appendix A. When m = 0 and 1, we have such that eq. (2.13) reduces to the single trace and pure gluon integrand respectively. For the multitrace integrand eq. (2.13), the baseline expansion involves integrands with total number of gluons and trace reduced. It applies when the end points of the baseline, chosen as 1 and n, do not belong to the same scalar trace. We can schematically write the expansion of I YMS where we assume that 1 ∈ t 1 and n ∈ t N . At this point, each term in the summation of eq. (2.16) is a multi-trace integrand with 1 and n in the same trace (1, ρ, n), where ρ is a permutations of {t 1 , B, t N } with 1 ∈ t 1 and n ∈ t N fixed at the ends. Meanwhile, the set A is given by where α i 's are scalar traces and G a subset of gluons. In section 3.4, we will discuss the spanning tree algorithm to evaluate such integrands. We now introduce some essential tools to specify the baseline factor W MT (1, ρ, n) in the second row of eq. (2.16). Treating each scalar trace in {t 1 , B, t N } as a single object, we can define a sub-ordering (a i , where a i and b i are respectively the first and last element of t i that appear in the baseline. If t i is a scalar trace, then (a i , where the set X i and Y i can be obtained by cyclically rotating a i to the first element. They are exactly the orderings generated by a Kleiss-Kuijf relation [81], and hence the name. The sign generated by this operation will be included in the baseline factor W MT (1, ρ, n). Note that for a gluon, we always This function is nonzero only for KK-compatible permutations of t i .
In addition, we define the ordering function O that rearranges the baseline (1, ρ, n) by comparing the first element of the |B| We can also define the coarse-grained ordering function that extracts a permutation formed by the |B| + 2 elements in {t 1 , B, t N }, which will be useful later in section section 3.4. In fact, we must have λ 0 = t 1 and a 0 = 1 by construction. In particular, we will be interested in those baselines that are O-invariant, For these cases, we must also have λ |B|+1 = t N and b |B|+1 = n. In other words, in O-invariant baselines every scalar trace is consecutive in the sense that between a i and b i there are no elements of other traces or gluons. To characterize these permutations, we introduces a new object whereλ i is a proper subset of λ i and it can be empty. Thus the O-invariant baselines have the form 26) and are characterized by B (1,ρ,n) (λ i |λ i+1 ) = λ i for λ i ∈ {λ 0 , λ 1 , . . . , λ |B| }. We must have t N as the last consecutive block because n ∈ t N is the endpoint of the baseline. If there exists a λ j after t N , then we must have B (1,ρ,n) (t N |λ j ) t N and thus the O-invariant condition is violated.
With the above preparations, we now reach the punchline of the multi-trace baseline expansion: the baseline factor W MT (1, ρ, n) is nonzero if and only if where |B g | is the number of gluons on the baseline, and Thus, the baseline factor for a generic permutation can be written as is the coarse-grained baseline. The B (1,ρ,n) and sgn in eq. (2.28) ensure any permutations that are not of the form of eq. (2.27) are immediately set to 0. With the help of this zeroing, the summation in the second line of eq. (2.16) automatically reduces to Finally, we remark on the case that both the baseline endpoints 1 and n belong to the same trace, say τ 1 , which is left out by eq. (2.16). Starting from eq. (2.13), we simply write by using Kleiss-Kuijf relations. Here, we can naturally reduce the definition of W MT given in eq. (2.27) to the single-trace baseline, We note that since the sgn function is only nonzero for KK-compatible baselines, we can trivially extend the summation range in eq. (2.30) to S τ 1 \{1,n} . Although this rewriting does not reduce the total number of scalar traces and gluons, it puts the CHY integrand into the same form as eq. (2.16), from which the half-ladder numerators can be obtained through a unified spanning-tree algorithm given in section 3.4.

DDM basis numerators from CHY integrands
The baseline expansion is the first step of our systematic approach to obtain the DDM basis numerators N (1, β, n) from CHY integrands. We first focus on eqs. (2.5) and (2.7), the baseline expansion for the pure gluon and two-fermion integrand, both of which are a linear combination of single trace integrands PT(1, ρ, n) Pf(Ψ G ). Using a different type of recursive expansion, we can write them in terms of the single trace integrands involving fewer gluons [62,64], On the right hand side, we pick an arbitrary gluon l ∈ G and sum over all the bipartitions of the rest. As before, both the set A and B can be empty. We call P(l, α, ρ i ) a path factor starting from l and terminating at ρ i on the baseline (we define ρ 0 = 1), with the intermediate points a permutation α of the set B, The nomenclature will be clear in the following section in which we represent this expansion in terms of spanning trees. Finally, which is a linear combination of Parke-Taylor factors of length n−|A|. The final result of this recursive expansion is the Cayley functions of all the (n−1)-point spanning trees with a common path (1, ρ), each of which is a linear combination of DDM basis Parke-Taylor factors [82]. It is important to note that in eq. (3.1) the gauge invariance and the crossing symmetry involving the leg l is not manifest. Since one needs to make such choices at each step of the recursion, the final result in general will not display any explicit gauge invariance or crossing symmetry.

Spanning Trees and Reference Orderings
With the help of eq. (3.1), one can further expand eqs. (2.5) and (2.7) by Parke-Taylor factors in the DDM basis, where the coefficients N (1, β, n) are the DDM basis numerators. They are associated with half-ladder diagrams, and are the master numerators for the BCJ tree relations. A given numerator N (1, β, n) can be constructed by assigning kinematic factors to all the (n − 1)! increasing trees consistent with the label ordering [65], where the increasing trees are defined as Thus by construction 1 is always the root of T while n is always a leaf. Note that a tree T can belong to several IT (1, ρ, n). Some examples of trees and their IT membership can be seen in figure 1. Then, for a given increasing tree T , the kinematic factor N (T ) is evaluated by the following steps: 1. Identify the baseline (1, ρ, n), which is the path n → ρ |ρ| → · · · → ρ 1 → 1 in T .
Note that ρ = ∅ is allowed.
2. Split the rest of T into ordered splitting paths OS(R) based on a reference ordering R ∈ S n−2 : (a) draw a path from the first element of R towards the baseline, which will either end on the baseline or a previously identified ordered splitting path. (b) move to the next element in R that is not traversed yet, repeat the process until all vertices of T are traversed. For example, 3. Assign kinematic factors to each path baseline n → ρ |ρ| → · · · → ρ 1 → 1 : is the product of all of these paths, Continuing the example, the ordered splitting in eq. (3.7) leads to W gluon (1, 2, 5)P(3, 1)P(4, 3) = W gluon (1, 2, 5)(ε 3 ·k 1 )(ε 4 ·k 3 ) . (3.10) The reference ordering R is just a priority list of choosing the special gluon in the recursive expansion (3.1) when multiple choices are present. For simplicity we keep it the same for the evaluation of all the DDM basis numerators. We note that splitting a spanning tree into paths can be viewed as a graphic way to derive the ordered splitting based on a reference order; the algebraic method is first given in [62]. While this approach is a fully constructive method of building BCJ-compatible numerators, any crossing symmetry is broken in each of the DDM basis numerators, since the OS(R) of different color ordering are not crossing symmetric. This leads to one of two computation difficulties when calculating the full integrand, either: • each half-ladder numerator in the DDM basis must be computed individually, or • a given half-ladder numerator must be averaged over all the reference orderings (RO-average) to make it crossing symmetric in {2, . . . , n − 1}, which can then be freely relabeled to obtain the other half-ladder numerators.
Both approaches require an O (n−2)! computation (assuming that relabeling takes negligible time comparing with computing a numerator) that must be performed after constructing a half-ladder numerator, which itself requires an O (n−1)! computation, to arrive at the full DDM basis. Thus the computational complexity for the full DDM basis numerator is O (n−1)! × (n−2)! for the RO method.

Recursive Constructions without Reference Orderings for Pure Gluon and Two-fermion Numerators
The computational complexity of the reference-ordering approach urges a more efficient construction of the kinematic factors. The fact that the crossing symmetry among {2, . . . , n−1} is recovered after RO-average hints that there should exist a new approach that solely depends on the structure of spanning trees and does not involve reference orderings at all. In this section, we present such an improved algorithm. The pure gluon and two-fermion half-ladder numerators built through our new approach will be naturally crossing symmetric in the legs {2, . . . , n − 1}, removing the need for the RO-average and reducing the complexity of construction: The final result is equivalent to the RO-average, but we only need to evaluate each spanning tree once. We begin by considering how to build the RO-average into simple building blocks of the increasing trees, and then present the full, recursive approach. The first important observation is that the baseline n → ρ |ρ| → · · · → ρ 1 → 1 is already independent of the reference ordering. As such we will carry its definition over to our new construction. We can then proceed to considering paths of various lengths, and eventually more complicated trees.
The easiest paths to work with are length one. They are also reference-ordering independent, so their contribution can be carried over from the traditional approach, for each such path. Length two paths are the first object that have reference-ordering dependence. However, the two inequivalent contributions dictated by the choice of reference orderings always enter with the same weight in the average, (3.13) For this particular path factor only the relative ordering between l and j matters: the first term P(l, j)P(j, i) is contributed by those reference orderings in which j is before l while the second term P(l, j, i) is from those with l before j. Thus averaging over reference orderings assign them the same weight.
For longer paths, it becomes more computationally difficult to directly calculate the RO-average. Interestingly, there exists a recursive structure that implicitly reproduces the full RO-average. The first important insight comes from stripping off the k i from eqs. (3.12) and (3.13) such that each becomes a Lorentz vector V µ The subscript on V denotes the set of subtrees rooted on i. We can then make use of the renaming of eq. (3.14) in eq. (3.15) to see the first hints of the recursion We now build a recursive ansatz for the V factor at any position along an arbitrary length path. As with any recursive approach, we assume the recursion correctly constructs a chain of vertices I, and we are now looking to incorporate the new contribution from vertex j / ∈ I that is not the end point of the path. Building from eq. (3.16), our recursive ansatz is where I is the set vertices above j in the increasing tree, and a, b are combinatoric coefficients that we will fix shortly. When we consider the source of the two terms from the reference ordering prescription, a(I) should be proportional to the number of reference orderings in which j precedes every element of I. Similarly, b(I) is proportional to the number of reference orderings in which at least one element of I precedes j. A careful counting of these permutations yields a(I) = p|I|! and b(I) = p|I||I|! with |I| the number of vertices in I, and p the joint proportionality constant. We fix this constant by demanding that the term involving only a product of ·k always carries +1. Since this term is common to all the reference orderings, the average should not change its coefficient. As a result, a(I) and b(I) need to satisfy Thus, the complete expression for the V factor of an arbitrary length path is given by the recursive definition with the termination case given in eq. (3.14). The other generalization necessary to evaluate an arbitrary tree is the merger of multiple paths at a vertex, schematic examples of which are shown in figure 2. The reference-ordering approach provides an explicit splitting of the tree at such vertices, but we want to avoid such a choice. Luckily, the recursive structure can be continued to this case. The simple example is the merger of two length-one paths at a vertex. 20) in which there are three different contributions: one each from when m or l comes before j in the reference orderings, and a third when j is first among the three. Using the knowledge we have already gained from analyzing the simpler paths in eqs. (3.13) and (

The result of RO-average is
The use of in eq. (3.21) makes the generalization to more incoming length-one paths straightforward: for incoming paths from vertices M = {m 1 , . . . , m r } we have where a is the (normalized) number of reference orderings in which j is before M , and similarly b a is the (normalized) number of reference orderings in which the first element in the sub-ordering of M ∪ {j} belongs to I a . Actually, this expression can be used to calculate any sub-trees merging at one vertex, see figure 2c, where M = {T 1 , . . . , T r } and each T a is an arbitrary sub-tree

(3.24)
This is due to the recursive assumption is that each of the V Ta has dressed its sub-tree correctly, and thus all the new vertex calculation needs to do is correctly merge the information from each of the T a via the relative normalizations, which are not sensitive to the structure of the sub-trees. With these new kinematic dressings, we can completely get rid of item 2 in the previous algorithm since explicit ordered splitting computations are no longer necessary. For completeness, our improved algorithm for calculating the numerator contribution N (T ) for a given increasing tree T is: 1. For the path n → ρ |ρ| → . . . ρ 1 → 1, assign a baseline factor W(1, ρ, n). Note that n → 1 and thus ρ = ∅ is allowed, which contributes W(1, n).
3. Calculate V from each sub-tree T a ∈ M ρ i and assemble the result: This algorithm only needs to process a single spanning tree T once, and directly generates the crossing symmetric numerator in {2, . . . , n−1}, thus eliminating the O((n−2)!) workload due to the RO-average, as advertised at the beginning of this subsection. We have explicitly checked agreement with numerators generated by RO-average through seven points, and numerically checked that they reproduce correct amplitudes through nine points. 7 We note that in our numerators k n does not appear by construction, and neither does ε n ·k 1 . Thus our numerators are naturally in a basis of momentum conservation. While the procedure was designed with pure Yang-Mills trees in mind, we can recall from eqs. (2.5) and (2.7) that all terms that care about 1 and n being fermions are localized to the baseline function W 2f . The rest of the kinematic structure is exactly identical to the pure Yang-Mills case. Thus, this algorithm also constructs the halfladder numerators with two fermions.

Four-Point YM Example
With the general principle under control, we'll now turn to an explicit example, and demonstrate some of the functionality of the Mathematica package. We'll construct the DDM basis numerator N (1, 2, 3, 4) using our new method, and then show that it indeed gives the correct numerator for N (1, 3, 2, 4) via relabeling.
The first step for building the numerators is to enumerate IT (1,2,3,4), which is implemented by the function increasingTrees: Then we need to dress each of the trees with their appropriate kinematic factors. This is handled by the function itNumer. For the simplest example, we have (1, 2, 3, 4) .
As discussed in section 2.1, this also builds the correct numerator for two fermions, N 2f , by replacing W gluon → W 2f .

Recursive Constructions for Multi-trace
The recursive construction we described in section 3.2 can be extended to also handle multi-trace YMS numerators. We will work out how to build numerators of the form N MT (1, β, n), with an additional set of specified scalar traces τ i . First using the baseline expansion discussed in section 2.2, we can put the legs 1 and n to the endpoints of a single scalar trace, see eqs. (2.16) and (2.30). Then we can work out the numerators following the process first proposed in Ref. [66], in particular, a spanning-tree algorithm given in section 10.3 (see also appendix A of Ref. [34]). However, this algorithm is RObased and thus crossing symmetry is not manifest. In this section, we develop additional refinements that restore crossing symmetry to the numerators, in line with what we did for the pure Yang-Mills numerators in section 3.2. Similar to the pure Yang-Mills case, the process involves two steps, constructing spanning trees and dressing with kinematic factors. As discussed in section 2.2, the gluons and scalar traces are generally treated on the same footing. First, we discuss the structure of the increasing trees to dress, and more generally the interplay between trace orderings and numerator ordering. The analog of the increasing-tree ordering used in eq. (3.6) is obtained by applying the coarse-grained O function (2.22) to the entire numerator ordering (1, β, n), where each λ i represents a scalar trace or a gluon. It is an ordering of our particle inventory {t 1 , t 2 , . . . , t N } = {τ 1 , τ 2 , . . . , τ m } ∪ G with N = m + |G|, where scalar traces τ i are treated on the same footing as a gluon in G. In addition, the first entry λ 1 must be the trace that leg 1 belongs to, or the leg 1 itself if it is a gluon. We can then construct all the increasing trees according to eq. (3.6) and write The vertices of each tree T are t i 's, which can either be a scalar trace or a gluon. Now that the increasing trees are laid out, we can begin assigning their kinematic dressings. To evaluate N MT (T ), the first step is to extract the baseline from a tree T . Suppose 1 ∈ t 1 and n ∈ t m , we first extract the path (t 1 , λ 1 , . . . , λ |B| , t m ) that connects them in T . If n is a gluon, then t m = n must be a leaf, while this is not necessary when t m is a scalar trace. Moreover, t m can coincide with t 1 . This happens when 1 and n belong to the same scalar trace to begin with. For this case, the path consists of just a single vertex t 1 . The baseline ordering (1, ρ, n) is then obtained by restricting the numerator color ordering (1, β, n) to (t 1 , λ 1 , . . . , λ |B| , t m ), , (3.34) which is exactly the baseline discussed in section 2.2. We can thus dress it with the kinematic factor W MT (1, ρ, n) defined in eq. (2.27). On the other hand, given a baseline (1, ρ, n), we can recover the path (t 1 , λ 1 , . . . , λ |B| , t m ), which we call the coarse-grained baseline, by using the coarse-grained O function, where M i is the set of sub-trees merging at the vertex i on the coarse-grained baseline O cg (1,ρ,n) . The kinematic factor associated to each sub-tree T a ∈ M i is V Ta ·K i,Ta ; K will be defined below in eq. (3.41). It can be evaluated by the RO-based method [34,66], but in the following, we will propose an improved algorithm that automatically recovers the crossing symmetry.
We denote M = {T 1 , T 2 , . . . , T r } as the set of sub-trees merging at the vertex t j . By construction the vertices of our spanning trees are labeled by {t 1 , . . . , t N }. The recursive structure of V when including the vertex t j takes a form very similar to eq. (3.24), where |T a | counts the number of vertices in T a . The combinatoric coefficients here are the same as eq. (3.24) for pure gluon cases since scalar traces are treated as a single object on the same footing as gluons. The path-begin factor b µ t j is given by where (a j , ω j , b j ) the sub-ordering of the elements in t j in the numerator ordering (1, β, n). The effect of RO-average is encoded in the prefactor 1 |t j | . This prescription recovers the crossing symmetry within the trace t j . We note that in the RO approach given in [34,66], one has to pick the same a j across when t j appears in the path-begin factor for all the numerator ordering, which breaks the manifest cyclicity of the scalar traces. Next, the through factor T µν t j ,Ta . At this point, it is useful to generalize B to take trees (multiple traces) after the vertical bar via is each element of t i that comes before everything in T a in the numerator ordering (1, β, n). Then T t j ,Ta is given by Note that it takes a similar form to the T factor used in the baseline factor W MT (1, ρ, n), see eq. (2.28a). The sgn function appearing in eqs. (3.38) and (3.40) ensures that N MT (1, β, n) is nonzero if and only if the numerator ordering (1, β, n) is KK-compatible to every scalar trace. Finally, the path-end factor K µ t j ,Ta is given by the total momentum from t j that comes before everything in T a . The factor K is actually the same as the one used in the RO approach [34,66].

Multi-trace Example
We Since 1 and 8 are already in the same trace, we don't need to use the full power of the baseline recursion in this case, and can instead focus on the kinematic factors immediately. The increasing trees that contribute to this numerator will have τ 1 as the baseline, and the remaining τ s as vertices along the trees. In this particle ordering, the traces are shuffled via so the increasing trees are ordered via τ 2 before τ 3 before τ 4 . The six increasing trees that build this numerator can be generated using Since B(τ 3 |τ 4 ) = (4) = τ 3 , the two trees with edge τ 4 → τ 3 are in the T µν = 0 case of eq. (3.40). Other than that important caveat, the last five of these trees can be evaluated by directly adapting various examples we covered in the YM construction. The first tree will be a good example application of the recursive structure. Using the kinematic dressings from section 3.4 in the recursion, we have (k 1 ·k 2 ) [2(k 2 ·k 4 )(k 2 ·k 5 ) + 6(k 2 ·k 5 )(k 3 ·k 4 ) + 3(k 1 ·k 5 )(k 2 ·k 4 + 3k 3 ·k 4 ) + 6(k 2 ·k 4 )(k 3 ·k 5 ) + 10(k 3 ·k 4 )(k 3 ·k 5 ) + (k 2 ·k 4 )(k 4 ·k 5 ) + 5(k 3 ·k 4 )(k 4 ·k 5 ) +3(k 1 ·k 4 )(2(k 1 ·k 5 ) + (k 2 ·k 5 ) + 3(k 3 ·k 5 ) + (k 4 ·k 5 ))] . (3.46) Next, we look at a case involving gluons that demonstrates the baseline expansion. The goal is to calculate N (1, 2, 3, 4, 5, 6) with Since we don't have a scalar trace of the form (1 . . . n), we need to perform the sum over pushing τ 2 and 5 into the baseline with τ 1 and 6. The various expansions of the baseline are given by which correspond to the increasing trees

Introducing Masses
Since the numerator construction we have described is dimensionally agnostic (up to the choice of fermion wavefunctions), it is straightforward to obtain numerators of minimally-coupled massive theories through a dimensional compactification. Such a construction is equivalent to a spontaneous symmetry breaking [83]. In this section, we will work with the mostly minus metric.
In particular, it is very convenient to introduce a mass to particles 1 and n that sit at the end points of the baseline. To construct the desired massive numerators in 4D, we start by picking special kinematics in 6D, where K i are 6D vectors, and k i are 4D vectors. The 6D massless conditions K 2 1 = K 2 n = 0 impose the 4D massive conditions k 2 1 = k 2 n = m 2 . For scalars, no modifications are needed as their external wavefunctions are trivial. The gluon polarization vectors are all kept in 4D, E i = ( i , 0, 0), and transverse to the momentum, To build up the explicit embedding of massive 4D spinors into massless 6D spinors, one can follow the technique of [4,84], where |i ± and |i ± ] are massive spinor helicity variables [85], and u and v are usual momentum-space Dirac spinors. The explicit representation of 6D spinors depends on the choice of 6D gamma matrices, and here we use the one given in the appendix A of Ref. [80]. 8 Our definition of Dirac spinors in terms of massive spinor helicity variables follows Ref. [86], where one can also find explicit formulas for these quantities.
We note that all the formal expressions for our numerators are unchanged under the uplift (4.1), except that one needs to use massive kinematics when evaluating. In fact, the only way to introduce an explicit mass term is via K 1 ·K n = k 1 ·k n + m 2 while K i · K j = k i · k j for all the rest. Our numerators are constructed to only contain the wavefunction of particle n but not the momentum, such that the product K 1 ·K n will not naturally appear unless one deliberately swaps it in via momentum conservation.

Conclusion and Outlook
In this paper we presented a review of the current state-of-the-art for calculating YM tree amplitudes in the CHY formalism. We identified a computational complexity that results from the requirement of the previous methods to define an ordered splitting of paths in order to build kinematic factors, which in turns builds numerators that are not crossing symmetric. To overcome this complexity, we developed a new method of assigning kinematic factors to increasing trees that incorporates all contributions from different ordered splittings implicitly, rather than via an explicit after-the-fact average. The primary benefit of the new approach is the restoration of crossing symmetry. Our new numerator construction is now only a function of external particle ordering, and has no dependence on reference orderings. The new method is easily generalized to include full multi-trace YMS numerators or two-fermion numerators. Since the numerators we construct are in the DDM basis, they can be double-copied immediately to yield the corresponding numerators for Einstein-Yang-Mills.
Our new method is efficient enough that we can easily calculate 8 point YM trees, where two of the external legs can be swapped out for fermions or scalars instead; we have been able to calculate individual crossing symmetric tree numerators up to 10 points without significant difficulty. This ability to calculate high-point trees with super-partners has been incredibly useful in the study of forward limits, which we leave to its own paper [78].
The fermionic baseline functions are the only piece of the calculation that depend on the dimension of interest. Thus, we described one method of using dimensional compactification to introduce a masses to the 2 singled out particles labeled 1 and n. We hope that the combination of double-copy compatibility, potentially massive particles, and calculation efficiency will be of interest to the growing application of amplitudes techniques to classical gravity, for instance as input into the unitarity cuts approach of constructing relevant loop amplitudes [13,15,16,[19][20][21].
Finally, we provide a Mathematica package the implements the recursive numerator construction. Appendix B and the examples in sections 3.3 and 3.5 have more details about the package and its usage.

Acknowledgments
We would like to thank Song He and Oliver Schlotterer for sharing unpublished notes related to the fermionic expansion eqs. (2.7), (2.8a) and (2.8b

A CHY integrands and scattering amplitudes
In this appendix, we provide the explicit forms of the CHY integrands used in the main text and the normalizations used to match Feynman diagram results. Throughout the paper, we use the following definition for the Pfaffian of a (2n) × (2n) antisymmetric matrix X, The sign factor (−1) is not present in the usual definition of a Pfaffian [88], but convenient for our purposes.
In the pure gluon integrand (2.4), the reduced Pfaffian is defined as where the 2n × 2n antisymmetric matrix Ψ is given in the block form, The three n × n blocks A, B and C are given by The polarization vectors are normalized as ε i ·ε * i = 1. The matrix Ψ ij ij is obtained from Ψ by deleting the i-th and j-th row and column. On the support of the scattering equations, Pf (Ψ) is independent of the choice of i and j. The Ψ G that appears in the baseline expansion (2.5) and the single-trace integrand (2.12) is a 2|G| × 2|G| antisymmetric submatrix of Ψ, obtained by restricting the rows and columns in the A, B and C blocks to the set G, where (X G ) ij = X ij for X = A, B, C and i, j ∈ G.
The multi-trace integrand I n (β 1 , . . . , β m |G) in eq. (2.13) contains the reduced Pfaffian of the 2(m + |G|) × 2(m + |G|) antisymmetric matrix Π, which can also be written in a block form, where the blocks are defined as follows, • The |G| × |G| matrix A G , B G and C G are just those appeared in Ψ G of eq. (A.5).
• The m × |G| matrix A tr,G , B tr,G and C tr,G are given by • The |G| × m matrix A G,tr , B G,tr and C G,tr are given by • The m × m matrix A tr , B tr and C tr are given by where k τ i = c∈τ i k c is the total momentum of the trace τ i .
The reduced Pfaffian Pf (Π) is defined as The two expressions are equivalent on the support of the scattering equations when both gluons and traces are present, while the first line applies for the pure scalar case and the second line applies for the pure gluon case. For these two special cases, one can then easily check that eq. (2.15) holds.
To explicitly calculate amplitudes from the CHY formalism, one needs to perform the moduli space integral in eq. where the matrix Φ is given by To obtain the Jacobian det(Φ 1,n−1,n 1,n−1,n ), one needs to delete three rows and columns from Φ following the gauge fixing. The normalization N is given by where |S| = n − |G| is the total number of scalars. This normalization is fixed by comparing with the color-ordered amplitudes from the YMS Lagrangian [57] with the gauge coupling g and φ 3 coupling λ. The scalar φ aA is in the bi-adjoint representation of the gauge group and another global symmetry group, whose structure constants are f abc andf ABC respectively. In particular, the normalization for the pure gluon cases is which is shared by the two-fermion integrand (2.7) since the two amplitudes are related by a SUSY Ward identity. The outstanding factor 2 can be absorbed into the definition of the reduced Pfaffian, as pointed out in the original CHY paper [25], while the rest simply originate from rescaling color factors. More specifically, we define partial amplitudes in the color trace bases with the normalization Tr(T a T b ) = δ ab and Tr(T A T B ) = δ AB . Both traces are taken in the fundamental representation of the gauge group.

B Mathematica Package
This appendix gives a more in-depth description for the functions provided by our Mathematica package, IncreasingTrees.m. The package can be found by following this link to our GitLab 9 : IncreasingTrees GitLab.
There are no external dependencies for the package, although it does rely on the Graph functionality of Mathematica, and thus may depend on the version in use. We have tested the package on Mathematica versions 11.2 and 12.0. The package defines a set of objects from which it builds the kinematic numerators, as well as defining construction routines to build the various numerators. It localizes all variables within either its own context IncreasingTrees`or its private context IncreasingTrees`Private`, and thus is generally safe to load along other packages. However, it does provide definitions for the single-character variables k,e,W,d,F, which may shadow definitions from other packages.

B.1 Constituent objects
The following objects are used to represent the numerators: • k[i ]:Momentum vector for particle(s) (i). Multi-label k are automatically converted to sums: k[1,2]= k 1 + k 2 .
• W[ ]:Generic baseline factor. Can be swapped out directly using replacement rules for WScalar, WGluon, WFermion2, which are all described in section 2. WL and WR are additionally defined to allow for differentiating theories in the double copy.

B.2 Construction functions
The package exposes the following functions used in assembling half-ladder numerators: • increasingTrees[labels List ]:Constructs all ordered trees where the vertex labels are taken from (labels) and the ordering is consistent with the ordering of (labels). Thus, each tree spans the vertices and is only composed of DirectedEdges of the form (labels)[[ j]] → (labels)[[i]] with i < j.
• itNumer[g Graph, maxV ]: Processes (g) to extract the baseline information and runs the recursive construction (detailed in section 3.2) starting from each vertex along the baseline to generate a kinematic polynomial. (maxV) specifies which vertex should be treated as the end of the baseline.
• ptTreeNumer[points Integer, onlyFuncQ :False ]: This function has a different return scheme based on whether (onlyFuncQ) is True or False: -True: generates a Function with (points) slots that in turn produces a half-ladder numerator with the specified external labels.
• multiTraceNumer[labels List, traces List ]: Computes the kinematic numerator for a mix of scalars and gluons carrying labels specified and ordered in (labels), grouped into scalar traces specified by (traces). Handles the construction and insertion of the baseline factor directly. Traces should be given as a List of Lists, where each sublist specifies a color-ordered trace of scalar particles. Gluons can either be specified as length 1 sublists in (traces), or omitted from the trace specification entirely. Expects each particle to be labeled the same way in (labels) and (traces).
• multiTraceIntegrand[points Integer, traces List ]:Computes the explicit permutation sum of multiTraceNumer over the permutations of labels ranging from (2, . . . ,(points)−1), using the trace configurations given in (traces), and dressed with appropriate PT factors. The particle labels in (traces) should be given as integers 1, . . . ,(points). Unfortunately, since the symmetries of the multi-trace numerators are significantly more nuanced than in the case of only gluons (or gluons with two scalars or two fermions), we don't make direct use of them in multiTraceIntegrand like we do in ptTreeNumer. As such, while multiTraceIntegrand and ptTreeNumer produce identical results for pure gluons, we strongly recommend to use ptTreeNumer in these situations, especially for larger particle number.
In addition to the half-ladder numerator constructors, we also provide the tools to assemble the numerators into amplitudes. However, these computations run into the O((n−2!) 2 ) complexity of calculating the full matrtix of m(α|β).
• Aamp[labels List ]:Constructs the Yang-Mills tree amplitude according to eq. (1.2), where the external particle labels are drawn from (labels). Includes the explicit normalization factor eq. (A.13). The baseline factors W[...] are left unevaluated to allow for choice of special particles.
• Mamp[labels List ]:Constructs the gravity tree amplitude according to eq. (1.3), where the external particle labels are drawn from (labels). The baseline factors for the two copies are left unevaluated, with seperate labels WL and WR to allow for double copies between different types of special particles. No explicit normalization is used.
It is important to note that only Aamp includes explicit normalizations. However, we make the YMS normalization accessible via • ymsNorm[traces List ]:Computes the YMS normalization N from eq. (A.13), with the trace groupings specified as seperate Lists in (traces). Gluons/fermions are specified via length-one Lists. ymCoup and scCoup are used to represent the sYM coupling g and scalar coupling λ, respectively.