Energy flow polynomials: A complete linear basis for jet substructure

We introduce the energy flow polynomials: a complete set of jet substructure observables which form a discrete linear basis for all infrared- and collinear-safe observables. Energy flow polynomials are multiparticle energy correlators with specific angular structures that are a direct consequence of infrared and collinear safety. We establish a powerful graph-theoretic representation of the energy flow polynomials which allows us to design efficient algorithms for their computation. Many common jet observables are exact linear combinations of energy flow polynomials, and we demonstrate the linear spanning nature of the energy flow basis by performing regression for several common jet observables. Using linear classification with energy flow polynomials, we achieve excellent performance on three representative jet tagging problems: quark/gluon discrimination, boosted W tagging, and boosted top tagging. The energy flow basis provides a systematic framework for complete investigations of jet substructure using linear methods.

In this paper, we introduce a powerful set of jet substructure observables organized directly around the principle of infrared and collinear (IRC) safety. These observables are multiparticle energy correlators with specific angular structures which directly result from IRC safety. Since they trace their lineage to the hadronic energy flow analysis of Ref. [74], we call these observables the energy flow polynomials (EFPs) and we refer to the set of EFPs as the energy flow basis. In the language of Ref. [74], the EFPs can be viewed as a discrete set of C-correlators, though our analysis is independent from the original C-correlator logic. Crucially, the EFPs form a linear basis of all IRC-safe observables, making them suitable for a wide variety of jet substructure contexts where linear methods are applicable.
There is a one-to-one correspondence between EFPs and loopless multigraphs, which helps to visualize and calculate the EFPs. A multigraph is a graph where any two vertices can be connected by multiple edges; in this context, a loop is an edge from a vertex to itself, while a closed chain of edges is instead referred to as a cycle. For a multigraph G with N vertices and edges (k, ) ∈ G, the corresponding EFP takes the form: where the jet consists of M particles, z i ≡ E i / M j=1 E j is the energy fraction carried by particle i, and θ ij is the angular distance between particles i and j. The precise definitions of E i and θ ij will depend on the collider context, with energy and spherical (θ, φ) coordinates typically used for e + e − collisions, and transverse momentum p T and rapidity-azimuth (y, φ) coordinates for hadronic collisions. For brevity, we often use the multigraph G to represent the formula for EFP G in Eq. (1.1), e.g.: This paper is a self-contained introduction to the energy flow basis, with the following organization. Sec. 2 contains a general overview of the EFPs, with more detailed descriptions of Eq. (1.1) and the correspondence to multigraphs. We also discuss a few different choices of measure for z i and θ ij . As already mentioned, EFPs are a special case of C-correlators [74], so not surprisingly, we find a close relationship between EFPs and other classes of observables that are themselves C-correlators, including jet mass, ECFs [51], certain generalized angularities [75], and energy distribution moments [49]. We also highlight features of the EFPs which are less well-known in the C-correlator-based literature.
In Sec. 3, we give a detailed derivation of the EFPs as an (over)complete linear basis of all IRC-safe observables in the case of massless particles. Because this section is rather technical, it can be omitted on a first reading, though the logic just amounts to systematically imposing the constraints of IRC safety. In Sec. 3.1, we use an independent (and arguably more transparent) logic from Ref. [74] to show that any IRC-safe observable can be written as a linear combination of C-correlators: (1.3) where f N is an angular weighting function that is only a function of the particle directionŝ p µ i = p µ i /E i (and not their energies E i ). To derive Eq. (1.3), we use the Stone-Weierstrass theorem [76] to expand an arbitrary IRC-safe observable in polynomials of particle energies, and then directly impose IRC safety and particle relabeling invariance. In Sec. 3.2, we determine the angular structures of the EFPs by expanding f N in terms of a discrete set of polynomials in pairwise angular distances. Remarkably, the discrete set of polynomials appearing in this expansion is in one-to-one correspondence with the set of non-isomorphic multigraphs, which facilitates indexing the EFPs by multigraphs to encode the geometric structure in Eq. (1.1).
In Sec. 4, we investigate the complexity of computing EFPs. Naively, Eq. (1.1) has complexity O(M N ) due to the N nested sums over M particles. However, the rich analytic structure of Eq. (1.1) and the graph representations of EFPs allow for numerous algorithmic speedups. Any EFP with a disconnected graph can be computed as the product of the EFPs corresponding to its connected components. Furthermore, we find that the Variable Elimination (VE) algorithm [77] can be used to vastly speed up the computation of many EFPs compared to the naive O(M N ) algorithm. VE uses the factorability of the summand to systematically determine a more efficient order for performing nested sums. For instance, all tree graphs can be computed in O(M 2 ) using VE. As an explicit example, consider an EFP with naive O(M 6 ) scaling: ( In Sec. 5, we perform numerical linear regression with EFPs for various jet observables. The linear spanning nature of the energy flow basis means that any IRC-safe observable S can be linearly approximated by EFPs, which we write as: for some finite set of multigraphs G and some real coefficients s G . One might worry that the number of EFPs needed to achieve convergence could be intractably large. In practice, though, we find that the required set of G needed for convergence is rather reasonable in a variety of jet contexts. While we find excellent convergence for IRC-safe observables, regressing with IRC-unsafe observables does not work as well, demonstrating the importance of IRC safety for the energy flow basis. In Sec. 6, we perform another test of Eq. (1.5) by using linear classification with EFPs to distinguish signal from background jets. We consider three representative jet tagging problems: quark/gluon discrimination, boosted W tagging, and boosted top tagging. In this study, the observable appearing on the left-hand side of Eq. (1.5) is the optimal IRC-safe discriminant for the two classes of jets. Remarkably, linear classification with EFPs performs comparably to multivariate machine learning techniques, such as jet images with convolutional neural networks (CNNs) [50,[63][64][65][66] or dense neural networks (DNNs) with a complete set of N -subjettiness observables [57]. Both the linear regression and classification models have few or no hyperparameters, illustrating the power and simplicity of linear learning methods combined with our fully general linear basis for IRC-safe jet substructure.
Our conclusions are presented in Sec. 7, where we highlight the relevance of the energy flow basis to machine learning and discuss potential future applications and developments. A review of C-correlators and additional tagging plots are left to the appendices.

Energy flow polynomials
IRC-safe observables have long been of theoretical and experimental interest because observables which lack IRC safety are not well defined [78][79][80][81], or require additional care to calculate [82][83][84][85][86], in perturbative quantum chromodynamics (pQCD). More broadly, though, IRC safety is a simple and natural organizing principle for high-energy physics observables, since IRC-safe observables probe the high-energy structure of an event while being insensitive to low-energy and collinear modifications. IRC safety is also an important property experimentally as IRC-safe observables are more robust to noise and finite detector granularity.
As argued in Refs. [74,[87][88][89], the C-correlators in Eq. (1.3) are a generic way to capture the IRC-safe structure of a jet, as long as one chooses an appropriate angular weighting function f N . Later in Sec. 3, we give an alternative proof that C-correlators span the space of IRC-safe observables and go on to give a systematic expansion for f N . This expansion results in the EFPs, which yield an (over)complete linear basis for IRC-safe observables. In this section, we highlight the basic features of the EFPs and their relationship to previous jet substructure observables.

The energy flow basis
One can think of the EFPs as C-correlators that make specific, discrete choices for the angular weighting function f N in Eq. (1.3). True to their name, EFPs have angular weighting functions that are polynomial in pairwise angular distances θ ij . The energy flow basis is therefore all C-correlators with angular structures that are unique monomials in θ ij , meaning monomials that give algebraically different expressions once the sums in Eq. (1.3) are performed. Since we intend to apply the energy flow basis for jet substructure, we remove the dependence on the overall jet kinematics by normalizing the particle energies by the total jet energy, The uniqueness requirement on angular monomials can be better understood by developing a correspondence between monomials in θ ij and multigraphs: Multigraph/EFP Correspondence. The set of loopless multigraphs on N vertices corresponds exactly to the set of angular monomials in {θ i k i } k< ∈{1,··· ,N } . Each edge (k, ) in a multigraph is in one-to-one correspondence with a term θ i k i in an angular monomial; each vertex j in the multigraph corresponds to a factor of z i j and summation over i j in the EFP: Using Eq. (2.1), the EFPs can be directly encoded by their corresponding multigraphs. For instance: Since any labeling of the vertices gives an equivalent algebraic expression, we represent the graphs as unlabeled. The specification that the EFPs are unique monomials translates into the requirement that the corresponding multigraphs are non-isomorphic. Versions of these multigraphs have previously appeared in the physics literature in the context of manybody configurations [90,91], encoding all local scalar operators of a free theory [92], and in graphically depicting ECFs for jets [52,93]. Table 1 contains a summary of the correspondence between the properties of EFPs and multigraphs. The number of graph vertices N corresponds to the number of particle sums in the EFP, and the number of graph edges d corresponds to the degree of the EFP (i.e. the degree of the underlying angular monomial). The number of separated prongs for which an individual EFP is first non-vanishing is the chromatic number of the graph: the smallest number of colors needed to color the vertices of the graph with no two adjacent vertices sharing a color. For computational reasons discussed further in Sec. 4, we also care about the treewidth of the graph, which is related to the computational complexity χ of an EFP. Also for computational reasons, we make a distinction between connected or prime multigraphs and disconnected or composite multigraphs; the value of a composite EFP is simply the product of the prime EFPs corresponding to its connected components.
Because the EFP basis is infinite, a suitable organization and truncation scheme is necessary to use the basis in practice. In this paper, we usually truncate by restricting to the set of all multigraphs with at most d edges. This is a natural choice because it corresponds to truncating the approximation of the angular function f N at degree d polynomials. Furthermore, this truncation results in a finite number of EFPs at each order of truncation, which is not true for truncation by the number of vertices. The number of multigraphs with exactly d edges is Sequence A050535 in the On-Line Encyclopedia of Integer Sequences (OEIS) [94,95]; the number of connected multigraphs with exactly d edges is Sequence A076864 in the OEIS [94]. The numbers of EFPs in our truncation of the energy flow basis are the partial sums of these sequences, which are listed in Table 2a up to d = 10. Table 2b tabulates the number of prime EFPs of degree d binned by N up to d = 10. Table 3 Table 3.

Degree
Connected Multigraphs

Energy and angular measures
There are many possible choices for the energy fraction z i and angular measure θ ij used to define the EFPs. In the analysis of Sec. 3, this choice arises because there are many systematic expansions of IRC-safe observables in terms of energy-like and angular-like quantities. Typically, one wants to work with observables that respect the appropriate Lorentz subgroup for the collision type of interest. For e + e − colliders, the symmetries are the group of rotations about the interaction point, and for hadron colliders they are rotations about and boosts along the beam axis (sometimes with a reflection in the plane perpendicular to the beam). Therefore, the energy fractions z i usually use particle energies E i at an e + e − collider and particle transverse momenta p T,i at a hadron collider. For the angular weighting function f N , though, there are many different angular structures one can build out of the particle directionsp µ i . The EFPs use the simplest and arguably most natural choice to expand the angular behavior: pairwise angular distances θ ij , determined using spherical coordinates (θ, φ) at an e + e − collider and rapidity-azimuth coordinates (y, φ) at a hadron collider. Other classes of observables, such as ECFs [51] and ECFGs [52], also use pairwise angles since they manifestly respect the underlying Lorentz subgroup. For building the EFPs, is important that the θ ij , or any other choice of geometric object, be sufficient to reconstruct the value of the original function f N in terms of thep µ i . For pairwise angles, this property can be shown by triangulation, under the assumption that the observable in question does not depend on the overall jet direction nor on rotations or reflections about the jet axis. Since jets are collimated sprays of particles, the θ ij are typically small and are good expansion parameters.
At various points in this paper, we explore three different energy/angular measures. For e + e − collisions, our default is: where β > 0 is an angular weighting factor. For the hadron collider studies in Secs. 5 and 6, we use: where ∆y ij ≡ y i − y j , ∆φ ij ≡ φ i − φ j are determined by the rapidity y i and azimuth φ i of particle i. This measure is rotationally-symmetric in the (y, φ) plane, which is the most commonly used case in jet substructure. For situations where this rotational symmetry is not desirable (such as for jet pull [96]), we can instead use a two-dimensional measure that treats the rapidity and azimuthal directions separately: where each line on the multigraph now has an additional decoration to indicate whether it corresponds to ∆y or ∆φ. We emphasize that the choice of measure is not unique, though it is constrained by the IRC-safety arguments in Sec. 3. For example, IRC safety requires that the energy-like quantities appear linearly in z i . For the default measures, the angular exponent β can take on any positive value and still be consistent with IRC safety. Depending on the context, different choices of β can lead to faster or slower convergence of the EFP expansion, with β < 1 emphasizing smaller values of θ ij and β > 1 emphasizing larger values of θ ij . For special choices of z i and θ ij , some EFPs may be linearly related, a point we return to briefly in Sec. 4.1.

Relation to existing substructure observables
Many familiar jet observables can be nicely interpreted in the energy flow basis. When an observable can be written as a simple expression in terms of particle four-momenta or in terms of energies and angles, the energy flow decomposition can often be performed exactly. Some of the most well-known observables, such as jet mass and energy correlation functions, are exactly finite linear combinations of EFPs (with appropriate choice of measure), which one might expect since they also correspond to natural C-correlators. Unless otherwise specified, the analysis below uses the default hadronic measure in Eq. (2.4) with β = 1 and treats all particles as massless. 2

Jet mass
Jet mass is most basic jet substructure observable, and not surprisingly, it has a nice expansion in the energy flow basis. In particular, the squared jet mass divided by the jet energy squared is an exact N = 2 EFP using the e + e − measure in Eq. (2.3) with β = 1: Note that mass is exactly an EFP for any β = 2/N measure choice.
2 A proper treatment of non-zero particle masses would require an additional expansion in the velocities of the particles (see related discussion in Refs. [97,98]). To avoid these complications, one can interpret all particles as being massless in the E-scheme [97], i.e. p µ rescaled = E (1,p) withp = p/| p|.
For the hadronic measure in Eq. (2.4) with β = 1, there is an approximate equivalence with the squared jet mass divided by the jet (scalar) transverse momentum: Since the jet mass is not exactly rotationally symmetric in the rapidity-azimuth plane, the subleading terms in Eq. (2.7) are not fully encompassed by the simplified set of hadronic observables depending only on {∆y 2 ij + ∆φ 2 ij }, but could be fully encompassed by using an expansion in {∆y ij , ∆φ ij } as in Eq. (2.5). For narrow jets, these higher-order terms in the expansion become less relevant since ∆y ij , ∆φ ij 1. 3

Energy correlation functions
The ECFs are designed to be sensitive to N -prong jet substructure [51]. They can be written as a C-correlator, Eq. (1.3), with a particular choice of angular weighting function: where θ ij = (∆y 2 ij + ∆φ 2 ij ) 1/2 . In terms of multigraphs, the ECFs correspond to complete graphs on N vertices: which are EFPs using the measure in Eq. (2.4) with exponent β. The ECFs have since been expanded to a more flexible set of observables referred to as the ECFGs [52]. Letting min (m) indicate the m-th smallest element in a set, the ECFGs are also C-correlators with angular weighting function: The ECFGs do not have an exact multigraph correspondence due to the presence of the min function, but are evidently closely related to the EFPs since they share a common energy structure. The min function itself can be approximated by polynomials in its arguments, 3 Alternatively, we could use a measure with θij = 2 p µ i p jµ p T ,i p T ,j β/2 , similar in spirit to the Conical Geometric measure of Ref. [99], to exactly recover the jet mass.
which induces an approximating series for the ECFGs in terms of EFPs when plugged into the common energy structure.
Both the EFPs and the ECFGs represent natural extensions of the ECFs but in different directions. From our graph-theoretic perspective, the EFPs extend the ECFs to non-fullyconnected graphs. The ECFGs extend the scaling properties of the ECFs into observables with independent energy and angular scalings. As discussed in Sec. 2.4, there are angular structures possible in the EFPs that are not possible in the ECFGs. As with any jet substructure analysis, the choice of which set of observables to use depends on the physics of interest, with the EFPs designed for linear completeness and the ECFGs designed for nice power-counting properties. . (2.13) For even α, the parenthetical in Eq. (2.13) can be expanded and identified to be a linear combination of EFPs with N = α and d = α (see Ref. [49] for a related discussion). For α = 2, Eq. (2.13) implies: (2.14) -12 -For α = 4 and α = 6, Eq. (2.13) implies: This can be continued for arbitrarily high, even α. Thus, the even α angularities are exact, non-trivial linear combinations of EFPs, illustrating the close connections between the two classes of observables. While angularities with odd or non-integer α do not have the same analytic tractability, the specific case of α = 1/2 is shown to be numerically well approximated by EFPs in Sec. 5.3.

Geometric moment tensors
Next, we consider observables based on the two-dimensional geometric moment tensor of the energy distribution in the (y, φ)-plane [49,60]: where the distances are measured with respect to the p T -weighted centroid axis (y J , φ J ) from Eq. (2.12). Useful observables can be constructed from the trace and determinant of C, such as planar flow Pf = 4 det C/(tr C) 2 [101,104], which is a ratio of two IRC-safe observables. We see that Eq. (2.17) is exactly a matrix of EFPs with N = 2 and the two-dimensional hadronic measure from Eq. (2.5). The trace tr C and determinant det C have the rotational symmetry in the (y, φ)-plane of the default hadronic measure from Eq. (2.4), allowing them to be written as linear combinations of EFPs with that measure: In Ref. [49], a general class of energy flow moments was explored and categorized, with the goal of classifying observables according to their energy flow distributions. These energy flow moments are defined with respect to a specified jet axis: Using the p T -weighted centroid axis, this is the natural generalization of Eq. (2.17), with the special case of I k 1 k 2 = (C) k 1 k 2 . By performing a similar analysis to the one used to arrive at Eq. (2.18), one can show that any scalar constructed by contracting the indices of a product of objects in Eq. (2.19) can be decomposed into an exact linear combination of EFPs.

Going beyond existing substructure observables
Because the EFPs are C-correlators that span the space of IRC-safe observables, their angular structures should encompass all possible behaviors of C-correlators. By contrast, the ECFs and ECFGs mentioned in Sec. 2.3.2 have more restricted behaviors, and it is illuminating to understand the new kinds of structures present in the EFPs.
Without loss of generality, the angular weighting function f N in Eq. (1.3) can be taken to be a symmetric function of the particle directionsp µ i due to the symmetrization provided by the sum structure (see Eq. (3.14) below). The ECFs and ECFGs exhibit a stronger symmetry, though, since the angular functions in Eqs. (2.8) and (2.10) are invariant under the swapping any two pairwise angles θ ij . This symmetry is manifested in the ECFs multigraphs in Eq. (2.9) by the fact that all pairs of indices are connected by the same number of edges.
We can easily see that the pairwise swap symmetry of the ECFs is stronger than the full permutation symmetry of the EFPs: the group of permutations of the angular distances θ ij has N 2 ! elements, whereas the group of permutations of the indices {i a } has N ! elements. An example of an EFP that does not satisfy the stronger symmetry is the following N = 4 graph: (2.20) The angular weighting function of the EFP in Eq. (2.20) is symmetric under the 4! permutations in the indices (vertices) i a → i σ(a) but not under the exchange of pairwise angles (edges) θ i 1 i 3 → θ i 2 i 3 which would result in a different EFP, namely: = . (2.21) Another feature of the ECFs and ECFGs is that their angular weighting function f N vanishes whenever two of its arguments become collinear. Indeed, one of the present authors made the erroneous claim in Ref. [52] that this vanishing behavior was required by collinear safety. 4 Instead, the argument in Sec. 3.1.3 shows this not to be the case, and observables defined by Eq. (1.3) are IRC safe for any sufficiently smooth and non-singular f N . An example 4 If the sums are taken over distinct N -tuples as in Ref. [52], then the angular function does have to vanish on collinearity for C safety. In general, non-collinearly-vanishing angular functions are C safe if the sum is taken over all N -tuples of particles, including sets with repeated indices.
of an EFP that does not necessarily vanish when two of its arguments become collinear is the following N = 3 graph: which does not vanish whenp µ i 2 →p µ i 3 . More generally, any non-fully-connected graph will not vanish in every collinear limit, but the corresponding EFP will still be collinear safe.
By relaxing the restrictions on the angular weighting function f N to those minimally required by IRC safety, the energy flow basis captures all topological structures which can possibly appear in a C-correlator, beyond just the ones described by ECFs and ECFGs.

Constructing a linear basis of IRC-safe observables
Having introduced the EFPs, we now give a detailed argument that they linearly span the space of IRC-safe observables. Due to its more technical nature, this section can be omitted on a first reading, and the reader may skip to Sec. 4. Refs. [74,[87][88][89] argue that, from the point of view of quantum field theory, all IRC-safe information about the jet structure should be contained in the C-correlators. In Sec. 3.1, we independently arrive at the same conclusion by a direct application of IRC safety. We then go on in Sec. 3.2 to expand the angular structure of the C-correlators to find a correspondence between multigraphs and EFPs.
An IRC-safe observable S depends only on the unordered set of particle four-momenta , and not any non-kinematic quantum numbers. An observable defined on {p µ i } M i=1 can alternatively be thought of as a collection of functions, one for each number of particles M . IRC safety then imposes constraints on this collection and thereby induces relations between the functions. The requirement of IR safety imposes the constraint [81]: while the requirement of C safety imposes the constraint: Eq. (3.1) says that the observable is unchanged by the addition of infinitesimally soft particles, while Eq. (3.2) guarantees that the observable is insensitive to a collinear splitting of particles. As written, only particle M is affected in Eq. (3.2). The indexing used to identify particles, however, is arbitrary and these properties continue to hold when the particles are reindexed. This particle relabeling symmetry is not an additional constraint that is imposed but rather a consequence of assigning labels to an unordered set of particles. These three restrictions-IR safety, C safety, and particle relabeling symmetry-are necessary and sufficient conditions for obtaining the energy flow basis.
Throughout this analysis, particles are treated as massless, p µ i = E ip µ i , wherep µ i is purely geometric. Note that we could replace E i with any quantity linearly dependent on energy, such as the transverse momentum p T,i , which corresponds to making a different choice of measure in Sec. 2.2.

Expansion in energy
Consider an arbitrary IRC-safe observable S, expanded in terms of the particle energies. If the observable has a simple analytic dependence on the energies, then the usual Taylor expansion can be used: where M is the particle multiplicity and the derivatives are evaluated at vanishing energies. An example of this is the jet mass from Eq. (2.6): where η µν is the Minkowski metric. This expression is already in the form of Eq. (3.3) with: For some observables, though, a Taylor expansion may be difficult or impossible to obtain. The simplest example is a non-differentiable observable. This is the case for m J (rather than m 2 J ); the presence of the square root spoils the existence of a Taylor expansion, but the square root can be nonetheless approximated by polynomials arbitrarily well in a bounded interval. A more complicated case is if the observable is defined in terms of an algorithm, such as a groomed jet mass [5,[105][106][107][108][109], and an explicit formula in terms of particle four-momenta would not be practical to differentiate or write down. Similarly, the observable could be a non-obvious function of the particles, i.e. the optimal observable to accomplish some task.
In cases without a Taylor expansion, the Stone-Weierstrass theorem [76] still guarantees that the observable can be approximated over some bounded energy range by polynomials in the energies. 5 We write down such an expansion by considering all possible polynomials in the energies and multiplying each one by a different geometric function. Combining all terms of degree N into C N , the expansion is: . . ,p µ M ) are geometric angular functions, which depend on the indices of the energy factors i 1 · · · i n and could in general be different for different multiplicities M . The Stone-Weierstrass theorem guarantees that there is a maximum degree N max in this energy expansion for any given desired accuracy, but places no further restrictions on the C N .
To derive constraints on these angular functions C (M ) i 1 ···i N , we impose the three key properties of IR safety in Sec. 3.1.1, particle relabeling invariance in Sec. 3.1.2, and C safety in Sec. 3.1.3, which we summarize in Sec. 3.1.4. In applying these properties, we will often use the fact that when setting two expressions for the observable S equal to each other, we can read off term-by-term equality by treating the particle energies as independent quantities: Note that the sum structure in Eq. (3.6) implies that, without loss of generality, the angular functions can be taken to depend only on the labels i 1 , . . . , i N as an unordered set.

Infrared safety
IR safety constrains the angular functions appearing in the expansion of Eq. (3.6) in two ways: by restricting which particle directions contribute to a particular term in the sum and by relating angular functions of different multiplicities. First, consider a particular angular function, C i 1 ···i N in Eq. (3.6), and some particle j ∈ {i 1 , . . . , i N }. Consider particle j in the soft limit: if C (M ) i 1 ···i N depends onp µ j in any way, then IR safety is violated because E j does not appear in the product of energies but the value of the observable changes as the direction of j is changed. Hence, IR safety imposes the requirement that namely the indices of the arguments must match those of the angular function. Note that we must always write C (M ) i 1 ···i N with N arguments, even if some are equal due to indices coinciding. Next, consider two polynomial approximations of the same observable: one as a function of M particles and the other as a function of M +1 particles. In the soft limit of particle M +1, E M +1 → 0, the IR safety of S, written formally in Eq. (3.1), guarantees that the function of M + 1 particles approaches the function of M particles. In terms of the corresponding polynomial approximations, we have that: We see from Eq. (3.9) that the same angular coefficients from the polynomial approximation of the function of M + 1 particles can be validly chosen for the approximation of the function of M particles, with the following equality of angular functions: which says that the multiplicity label on the angular functions can be dropped. As a result of enforcing IR safety, the dependence of the angular functions on multiplicity has been eliminated, as well as the dependence of a given angular function on any particles with indices not appearing in its subscripts.

Particle relabeling symmetry
Now, using particle relabeling symmetry, for all σ ∈ S M , where S M is the group of permutations of M objects, we have that C N is unchanged by the replacement . With the angular functions as constrained by IR safety, the particle relabeling invariance of C N can be written as: where the sums were reindexed according to σ −1 . In particular, from Eq. (3.12), we have for any σ ∈ S M that: Eq. (3.13) allows us to permute the indices of C i 1 ···i N within S M , equating previously unrelated angular functions.
As written, C i 1 ···i N is not necessarily symmetric in its arguments. Without loss of generality, though, we can symmetrize C i 1 ···i N without changing the value of C N as follows: where C i 1 ···i N is now symmetric in its arguments. We assume in the next step of the derivation that the angular weighting functions are symmetric in their arguments.

Collinear safety
The key requirement for restricting the form of C i 1 ···i N is C safety. If the angular weighting function(s) were required to vanish whenever two of the inputs were collinear, then the observable would be manifestly C safe (see e.g. [52]); this is a sufficient condition for C safety but not a necessary one. More generally, one can have non-zero angular functions of N arguments even when subsets of the arguments are collinear.
Using the IR safety argument of Eq. (3.10) and the particle relabeling symmetry of Eq. (3.13), we can relate any angular function C i 1 ···i N to one of the following: where there is one of these "standard" angular functions for each integer partition of N .
In particular, the length of the integer partition is how many unique indices appear in the subscript and the values of the partition indicate how many times each index is repeated.
The role of C safety is to impose relationships between these standard angular functions, eventually showing that the only required function is C 123···N . Intuitively, this means that as any set of particles become collinear, the angular dependence is that of collinear limit of N arbitrary directions. The proof that this follows from C safety, however, is the most technically involved step of this derivation.
The requirement of C safety in Eq. (3.2) implies that S is unchanged whether one or the same particles with a collinear splitting of the first particle, for all λ ∈ [0, 1] and i > 1. Rewriting Eq. (3.11), we can explicitly separate out the terms of the sums involving k collinearly split indices {0, 1}: where in going to this last expression, we have used the symmetry of C i 1 ···i N in its arguments and accounted for the degeneracy of such terms using the binomial factor N k . We then insert the collinear splitting kinematics of Eq. (3.16) into Eq. (3.18), where in going to this last expression, we have used the particle relabeling symmetry of Eq. (3.13) to sort the {0, 1} subscript indices of the angular functions.
The constraint of C safety says that Eq. (3.20) is equal to Eq. (3.11) on the non-collinearly split event. To make this constraint more useful, we use the binomial theorem to write 1 in a suggestive way: and insert this expression into Eq. (3.11), separating out factors where k of the indices are equal to 1: Subtracting Eq. (3.22) from Eq. (3.20) and treating the energies as independent quantities, the following constraint can be read off: where the identical arguments of the angular functions are suppressed for compactness. We would like to obtain that the quantity in parentheses in Eq.
for 0 ≤ ≤ k. Note that in this expression, the first k arguments of the functions are identical. The constraint in Eq. (3.24) is very powerful, especially when combined with the relabeling symmetry of Eq. (3.13). While we obtained Eq. (3.24) using the collinear limit, the particle directionp 0 appears nowhere in this expression, so the 0 subscript is simply an index on the angular function. Therefore, when any k arguments of one of the angular functions become collinear, any ≤ k of the corresponding subscript labels may be swapped out for values not appearing anywhere else in the indices. A concrete example of this is 25) where the N index here plays the role of the 0 index in Eq. (3.24). This then implies that all of the angular functions in Eq. (3.15) can related to a single function: yielding the intuitive result that the angular dependence when some number of particles become collinear should follow from the collinear limit of N arbitrary directions.

A new derivation of C-correlators
Finally, substituting Eq. (3.26) into Eq. (3.11) implies that where we recognize C f N N as the C-correlators of Eq. (1.3). This expression says that an arbitrary IRC-safe observable can be approximated arbitrarily well by a linear combination of C-correlators. In this way, we have given a new derivation that C-correlators linearly span the space of IRC-safe observables by directly imposing the constraints of IRC safety and particle relabeling symmetry on an arbitrary observable.
The argument presented here suffices to show the IRC-safety of the C-correlators with any continuous angular weighting function, even if it is not symmetric. Though we used the symmetrization in Eq. (3.14) to aid the C-safety derivation in Sec. 3.1.3, it is now perfectly valid to relax this constraint on f N . In particular, we can simply consider Eq. (3.14) applied in reverse and select a single term in the symmetrization sum to represent f N . Thus we are not constrained merely to symmetric f N , which will be helpful in obtaining the EFPs.

Expansion in geometry
Having now established that the C-correlators linearly span the space of IRC-safe observables, we now expand the angular weighting function f N in Eq. (3.27) in terms of a discrete linear angular basis. 6 By virtue of the sum structure of the C-correlators, this angular basis directly translates into a basis of IRC-safe observables, i.e. the energy flow basis.
Following the discussion in Sec. 2.2, we take the angular function f N to depend only on the pairwise angular distances θ ij . Note that the results of Sec. 3.1 continue to hold with pairwise angular distances in place of particle directions, as long as θ ij is a dimensionless function ofp µ i andp µ j with no residual dependence on energy. Of course, this choice would not be valid for expanding IRC-safe observables that do not respect the symmetries implied by θ ij , such as trying to use the default hadronic measure in Eq. (2.4) for observables that depend on the overall jet rapidity. In such cases, one can perform an expansion directly in thep µ i , though we will not pursue that here.
Expanding the angular function f N in terms of polynomials up to order d max in the pairwise angular distances yields: where Θ d is the set of monomials in {θ ij | i < j ∈ {i 1 , . . . , i N }} of degree d, M is one of these monomials, and the b M are numerical coefficients. While this is a perfectly valid expansion, it represents a vast overcounting of the number of potential angular structures. Our goal is to substitute Eq. (3.28) into the definition of a C-correlator in Eq. (3.27) and identify the unique analytic structures that emerge. Note that two monomials M 1 , M 2 ∈ Θ d that are related by a permutation σ ∈ S N with action θ iai b → θ i σ(a) i σ(b) give rise to identical C-correlators, C M 1 = C M 2 , as a result of the relabeling symmetry in Sec. 3.1.2. Thus, we can greatly simplify the angular expansion by summing only over equivalence classes of monomials not related by permutations, which we write as Θ d /S N . Writing this out in terms of E ∈ Θ d /S N : where, by the relabeling symmetry, M E can be any representative monomial in the equivalence class E, and the coefficient b E = |E| b M absorbs the size |E| of the equivalence class. As described in Sec. 2.1, the set of monomials Θ d is in bijection with the set of multigraphs with d edges and N vertices, and the set of equivalence classes Θ d /S N is in bijection with the set of non-isomorphic multigraphs with d edges and N vertices. In particular, each edge (k, ) in a multigraph G corresponds to a factor of θ i k i in the monomial M E : where G corresponds to the equivalence class E. By substituting Eq. (3.31) into Eq. (3.30) and relabeling the coefficient b E to b G , we can identify the resulting analytic structures that linearly span the space of C-correlators as the (unnormalized) EFPs: 32) where G N,d is the set of non-isomorphic multigraphs with d edges on N vertices. In Sec. 3.1.4, it was shown that the set of IRC-safe observables is linearly spanned by the set of C-correlators, summarized in Eq. (3.27). In this section, we have shown in Eq. (3.32) that the C-correlators themselves are linearly spanned by the EFPs, whose angular structures are efficiently encoded by multigraphs. By linearity, the EFPs therefore form a complete linear basis for all IRC-safe observables, completing our argument.

Computational complexity of the energy flow basis
Since we would like to apply the energy flow basis in the context of jet substructure, the efficient computation of EFPs is of great practical interest. Naively, calculating an EFP whose graph has a large number of vertices requires a prohibitively large amount of computation time, especially as the number of particles in the jet grows large. In practice, though, we can dramatically speed up the implementation of the EFPs by making use of the correspondence with multigraphs. Beta code to calculate the EFPs using these methods is available through our EnergyFlow module.

Algebraic structure
The set of EFPs has a rich algebraic structure which will allow in some cases for faster computation. Firstly, they form a monoid (a group without inverses) under multiplication. In analogy with the natural numbers, the composite EFPs, those with disconnected multigraphs, can be expressed as a product of the prime EFPs corresponding to the connected components of a disconnected graph: where C(G) is the set of connected components of the multigraph G.
As a concrete example of Eq. (4.1), consider: Thus, we only need to perform summations for the computation of prime EFPs, with the composite ones given by Eq. (4.1). Note that if one were combining EFPs with a nonlinear method, such as a neural network, the composite EFPs would not be needed as separate inputs since the model could in principle learn to compute them on its own. The composite EFPs are, however, required to have a linear basis and should be included when linear methods are employed, such as those in Secs. 5 and 6. The relationship between prime and composite EFPs is just the simplest example of the algebraic structure of the energy flow basis. The EFPs depend on M energies and M 2 pairwise angles, but there are only 3M − 4 degrees of freedom for the phase space of M massless particles, leading generically to additional (linear) relations among the EFPs. Hence, the EFPs are an over complete linear basis. We leave further analysis and exploration of these relations to future work, and simply remark here that linear methods continue to work even if there are redundancies in the basis elements.

Dispelling the O(M N ) myth for N -particle correlators
It is useful to analyze the complexity of computing an EFP. 7 A naive implementation of Eq. (1.1) runs in O(M N ) due to the N nested sums over M particles. There is a computational simplification, however, that can be used to tremendously speed up calculations of certain EFPs by making use of the graph structure of G. As an example, consider the following EFP: In general, since the summand is a product of factors, the distributive property allows one to put parentheses around combinations of sum operators and factors. A clever choice of such parentheses, known as an elimination ordering, can often be used to perform the N sums of Eq. (1.1) in a way which greatly reduces the number of operations needed to obtain the value of the EFP for a given set of particles. This technique is known as the Variable Elimination (VE) algorithm [77] (see also Ref. [112] for a review). 7 The title of this section is inspired by Ref. [111].  9 21  55 146  415 1 212  3 653  3  1  3 12  47 185  757 3 181 13 691  4  1  2  11  49  231  5 1 Total 1 2 5 12 33 103 333 1 183 4 442 17    When run optimally, the VE algorithm reduces the complexity of computing EFP G to O(M tw(G)+1 ) where tw(G) is the treewidth of the graph G, neglecting multiple edges in the case of multigraphs. The treewidth is a measure which captures how tangled a graph is, with trees (graphs with no cycles) being the least tangled (with treewidth 1) and complete graphs the most tangled (with treewidth N − 1). Additionally, we have that for graphs with a single cycle the treewidth is 2 and for complete graphs minus one edge the treewidth is N − 2. Thus the EFPs corresponding to tree multigraphs can be computed with VE in O(M 2 ) whereas complete graphs do require the naive O(M N ) to compute with VE. Since the ECFs correspond to complete graphs (see Eq. (2.9)), they do not benefit from VE. Similarly, VE cannot speed up the computation of ECFGs, since the ECFGs do not have a factorable summand.
Finding the optimal elimination ordering and computing the treewidth for a graph G are both NP-hard. In practice, heuristics are used to decide on a pretty-good elimination ordering (which for the small graphs we consider here is often optimal) and to approximate the treewidth. In principle, these orderings need only be computed once for a fixed set of graphs of interest. Similarly, many algebraic structures reappear when computing a set of EFPs for the same set of particles, making dynamic programming a viable technique for further improving the computational complexity of the method. Table 4a shows the number of EFPs listed by degree d and VE complexity χ (with respect to the heuristics used in our implementation), and Table 4b further breaks up the EFP counts by N . Fig. 1 shows the time to compute the average d ≤ 7 EFP as a function of multiplicity M for different VE complexity χ. Finally, we note that though VE often provides a significant speedup over the naive algorithm, there may be even faster ways of computing the EFPs. 8

Linear regression with jet observables
Regression, classification, and generation are three dominant machine learning paradigms. Machine learning applications in collider physics have been largely focused on classification (e.g. jet tagging) [65][66][67][68][69][70][71][72][73] with recent developments in regression [113] and generation [114,115]. For a more complete review of modern machine learning techniques in jet substructure, see Ref. [48]. The lack of established regression problems in jet physics is due in part to the difficulty of theoretically probing multivariate combinations as well as the challenges associated with extracting physics information from trained regressions models.
In this section, we show that the linearity of the energy flow basis mitigates many of these problems, providing a natural regression framework using simple linear models, probing the learned observable combinations, and gaining insight into the physics of the target observables. Since regression requires training samples, we observe how the regression performance compares on jets with three characteristic phase-space configurations: one-prong QCD jets, two-prong boosted W jets, and three-prong boosted top jets. We use linear regression to demonstrate convergence of the energy flow basis on IRC-safe observables, while illustrating their less-performant behavior for non-IRC-safe observables.

Linear models with the energy flow basis
Linear models assume a linear relationship between the input and target variables, making them the natural choice for (machine) learning with the energy flow basis for both regression and classification. A linear model M with EFPs as the inputs is defined by a finite set G of multigraphs and numerical coefficients w = {w G } G∈G : The fundamental relationship between EFPs, linear models, and IRC-safe observables is highlighted by comparing Eq. partition the set G into subsets G N of graphs with N vertices. The sum in Eq. (5.1) can be broken into two sums, one over N and the other over all graphs in G N . The linear energy structure of the EFPs in Eq. (1.1) allows for the second sum to be pushed inside the product of energies onto the angular weighting function: where N max is the maximum number of vertices of any graph in G. The quantity in parentheses in Eq. (5.2) may be though of as a single angular weighting function. The linear model written in this way reveals itself to be a sum of C-correlators (similar to Eq. (3.27)), one for each N , where the linear coefficients within each G N parameterize the angular weighting function f N of that C-correlator. This arrangement of the learned parameters of the linear model into N max C-correlators contrasts sharply with the lack of a physical organization of parameters in nonlinear methods such as neural networks or boosted decision trees.

Event generation and EFP computation
For the studies in this section and in Sec. 6, we generate events using Pythia 8.226 [116][117][118] with the default tunings and shower parameters at √ s = 13 TeV. Hadronization and multiple parton interactions (i.e. underlying event) are included, and a 400 GeV parton-level p T cut is applied. For quark/gluon distribution, quark (signal) jets are generated through pp → qZ(→ νν), and gluon (background) jets through pp → gZ(→ νν), where only lightquarks (uds) appear in the quark sample. For W and top tagging, signal jets are generated through pp → W + W − (→ hadrons) and pp → tt(→ hadrons), respectively. For both W and top events, the background consists of QCD dijets. Final state, non-neutrino particles were made massless, keeping y, φ, and p T fixed, 9 and then were clustered with FastJet 3.3.0 [119] using the anti-k T algorithm [120] with a jet radius of R = 0.4 for quark/gluon samples and R = 0.8 for W and top samples (and the relevant dijet background). The hardest jet with rapidity |y| < 1.7 and 500 GeV ≤ p T ≤ 550 GeV was kept. For each type of sample, 200k jets were generated. For the regression models, 75% were used for training and 25% for testing.
For these events, all EFPs up to degree d ≤ 7 were computed in Python using our N -subjettiness ratio Sudakov safe, safe for two-prong kinematics τ (β=1) 32 N -subjettiness ratio Sudakov safe, safe for three-prong kinematics M Particle multiplicity IRC unsafe Table 5: The six substructure observables used as targets for linear regression, listed with relevant properties. The first three are IRC safe, the next two are Sudakov safe in general (and IRC safe in the noted regions of phase space), and particle multiplicity is IRC unsafe. The Les Houches Angularity [124,125] is calculated with respect to the p T -weighted centroid axis in Eq. (2.12), and the N -subjettiness observables [54,55] are calculated using k T axes.
squares regression to find a suitable set of coefficients w * : where O(J) is the value of the observable and EFP G (J) the value of the EFP given by multigraph G on jet J. There are possible modifications to Eq. (5.3) which introduce penalties proportional to w 1 or w 2 2 where · 1 is the 1-norm and · 2 is the 2-norm. The first of these choices, referred to as lasso regression [121], may be particularly interesting because of the variable selection behavior of this model, which would aid in selecting the most important EFPs to approximate a particular observable. We leave such investigation to future work. See Ref. [122] for a review of linear models for regression.
We use the LinearRegression class of the scikit-learn python module [123] to implement Eq. (5.3) with no regularization on the samples described in Sec. 5.2. In general, the smallest possible regularization which prevents overfitting (if any) should be used. Because of the linear nature of linear regression and the analytic tractability of Eq. (5.3), the w * corresponding to the global minimum of the squared loss function can be found efficiently using convex optimization techniques. Such techniques include closed-form solutions or convergent iterative methods.
As targets for the regression, we consider the six jet observables in Table 5 to highlight some interesting test cases. As our measure of the success of the regression, we use a variant of the correlation coefficient between the true and predicted observables that is less sensitive to outliers than the unadulterated correlation coefficient. When evaluating the trained linear model on the test set, only test samples with predicted values within the 5 th and 95 th percentiles of the predictions are included. In the contexts considered in this paper, narrowing this percentile range lowers the correlation coefficient and widening the range out toward all of the test set increases the correlation coefficient. The qualitative nature of the results are Mult. (a) Mult. Mult.
(c) Figure 2: Correlation coefficients between true and predicted values for the jet observables in Table 5 insensitive to the specific choice of percentile cutoffs. We perform this regression using EFPs of degree up to d for d from 2 to 7 on all three jet samples, with the results shown in Fig. 2.
Histograms of the true and predicted distributions for a subset of these observables are shown in Fig. 3 for the three types of jets considered here.
Since the learned coefficients depend on the training set, in principle different linear combinations may be learned to approximate the substructure observables in different jet contexts. This stands in contrast to the analysis in Sec. 2.3, where many jet substructure observables were identified as exact linear combinations of EFPs, independent of the choice of inputs. The IRC-safe observables-mass, Les Houches angularity, and 2-subjettiness-are all learned with a correlation coefficient above 0.98 in all three cases by d = 7.
The IRC-unsafe multiplicity sets the scale of performance for observables that are not IRC safe. For the N -subjettiness ratios, the regression performance depends on whether the observable is IRC safe or only Sudakov safe [82,83]. The ratio τ 21 is only IRC safe for regions of phase space with two prongs or more (i.e. the W and top samples), and τ 32 is only IRC safe for three prongs or more (i.e. just the top sample). In cases where the N -subjettiness ratio is IRC safe, the regression performs similarly to the other IRC-safe observables, whereas for the cases where the N -subjettiness ratio is only Sudakov safe, the regression performance is poor (even worse than for multiplicity). It is satisfying to see the expected behavior between

Cross-section (normalized)
Top Jets Pythia 8.226, √ s = 13 TeV Cross-section (normalized)  the safety of the observable and the quality of the regression with EFPs. As a final cross check of the regression, we can use the linear model in Eq. (5.1) to confirm some of the analytic results of Sec. 2.3. Specifically, we perform a linear regression with the target observable being the even-α angularities with respect to the p T -weighted centroid axis. These were shown to be non-trivial linear combinations of EFPs in Sec. 2.3.3. Regressing onto λ (2) , λ (4) , and λ (6) , the linear model learned the observables with effectively 100% accuracy and the learned linear combination was exactly that predicted by Eqs. (2.14), (2.15), and (2.16), up to a precision of 10 −6 . Fig. 4 shows the learned linear combinations of EFPs for the W jet sample.

Linear jet tagging
We now apply the energy flow basis to three representative jet tagging problems-lightquark/gluon classification, W tagging, and top tagging-providing a broad set of contexts in which to study the EFPs. Since the energy flow basis is linear, we can (in principle) access the optimal IRC-safe observable for jet tagging by training a linear classifier for this problem. As mentioned in Sec. 5.3, one benefit of linear models, in addition to their inherent simplicity, is that they are typically convex problems which can be solved exactly or with gradient descent to a global minimum. See Ref. [122] for a review of linear models for classification.
A (binary) linear classifier learns a vector w * that defines a hyperplane orthogonal to the vector. A bias term, which can be related to the distance of this hyperplane from the origin, sets the location of the decision boundary, which is the hyperplane translated away from the origin. The decision function for a particular point in the input space is the normal distance to the decision boundary. In contrast with regression, where the target variable is usually continuous, classification predictions are classes, typically 0 or 1 for a binary classifier.
Different methods of determining the vector w * -such as logistic regression, support vector machines, or linear discriminant analysis-may learn different linear classifiers since the methods optimize different loss functions. For our linear classifier, we use Fisher's linear discriminant [126] provided by the LinearDiscriminantAnalysis class of the scikit-learn python module [123]. The choice of logistic regression was also explored, and jet tagging performance was found to be insensitive to which type of linear classifier was used.
The details of the event generation and EFP computation are the same as in Sec. 5.2. To avoid a proliferation of plots, we present only the case of W tagging in the text and refer to App. B for the corresponding results for quark/gluon discrimination and top tagging. Qualitatively similar results are obtained on all three tagging problems, with the conclusion that linear classification with EFPs yields comparable classification performance to other powerful machine learning techniques. This is good evidence that the EFPs provides a suitable linear expansion of generic IRC-safe information relevant for practical jet substructure applications.

Alternative jet representations
In order to benchmark the EFPs, we compare them to two alternative jet tagging paradigms: • The jet images approach [50] treats calorimeter deposits as pixels and the jet as an image, often using convolutional neural networks to determine a classifier. Jet images have been applied successfully to the same tagging problems considered here: quark/gluon discrimination [65], W tagging [63], and top tagging [66,68].
• The N -subjettiness basis was introduced for W tagging in Ref. [57] and later applied to tagging non-QCD jets [73]. We use the same choice of N -subjettiness basis elements as Ref. [57], namely: 1 , τ 2 , · · · , τ N −1 }, (6.1) with 3N − 4 elements needed to probe N -body phase space. These are then used as inputs to a DNN.
Both of these learning paradigms are expected to perform well, and we will see below that this is the case. As a strawman, we also consider linear classification with the N -subjettiness basis elements in Eq. (6.1), which is not expected to yield good performance. For completeness, we also perform DNN classification with the energy flow basis. We now summarize the technical details of these alternative jet tagging approaches. For jet images, we create 33 × 33 jet images spanning 2R × 2R in the rapidity-azimuth plane. Motivated by Ref. [65], both single-channel "grayscale" jet images of the p T per pixel and two-channel "color" jet images consisting of the p T channel and particle multiplicity per pixel were used. The p T -channel of the jet image was normalized such that the sum of the pixels was one. Standardization was used to ensure that each pixel had zero mean and unit standard deviation by subtracting the training set mean and dividing by the training set standard deviation of each pixel in each channel. A jet image CNN architecture similar to that used in Ref. [65] was employed: three 36-filter convolutional layers with filter sizes of 8 × 8, 4 × 4, and 4 × 4, respectively, followed by a 128-unit dense layer and a 2-unit softmaxed output. A rectified linear unit (ReLU) activation [127] was applied to the output of each internal layer. Maxpooling of size 2 × 2 was performed after each convolutional layer with a stride length of 2. The dropout rate was taken to be 0.1 for all layers. He-uniform initialization [128] was used to initialize the model weights.
For the DNN (both for the N -subjettiness basis and for the EFPs), we use an architecture consisting of three dense layers of 100 units each connected to a 2-unit softmax output layer, with ReLU activation functions applied to the output of each internal layer. For the training of all networks, 300k samples were used for training, 50k for validation, and 50k for testing. Networks were trained using the Adam algorithm [129] using categorical cross-entropy as a loss function with a learning rate of 10 −3 and a batch size of 100 over a maximum of 50 epochs. Early stopping was employed, monitoring the validation loss, with a patience parameter of 5. The python deep learning library Keras [130] with the Theano backend [131] was used to instantiate and train all neural networks. Training of the CNNs was performed on Microsoft Azure using NVIDIA Tesla K80 GPUs and the NVIDIA CUDA framework. Neural network performance was checked to be mildly insensitive to these parameter choices, but these parameter choices were not tuned for optimality. As a general rule, the neural networks used here are employed to give a sense of scale for the performance attainable with jet images and the N -subjettiness basis using out-of-the-box techniques; improvements in classification accuracy may be possible for these methods with additional hyperparameter tuning.

W tagging results and comparisons
We present results for the W tagging study here, with the other two classification problems discussed in App. B. The performance of a binary classifier is encapsulated by the background mistag rate ε b at a given signal efficiency ε s . For all of the figures below, we plot inverse receiver operator characteristic (ROC) curves, 1/ε b as a function of ε s , on a semi-log scale; a higher ROC curve indicates a better classifier. The corresponding standard ROC (ε b vs. ε s ) and significance improvement (ε s / √ ε b vs. ε s ) curves are available in the source files of the arXiv preprint as additional pages in the figure.
We begin by studying the performance for different choices of angular exponent β in the default hadronic measure from Eq. (2.4). Fig. 5 shows ROC curves for the choices of β = 0.2, β = 0.5, and β = 1, using all EFPs with d ≤ 7. The differences in performance are mild, but β = 0.5 slightly improves the ROC curves for W tagging, so we use β = 0.5 for the remainder of our studies. The choice of β = 0.5 was also found to be optimal for the cases of quark/gluon and top tagging discussed in App. B. . Though the improvement is mild, β = 0.5 shows the best overall performance. See Fig. 10 for the corresponding quark/gluon discrimination and top tagging results, where β = 0.5 is also the best choice by a slight margin.
Next, in Fig. 6a, we test the linear spanning nature of the EFPs by comparing the ROC curves of the linear and nonlinear models trained on EFPs up to different d. With linear regression, there is a large jump in performance in going from d ≤ 3 (13 EFPs) to d ≤ 6 (314 EFPs), and a slight increase in performance from d ≤ 6 to d ≤ 7 (1000 EFPs), indicating good convergence to the optimal IRC-safe observable for W jet discrimination. To avoid cluttering the plot, d ≤ 4 and d ≤ 5 are not shown in Fig. 6a, but their ROC curves fall between those of d ≤ 3 and d ≤ 6, highlighting that the higher d EFPs carry essential information for linear classification. By contrast, using nonlinear classification with a DNN, the EFPs performance with d ≤ 3 is already very good, since functions of the low d EFPs can be combined in a nonlinear fashion to construct information contained in higher d composite EFPs. The linear and nonlinear performance is similar with the d ≤ 7 EFPs for operating points of ε s 0.5, though the nonlinear DNN outperforms the linear classifier in the low signal efficiency region. It should be noted that the linear classifier is not trained specifically for the low signal efficiency region and it may be possible that choosing a different hyperplane could boost performance there. We leave to future work a more detailed investigation of optimizing the choice of linear classifier.
The performance of the N -subjettiness basis with both linear and nonlinear classifiers is shown in Fig. 6b. For both linear classification and the DNN, performance appears to saturate with the 6-body (14 τ N s) phase space, with not much gained in going to 10-body (26 τ N s) phase space, except for a small increase in the low signal efficiency region for the DNN; we confirmed up to 30-body (86 τ N s) phase space that no change in ROC curves was In both cases, we show the observables combined linearly (solid) and with a DNN (dashed). The linear combinations of EFPs can be seen to approach the nonlinear combinations, particularly for higher signal efficiencies, while the linear combinations of the N -subjettiness basis can be seen to saturate well below the nonlinear combinations as the number of observables is increased. See Fig. 11 for the corresponding quark/gluon discrimination and top tagging results.
observed compared to 10-body phase space. As expected, there is relativity poor performance with linear classification even as the dimension of phase space is increased. Classification with a DNN, though, shows an immense increase in performance over linear classification, as expected since the N -subjettiness basis is expected to nonlinearly capture all of the relevant IRC-safe kinematic information [57]. This illustrates that nonlinear combinations of the Nsubjettiness observables are crucial for extracting the full physics information.
The corresponding quark/gluon and top tagging plots in Fig. 11 effectively tell the same story as Fig. 6, robustly demonstrating the linear spanning nature of the EFPs used for classification across a wide variety of kinematic configurations. As a side note, in App. B there are sometimes cases where a linear combination of EFPs yields improved performance compared to a DNN on the same inputs, particularly at medium to high signal efficiencies. Since even a one-node DNN should theoretically be able to learn the linear combination of EFPs learned by the linear classifier, regimes where the linear classifier outperforms the DNN demonstrate the inherent difficulty of training complex multivariate models.
In Fig. 7 we directly compare the EFP classification power against the N -subjettiness basis and the 1-channel ("grayscale") and 2-channel ("color") CNNs. For operating points CNNs. The most evident gap is between the linearly-combined N -subjettiness basis and the remaining curves, which achieve similar classification performance for medium and high signal efficiencies. See Fig. 12 for the corresponding quark/gluon discrimination and top tagging results.
with ε s 0.5, all methods except the linear N -subjettiness classifier show essentially the same performance. The worse performance of the linear EFP classifier at low signal efficiencies is expected, since the Fisher linear discriminant is not optimized for that regime. Overall, it is remarkable that similar classification performance can be achieved with these three very different learning paradigms, especially considering that the DNNs and grayscale CNN implicitly, and the color CNN explicitly, have access to non-IRC-safe information (including Sudakov-safe combinations of the IRC-safe inputs). This agreement gives evidence that the tagging techniques have approached a global bound on the maximum possible discrimination power achievable, at least in the context of parton shower simulations.
Once again, the analogous quark/gluon and top tagging plots, shown in Fig. 12, show very similar behavior to the W tagging case in Fig. 7. Linear classification with the EFPs performs similarly to the DNNs and CNNs, tending to slightly outperform at high signal efficiencies and underperform at low signal efficiencies. Ultimately, the choice of tagging method comes down to a trade off between the simplicity of the inputs and the simplicity of the training method, with the EFPs presently requiring more inputs than the N -subjettiness basis but with the benefit of using a linear model. In the future, we plan to study ways of reducing the size of the EFP basis by exploiting linear redundancies among the EFPs and using powerful linear methods to automatically select the most important observables for a given task. It is clear that important information is contained in the higher N -particle correlators, which can be included because the algorithm in Sec. 4 evades the naive O(M N ) scaling. Interestingly, the discrimination power appears to be almost saturated by the graphs computable in O(M 2 ). See Fig. 13 for the corresponding quark/gluon discrimination and top tagging results.

Opening the energy flow box
As argued in Eq. (5.2), one of the main advantages of linear methods with the energy flow basis is that one can attempt to "open the box" and directly explore what features have been learned. We leave to future work a full exploration of this possibility, but here we attempt to probe which topological structures within the EFP basis carry the discrimination power for the different tagging problems. Since we have shown that the EFPs with d ≤ 7 have sufficient discrimination power to qualitatively match the performance of alternative tagging methods, we will restrict to this set of observables. In Fig. 8a, we vary the maximum number of vertices in the EFP graphs, where the maximum N is 14 for d ≤ 7, finding that the performance roughly saturates at N = 9, highlighting the importance of higher N EFPs. The algorithmic advances described in Sec. 4 allow for the efficient computation of these higher N EFPs, which have complexities as intractable as O(M 9 ) with the naive algorithm. Additionally, note that nearly every EFP (all except those corresponding to complete graphs) has a non-vanishing angular weighting function, which is a new feature compared to the ECFs and ECFGs (see Sec. 2.4). In Fig. 8b, we vary the maximum computational complexity χ of the EFP graphs, where the maximum χ is 4 for d ≤ 7. Remarkably, the full performance of linear classification with the d ≤ 7 EFPs can be obtained with merely those observables calculable in O(M 2 ) with VE. Thus, fortuitously for the purposes of jet tagging, it seems that restricting to the most efficiently computable EFPs (in the VE paradigm) is sufficient for extracting the near-optimal IRC-safe observable for jet discrimination. Similar results hold for quark/gluon discrimination and top tagging, shown in Fig. 13.

Conclusions
In this paper, we have introduced the EFPs, which linearly span the space of IRC-safe observables. 10 The core argument, presented in Sec. 3, is that one can systematically expand an arbitrary IRC-safe observable in terms of energies and angles and read off the unique resulting analytic structures. This expansion yields a new way to understand the importance of Ccorrelators [74,[87][88][89] for IRC safety, and it enables a powerful graph-theoretic representation of the various angular structures. The multigraph correspondence makes manifest a more efficient algorithm than the naive O(M N ) one for computing EFPs, overcoming a primary obstacle to exploring higher-N multiparticle correlators for jet substructure.
To demonstrate the power of the energy flow basis, we performed a variety of representative regression and classification tasks for jet substructure. Crucially, linear methods were sufficient to achieve good performance with the EFPs. As a not-quite apples-to-apples comparison in three representative jet tagging applications, linear classification with 1000 EFPs achieved comparable performance to a CNN acting on a jet image with 33 × 33 = 1089 pixels. Because of the wide variety of linear learning methods available [122], we expect that the EFPs will be a useful starting point to explore more applications in jet substructure and potentially elsewhere in collider physics.
There are many possible refinements and extensions to the energy flow basis. In this paper, we truncated the EFPs at a fixed maximum degree d; alternatively, one could truncate the prime EFPs at a fixed d and compute all composite EFPs up to a specified cutoff. Since the EFPs yield an overcomplete basis, it could be valuable to cull the list of required multigraphs. A similar problem of overcompleteness was solved for kinematic polynomial rings in Ref. [110], and that strategy may be relevant for EFPs with a suitable choice of measure. In the other direction, it may be valuable to make the energy flow basis even more redundant by including EFPs with multiple measures. With a vastly overcomplete basis, one could use techniques like lasso regression [121] to zero out unnecessary terms. While we have restricted our attention to IRC-safe observables, it would be straightforward to relax the restriction to just infrared safety. In particular, the set of IR-safe (but C-unsafe) functions in Eq. (3.15) can be expanded into multigraphs that have an extra integer decoration on each vertex to indicate the energy scaling. Finally, the EFPs are based on an expansion in pairwise angles, but one could explore alternative angular expansions in terms of single particle directions or multiparticle factors.
To gain some perspective, we find it useful to discuss the EFPs in the broader context of machine learning for jet substructure. Over the past few years, there has been a surge of interest in using powerful tools from machine learning to learn useful observables from low-level or high-level representations of a jet [50,[57][58][59][60][61][62][63][65][66][67][68][69][70][71][72][73]. The power of these machine learning methods is formidable, and techniques like neural networks and boosted decision trees have shifted the focus away from single-or few-variable jet substructure taggers to multivariate methods. On the other hand, multivariate methods can sometimes obscure the specific physics information that the model learns, leading to recent efforts to "open the box" of machine learning tools [63,72,113,132,133]. Even with an open box, though, theoretical calculations of multivariate distributions are impractical (if not impossible). Furthermore, training multivariate models is often difficult, requiring large datasets, hyperparameter tuning, and preprocessing of the data.
The EFPs represent both a continuation of and a break from these machine learning trends. The EFPs continue the trend from multivariate to hypervariate representations for jet information, with O(100) elements needed for effective regression and classification. On the other hand, the linear-spanning nature of the EFPs make it feasible to move away from "black box" nonlinear algorithms and return to simpler linear methods (explored previously for jet substructure in e.g. [50,55]) without loss of generality. Armed with the energy flow basis, there is a suite of powerful tools and ideas from linear regression and classification which can now be fully utilized for jet substructure applications, with simpler training processes compared to DNNs and stronger guarantees of optimal training convergence. Multivariate methods would ideally be trained directly on data to avoid relying on imperfect simulations, as discussed in Ref. [134]. The energy flow basis may be compelling for recent data-driven learning approaches [134][135][136] due to its completeness, the simplicity of linear learning algorithms, and a potentially lessened requirement on the size of training samples.
As with any jet observable, the impact of non-perturbative effects on the EFPs is important to understand. Even with IRC safety, hadronization modifies the distributions predicted by pQCD and therefore complicates first-principles calculations. It would be interesting to see if the shape function formalism [137,138] could be used to predict the impact of nonperturbative contributions to EFP distributions. Alternatively, one standard tool that is used to mitigate non-perturbative effects is jet grooming [5,[105][106][107][108][109], which also simplifies first-principles calculations and allows for "quark" and "gluon" jets to be theoretically welldefined [139]. We leave a detailed study of the effects of non-perturbative contributions and jet grooming on EFPs to future work.
Eventually, one hopes that the EFPs will be amenable to precision theoretical calculations of jet substructure (see e.g. Refs. [108,[139][140][141][142][143][144][145][146][147][148]). This is by no means obvious, since generic EFPs have different power-counting structures from the ECFs [51] or ECFGs [52]. That said, phrasing jet substructure entirely in the language of energy flow observables and energy correlations may provide interesting new theoretical avenues to probe QCD, realizing the C-correlator vision of Refs. [74,[87][88][89]. Most IRC-safe jet observables rely on particle-level definitions and calculations, but there has been theoretical interest in directly analyzing the correlations of energy flow in specific angular directions [149][150][151], particularly in the context of conformal field theory [152][153][154][155][156]. The energy flow basis is a step towards connecting the particle-level and energy-correlation pictures, and one could even imagine that the energy flow logic could be applied directly at the path integral level. Ultimately, the structure of the EFPs is a direct consequence of IRC safety, resulting in a practical tool for jet substructure at colliders as well as a new way of thinking about the space of observables more generally.

A Energy flow and the stress-energy tensor
In this appendix, we review the connection between the energy flow of an event, as described by the stress-energy tensor, to multiparticle energy correlators [74,[87][88][89].
Consider an idealized hadronic calorimeter cell at positionn in pseudorapidity-azimuth (η, φ)-space, spanning a patch of size dη dφ. The energy flow operator E T (n) corresponding to the total transverse momentum density flowing into the calorimeter cell can be written in terms of the stress-energy tensor T µν [87,98,[157][158][159] as: with its action on a state |X of M massless particles given by: Next, consider N calorimeter cells at positions (n 1 , · · · ,n N ). An illustration of an example calorimeter cell configuration is shown in Fig. 9. For an event X, multiply together the measured energy deposits in each of these N cells. The corresponding observable is then the energy N -point correlator as defined in Refs. [149,150]: E T (n 1 ) · · · E T (n N ) |X =

B Quark/gluon discrimination and top tagging results
In this appendix, we supplement the W tagging results of Sec. 6 with the corresponding results for quark/gluon discrimination and top tagging. The details of the event generation are given in Sec. 5.2. Similar to the W tagging case, the β = 0.5 choice has the best performance (marginally) for both tagging problems.
We compare the EFP linear classification performance with β = 0.2, β = 0.5, and β = 1 in Fig. 10. Consistent with the W tagging case in Fig. 5, we find that the optimal performance is achieved with β = 0.5. We therefore use β = 0.5 for the remainder of this study.
In Fig. 11 we compare the linear and nonlinear performances of the energy flow basis and the N -subjettiness basis. There is a clear gap between the linear and nonlinear N -subjettiness classifiers, whereas no such gap exists for the EFPs. Interestingly, the linear classifier of EFPs tends to outperform the DNN at medium and high signal efficiencies, indicating the difficulty of training high-dimensional neural networks. This behavior was not seen in Fig. 6, most likely because the achievable efficiency is overall higher in the W tagging case.
A summary of the six tagging methods is shown in Fig. 12, comparing linear and nonlinear combinations of the energy flow basis and N -subjettiness basis to grayscale and color jet images. As in Fig. 7, linear combinations of EFP tend to match or outperform the other methods, especially at high signal efficiencies.
Finally, we truncate the set of EFPs with d ≤ 7 by the number of vertices N and by the VE computational complexity χ in Fig. 13. As in Fig. 8, the higher N -particle correlators contribute to the classification performance up to at least N = 7, whereas the higher-complexity EFPs beyond χ = 2 do not significantly contribute to the classification performance.  Figure 11: Same as Fig. 6, but for quark/gluon discrimination (top) and top tagging (bottom). As in the W tagging case, the linear combinations of EFPs can be seen to approach (or even exceed) the nonlinear combinations, particularly for higher signal efficiencies.