How much information is in a jet?

Machine learning techniques are increasingly being applied to data analyses at the Large Hadron Collider, especially for discrimination of jets with different originating particles. Previous applications of machine learning to jet physics have typically employed image recognition, natural language processing, or other algorithms that have been extensively developed in computer science. While these studies have demonstrated impressive discrimination power, often exceeding that of widely-used observables, they have been formulated in a non-constructive manner and it is not clear what additional information the machines are learning. In this paper, we study machine learning for jet physics constructively, expressing all of the information in a jet in terms of sets of observables that completely and minimally span N-body phase space. For concreteness, we study the application of machine learning to discrimination of boosted, hadronic decays of Z bosons from jets initiated by QCD processes. Our results demonstrate that the information in a jet that is useful for discriminating QCD jets from Z bosons is saturated by only considering observables that are sensitive to 4-body (8 dimensional) phase space.


Introduction
The problem of discrimination and identification of high energy jet-like objects observed at the Large Hadron Collider (LHC) is fundamental for both Standard Model physics and searches as the lower bound on new physics mass scales increases. Heavy particles of the Standard Model, like the W, Z, and H bosons or the top quark, can be produced with large Lorentz boosts and dominantly decay to hadrons. They will therefore appear collimated in the detector and similar to jets initiated by light QCD partons. The past several years have seen a huge number of observables and techniques devoted to jet identification [1][2][3][4], and many have become standard tools in the ATLAS and CMS experiments.
The list of observables for jet discrimination is a bit dizzying, and in many cases there is no organizing principle for which observables work well in what situations. Motivated by the large number of variables that define the structure of a jet, several groups have recently applied machine learning methods to the problem of jet identification [9][10][11][12][13][14][15][16][17][18][19][20][21]. Rather than developing clever observables that identify certain physics aspects of the jets, the idea of the machine learning approach is to have a computer construct an approximation to the optimal classifier that discriminates signal from background. For example, Ref. [11] interpreted the jet detected by the calorimetry as an image, with the pixels corresponding to the calorimeter cells and the "color" of the pixel corresponding to the deposited transverse momentum in the cell. These techniques have outperformed standard jet discrimination observables and show that there is additional information in jets to exploit. However, this comes with a significant cost. Machine learning methods applied to jet physics typically have hundreds of input variables with thousands of correlations between them. Thus, in one sense this problem seems ideally suited for machine learning, but it also lacks the immediate physical interpretation and intuition that individual observables have. Previous studies have shown that the computer is learning information about what discriminates jets of different origins, but it has not been clearly demonstrated what information standard observables are missing. Along these same lines, the improvement of discrimination performance of machine learning over standard observables is relatively small, suggesting that standard observables capture the vast majority of useful information in jets.
In this paper, we approach machine learning for jet discrimination from a different perspective. We construct an observable basis that completely and minimally spans the phase space for the substructure of a jet. For a jet with M particles, the phase space is 3M − 4 dimensional, and so we identify 3M − 4 infrared and collinear (IRC) safe jet substructure observables that span the phase space. These basis observables are then passed to a machine learning algorithm for identification of relevant discrimination information. A general jet will have an arbitrary number of particles in it, and so we will observe how the discrimination power depends on the dimension of phase space that we assume. That is, we will assume that the jet has 2 particles, 3 particles, 4 particles, etc., as defined by the set of basis observables, and observe how the discrimination power improves. This method is constructive in the following sense. With some number of assumed particles in the jet, the discrimination power will saturate, which then immediately tells us what reduced set of observables is necessary to effectively extract all information that is useful for discrimination. This approach has the additional advantage that the identified observables can be calculated theoretically from first principles, without relying on parton shower modeling.
As it is a widely-studied problem in jet substructure, we will apply this approach to the discrimination of boosted, hadronically decaying Z bosons from jets initiated by light quarks or gluons. The results of our study are shown in Fig. 1. Here, we plot the simulated signal (Z boson) efficiency versus the background (QCD jet) rejection rate as determined by a deep neural network, for observables that are sensitive to 2-, 3-, 4-, 5-, and 6-body phase space. To identify the phase space variables, we choose to measure the jet mass and the N-subjettiness observables [24][25][26], but this choice is not special. This plot demonstrates that observables sensitive to 4-body phase space saturate the discrimination power. 4-body phase space is only 8 dimensional, suggesting that very few observables are necessary to identify all interesting structure of these jets. We anticipate that this approach can be applied to other discrimination problems in jet substructure as well, and can greatly reduce the dimensionality of the variable space that is being studied.
The outline of this paper is as follows. In Sec. 2, we define the observable basis that is used to identify all variables of M-body phase space. As mentioned above, we choose to use the N-subjettiness observables. In this section, we also prove that the set of observables is complete and minimal. In Sec. 3, we discuss our event simulation and machine learning implementation. We present the results of our study, and compare discrimination power from the M-body phase space observables to standard observables as a benchmark. We conclude in Sec. 4. Additional details are in the appendices.

Observable Basis
In this section, we specify the basis of IRC safe observables that we use to identify structure in the jet. For simplicity, we will exclusively use the N-subjettiness observables [24][25][26], although this choice is not special. One could equivalently use the originally-defined N-point energy correlation functions [27], or their generalization to different angular dependence [28]. Our choice of using the N-subjettiness observables in this analysis is mostly practical: the evaluation time for the N-subjettiness observables is significantly less than for the energy correlation functions. We also emphasize that the particular choice of observables below is just to ensure that they actually span the phase space for emissions in a jet. There may be a more optimal choice of a basis of observables, but optimization of the basis is beyond the scope of this paper.
The N-subjettiness observable $\tau_N^{(\beta)}$ is a measure of the radiation about N axes in the jet, specified by an angular exponent β > 0:

$$\tau_N^{(\beta)} = \frac{1}{p_{TJ}} \sum_{i \in \text{Jet}} p_{Ti} \min\left\{ R_{1i}^{\beta}, R_{2i}^{\beta}, \ldots, R_{Ni}^{\beta} \right\}. \qquad (2.1)$$

In this expression, $p_{TJ}$ is the transverse momentum of the jet of interest, $p_{Ti}$ is the transverse momentum of particle i in the jet, and $R_{Ki}$, for K = 1, 2, . . ., N, is the angle in pseudorapidity and azimuth between particle i and axis K in the jet. There are numerous possible choices for the N axes in the jet; in our numerical implementation, we choose to define them according to the exclusive $k_T$ algorithm [29,30] with standard E-scheme recombination [31]. Note that $\tau_N = 0$ for a jet with N or fewer particles in it.

Figure 2: Illustration of the momentum fraction and pairwise angle variables that describe 2-body (left) and 3-body (right) phase space.
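As a concrete illustration of the definition of N-subjettiness, the following minimal Python sketch evaluates $\tau_N^{(\beta)}$ for a list of particles and a list of axes that are supplied by hand. It is not the FastJet contrib implementation used in this analysis: the function names `delta_r` and `n_subjettiness` are hypothetical, the axes are taken as input rather than found with the exclusive $k_T$ algorithm, and the jet transverse momentum is approximated by the scalar sum of the constituent transverse momenta, which is an assumption appropriate for narrow jets.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (pseudorapidity, azimuth) plane."""
    # Wrap the azimuthal difference into [-pi, pi).
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def n_subjettiness(particles, axes, beta):
    """Evaluate tau_N^(beta) for N = len(axes).

    particles: list of (pT, eta, phi) tuples
    axes:      list of (eta, phi) tuples, supplied by the caller (assumption)
    beta:      angular exponent, must be positive
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    # Assumption: the jet pT is approximated by the scalar sum of particle pT.
    pt_jet = sum(pt for pt, _, _ in particles)
    total = 0.0
    for pt, eta, phi in particles:
        # Each particle contributes its distance to the closest axis.
        closest = min(delta_r(eta, phi, a_eta, a_phi) for a_eta, a_phi in axes)
        total += pt * closest ** beta
    return total / pt_jet


# Two particles separated by 0.4 in azimuth, carrying 60% and 40% of the pT.
two_body = [(60.0, 0.0, 0.0), (40.0, 0.0, 0.4)]
one_axis = [(0.0, 0.16)]                 # pT-weighted direction of the pair
two_axes = [(0.0, 0.0), (0.0, 0.4)]      # one axis on each particle

tau1_beta1 = n_subjettiness(two_body, one_axis, 1.0)  # approximately 0.192
tau2_beta1 = n_subjettiness(two_body, two_axes, 1.0)  # vanishes: N = number of particles
```

The second call shows the property noted in the text: with as many axes as particles, each axis can sit on a particle and the observable vanishes.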
To identify structure in the jet, we need to measure an appropriate number of different N-subjettiness observables. This requires an organizing principle to ensure that the basis of observables is complete and minimal. Our approach to ensuring this is to identify the set of N-subjettiness observables that can completely specify the coordinates of M-body phase space. Ensuring that the set is minimal is then straightforward: as M-body phase space is 3M − 4 dimensional, we only measure 3M − 4 N-subjettiness observables. A jet also has an overall energy scale. To ensure sensitivity to this energy scale, we will also measure the jet mass, $m_J$.
We will describe how to do this for low dimensional phase space, and then generalize to arbitrary M-body phase space. We will work in the limit where the jet is narrow and so all particles in the jet can be considered as relatively collinear. This simplifies the expressions for the values of the N-subjettiness observables to illustrate their content, but does not affect their ability to span the phase space variables.
• 2-Body Phase Space: 2-body phase space is 3 · 2 − 4 = 2 dimensional. For a jet with two particles, the phase space can be completely specified by the transverse momentum fraction z of one of the particles:

$$z = \frac{p_{T1}}{p_{T1} + p_{T2}}, \qquad (2.2)$$

and the splitting angle θ between the particles. This configuration is shown in Fig. 2a. To uniquely identify the z and θ of this jet, we can measure two 1-subjettiness observables, defined by different angular exponents α ≠ β. For concreteness, we will measure $\tau_1^{(1)}$ and $\tau_1^{(2)}$. To determine the measured values of the 1-subjettiness observables, we need to determine the angle between the individual particles of the jet and the axis. Because E-scheme recombination conserves momentum, the angles between the particles 1 and 2 and the axis are:

$$\theta_{1,J} = (1 - z)\,\theta, \qquad \theta_{2,J} = z\,\theta. \qquad (2.3)$$

It then follows that the values of the 1-subjettiness observables are:

$$\tau_1^{(1)} = 2z(1 - z)\,\theta, \qquad \tau_1^{(2)} = z(1 - z)\,\theta^2. \qquad (2.4)$$

These expressions can be inverted to find z and θ individually:

$$z(1 - z) = \frac{\left(\tau_1^{(1)}\right)^2}{4\,\tau_1^{(2)}}, \qquad \theta = \frac{2\,\tau_1^{(2)}}{\tau_1^{(1)}}. \qquad (2.5)$$

Note the symmetry for z ↔ 1 − z: this is to be expected because we have not assumed an ordering of the transverse momenta of particles 1 and 2.
• 3-Body Phase Space: 3-body phase space is 3 · 3 − 4 = 5 dimensional, and so to completely determine the configuration of a jet with three particles, we need to measure 5 N-subjettiness observables. The 5 phase space variables can be defined to be the 3 pairwise angles between the particles i and j in the jet, $\theta_{12}$, $\theta_{13}$, and $\theta_{23}$, and two of the transverse momentum fractions, say, $z_1$ and $z_2$. We define the momentum fractions as:

$$z_i = \frac{p_{Ti}}{p_{T1} + p_{T2} + p_{T3}}.$$

This configuration is shown in Fig. 2b. To determine the phase space variables, we will measure a collection of 1- and 2-subjettiness observables.
Our choice for the collection of 1- and 2-subjettiness observables is the following. We will measure three 1-subjettiness observables $\tau_1^{(0.5)}$, $\tau_1^{(1)}$, and $\tau_1^{(2)}$ and two 2-subjettiness observables $\tau_2^{(1)}$ and $\tau_2^{(2)}$. To motivate this collection of observables, note that one of the axes for measuring 2-subjettiness necessarily lies along the direction of a particle. Therefore, measuring 2-subjettiness is only sensitive to one relative energy fraction and one angle between pairs of particles, as illustrated explicitly in the 2-body case in Eq. (2.4). Because 2-subjettiness is only sensitive to two phase space variables, we only measure two 2-subjettiness observables.
The axis for the 1-subjettiness observables, however, is necessarily displaced from the direction of any particle in the jet. This is because the E-scheme recombination conserves momentum, and so this axis can only degenerate to the direction of a particle in the jet if another particle has 0 energy or is exactly collinear to another particle. Therefore, this collection of 5 N-subjettiness observables will generically span the full 3-body phase space. In App. A, we present the explicit expressions for the 1- and 2-subjettiness observables in terms of the phase space coordinates.
• M-Body Phase Space: For M-body phase space, we can define the coordinates of that phase space by M − 1 transverse momentum fractions $z_i$, for i = 1, . . ., M − 1, and 2M − 3 pairwise angles $\theta_{ij}$ between particles i and j. The remaining pairwise angles are then uniquely determined by the geometry of points in a plane. To determine all of these phase space variables, we extend the set of N-subjettinesses that were measured in the 2- and 3-body case. In this case, the 3M − 4 observables we measure are:

$$\left\{ \tau_1^{(0.5)}, \tau_1^{(1)}, \tau_1^{(2)}, \tau_2^{(0.5)}, \tau_2^{(1)}, \tau_2^{(2)}, \ldots, \tau_{M-2}^{(0.5)}, \tau_{M-2}^{(1)}, \tau_{M-2}^{(2)}, \tau_{M-1}^{(1)}, \tau_{M-1}^{(2)} \right\}. \qquad (2.8)$$

Note that there are 3(M − 2) + 2 = 3M − 4 observables, and these will span the space of phase space variables for generic momentum configurations, when all particles have non-zero energy and are a finite angle from one another.
As we observed in the 3-body phase space case, for a collection of M particles, all but one of the axes for the measurement of (M − 1)-subjettiness lie along the direction of a particle. Therefore, we only measure two (M − 1)-subjettiness observables. Stepping back another clustering as relevant for (M − 2)-subjettiness, there are two possibilities:
- Either M − 3 axes lie along the direction of M − 3 particles in the jet, and the three remaining particles are all clustered around the last axis. Then, the measurement of (M − 2)-subjettiness is sensitive to the phase space configuration of 3 particles in the jet. Measuring three (M − 2)-subjettinesses and two (M − 1)-subjettinesses then completely specifies the phase space configuration of those three particles.
- The other possibility is that M − 4 axes lie along particles in the jet, while there are two particles clustered around each of the two remaining axes. About each axis, the measurement is sensitive to the phase space configuration of two particles, which corresponds to a total of 4 phase space variables. Additionally, it is sensitive to the relative contribution of the two pairs of particles to the total (M − 2)-subjettiness value. This configuration therefore is described by 5 phase space variables, and can be completely specified by the measurement of three (M − 2)-subjettinesses and two (M − 1)-subjettinesses.
This argument can be continued at further stages in the declustering. Each time an axis is removed, three new phase space variables are introduced. These can be completely specified by the measurement of three additional N-subjettiness observables. This then proves that the collection of N-subjettiness observables given above uniquely determines M-body phase space.
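The 2-body case gives the simplest worked example of how measured observables determine the phase space coordinates. The sketch below assumes the narrow-jet relations $\tau_1^{(1)} = 2z(1-z)\theta$ and $\tau_1^{(2)} = z(1-z)\theta^2$, which follow from placing the single axis along the momentum-weighted direction of the two particles. The function names are hypothetical, and because the relations are symmetric under z ↔ 1 − z, the inverse map is chosen here to return the smaller of the two momentum fractions.

```python
import math


def taus_from_two_body(z, theta):
    """Collinear-limit 1-subjettiness values for a two-particle jet."""
    tau1_beta1 = 2.0 * z * (1.0 - z) * theta
    tau1_beta2 = z * (1.0 - z) * theta ** 2
    return tau1_beta1, tau1_beta2


def two_body_from_taus(tau1_beta1, tau1_beta2):
    """Invert the two relations above for (min(z, 1 - z), theta)."""
    theta = 2.0 * tau1_beta2 / tau1_beta1
    z_times_one_minus_z = tau1_beta1 ** 2 / (4.0 * tau1_beta2)
    # Solve z(1 - z) = c for the smaller root; clip tiny negative round-off.
    discriminant = max(0.0, 1.0 - 4.0 * z_times_one_minus_z)
    z_small = 0.5 * (1.0 - math.sqrt(discriminant))
    return z_small, theta


# Round trip: z = 0.3 and z = 0.7 give the same observables by symmetry.
t1, t2 = taus_from_two_body(0.3, 0.4)
z_rec, theta_rec = two_body_from_taus(t1, t2)   # close to (0.3, 0.4)
```

The round trip makes the counting explicit: two observables fix the two coordinates of 2-body phase space, up to the relabeling of the two particles.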
In the next section, we will study the information contained in this basis and use it to identify the features that are exploited in the discrimination of hadronically decaying Z boson jets from QCD jets.

Deep Learning Implementation
In this section, we describe our event simulation and implementation of machine learning to the N-subjettiness basis of observables introduced in the previous section. We generate pp → Z + jet and pp → ZZ events at the 13 TeV LHC with MadGraph5 v2.5.4 [35]. The Z boson in pp → Z + jet events is decayed to neutrinos; in pp → ZZ events, one Z boson is decayed to neutrinos and the other is decayed to quarks. These tree-level events are then showered in Pythia v8.223 [36,37] with default settings. In App. B, we show results for events showered with Herwig v7.0.4 [38,39], though with one-tenth the number of events of the Pythia samples. Ignoring the neutrinos in the showered and hadronized events, we use FastJet v3.2.1 [40,41] to cluster the jets. On the clustered anti-$k_T$ [42] jets with radius R = 0.8 and minimum $p_T$ of 500 GeV, we then measure the basis of N-subjettiness observables using the code provided in FastJet contrib v1.026. We emphasize that observables are measured on the particles as a proof of concept; we do not apply any detector simulation.
The precise set of observables that we measure on the jet and use for discrimination is the following. We measure the jet mass and the collection of N-subjettiness observables sufficient to completely determine up through 6-body phase space. That is, we measure the collection of N-subjettiness observables defined with $k_T$ axes:

$$\left\{ \tau_1^{(0.5)}, \tau_1^{(1)}, \tau_1^{(2)}, \tau_2^{(0.5)}, \tau_2^{(1)}, \tau_2^{(2)}, \tau_3^{(0.5)}, \tau_3^{(1)}, \tau_3^{(2)}, \tau_4^{(0.5)}, \tau_4^{(1)}, \tau_4^{(2)}, \tau_5^{(1)}, \tau_5^{(2)} \right\}. \qquad (3.1)$$

We will see that this collection of N-subjettiness observables is more than sufficient to describe all of the information useful for discrimination in the jet. Additionally, for comparison, we will measure a collection of standard observables that have been defined for discrimination of boosted, hadronic decays of Z bosons from jets initiated by QCD. We measure the N-subjettiness ratios $\tau_{2,1}^{(1)}$ and $\tau_{2,1}^{(2)}$ with one-pass winner-take-all (WTA) axes [32][33][34], and (generalized) energy correlation function ratios $D_2^{(1)}$ and $D_2^{(2)}$ [43] and $N_2^{(1)}$ and $N_2^{(2)}$ [28]. The discrimination power of these observables will provide a benchmark for the information extracted in the machine learning of the collection of N-subjettiness observables.
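The pattern behind the minimal basis can be made explicit with a short sketch. It assumes the reading of Sec. 2 in which angular exponents β = 0.5, 1, 2 are used for every $\tau_N$ with N ≤ M − 2 and only β = 1, 2 for N = M − 1; the function name `minimal_basis` and the `(N, beta)` label format are hypothetical conveniences, not part of the analysis code.

```python
def minimal_basis(m_body, betas=(0.5, 1.0, 2.0)):
    """Labels (N, beta) of the N-subjettiness observables assumed to span
    M-body phase space: three exponents for N = 1..M-2, two for N = M-1."""
    if m_body < 2:
        raise ValueError("phase space is only defined here for M >= 2")
    basis = []
    for n in range(1, m_body - 1):
        basis.extend((n, beta) for beta in betas)
    # The last clustering step is only sensitive to two variables,
    # so the smallest exponent is dropped for N = M - 1.
    basis.extend((m_body - 1, beta) for beta in betas[1:])
    return basis


# The number of labels matches the phase space dimension 3M - 4.
for m in range(2, 7):
    assert len(minimal_basis(m)) == 3 * m - 4

six_body = minimal_basis(6)   # 14 observables, from tau_1 through tau_5
```

With the jet mass added as one further input, a network trained on the M-body basis receives 3M − 3 numbers per jet under this assumption.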
All deep learning analysis was carried out on the NVIDIA DIGITS DevBox, with four GeForce GTX TitanX GPUs, built on the 28 nm Maxwell architecture. The specifications of the GPU are listed in Table 1. The dataset consisted of 7,868,000 events, split evenly between Z and QCD jets, stored in the compressed HDF5 format [44]. The data was shuffled to ensure each data file had approximately a 1:1 ratio of both classes of events. No mass cuts were imposed on the events fed to networks, with the expectation that they would automatically learn the optimal cuts on mass and the observable phase space. The training and validation data consisted of 6,144,000 events and 1,536,000 events respectively, while 188,000 events were set aside for predictions. All networks were trained using the highly modular Keras [45] deep learning libraries and tested using the relevant scikit-learn [46] packages. At the time of training, data from the relevant columns of N-subjettiness variables was fed to the neural networks with the aid of a custom-designed data generator, which creates an archive of pre-processed data files.
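The data handling described here can be sketched in plain Python. This is a hypothetical stand-in, not the custom generator used in the analysis: it ignores the HDF5 storage, the helper names `split_indices` and `batches` are invented for illustration, and the example uses event counts scaled down by a factor of 1000 from those quoted in the text.

```python
import random


def split_indices(n_total, n_train, n_val, seed=0):
    """Shuffle event indices and split them into train, validation, test."""
    if n_train + n_val > n_total:
        raise ValueError("train + validation exceeds the number of events")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)   # reproducible shuffle
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


def batches(rows, labels, columns, batch_size):
    """Yield (features, labels) batches, keeping only the requested columns.

    rows:    list of per-jet observable lists (mass and all tau values)
    columns: indices of the observables belonging to the chosen basis
    """
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        features = [[row[c] for c in columns] for row in chunk]
        yield features, labels[start:start + batch_size]


# Counts from the text divided by 1000: 7868 = 6144 + 1536 + 188.
train, val, test = split_indices(7868, 6144, 1536)
```

Selecting columns inside the generator is one way to train on the 2-body, 3-body, and higher bases from a single stored table of observables.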
A single neural network architecture, consisting exclusively of five fully connected layers, was utilized for all analyses. The first two Dense layers consisted of 10000 and 1000 nodes, respectively, and were assigned a Dropout [47] regularization of 0.2, while the next two Dense layers consisted of 100 nodes each, and were assigned a Dropout regularization of 0.1 to prevent overfitting on training data by making each node more 'independent'. The input layer and all hidden layers utilized the ReLU activation function [48], while the output layer, consisting of a single node, used a sigmoid activation. The network was compiled with the binary cross-entropy loss function, using the Adam optimizer [49]. Models were trained with Keras' default EarlyStopping, with a patience threshold of 5, to mitigate possible over-fitting. For each set of observables, the typical number of training epochs was about 60. To further reduce errors due to under-training or over-training of networks, the same architecture was trained 25 different times for each round of analysis. The model that trained best for a given variable basis was picked by maximizing the area under the signal vs. background efficiency curve.
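The stopping and selection logic described in this paragraph can be illustrated without any deep learning library. The sketch below mimics the patience rule of an early-stopping callback under the assumption that the monitored quantity is the validation loss and that any strict decrease counts as an improvement; the function names and the example numbers are hypothetical and are not taken from the training logs.

```python
def early_stopping(val_losses, patience=5):
    """Return (stopped_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch for which the loss has failed to
    improve on its best value for `patience` consecutive epochs.
    """
    if not val_losses:
        raise ValueError("need at least one epoch")
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


def select_best_run(auc_per_run):
    """Index of the repeated training with the largest area under the curve."""
    return max(range(len(auc_per_run)), key=lambda i: auc_per_run[i])


# Illustrative loss history: the minimum is reached at epoch 2.
history = [0.90, 0.80, 0.70, 0.71, 0.72, 0.70, 0.73, 0.74, 0.75]
stopped, best = early_stopping(history, patience=5)
```

Repeating the training many times and keeping the run chosen by `select_best_run` is one simple way to reduce sensitivity to an unlucky initialization.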
Before showing the results from the deep neural network, we first show plots of the collection of observables sensitive to two-prong structure measured on the jets. In Fig. 3, we plot the mass of the signal and background jets as defined by the simulation and jet finding from earlier. Applying a mass cut around the Z boson peak, we then measure the two-prong jet observables. In Fig. 4, we show the distributions of the N-subjettiness and energy correlation function ratios $\tau_{2,1}^{(\beta)}$, $D_2^{(\beta)}$, and $N_2^{(\beta)}$. As was extensively studied in the original works, these plots make clear the separation power that these observables enable. When we compare these observables to the discrimination power of the M-body phase space observables, we relax the hard mass cut, and let the machine learn the optimal mass and observable cuts dynamically.
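In Fig. 1, we plot the signal jet (Z boson) efficiency versus the background jet (QCD) rejection rate for the collection of observables that minimally span M-body phase space, along with the jet mass. The observables that are passed to the neural network to specify M-body phase space are, explicitly:

2-body: $\tau_1^{(1)}, \tau_1^{(2)}$
3-body: $\tau_1^{(0.5)}, \tau_1^{(1)}, \tau_1^{(2)}, \tau_2^{(1)}, \tau_2^{(2)}$
4-body: $\tau_1^{(0.5)}, \tau_1^{(1)}, \tau_1^{(2)}, \tau_2^{(0.5)}, \tau_2^{(1)}, \tau_2^{(2)}, \tau_3^{(1)}, \tau_3^{(2)}$
5-body: $\tau_1^{(0.5)}, \tau_1^{(1)}, \tau_1^{(2)}, \tau_2^{(0.5)}, \tau_2^{(1)}, \tau_2^{(2)}, \tau_3^{(0.5)}, \tau_3^{(1)}, \tau_3^{(2)}, \tau_4^{(1)}, \tau_4^{(2)}$
6-body: $\tau_1^{(0.5)}, \tau_1^{(1)}, \tau_1^{(2)}, \tau_2^{(0.5)}, \tau_2^{(1)}, \tau_2^{(2)}, \tau_3^{(0.5)}, \tau_3^{(1)}, \tau_3^{(2)}, \tau_4^{(0.5)}, \tau_4^{(1)}, \tau_4^{(2)}, \tau_5^{(1)}, \tau_5^{(2)}$

Significant gains in discrimination power are observed by including observables sensitive to higher-body phase space, until enough observables to specify at least 4-body phase space are included. Including observables sensitive to 5- and 6-body phase space does not improve discrimination power, and therefore suggests that there is only an extremely limited amount of information in a jet useful for discrimination.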
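For readers who want to reproduce curves of this type from classifier outputs, the following sketch builds the signal efficiency versus background rejection curve by scanning a threshold on the network score. It is a minimal, quadratic-time illustration with hypothetical function names. The rejection is defined here as one minus the background efficiency, which is an assumption; the inverse of the background efficiency is another common convention for the same plot.

```python
def roc_points(signal_scores, background_scores):
    """List of (signal efficiency, background rejection) pairs.

    A jet is tagged as signal when its score is at or above the threshold.
    Rejection is taken to be 1 - (background efficiency) (assumption).
    """
    thresholds = sorted(set(signal_scores) | set(background_scores), reverse=True)
    n_sig = len(signal_scores)
    n_bkg = len(background_scores)
    points = [(0.0, 1.0)]   # threshold above every score: nothing is tagged
    for t in thresholds:
        eff = sum(s >= t for s in signal_scores) / n_sig
        mistag = sum(b >= t for b in background_scores) / n_bkg
        points.append((eff, 1.0 - mistag))
    return points


def area_under_curve(points):
    """Trapezoidal area under the rejection versus efficiency curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (x1 - x0) * (y0 + y1)
    return area


# Toy scores: one signal jet (0.4) is ranked below one background jet (0.7).
sig = [0.9, 0.8, 0.4]
bkg = [0.7, 0.3, 0.2]
auc = area_under_curve(roc_points(sig, bkg))   # 8 of 9 pairs are ordered correctly
```

The area equals the fraction of signal and background pairs that the classifier orders correctly, which is why it is a convenient single-number summary when comparing the different observable bases.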
To see what information is necessary to accomplish the maximal discrimination power, in Fig. 5 we plot the signal efficiency versus background rejection rate for the collection of N-subjettiness and energy correlation function ratios plotted earlier. For comparison, we also include the corresponding curves for the jet mass, jet mass plus 3-body phase space observables, and jet mass plus 4-body phase space observables. The discrimination power of all of these observables is comparable, and this illustrates that they appear to capture most of the information contained in the 3-body phase space observables. Then, to match the maximum discrimination power (as represented by the jet mass plus 4-body phase space curve), one just needs to augment the measurement of jet mass and an N-subjettiness or energy correlation function ratio with observables that are sensitive to some 3- and 4-body phase space information. We leave the construction of these optimal 3- and 4-body phase space observables for this purpose to future work.
As a cross check that our minimal basis of N-subjettiness observables listed above does capture the maximal amount of information useful for discrimination, in Fig. 6, we compare our minimal basis to an overcomplete basis of observables. Here, we measure the mass and the , τ 4 , τ 4 , τ .
From our arguments in Sec. 2, this is an overcomplete basis for 5-body phase space and therefore should not contain any additional information useful for discrimination. This is illustrated in Fig. 6, where we plot the discrimination power of this overcomplete basis as determined by the neural network described earlier. For comparison, we also show the discrimination power of the jet mass, the jet mass plus the 3-body observable basis, and the jet mass plus the 4-body observable basis as determined by the same neural network. As expected, no improvement of discrimination power is accomplished when more observables beyond the minimal set are included. The apparent slight decrease in discrimination power using the overcomplete basis is likely due to suboptimal training because of the large number of input observables.
In App. C, we present results for the signal vs. background efficiency as determined by a neural network with an additional hidden layer and the result of a boosted decision tree. These different classifiers demonstrate the same conclusion, that discrimination power saturates once enough observables are measured to resolve 4-body phase space. Additionally, these results show that the discrimination power of the overcomplete basis is just marginally better than that accomplished by the 4-body observable basis. This is consistent with our observation that 4-body phase space is essentially saturating all useful discrimination information.

Conclusions
Motivated by both the enormous data sets produced by the ATLAS and CMS experiments as well as their exceptional resolution, deep learning approaches to physics at the LHC are seeing an increased interest. This is especially true for jet physics, where the identification of the initiating particle of a jet is of fundamental importance. Previous applications of deep learning to jet physics applied techniques from computer science (like image recognition or natural language processing) and demonstrated impressive discrimination power. While the effectiveness of these methods is exceptional, they often lack a physical interpretation and are not presented in a constructive manner. The deep neural network is definitely identifying relevant structure in the jets, but what this structure is, or whether it is just a feature of the simulated data, is not identified. Other recent efforts to reduce dependence on modeling have been studied in the context of weak supervision in Ref. [20].
In this paper, we have approached the problem of machine learning for jet physics in a physically clear, constructive manner. Instead of providing the machine with the energy deposits in calorimeter cells of the jet, we measure a basis of observables on the jet that completely and minimally spans M-body phase space. The effective resolution to the emissions in the jet is increased by increasing the number of observables measured on the jet. We demonstrated that the information useful for discrimination of a jet initiated by a boosted, hadronically-decaying Z boson from a jet initiated by a light QCD parton is saturated when enough observables are measured to span 4-body phase space. As 4-body phase space is only 8 dimensional, the amount of useful information in the jet is quite small. Additionally, this procedure is constructive in the sense that one can then form observables that are nonzero for a jet with four constituents to optimally discriminate signal from background. Similar constructions of observable bases for identifying particular phase space regions have been studied recently to resum non-global logarithms [50] and calculate multi-differential cross sections on jets [51,52].
Important for our analysis is that we use an IRC safe basis of observables that span the M-body phase space, namely, the N-subjettiness observables. This is vital for constructibility, as in principle the cross section for the measurement of multiple N-subjettiness observables on a jet can be calculated in the perturbation theory of QCD. It would be possible to additionally include information that is not IRC safe, for example, jet charge. Nevertheless, some non-IRC safe information is already included in this approach, like the jet constituent multiplicity. Additionally, included in the basis of M-body phase space observables are techniques like jet grooming that systematically remove radiation from the jet. This could enable a systematic study of how jet grooming methods affect the optimal discrimination observables, which has been addressed recently [28,54].
An advantage of our approach is that the jet data is preprocessed in a useful way at the same time that the basis observables are being measured. In applications of image processing to jets, one typically has to perform a series of transformations to ensure that different jets can be compared (see the discussion in, e.g., Ref. [11]). Jets must be rotated and rescaled appropriately so that (approximate) symmetries do not wash out the ability to discriminate. By instead measuring a collection of IRC safe observables like N-subjettiness on which we train, this preprocessing step is unnecessary, as the value of the observable is only sensitive to relative angles between particles and energy fractions.
From our results, it would also be interesting to study in detail the information for discrimination that is missed when using standard jet observables like N-subjettiness ratios $\tau_{2,1}^{(\beta)}$ or energy correlation function ratios $D_2^{(\beta)}$. The construction and justification of these particular observables exploited properties of QCD in the soft and/or collinear limits. These observables appear to be sensitive to most of the 3-body phase space information available for discrimination of boosted, hadronically decaying Z bosons from QCD jets. Observables that are sensitive to the remaining information for discrimination could be constructed by studying in detail the differences between how the decays of Z bosons and QCD fill 4-body phase space. We anticipate that these methods can also be used for discrimination of many different types of jets, including quark versus gluon and QCD versus top quark discrimination, as well as for multi-label classification of jets. The ultimate goal of such a program would be to design an anti-QCD tagger which could identify, using only a few observables that are sensitive to a small phase space, if a jet was likely initiated by a light QCD parton. This could open the door to new classes of observables that are sensitive to exotic configurations within jets.

Therefore the values of the 2-subjettiness observables can be inverted to determine the relative momentum fraction and the pairwise angle $\theta_{12}$.

A.2 1-subjettiness
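Now, we would like to calculate the value of 1-subjettiness on this configuration of particles. This requires determining the angle between each of the three particles and their direction of net momentum. To determine these angles, we consider the distribution of particles in the jet in a plane, as displayed in Fig. 7. We set particle 1 at the origin $(x_1, y_1) = (0, 0)$ of the plane, particle 2 along the horizontal axis at $(x_2, y_2) = (\theta_{12}, 0)$, and particle 3 at a generic point $(x_3, y_3)$ in the plane. The horizontal and vertical coordinates of particle 3 can be calculated to be:

$$x_3 = \frac{\theta_{12}^2 + \theta_{13}^2 - \theta_{23}^2}{2\,\theta_{12}}, \qquad y_3 = \sqrt{\theta_{13}^2 - x_3^2}.$$

The values of the three 1-subjettiness observables are then:

$$\tau_1^{(\beta)} = \sum_{i=1}^{3} z_i \left[ (x_i - x_J)^2 + (y_i - y_J)^2 \right]^{\beta/2}, \qquad \beta = 0.5,\ 1,\ 2,$$

where $z_3 = 1 - z_1 - z_2$ and $(x_J, y_J) = (z_2\,\theta_{12} + z_3\,x_3,\ z_3\,y_3)$ is the location of the net momentum direction in the plane. For $\tau_1^{(2)}$, the expression simplifies significantly in terms of the momentum fractions and pairwise angles:

$$\tau_1^{(2)} = z_1 z_2\,\theta_{12}^2 + z_1 z_3\,\theta_{13}^2 + z_2 z_3\,\theta_{23}^2.$$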
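The planar construction described in this appendix can be checked numerically. The sketch below is an illustration under stated assumptions rather than the expressions of the original appendix: it works in the narrow-jet limit, places particle 1 at the origin and particle 2 on the horizontal axis, takes the 1-subjettiness axis to be the momentum-weighted centroid of the three particles, and uses hypothetical function names. It also verifies the identity $\tau_1^{(2)} = z_1 z_2 \theta_{12}^2 + z_1 z_3 \theta_{13}^2 + z_2 z_3 \theta_{23}^2$, which follows from elementary geometry.

```python
import math


def particle3_position(t12, t13, t23):
    """Planar coordinates of particle 3 from the three pairwise angles."""
    x3 = (t12 ** 2 + t13 ** 2 - t23 ** 2) / (2.0 * t12)
    y3 = math.sqrt(max(0.0, t13 ** 2 - x3 ** 2))   # clip round-off
    return x3, y3


def tau1_three_body(z1, z2, t12, t13, t23, beta):
    """1-subjettiness of a three-particle jet in the narrow-jet limit."""
    z3 = 1.0 - z1 - z2
    x3, y3 = particle3_position(t12, t13, t23)
    positions = [(0.0, 0.0), (t12, 0.0), (x3, y3)]
    fractions = [z1, z2, z3]
    # Axis assumed to sit at the momentum-weighted centroid (E-scheme).
    ax = sum(z * x for z, (x, _) in zip(fractions, positions))
    ay = sum(z * y for z, (_, y) in zip(fractions, positions))
    return sum(z * math.hypot(x - ax, y - ay) ** beta
               for z, (x, y) in zip(fractions, positions))


def tau1_beta2_closed_form(z1, z2, t12, t13, t23):
    """Closed form for beta = 2 in terms of fractions and pairwise angles."""
    z3 = 1.0 - z1 - z2
    return z1 * z2 * t12 ** 2 + z1 * z3 * t13 ** 2 + z2 * z3 * t23 ** 2


# Right-angled configuration: pairwise angles 0.3, 0.4, 0.5.
direct = tau1_three_body(0.5, 0.3, 0.3, 0.4, 0.5, beta=2.0)
closed = tau1_beta2_closed_form(0.5, 0.3, 0.3, 0.4, 0.5)   # both close to 0.0445
```

For β = 0.5 and β = 1 no comparably compact form exists, which is why those observables carry information that is independent of the β = 2 measurement.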

B Herwig Results
In this appendix, we present discrimination results for jets showered in Herwig 7.0.4 [38,39] from events generated in MadGraph. The number of events showered in Herwig is about a factor of 10 fewer than that shown in the main body of the paper with Pythia, and so the neural network training is not as efficient. Nevertheless, the conclusions drawn from this reduced Herwig sample are the same as from Pythia; namely, that observables sensitive to 4-body phase space saturate discrimination power.
On the sample of jets from pp → ZZ, with one Z decaying hadronically, and pp → Z + jet, we identify the same jets and measure the same collection of N-subjettiness observables as described in the main text. These observables are then passed through the deep neural network as described in Sec. 3, with 390,000 events each for the pp → ZZ and pp → Z + jet processes. These events were divided into 684,000 used for training, 76,000 for validation, and 20,000 for testing.
In Figs. 8 and 9, we show validation plots on the jets showered with Herwig, to be compared with Figs. 3 and 4 from Pythia. The jet mass distribution in Fig. 8 agrees qualitatively well with the corresponding plot from Pythia, though the Herwig sample seems to lack the small shoulder of the Z boson mass distribution present in Pythia. With a cut on the jet mass around the location of the Z boson peak, we then measure the same selection of one- versus two-prong discriminant variables in the Herwig sample. Again, good qualitative agreement is seen with Pythia, though the effects of finite statistics are much more evident. Fig. 10 shows the signal efficiency vs. background rejection rate for the collections of observables that resolve M-body phase space as determined by the neural network. Just like in the Pythia samples, the discrimination power is observed to increase as more N-subjettiness observables are included. The discrimination power is observed to saturate with observables that are sensitive to 3- or 4-body phase space. This difference from where the Pythia events saturated could be due to the smaller jet sample size, though it could also be due to differences between the Pythia and Herwig parton showers. It has been observed in numerous other studies [43, 55-62] that the discrimination performance differs significantly between jets showered in Pythia versus Herwig. The exact reason for the discrepancy is beyond the scope of this paper, but the existence of a saturation point also in Herwig demonstrates that there is only a very limited amount of information in the jet for discrimination.

C Results with Other Architectures
In this appendix, we show discrimination results for a neural network with one more hidden layer than the network studied in the body of the paper, as well as the output of a boosted decision tree.

C.1 A Deeper Neural Network
The neural network used in this appendix is identical to the network studied in the body of the paper, except for the addition of another layer. Immediately after the input layer, we have included an additional Dense layer of 1000 nodes, with a Dropout regularization of 0.2. This new neural network typically required about 50 training epochs for each collection of observables. We show the discrimination performance as identified by this network in Figs. 11 and 12. In Fig. 11, we show the discrimination power as more observables are added to resolve higher-body phase space. As with the other studies in this paper, we see that the discrimination power is saturated when 4-body phase space is resolved. Additionally, in Fig. 12, we compare the discrimination power of 3- and 4-body phase space observables to the overcomplete 5-body phase space observables described in Sec. 3. The overcomplete basis of observables is observed to be only very slightly better than the 4-body phase space basis, suggesting that essentially all useful discrimination information has been extracted.
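One way to put a number on a statement such as "only very slightly better" is to compare a scalar summary of two curves. The sketch below is an assumption and not the procedure used in this paper, which compares the curves directly: it uses the area under the ROC curve, computed as the probability that a signal jet outscores a background jet, and a hypothetical `tolerance` below which two observable bases are called equivalent.

```python
def auc(signal_scores, background_scores):
    """Probability that a random signal jet outscores a random background jet.

    Ties count as one half. This equals the area under the ROC curve.
    """
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))

def is_saturated(auc_smaller_basis, auc_larger_basis, tolerance=0.005):
    """True when enlarging the observable basis gains at most `tolerance` in AUC.

    The default tolerance is a hypothetical choice, not a value from the paper.
    """
    return (auc_larger_basis - auc_smaller_basis) <= tolerance

# Toy network outputs (hypothetical numbers, for illustration only).
toy_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
```

The pairwise loop scales as the product of the two sample sizes, so for samples of the size used here a rank-based implementation would be preferable.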

C.2 Boosted Decision Tree
Because our observable basis is quite small, we can input the observables to a boosted decision tree to evaluate the discrimination power. We used ROOT's TMVA package [63,64] to train and test the boosted decision trees. Each collection of phase space observables studied elsewhere in this paper was input to the boosted decision trees, and forests of 2500 trees were used. We also trained on forests of 850 trees, and observed no significant improvement in discrimination power in extending to forests of 2500 trees, suggesting that the boosted decision trees are extracting all the information that they can. The results of the boosted decision trees are shown in Fig. 13. These results are again consistent with what we found earlier; namely, that discrimination power is observed to saturate once 4-body phase space is resolved.

Figure 12: Signal efficiency versus background rejection rate for jet mass plus the overcomplete basis of observables that are sensitive to 5-body phase space described in the text, as determined by the deeper neural network. For comparison, we also include the signal efficiency versus background rejection rate for jet mass, jet mass plus minimal 3-body phase space observables, and jet mass plus the minimal 4-body phase space observables.

Figure 1: Z boson jet efficiency vs. QCD jet rejection rate plot as generated by the deep neural network. Details of the event simulation, jet finding, and machine learning are described in Sec. 3. The different curves correspond to the mass plus collections of observables that uniquely define M-body phase space. Discrimination power is seen to saturate when 4-body phase space is resolved.

Figure 3: Distribution of the mass of the jet in pp → Zj (red dashed) and hadronically decaying Z boson in pp → ZZ (blue dotted) from the Pythia parton shower. The minimum transverse momentum is 500 GeV, and the jets are found with the anti-$k_T$ algorithm with radius R = 0.8.

Figure 6: Signal efficiency versus background rejection rate for jet mass plus the overcomplete basis of observables that are sensitive to 5-body phase space described in the text, as determined by the neural network. For comparison, we also include the signal efficiency versus background rejection rate for jet mass, jet mass plus minimal 3-body phase space observables, and jet mass plus the minimal 4-body phase space observables.

Figure 7: Configuration in a plane of a jet with three particles. Pairwise angles $\theta_{ij}$ and momentum fractions $z_i$ of the individual particles are labeled. Momentum conservation enforces that $z_3 = 1 - z_1 - z_2$.

Figure 8: Distribution of the mass of the jet in pp → Zj (red dashed) and hadronically decaying Z boson in pp → ZZ (blue dotted) from the Herwig parton shower. The minimum transverse momentum is 500 GeV, and the jets are found with the anti-$k_T$ algorithm with radius R = 0.8.

Figure 9: Distributions of various two-prong discrimination observables measured on the sample of jets showered with Herwig, on which a mass cut of m ∈ [90, 120] GeV has been placed. From top to bottom are plotted signal (blue dotted) and background (red dashed) distributions of: N-subjettiness ratios $\tau_{2,1}^{(1)}$ (left) and $\tau_{2,1}^{(2)}$ (right), energy correlation function ratios $D_2^{(1)}$ (left) and D

Figure 10: Z boson jet efficiency vs. QCD jet rejection rate plot generated by the deep neural network for jets showered in Herwig. The different curves correspond to the mass plus collections of N-subjettiness observables that uniquely define M-body phase space. Discrimination power is seen to saturate when 3- or 4-body phase space is resolved.

Figure 13: Z boson jet efficiency vs. QCD jet rejection rate plot as generated by the boosted decision tree for events showered in Pythia. The different curves correspond to the mass plus collections of observables that uniquely define M-body phase space. Discrimination power is seen to saturate when 4-body phase space is resolved. In this plot, we also include the overcomplete 5-body phase space collection of observables, labeled "oc.5-body".

Table 1: Manufacturer specifications of the GTX TitanX. Only one GPU was used during training and testing.