Novel jet observables from machine learning

Previous studies have demonstrated the utility and applicability of machine learning techniques to jet physics. In this paper, we construct new observables for the discrimination of jets from different originating particles exclusively from information identified by the machine. The approach we propose is to first organize information in the jet by resolved phase space and determine the effective N -body phase space at which discrimination power saturates. This then allows for the construction of a discrimination observable from the N -body phase space coordinates. A general form of this observable can be expressed with numerous parameters that are chosen so that the observable maximizes the signal vs. background likelihood. Here, we illustrate this technique applied to discrimination of H→bb¯\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ H\to b\overline{b} $$\end{document} decays from massive g→bb¯\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ g\to b\overline{b} $$\end{document} splittings. We show that for a simple parametrization, we can construct an observable that has discrimination power comparable to, or better than, widely-used observables motivated from theory considerations. For the case of jets on which modified mass-drop tagger grooming is applied, the observable that the machine learns is essentially the angle of the dominant gluon emission off of the bb¯\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ b\overline{b} $$\end{document} pair.


Introduction
Several groups have recently applied promising machine learning techniques to the problem of classifying jets from different originating particles [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17].A review of the advances of the field is presented in Ref. [18].These approaches, while demonstrating exceptional discrimination power, often come with the associated costs of utilizing hundreds of low-level input variables with thousands of correlations between them, and lack an immediately accessible physical interpretation.Ref. [19] presented the first application of a bottom-up organizing principle, whereby neural networks were trained and tested on minimal and complete bases of observables sensitive to the phase space of M subjets in a jet.By appropriately identifying M subjets in a jet, this study probed their phase space with sets of 3M − 4 infrared and collinear (IRC) safe observables.This essentially utilizes the distribution of the M subjets on the phase space to identify useful information for discrimination.By increasing the dimensionality of the phase space in a systematic way, Ref. [19] used machine learning to demonstrate that in the case of discriminating boosted hadronic decays of Z bosons from jets initiated by light partons, once 4-body phase space is resolved, no more information is observed to contribute meaningfully to discrimination power.In addition, Ref. [20] also presented a first application of this method to developing promising generic anti-QCD taggers that match or outperform the discrimination power of dedicated taggers.Such approaches motivate the development of novel observables that can capture all of the salient information for discrimination of jets as learned by the machine.
While the output of a neural network, boosted decision tree, or other machine learning method is itself an observable, it is in general a highly non-linear function of the input.Additionally, the precise form of the explicit observable constructed by the machine is very sensitive to the assumed parameters; for example, the number of nodes or layers in a neural network.In this paper, we propose a procedure to identify discriminating features of jets learned by machines to then generate novel observables.These observables capture the important physics identified by the machine while at the same time being human-parseable.The general procedure is as follows: 1. Construct a basis of observables that is sensitive to the phase space of subjets in a jet.
Measure these basis observables on your signal and background samples.
2. Use machine learning techniques, such as neural networks, to identify the resolved Mbody phase space at which signal vs. background discrimination power saturates.
3. Construct a function of the phase space variables (with tunable parameters) at which discrimination power saturates.This function will be a new observable on the jets that can be used individually for discrimination.
4. Fix the parameters in the new observable by demanding that it maximizes some discrimination metric, such as the area under the signal vs. background efficiency curve (ROC curve).
This algorithm is simple enough that it can be automated with essentially no human input, with a specified basis of observables to span M -body phase space and an appropriate functional form for the observable.We will present and use a particular choice for the phase space basis and functional form of the final observable in this paper, but these may need to be modified and optimized for different studies.
For concreteness, here we apply the above approach to the problem of discriminating highly boosted decays of the Standard Model Higgs boson to a pair of b-quarks from splittings of gluons to b-quarks.We study identification of H → b b decays here as the signal and background jets both have a two-prong substructure, and theoretically-optimized discrimination observables have not been studied in great detail.Recently, Ref. [21] utilized jet substructure approaches to propose a promising search strategy for this boosted decay mode of the Higgs, encouraging the possibility of discovery in data from Run-II of the LHC.In order to further increase the probability of discovery, it is necessary to explore new strategies to ensure sensitivity to the specific features of this decay mode.
Using the organizing principle proposed in Ref. [19], if discrimination power saturates at M -body phase space then the machine must be learning some function of the corresponding 3M − 4 phase space variables.To resolve the M -body phase space, we use the N -subjettiness observables [22,23], as employed in Ref. [19].In the case of discrimination of boosted H → b b decays to g → b b splittings, we find that the discrimination power increases only slightly once 3-body phase space is resolved.Thus, we will only study the resolved 3-body phase space in this paper.The function of the observable on 3-body phase space that we study is a simple product form . (1.1) Here, the τ N are the N -subjettiness observables, 3-body phase space is 5 dimensional, and the parameters a, b, c, d, e will be chosen to maximize discrimination power.We emphasize that while this product form is simple, there may be a better choice for the form of function on phase space.
We show in Fig. 1 the results of this analysis.We consider jets in simulation on which no grooming has been applied and on which the modified mass-drop tagger (mMDT) [24,25] has been applied.We then measure the mass m J of these ungroomed or groomed jets (as appropriate) and make a cut of m J ∈ [100, 150] GeV, in the range of the Higgs peak.On  For both, the M -body curves determined by a neural network demonstrate a large increase in discrimination power between 2-and 3-body phase space.Also shown (in red) is the new observable which captures the majority of information important for discrimination identified by resolving 3-body phase space.
these jets, we then measure the 2-and 3-body phase space variables and determine the single function of the 3-body phase space variables as described above.The signal and background efficiencies in Fig. 1 do not include the effects of the mass cut, so that the curves end at 100% signal and background efficiency.The ROC curve for the new observable is shown as the solid line on these plots, and exhibits significant improvement over the 2-body phase space observables and nearly captures all discrimination power of 3-body phase space.In the case of mMDT jets, we will show that this new observable effectively corresponds to the angle of the dominant gluon emission off of the b b pair.The outline of this paper is as follows.In Sec. 2 we review and define the minimal and complete observable bases that are used to identify the coordinates of M -body phase space, and discuss the mMDT groomer.In Sec. 3, we describe our event simulation and machine learning implementation, and demonstrate the application of our procedure for developing the new observable in the case of H → b b vs. g → b b.Further, we explore the physics implications the functional form of the observable, and compare its discrimination power to standard observables motivated from QCD considerations.We conclude in Sec. 4 and discuss other possible applications of this procedure.

Observable basis
In this section, we discuss the basis of IRC safe observables that we use to identify structure in the jet, following the approach presented in Ref. [19].For our analysis, we exclusively use the N -subjettiness observables [22,23,26].This is without loss of generality and the analysis can, for example, be equivalently implemented with the N -point energy correlation functions [27] or the four-momentum of subjets from the exclusive k T algorithm.This specific choice for each M -body basis is only to ensure that the set of observables minimally and completely span the phase space of emissions in a jet.
The N -subjettiness observable τ N provides a measure of the radiation about N axes in the jet, specified by the angular exponent β > 0: Here, p T J is the transverse momentum of the jet of interest, p T i is the transverse momentum of particle i in the jet, and R Ki , for K = 1, 2, . . ., N , is the angle in pseudorapidity and azimuth between particle i and axis K in the jet.In our analyses, we choose to define the N axes in the jet according to the exclusive k T algorithm [28,29] with E-scheme recombination [30].

mMDT Grooming Algorithm
In the analysis of the next section, we measure the aforementioned observables on samples of both ungroomed jets and jets groomed with the modified mass-drop tagger (mMDT) [24,25].Given a set of constituents of a jet with radius R, and for a fixed transverse momentum fraction parameter z cut , the mMDT grooming algorithm proceeds as follows: 1. Recluster the jet with the Cambridge/Aachen (C/A) algorithm [31][32][33].
2. Sequentially step through the branching history of the reclustered jet.At each branching with daughter branches i and j, check the mMDT criterion min(p T i , p T j ) If the condition fails, drop the softer of two daughter branches and follow through to the next branching in the rest of the clustering history.
3. This continues until the mMDT criterion is passed.At this point the algorithm terminates and the final jet is groomed of all branches that fail to pass the test.This ensures that the softest emissions at wide angles from the hard subjets, and contamination from the underlying event (UE) and initial state radiation (ISR), are effectively removed from the final groomed jet.
In the groomed case, the observable bases are measured after grooming, thus the collection of particles in the jet effectively contributing to the phase space are different than for ungroomed jets.

How to Make an Observable
In this section, we describe our event simulation and implementation of machine learning to the N -subjettiness basis of observables described in the previous section.We generate background pp → Z + b b and signal pp → Z(H → b b) events at the 13 TeV LHC with MadGraph5 v2.5.4 [34].The Z bosons were used as a control and decayed exclusively to neutrinos.These treelevel events are then showered in Pythia v8.226 [35,36] with default settings.Later in this section we will also show results obtained by applying the observable learned from Pythia to events showered with Herwig v7.1.1 [37,38].We use FastJet v3.2.1 [39,40] to cluster the jets.On the clustered anti-k T [41] jets with radius R = 0.8 and minimum p T of 500 GeV, we then measure the basis of N -subjettiness observables using the code provided in FastJet contrib v1.026.The observables are measured at the particle level and we do not apply any detector simulation.
We then study these jets without grooming and with mMDT grooming with z cut = 0.1.On these samples, we measure the jet mass and apply a cut of 100 < m J < 150 GeV which selects the Higgs signal region.Additionally, we measure the sufficient collection of N -subjettiness observables to completely determine up through 6-body phase space.We then proceed to develop novel observables learned from the machine that further discriminate signal and background.
To do this, we use the approach of Ref. [19].This enables us to identify the resolved phase space that captures the vast majority of the discrimination power.To calculate the ROC curves for M -body phase space shown in Fig. 2, we trained deep neural networks with fully-connected layers on the bases of N -subjettiness observables discussed in the previous section.Discrimination power is seen to dramatically increase on going from 2-to 3-body phase space, and higher phase space improves discrimination only slightly.All networks were trained using the Keras [42] deep learning libraries.However, it is important to note here that this procedure, as justified in Ref. [19], is agnostic of the specific machine learning method used and, equivalently, other machine learning methods (like a boosted decision tree) could have been used to identify the point of saturation.From these results, in what follows we will exclusively study 3-body phase space.
From the assumption that 3-body phase space effectively saturates discrimination power, our goal is to define a single observable that captures this discrimination power.As previously discussed, this proposed observable must be a function of the phase space variables at the point of saturation.Determining this function thus requires parametrizing the possible functions of the phase space variables somehow.Our approach will be illustrative and demonstrate the procedure for doing so.However, there may be smarter or more effective ways to optimize  this process.Here, we will just consider the observable formed from the product of 3-body (5 dimensional) phase space variables, raised to powers a, b, c, d and e: At this stage, the optimal values of these powers are undetermined, and there is no guarantee that this form of the observable actually includes all discrimination power of 3-body phase space.We leave the problem of a complete observable basis to future work.

Determining Optimal Parameters
Utilizing the measurements of N -subjettiness observables from our datasets of groomed and ungroomed jets, we run a Monte Carlo simulation whereby uniform random numbers in the range [−5, 5] were assigned to the exponents a, b, c, d and e. 1 In each run, values of the resultant product observable were measured on samples of 200,000 signal and background jets from Pythia that passed the mass cut of m J ∈ [100, 150] GeV.We then construct the 1 dimensional binned likelihood distributions of the observable, which is the optimal discriminant for a given functional form of the observable by the Neyman-Pearson lemma [43].The likelihood distributions of the observable, for each set of exponent values, were then used to calculate the area under the ROC curve (AUC) to estimate the discrimination power.For each run, the values of the exponents and the calculated AUC were stored only when the AUC crossed a threshold value of 0.5.This was necessary to exclude binning effects on the measured discrimination power of the observable.We apply this procedure to jets that have been groomed with mMDT and those that have not.In the groomed case, due to the exclusion of soft emissions and contamination from initial state radiation (ISR) or underlying event, it is relatively straightforward to extract a useful physical understanding from the obtained functional form of the observable.In Fig. 3, we plot the distributions of the exponents a, b, c, d and e with the requirement that the AUC for the corresponding product observable is greater than 0.81 or 0.73, for ungroomed and groomed jets, respectively.These distributions will enable us to extract the exponent values for which the AUC is maximized for the binned likelihood distributions of the product observable measured on signal and background.
By studying the histograms of the exponents one can make the following conclusions: • For the ungroomed jets, AUC is maximized when c = 0, d = 0.5, e = −1 as the distributions for these exponents are very narrow.Since the distributions for a and b are both approximately uniform on [−5, 5], further interpretation is required.This will be done shortly.
• For the groomed jets, AUC is maximized when c = 0, d = −2, e = 2. Again, since a and b are both approximately uniformly distributed over [−5, 5], further analysis is required to determine the values that maximize the AUC.
To determine the values for a and b for both ungroomed and groomed jets, we work to understand the correlation between the exponents.
To determine the correlation between the exponents a and b, we plot their joint probability distribution from the uniform sampling on [−5, 5] with the same cuts on the resulting observables' AUC.For both ungroomed and groomed jets, this is shown in Fig. 4.These plots demonstrate a strong correlation between these exponents, which to very good approximation is: These relationships can be used to fix b, for example, as a function of exponent a.To determine the value of the exponent a, we then fix b, c, d, and e as earlier, and calculate the AUC for a ∈ [ −5, 5].The results of this scan are shown in Fig. 5 for ungroomed and groomed jets.In particular, we note from the plot that AUC is maximized for the ungroomed case when a = 2 and for the groomed case when a = −2.This implies that the exponent b = 0 for both cases, using Eq.(3.2).Thus, the product observable takes on the following forms for the two kinds of jets: Any monotonic function of the observable will produce the same discrimination power, and so we can simplify the expression for the groomed product observable.For the product observable for groomed jets, we use the expression: It is interesting to note that the observables from this method are Sudakov safe [44,45] because they are formed from ratios of IRC safe observables.

Physical Interpretation
In Fig. 6, we plot the distribution of these new product observables measured on signal and background jets showered in Pythia.This shows that these product observables on ungroomed   and groomed jets effectively separate signal from background.Additionally, in Fig. 7, we measure these product observables determined from the Pythia signal and background samples on the jets showered with Herwig.We observe a similar relative separation between the distributions, although the absolute scale is different, in the Herwig samples suggesting that these observables are sensitive to real physics, and not idiosyncrasies of the parton shower programs.Especially for groomed jets, these simple forms for the product observables enable a nice interpretation of the physics to which they are sensitive.In the case of ungroomed jets, there are multiple sources of radiation (final state, initial state, underlying event, etc.) that makes an interpretation a bit more challenging, so we won't discuss it more here.When the jets are groomed with mMDT, however, contamination radiation from the initial state or underlying event is dominantly removed, and so a picture of the jet exclusively with radiation from the final state is accurate.In this case, the mMDT jet with resolved 3-body phase space consists of the b and b pair, and the dominant gluon emitted off of them.The 3-body phase space configuration is shown in Fig. 8, with transverse momentum fractions z i and pairwise angles θ ij .In what follows, we will let particles 1 and 2 be the b and b, and particle 3 be the gluon.
Because we make a cut on the jet mass and there is no soft singularity for g → b b splitting, we assume that the emitted gluon is relatively soft and/or collinear with respect to the b and the b.With this assumption, then the value of τ Here, z is the transverse momentum fraction of the b quark subjet, for example.The combination z(1 − z)θ 2 12 is just the ratio of the jet mass to the jet transverse momentum to this order, m 2 J /p 2 T J , and is approximately constant because the mass cut is relatively narrow.The term in parentheses on the right, z 0.75 (1 − z) 0.25 + z 0.25 (1 − z) 0.75 , is typically an order-1 on these jets with a mass cut is just some constant value.The remaining factor in β (g) 3 however, contains significantly interesting physics.The two other N -subjettinesses that appear in that observable can be expressed as (see Ref. [19] for more details): In writing this, we assume that the gluon, with transverse momentum fraction is the first particle clustered by the k T algorithm, and therefore sets the value of these τ 2 observables.z i is the transverse momentum fraction of the closer in angle of particles 1 or 2 (the b or b), with θ i3 this angle.The ratio that appears in β Therefore, with these assumptions, the groomed jet product observable is approximately proportional to the angle between the dominant gluon emission and the closer of the b or b: As color octets, gluons preferably emit at wide angles, while singlet Higgs bosons emit at small angles, and so we do expect this observable to provide discrimination power.

Comparison to Standard Observables
To demonstrate that these observables learned by the machine are indeed powerful, we compare their discrimination power to that of a collection of standard observables.For comparison, we use N -subjettiness ratio τ 2,1 with winner-take-all axes [46][47][48] and (generalized) energy correlation function ratios D (2) 2 [49] and N (2) 2 [50].While these and related observables have been used for identification of boosted H → b b decays, they are not necessarily optimized for this purpose.Nevertheless, they provide a useful benchmark.The signal efficiency versus background rejection rates for jets showered in Pythia are shown in Fig. 9 and for jets from Herwig, in Fig. 10.Most interestingly, for ungroomed jets, the new product observable Figure 11: Scatter plots of 1000 signal (red) and background (blue) jets in the plane of β 3 versus the 3-body phase space likelihood ratio from the neural network.Ungroomed jets are shown in (a), and mMDT groomed jets in (b).The gross monotonic relationship between β 3 and the likelihood indicates that most of the information in the likelihood is captured in β 3 , while the spread indicates that there is some discrimination information that is missed in β 3 .
versus the likelihood ratio as measured on 1000 of both signal and background jets.If the β 3 observables perfectly captured the information in the likelihood, these scatter plots should reduce to a monotonic curve.The deviation from monotonicity is a measure of the information that β 3 misses with respect to the likelihood.Broadly, these plots demonstrate a monotonic relationship between β 3 and the likelihood, but there is some spread.The relative size of the spread is less for β 3 measured on ungroomed jets, which may reflect that in this case, β 3 captures more of the information in the likelihood than β (g) 3 .

Conclusions
Previous deep learning studies in jet physics have shown immense promise.While it has been shown that appropriately designed deep learning techniques can outperform standard observables, such studies have not effectively probed what more information the machines are identifying.Building on Ref. [19], we propose a procedure that develops powerful new observables from the knowledge of the information contained in jets that contributes to an observable's discrimination power.By systematically controlling the information fed to neural networks, it is possible to identify the minimal amount of information required to effectively discriminate between highly boosted decays of different massive particles and light QCD partons.The method of planing introduced in Ref. [17] may also be useful in identifying the minimal information necessary for powerful discrimination.
Here, we have proposed an algorithm that can, in principle, be automated to construct new observables for any discrimination problem.While the H → b b application shows promise, such a procedure might also be applied to other specific problems like identifying q vs. g jets, top quarks, or even to develop new observables that work effectively as generic anti-QCD taggers.
The most improvement to this method would be accomplished by construction of an optimal basis of functions with parameters that can be tuned to maximize discrimination power.Of course, at this point, one is faced with a trade-off between simplicity and efficacy that must be taken into account.By utilizing constructive deep learning techniques that are sensitive to exotic configurations within jets, this approach is presented with an intention to open the door to a whole new class of powerful substructure observables that can be tailored to specific or generic classification problems, while also providing further physics insight regarding the jets being studied.

Figure 1 :
Figure 1: H → b b jet efficiency vs. g → b b jet rejection rate plots for ungroomed (a) and groomed (b) jets.For both, the M -body curves determined by a neural network demonstrate a large increase in discrimination power between 2-and 3-body phase space.Also shown (in red) is the new observable which captures the majority of information important for discrimination identified by resolving 3-body phase space.

Figure 2 :
Figure 2: H → b b jet efficiency vs. g → b b jet rejection rate plot for the ungroomed (a) and groomed (b) jets, as determined by a neural network.The curves effectively demonstrate saturation of discrimination power on the resolution of 3-body phase space.

Figure 3 :
Figure 3: Histograms for the values of exponents of the product observable.For ungroomed (a) and groomed (b) jets exponent values in these histograms were accepted when the generated AUCs for the binned signal and background likelihood distributions were above 0.81 and 0.73 respectively.

Figure 4 :
Figure 4: Heat maps of the correlation between a and b exponents of the product observable for ungroomed (a) and groomed (b) jets.

Figure 5 :
Figure 5: Variation of area under the ROC curve for the observable when the a exponent is scanned over the range [−5, 5], keeping the c, d and e exponents fixed and varying b with a as per Eq.(3.2).

Figure 6 :Figure 7 :
Figure 6: Distributions of the product observable for signal (red) and background (blue), measured on the samples of ungroomed (a) and groomed (b) jets showered with Pythia, within a mass cut of m J ∈ [100, 150] GeV.