HYTREES: Combining Matrix Elements and Parton Shower for Hypothesis Testing

We present a new way of performing hypothesis tests on scattering data, by means of a perturbatively calculable classifier. This classifier exploits the "history tree" of how the measured data point might have evolved out of any simpler (reconstructed) points along classical paths, while explicitly keeping quantum-mechanical interference effects by copiously employing complete leading-order matrix elements. This approach extends the standard Matrix Element Method to an arbitrary number of final state objects and to exclusive final states where reconstructed objects can be collinear or soft. We have implemented this method in the standalone package HYTREES and have applied it to Higgs boson production in association with two jets, with subsequent decay into photons. HYTREES allows the construction of an optimal classifier to discriminate this process from large Standard Model backgrounds. It further allows us to find the most sensitive kinematic regions that contribute to the classification.


I. INTRODUCTION
The separation of interesting signal events from large Standard-Model induced backgrounds is one of the biggest challenges in searches for new physics and when measuring particle properties at the LHC. This problem is magnified when the final states of interest have a large probability to be produced in proton-proton collisions according to the Standard Model. Typical classifications into signal and background events are based on observables that are characteristic of the quantum numbers of the particles involved in each hypothesis. For example, the quantum numbers (e.g. charges, spin and mass) of a resonance result in a specific radiation profile in the detector. The radiation induced by such a resonance is more likely to populate specific phase space regions. Thus, to infer if a process is induced by signal or by background, one wants to know how likely the measured radiation profile was induced by either hypothesis, i.e. P({p i }|S) for signal and P({p i }|B) for background, where {p i } denotes the set of 4-momenta measured in the detector. The Neyman-Pearson Lemma [1] shows that the ratio of both probabilities,

χ({p i }) = P({p i }|S) / P({p i }|B) ,   (1)

yields an ideal classifier. This approach underlies the so-called Matrix Element Method (MEM) [2], which has been used in a large variety of contexts [3][4][5][6][7][8][9]. In the MEM, the probabilities P({p i }|S) and P({p i }|B) are calculated directly from the matrix elements of the respective "hard" processes. In [10,11] the parton-level MEM has been extended to include the parton shower in the evaluation of the probabilities, and has been implemented in Shower [10][11][12] and Event [13][14][15] Deconstruction, thereby allowing for the analysis of an arbitrary number of final state objects. Information from the parton shower is particularly important in jet-rich final states and in the comparison of the substructure of jets for classification.
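To make the Neyman-Pearson construction concrete, the following toy sketch (not part of HYTREES; the Gaussian hypotheses are purely illustrative) builds the likelihood ratio χ = P(x|S)/P(x|B) for a one-dimensional observable:

```python
# Toy illustration of the Neyman-Pearson likelihood-ratio classifier:
# chi(x) = P(x|S) / P(x|B) for two Gaussian hypotheses (illustrative only).
import math

def gaussian_pdf(x, mu, sigma):
    """Normal density, used here as a stand-in for P(x|hypothesis)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood_ratio(x, mu_s=1.0, mu_b=-1.0, sigma=1.0):
    """chi(x): large values favour the signal hypothesis."""
    return gaussian_pdf(x, mu_s, sigma) / gaussian_pdf(x, mu_b, sigma)

# An event near the signal mean is classified as signal-like ...
assert likelihood_ratio(1.0) > 1.0
# ... and one near the background mean as background-like.
assert likelihood_ratio(-1.0) < 1.0
```

Cutting on χ(x) at any threshold gives the best achievable background rejection for the corresponding signal efficiency, which is the property HYTREES aims to approximate for full scattering events.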
Here exclusive fixed-order matrix elements do not provide a good description of nature, due to the appearance of collinear and soft divergences in the matrix elements. Conversely, LHC signals and backgrounds are often predicted by using General-Purpose Event Generators (see e.g. [16]) to produce scattering event pseudo-data. In this context, several frameworks to combine the parton shower with hard matrix elements for processes containing multiple jets have been laid out, e.g. the MLM [17] procedure, the CKKW [18] and CKKW-L [19] methods, or the iterated matrix-element corrections approach [20,21]. These formalisms make it possible to avoid double counting between jets generated during the parton shower step and the matrix element, such that multiple jets can simultaneously be described with matrix-element accuracy.
We propose to combine techniques traditionally used for the CKKW-L and the iterated matrix-element correction approach of [21], and then use the resulting procedure to construct sophisticated perturbative weights of the full input event, which will facilitate the classification between signal and background. To calculate P({p i }|S) and P({p i }|B), one needs to evaluate all possible combinations of parton shower and hard process histories that can give rise to the final state {p i }. Conceptually, such an analysis method is suitable for any final state of interest consisting of reconstructed objects, i.e. arbitrary numbers of isolated leptons, photons and jets. The approach, dubbed hytrees, is in line with the Shower/Event Deconstruction method, but goes beyond these by including hard matrix elements with multiple jet emissions to calculate the weights of the event histories. We describe here the first implementation of such a method and showcase it in the context of a concrete example which is highly relevant for Higgs phenomenology, i.e. pp → (H → γγ) + jets.
The outline of the paper is as follows. In Sec. II we discuss the details of the hytrees algorithm. hytrees relies on the Dire parton shower [22] to calculate the weights of the event histories. For details on the splitting probabilities used in the Dire dipole shower we refer to Appendix A. In Sec. III we apply hytrees to the study of the classification of the process pp → (H → γγ) + jets versus the processes without a Higgs boson that lead to pp → γγ + jets. We offer conclusions in Sec. IV.


II. THE HYTREES METHOD

The definition of the classifier χ suggested in Eq. 1 is in principle very intuitive. A practical implementation however requires assumptions and abstractions before the classifier can be calculated on experimental data. Thus, to test and develop the classifier, we will use event generator pseudo-data, on which the new classifier will be evaluated. To be concrete, we use realistic (showered and hadronised) events, i.e. each "event" consists of a collection of particles - photons, leptons, long-lived hadrons, etc. - with each particle represented by a 4-vector stored in HepMC event format [23]. The hard processes underlying these events were generated using MadGraph [24], and showered and hadronised using Pythia [25].
These events are further processed to arrive at final states consisting of reconstructed objects, i.e. isolated leptons, isolated photons or jets. A lepton (e, µ) or photon is considered isolated from hadronic energy by demanding that the total hadronic activity in a cone of radius R = 0.3 around the object contains less than 10% of its p T , and the object is required to have p T ≥ 20 GeV and |y| < 2.5. Jets are reconstructed using the anti-kT algorithm [26] as implemented in fastjet [27], with radius R = 0.4. We only consider events with at least two jets of p T,j ≥ 35 GeV, since looser cuts are usually not considered in experimental analyses at the LHC. After these steps, the final state of interest is considerably simplified compared to the particle-level final state, consisting only of O(10) reconstructed objects. On these states, we will want to calculate χ of Eq. 1 from first principles relying on perturbative methods. Thus, we want to be as insensitive as possible to experimental or non-perturbative effects, such as hadronisation or pileup-induced soft scatterings. Using reconstructed objects as input to our calculation protects us to a large degree from contributions that are theoretically poorly controllable.
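The isolation and acceptance criteria above can be sketched as follows (a toy implementation with a hypothetical dictionary-based particle record; the actual analysis uses fastjet and full event records):

```python
# Sketch of the object selection described above: cone isolation (R = 0.3,
# hadronic pT below 10% of the object pT) plus pT and rapidity acceptance.
import math

def delta_r(y1, phi1, y2, phi2):
    """Distance in the rapidity-azimuth plane, with phi wrapped to [0, pi]."""
    dphi = abs(phi1 - phi2)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(y1 - y2, dphi)

def is_isolated(obj, hadrons, r_cone=0.3, frac=0.10):
    had_pt = sum(h["pt"] for h in hadrons
                 if delta_r(obj["y"], obj["phi"], h["y"], h["phi"]) < r_cone)
    return had_pt < frac * obj["pt"]

def passes_selection(obj):
    return obj["pt"] >= 20.0 and abs(obj["y"]) < 2.5

photon = {"pt": 50.0, "y": 0.1, "phi": 0.0}
hadrons = [{"pt": 2.0, "y": 0.2, "phi": 0.1},   # soft activity inside the cone
           {"pt": 30.0, "y": 2.0, "phi": 3.0}]  # hard activity far away
assert is_isolated(photon, hadrons) and passes_selection(photon)
```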
To allow the calculation of the classifier to be as detailed and physical as possible, we will directly use a parton shower to calculate the necessary factors. For this, we identify the reconstructed objects in the event with partons of a parton shower, i.e. with the perturbative part of the event generation before hadronisation. The first necessary step is to redistribute momenta to ensure that all jet momenta can be mapped to on-shell parton momenta, and then to add beam momenta defined by momentum conservation in the center-of-mass frame. Each of these events is then translated into all possible partonic pseudo-events, by assigning all possible parton flavors and all possible color connections to the jets 1 . The resulting collection of events is then passed to the parton shower algorithm 2 to calculate all necessary weights.
The general philosophy is illustrated in Fig. 1. A reasonable probability for the six configurations in the lowest layer should depend on the 2 → 2 matrix elements for particles connected to the "hard" scattering (grey blob). At the same time, the probability of the three configurations in the middle layer should be proportional to the 2 → 3 matrix elements for particles connected to the blob, and the overall probability of the top layer should be proportional to the 2 → 4 matrix elements. It is crucial to keep these conditions in mind when attempting a classification, since in general, the distinction between "hard scattering" and "subsequent radiation" is only well-defined in the phase-space region of ordered soft- and/or collinear emissions. In such phase-space regions, the quantum-mechanical amplitudes factorize into independent building blocks (such as splitting functions or eikonal factors) that effectively make up a "classical" path. If the kinematics of the event is such that interference effects between the amplitudes for different paths (i.e. hypotheses) are sizable, then this needs to be reflected in the classifier: there should not be any discriminating power for such events. Here, we will build a classifier that does depend on assigning a classical path to phase-space points. The kinematics of each unique point will be used to calculate the rate of classical paths, such as the ones illustrated in Fig. 1. In phase-space regions that allow a (quantum-mechanically) sensible discrimination, the rates of the dominant paths will factorize into products of squared low-multiplicity matrix elements and approximate (splitting) kernels. In all other regions, we should be as agnostic as possible to the path. These two regions can be reconciled by always using the complete, non-factorized matrix elements to calculate the rate, and only employing the approximate (splitting) kernels to "project out" the rate of paths.
This will guarantee that we minimize the dependence of assigning classical paths in inappropriate phase-space regions. We can succeed in defining the rate by the full non-factorized matrix element, for events of varying multiplicity, by employing the iterated matrix-element correction probabilities derived in [21] (see Eq. (15) therein) when calculating the probability of each path. The simultaneous use of matrix-elements for different multiplicities is a significant improvement over traditional matrix-element methods, which only leverage matrix-elements for a fixed multiplicity at a time.
The calculation of the classifier thus proceeds by constructing all possible ways how the partonic input state could have evolved out of a sequence of lower-multiplicity partonic states, by explicitly constructing all lower-multiplicity intermediate states via successive recombination of three into two particles, until no further recombination is possible. This construction of all "histories" follows closely the ideas used in matrix-element and parton shower merging methods [19]. The probability of an individual recombination sequence relies on full matrix elements as much as possible. In particular, we ensure that not only the probability of the lowest-multiplicity state is given by leading-order matrix elements, but that the probability of higher-multiplicity states is simultaneously determined by leading-order matrix elements. Further improvements of the method to incorporate running coupling effects, rescaling of parton distributions due to changes in initial-state longitudinal momentum components, as well as all-order corrections for momentum configurations with large scale hierarchies are discussed below.
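The recursive construction of all histories by successive recombination can be sketched as follows (a toy model with schematic one-letter particle labels and a hand-picked clustering table; the real algorithm uses full shower kinematics, flavours and colour connections):

```python
# Minimal sketch of building all "history trees" by successively clustering
# pairs of particles into one, until no further recombination is possible.
RULES = {("g", "g"): "g", ("q", "g"): "q", ("g", "q"): "q",
         ("a", "a"): "H"}  # "a" = photon; clustering a photon pair gives H

def histories(state):
    """All recombination sequences reducing `state` (a tuple of labels).
    Each sequence is a list of (higher_state, lower_state) steps."""
    paths = []
    for i in range(len(state)):
        for j in range(len(state)):
            if i == j:
                continue
            merged = RULES.get((state[i], state[j]))
            if merged is None:
                continue
            rest = tuple(p for k, p in enumerate(state) if k not in (i, j))
            lower = rest + (merged,)
            sub = histories(lower)
            step = [(state, lower)]
            paths += [step + s for s in sub] if sub else [step]
    return paths

# For a gamma gamma j j input, some histories pass through an intermediate
# state containing a Higgs boson -- these are the "Higgs-type" paths.
hs = histories(("a", "a", "q", "g"))
assert any("H" in lower for path in hs for (_, lower) in path)
```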
Let us illustrate the calculation using the red paths in Fig. 1. One definite path (from dashed red through solid red to the top layer, e.g. following the rightmost lines in the figure) will contribute to the overall probability schematically as

P(path) ∝ |M(2 → 2)|² P X1 P X2 ,

where the P X are approximate transition kernels, for example given by dipole splitting functions [29,30]. In order to construct this probability for the case of Fig. 1, splitting functions for all QCD and QED vertices, as well as for Higgs-gluon, Higgs-fermion and Higgs-photon couplings have been calculated. When summing over the two dashed red paths, the full |M (Hjj)| 2 is recovered, while summing over the dashed green and dashed blue paths yields the full mixed QCD/QED matrix elements |M (γγj)| 2 and |M (γjj)| 2 , respectively. The total sum of the probabilities of all paths reduces to |M (γγjj)| 2 , as desired. This discussion is complicated significantly by phase-space constraints, but can be generalized to an arbitrary multiplicity and to arbitrary splittings [21].
Note that it is straightforward to "tag" a path of recombinations as QCD-, QED- or Higgs-type by simply examining the intermediate configurations. The sum of all probabilities of all Higgs-type paths is an excellent measure of how Higgs-like the input state was, while the sum of all non-Higgs-type probabilities is an excellent measure of how background-like the input was. Following Eq. 1, it is thus natural to define the classifier for the Higgs hypothesis as

χ H = P({p i }|S) / P({p i }|B) ,   (2)

where the respective probabilities are defined as sums of path weights,

P({p i }|S) = Σ p ∈ Higgs-type paths w p ,   (3)
P({p i }|B) = Σ p ∈ non-Higgs-type paths w p .   (4)

A plethora of tags defining a hypothesis can be envisioned - once all paths of all intermediate states leading to the highest-multiplicity (input) state are known, it is straightforward to attribute a probability to each hypothesis. Of course, not all hypotheses are sensible from the quantum-mechanical perspective if interference effects are important.
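Once every path carries a weight and a hypothesis tag, the classifier is a simple ratio of summed weights, as the following sketch shows (the tags and numerical weights are hypothetical placeholders):

```python
# Sketch: hypothesis probabilities as sums of tagged path weights, and the
# classifier as their ratio (toy tags/weights, not real shower output).
def chi(paths, tag):
    """chi_tag = (sum of weights of tag-type paths) / (sum of all others)."""
    num = sum(w for t, w in paths if t == tag)
    den = sum(w for t, w in paths if t != tag)
    return num / den

paths = [("higgs", 3.0), ("higgs", 1.0), ("qcd", 0.5), ("qed", 0.5)]
assert chi(paths, "higgs") == 4.0  # (3.0 + 1.0) / (0.5 + 0.5)
```

The same function evaluated with different tags yields the alternative classifiers discussed later in the text.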
In this case, we expect that if the hypothesis is tested on pseudo-data with the hytrees method, the results are similar, irrespective of how the pseudo-data was generated. There should not be strong discrimination power for such problematic hypotheses. Finally, a discrimination based on matrix elements alone is likely to give an unreasonable probability for multi-jet hadronic states, since e.g. large hierarchies in jet transverse momenta will not be described by fixed-order matrix elements alone, and because the overall flux of initial-state partons is tied to changes in the parton distribution functions. Thus, we include the all-order effects of the evolution between intermediate states into the probability of each path. For a path p of intermediate states S (p) i transitioning to the next higher multiplicity at scales t (p) i , this can be done by correcting the probability of each path by the factor

w p = Π i Π(S (p) i−1 ; t (p) i−1 , t (p) i ) × [α(S (p) i ) / α FIX (S (p) i )] × [f (S (p) i ) / f (S (p) i−1 )] ,

where Π(S; t a , t b ) denotes the probability that state S does not branch between the scales t a and t b , which is directly related to Sudakov form factors [31,32]. We have also introduced the placeholder α FIX (S (p) i ) for the coupling constant of the branching producing state S (p) i out of state S (p) i−1 at a fixed reference scale, and α(S (p) i ) as a placeholder for the same coupling evaluated taking the kinematics of state S (p) i into account 3 . Finally, f (S (p) i ) denotes the parton luminosity appropriate for state S (p) i ; ratios of these factors account for the rescaling of the initial flux due to branchings. The weights w p are also a key component of the CKKW-L algorithm, which employs trial showers to generate the no-branching probabilities, and attaches the PDF and α s ratios as event weights to pre-tabulated fixed-order input events.
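The structure of the path weight w p, a product over intermediate states of a no-branching probability, a coupling ratio and a PDF ratio, can be sketched as follows (all numerical inputs are hypothetical placeholders; the real no-branching probabilities come from trial showers):

```python
# Sketch of the path weight w_p described above.
def path_weight(steps):
    """Each step carries: Pi   -- no-branching probability between scales,
                          a    -- coupling evaluated on the state kinematics,
                          afix -- coupling at the fixed generation scale,
                          fr   -- parton-luminosity ratio f(S_i)/f(S_{i-1})."""
    w = 1.0
    for s in steps:
        w *= s["Pi"] * (s["a"] / s["afix"]) * s["fr"]
    return w

steps = [{"Pi": 0.9, "a": 0.118, "afix": 0.118, "fr": 1.00},
         {"Pi": 0.8, "a": 0.130, "afix": 0.118, "fr": 0.95}]
w = path_weight(steps)
assert 0.0 < w < 1.0  # Sudakov-like suppression dominates in this example
```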
In hytrees, we also invoke trial showers to generate the no-branching factors, i.e. the calculation of the weights w p is performed by directly using a realistic parton shower, specifically the Dire plugin to Pythia. To correctly calculate w p for all possible paths, we extend this parton shower to include QED radiation (so that the shower can give a sensible all-order QED-resummed weight for the green paths in Fig 1) and to allow the transitions q → qH, g → gH and H → γγ (in order to correctly assign the red clustering paths in Fig. 1). Details on these improvements, and on the use of matrix-element corrections in Dire , are given in Appendix A.

III. APPLICATION TO H → γγ + JETS
To assess the performance of our approach in separating signal from background, and to showcase the scope of its potential applications, we study the signal process pp → Hjj with subsequent decay of the Higgs boson into photons, H → γγ, at a center-of-mass energy of √ s = 13 TeV. This process is of importance in studying the quantum numbers of the Higgs boson, e.g. its couplings to other Standard Model particles [33][34][35][36] or its CP properties [37][38][39][40][41]. Just like for the Higgs discovery channel with an inclusive number of jets, pp → (H → γγ) + X, this channel suffers from a large Standard-Model continuum background. We generate signal and background events using MadGraph for the hard process cross section and Pythia for showering and hadronisation. At the generation level, we apply minimal cuts for the photons (p T,γ ≥ 20 GeV, |η| < 2.5 and ∆R γγ ≥ 0.2), and on the final state partons j (p T,j ≥ 30 GeV, |η| ≤ 4.5 and ∆R jj ≥ 0.4). While we do not consider detector efficiencies for the jets, we simulate the detector response in the reconstruction of the photons by smearing their energy such that the Breit-Wigner distributed invariant mass m 2 γγ = (p γ,1 + p γ,2 ) 2 has a width of 2 GeV after reconstruction. Under such inclusive cuts, the signal process receives contributions from gluon fusion, as well as from weak-boson fusion [42,43]. Standard approaches to exploit this signal process often rely on the application of weak boson fusion cuts [44,45], which render gluon fusion contributions sub-dominant. Here we will instead focus exclusively on the gluon fusion contributions, aiming to apply hytrees to discriminate the continuum di-photon background from the gluon-fusion induced Higgs signal 4 .
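The photon-smearing step can be sketched as follows (a toy Gaussian energy smearing of massless four-vectors; the per-photon resolution of roughly 2% is an assumed value chosen so that the diphoton mass acquires a width of about 2 GeV, and the text's Breit-Wigner shape is approximated by a Gaussian here):

```python
# Sketch of the detector-response model: smear photon energies so that the
# reconstructed diphoton invariant mass acquires a ~2 GeV width.
import math, random

def smear(photon, rel_res):
    """Scale a massless four-vector (E, px, py, pz) by a Gaussian factor."""
    f = max(random.gauss(1.0, rel_res), 0.0)
    return tuple(f * c for c in photon)

def inv_mass(p1, p2):
    e = p1[0] + p2[0]
    px, py, pz = (p1[i] + p2[i] for i in (1, 2, 3))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

random.seed(1)
# Back-to-back photons with m_gg = 125 GeV before smearing:
g1, g2 = (62.5, 62.5, 0.0, 0.0), (62.5, -62.5, 0.0, 0.0)
masses = [inv_mass(smear(g1, 0.022), smear(g2, 0.022)) for _ in range(2000)]
mean = sum(masses) / len(masses)
assert abs(mean - 125.0) < 1.5  # smearing broadens but does not shift the peak
```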
In Fig. 2 we show log 10 (χ H ), as calculated according to Eqs. 1 and 2, for Higgs-signal pseudo-data (left) and non-Higgs background samples (right). It is apparent that the observable χ can discriminate between signal and background events. Signal events have on average large χ H , i.e. they result in a relatively large value for P({p i }|S) in comparison to P({p i }|B), and vice versa for background events. Since the hytrees method is based on calculating well-defined perturbative factors, it goes beyond many existing classification methods by also providing an estimate of theoretical uncertainties of the hypothesis-testing variable χ H . We find that the theoretical uncertainty, estimated by varying the renormalisation scale between t/2 ≤ µ R ≤ 2t (where t are the Dire parton-shower evolution variables [22], as necessary to evaluate running α s effects at the nodal splittings in the history tree, and to perform µ R -variations of the no-branching factors), is very small for χ in our example. This is somewhat remarkable, as signal and background enter to lowest order at O(α 2 s ) for the hard process. As shown in Fig. 3, P({p i }|S) and P({p i }|B) separately (and multiplied by the total probability to ensure that no artificial numerator-denominator cancellations occur) show a large sensitivity to scale variations, which cancels when taking the ratio to calculate χ H . This can also be understood in terms of a cancellation for the performance of the classifier. In the calculation of both the signal and the background hypotheses, partons are interpreted as emitted from the initial state partons, thus forming the final states with two (or more) jets. As this underlying dynamics is governed by QCD, it is very similar for signal and background, so that this part of the event does not contain much discriminative information. Furthermore, changing the argument of α s affects signal and background in a similar way.
This raises the question whether all information used in discriminating signal from background is in fact contained in the electroweak part of the event, and could e.g. be captured by analyzing the invariant mass distribution m γγ . We can investigate the effect of a mass-window cut within experimental uncertainties by selecting signal and background events that satisfy |m γγ − 125 GeV| < 2 GeV, in line with the way we smeared the energy of the photons. Fig. 4 shows that, when applying a mass cut, the normalised distributions of χ H overlap much more for signal and background samples, indicating that the very good separation observed in Fig. 2 rests largely on the fact that the photons in the signal arise from the decay of a narrow resonance. Still, the signal samples result on average in a larger value for χ H than the background samples, and thus S/B can be improved with a cut on χ H .
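The mass-window selection is a one-line predicate, sketched here to make the cut explicit (the half-width of 2 GeV matches the photon-energy smearing assumed earlier):

```python
# Sketch of the mass-window cut |m_gg - 125 GeV| < 2 GeV used above.
def in_mass_window(m_gg, m_higgs=125.0, half_width=2.0):
    """Keep events whose diphoton mass lies within the Higgs mass window."""
    return abs(m_gg - m_higgs) < half_width

assert in_mass_window(126.1)
assert not in_mass_window(121.0)
```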
In order to construct the history tree for the hytrees method, it was necessary to introduce "Higgs splitting kernels" (cf. App. A) to define the probability of the H → γγ decay. In principle, it would be permissible to use the physical Higgs-boson width when calculating these splitting kernels. However, it is reasonable to expect that this might lead to an artificially strong discrimination power. Fig. 4 shows, by varying the Higgs-boson width in the splitting kernel over a very large range, that this is not the case.
The hytrees method effectively takes all possible observables into account to discriminate between two hypotheses. To investigate further how this relates to cutting on m γγ , Fig. 5 shows the probabilities P directly, binned in the differential distributions m γγ and m jj . This highlights that hytrees might also be useful to find optimal cuts in a cut-and-count analysis, since hytrees can quantify how much differential observables can discriminate between different hypotheses. As shown in Fig. 5, m jj is very similar for signal and backgrounds, while m γγ is very discriminative. The sensitivity of any observable in classifying events can be studied in this way.

FIG. 5: Probabilities to identify signal or background pseudo-data according to the "Higgs-signal" and "non-Higgs signal" hypotheses, as a function of the dijet invariant mass and the diphoton invariant mass.
Classification with respect to Higgs or no-Higgs hypotheses is not the only application for hytrees in our example. One can imagine constructing different classification observables to test different hypotheses. For example, we could define χ QED and χ QCD in analogy to Eqs. (3) and (4), i.e.

χ QED = Σ p ∈ QED-type paths w p / Σ p ∈ non-QED-type paths w p ,
χ QCD = Σ p ∈ QCD-type paths w p / Σ p ∈ non-QCD-type paths w p .
In Fig. 6, we show how the Higgs-signal and non-Higgs background samples fare with respect to these three classification variables χ H , χ QED and χ QCD . The best discrimination between signal and background is observed in χ H . This is not surprising, as χ H tests explicitly whether there is a Higgs boson in the sample or not. χ QCD and χ QED perform as expected, yielding an on average larger value of χ for the background sample, and smaller values for the events that do contain a Higgs boson. While χ QCD retains some discriminative power between the Higgs and no-Higgs samples, the least discriminating variable is χ QED . Hence, with respect to the green path in Fig. 1, the signal and background samples provide very few separable kinematic features. The QED hypothesis yields a very similar classifier, irrespective of the event sample, indicating that no "classical" path in the history tree is preferred, and that thus interferences are relevant. It is comforting that in this case the hytrees method does not, as desired, produce an artificial discrimination power by referring to classical paths. In conclusion, by applying hytrees to known signal and background samples it is possible to optimise the discriminating observable, and to obtain an improved understanding of the kinematic features that allow a discrimination between signal and backgrounds.

IV. CONCLUSIONS
The classification of events into signal and background is the basis for all searches and measurements at collider experiments. By building on the Event Deconstruction method [10,13], CKKW-L merging [19] and the iterated matrix-element correction approach of [21], we have developed and implemented a novel way to classify realistic (i.e. fully showered and hadronised) final states according to different theory hypotheses. This method has been implemented in a standalone package, called hytrees, and will be made publicly available.
In principle this method is applicable to any final state and any theoretical hypothesis. However, there is a practical limitation due to the sharply increasing time it takes to evaluate complex final states with many (colored) particles. While invisible particles have not been implemented yet, approaches for taking them into account in the hypothesis testing exist [15] and will be included in a future release of hytrees.
We have applied hytrees to the gluon-fusion induced production of Hjj with subsequent decay H → γγ. This process receives large backgrounds, in which the photons can either be produced in the hard interaction of the process pp → γγjj or be radiated off the final-state or initial-state quarks of the process pp → jj. Detector effects were rudimentarily taken into account by smearing the photon momenta. hytrees can directly calculate how likely it is that an event was produced through a transition of interest. We have shown that hytrees can confidently separate signal and background samples with respect to the Higgs or no-Higgs hypothesis. While the method takes into account all possible kinematic observables simultaneously to classify the event according to the hypotheses under consideration, it is also possible to study how much individual observables, or combinations of observables, contribute to the overall classification. Thus, hytrees can be used to optimise cuts for cut-and-count based analyses very efficiently. The flexible and first-principle calculation-based approach enables us to obtain an improved understanding of the kinematic features that allow us to discriminate between signal and backgrounds for very large classes of processes at any high-energy collider experiment.

V. ACKNOWLEDGMENTS
We thank Valentin Hirschi for collaboration during an early stage of this project, for sharing a private code to generate all color connections, and for longstanding help with using MadGraph to generate the C++ matrix element code employed for matrix element corrections. MS is grateful to Dave Soper for a longstanding collaboration on the Shower/Event Deconstruction approach. MS thanks the University of Tuebingen and the Humboldt Society for support and hospitality during the finalisation of parts of this work. SP would like to thank Walter Giele for collaboration on dipole showers for QED splittings.
Appendix A: Splitting probabilities in the Dire parton shower

Realistic classifications of final states containing jets and photons according to a hypothesis require the construction of all possible branching histories that could have produced the final states. Thus, all possible ways of splitting or recombining the particles in the final state have to be considered. For the problem at hand, this requires a simultaneous description of QCD and QED branchings, both at fixed- and all-order perturbative accuracy. If a hypothesis does not depend on the final-state particles alone, but rather infers reconstructed intermediate states such as Higgs bosons, it is also necessary to incorporate the relevant intermediate branchings.
The description of QCD splittings used in this publication is implemented in the Dire plugin to Pythia, and consists of a partial-fractioned dipole parton shower including mass effects, as documented in detail in [22]. For all QCD splittings, we evaluate the running coupling at the evolution scale assigned to the splitting, i.e. α(S i ) = α s (t i ).

We implement QED emissions as an extension of the partial-fractioned dipole shower of Dire [22], using the same evolution and energy sharing variables as well as kinematical splitting functions (and mass corrections) as discussed in [22]. The crucial difference to the treatment of QCD is that we allow all pairs of electric charges to form dipoles that coherently emit photons, similar to the ideas presented in [46] and more recently discussed in [47] and [48]. In contrast to the latter, we split the soft-photon radiation pattern into two pieces, each assigned to one dipole splitting kernel. The color factors in the QCD splitting functions in [22] are further replaced by the electric (dipole) charge correlators, which can readily be negative. This inconvenience is addressed by using the weighted parton shower [49,50] algorithm implemented in Dire. The assignment of recoilers for the γ → f f̄ splitting takes guidance from the simultaneous emission of a soft quark pair in QCD (see e.g. [51]), which can be thought of as being emitted from a parent color dipole [52]. The latter calculation is of course not directly applicable to QED. Nevertheless, in the absence of other concrete ideas, we allow all electrically charged particles to act as spectators for the γ → f f̄ splitting. For all QED branchings, we do not employ a running coupling and instead fix α to the Thomson value (cf. [46]), i.e. α(S i ) = α em (0) = 0.00729735. More details on the formalism of QED showers will be presented elsewhere [53].
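The replacement of color factors by electric charge correlators can be sketched as follows (toy charge table and a conventional overall sign; the precise normalisation in the shower differs and is documented in the references above):

```python
# Sketch: electric (dipole) charge correlators replacing QCD color factors.
# Same-sign emitter/spectator pairs give negative correlators, which forces
# the use of a weighted parton-shower algorithm.
CHARGES = {"u": 2.0 / 3.0, "ubar": -2.0 / 3.0, "d": -1.0 / 3.0, "e": -1.0}

def charge_correlator(emitter, spectator):
    """Toy correlator -Q_i * Q_j for an emitter/spectator dipole."""
    return -CHARGES[emitter] * CHARGES[spectator]

# Opposite-charge pairs give a positive correlator ...
assert charge_correlator("u", "ubar") > 0
# ... while same-sign pairs give a negative one.
assert charge_correlator("u", "u") < 0
```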
Since our QED splitting kernels can readily become negative, we expect that event-weight fluctuations due to the weighting algorithm could become a significant problem. This is however largely circumvented by including QCD and QED matrix-element corrections up to pp → γγjjj, in the formalism of [21], into the parton shower: since the matrix-element corrections guarantee the correct radiation pattern irrespective of the splitting kernels, it is legitimate to enforce positive splitting kernels for splittings yielding states for which matrix-element corrections are available, thereby avoiding large weight fluctuations.
To allow testing the hypothesis of an intermediate Higgs boson, we further include the emission rate g → gH and the decay rate H → γγ directly into the parton shower evolution. The emission rate q → qH is omitted, since its contribution is only present for heavy quarks and, due to the quark masses, is further suppressed by phase space. The evolution variable and phase space mapping for the emission rate g → gH are identical to those of (massive) QCD or QED splittings, and the splitting function is a simple uniform weight Γ H→gg (m H ). This allows us to assign a probability to the production vertex of the Higgs boson, and is sufficient as long as the emission rate is effectively absent in the shower evolution. All gluons that can be reached by tracing leading-N C color connections are possible spectators for this splitting. The coupling value α(S i ) for this splitting is kept fixed.

The virtuality of the photon pair serves as evolution variable for the H → γγ decay. In this case, the splitting kernel is defined by

P H→γγ = Γ H→γγ (m H ) / S ,

where S is the number of possible recoilers for this splitting. In line with the reasoning for the γ → f f̄ splitting above, we allow all gluons as spectators for this splitting. Again, it is worth noting that we do employ matrix-element corrections for shower splittings that produce pp → γγjjj or less complicated states, such that, for the purposes of this publication, the concrete prescription of P H→γγ is of minor importance. The coupling value α(S i ) for this splitting is again kept fixed.