1 Introduction

The separation of interesting signal events from large Standard-Model-induced backgrounds is one of the biggest challenges in searches for new physics and in measurements of particle properties at the LHC. This problem is magnified when the final states of interest have a large probability to be produced in proton–proton collisions according to the Standard Model. Typical classifications into signal and background events are based on observables that are characteristic of the quantum numbers of the particles involved in each hypothesis. For example, the quantum numbers (e.g. charges, spin and mass) of a resonance result in a specific radiation profile in the detector. The radiation induced by such a resonance is more likely to populate specific phase space regions. Thus, to infer if a process is induced by signal or by background, one wants to know how likely it is that the measured radiation profile was induced by either hypothesis, i.e. the probability \(\mathcal {P}(\{p_i\}|S)\) for signal and \(\mathcal {P}(\{p_i\}|B)\) for background, where \(\{p_i\}\) denotes the set of 4-momenta measured in the detector. The Neyman–Pearson Lemma shows [1] that the ratio between both probabilities

$$\begin{aligned} \chi = \frac{\mathcal {P}(\{p_i\}|S)}{\mathcal {P}(\{p_i\}|B)} \end{aligned}$$
(1)

yields an ideal classifier. This approach underlies the so-called Matrix Element Method (MEM) [2], which has been used in a large variety of contexts [3,4,5,6,7,8,9]. In the MEM, the probabilities \(\mathcal {P}(\{p_i\}|S)\) and \(\mathcal {P}(\{p_i\}|B)\) are calculated directly from the matrix elements of the respective “hard” processes. In [10, 11] the parton-level MEM has been extended to include the parton shower in the evaluation of the probabilities, and has been implemented in Shower [10,11,12] and Event [13,14,15] Deconstruction, thereby allowing for the analysis of an arbitrary number of final-state objects. Information from the parton shower is particularly important in jet-rich final states and in the comparison of the substructure of jets for classification. Here, exclusive fixed-order matrix elements do not provide a good description of nature, owing to the appearance of collinear and soft divergences in the matrix elements.
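For concreteness, the decision rule implied by Eq. (1) can be stated in a few lines of code. This is a minimal toy sketch with purely illustrative numbers: obtaining the per-event probabilities is, of course, the hard part and the subject of this paper.

```python
import math

def neyman_pearson_chi(prob_signal, prob_background):
    """Likelihood ratio of Eq. (1); larger values favour the signal hypothesis."""
    if prob_background == 0.0:
        return math.inf
    return prob_signal / prob_background

# Hypothetical per-event probabilities under each hypothesis.
chi = neyman_pearson_chi(3.0e-4, 1.0e-4)
print(chi > 1.0)  # True -> event is more signal-like than background-like
```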

Conversely, LHC signals and backgrounds are often predicted by using General-Purpose Event Generators (see e.g. [16]) to produce pseudo-data of scattering events. In this context, several frameworks to combine the parton shower with multiple hard matrix elements for multi-jet processes have been laid out [17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]. Such merging schemes improve both the accuracy and the precision of event simulation tools. Double-counting between jets generated in the parton-shower step and jets generated at the matrix-element level is avoided by explicit vetoes and the inclusion of Sudakov factors or no-emission probabilities, such that multiple jets can simultaneously be described with matrix-element accuracy in one inclusive sample.

We propose to combine techniques traditionally used in merging schemes inspired by the CKKW-L method [35] with the iterated matrix-element correction approach of [37], and to use the resulting procedure to construct sophisticated perturbative weights for an input event, so that these weights can facilitate the classification of signal versus background. To calculate \(\mathcal {P}(\{p_i\}|S)\) and \(\mathcal {P}(\{p_i\}|B)\), one needs to evaluate all possible combinations of parton shower and hard process histories that can give rise to the final state \(\{p_i\}\). Conceptually, such an analysis method is suitable for any final state of interest consisting of reconstructed objects, i.e. arbitrary numbers of isolated leptons, photons and jets. The approach, dubbed hytrees, is in line with the Shower/Event Deconstruction methods, but goes beyond these by including hard matrix elements with multiple jet emissions to calculate the weights of the event histories. We describe here the first implementation of such a method and showcase it in the context of a concrete example which is highly relevant for Higgs phenomenology, i.e. \(pp \rightarrow (\mathrm {H}\rightarrow \gamma \gamma ) + \mathrm {jets}\).

The outline of the paper is as follows. In Sect. 2 we discuss the details of the hytrees algorithm. hytrees relies on the Dire parton shower [38] to calculate the weights of the event histories. For details on the splitting probabilities used in the Dire dipole shower we refer to Appendix A. In Sect. 3 we apply hytrees to the study of the classification of the process \(pp \rightarrow (\mathrm {H}\rightarrow \gamma \gamma ) + \mathrm {jets}\) versus the processes without Higgs boson that lead to \(pp \rightarrow \gamma \gamma +\mathrm {jets}\). We offer conclusions in Sect. 4.

Fig. 1 Pictorial representation of the paths contributing to the calculation of the probabilities \(\mathcal {P}(\{p_i\}|\,\text {Higgs})\), \(\mathcal {P}(\{p_i\}|\,\text {QED})\) and \(\mathcal {P}(\{p_i\}|\,\text {QCD})\), as described in the text

2 Implementation of hytrees

The definition of the classifier \(\chi \) suggested in Eq. (1) is in principle very intuitive. A practical implementation, however, requires assumptions and abstractions before the classifier can be calculated on experimental data. Thus, to develop and test the classifier, we evaluate it on event-generator pseudo-data. To be concrete, we use realistic (showered and hadronised) events, i.e. each “event” consists of a collection of particles – photons, leptons, long-lived hadrons, etc. – with each particle represented by a 4-vector stored in the HepMC event format [39]. The hard processes underlying these events were generated using MadGraph [40], and showered and hadronised using Pythia [41].

These events are further processed to arrive at final states consisting of reconstructed objects, i.e. isolated leptons, isolated photons or jets. A lepton \((e,\mu )\) or photon is considered isolated if the total hadronic activity in a cone of radius \(R=0.3\) around the object amounts to less than \(10\%\) of its \(p_T\); in addition, the object is required to have \(p_T \ge 20\) GeV and \(|y|<2.5\). Jets are reconstructed using the anti-\(k_T\) algorithm [42] as implemented in fastjet [43], with radius \(R=0.4\). We only consider events with at least two jets of \(p_{T,j} \ge 35\) GeV, since looser cuts are usually not considered in experimental analyses at the LHC. After these steps, the final state of interest is considerably simplified compared to the particle-level final state, consisting only of \(\mathcal {O}(10)\) reconstructed objects. On these states, we will want to calculate \(\chi \) of Eq. (1) from first principles, relying on perturbative methods. Thus, we want to be as insensitive as possible to experimental and non-perturbative effects, such as hadronisation or pileup-induced soft scatterings. Using reconstructed objects as input to our calculation protects us to a large degree from contributions that are theoretically poorly controlled.
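The isolation criterion can be summarised in the following sketch. The minimal event record (a dict with keys 'pt', 'y', 'phi' per particle) is a hypothetical simplification for illustration; the jet clustering itself is delegated to fastjet and not reproduced here.

```python
import math

def delta_r(y1, phi1, y2, phi2):
    """Rapidity-azimuth distance used for the isolation cone."""
    dphi = abs(phi1 - phi2)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(y1 - y2, dphi)

def is_isolated(obj, hadrons, r_cone=0.3, frac=0.10, pt_min=20.0, y_max=2.5):
    """Lepton/photon isolation as described in the text: the hadronic pT in a
    cone of R=0.3 must stay below 10% of the object's pT, and the object must
    satisfy pT >= 20 GeV and |y| < 2.5."""
    if obj['pt'] < pt_min or abs(obj['y']) >= y_max:
        return False
    cone_pt = sum(h['pt'] for h in hadrons
                  if delta_r(obj['y'], obj['phi'], h['y'], h['phi']) < r_cone)
    return cone_pt < frac * obj['pt']
```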

To allow the calculation of the classifier to be as detailed and physical as possible, we will directly use a parton shower to calculate the necessary factors. For this, we identify the reconstructed objects in the event with partons of a parton shower, i.e. with the perturbative part of the event generation before hadronisation. The first necessary step is to redistribute momenta to ensure that all jet momenta can be mapped to on-shell parton momenta, and then to add beam momenta defined by momentum conservation in the center-of-mass frame. Each of these events is then translated into all possible partonic pseudo-events, by assigning all possible parton flavors and all possible color connections to the jets.Footnote 1 The resulting collection of events is then passed to the parton shower algorithmFootnote 2 to calculate all necessary weights.
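A minimal sketch of this momentum preparation might look as follows. The on-shell projection and beam reconstruction shown here are simple illustrative choices under the stated assumptions, not necessarily the exact mapping used in hytrees.

```python
import math

def to_massless(p):
    """Rescale the energy of a 4-vector (E, px, py, pz) so it is exactly
    massless: one simple choice of on-shell projection (an assumption)."""
    E, px, py, pz = p
    return (math.sqrt(px * px + py * py + pz * pz), px, py, pz)

def add_beams(partons):
    """Define incoming beam momenta along +-z from four-momentum conservation,
    assuming the transverse momenta of the final state sum to zero, i.e. the
    event has been boosted to its center-of-mass frame."""
    E  = sum(p[0] for p in partons)
    pz = sum(p[3] for p in partons)
    # Solve E_a + E_b = E and E_a - E_b = pz for massless beams along z.
    e_a, e_b = 0.5 * (E + pz), 0.5 * (E - pz)
    return (e_a, 0.0, 0.0, e_a), (e_b, 0.0, 0.0, -e_b)
```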

The general philosophy is illustrated in Fig. 1. A reasonable probability for the six configurations in the lowest layer should depend on the \(2\rightarrow 2\) matrix elements for particles connected to the “hard” scattering (grey blob). At the same time, the probability of the three configurations in the middle layer should be proportional to the \(2\rightarrow 3\) matrix elements for particles connected to the blob, and the overall probability of the top layer should be proportional to the \(2\rightarrow 4\) matrix elements. It is crucial to keep these conditions in mind when attempting a classification, since in general the distinction between “hard scattering” and “subsequent radiation” is only well-defined in the phase-space region of ordered soft and/or collinear emissions. In such phase-space regions, the quantum-mechanical amplitudes factorize into independent building blocks (such as splitting functions or eikonal factors) that effectively make up a “classical” path. If the kinematics of the event is such that interference effects between the amplitudes for different paths (i.e. hypotheses) are sizable, then this needs to be reflected in the classifier: there should not be any discriminating power for such events. Here, we will build a classifier that does depend on assigning a classical path to phase-space points. The kinematics of each unique point will be used to calculate the rate of classical paths, such as the ones illustrated in Fig. 1. In phase-space regions that allow a (quantum-mechanically) sensible discrimination, the rates of the dominant paths will factorize into products of squared low-multiplicity matrix elements and approximate (splitting) kernels. In all other regions, we should be as agnostic as possible about the path. These two regimes can be reconciled by always using the complete, non-factorized matrix elements to calculate the rate, and employing the approximate (splitting) kernels only to “project out” the rate of individual paths. This guarantees that we minimize the dependence on assigning classical paths in inappropriate phase-space regions. We can succeed in defining the rate by the full non-factorized matrix element, for events of varying multiplicity, by employing the iterated matrix-element correction probabilities derived in [37] [see Eq. (15) therein] when calculating the probability of each path. The simultaneous use of matrix elements for several different multiplicities is a significant improvement over traditional matrix-element methods, which only leverage matrix elements for a single fixed multiplicity at a time. Extensions of the MEM to NLO accuracy seem possible, and a worthwhile avenue to pursue [45, 46]. In that case, both Born-level and real-correction multiplicities could act in concert as a theoretically improved classifier for inclusive signal signatures.
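The “projection” logic can be illustrated schematically: the rate of one classical path is the full matrix element squared of the input state, multiplied at each clustering step by the ratio of the chosen splitting kernel to the sum of all competing kernels, in the spirit of the iterated matrix-element-correction weights of [37]. The following is a structural sketch with hypothetical numbers, not the actual implementation.

```python
def path_rate(me2_full, kernel_values_per_step):
    """Rate attributed to one classical path: the full matrix element squared
    of the input state times, for each clustering step, the ratio of the
    kernel of the chosen branching to the sum of kernels of all competing
    branchings (the 'projection' discussed in the text)."""
    rate = me2_full
    for chosen, competing in kernel_values_per_step:
        rate *= chosen / sum(competing)
    return rate

# Hypothetical numbers: one path consisting of two clustering steps.
rate = path_rate(me2_full=4.2e-7,
                 kernel_values_per_step=[(0.8, [0.8, 0.5, 0.1]),
                                         (0.3, [0.3, 0.9])])
```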

The calculation of the classifier thus proceeds by constructing all possible ways in which the partonic input state could have evolved out of a sequence of lower-multiplicity partonic states, by explicitly constructing all lower-multiplicity intermediate states via successive recombination of three particles into two, until no further recombination is possible. This construction of all “histories” closely follows the ideas used in matrix-element and parton-shower merging methods [35]. The probability of an individual recombination sequence relies on full matrix elements as much as possible. In particular, we ensure that not only the probability of the lowest-multiplicity state is given by leading-order matrix elements, but that the probability of higher-multiplicity states is simultaneously determined by leading-order matrix elements. Further improvements of the method to incorporate running-coupling effects, the rescaling of parton distributions due to changes in initial-state longitudinal momentum components, as well as all-order corrections for momentum configurations with large scale hierarchies are discussed below.
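Structurally, the enumeration of histories is a recursion over all (emitter, emission, spectator) triplets. The sketch below shows the bookkeeping only; the flavour, colour and kinematic rules are hidden in placeholder callables (is_clusterable, cluster), which are assumptions of this illustration.

```python
from itertools import permutations

def all_histories(state, is_clusterable, cluster):
    """Recursively enumerate all paths of intermediate states, clustering one
    (emitter, emission, spectator) triplet into two particles per step, until
    no further recombination is possible. 'state' is a list of particles;
    is_clusterable(state, triplet) and cluster(state, triplet) encapsulate
    the physics rules and are placeholders here."""
    paths, found = [], False
    for triplet in permutations(range(len(state)), 3):
        if not is_clusterable(state, triplet):
            continue
        found = True
        lower = cluster(state, triplet)  # 3 -> 2 recombination
        for tail in all_histories(lower, is_clusterable, cluster):
            paths.append([state] + tail)
    if not found:  # lowest-multiplicity ("core") state reached
        paths.append([state])
    return paths
```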

Let us illustrate the calculation using the red paths in Fig. 1. One definite path (from dashed red through solid red to the top layer, e.g. following the rightmost lines in the figure) will contribute to the overall probability as

(2)

where \(P_{\text {X}}\) are approximate transition kernels, for example given by dipole splitting functions [47, 48]. The proof-of-principle implementation below employs the partial-fractioned dipole splitting kernels of the Dire parton shower, documented in [38]. The necessary extensions of Dire to QED and Higgs splittings are outlined in Appendix A.

In order to construct the probabilities for the cases shown in Fig. 1, splitting functions for all QCD and QED vertices, as well as for Higgs–gluon, Higgs–fermion and Higgs–photon couplings, have been calculated. When summing over the two dashed red paths, the full \(|\mathcal {M}(\mathrm {H}jj)|^2\) is recovered, while summing over the dashed green and dashed blue paths yields the full mixed QCD/QED matrix elements \(|\mathcal {M}(\gamma \gamma j)|^2\) and \(|\mathcal {M}(\gamma jj)|^2\), respectively. The total sum of the probabilities of all paths reduces to \(|\mathcal {M}(\gamma \gamma jj)|^2\), as desired. This discussion is complicated significantly by phase-space constraints, but can be generalized to arbitrary multiplicities and arbitrary splittings. We use the iterated ME correction approach of [37] in our proof-of-principle implementation below.

Note that it is straightforward to “tag” a path of recombinations as QCD-, QED- or Higgs-type by simply examining the intermediate configurations. The sum of the probabilities of all Higgs-type paths is an excellent measure of how Higgs-like the input state is, while the sum of all non-Higgs-type probabilities is an excellent measure of how background-like the input is. Following Eq. (1), it is thus natural to define the classifier for the Higgs hypothesis as

$$\begin{aligned} \chi _{\text {H}} \equiv \frac{\mathcal {P}(\{p_i\}|\,\text {Higgs})}{\mathcal {P}(\{p_i\}|\,\lnot \,\text {Higgs})}, \end{aligned}$$
(3)

where the respective probabilities are defined as

$$\begin{aligned} \mathcal {P}(\{p_i\}|\,\text {Higgs}) &= \frac{\sum \mathcal {P}_{\text {H}}}{\sum (\mathcal {P}_{\text {H}} + \mathcal {P}_{\text {QCD}}+ \mathcal {P}_{\text {QED}}) } \quad \text {and} \nonumber \\ \mathcal {P}(\{p_i\}|\,\lnot \,\text {Higgs}) &= \frac{\sum (\mathcal {P}_{\text {QCD}}+\mathcal {P}_{\text {QED}})}{ \sum (\mathcal {P}_{\text {H}} + \mathcal {P}_{\text {QCD}}+ \mathcal {P}_{\text {QED}}) }. \end{aligned}$$
(4)
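Note that the common normalisation in Eq. (4) cancels in the ratio of Eq. (3), so the classifier reduces to the ratio of summed tagged path probabilities. A minimal sketch, assuming a hypothetical list of (tag, probability) pairs for the paths of one event:

```python
def chi_hypothesis(path_probs, tag):
    """Classifier of Eqs. (3)-(4) (and, analogously, Eqs. (6)-(8)): the summed
    probabilities of paths carrying the hypothesis tag, divided by the summed
    probabilities of all other paths; the common normalisation of Eq. (4)
    cancels in the ratio. Tags are 'H', 'QCD' or 'QED'."""
    num = sum(p for t, p in path_probs if t == tag)
    den = sum(p for t, p in path_probs if t != tag)
    return num / den if den > 0.0 else float('inf')

# Usage with illustrative numbers: chi_H of Eq. (3).
chi_H = chi_hypothesis([('H', 2e-8), ('QCD', 4e-9), ('QED', 1e-9)], 'H')
```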

A plethora of tags defining a hypothesis can be envisioned – once all paths of all intermediate states leading to the highest-multiplicity (input) state are known, it is straightforward to attribute a probability to each hypothesis. Of course, not all hypotheses are sensible from the quantum-mechanical perspective if interference effects are important. In this case, we expect that if such a hypothesis is tested on pseudo-data with the hytrees method, the results will be similar irrespective of how the pseudo-data was generated: there should not be strong discrimination power for such problematic hypotheses.

Finally, a discrimination based on matrix elements alone is likely to give an unreasonable probability for multi-jet hadronic states, since e.g. large hierarchies in jet transverse momenta will not be described by fixed-order matrix elements alone, and because the overall flux of initial-state partons is tied to changes in the parton distribution functions. Thus, we include the all-order effects of the evolution between intermediate states in the probability of each path. We expect that these improvements will ameliorate the over-sensitivity of fixed-order matrix-element methods to small event distortions due to multiple soft and/or collinear emissions that was observed, e.g., in [46]. For a path p of intermediate states \(S_i^{(p)}, i\in [1,n^{(p)}]\), that transition to the next higher multiplicity at scales \(t_i^{(p)}\), all-order evolution effects can be included by correcting the probability of each path to

$$\begin{aligned}&\mathcal {P}_{\text {A}}\rightarrow \mathcal {P}_{\text {A}}\, w_{p},\quad \text {where}\nonumber \\&w_{p} = \prod _{i=1}^{n^{(p)}} \Pi (S_{i-1}^{(p)}; t_{i-1}^{(p)},t_i^{(p)}) \, \frac{\alpha (S_i^{(p)},t_i^{(p)})}{ \alpha ^{\text {FIX}}(S_i^{(p)})} \, \frac{f(S_{i-1}^{(p)}; x_{i-1}^{(p)},t_{i-1}^{(p)})}{f(S_{i-1}^{(p)}; x_{i-1}^{(p)},t_{i}^{(p)})}. \end{aligned}$$
(5)

\(\Pi (S_{i-1}^{(p)}; t_{i-1}^{(p)},t_i^{(p)})\) is the no-branching probability of state \(S_{i-1}^{(p)}\) between the scales \(t_{i-1}^{(p)}\) and \(t_i^{(p)}\), which is directly related to Sudakov form factors [49,50,51]. We have also introduced the placeholder \(\alpha ^{\text {FIX}}(S_i^{(p)})\) for the coupling constant of the branching producing state \(S_i^{(p)}\) out of state \(S_{i-1}^{(p)}\), and \(\alpha (S_i^{(p)},t_i^{(p)})\) as a placeholder for the same coupling evaluated taking the kinematics of state \(S_i^{(p)}\) into account.Footnote 3 Finally, the parton luminosity appropriate for state \(S_{i-1}^{(p)}\), evaluated at longitudinal momentum fraction \(x_{i-1}^{(p)}\) and factorization scale \(t_{i-1}^{(p)}\), is collected in the factor \(f(S_{i-1}^{(p)}; x_{i-1}^{(p)},t_{i-1}^{(p)})\). Ratios of these factors account for the rescaling of the initial flux due to branchings. The weights \(w_p\) are also a key component of the CKKW-L algorithm, which employs trial showers to generate the no-branching probabilities, and attaches the PDF and \(\alpha _s\) ratios as event weights to pretabulated fixed-order input events.
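Given the ingredients of Eq. (5) for each step of a path, the weight \(w_p\) is a simple product. The following bookkeeping sketch assumes that the no-branching probabilities, couplings and parton luminosities are supplied externally (e.g. by the trial showers discussed next); the record keys are hypothetical names.

```python
def path_weight(steps):
    """All-order correction w_p of Eq. (5) for one path. Each step is a dict
    with: 'no_branch' -- no-branching probability Pi(S_{i-1}; t_{i-1}, t_i);
    'alpha_run' / 'alpha_fix' -- running and fixed couplings of the branching;
    'pdf_lo' / 'pdf_hi' -- parton luminosity of S_{i-1} at scales t_{i-1}
    and t_i, respectively."""
    w = 1.0
    for s in steps:
        w *= (s['no_branch']
              * s['alpha_run'] / s['alpha_fix']
              * s['pdf_lo'] / s['pdf_hi'])
    return w
```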

In hytrees, we also invoke trial showers to generate the no-branching factors, i.e. the calculation of the weights \(w_p\) is performed directly with a realistic parton shower, specifically the Dire plugin to Pythia. The trial-shower algorithm is directly based on the CKKW-L merging implementation in Pythia, and is discussed in some detail in [36]. To correctly calculate \(w_{p}\) for all possible paths, we extend this parton shower to include QED radiation (so that the shower can give a sensible all-order QED-resummed weight for the green paths in Fig. 1) and to allow the transitions \(q\rightarrow qH, g\rightarrow g H\) and \(H\rightarrow \gamma \gamma \) (in order to correctly assign the red clustering paths in Fig. 1). Details on these improvements, and on the use of matrix-element corrections in Dire, are given in Appendix A.

Fig. 2 Classification of signal or background pseudodata according to the Higgs hypothesis, using different values for the argument of the QCD running coupling, both in the evaluation of coupling factors and in the evaluation of no-branching probabilities

Fig. 3 Non-normalized probabilities \(\mathcal {W}(\{p_i\}|\,\text {Hypothesis}) = \mathcal {P}(\{p_i\}|\,\text {Hypothesis}) \cdot \sum (\mathcal {P}_{\text {H}} + \mathcal {P}_{\text {QCD}} + \mathcal {P}_{\text {QED}})\) of Higgs and non-Higgs pseudodata to be tagged as Higgs or non-Higgs configuration, using different values for the argument of the QCD running coupling, both in the evaluation of coupling factors and in the evaluation of no-branching probabilities

3 Application to \(\mathrm {H} \rightarrow \gamma \gamma \) + jets

To assess the performance of our approach in separating signal from background, and to showcase the scope of its potential applications, we study the signal process \(pp \rightarrow \mathrm {H}jj\) with subsequent decay of the Higgs boson into photons, \(\mathrm {H} \rightarrow \gamma \gamma \), at a center-of-mass energy of \(\sqrt{s} = 13\) TeV. This process is important for studying the properties of the Higgs boson, e.g. its couplings to other Standard Model particles [52,53,54,55] or its CP properties [56,57,58,59,60]. Just like the Higgs discovery channel with an inclusive number of jets, \(pp \rightarrow (\mathrm {H} \rightarrow \gamma \gamma ) + X\), this channel suffers from a large Standard-Model continuum background. We generate signal and background events using MadGraph for the hard-process cross section, and Pythia for showering and hadronisation. At the generation level, we apply minimal cuts on the photons (\(p_{T,\gamma } \ge 20\) GeV, \(|\eta | < 2.5\) and \(\Delta R_{\gamma \gamma } \ge 0.2\)) and on the final-state partons j (\(p_{T,j} \ge 30\) GeV, \(|\eta | \le 4.5\) and \(\Delta R_{jj} \ge 0.4\)). While we do not consider detector efficiencies for the jets, we simulate the detector response in the reconstruction of the photons by smearing their energy such that the Breit-Wigner-distributed invariant mass \(m_{\gamma \gamma }\), with \(m^2_{\gamma \gamma }= (p_{\gamma ,1} + p_{\gamma ,2})^2\), has a width of 2 GeV after reconstruction. Under such inclusive cuts, the signal process receives contributions from gluon fusion as well as from weak-boson fusion [61, 62]. Standard approaches to exploit this signal process often rely on the application of weak-boson-fusion cuts [63, 64], which render the gluon-fusion contribution sub-dominant. Here, instead, we focus exclusively on the gluon-fusion contribution, aiming to apply hytrees to discriminate the continuum di-photon background from the gluon-fusion-induced Higgs signal.Footnote 4

In Fig. 2, we show \(\log _{10}(\chi _{\text {H}})\), as calculated according to Eqs. (3) and (4), for Higgs-signal pseudo-data (left) and non-Higgs background samples (right). It is apparent that the observable \(\chi _{\text {H}}\) can discriminate between signal and background events. Signal events have on average large \(\chi _{\text {H}}\), i.e. they result in a relatively large value for \(\mathcal {P}(\{p_i\}|S)\) in comparison to \(\mathcal {P}(\{p_i\}|B)\), and vice versa for background events.

Since the hytrees method is based on calculating well-defined perturbative factors, it goes beyond many existing classification methods by also providing an estimate of the theoretical uncertainty of the hypothesis-testing variable \(\chi _{\text {H}}\). An exhaustive definition of the uncertainty of hytrees is very similar to that of an event generator, in that both perturbative ambiguities (of the fixed-order matrix elements as well as the all-order resummation) and non-perturbative variations contribute to the overall uncertainty budget. In the context of event generators, uncertainties have recently received much attention (see e.g. the community effort [65, 66] or [34, 67,68,69]), but no exhaustive uncertainty budget covering both perturbative and non-perturbative components has been presented so far. Here, for our proof-of-principle implementation, we use perturbative scale variations, applied both to the fixed-order and the all-order components of the hytrees method, as one illustration of a source of theoretical uncertainty. We find that the theoretical uncertainty, estimated by varying the renormalisation scale between \(t/2 \le \mu _R \le 2 t\) (where t are the Dire parton-shower evolution variables given in Table 1, as needed to evaluate running \(\alpha _s\) effects at the nodal splittings in the history tree and to perform \(\mu _R\)-variations of the no-branching factors), is very small for \(\chi _{\text {H}}\) in our example. This is somewhat remarkable, as signal and background enter at lowest order at \(\mathcal {O}(\alpha ^2_s)\) for the hard process. As shown in Fig. 3, \(\mathcal {P}(\{p_i\}|S)\) and \(\mathcal {P}(\{p_i\}|B)\) separately (and multiplied by the total probability, to ensure that no artificial numerator–denominator cancellations occur) show a large sensitivity to scale variations, which cancels when taking the ratio to calculate \(\chi _{\text {H}}\). This cancellation can also be understood in terms of the performance of the classifier: in the calculation of both the signal and the background hypotheses, partons are interpreted as emitted from the initial-state partons, thus forming the final states with two (or more) jets. As the underlying dynamics is governed by QCD, this part of the event is very similar for signal and background, and hence does not contain much discriminating information. Furthermore, changing the argument of \(\alpha _s\) affects signal and background in a similar way.
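To give a sense of the size of the input variation, the renormalisation-scale variation can be illustrated with a textbook one-loop running coupling. This is shown for orientation only; the actual study uses the running coupling implemented in Dire/Pythia, and the scale value below is hypothetical.

```python
import math

def alpha_s_one_loop(mu, alpha_mz=0.118, mz=91.1876, nf=5):
    """One-loop running coupling alpha_s(mu), evolved from alpha_s(M_Z)."""
    b0 = (33.0 - 2.0 * nf) / (12.0 * math.pi)
    return alpha_mz / (1.0 + alpha_mz * b0 * math.log(mu * mu / (mz * mz)))

t = 50.0  # GeV, a hypothetical shower evolution scale at a nodal splitting
central = alpha_s_one_loop(t)        # mu_R = t
lower   = alpha_s_one_loop(2.0 * t)  # mu_R = 2t  -> smaller coupling
upper   = alpha_s_one_loop(0.5 * t)  # mu_R = t/2 -> larger coupling
```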

Fig. 4 Classification of signal or background pseudodata according to the Higgs hypothesis, using different values of \(\Gamma _{H}\). Only configurations with diphoton invariant masses in a small window are shown, to further demonstrate the discrimination power w.r.t. a simple mass cut

Fig. 5 Probabilities to identify signal or background pseudodata according to the “Higgs-signal” and “non-Higgs-signal” hypotheses, as a function of the dijet invariant mass and the diphoton invariant mass

This raises the question whether all information used in discriminating signal from background is in fact contained in the electroweak part of the event, and could e.g. be captured by analyzing the invariant-mass distribution \(m_{\gamma \gamma }\). We can investigate the effect of a mass-window cut within experimental uncertainties by selecting signal and background events that satisfy \(|m_{\gamma \gamma } - 125~\mathrm {GeV}| < 2~\mathrm {GeV}\), in line with the way we smeared the energy of the photons. Figure 4 shows that, when applying such a mass cut, the normalised distributions of \(\chi _{\text {H}}\) overlap much more for signal and background samples, indicating that the very good separation observed in Fig. 2 rests largely on the fact that the photons in the signal arise from the decay of a narrow resonance. Still, the signal samples result on average in larger values of \(\chi _{\text {H}}\) than the background samples, and thus S/B can be improved with a cut on \(\chi _{\text {H}}\).

In order to construct the history tree for the hytrees method, it was necessary to introduce “Higgs splitting kernels” (cf. App. A) to define the probability of the \(\text {H}\rightarrow \gamma \gamma \) decay. In principle, it would be permissible to use the physical Higgs-boson width when calculating these splitting kernels. However, one might reasonably expect this to lead to an artificially strong discrimination power. Figure 4, in which the Higgs-boson width in the splitting kernel is varied over a very large range, shows that this is not the case.
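As an illustration of how a finite width might enter such a kernel, one can consider a relativistic Breit-Wigner line shape. This functional form is an assumption made purely for illustration; the actual kernels are given in Appendix A.

```python
def breit_wigner(m2, m_res=125.0, gamma=0.1):
    """Relativistic Breit-Wigner line shape (peak-normalised to one) as one
    plausible way a width Gamma_H could enter an H -> gamma gamma splitting
    kernel; an illustrative assumption, not the kernel of Appendix A.
    m2 is the squared diphoton invariant mass in GeV^2."""
    m2_res = m_res * m_res
    return (m2_res * gamma * gamma) / ((m2 - m2_res) ** 2 + m2_res * gamma * gamma)
```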

The hytrees method effectively takes all possible observables into account to discriminate between two hypotheses. To investigate further how this relates to cutting on \(m_{\gamma \gamma }\), Fig. 5 shows the probabilities \(\mathcal {P}\) directly, binned in the differential distributions \(m_{\gamma \gamma }\) and \(m_{jj}\). This highlights that hytrees might also be useful for finding optimal cuts in a cut-and-count analysis, since it can quantify how well differential observables discriminate between different hypotheses. As shown in Fig. 5, \(m_{jj}\) is very similar for signal and background, while \(m_{\gamma \gamma }\) is very discriminating. The sensitivity of any observable in classifying events can be studied in this way.

Classification with respect to Higgs or no-Higgs hypotheses is not the only application for hytrees in our example. One can imagine constructing different classification observables to test different hypotheses. For example, we could define \(\chi _{\text {QED}}\) and \(\chi _{\text {QCD}}\) in analogy to Eqs. (3) and (4), i.e.

$$\begin{aligned}&\chi _{\text {QED}} \equiv \frac{\mathcal {P}(\{p_i\}|\,\text {QED})}{\mathcal {P}(\{p_i\}|\,\lnot \,\text {QED})} \qquad \text {and} \nonumber \\&\chi _{\text {QCD}} \equiv \frac{\mathcal {P}(\{p_i\}|\,\text {QCD})}{\mathcal {P}(\{p_i\}|\,\lnot \,\text {QCD})}, \end{aligned}$$
(6)

with the probabilities

$$\begin{aligned} \mathcal {P}(\{p_i\}|\,\text {QED})&= \frac{\sum \mathcal {P}_{\text {QED}}}{\sum (\mathcal {P}_{\text {H}} + \mathcal {P}_{\text {QCD}}+ \mathcal {P}_{\text {QED}}) },\nonumber \\ \mathcal {P}(\{p_i\}|\,\lnot \,\text {QED})&= \frac{\sum (\mathcal {P}_{\text {QCD}}+\mathcal {P}_{\text {H}})}{ \sum (\mathcal {P}_{\text {H}} + \mathcal {P}_{\text {QCD}}+ \mathcal {P}_{\text {QED}}) } \end{aligned}$$
(7)
$$\begin{aligned} \mathcal {P}(\{p_i\}|\,\text {QCD}) &= \frac{\sum \mathcal {P}_{\text {QCD}}}{\sum (\mathcal {P}_{\text {H}} + \mathcal {P}_{\text {QCD}}+ \mathcal {P}_{\text {QED}}) },\nonumber \\ \mathcal {P}(\{p_i\}|\,\lnot \,\text {QCD}) &= \frac{\sum (\mathcal {P}_{\text {QED}}+\mathcal {P}_{\text {H}})}{ \sum (\mathcal {P}_{\text {H}} + \mathcal {P}_{\text {QCD}}+ \mathcal {P}_{\text {QED}})}. \end{aligned}$$
(8)
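In code, these variants simply reuse the hypothetical chi_hypothesis helper sketched after Eq. (4), changing only the tag:

```python
# Illustrative tagged path probabilities, as in the sketch after Eq. (4).
path_probs = [('H', 2e-8), ('QCD', 4e-9), ('QED', 1e-9)]
chi_QED = chi_hypothesis(path_probs, 'QED')  # Eq. (6), with Eq. (7)
chi_QCD = chi_hypothesis(path_probs, 'QCD')  # Eq. (6), with Eq. (8)
```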
Fig. 6 Classification of signal or background pseudodata according to different signal hypotheses: a Higgs hypothesis \(\chi _{\text {H}}\) (\(\Gamma _H=0.1\) GeV), b QCD hypothesis \(\chi _{\text {QCD}}\), c QED hypothesis \(\chi _{\text {QED}}\)

In Fig. 6, we show how the Higgs-signal and non-Higgs background samples fare with respect to the three classification variables \(\chi _\text {H}\), \(\chi _\mathrm {QED}\) and \(\chi _\mathrm {QCD}\). The best discrimination between signal and background is observed for \(\chi _\text {H}\). This is not surprising, as \(\chi _\text {H}\) explicitly tests whether there is a Higgs boson in the sample or not. \(\chi _\mathrm {QCD}\) and \(\chi _\mathrm {QED}\) perform as expected, yielding on average a larger value of \(\chi \) for the background sample, and smaller values for the events that do contain a Higgs boson. While \(\chi _\mathrm {QCD}\) retains some discriminative power between the Higgs and no-Higgs samples, the least discriminating variable is \(\chi _\mathrm {QED}\). Hence, with respect to the green paths in Fig. 1, the signal and background samples exhibit hardly any separable kinematic features. The \(\mathrm {QED}\) hypothesis yields a very similar classifier irrespective of the event sample, indicating that no “classical” path in the history tree is preferred and thus that interference effects are relevant. It is reassuring that in this case the hytrees method does, as desired, not produce artificial discrimination power by referring to classical paths. In conclusion, by applying hytrees to known signal and background samples it is possible to optimise the discriminating observable, and to obtain an improved understanding of the kinematic features that allow a discrimination between signal and background.

4 Conclusions

The classification of events into signal and background is the basis for all searches and measurements at collider experiments. By building on the Event Deconstruction method [10, 13], CKKW-L merging [35] and the iterated matrix-element correction approach of [37], we have developed and implemented a novel way to classify realistic (i.e. fully showered and hadronised) final states according to different theory hypotheses. This method has been implemented in a standalone package, called hytrees, and will be made publicly available.

In principle this method is applicable to any final state and any theoretical hypothesis. However, there is a practical limitation due to the sharply increasing time it takes to evaluate complex final states with many (colored) particles. While invisible particles have not been implemented yet, approaches to take them into account in the hypothesis testing exist [15] and will be included in a future release of hytrees.

We have applied hytrees to the gluon-fusion-induced production of \(\mathrm {H}jj\) with subsequent decay \(\mathrm {H} \rightarrow \gamma \gamma \). This process receives large backgrounds in which the photons are either produced in the hard interaction of the process \(pp \rightarrow \gamma \gamma jj\) or radiated off the initial- or final-state quarks of the process \(pp \rightarrow jj\). Detector effects were rudimentarily taken into account by smearing the photon momenta. hytrees can directly calculate the probability that an event was produced through a transition of interest. We have shown that hytrees can confidently separate signal and background samples with respect to the Higgs or no-Higgs hypothesis. While the method takes into account all possible kinematic observables simultaneously to classify the event according to the hypotheses under consideration, it is also possible to study how much individual observables, or combinations of observables, contribute to the overall classification. Thus, hytrees can be used to optimise cuts for cut-and-count-based analyses very efficiently. The flexible, first-principles-based approach enables us to obtain an improved understanding of the kinematic features that allow us to discriminate between signal and background for very large classes of processes at any high-energy collider experiment.