1 Introduction

Several beyond the Standard Model (BSM) scenarios, e.g. micro black holes (mBH) [1,2,3,4], R-parity violating (RPV) supersymmetry [5, 6] or sphaleron-induced processes [7, 8], predict the possible production at the LHC of events with a large number of outgoing high-energy partons. Such events give rise to final states with a high multiplicity of jets produced above the TeV scale, namely, energetic multi-jet events.

The identification of this type of signal, through the observation of an excess of energetic multi-jet events, is far from straightforward due to the large Standard Model (SM) background originating from Quantum Chromodynamics (QCD) processes. The presently available event generators for such processes perform Leading Order (LO), Next-to-Leading Order (NLO) or even partial NNLO calculations, followed by radiation of additional partons through a Parton Shower (PS) algorithm. The accuracy of these calculations is limited, as some unavoidable approximations must be imposed. Significant effort has been invested in the study of energetic multi-jet events at the LHC by the ATLAS and CMS collaborations [9,10,11]. These studies face a major difficulty, namely, the need to estimate the kinematics of multi-jet events without relying on simulation. Yet some indirect dependence on simulation always remains. As a first stage in the development of a novel data-driven technique for the estimation of QCD processes, the differences between the predictions of several event generators for simulated energetic multi-jet events are studied here. The description of the technique, as well as its performance on these simulations, is the subject of the later sections.

2 Simulation of multijet processes

Modeling multi-jet processes in QCD is a challenging task mainly due to the large number and high complexity of the relevant Feynman diagrams. Generally speaking, there are three main approaches to handle this complexity.

The simplest approach is to couple the Leading Order (LO) calculations, which give rise to two outgoing partons, with a Parton Shower (PS) algorithm that can produce additional jets. While such an approach provides precise physical modelling of di-jet events with soft-collinear parton emissions, it is less accurate for topologies with more than two well separated partons. In spite of its simplicity, the LO+PS technique provides a surprisingly good description of the Tevatron and LHC data [12].

A better simulation can be achieved by carrying out the calculations of additional real parton emissions, while neglecting the contribution of virtual corrections. Such an approach can give rise to events with up to four or five jets. In order to simulate higher jet multiplicities a PS algorithm is applied. While such an approach improves the description of final states with more than two well separated partons, the interfacing with a PS algorithm requires a proper matching scheme.

The most rigorous approach is a formal order-by-order perturbative calculation, where each extra order includes diagrams with one more outcoming particle and one more loop in the intermediate state. Next to Leading Order (NLO) calculations can compute the properties of up to 3 jets with one loop corrections. Calculations of higher orders are very resource consuming and are thus limited.

For the purpose of comparing the various event-generator strategies, each approach was used to simulate QCD events at \(\sqrt{s}=13\) TeV as outlined below. Jets were reconstructed using the anti-\(k_t\) algorithm [13] implemented in the FastJet 3.2.1 package [14] with a radius parameter of \(R = 0.4\). All jets were required to satisfy \(p_T > 50\) GeV and \(|\eta |<2.8\).
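The jet acceptance quoted above can be illustrated with a minimal sketch. It assumes that the jets have already been clustered and are available as plain (pT, eta) pairs; the function name and the input layout are illustrative assumptions, not part of the analysis software.

```python
# Illustrative sketch of the jet acceptance cuts quoted in the text.
# The (pt, eta) tuple layout and the function name are assumptions.

PT_MIN_GEV = 50.0  # minimum jet transverse momentum in GeV (from the text)
ETA_MAX = 2.8      # maximum absolute pseudo-rapidity (from the text)


def select_jets(jets):
    """Return the accepted (pt, eta) jets, ordered by decreasing pT."""
    accepted = [(pt, eta) for pt, eta in jets
                if pt > PT_MIN_GEV and abs(eta) < ETA_MAX]
    return sorted(accepted, key=lambda jet: jet[0], reverse=True)
```

The number of accepted jets, `len(select_jets(jets))`, is then the per-event jet multiplicity used in the remainder of the study.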

  • For the LO+PS approach, events were generated and showered with PYTHIA8.235 [15]. To efficiently cover the large phase space (from the GeV to the TeV scale), the sample was generated in slices of the reconstructed leading jet’s \(p_T\), with a constant number of events simulated in each slice. The leading reconstructed jet kinematics is affected by initial and final state radiation (ISR/FSR). Therefore, the generator-level parameter \({\hat{p}}_T^{min}\), i.e. the cut on the minimum transverse momentum of the outgoing leading parton at generator level, has to be set lower than the leading jet \(p_T\) used in defining the slices. Optimization studies found that for a sample with the leading jet \(p_T\) between \(p_T^{MIN}\) and \(p_T^{MAX}\) GeV a cut of \({\hat{p}}_T^{min} = \left( p_T^{MIN}/395\right) ^3+\left( p_T^{MIN}/164\right) ^2+\left( p_T^{MIN}/1.85\right) \) was most efficient in minimizing computation time.

  • For the multi-leg approach, events were simulated with the MadGraph5_aMC@NLO v2.6.3.2 [16] event generator using matrix-element calculations for up to four partons at leading order. Events were generated in slices of the total sum of partonic \(p_T\) (\({\hat{H}}_T\)) covering the entire energy range. The use of \({\hat{H}}_T\) for defining the slices greatly reduced computation time and was possible due to the multi-leg simulation at generator level. The generated events were fed into PYTHIA8, where the PS algorithm was applied to all partons using the CKKW-L merging scheme [17, 18] with a merging scale of 80 GeV.

  • In the last case of full NLO calculations, the POWHEG-BOX v2 framework [19,20,21] was used to simulate di-jet and three-jet processes [22, 23]. For full coverage of the entire energy range, the sample was created using 350 GeV slices of Powheg-\(k_T^{born}\) (the \({\hat{p}}_T\) of the underlying Born diagram). Events were showered with PYTHIA8 using the default Powheg NLO merging scheme.
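As an illustration of the slicing used for the LO+PS sample, the empirical expression for \({\hat{p}}_T^{min}\) can be evaluated for a given lower slice edge. The sketch below assumes that all quantities are in GeV; the function name and the example slice edge are hypothetical.

```python
# Illustrative sketch: generator-level pT-hat cut for a leading-jet pT slice,
# using the empirical expression quoted for the LO+PS sample.
# Units are assumed to be GeV; the function name is an assumption.


def pthat_min(pt_slice_min):
    """Return the pT-hat cut for a slice starting at pt_slice_min (GeV)."""
    x = float(pt_slice_min)
    return (x / 395.0) ** 3 + (x / 164.0) ** 2 + x / 1.85
```

For a hypothetical slice starting at 1000 GeV this gives a cut of about 594 GeV, i.e. well below the lower edge of the slice, leaving room for the effect of ISR/FSR on the reconstructed leading jet.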

In all cases, the factorization and renormalization scales were set to \(H_T/2\) and the CT14 [24] PDF set was used. The parton shower and underlying event were simulated with the Monash 2013 tune [25].

3 Results

3.1 Event kinematics

The dependence of the average number of jets per event (\(\mathinner {\langle {N^{Jet}}\rangle }\)) on the total transverse energy of the event, as quantified by \(H_T\), is shown in Fig. 1. One notes the drop of \(\mathinner {\langle {N^{Jet}}\rangle }\) at high values of \(H_T\) exhibited by all the generators used in this study. \(\mathinner {\langle {N^{Jet}}\rangle }\) reaches a maximum at \(H_T\) of approximately \(0.4~E_{beam}\) and then drops by about 25% at \(H_T\) of roughly \(1.5~E_{beam}\). This drop is due, in part, to the drop in the relative cross-sections of subprocesses that contain gluons in their final state (namely, \(qg \rightarrow qg\) and \(gg \rightarrow gg\)), as depicted in Fig. 2. Due to their higher “color” charge, gluons tend to radiate more jets than quarks (Fig. 3). Therefore, a smaller fraction of final state gluons entails a lower \(\mathinner {\langle {N^{Jet}}\rangle }\). However, all processes, including \(qq \rightarrow qq\), exhibit the same drop in \(\mathinner {\langle {N^{Jet}}\rangle }\) at high \(H_T\) (see Fig. 3), presumably due to the running of \(\alpha _s\).

All three generators predict the same dependence of the average jet multiplicity on \(H_T\). However, the absolute value varies by 10% between the generators in a non-trivial manner: for example, the predictions of the tri-jet NLO calculation lie below those of the di-jet NLO calculation. The difference is attributed to the fact that the calculations are executed at different perturbative orders and implement different merging schemes; it requires further study. In any case, these differences show the need for a data-driven approach, as presented in the following subsection.

Fig. 1

The average number of jets per event (\(\mathinner {\langle {N^{Jet}}\rangle }\)) as a function of \(H_T\). The result is stable under a change of the minimal transverse momentum and maximal pseudo-rapidity for jet acceptance. The lower average jet multiplicity exhibited by MadGraph and Powheg tri-jet may be attributed to the QCD scale uncertainties (not shown) and requires further investigation.

Fig. 2

The relative cross-sections at LO of the 3 leading subprocesses. Events were generated using Pythia

Fig. 3

The average number of jets per event (\(\mathinner {\langle {N^{Jet}}\rangle }\)) as a function of \(H_T\) for the 3 leading subprocesses. Events were generated using Pythia. Note that the presence of gluons in the final state entails higher \(\mathinner {\langle {N^{Jet}}\rangle }\). Note also that the \(\mathinner {\langle {N^{Jet}}\rangle }\) drop at high values of \(H_T\) appears for all three subprocesses

As described above, NLO and multi-leg calculations are used to generate up to three and four jets, respectively. The simulation of higher jet multiplicities is done in all cases using the PS algorithm. Therefore, in order to compare the results of the three different simulation strategies, the properties of the third and fourth jets (in \(p_T\) order) are examined. Figure 4 compares the fraction of the transverse momentum carried by the third jet in events with 3 jets (\(\frac{p_T^{(3)}}{H_T}\), top) and by the fourth jet in events with 4 jets (\(\frac{p_T^{(4)}}{H_T}\), bottom). One notes that the differences between the three strategies are modest. For 3-jet events, Pythia tends to exhibit a small excess of events with high \(\frac{p_T^{(3)}}{H_T}\), which is compensated by a lower yield of events with a soft 3rd jet. The Powheg and MadGraph \(\frac{p_T^{(3)}}{H_T}\) distributions look similar.
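The momentum-fraction observable can be sketched as follows. The sketch assumes that \(H_T\) is the scalar sum of the transverse momenta of the accepted jets and that "events with n jets" means exactly n accepted jets; both are assumptions made for illustration, as is the function name.

```python
# Illustrative sketch of the pT fraction carried by the n-th jet.
# Assumptions: H_T is the scalar sum of the accepted jets' pT, and an
# "n-jet event" contains exactly n accepted jets.


def nth_jet_pt_fraction(jet_pts, n):
    """Return pT^(n) / H_T for an event with exactly n jets, else None."""
    if n < 1 or len(jet_pts) != n:
        return None
    ordered = sorted(jet_pts, reverse=True)  # leading jet first
    h_t = sum(ordered)
    return ordered[n - 1] / h_t
```

For example, an event with jet transverse momenta of 600, 300 and 100 GeV has \(H_T = 1\) TeV under this assumption and a third-jet fraction of 0.1.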

Similarly, Fig. 5 shows the distribution of the angular separation between the transverse thrust axis and the 3rd jet (in 3-jet events, top) and the 4th jet (in 4-jet events, bottom). The transverse thrust axis is defined by:

(1)

where \(\mathbf {n}\) is a unit vector and \(\mathbf {\mathrm {p}}_{\text {{T}}_j}\) is the transverse momentum vector of the jth jet. Using that definition, the azimuthal angle of the thrust axis (\(\phi _\mathrm {T_\bot }\)) with respect to the x axis of the transverse plane can be evaluated analytically (the index j is suppressed to avoid cluttering the notation):

$$\begin{aligned} \phi _\mathrm {T_\bot } = \frac{1}{2}\arctan \left( \frac{-2\Sigma p_x p_y}{\Sigma \left( p_y^2 - p_x^2\right) }\right) + \kappa \frac{\pi }{2} \end{aligned}$$
(2)

where:

$$\begin{aligned} \kappa = {\left\{ \begin{array}{ll} 1, &{} \text {if}\ \cos (2\phi _\mathrm {T_\bot })\,\Sigma \left( p_y^2 - p_x^2\right) <2\sin (2\phi _\mathrm {T_\bot })\,\Sigma p_x p_y \\ 0, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(3)
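A numerical sketch of the axis determination is given below. It rests on an assumption: the axis is taken to maximize the quadratic form \(\Sigma (p_x\cos \phi + p_y\sin \phi )^2\), which is the objective whose stationarity condition reproduces the arctangent in Eq. 2. The two stationary points differ by \(\pi /2\); instead of evaluating \(\kappa \) explicitly, the sketch compares the objective at both points and keeps the larger one. Function names and the (px, py) input layout are illustrative.

```python
import math

# Illustrative sketch of the analytic axis determination.
# Assumption: the axis maximizes sum_j (p_Tj . n)^2, the objective whose
# stationary points are given by the arctangent expression of Eq. 2.


def quadratic_objective(jets, phi):
    """Sum over jets of the squared projection of pT on the axis at phi."""
    c, s = math.cos(phi), math.sin(phi)
    return sum((px * c + py * s) ** 2 for px, py in jets)


def transverse_axis_phi(jets):
    """Return the azimuth (modulo pi) of the axis maximizing the objective."""
    s_xy = sum(px * py for px, py in jets)
    s_diff = sum(py * py - px * px for px, py in jets)
    # atan2 avoids a division by zero; half of it agrees with the arctan
    # expression up to a shift of pi/2 in phi.
    phi0 = 0.5 * math.atan2(-2.0 * s_xy, s_diff)
    # The second stationary point lies pi/2 away. One of the two is the
    # maximum and the other the minimum; keep the maximum.
    phi1 = phi0 + 0.5 * math.pi
    if quadratic_objective(jets, phi0) >= quadratic_objective(jets, phi1):
        best = phi0
    else:
        best = phi1
    return best % math.pi
```

Since an axis has no preferred sign, the returned angle is only meaningful modulo \(\pi \). For two back-to-back jets along the x axis the function returns an angle equivalent to zero.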

In 3-jet events, the angular separation between the 3rd jet and the thrust axis tends to be larger in MadGraph than in Pythia, while the same angle in Powheg lies in between. No significant difference between the three simulation strategies is seen for the same distribution in 4-jet events.

Fig. 4

The fraction of the transverse momentum carried by the third jet (\(\frac{p_T^{(3)}}{H_T}\)) in 3-jet events (top), and by the fourth jet (\(\frac{p_T^{(4)}}{H_T}\)) in 4-jet events (bottom). All entries are for events satisfying \(H_T > 1\) TeV

Fig. 5

Angular distribution of the third jet relative to the transverse thrust axis (see Eq. 2) in 3-jet events (top) and of the fourth jet in 4-jet events (bottom). All entries are for events satisfying \(H_T > 1\) TeV

3.2 The two hemispheres method (THm)

In spite of the reasonable agreement between the outcomes of the various QCD event generators, their predictions of the cross-section and of various shape variables of multi-jet events may be at odds with measurements. Hence, a data-driven procedure for robust modelling of energetic multi-jet events is greatly needed. A new procedure focused on predicting the jet multiplicities is described hereafter. The starting point for this procedure is similar to that taken by ATLAS and CMS in estimating the QCD background for multi-jet events [26, 27], namely, that QCD events can approximately be portrayed as beginning with a \(2 \rightarrow 2\) process that gives rise to two back-to-back (in the x-y plane) outgoing partons, followed by a parton shower. The comparison between LO, NLO and partial NNLO in the previous section shows that the \(2 \rightarrow 2\) picture is not modified significantly by \(2 \rightarrow 3\) and \(2 \rightarrow 4\) processes. In this picture, multi-jet QCD events are produced in a sequence of steps:

  • The \(2 \rightarrow 2\) matrix element is used and two partons (taken here to be quarks, gluons or a quark and a gluon) are generated with the proper \(p_T~\) and \(\eta \) distributions;

  • The parton shower algorithm is applied to each parton and secondary partons may be radiated (or split) off the primary ones. The parton shower is then applied iteratively to the next generation of partons until no more partons are radiated;

  • Partons are hadronized and unstable hadrons are allowed to decay.

In the center-of-mass system of the process, one can use the plane perpendicular to the line of flight of the initial outgoing partons to define two hemispheres. In pp collisions the \({{\hat{z}}}\) direction (beam axis) is almost information-free; therefore, projecting the event onto the \((x,y)\) plane preserves all vital information. The line perpendicular to the transverse thrust axis (defined in Eq. 2) may be used as the dividing line.
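The division into hemispheres can be sketched as below, given the azimuth of the transverse thrust axis. Which side is labelled A is an arbitrary choice made here for illustration (positive projection on the axis), as are the function name and the (px, py) input layout; a jet with exactly zero projection is assigned to hemisphere B.

```python
import math

# Illustrative sketch: count jets on each side of the line perpendicular to
# the transverse thrust axis. Labelling the positive-projection side "A" is
# an arbitrary convention adopted for this example.


def hemisphere_multiplicities(jets, axis_phi):
    """Return (n_a, n_b) for jets given as (px, py) and an axis azimuth."""
    nx, ny = math.cos(axis_phi), math.sin(axis_phi)
    n_a = sum(1 for px, py in jets if px * nx + py * ny > 0.0)
    return n_a, len(jets) - n_a
```

For instance, with the axis along x, one jet at positive \(p_x\) recoiling against two jets at negative \(p_x\) gives \((N^A_{Jets}, N^B_{Jets}) = (1, 2)\).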

Because of momentum conservation and the simplistic assumption that the PS is carried out for each of the partons independently, it is claimed here that, to a first approximation, the jet multiplicities in the two hemispheres (\(N^A_{Jets}\) and \(N^B_{Jets}\)) are independent of each other. Second order effects (e.g. qg production, ISR etc.) may violate this hypothesis and it will therefore be validated below.

The conjecture that \(N^A_{Jets}\) and \(N^B_{Jets}\) are uncorrelated would not hold for a variety of BSM models that give rise to energetic multi-jet events, for example, those mentioned in the introduction. The following procedure is suggested to differentiate between events arising from QCD background and those arising from a hypothetical signal that violates the independence conjecture:

  • Select high \(N^{Jet}\) events having \(N^A_{Jets}\)=1 (i.e. events with one jet in the first hemisphere). The hypothetical high \(N^{Jet}\) signal is unlikely to give rise to such events and, therefore, this sample should be signal-free or at least signal-depleted.

  • Extract the distribution of the number of jets in the second hemisphere (\(N^B_{Jets}\)) from the signal-free (i.e. \(N^A_{Jets}\)=1) sample. This \(N^B_{Jets}\)(\(N^A_{Jets}\)=1) distribution should therefore represent the \(N^B_{Jets}\) distribution of pure QCD for all values of \(N^A_{Jets}\), i.e. \(N^A_{Jets}\)=2, 3, 4, ..., thus serving as a QCD background estimate for those samples which might host signal events.

  • Finally, compare the \(N^B_{Jets}\) distribution obtained from the signal-free sample with the distributions obtained from the expected signal region (\(N^A_{Jets}\) \(>1\)). An excess of events with high \(N^B_{Jets}\) may be considered a possible indication of the presence of a signal.

As discussed, the above procedure relies on the assumption that for QCD the \(N^B_{Jets}\) distribution is independent of \(N^A_{Jets}\). The independence assumption can be tested and validated using QCD simulations by directly comparing the \(N^B_{Jets}\)(\(N^A_{Jets}\)=1) distribution with those of \(N^B_{Jets}\)(\(N^A_{Jets}\)=i), where i=2,3,4... Such a comparison is shown in Fig. 6 (top) using LO events generated by Pythia in the \(H_{T}\) region of 2 \(< H_T < \) 2.5 TeV. The black markers indicate the distribution of \(N^B_{Jets}\) when \(N^A_{Jets}\) is constrained to 1. The colored markers indicate the distribution of \(N^B_{Jets}\) when \(N^A_{Jets}\) is constrained to 2 (red), 3 (green), and 4 (blue). In order to visually facilitate the comparison, all distributions are normalized such that the content of the second bin (\(N^B_{Jets}\)=2) equals one. As seen, the \(N^B_{Jets}\) distributions are in good agreement with each other, differing by less than 50% at the highest bins, which, given the statistical uncertainty, is less than 1\(\sigma \).
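The comparison described above can be sketched in a few lines. The sketch assumes unweighted events supplied as (\(N^A_{Jets}\), \(N^B_{Jets}\)) pairs and normalizes each distribution to its \(N^B_{Jets}\)=2 bin; the function names and the input layout are illustrative assumptions.

```python
from collections import Counter

# Illustrative sketch of the hemisphere-multiplicity comparison.
# Events are assumed to be unweighted (n_a, n_b) pairs.


def nb_distribution(events, n_a, norm_bin=2):
    """N^B distribution for events with N^A == n_a, scaled to the norm bin."""
    counts = Counter(nb for na, nb in events if na == n_a)
    reference = counts.get(norm_bin, 0)
    if reference == 0:
        raise ValueError("normalization bin is empty")
    return {nb: counts[nb] / reference for nb in sorted(counts)}


def ratio_to_template(events, n_a):
    """Bin-by-bin ratio of the N^A == n_a distribution to the N^A == 1 one."""
    template = nb_distribution(events, 1)
    target = nb_distribution(events, n_a)
    return {nb: target[nb] / template[nb] for nb in target if nb in template}
```

If the independence hypothesis holds, the ratios returned by `ratio_to_template` are compatible with one in every bin within the statistical uncertainties, which are not propagated in this sketch.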

Fig. 6

Hemisphere multiplicity plots showing the distribution of \(N^B_{Jets}\) when \(N^A_{Jets}\) is constrained to 1 (black), 2 (red), 3 (green) and 4 (blue). Events were generated by Pythia (top), Powheg (center) and MadGraph (bottom). Events are selected such that 2\(<H_T<\)2.5 TeV and all jets have \(p_T>\)50 GeV and \(|\eta |<2.8\). The results are practically independent of these three selection criteria. Events are divided into two hemispheres using the thrust axis. For visual comparison, all distributions are normalized such that the content of the \(N^B_{Jets}\)=2 bin equals one

The simplistic picture of QCD events as basically \(2 \rightarrow 2\) back-to-back events holds only at LO. The NLO, and obviously NNLO or higher orders, give rise to more complicated pictures with three, four and more outgoing partons. Powheg gives rise to \(2 \rightarrow 3\) events and MadGraph to \(2 \rightarrow 4\). Figure 6 (center) shows the validity of the independence hypothesis for Powheg and Fig. 6 (bottom) shows the same information for MadGraph. As in Pythia, the distributions are in general agreement.

A figure of merit for the overall systematic uncertainty of the method, for each \(H_T\) range, is evaluated by the average weighted deviation, \({\bar{v}}\), of the THm prediction compared to the “data” in simulated QCD samples over all \(N^{Jet}\) bins in the signal region:

$$\begin{aligned} {\bar{v}}(H_T)=\frac{\sum _{j}v_j\frac{1}{\sigma _j^2}}{\sum _{j}\frac{1}{\sigma _j^2}} \end{aligned}$$
(4)

where the index j runs through the twelve combinations of \(N^A_{Jets}\)=2 through 4 and \(N^B_{Jets}\)=3 through 6, \(v_j\) is the deviation (i.e. the ratio of the simulated data to the THm prediction given by \(N^A_{Jets}\)=1) at each datapoint and \(\sigma _j\) is its statistical uncertainty. Figure 7 summarizes \({\bar{v}}\) for each generator in \(H_T\) bins of 500 GeV each. Blue error bars mark the weighted standard deviation of the twelve datapoints in each \(H_T\) bin, where the weight of a datapoint is defined by its statistical uncertainty. All deviations, \({\bar{v}}\), (including errors) are below 25%. A precise quantification of the systematic uncertainty to be associated with the THm would be analysis dependent. Additional study may reduce systematic uncertainties.
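A minimal numerical sketch of Eq. 4 follows. The weighted mean uses the weights \(1/\sigma _j^2\) as in Eq. 4; the expression used for the weighted standard deviation (the same weights applied to the squared residuals) is an assumption, since its exact form is not spelled out in the text. The function name is illustrative.

```python
import math

# Illustrative sketch of the weighted average deviation of Eq. 4.
# The weighted standard deviation below reuses the 1/sigma^2 weights on the
# squared residuals; this particular form is an assumption.


def weighted_deviation(values, sigmas):
    """Return (weighted mean, weighted standard deviation) of the values."""
    if len(values) != len(sigmas) or not values:
        raise ValueError("values and sigmas must be non-empty and equal in length")
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, math.sqrt(variance)
```

As a toy example, two datapoints with \(v = 1.0 \pm 0.1\) and \(v = 1.2 \pm 0.2\) give \({\bar{v}} = 1.04\) with a weighted standard deviation of 0.08 under this definition.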

Fig. 7

Weighted average deviation for each generator for \(H_T\) bins of 500 GeV using jet multiplicities of \(N^A_{Jets}\)=2 through 4 and \(N^B_{Jets}\)=3 through 6. Blue error bars mark the weighted standard deviation of the 12 datapoints

Other methods currently used in multi-jet analyses estimate a 1–130% [9] or 40–110% [10] uncertainty using a fit extrapolation technique, or a 25% uncertainty using a jet mass template technique [11]. Thus the magnitude of the systematic uncertainties of the THm is comparable to that of conventional methods, but since the sources of the uncertainties are very different, the procedures complement one another.

4 Conclusion

The modeling of jet energy and angular distributions of QCD processes in multi-jet events at 13 TeV was compared for three state-of-the-art MC generation strategies. In particular, the kinematics of the 3rd jet, mostly affected by NLO calculations, and of the 4th jet, mostly affected by NNLO calculations, were studied. The differences between the three models under study were found to be small. The average number of jets per event (\(\mathinner {\langle {N^{Jet}}\rangle }\)) exhibits a drop from approximately 4.5 jets for events with \(H_T\) of about 0.4 \(E_{beam}\) to roughly 3.7 jets for events with \(H_T\) close to \(1.5~E_{beam}\). Part of this drop is explained by the increase of the relative cross section of \(qq \rightarrow qq\) processes from \(\approx \) 25% of the total cross section at \(H_T\) = 2.5 TeV to 80% at 7 TeV, at the expense of a drop of the relative cross section of \(qg \rightarrow qg\) from 50% to 20% and the vanishing of the \(gg \rightarrow gg\) process.

The comparison between the different generator predictions of the 3rd jet \(\text {p}_\text {T}\) in 3-jet events reveals small differences. Pythia’s 3rd jet tends to be a bit more energetic than those of Powheg and MadGraph. The angle that separates this jet from the transverse thrust direction tends to be slightly smaller in Pythia and larger in MadGraph. No significant difference is noticed when studying the properties of the fourth jet.

A data-driven procedure for estimating the QCD background for multi-jet final states, the Two Hemispheres Method (THm), has been proposed. The basic conjecture of this procedure, namely, the independence of the jet multiplicity in one hemisphere from that in the other, has been tested with the three generators and found to hold within 25–50%. A figure of merit estimating the overall systematic uncertainty by including all jet multiplicities for each generator gives a comparable number, approximately 25% (Fig. 7). We consider these results encouraging. The attainable sensitivity of a THm analysis is comparable to that of the conventional methods. Since the sources of the uncertainties in this new approach are very different from those of the current methods, the procedures complement one another.