On the Modelling of Energetic Multi-jet QCD Events

Physics beyond the Standard Model (BSM) may be unveiled by studying events produced at the LHC with energies above the TeV scale. Such events are dominated by QCD processes, whose calculation always relies on some sort of approximation. It is therefore important to examine the modelling of such events, both to achieve a better understanding of QCD and to improve background estimation methods for possible BSM signals. In this note, jet spatial distributions in highly energetic multi-jet processes are compared using several state-of-the-art MC event generators. Slight differences are found, mainly in the spatial distribution of the jet with the third-highest transverse momentum. In addition, a data-driven technique for estimating processes whose final state contains a large number of jets is proposed. This procedure can predict jet multiplicities to a precision of ~25%.


Introduction
Several beyond the Standard Model (BSM) models, e.g. Micro-Black-Holes (mBH) [1,2,3,4] or R-Parity Violation (RPV) Supersymmetry [5,6], predict the possible production at the LHC of events with a large number of outgoing high-energy partons. These events will give rise to final states consisting of a high multiplicity of energetic jets, namely, multi-jet events. Identifying this type of signal through the observation of an excess of energetic High $N_{Jet}$ events is far from straightforward, owing to the large Standard Model (SM) background originating from Quantum Chromodynamics (QCD) processes. The presently available event generators for High $N_{Jet}$ events perform Leading Order (LO), Next-to-Leading Order (NLO) or even partial NNLO calculations, followed by the radiation of additional partons through a Parton Shower (PS) algorithm. The accuracy of these calculations is further limited because some unavoidable approximations must be imposed.
Significant effort has been invested in the study of energetic multi-jet events at the LHC by the ATLAS and CMS collaborations [7,8,9,10]. These studies face a major difficulty, namely the need to estimate the kinematics of multi-jet events without the use of simulation. Yet some indirect dependence on simulation always remains. As a first stage in the development of a novel data-driven technique for background estimation, the differences between the predictions of several event generators for energetic High $N_{Jet}$ events are studied here. The description of the technique, as well as its performance on the aforementioned simulations, is the subject of the later sections.

Simulation of Multijet Processes
Modelling multi-jet processes in QCD is a challenging task, mainly due to the large number and high complexity of the relevant Feynman diagrams. Generally speaking, there are three main approaches to handling this complexity. The simplest approach is to couple Leading Order (LO) calculations, which give rise to two outgoing partons, with a Parton Shower (PS) algorithm that can produce additional jets. While such an approach provides precise physical modelling of di-jet events with soft-collinear parton emissions, it is less accurate for topologies with more than two well-separated partons. In spite of its simplicity, the LO+PS technique provides a surprisingly good description of the Tevatron and LHC data [11].
A better simulation can be achieved by calculating additional real parton emissions while neglecting the contribution of virtual corrections. Such an approach can give rise to events with up to four or five jets; higher jet multiplicities are simulated by applying a PS algorithm. While this improves the description of final states with more than two well-separated partons, the interfacing with a PS algorithm raises the issue of double counting of events.
The most rigorous approach is a formal order-by-order perturbative calculation, where each extra order includes diagrams with one more outgoing particle and one more loop in the intermediate state. Next-to-Leading Order (NLO) calculations can compute the properties of up to three jets with one-loop corrections. Higher-order calculations are very resource-consuming and are thus limited. As above, events with higher jet multiplicities are simulated by applying a PS algorithm with a proper matching scheme.
For the purpose of comparing the various event-generator strategies, each approach was used to simulate QCD events at $\sqrt{s} = 13$ TeV as outlined below:
• For the LO+PS approach, events were generated and showered with PYTHIA8.235 [12]. To efficiently cover the large phase space (from the GeV to the TeV scale), the sample was generated in slices of the leading jet's $p_T$, with a constant number of events simulated in each slice. The leading jet $p_T$ is defined as the $p_T$ of the leading reconstructed jet after the PS and is therefore affected by initial and final state radiation (ISR/FSR). Consequently, the generator-level parameter $\hat{p}_T^{min}$, i.e. the cut on the minimum transverse momentum of the outgoing leading parton at generator level (before the PS), has to be set lower than the leading jet $p_T$ used in defining the slices. Optimization studies found that for a sample with the leading jet $p_T$ between
• For the multi-leg approach, events were simulated with the MadGraph5_aMC@NLO v2.6.3.2 [13] event generator using matrix-element calculations for up to four partons at leading order. Events were generated in slices of the total sum of partonic $p_T$ ($\hat{H}_T$) covering the entire energy range. The use of $\hat{H}_T$ for defining the slices greatly reduced computation time and was possible due to the multi-leg simulation at generator level. The generated events were fed into PYTHIA8, where the PS algorithm was applied to all partons using the CKKW-L merging scheme [14,15] with a merging scale of 80 GeV.
• In the last case of full NLO calculations, the POWHEG-BOX v2 framework [16,17,18] was used to simulate di-jet and three-jet processes [19,20]. For full coverage of the entire energy range, the sample was created using 350 GeV slices of the Powheg $k_T^{born}$ (the $p_T$ of the underlying Born diagram). Events were showered with PYTHIA8 using the default Powheg NLO merging scheme.
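Because every slice holds the same number of generated events while the cross section falls steeply with $p_T$, the slices have to be reweighted before they can be combined into one spectrum. A minimal sketch of that bookkeeping follows, assuming each slice comes with a generator-reported cross section. The `Slice` class, the function names, the slice edges and the cross-section values are all hypothetical and are not taken from this study.

```python
from dataclasses import dataclass


@dataclass
class Slice:
    pt_min: float      # lower edge of the slicing variable [GeV]
    pt_max: float      # upper edge of the slicing variable [GeV]
    xsec_pb: float     # slice cross section [pb] (made-up value)
    n_generated: int   # number of events generated in the slice


def event_weight(s: Slice, lumi_ipb: float = 1.0) -> float:
    """Per-event weight so that the slice integrates to xsec * luminosity."""
    return s.xsec_pb * lumi_ipb / s.n_generated


def find_slice(slices, value):
    """Return the slice whose [pt_min, pt_max) range contains value, or None."""
    for s in slices:
        if s.pt_min <= value < s.pt_max:
            return s
    return None


# Illustrative slices with a constant number of events per slice.
slices = [
    Slice(200.0, 400.0, 1.0e5, 100_000),
    Slice(400.0, 800.0, 4.0e3, 100_000),
    Slice(800.0, 1600.0, 1.0e2, 100_000),
]
weights = [event_weight(s) for s in slices]
```

With equal event counts per slice, the weights simply track the slice cross sections, so the rare high-$p_T$ events are well populated but carry small weights.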
In all cases, the factorization and renormalization scales were set to $H_T/2$, and jets were reconstructed using the anti-$k_t$ algorithm [21] implemented in the FastJet 3.2.1 package [22] with a radius parameter of $R = 0.4$. All jets were required to satisfy $p_T > 50$ GeV and $|\eta| < 2.8$. The CT14 [23] PDF set was used in all cases. PYTHIA8 was used with the Monash 2013 tune [24].
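The jet selection above translates directly into a per-event computation of the jet multiplicity and $H_T$. The sketch below assumes jets are already clustered and given as (pT, eta, phi) tuples, and it assumes $H_T$ is the scalar sum of the selected jets' $p_T$. The function names and the toy event are illustrative only.

```python
PT_MIN_GEV = 50.0   # jet pT threshold quoted in the text
ETA_MAX = 2.8       # jet |eta| acceptance quoted in the text


def select_jets(jets):
    """Keep jets passing the pT and |eta| cuts, sorted by decreasing pT.

    jets: list of (pt [GeV], eta, phi [rad]) tuples.
    """
    selected = [j for j in jets if j[0] > PT_MIN_GEV and abs(j[1]) < ETA_MAX]
    return sorted(selected, key=lambda j: j[0], reverse=True)


def scalar_ht(jets):
    """Scalar sum of jet pT (assumed definition of H_T)."""
    return sum(j[0] for j in jets)


# Toy event: the third jet fails the |eta| cut, the last one fails the pT cut.
event = [
    (900.0, 0.3, 0.1),
    (850.0, -1.1, 3.0),
    (120.0, 2.9, 1.5),
    (60.0, 0.0, -2.0),
    (40.0, 0.5, 2.2),
]
good_jets = select_jets(event)
n_jet = len(good_jets)
event_ht = scalar_ht(good_jets)
```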

Event Kinematics
The dependence of the average number of jets per event ($\langle N_{Jet} \rangle$) on the total transverse energy of the event, as quantified by $H_T$, is shown in Figure 1. Part of the drop in $\langle N_{Jet} \rangle$ at high $H_T$ is explained by the changing relative cross sections of the contributing partonic processes (Figure 2). Due to their higher "color" charge, gluons tend to radiate more jets than quarks (Figure 3). Therefore, a smaller fraction of final-state gluons entails a lower $\langle N_{Jet} \rangle$. However, all processes, including $qq \to qq$, exhibit the same drop in $\langle N_{Jet} \rangle$ at high $H_T$ (see Figure 3), presumably due to the running of $\alpha_s$. All three generators predict the same dependence of $\langle N_{Jet} \rangle$ on $H_T$; however, the value of $\langle N_{Jet} \rangle$ predicted by Pythia and Powheg dijet is consistently higher by 10% than that predicted by Madgraph and Powheg trijet. The difference is attributed to the systematic uncertainties in these calculations, e.g. due to the choice of the QCD scale, and requires further study.
As described above, NLO and multi-leg calculations are used to generate up to three and four jets, respectively. The simulation of higher jet multiplicities is done in all cases using the PS algorithm. Therefore, in order to compare the results of the three simulation strategies, the properties of the third and fourth jets (in $p_T$ order) are examined. In Figures 4a and 4b the $p_T$ of the 3rd jet (in 3-jet events) and of the 4th jet (in 4-jet events) are compared between the generators. Similarly, in Figure 5a the distribution of the angular separation between the 3rd jet and the thrust axis (in 3-jet events) is shown. Figure 5b depicts the same distribution for the 4th jet (in 4-jet events). The transverse thrust axis is defined by:

$$ T_\perp = \max_{\hat{n}} \frac{\sum_j |\vec{p}_T^{\,j} \cdot \hat{n}|}{\sum_j |\vec{p}_T^{\,j}|} $$

where $\hat{n}$ is a unit vector and $\vec{p}_T^{\,j}$ is the transverse momentum vector of the $j$th jet. Using that definition, the azimuthal angle of the thrust axis ($\phi_{T_\perp}$) with respect to the beam direction ($\hat{z}$) can be evaluated analytically. In 3-jet events the angular separation between the 3rd jet and the thrust axis in Madgraph tends to be larger than that in Pythia, while the same angle in Powheg lies in between.
No significant difference between the three simulation strategies is seen for the same distribution in 4-jet events.
Based on this study, one concludes that the differences between the three simulation approaches are small when focusing on the third jet and negligible when focusing on the fourth. As mentioned above, for higher jet multiplicities the PS algorithm is used by all three simulation strategies. Therefore, one may deduce that High $N_{Jet}$ events are described in practically the same way by all available QCD event generators.
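One simple way to locate the transverse thrust axis numerically is to scan the azimuth of the unit vector and keep the direction that maximizes the summed absolute projections of the jet transverse momenta, normalized to the scalar sum of jet $p_T$ (the usual transverse-thrust form, assumed here). This brute-force scan is an illustrative stand-in and not the analytic evaluation mentioned in the text. The function names, the step count and the purely azimuthal measure of the jet-to-axis angle are assumptions.

```python
import math


def transverse_thrust(jets, n_steps=3600):
    """Scan the axis azimuth in [0, pi) and return (thrust, phi_axis).

    jets: list of (pt [GeV], phi [rad]) tuples. The axis is only defined
    modulo pi, because n and -n give the same summed projection.
    """
    total_pt = sum(pt for pt, _ in jets)
    best_proj, best_phi = -1.0, 0.0
    for i in range(n_steps):
        phi_n = math.pi * i / n_steps
        # |p_T . n| = pt * |cos(phi_jet - phi_n)| in the transverse plane
        proj = sum(pt * abs(math.cos(phi - phi_n)) for pt, phi in jets)
        if proj > best_proj:
            best_proj, best_phi = proj, phi_n
    return best_proj / total_pt, best_phi


def angle_to_axis(phi_jet, phi_axis):
    """Azimuthal separation between a jet and the axis, folded into [0, pi/2]."""
    d = abs(phi_jet - phi_axis) % math.pi
    return min(d, math.pi - d)


# Toy 3-jet event (values are illustrative only).
jets = [(1000.0, 0.00), (700.0, 3.00), (300.0, -2.60)]
thrust, phi_axis = transverse_thrust(jets)
third_jet_angle = angle_to_axis(jets[2][1], phi_axis)
```

The resolution of the scan is $\pi$ / `n_steps`; a finer grid or a local refinement around the best step could be used if a more precise axis is needed.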

The Two Hemispheres Method (THm)
In spite of the reasonable agreement between the outcomes of the various QCD event generators, their predictions of the cross section and of various shape variables of multi-jet events may be at odds with the measurements. Hence, a data-driven procedure for background estimation is needed. A new procedure aimed at achieving this goal is described hereafter. The starting point for this procedure is similar to that taken by ATLAS and CMS in estimating the QCD background for multi-jet events [25,26], namely that QCD events can approximately be portrayed as beginning with a $2 \to 2$ process that gives rise to two back-to-back (in the x-y plane) outgoing partons, followed by a parton shower. The comparison between LO, NLO and partial NNLO in the previous section shows that the $2 \to 2$ picture is not modified significantly by $2 \to 3$ and $2 \to 4$ processes. In this picture, multi-jet QCD events are produced in a sequence of steps:
• The $2 \to 2$ matrix element is used and two partons (taken here to be quarks, gluons or a quark and a gluon) are generated with the proper $p_T$ and $\eta$ distributions;
• The parton shower algorithm is applied to each parton, and secondary partons may be radiated (or split) off the primary ones. The parton shower is then applied iteratively to the next generation of partons until no more partons are radiated;
• Partons are hadronized and unstable hadrons are allowed to decay.
In the process's center-of-mass system, one can use the plane perpendicular to the line of flight of the initial outgoing partons to define two hemispheres. In pp collisions the $\hat{z}$ direction (beam axis) is almost information-free; therefore, projecting the event onto the (x, y) plane preserves all vital information. The line perpendicular to the transverse thrust axis (defined in eq. 3.2) may be used as the dividing line. Because of momentum conservation and the simplistic assumption that the PS is carried out for each of the partons independently, it is claimed here that, to first approximation, the jet multiplicities in the two hemispheres ($N^A_{Jets}$ and $N^B_{Jets}$) are independent of each other. Second-order effects (e.g. qg production, ISR etc.) may violate this hypothesis, and it is therefore validated in the next subsection.
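The hemisphere assignment can be sketched as follows, assuming the azimuth of the transverse thrust axis is already known and treating jets as (pT, phi) pairs. Labelling the side with a positive projection onto the axis direction as hemisphere A is a choice made here purely for illustration, since the text does not fix that convention. The function name is hypothetical.

```python
import math


def count_hemispheres(jets, phi_axis):
    """Return (n_a, n_b): jet counts on either side of the dividing line.

    jets: list of (pt [GeV], phi [rad]) tuples.
    phi_axis: azimuth of the transverse thrust axis [rad].
    The dividing line is perpendicular to the axis, so the sign of
    cos(phi_jet - phi_axis) decides the hemisphere.
    """
    n_a = sum(1 for _, phi in jets if math.cos(phi - phi_axis) > 0.0)
    return n_a, len(jets) - n_a


# Toy event: one hard jet recoiling against three softer jets.
jets = [(900.0, 0.1), (400.0, 3.0), (300.0, -2.9), (100.0, 2.5)]
n_a, n_b = count_hemispheres(jets, phi_axis=0.0)
```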
The conjecture that $N^A_{Jets}$ and $N^B_{Jets}$ are uncorrelated would not hold for a variety of BSM models that give rise to High $N_{Jet}$ final states, such as those mentioned in the introduction. The following procedure is suggested to differentiate between High $N_{Jet}$ events arising from the QCD background and those arising from a hypothetical signal that violates the independence conjecture:
• Select High $N_{Jet}$ events having $N^A_{Jets} = 1$ (i.e. events with one jet in the first hemisphere). The hypothetical High $N_{Jet}$ signal is unlikely to give rise to such events and, therefore, this sample should be signal-free or at least signal-depleted.
• Finally, compare the $N^B_{Jets}$ distribution obtained from the signal-free sample with those obtained from the expected signal region ($N^A_{Jets} > 1$). An excess of events with high $N^B_{Jets}$ may be considered a possible indication of the presence of a signal.
As discussed, the above procedure relies on the assumption that for QCD the $N^B_{Jets}$ distribution is independent of $N^A_{Jets}$. This assumption can be tested using QCD simulations by directly comparing the $N^B_{Jets}$ distribution for $N^A_{Jets} = 1$ with those for $N^A_{Jets} = i$, where $i = 2, 3, 4, \ldots$. Such a comparison is shown in Figure 6a using LO events generated by Pythia in the region $2 < H_T < 2.5$ TeV. The black markers indicate the distribution of $N^B_{Jets}$ with $N^A_{Jets}$ constrained to 1. The colored markers indicate the distribution of $N^B_{Jets}$ with $N^A_{Jets}$ constrained to 2 (red), 3 (green) and 4 (blue). To facilitate the visual comparison, all distributions are normalized such that the content of the second bin ($N^B_{Jets} = 2$) equals one. As seen, the $N^B_{Jets}$ distributions are in good agreement with each other, differing by less than 50% in the highest bins, which, given the statistical uncertainty, is less than 1σ.
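The comparison described above can be sketched as a small counting exercise: build the $N^B_{Jets}$ distribution separately for each value of $N^A_{Jets}$, normalize each one to its $N^B_{Jets} = 2$ bin, and take the ratio to the $N^A_{Jets} = 1$ template. The function names and the toy event counts below are hypothetical, and statistical uncertainties are left out for brevity.

```python
from collections import Counter, defaultdict


def nb_shapes(events):
    """Return {n_a: {n_b: value}} with each shape normalized to its N_B = 2 bin.

    events: iterable of (n_a, n_b) pairs, one per event.
    Values of n_a with an empty N_B = 2 bin are skipped (cannot be normalized).
    """
    counts = defaultdict(Counter)
    for n_a, n_b in events:
        counts[n_a][n_b] += 1
    shapes = {}
    for n_a, c in counts.items():
        ref = c.get(2, 0)
        if ref == 0:
            continue
        shapes[n_a] = {n_b: n / ref for n_b, n in sorted(c.items())}
    return shapes


def ratio_to_template(shapes, n_a, template_n_a=1):
    """Bin-by-bin ratio of the N_A = n_a shape to the N_A = 1 template."""
    template = shapes[template_n_a]
    return {n_b: v / template[n_b]
            for n_b, v in shapes[n_a].items() if n_b in template}


# Toy sample: (n_a, n_b) pairs with made-up multiplicities.
events = ([(1, 2)] * 100 + [(1, 3)] * 40 + [(1, 4)] * 10
          + [(2, 2)] * 50 + [(2, 3)] * 20 + [(2, 4)] * 6)
shapes = nb_shapes(events)
offsets = ratio_to_template(shapes, n_a=2)
```

If the independence conjecture held exactly, every ratio would be compatible with one; in the toy sample only the $N^B_{Jets} = 4$ bin deviates.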
The simplistic picture of QCD events as basically $2 \to 2$ back-to-back events holds only at LO. NLO, and obviously NNLO or higher orders, give rise to more complicated pictures with three, four or more outgoing partons. Powheg gives rise to $2 \to 3$ events and Madgraph to $2 \to 4$. Figure 6b shows the validity of the independence hypothesis for Powheg, and Figure 6c shows the same information for Madgraph. As in Pythia, the distributions are in general agreement.
A figure of merit for the overall offset of the signal region (i.e. the $N^B_{Jets}$ distributions for higher $N^A_{Jets}$) from the proposed data-driven THm prediction (i.e. the $N^B_{Jets}$ distribution for $N^A_{Jets} = 1$) may be obtained by taking the weighted average of the offsets $v_j$ of all bins in the signal region, where the index $j$ runs over the twelve combinations of $N^A_{Jets} = 2$ through 4 and $N^B_{Jets} = 3$ through 6, and $v_j$ and $\sigma_j$ are the offset (i.e. the ratio to the THm prediction given by $N^A_{Jets} = 1$) and the statistical uncertainty of each data point, respectively. Figure 7 summarizes the offset from the THm prediction for each generator in $H_T$ bins of 500 GeV each. The uncertainties mark the weighted standard deviation of the twelve data points in each $H_T$ bin, where the weight of each data point is defined by its statistical uncertainty. All offsets are below 20%.
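A weighted average and weighted standard deviation of this kind can be sketched as below. The inverse-variance weights $w_j = 1/\sigma_j^2$ are an assumption made for illustration, since the text only states that the weights are defined by the statistical uncertainties. The function name and the numerical inputs are hypothetical.

```python
import math


def weighted_offset(values, sigmas):
    """Weighted mean and weighted standard deviation of the offsets.

    values: offsets v_j (ratios to the N_A = 1 prediction).
    sigmas: statistical uncertainties sigma_j of the offsets.
    Weights are taken as 1 / sigma_j**2 (assumed, not from the text).
    """
    weights = [1.0 / s ** 2 for s in sigmas]
    wsum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / wsum
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / wsum
    return mean, math.sqrt(var)


# Two toy data points with equal uncertainties.
mean_equal, std_equal = weighted_offset([1.0, 1.2], [0.1, 0.1])
# Two toy data points where the first is measured twice as precisely.
mean_unequal, _ = weighted_offset([1.0, 2.0], [1.0, 2.0])
```

With equal uncertainties the result reduces to the plain mean and standard deviation; with unequal ones the more precise data points dominate.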
The accuracy of the QCD background estimation can presumably be improved by further study of the independence-violating effects.

Conclusion
The modelling of jet energy and angular distributions of QCD processes in multi-jet events at 13 TeV was compared for three state-of-the-art MC generation strategies. In particular, the kinematics of the 3rd jet, mostly affected by NLO calculations, and of the 4th jet, mostly affected by NNLO calculations, were studied. The differences between the three models under study were found to be small and insignificant for most purposes. The average number of jets per event ($\langle N_{Jet} \rangle$) exhibits a drop from approximately 4.5 jets for events with $H_T$ about $0.4\,E_{beam}$ to roughly 3.7 jets for events with $H_T$ close to $1.5\,E_{beam}$. Part of this drop is explained by the increase of the relative cross section of $qq \to qq$ processes from ≈ 25% of the total cross section at $H_T = 2.5$ TeV to 80% at 7 TeV, at the expense of a drop in the relative cross section of $qg \to qg$ from 50% to 20% and the vanishing of the $gg \to gg$ process. The comparison between the different generator predictions of the 3rd jet $p_T$ in 3-jet events reveals small differences. Pythia's 3rd jet tends to be a bit more energetic than those of Powheg and Madgraph. The angle that separates this jet from the transverse thrust direction tends to be slightly smaller in Pythia and larger in Madgraph. No significant difference is noticed when studying the properties of the fourth jet.
A data-driven procedure for estimating the QCD background for multi-jet final states, the Two Hemispheres Method (THm), has been proposed. The basic conjecture of this procedure, namely the independence of the jet multiplicity in one hemisphere from that in the other, has been tested with the three generators and found to hold within 25-50%. A figure of merit estimating the systematic uncertainty by including all jet multiplicities for each generator gives a comparable number, approximately 25%. We consider these results encouraging. The attainable sensitivity of a THm analysis is comparable to that of the conventional methods. Since the sources of uncertainty in this new approach are very different from those of the current methods, the procedures complement one another.