Modelling and computational improvements to the simulation of single vector-boson plus jet processes for the ATLAS experiment

This paper presents updated Monte Carlo configurations used to model the production of single electroweak vector bosons (W, Z/γ∗) in association with jets in proton-proton collisions for the ATLAS experiment at the Large Hadron Collider. Improvements pertaining to the electroweak input scheme, parton-shower splitting kernels and scale-setting scheme are shown for multi-jet merged configurations accurate to next-to-leading order in the strong and electroweak couplings. The computational resources required for these set-ups are assessed, and approximations are introduced resulting in a factor three reduction of the per-event CPU time without affecting the physics modelling performance. Continuous statistical enhancement techniques are introduced by ATLAS in order to populate low cross-section regions of phase space and are shown to match or exceed the generated effective luminosity. This, together with the lower per-event CPU time, results in a 50% reduction in the required computing resources compared to a legacy set-up previously used by the ATLAS collaboration. The set-ups described in this paper will be used for future ATLAS analyses and lay the foundation for the next generation of Monte Carlo predictions for single vector-boson plus jets production.


Introduction
The production of a single electroweak vector boson (W or Z/γ * ) in association with jets, collectively referred to as V + jets, is a fundamentally important process at the Large Hadron Collider (LHC) [1]. Measurements of electroweak bosons in the leptonic decay modes provide a clean experimental signature within the ATLAS detector [2] to test the electroweak sector of the Standard Model (SM) and perturbative quantum chromodynamics (QCD) in high-multiplicity final states. In addition, V + jets production is a significant irreducible SM background for a diverse range of analyses within the LHC physics programme: Higgs boson [3,4] and top quark measurements [5,6], and searches for new physics phenomena beyond the Standard Model (BSM) [7][8][9]. These analyses often rely on Monte Carlo (MC) simulation-based predictions for background estimation, and therefore an accurate description of the V + jets process with sufficient statistical precision across a wide phase space with many final-state jets is required. Consequently, multi-leg MC generators that combine multiple exclusive jet emissions with the highest possible perturbative accuracy in the strong and electroweak couplings are preferred.
In recent years, the statistical precision of V + jets MC samples has become an increasingly limiting factor for a broad array of ATLAS data analyses. This is primarily due to the long event-generation time of multi-leg MC predictions with high perturbative accuracy that is demanded by the ATLAS physics programme. The relatively large V + jets production cross-section and the rapidly growing integrated luminosity of the proton-proton collision dataset require MC event generators to provide enormous simulated samples, which results in a significant computational challenge for experiments. Current analyses exploring high cross-section regions of phase space already rely on simulated samples that contain many billions of events. Furthermore, the wide range of ATLAS physics searches continue to explore V + jets processes in increasingly extreme regions of phase space, with cross-sections that are many orders of magnitude lower than the inclusive cross-section. In order to meet the statistical demands of these analyses, specialized techniques are required to populate their selection regions. Meeting all of these requirements has historically resulted in a challenging workflow with computing demands that continue to stretch existing resources. With finite computational resources and a fixed computing budget for the High Luminosity LHC (HL-LHC) [10], fast and efficient MC event generation has already been identified as a potential bottleneck for the large integrated luminosity expected during the HL-LHC runs [11,12].
The ATLAS collaboration has relied on a multi-leg Sherpa 2.2.1 [13,14] V + jets prediction for the past several years, which has been used for a range of measurements and new-physics searches. This configuration, hereafter referred to as the 2.2.1 set-up, has been substantially extended to meet the statistical needs of these analyses. It required significant resources to produce, due to the long event-generation time, inefficient heavy-flavour filtering techniques, and a relatively large fraction of negative MC event weights. Moreover, many analyses that probe regions of phase space with poorly understood predictions required complex data-driven corrections to their background estimates. The recent efforts highlighted in this paper have focused on improving both the -1 -JHEP08(2022)089 theoretical and computational aspects of the widely used Sherpa V + jets predictions in ATLAS. In addition, an alternative multi-jet merged next-to-leading-order (NLO) Mad-Graph5_aMC@NLO+Pythia [15,16] configuration with similar perturbative accuracy to the Sherpa prediction has been produced for use by ATLAS for the first time. This configuration is used for an independent comparison with the Sherpa predictions.
The modelling improvements have focused on the level of agreement of the Sherpa 2.2.1 predictions with data using a series of well motivated theoretical changes. It is known that cross-sections predicted by the Sherpa 2.2.1 configuration disagree with the data in several important regions of phase space. In particular, those dominated by extremely energetic jets significantly overestimate the measured cross-section, requiring sophisticated data-driven corrections [17,18]. Moreover, measurements of the transverse momentum of Drell-Yan lepton pairs show artificial discontinuities around the matrix-element and parton-shower merging scale [19]. Such discontinuities can arise from different cross-sections predicted by the matrix element and parton shower at the merging scale. This paper begins to address these issues through well-motivated theoretical changes including the inclusion of higherorder QCD and electroweak (EW) corrections, a dedicated scale-setting procedure for topologies with unordered parton-shower histories, modified Catani-Seymour parton-shower splitting kernels, and a careful treatment of input electroweak parameters. Comparisons of these changes, along with the newly available MadGraph5_aMC@NLO+Pythia predictions, are performed with publicly available ATLAS and CMS data to highlight the changes.
The computational resources required for large-scale production of any multi-jet merged MC prediction that meets the mentioned theoretical demands is significant. Furthermore, the techniques used to populate the low cross-section regions of phase space can significantly exacerbate the problem. In the Sherpa 2.2.1 configuration, a discrete phase-space slicing technique was used, which resulted in artificial features in the distribution of event weights that adversely impacted the statistical precision of the sample. In addition, the relatively large fraction of MC events with negative weights in the high cross-section regions of phase space, coupled with the long event-generation time, made it difficult to provide a simulated dataset with enough generated luminosity to match the collected data. Therefore, the focus of the computational improvements in this paper has been to reduce the per-event generation time, and the fraction of MC events with negative weights, whilst simultaneously including higher-order EW and QCD corrections. Moreover, a continuous statistical enhancement method is introduced in order to more efficiently populate the low cross-section regions of phase space. A detailed assessment of the CPU resources of the Sherpa configuration is shown, along with approximations that reduce the per-event generation time. In addition, the performance of various statistical enhancement methods is reported and is shown to improve on the approach adopted for the 2.2.1 configuration.
This paper is organized as follows. The Sherpa and MadGraph5_aMC@NLO+Pythia configurations are described in section 2. The theoretically motivated improvements are shown in section 3, in which comparisons with unfolded ATLAS and CMS data are performed where relevant. Section 4 is devoted to computational improvements, which include a characterization of the per-event CPU time consumption, an evaluation of the JHEP08(2022)089 phase-space slicing and statistical enhancement techniques, and a real-world example of the impact that the aforementioned computational improvements have had on computing resources. Finally, conclusions are given in section 5.

Generator configurations
This section describes the Sherpa and MadGraph5_aMC@NLO+Pythia configurations and software versions used throughout the paper. An extensive software suite [20] is used for the generation of simulated MC samples shown in this paper. All predictions are normalized to the cross-sections obtained from the generator. Particular attention is given to settings that are highlighted in the modelling and numerical improvements in sections 3 and 4, respectively. For settings not mentioned below, the default values from the event generator are used.

Sherpa
A legacy Sherpa 2.2.1 configuration is included in this paper as a reference point, against which various improvements are quantified. Further performance studies of this set-up can be found in ref. [14]. An improved configuration is described below and is referred to as the Sherpa 2.2.11 configuration.
Both configurations are simulated with the Sherpa [13] MC generator. In this set-up, NLO-accurate matrix elements for up to two jets, and LO-accurate matrix elements for up to five (four) jets in the five-flavour scheme are calculated with Comix [21] for the 2.2.11 (2.2.1) configuration. The b-and c-quarks are treated as massless at matrix-element level and massive in the parton shower. Virtual QCD corrections for matrix elements at NLO accuracy are provided by OpenLoops [22][23][24][25]. The Hessian NNPDF3.0nnlo [26,27] PDF set is used for the 2.2.11 configuration, whilst the 2.2.1 configuration used the MC replica version of NNPDF3.0nnlo. In the Sherpa Z + jets configurations the invariant mass of the charged leptons is required to be larger than 40 GeV, while the charged lepton plus neutrino invariant mass is required to be larger than 2 GeV for W + jets final states.
The 2.2.11 Z + jets configuration uses an electroweak input scheme specified by m Z , the QED coupling constant evaluated at the Z resonance, α(m Z ), and the effective weak mixing angle sin 2 θ eff [28], while the W + jets configuration uses the so-called G µ scheme specified by the Fermi constant G µ , m Z and m W . Remaining EW parameters are derived from these input parameters using tree-level relations. Further discussion and comparisons of these EW input schemes are presented in section 3.2. For the 2.2.1 configuration, all V + jets processes use an effective EW input scheme in which the masses of the W , Z and Higgs boson are taken as inputs, alongside sin 2 θ eff , α(m Z ), the vacuum expectation value (v), and Higgs self-coupling constant (λ). Such an input scheme allows certain tree-level relations between some parameters to be violated. Electroweak virtual corrections are included in the 2.2.11 configuration, and are available as an alternative generator weight based on the NLO EW virt approach [29,30]. The effect of these corrections using an additive, multiplicative, or exponentiated combination scheme for higher-order QCD and EW terms is explored in section 3.1. QED radiation to electron, muon, hadron and τ -lepton -3 -

JHEP08(2022)089
of the calculation. This scale scheme reduces the per-event generation time as shown in section 4.1. It is used in the Sherpa 2.2.11 configuration only.
All MC@NLO processes are affected by negative weights. The presence of negative weights degrades the statistical precision of a sample compared to the one without negative weights, and therefore it is desirable to reduce the fraction of negative weights as much as possible. In the Sherpa 2.2.11 configurations, the MC@NLO matching does not account for subleading gluon spin and colour correlations known to have small effects on final-state observables [42]. This results in an overall reduction in the fraction of negative weights, while preserving the essential features of the matched prediction [43]. In addition, the LO contributions are reweighed with a differential NLO/LO K-factor using a 2 → 2 core process obtained through an inverse parton-shower clustering. The clustering is first performed in an inclusive mode where both QCD and EW vertices are allowed to be clustered; if the inclusive clustering fails to find a suitable process to evaluate the NLO/LO K-factor, it is re-performed in an exclusive mode where the clustering of EW vertices is disabled [44]. This scheme is referred to as the inclusive+exclusive clustering mode. For the 2.2.1 set-up, the highest 2 → n NLO QCD process in the inclusive mode is used as the core process. Using the 2 → 2 core process improves the statistical power of the MC sample in high cross-section regions of phase space as shown in section 4.1.
Control of the phase-space sampling is achieved via either a discrete slicing or continuous phase-space biasing technique. In the case of the Sherpa 2.2.1 configuration, the sample is divided into discrete max(H T , p V T ) regions (slices), with boundaries at [0, 70, 140, 280, 500, 1000, 6500] GeV. The H T variable is defined as the scalar sum of all partonic jet transverse momenta and p V T is the transverse momentum of the lepton pair. The Sherpa 2.2.11 configuration was statistically enhanced in the max(H T , p V T ) variable using an analytic enhancement technique. Section 4.2.1 discusses the strengths and weaknesses of different enhancement techniques in terms of statistical performance.
QCD scale uncertainties have been evaluated on-the-fly using 7-point variations of the factorization and renormalization scales coherently in the matrix elements and the parton shower [45]. The scales are varied independently by factors of 0.5 and 2 while avoiding cases where the factorization and renormalization scales move in opposite directions. PDF uncertainties for the nominal NNPDF3.0nnlo PDF set are evaluated using the 100 variation replicas in case of the 2.2.1 set-up and using the Hessian eigenvector set for the 2.2.11 set-up, as well as ±0.001 shifts of α S around the nominal value of 0.118. Additionally, the central values of the CT18nnlo [46] and MSHT2020nnlo [47] PDF sets are included as additional generator weights.
A summary of the Sherpa configurations is shown in table 1.

MadGraph5_aMC@NLO+Pythia
An additional prediction using the MadGraph5_aMC@NLO 2.6.5 [15] program is used to generate weak bosons with up to three additional partons in the final state at NLO accuracy in the strong coupling. The renormalization and factorization scales are set to one half the transverse mass of all final state partons. The accuracy on the absolute value of the cross section obtained during the phase-space integration is required to be less than 0.0001.
-5 -JHEP08(2022)089 The showering and subsequent hadronization is performed using Pythia 8.240 [16] with the A14 tune [48], and the NNPDF2.3lo PDF set [49] with α S = 0.130. The different jet multiplicities are merged using the FxFx NLO matrix-element and parton-shower merging prescription [50]. The sampled phase-space is biased using a function equal to the square of sum of final state momenta (i.e leptons plus partons). Heavy-flavor hadron decays are handled by EvtGen 1.7.0 and QED radiation is added by Pythia 8. The inclusive fraction of generated events with negative weights is approximately 17%, and reaches 35% for p V T and H T phase spaces above 100 GeV. The MadGraph5_aMC@NLO+Pythia calculation uses a five-flavour number scheme with massless b-and c-quarks at the matrix-element level, and massive quarks in the Pythia 8 shower. At the event-generation level, the jet transverse momentum is required to be at least 10 GeV, with no restriction on the absolute value of the jet pseudorapidity. The PDF set used for event generation is NNPDF3.1nnlo supplemented with the LUXqed photon PDFs using α S = 0.118 [51]. The merging scale is set to Q cut = 20 GeV. Scale variations where the renormalization and factorization scales have been varied independently by a factor of 2 or 0.5 in the matrix element are included as alternative generator event weights [52].

Cross-sections
The predicted cross sections from the Sherpa and MadGraph5_aMC@NLO+Pythia event generators for the configurations described above are shown in  Graph5_aMC@NLO+Pythia set-ups, while the inclusive NNLO PDF uncertainty is computed using the 68% C.L. MSTW2008nnlo PDF eigenvector error sets. The cross-sections predicted by the different configurations can differ for numerous reasons, including the perturbative accuracy of the set-up, the choice of EW input scheme, and the choice of PDF set. The cross-section from the 2.2.11 configuration is larger than that from the Sherpa 2.2.1 configuration due to the inclusion of the 5j@LO matrixelements. All cross-sections are larger than the inclusive NNLO QCD value since both the Sherpa and MadGraph5_aMC@NLO+Pythia configurations include contributions beyond an inclusive NNLO QCD calculation. The scale uncertainties in the Sherpa 2.2.11 set-up include variations in the 5j@LO matrix element as well as coherent variations in the parton shower, which together lead to a ∼40% increase in the total inclusive scale uncertainty relative to the Sherpa 2.2.1 configuration. All results presented in this paper are normalized to the respective event generator's cross-section prediction.

Theoretically motivated improvements
This section reports on the theoretical changes that are included in the Sherpa 2.2.11 configuration. They include four major topics: the inclusion of higher-order QCD and electroweak corrections, an electroweak input scheme for Z + jets events that specifies the weak mixing angle, specialized treatment of the dipole operators for the Catani-Seymour splitting kernels, and a special treatment of unordered parton-shower histories. These changes are described below and the performance of the Sherpa 2.2.11 and Mad-Graph5_aMC@NLO+Pythia configurations is compared with that of the 2.2.1 configuration. Rivet [54] is used to compare the predictions with previously published ATLAS or CMS data to highlight the theoretical changes.

Electroweak virtual corrections
Electroweak corrections to single boson production can play a significant role in regions of phase space where the boson is produced with large transverse momentum, owing to the appearance of EW Sudakov logarithms of the form log(s/m 2 V ) [30]. These corrections are crucial for an accurate description of single vector bosons and will play an important role for BSM searches probing high-p V T phase-space regions. The full electroweak calculation contains contributions from virtual loops and from real emission of gauge bosons. The -7 -JHEP08(2022)089 Sherpa 2.2.11 configuration presented in this paper only contains the virtual-loop terms, since applying the full NLO EW corrections is not currently possible in a multi-jet merged set-up. Moreover, the real emission of a gauge boson leads to a fundamentally different experimental signature that is typically produced in separate samples.
Amplitudes for the electroweak virtual corrections are calculated by OpenLoops and implemented in the Sherpa event generator for multi-jet merged set-ups. Results from MEPS@NLO merged set-ups with up to two additional jets at next-to-leading order in QCD for selected processes are shown in ref. [30]. In this paper, electroweak virtual corrections are included in the multi-jet merged set-ups with 0-2j@NLO+3-5j@LO QCD accuracy, and their effects are demonstrated relative to the NLO QCD-only prediction for all leptonic decay modes of the W and Z bosons. Moreover, multiple NLO QCD and NLO EW combination schemes are compared, and these are stored as alternative generator weights.
The cross-section calculated at NLO QCD and NLO EW accuracy for a 2 → n final state, where the higher-order terms in QCD and EW couplings are summed (additive scheme), can be expressed as: where σ n,LO denotes the Born-level cross-section and ∆σ NLO n,QCD the NLO QCD corrections to the Born process. The ∆σ NLO n,EW virt term only contains the virtual-loop components, and in the NLO EW virt approximation these are specified by [30]: where Φ n represents the phase space of the n-particle final state, B n,mix are the mixed QCD-EW Born contributions with order α n−1 S and α 3 in the strong and electroweak coupling, respectively, V n,EW is the exact one-loop virtual correction to the Born process with order α 3 in the electroweak coupling, and I n,EW is the approximated subtraction term used to render the above cross-section finite. The approximated electroweak virtual corrections in eq. (3.1) are denoted by NLO EW virt in this paper. They were developed and shown to be in good agreement with the full NLO EW calculation in ref. [30] up to a few percent, with the exception of phase spaces dominated by real emission contributions such as the off-shell region of the m invariant mass distribution. Such phase-spaces are dominated by real QED emissions from charged leptons that are not included in the NLO EW virt calculations, and instead supplemented by dedicated samples. In the results shown below, the one-loop contributions are included for final states with up to two extra jets.
Two additional NLO QCD and NLO EW combination schemes are provided by the Sherpa framework. The first is the multiplicative scheme, which is defined as: The second scheme is referred to as the exponentiated scheme, and is defined as:  The multiplicative and exponentiated schemes are motivated by the factorization of the QCD and EW Sudakov corrections at high energies [55].
Comparisons of the NLO QCD-only prediction with and without the virtual electroweak corrections are shown in figure 1 using the Sherpa 2.2.11 configuration. The corrections reach 30% around p V T ∼ 1 TeV and qualitatively follow the expected logarithmic behaviour. The relative size of the corrections is in good agreement with those obtained from the fixed-order NLO EW calculations when including corrections for up to two additional jets at next-to-leading order [30]. The multiplicative and exponentiated contributions show the largest deviations from the NLO-only prediction and are in good agreement with each other. The difference between the additive and multiplicative or exponentiated predictions is interpreted as a source of systematic uncertainty for missing higher orders resulting from mixed QCD-EW contributions, and around p V T of 1 TeV the uncertainty is approximately 10% for all final states. Future measurements of vectors bosons produced with large transverse momentum will be crucial in testing these predictions, in addition to understanding NLO EW effects overall.

Electroweak input scheme
An EW input scheme defines the set of input parameters which are use to derive any remaining EW parameters. Some choices of EW scheme can result in values that differ from experimentally determined values. In particular, using tree-level relations in the G µ scheme the weak mixing angle is found to be: which disagrees with the experimentally determined value of sin 2 θ eff =0.23113±0.00014 [56]. Although the G µ scheme is an attractive choice for Z + jets (Drell-Yan) processes at hadron colliders because it absorbs a large part of the radiative corrections in the LO prediction [28], this level of discrepancy causes measurable differences in the forward-backward asymmetry of leptons from Z boson decays and the τ -lepton polarization as highlighted below. The sin 2 θ eff scheme by construction ensures the correct value of sin 2 θ eff since it is specified as one of the input parameters. The sin 2 θ eff scheme based on ref. [28] was implemented by the OpenLoops and Sherpa authors [30] throughout the course of the work presented in this paper. It is used for the Sherpa 2.2.11 Z + jets configuration for both the NLO QCD and NLO EW virt calculations. In this scheme, the derived W boson mass differs from its experimentally measured value, but for single Z boson production this has negligible impact on the physics performance.

Forward-backward asymmetry
The vector and axial-vector couplings in the pp → Z → process lead to a forwardbackward asymmetry (A FB ) in the polar angle of the lepton with respect to the incoming quark direction in the rest frame of the dilepton system. This forward-backward asymmetry has been measured as a function of the dilepton invariant mass by ATLAS in √ s = 7 TeV proton-proton collisions and used to extract the weak mixing angle [57]. The measured asymmetry values are used to test the performance of the G µ and sin 2 θ eff schemes using the Sherpa 2.2.11 configuration.
The forward-backward asymmetry is most sensitive to the weak mixing angle close to the Z boson's pole mass since the size of A FB is proportional to sin 2 θ W at m ∼ m Z . As shown in figure 2, with NLO QCD accuracy the G µ scheme leads to an overestimate of A FB near the Z-pole at the 1σ-2σ level. The inclusion of EW corrections provides a better overall description of the data in the G µ scheme. The sin 2 θ eff scheme provides a good description of the data with and without EW corrections, with a level of agreement with the data similar to that achieved with the NLO QCD+EW virt G µ scheme. The reduced dependency on EW corrections makes the sin 2 θ eff scheme preferable to the G µ scheme for Drell-Yan production.

Polarization of τ -leptons
Similarly to the forward-backward asymmetry, the polarization of τ -leptons in Z/γ * → τ τ decays is sensitive to the value of sin 2 θ eff since the Z boson couples differently to lefthanded and right-handed τ -leptons due to the chiral coupling asymmetry. The expected  τ polarization for on-shell Z bosons is approximately −0.145 as obtained from precision measurements of the effective weak mixing angle at LEP [58], and more recently it was measured by ATLAS in proton-proton collisions at √ s = 8 TeV in Z → τ τ events and found to be −0.14 ± 0.02 (stat) ± 0.04 (syst) [59].
The polarization (P ) of τ -leptons is extracted from a fit to the visible momentum fraction, x π = | p(π)|/| p(τ )| in τ → πν decay, using the following semi-analytic formula [60]: Simulated events with two opposite-sign final-state τ -leptons with an invariant mass of 66 < m τ τ < 116 GeV are selected. The distribution of x π normalized to unity and corresponding fit results of eq. (3.2) for the G µ and sin 2 θ eff schemes are shown in figure 3. The outlier point at low x π arises from the finite pion mass and is excluded from the fit. The polarization of τ -leptons in the G µ scheme is found to be (−20.2 ± 0.9)% at NLO in QCD and (−17.6 ± 0.9)% when including additive NLO EW virt corrections. In the sin 2 θ eff scheme, a τ polarization value of (−14.4 ± 0.9)% is extracted and is insensitive to EW virtual corrections. Given this value's better agreement with the SM prediction and the measured values from both LEP and ATLAS, and the reduced dependency on EW corrections, the sin 2 θ eff scheme is preferred.

Modified Catani-Seymour splitting kernels
The splitting kernels in the Catani-Seymour [61] based parton shower of Sherpa are known to exhibit non-singular corrections which can alter the radiation pattern significantly in the regions of phase space where the matrix element is not logarithmically enhanced. While a large part of these regions is covered by the hard matrix elements in an NLO-matched and multi-jet merged set-up, some part of it will always be populated by the parton shower. In addition, the radiation pattern of the shower also affects the Sudakov suppression in the hard region dominated by matrix elements.
In the Sherpa 2.2.11 configuration, the parton-shower splitting kernels are modified to better reflect the known soft limits of the hard matrix elements. The kinematic parameterization of an emission in the collinear kinematic region is modified to match the partial-fractioned soft eikonal part to the collinear limit. The modified splitting kernels can be found in appendix A. These changes are made consistently in both the parton shower and the NLO fixed-order and NLO matched calculations, and mainly affect the physics in the parton shower, thus being relevant for p V T around and below the merging scale (20 GeV). No significant effect was observed in the fraction of negative event weights due to these modifications.
The Sherpa 2.2.11 set-up contains the modified splitting kernels described above, while the 2.2.1 configuration does not and can be used to assess the modelling changes.

JHEP08(2022)089
Predictions from these set-ups are compared with a measurement of the transverse momentum p T of Drell-Yan lepton pairs in the mass range 66-116 GeV [19]. The measurement used 36.1 fb −1 of 13 TeV proton-proton collision data collected by the ATLAS experiment. The predictions from the Sherpa 2.2.11 and 2.2.1 configurations, along with the Mad-Graph5_aMC@NLO+Pythia FxFx configuration, are shown in figure 4 as a function of the unit normalized measured p T and φ * η [19] variables. The latter variable is constructed from only the angular separation of the leptons, which reduces the impact of the lepton momentum resolution at low-p T and depends only on the well-measured directions of the leptons. The red uncertainty band is shown on the NLO QCD prediction for the The new splitting kernels used in the Sherpa 2.2.11 configuration significantly improve the description of the data below and around the merging scale (p T ∼ 20 GeV). The artificial discontinuity close to the merging scale is reduced, representing a closer match of cross-sections predicted by the matrix element and parton shower at the merging scale. As a result of these changes, the uncertainty due to variations of the merging scale is reduced by a factor of two compared to the Sherpa 2.2.1 set-up and contributes less than 5% around the merging scale. For intermediate-p T and large φ * η values, the predicted cross sections of the Sherpa 2.2.11 configuration is reduced by approximately 10% compared to the Sherpa 2.2.1 prediction. This is primarily a consequence of the scale changes for unordered histories discussed in the next section that reduce the cross sections for events with many jets. While the central value disagrees with the measured data points in the intermediate-p T and large φ * η regions, the predictions are in agreement within the scale uncertainties shown in red and the significant over-prediction observed in the Sherpa 2.2.1 for the highest p T values measured is reduced. The NLO EW virt prediction reduces the cross-section further in the high p T and φ * η regions, as expected from figure 1. The uncertainties from different NLO EW virt schemes is small for the majority of the phase-space, indicating good agreement between the three NLO EW virt predictions.
The MadGraph5_aMC@NLO+Pythia FxFx prediction uses a different subtraction technique based upon the FKS method [62][63][64] and exhibits fundamentally different modelling behaviour than the Sherpa predictions in the low p T phase-space. Below the merging scale of 20 GeV, the MadGraph5_aMC@NLO+Pythia prediction models the data well, but around the merging scale it disagrees with the data in the opposite direction relative to the Sherpa configurations. For the intermediate and high-p T regions, the MadGraph5_aMC@NLO+Pythia prediction agrees very well with the Sherpa 2.2.11 NLO QCD prediction.

Scale-setting scheme for unordered histories
The MEPS@NLO algorithm used by the Sherpa event generator matches fixed-order perturbative calculations to parton showers by probabilistically identifying parton-shower to-   pologies through a k T -type clustering algorithm [35]. In this approach, the first or hardest scale at which the parton shower starts needs to be identified. This is done by running the shower and clustering backwards to a core process that cannot be further decomposed. The strong coupling associated with each QCD branching in the parton-shower tree is then evaluated at the nodal k T of the branching, and each propagator in the tree is dressed by a corresponding Sudakov factor. Such a process typically leads to a sequence of m nodal values whose k T scales increase with each additional clustering step: which leads to small Sudakov factors counteracting large values of the running strong coupling. However, in extreme regions of phase space, this scale hierarchy in eq. (3.3) may be violated. A prime example would be jets of similar transverse momentum that are widely separated in rapidity but not separated in the k T scale used in the clustering. Such configurations would also be accompanied by Sudakov-type suppression factors, but these specific contributions are not taken into account in standard parton showers because they correspond to an evolution in rapidity. Due to the absence of the suppression factor, potentially large enhancements from evaluating the strong coupling at small transverse momenta can then not be compensated for, and the differential cross-section, together with its perturbative uncertainty, becomes uncontrollably large. The strong coupling associated with the production of jets in this 'unordered' region of phase space must therefore not be evaluated at the nodal k T scales of the jet clustering. Instead it should be computed at a global hardness scale associated with the event.

JHEP08(2022)089
The Sherpa 2.2.11 configuration uses a new algorithm to handle these cases, where unordered histories are no longer used in the scale-setting algorithm. If the k T scale decreases during the clustering, the largest previously identified scale will be used. In comparison to previous predictions from the Sherpa generator, this typically leads to reduced crosssections and smaller uncertainties in regions of the phase space that contain multiple jets of similar hardness. In the reverse clustering procedure, nodes containing EW particles are only included if they involve a strongly interacting particle (e.g. u-quark clustered with W , but not e clustered with ν). The virtuality of the EW propagators is accounted for during the clustering.
The Sherpa 2.2.11 configuration includes the changes described above, while the The W + jets data is based upon a CMS measurement of the differential cross-sections for the associated production of a W boson and jets in 2.2 fb −1 of proton-proton collisions at √ s = 13 TeV [66]. The predicted cross-section for events with at least three jets is significantly reduced and are in agreement with the data. In the old scheme, the predicted cross-section overestimated the data by up to 50% for jets produced with momentum greater than 300 GeV. In the new scheme, the rates are considerably reduced and in better agreement with the data. This improved agreement with the data for highly energetic jets is at the expense of a higher predicted cross section for events with low jet transverse momentum, and is observed to overshoot the data by as much as 15% as shown in figure 5 (b). The NLO EW virt corrections reduce the QCD-only prediction by ∼10% for large jet momentum and improve agreement with data.
For the Z + jets comparisons, the data is based upon an ATLAS measurement of the production cross-section of a Z boson in association with jets in 3. 16   Looking towards the HL-LHC era [10], this is only possible if a reduction of the per-event generation time is achieved and smart sampling techniques are used that do not adversely impact the statistical performance of the generated events. The following section addresses these issues using V + jets processes as a case study.

Speed characterization
The time it takes to generate a single event, or the per-event CPU cost, is highly dependent on the configuration of the MC generator: for example, the perturbative accuracy of the cross-section in both the strong and electroweak couplings, the number of extra partons at matrix-element level, the matrix-element to parton-shower matching algorithm, and the scale-setting algorithm. Moreover, additional scale and PDF predictions used to evaluate sources of theoretical uncertainty are routinely stored in the nominal sample as alternative -17 -

JHEP08(2022)089
generator weights. Depending on the number of additional predictions and the methodology with which one evaluates the variation weight, significant CPU cost can be incurred. This section quantifies the per-event CPU cost of configurations with varying perturbative accuracy and Sherpa-specific scale-setting and clustering algorithms, collectively referred to as additional features. The pp → e ± e ∓ + jets process is used as a representative process, and events are generated using Sherpa 2.2.11. Additionally, a comparison between the MadGraph5_aMC@NLO+Pythia FxFx merged set-up and Sherpa is included, where a dedicated Sherpa configuration ensures that the same perturbative accuracy and a similar scale-setting algorithm is used in the two event generators.
Benchmarking of the MC configurations takes place on a dedicated machine that has no additional computing load. The machine utilizes an Intel(R) Xeon(R) E5-2667 v2 CPU with a base 3.30 GHz clock speed and a 4.0 GHz boost capability under single-core operation, which has 2 sockets with 8 cores each (16 physical CPU cores). A Western Digital Black 7200 rpm 2 TB hard-drive (HDD) is used as non-volatile memory, whilst 4× 8192 MB Samsung DDR3 with 1600 million transfers per second (MT/s) (Model: M393B1G70QH0) is used for dynamic volatile memory. The corresponding machine is benchmarked using the HEP-SPEC2006 (HS06) standard [68], a real application-level suite of seven C++ [69] benchmarks derived from the SPEC CPU2006 standard, that tests integer and floatingpoint operational performance of the host machine. The HS06 score is obtained with hyperthreading disabled, and all physical processor cores configured as independent single-core nodes with HEP-SPEC2006 compiled with gcc in 32-bit mode. The corresponding HS06 score for each CPU second was found to be 20.97 HS06 seconds. It should be noted that platform dependencies, single-core versus multi-core mode operation, and configuration variations of the CPU can yield a variance of the HS06 score of approximately 20%.
Event generation times in seconds are monitored event-by-event, where care is taken to exclude any initialization steps associated with the software application. Two key classes of metrics are obtained: the mean CPU time per event and associated standard deviation as extracted from the sampled events, and the median per-event time in addition to the ±25% quantiles. All CPU times are then converted to the HS06 standard using the score 1 second = 20.97 HS06 seconds.

Perturbative accuracy
The perturbative accuracy class of set-ups quantifies the effect of increasing perturbative accuracy in the strong coupling constant. Within this class of set-ups, three distinct classes exist: pure LO, pure NLO, and mixed NLO+LO. Set-ups with an increasing number of extra partons produced at matrix-element level, ranging from zero to five extra partons, are studied.
The event generation HS06 cost per event for the perturbative accuracy set-ups are shown in figure 7. The LO set-ups are considerably faster than the NLO set-ups, partly because of the smaller number of diagrams that need to be sampled, but also due to the higher complexity of the multi-jet merging algorithms when applied to NLO vs. LO events. In the class of NLO set-ups, the results are categorized as a function of the number of NLO emissions and compared between pure NLO set-ups and one with up to five additional jets at LO accuracy. For the 0j@NLO set-up, the pure NLO set-up is extremely fast and on a similar scale to LO set-ups with a similar number of final-state partons. When adding in the 5j@LO legs to the 0j@NLO set-up, the execution time increases and is roughly of the same order of magnitude as the pure 5j@LO set-up. The fact that the 0j@NLO+1-5j@LO set-up is faster than the pure 5j@LO set-up is attributed to the ∼15% increase in the cross section for the 2 → 2 process when going from LO to NLO. As such, in the NLO set-up the 2 → 2 process is sampled more often and is faster than processes with more final-state partons, leading to an overall decrease in the average time per event. When adding more NLO diagrams to the calculation, there is approximately a tenfold increase in the event generation time for each additional NLO contribution. The difference with and without the LO legs is relatively small compared to the increase when moving from 1j@NLO to 2j@NLO.
While the configurations with up to one extra parton at LO or NLO in the strong coupling are shown to be much faster, the cross-sections for these configurations are known to miss important contributions and severely underestimate the total cross-section [70][71][72]. In order to meet the requirements of the ATLAS physics programme, which demands -19 -JHEP08(2022)089 accurate predictions for phase spaces selecting multiple jets and is thus sensitive to these higher-order QCD contributions, it is mandatory for the matrix elements to include the highest perturbative accuracy that available theoretical tools and computational resources allow. Therefore, the QCD accuracy for the Sherpa 2.2.11 configuration is chosen to be 0-2@NLO+3,4,5j@LO. Most of the CPU cost originates from the 2j@NLO contribution, but can be reduced with various well-motivated approximations as described in the next section.

Additional features
The features class of set-ups quantifies the effect of various approximations or additional predictions included in the Sherpa 2.2.11 configuration relative to a reference configuration. The reference configuration uses a configuration similar to the Sherpa 2.2.11 configuration in table 1, but removes the H T scale for H-events shown in eq. (2.1), the NLO EW virt corrections shown in section 3.1, the scale and PDF variations, and uses the 2 → 4 core process for the evaluation of the differential K-factor for LO matrix elements as done for the Sherpa 2.2.1 set-up. Each feature is evaluated independently by adding them one-byone above the reference set-up, so that the CPU cost of each additional feature can be quantified.
In order to reduce the CPU budget of the costly 2j@NLO contribution, an H T scale choice is used for H-events. The default merging scale requires independent calculations for the real-emission correction and all associated dipole subtraction terms in the H-events. The number of dipole terms grows as n 3 with the number of external partons (n), and for each of those dipole terms the clustering algorithm requires a runtime proportional to (n − 1) 2 . Therefore, the expected speed-up when using the H T scale is particularly large for H-events at large multiplicities. As shown in figure 8, a factor of three reduction in the per-event generation time is achieved when using an H T scale for H-events. The clustering scheme that uses a simplified 2 → 2 core process to evaluate the differential K-factor was found to have consistent per-event CPU times compared to the setup using the 2 → 4 core process.
The additional predictions include NLO EW virt corrections in three schemes, as described in section 3.1, and over 300 additional predictions for scale and PDF uncertainties. As shown at the bottom of figure 8, the NLO EW virt corrections, and scale and PDF uncertainties, each independently increase the event generation time by up to 40%. Although the additional scale and PDF predictions increase the event generation time, they are produced at a significantly smaller CPU cost than generating over 300 different predictions individually, and improve the accuracy of the prediction.
The results in figure 8 clearly show it is possible to reduce the per-event CPU time by up to a factor of three, provided that the features do not negatively impact the physic modelling performance. In order to illustrate the effect on the modelling, the reference setup is first modified by switching to the H T scale for H-events (blue), and again by using the inclusive + exclusive clustering scheme (green). Additionally, the Sherpa 2.2.11 set-up with all additional features added simultaneously is included for comparison. As can be seen in figure 9 phase spaces. The largest differences observed are with the modified H T scale for H-events for intermediate H T and p V T values, but amount to less than 5% and are well within the uncertainty band formed from variations of the renormalization and factorization scales. Given these small differences and the significant reduction in the per-event CPU time, both the H T scale and the modified clustering scheme are adopted as the default in the 2.2.11 V + jets sample.

Generator comparisons
This section presents a comparison of the CPU time for the Sherpa and Mad-Graph5_aMC@NLO+Pythia configurations when the major differences between generator-specific approaches are factored out. In particular, the two generators have different approaches to setting the renormalization scale and in this paper it is set at different perturbative accuracies. The NLO multi-jet merging approach in Sherpa is constructed so as to account for the resummation of soft-collinear higher-order corrections originating in the phase space available to gluon decays. This is achieved by a suitable setting of the renormalization scale, which should correspond to the nodal transverse momentum in a  parton-shower branching tree for each gluon emission [73,74]. In order to identify this scale, the event needs to be clustered using a sequential recombination algorithm, whose computational complexity grows as n 2 as the number of external partons, n, increases. On the other hand, the MadGraph5_aMC@NLO+Pythia configuration discussed in the paper uses an event-level scale choice which requires significantly less CPU resources than the recombination algorithm used in Sherpa. budget in the majority of the Sherpa configurations used by ATLAS, it is important to ensure an accurate description of vector bosons produced with many jets. In particular, when using an event-level H T scale the Sherpa prediction under-estimates the data by up to 30% for phase-spaces with multiple jet emissions in W + jets and Z + jets selections.

Statistical enhancement methods
The probability of sampling an event with a specific phase-space configuration is dictated by the differential partonic cross-section. This poses a problem from a sampling coverage perspective as regions of phase space with small production cross-sections, for example V + jets processes in which the transverse momentum of the vector boson is large, are rarely sampled. Therefore, although the speed with which events can be generated is a key ingredient in generating large MC samples quickly, it is not the whole picture. The question of phase-space sample coverage is a problematic one from a logistical perspective, because regions of interest to BSM searches often have small production crosssections, whilst regions studied by SM precision measurements often have high production cross-sections. In order to address the statistical needs for the broad ATLAS physics programme, smart phase-space sampling/biasing methods are required. Section 4.2.1 details the deployment of enhancement strategies that either divide up the integration problem into multiple regions or bias the integrand to artificially increase the sample rate of specific phase-space points.
The weighted mode of operation of event generators can introduce statistical features that alter the statistical power of each sampled event. The precision of the MC estimated integral is dependent on an array of factors ranging from the number of sampled points, to the variance of the event weights. Since biasing techniques alter the phase-space sample rate, additional sources of variance are introduced, which can degrade the statistical precision of a sample. The statistical features of the event weight distribution are studied in section 4.2.2 for an array of enhancement techniques.

Sample biasing and statistical precision
Monte Carlo integration numerically approximates a definite integral by randomly sampling a set of N gen points within the integral domain according to a probability density function. For scattering experiments this integral represents the cross-section, whereby each sampled point represents an event with a generation probability defined by the differential partonic cross-section. MC event generators operating in a weighted mode therefore assign each event a weight (w) based on the generation probability, such that the cross section is approximated by the average of the weights w , 1 [75]. With this in mind, the relative statistical uncertainty of a MC sample across the entire generated phase space 2 is given by: The MC estimator of the cross-section integral is equivalent to the product of the event weight mean and integral volume. 2 In order for this to be used for a subset of the phase space it should be corrected by the ratio of the event weight sum entering the sub-space over the total number of events.

JHEP08(2022)089
whereσN is the sample error,N is the expected event yield given N gen sampled phase space points, and s 2 w is the event weight variance. The last square-root factor of eq. (4.1) is referred to as the statistical dilution factor, which originates from the presence of non-unit event weights. Consequently, samples containing a large variance of event weights will have a significantly reduced statistical precision.
This statistical picture is complicated by the introduction of negative event weights, as a result of the MC@NLO method [76]. The presence of negative weights not only increases the event weight variance, but also introduces an additional source of statistical suppression, thereby further reducing the statistical precision of a generated MC sample. The above relative statistical uncertainty formalism of eq. (4.1) can be reparameterized in terms of the negative-weight fraction f nw as given by: where a constant non-zero event weight of w = ±c is assumed, and f nw is the negative- for N gen total and N − gen negativeweight events. The asymptotic limit of f nw = 0.5 can be seen to induce an infinitely large error. The statistical precision of the generated MC samples is studied below, using the four key parameters obtained above: [N gen , w , s 2 w , f nw ]. These are referred to as numerical efficiency parameters.
Given the strong reliance of the MC statistical precision on the number of sampled events, N gen , the question of how to sufficiently populate remote phase-space regions for a precise prediction is crucial. For example, in the V + jets case study, high-H T phasespace regions are mainly populated from topologies with high parton multiplicity. The resulting cross-section contribution to a more inclusive generator phase space is therefore small, due to the suppression caused by powers of α S . Therefore, when generating a MC sample inclusive of H T , the high H T region will be sparsely populated. Establishing an enhancement strategy to address this must therefore meet two key criteria: it should exert control over the phase-space sampling in a versatile way, and induce minimal variance in the event weight.
To address this, three enhancement techniques are tested, which offer varying levels of control over how events are sampled from the generated phase space. These are divided into two categories, referred to as discrete, and continuous in the following. As per the name, all methods enhance the probability of sampling an event according to one or more kinematic observables, where the exact definition of each observable is limited to anything that can be constructed from the four-momenta of the matrix-element final-state particles.
In the discrete case, MC samples are sliced according to kinematic observables, O, by constructing exclusive integration boundaries between which the MC generator would sample. This discrete technique is referred to as a sliced enhancement strategy. An illustration of the sliced enhancement strategy for pp → e + e − + jets is given in figure 10, in which the distribution of unweighted events and the resulting differential cross-section is shown as a function of log 10 (max(H T , p V T )). Six phase-space slices are generated with internal   Figure 10(b) contains the same information, split according to the final-state parton multiplicity, demonstrating that events with high parton multiplicity mostly populate the large-max(H T , p V T ) phase-space region.
The phase-space slices shown in figure 10 are not perfectly aligned with the integration boundaries, due to a disparity in the p V T and H T definitions used at the matrix-element integration stage versus those at the matrix-element final-state level (as shown in the figure). Specifically, for NLO processes the hard-scattering matrix element can contain an additional parton emission, which would be matched between the matrix element and parton shower via the colour-exact variant of the MC@NLO algorithm [33]. At the integration step this emission is not present, and so only the on-shell final-state partons are used to define H T and p V T . This feature is unique to the slicing strategy, which, as shown in the following section, has significant consequences for the statistical precision of the sample at the internal boundaries.
In contrast, continuous enhancement techniques artificially scale the probability density function by a biasing function e( O), where O is a set of observables. Following the transformation the resulting cross-section is non-physical, so each sampled event is corrected by the inverse of the biasing function, w bias i,n = e( O) −1 , as an event-weight multiplier after event generation.
Two instances of this type of technique are considered, referred to as the differential enhancement and analytic enhancement strategies: • Differential enhancement: the biasing function is set to the inverse of the differential cross-section as a function of a single observable for each parton-multiplicity -25 -

JHEP08(2022)089
final state, e n (O) = (dσ n /dO) −1 , where the subscript n denotes the number of finalstate partons. This is applied within some kinematic bounds defined by the user, α < O < β.
• Analytic enhancement: the biasing function is an analytic formula parameterized according to a set of observables independent of the final-state multiplicity n, i.e. e( O). Figure 11 illustrates the effect of the continuous enhancement techniques on the unweighted event sample rate versus that predicted by the differential cross-section, for both the differential ( figure 11(a)), and analytic ( figure 11(b)) strategies. The observable of choice is max(H T , p V T ) with the former using a log form, log 10 (max(H T , p V T )), and the latter using a squared form, (max(H T , p V T )/20) 2 . The squared form was empirically selected to balance the production rate of the low and high max(H T , p V T ) regions. 3 Unlike the discrete sliced enhancement strategy, the unweighted event sample rate varies smoothly as a function of the enhancement observable(s), which has profound consequences for the statistical precision of the sample (as explored in the following section). It should be noted that the differential enhancement scheme samples events more uniformly than the analytic scheme. This is most notable when comparing the individual partonmultiplicity subprocesses between the differential and analytic enhancement schemes. This feature is native to the differential enhancement scheme because the sampling probability density function is biased using the inverse of the differential cross-section. Therefore, the transformed probability density function from which events are sampled is approximately uniform as a function of the biasing observable. The sharp change in sample rate at log 10 (max(H T , p V T )) ≈ 3.3 for the differential enhancement approach is a consequence of an upper bound on the differential enhancement strategy of max(H T , p V T ) ≈ 2 TeV, after which the sample rate is dictated by the cross section. Achieving this level of uniformity with the analytic scheme ultimately requires some fine tuning of the observables and analytic functional form, but with this comes versatility and flexibility in controlling the sample rate.
Each of the three techniques have their own set of strengths and weaknesses, in terms of phase-space sampling speed (generation time), phase-space sampling control, and statistical precision. Some of these features are already visible in figures 10 and 11, for example the smoother sampling coverage of the continuous enhancement strategies compared to the sliced enhancement strategies. The impact of these features on the performance of each sample is explored in the following subsection.

Quantitative assessment
Using the two Sherpa configurations outlined in section 2.1, an assessment of the statistical precision of the discrete and continuous enhancement strategies for pp → e + e − + jets and JHEP08(2022)089 pp → µ + µ − + jets is performed. The max(H T , p V T ) sliced 2.2.1 configuration serves as a reference sample, due to its status as the default configuration for modelling V + jets processes within the ATLAS physics programme, when evaluating the performance of the new continuous enhancement 2.2.11 configurations. The assessment frames the statistical precision of each sample within the context of the four MC numerical efficiency parameters previously defined, [N gen , w , s 2 w , f nw ]. As demonstrated by figures 10 and 11, the differential distribution of sampled events is of key interest when assessing the statistical performance of the aforementioned enhancement strategies. With this in mind, and in order to provide a comparison that is sensitive to the distribution of events, not the absolute inclusive quantity, the inclusive phase space of each sample is limited to a total of 10 million unweighted events, N gen = 10 7 . Figure 12 shows the distribution of event weights as a function of max(H T , p V T ), normalized to unit area, for the 2.2.1 ( figure 12(a)), 2.2.11 differential enhancement (figure 12(b)), and 2.2.11 analytic enhancement (figure 12(c)) configurations. The mean weight and rootmean-square deviation as a function of max(H T , p V T ) are given by the red markers. As observed in figure 10, the discrete max(H T , p V T ) slices of the 2.2.1 configuration are visible as differently coloured bands. Given that the assigned event weight is proportional to the amplitude of the matrix element and the infinitesimal phase-space element evaluated for the event kinematics, the mean weight and variance of each slice must differ. This is because each slice populates a unique and exclusive region of phase space. Therefore, when constructing the inclusive sample each phase-space slice contributes a single unique mode to the event weight distribution. differential enhancement strategy on the final-state parton multiplicity; each 2 → n process is enhanced by its own inverse differential cross-section as outlined in section 4.2.1. Given the correlation of H T and p V T with the parton multiplicity of the hard-scatter process, for any given max(H T , p V T ) value two different parton-multiplicity final states experience uniquely different biasing weights. The resulting event weight distribution is therefore multimodal in nature, which has negative consequences on the statistical precision of the sample as discussed below.
In contrast, the analytic enhancement strategy of the 2.2.11 configuration is independent of the hard-scatter final-state parton multiplicity, and with no discrete integration boundaries a smooth event weight distribution is obtained as a function of max(H T , p V T ), as demonstrated by figure 12(c).
With this in mind, figure 13 summarizes the statistical precision of the three enhancement strategies as a function of max(H T , p V T ) and p V T . The top panel shows the pp → e + e − + jets differential cross-section (black) generated at  (b). The top panel in each case also superimposes the relative statistical uncertainty as a function of the kinematic observable. The middle panel shows the differential behaviour of the statistical dilution factor resulting from the event weight variance, as given by eq. (4.1), whilst the third panel shows the differential negative-weight fraction. From figure 13 it can be seen that the differential enhancement strategy gives the least precise prediction. This is the direct result of the event weight variance induced by the parton-multiplicity-dependent biasing, which manifests in the form of a large statistical dilution factor. In the comparisons shown, this dilution factor exceeds that of both the 2.2.1 and 2.2.11 configurations for p V T > 100 GeV and max(H T , p V T ) > 150 GeV, and does so by almost an order of magnitude in the high max(H T , p V T ) and p V T regions. In the comparison of the analytic and sliced enhancement strategies the conclusions are, however, slightly more nuanced. The statistical precision of the differential crosssection predicted by the 2.2.11 analytic enhancement strategy either matches or exceeds that of the 2.2.1 sliced enhancement strategy. Furthermore, the statistical dilution factor and negative-weight fraction of the analytic enhancement strategy is far less variable than that of the sliced enhancement strategy. The superior statistical precision and stability of the 2.2.11 configuration is attributed to two key factors.
The first is the removal of phase-space slicing during the MC integration step. Specifically, the 2.2.1 phase-space slices are only exclusive at the matrix-element level during integration, in which the on-shell final-state partons (void of NLO parton emission matching) and V boson decay products are used to construct p V T and H T . After parton showering and hadronization, the observables are defined by hadron clustered jets, and electroweak showered leptons. Consequently, the disparity in the definition of the H T and p V T observables means that the phase-space slices overlap at the internal integration boundaries when using the hadron-level definitions. Therefore, at the internal phase-space boundaries a small overlap results in a multimodal event weight distribution and so an increased event weight variance. Given the higher density of integration boundaries in the max(H T , p V T ) < 280 GeV region (see section 2.1), the size and variability of the dilution factor is larger.
This intrinsic feature ultimately limits the scalability of the sliced enhancement strategy, because should a sample with finer integration slices be required, the growing number of internal boundaries will induce an increased statistical dilution effect. Effectively, the scheme must sacrifice statistical precision in order to gain more control of the sample rate across the generated phase space.
The second key factor is a reduction of the negative-weight fraction in the 2.2.11 configuration. An overall inclusive reduction of the negative-weight fraction from ∼18% to ∼9% was achieved by neglecting sub-leading gluon spin and colour correlations in the MC@NLO matching. Furthermore, by using the 2 → 2 process to define the local NLO/LO reweighting K-factor for the 2.2.11 configuration, instead of the 2 → 4 process as done for the 2.2.1 configuration, the fraction of negative weights shifts from the high cross-section regions at low p V T and H T into the low cross-section tails. As a result, the statistical dilution in the highest cross-section regions is reduced, thereby significantly increasing the overall statistical precision of the sample.
Despite the superior statistical precision of the analytic enhancement strategy, and the versatility of the analytic form (sample rate can be tuned to suit the physics use case), the      T ) in black, as read from the left axis. The relative differential uncertainty of said crosssection for the three enhancement schemes is shown superimposed and should be read from the right axis. The first panel shows the differential efficiency factor resulting from the weight spread, whilst the second panel shows the negative-weight fraction, as a function of p V T and max(H T , p V T ).
subdomain sampling capability of the sliced enhancement strategy is still invaluable. With this in mind, a hybrid scheme is possible in which multiple analytically enhanced samples exclusive in their phase-space slicing can be combined. This is demonstrated in figure 14 in which the aforementioned max(H T , p V T ) analytically enhanced 2.2.11 configuration sample (red) is shown as a function of the invariant mass of the dimuon system (m µµ ) for pp → µ + µ − + jets, along with the resulting differential cross-section (black). Due to the off-shell suppression, the ability of the max(H T , p V T ) enhanced sample to populate off-pole-mass regions is limited.
Consequently, a supplementary 2.2.11 configuration using an (m V /100) 2 enhancement with a dimuon invariant mass requirement of m µµ > 120 GeV is generated. This is combined with the 2.2.11 max(H T , p V T ) enhanced sample by vetoing events below m µµ = 120 GeV in the max(H T , p V T ) sample. The resulting combined sample, as shown by the orange line in figure 14, has a smaller relative statistical uncertainty in the m µµ > 120 GeV region due to the larger statistical population, and the reduced statistical dilution factor in the m µµ > 120 GeV region. The negative-weight fraction is unaffected by the biasing and subsequent combination procedure, as demonstrated.

Impact on computing resources
The inclusive V + jets cross-section is sufficiently large that the computing resources required to match the dataset's integrated luminosity are significant and difficult to achieve  Figure 14. Differential cross-section of pp → µ + µ − + jets as a function of m µµ in black, as read from the left axis. The relative differential uncertainty for the max(H T , p V T ) analytic enhancement Sherpa 2.2.11 configuration and the m µµ + max(H T , p V T ) analytic enhancement Sherpa 2.2.11 configuration, is shown superimposed and should be read from the right axis. The first panel shows the differential efficiency factor resulting from the weight spread, whilst the second panel shows the negative-weight fraction, as a function of m µµ . across the wide range of phase space required for physics analyses. Moreover, as the LHC physics programme advances it continues to explore extreme regions of phase space often requiring computationally expensive higher-order strong and electroweak corrections. As shown in the previous section, this higher accuracy requires additional resources but these can be reduced through well-motivated approximations with minimal impact on the physics modelling. In this section, the computing resources for the Sherpa 2.2.11 and 2.2.1 configurations are compared in order to quantify the combined effect of the various improvements described in this paper.

Per-event CPU cost for production configurations
The per-event CPU time of the final production versions of the events and the alternative clustering scheme lead to shorter event generation times. In addition, the analytic enhancement described in section 4.2.1 is included and biases the sampled events. The 2.2.1 configuration contains on-the-fly scale and PDF variations, and was sliced into discrete samples in the max(H T , p V T ) variable. The CPU time of each for the max(H T , p V T ) slices was characterized independently. The mean values in seconds per event and HS06 seconds per event are shown in table 3. Each configuration was run five times and the standard deviation of the results is quoted as the trial error and amounts to less than 5% for all configurations.
As can be seen from the table, the event generation time for the high-max(H T , p V T ) 2.2.1 slices is considerably longer than the slices at lower values, since 2 → n processes with higher n dominate the higher-max(H T , p V T ) slices and are more computationally challenging. Since these multi-jet merged setups are primarily used in phase-spaces with jet selections, the intermediate to higher-max(H T , p V T ) slices are most relevant to analyses.

Generated effective luminosity
The effective luminosity of a generated MC dataset can be used as a metric to quantify the MC sample size and compare the performance of the different configurations. It is defined to be: where x b is the center of bin b of a generic observable, σ(x b ) is the total cross-section in bin b, and N (x b ) eff is the effective number of events in bin b. The effective number of events is computed from weighted events and defined to be:

JHEP08(2022)089
where w i corresponds to the generator event weights in the sample, and the sum extends over events that are contained within bin b.
In order to facilitate the comparison of all the Sherpa configurations in table 1, the total number of events in each generated sample is normalized to a common benchmark of 200 M events. Since the 2.2.1 configuration is separated into discrete samples based on the max(H T , p V T ) quantity, the number of events that populate each of the individual max(H T , p V T ) slices needs to be specified. For the purposes of these comparisons, the actual number of events generated by the ATLAS collaboration in each of the discrete max(H T , p V T ) slices is assumed, and interpreted as the overall demand from ATLAS. The fraction of events in each of the slices is shown in the last column of table 3. The sum of all the slices are then normalized to 200 M for comparison with the continuous enhancement samples.
The effective luminosity in each generated MC sample as a function of the boson transverse momentum and the total sum of jet transverse momenta (H T ) are shown in figure 15. The Sherpa 2.2.11 configuration provides an effective luminosity similar to that from the sliced Sherpa 2.2.1 set-up across a wide range of phase spaces, and exceeds the ATLAS Run 2 dataset's integrated luminosity for intermediate to high p V T and H T selections. When comparing the effective luminosity of the 2.2.11 and 2.2.1 set-ups in the low p V T and H T regions, the two configurations have comparable performance overall, with maximum differences of 30%. For high p V T and H T selections, the spread of weights and the negative-weight fraction are slightly higher in the Sherpa 2.2.11 set-up than in 2.2.1, which is reflected by the reduced effective luminosity in the tails of the Sherpa 2.2.11 set-up. However, both set-ups already exceed the dataset's integrated luminosity by over an order of magnitude. On the other hand, the extremely large spread of weights present in the differential enhancement Sherpa 2.2.11 configuration has a detrimental effect on the effective luminosity, which can be over two orders of magnitude lower than that of the Sherpa 2.2.11 configuration when using the analytic enhancement scheme. The differential enhancement configuration fails to reach the Run 2 dataset's integrated luminosity across the most of the phase space.
Besides providing comparable performance to the 2.2.1 setup the effective luminosity of the 2.2.11 set-up is more uniform than in the 2.2.1 configuration. As discussed in section 4.2.2, discrete slicing causes a migration between slices, which produces an artificially large spread of weights. This effect is most pronounced near the slice boundaries and degrades the effective luminosity in those regions When using a continuous enhancement method as done in the Sherpa 2.2.11 configurations, such an effect is no longer present, improving the uniformity of the effective luminosity across values of key variables used in physics analyses. This is clearly observable in the H T distribution in figure 15 around the slice boundaries 280 GeV and 1000 GeV.
Beyond the improvement in the generated effective luminosity, the Sherpa 2.2.11 configuration costs less than the 2.2.1 set-up from a CPU standpoint, despite the former containing additional LO QCD legs and virtual electroweak corrections. Using the mean values for the CPU cost and the relative number of events in each of the slices found in  configurations is 140 × 10 3 MHS06 seconds and 75 × 10 3 MHS06 seconds, respectively. This represents nearly a factor of two reduction in the computing resources needed to obtain the results shown in figure 15. While the improvements in generated effective luminosity arise largely from the improved numerical efficiency parameters discussed in section 4.2.2, the overall decrease in CPU cost originates from the various approximations that were shown in section 4.1 to reduce the per-event generation time by over a factor of three.
Looking towards the HL-LHC, the integrated luminosity of the data set is expected to be at least 3000 fb −1 . Assuming the same sampling pattern as given by figure 15, in which the MC effective luminosity matches or exceeds that of the data for p V T and H T greater than approximately 100 GeV, roughly 10 billion events will be required for each of the leptonic decay modes of the Z boson. Accounting for the leptonic W boson decay modes as well, roughly 330 billion events will be needed to match the data set luminosity at the HL-LHC. With the developments in the latest Sherpa 2.2.11 setup presented in this paper, the CPU cost for this would be 3.8 MHS06 years. Assuming an aggressive research and development (R&D) programme over the following decade, the CPU budget allocated to event generation is 3.2 MHS06 for the year of 2031 [77]. Therefore the latest Sherpa 2.2.11 setup would exceed the yearly budget by 16% while the Sherpa 2.2.1 configuration would exceed the yearly resources by 132%. Despite exceeding the yearly budget, the reduction in the computing resources represents an important milestone towards meeting the computing budget for the HL-LHC. Further CPU reductions aimed at reducing the costly additional features shown in section 4.1.2 are expected to bring the latest Sherpa 2.2.11 setup below the allocated yearly resources and enable the usage of a second alternative multi-jet merged V+jets setup, such as the MadGraph5_aMC@NLO+Pythia described in this paper.

Conclusions
The production of a single vector boson plus jets is an important process for study at the LHC and an integral part of the ATLAS physics programme. State-of-the-art predictions for single boson production are routinely used in comparisons with data measurements, and serve as an important ingredient for the modelling of backgrounds in measurements of other SM processes or searches for physics beyond the SM. As such, an accurate description of V + jets processes, along with good statistical coverage across a wide range of phase spaces, is crucial for the success of the ATLAS physics programme.
This paper highlights a series of recent modelling and computational improvements in the Sherpa event generator developed beyond the 2.2.1 set-up used by ATLAS for the last several years. The theoretical changes were shown to improve the description of data for a variety of measurements, and provide additional higher-order electroweak corrections for future measurements to probe. The major CPU bottlenecks in the event generation time were identified, and various approximations were implemented that reduce the perevent time by up to a factor of three. Together with a continuous statistical enhancement technique, the resources required to produce the Sherpa 2.2.11 configuration were reduced by up to a factor of two relative to the 2.2.1 configuration. In addition, the performance of the first alternative ATLAS multi-jet MadGraph5_aMC@NLO+Pythia configuration using FxFx merging with a similar perturbative accuracy to the Sherpa prediction was shown.
In summary, the MC configurations reported in this paper form the base standard of single vector boson plus jets simulation moving into the third run period (Run 3) of the LHC for ATLAS analyses. They also demonstrate that selective scale choices, appropriate phase space biasing algorithms, and negative weight treatments are appropriate techniques capable of reducing the CPU foot print of MC simulation moving into the HL-LHC era. However, although the changes made in the Sherpa 2.2.11 configuration will provide sufficient CPU savings such that a complementary higher order MC prediction (such as that presented by the MadGraph5_aMC@NLO+Pythia configuration) can be produced without increasing the total CPU foot print, the aforementioned improvements only represent the conservative R&D programme outlined in ref. [77]. Therefore, in order to reduce the total CPU usage of MC simulation to within the projected HL-LHC CPU budget, a more aggressive computational R&D programme in the field of MC simulation will be required moving into the future.

A.1 Final-initial dipoles
The eikonal factors appearing in the Catani-Seymour splitting functions [79] are changed as: This leads to an additional contribution to theK operators of the form 2 log 2 − x a,ij 1 − x a,ij .

A.2 Initial-final dipoles
The eikonal factors appearing in the Catani-Seymour splitting functions [79] are changed as: This leads to an additional contribution to theK terms of the form 2 log 2 − x a,ik 1 − x a,ik .

A.3 Initial-initial dipoles
The light-cone momentum fractions are modified as x i,ab → x i,ab +ṽ i , and the eikonal factors appearing in the Catani-Seymour splitting functions [79] are redefined as follows: The collinear term for q → q is changed according to: The collinear kernel for q → g is changed according to: The collinear terms for g → g are changed according to: The collinear kernel for g → q is changed according to: This leads to an additional contribution to theK operators of the form: