Parton Shower Uncertainties with Herwig 7: Benchmarks at Leading Order

We perform a detailed study of the sources of perturbative uncertainty in parton shower predictions within the Herwig 7 event generator. We benchmark two rather different parton shower algorithms, based on angular-ordered and dipole-type evolution, against each other. We deliberately choose leading order plus parton shower as the benchmark setting to identify a controllable set of uncertainties. This will enable us to reliably assess improvements by higher-order contributions in a follow-up work.


Introduction
General purpose Monte Carlo (MC) event generators [1][2][3][4][5][6] are central to both theoretical and experimental collider physics studies. Recent development of these simulations has seen improvements in various areas, both within perturbative calculations, through matching to fixed order [2,[7][8][9][10][11][12][13][14][15], combining higher jet mutliplicites [16][17][18][19][20][21][22], as well as the all-order resummation with parton showers [23][24][25][26] and also within the non-perturbative, phenomenological models [27,28]. While there are well established prescriptions on how to quantify the theoretical uncertainty of fixed-order calculations due to missing higher order contributions [29][30][31][32][33][34][35] 1 , no such general recipe exists for resummed calculations (see e.g. [36,37]), and parton shower algorithms in particular. Given the perturbative improvements, and the expected precision from data-taking at Run II of the Large Hadron Collider [38,39], the task of assigning theoretical uncertainties to MC event generators is becoming increasingly crucial. This also applies to validating new approaches against existing data, as well as using predictions to design future observables and/or collider experiments. Phenomenological studies, for example, indicate that MC event generators can be used even in primarily data driven methods to perform powerful analyses once theoretical uncertainties are under control [40,41]. It is therefore important to quantify the uncertainties associated with an event generator in a reliable way.
Uncertainties due to non-perturbative modelling have been addressed in [42] and [43], as well as the impact of the parton shower on reconstructed observables [44]. Various ambiguities and sources of uncertainty have been addressed within the context of other multi-purpose event generators as well; in particular recoil schemes [45,46] and parton distribution functions (PDF) [47][48][49] have so far been considered, both for pure showers and in the context of matched or merged samples, see e.g. [50]. All of these studies share a commonality in that they focus on a single source of uncertainty which is usually connected to the development/improvement studied. Contrary to this, the authors in [51][52][53][54] describe possible approaches to uncertainty handling for the Drell-Yan process. An even more systematic approach for how to handle the possible interplay between theoretical and experimental uncertainties can be found in [55]. Further in the direction of a systematic approach, CMS published a short guide on how to estimate MC uncertainties [56] and outlined some issues to address. Finally, new techniques of propagating uncertainties through the parton shower by means of an alternate event weight were proposed [57].
In the present work, we address uncertainties of parton shower algorithms within the Herwig 7 event generator [1,2]. Herwig 7 is a general purpose event generator that computes any observable at next-to-leading order (NLO) precision in perturbation theory automatically matched to a parton shower. It includes sophisticated modules for very different physics aspects ranging from interfaces for physics beyond the standard model and two independent parton shower algorithms [45,58], to a detailed modelling of multiple particle interactions [59][60][61].
It is our aim to develop a consistent uncertainty evaluation for event generators, and Herwig 7 in particular. This work is a first step in this direction, concerning the parton shower part and will be extended by further detailed studies in the context of higher order improvements and the interplay with non-perturbative, phenomenological models and parameter fitting. The present paper is therefore structured as follows: in Sec. 2 we classify all different types of uncertainties and their respective sources. We then argue as to why we start with a pure leading order (LO) plus parton shower (PS) study. The sources of uncertainty tested in this study are described in detail in Sec. 3. Our results are presented in Sec. 4 for e + e − and fully inclusive pp production, while our findings including additional jet radiation are described in Sec. 5. The results establish a baseline of a set of controllable uncertainties, that can then be used to quantify the impact of higher order corrections to be addressed in upcoming work. Finally, we present a summary and outlook in Sec. 6.

Sources of Uncertainty
For any general purpose event generator that is based on both perturbative input and phenomenological models, there are a number of different sources of uncertainty to be addressed: -Numerical: Computational precision and statistical convergence. This is clearly a limitation which can be overcome by investing enough computing resources and will hence not be addressed further. -Parametric: Quantities taken from measurements or fits beyond the event generator parameters. This includes masses, coupling constants, and PDFs, and the impact of these needs to be quantified separately and potentially on a process-by-process basis watching out for maximum sensitivity. -Algorithmic: The actual parton shower algorithm, matching and merging prescriptions, and phenomenological models considered. The last are not considered here, as we limit ourselves to the simulation available in Herwig 7. -Perturbative: Truncation of expansion series in coupling or logarithmic order. The main purpose of this work is to elaborate on quantifying these uncertainties in the case of leading order plus parton shower simulation, which will be motivated in more detail below. -Phenomenological: Goodness of fit uncertainties regarding parameters in the non-perturbative models. We will argue that a remaining spread of predictions obtained by fitting parameters for each of the variations of controllable perturbative uncertainties is able to quantify the cross talk to non-perturbative models and a genuine model uncertainty.
In this study we will address perturbative uncertainties in the parton shower algorithms as a first piece of the chain of variations to be done. We will use two different shower algorithms to benchmark the uncertainty prescriptions against each other and to point out further interesting differences. The results considered here will serve as further input to identify improvements of NLO matching and merging, to be addressed in a separate paper.
Phenomenological uncertainties will be subject to future investigations. However, we will point out first hints towards their influence by considering variations of the shower infrared cutoff in a selected number of cases. The reasoning to this is two-fold: On one hand, we want to stress the fact that parton level studies should typically be carried out with care, and their region of validity can by estimated by cutoff variations with large changes that indicate non-negligible hadronisation corrections. On the other hand, this fact also indicates how cutoff variations, along with other variations, may actually point to the possibility of quantifying otherwise unknown, generic, model uncertainties and the interplay with non-perturbative corrections.

Why Leading Order?
We solely consider LO plus PS simulation in this work. The motivation to do so is as follows: With fixed-order improvements it is clearly very hard to disentangle sources of uncertainty stemming from pure parton showering, and those which have been potentially improved by higherorder corrections. In order to quantify genuine parton shower uncertainties in an improved setting one would typically need to look at jets beyond those that received fixed-order hard process input (e.g. the second jet from a leading order configuration in a next-to-leading order matched simulation). Not only is this computationally unnecessary for the sake of studying only parton shower uncertainties, it also introduces slightly different shower dynamics, the differences of which, with respect to leading order, would also need to be quantified carefully. Additionally, it is our aim to show where and how fixed order input improves the simulation along with the expected reduction in uncertainty; stated otherwise: To use the NLO matched simulation in order to identify which of the non-first-principle variations considered in this work are indeed reliable estimators of theoretical uncertainty in the perturbative part of event generator predictions.

Different Algorithms or Uncertainties?
To quantify to what extent commonly used recipes are a sensible measure of uncertainties in parton shower algorithms the first step is a clear distinction of what possible sources exist within a fixed algorithm, and what differences should actually be attributed to the consideration of distinct algorithms. Looking at different algorithms, we obtain a strong cross-check on whether the uncertainties assigned to one algorithm are sensible, provided we consider algorithms that exhibit similar resummation properties. We will also show that changes to the algorithms that are naively expected to be subleading, can cause severe difference in the resummation properties. Similarly, kinematics parametrisations to convert on-shell partons to off-shell ones after multiple radiation are known to cause numerically significant differences [45,46,62].
Such details, as well as the choice of splitting kernels and evolution variable should not be considered a source of uncertainty within an algorithm but are details that fix a distinct algorithm; we therefore call them algorithmic uncertainties. An uncertainty band based on varying such details cannot serve as a systematic framework to quantify missing higher-logarithmic contributions. If differences between algorithms are not covered by variation of the scales involved, either the estimate of uncertainty or the resummation properties of the algorithms should be questioned.
The relevant scales for the study at hand are: the hard scale µ H (factorisation and renormalisation scale in the hard process); -the veto scale µ Q (boundary on the hardness of emissions); -the shower scale µ S (argument of α S and PDFs in the parton shower).
No a priori prescription can be obtained as to what these scale choices should optimally be; the first two are usually taken as 'a typical scale of the hard process', while the last one faces more constraints to guarantee resummation properties and the correct backward evolution [63] within the parton shower. Having made a central choice, we vary the scales by fixed factors to generate subleading terms with coefficients of order one as an initial guess on higher order corrections and phase-space effects. At least two parameters in our shower algorithms are typically obtained in the course of tuning to data, the strong coupling α s (M Z ), and the shower cutoff parameter. 2 Using the different tuned values (at least with the latter having, in general, a different meaning between the two showers), the predictions on parton level will differ, though fully simulated, hadronic events, will yield a comparable description of data. We argue that these differences should be evaluated carefully, but belong to a future study that will address the interplay with non-perturbative models in more detail.

Simulation Setup
We consider both parton shower modules available in Herwig 7, the default angular-ordered shower [58] and the dipole-type shower based on [8,45]; in addition to their default settings, which we have adjusted to make them as similar as possible by choosing the same p ⊥ cutoff and α s running (the 'baseline' settings for this work), we consider a number of modifications mainly outlined in Sec. 3, all of which constitute different algorithms in the sense outlined above. The two showers are very different in their nature: The angular-ordered, QTilde, shower evolves on the basis of 1 → 2 splittings with massive DGLAP functions, using a generalised angular variable and employs a global recoil scheme once showering has terminated; its available phase space is intrinsically limited by the angular-ordering criterion, resulting in a 'dead zone', though it is able to generate emissions with transverse momenta larger than the hard process scale and so typically an additional veto on jet radiation is imposed (see Sec. 3 for more details). The dipole based shower, Dipoles, uses 2 → 3 splittings with Catani-Seymour kernels with an ordering in transverse momentum and so is able to perform recoils on an emission-by-emission basis; the splitting kernels naturally require the two possible emitting legs of each dipole to share their phase space and there is no a priori phase-space limitation, but the available phase space is controlled by the starting scale of the shower.
Using the baseline, we find very similar predictions despite the very different nature of these algorithms. As an example we show in Fig. 1 the predictions for the Higgs p ⊥ spectrum at an LHC with √ s = 13 TeV. The only difference between the algorithms we observe in the very low p ⊥ region where the interplay with the treatment of the remnant and intrinsic transverse momentum smearing becomes important. For future reference, we have also included results running the showers at their default settings to highlight what level of interplay with tuned values and non-perturbative models can be expected. To be  more precise, we use a two-loop running, MS, α s including CMW correction [64] with α CM W s (M Z ) = 0.126 (which corresponds to α MS s (M Z ) = 0.118), and the MMHT2014 NLO PDF set [65] with five active flavours, interfaced through LHAPDF 6 [66], as far as initial state radiation is concerned. 3 Hard processes are simulated at leading order (see the previous discussion), using the Matchbox infrastructure powered by amplitudes generated by Mad-Graph5 aMC@NLO [11]. In e + e − collissions, we consider di-jet production; at hadron colliders, in addition, we consider stable Z-boson Drell-Yan production, (e + e − j) production within the mass window 66 GeV < m ll < 116 GeV around the Z mass, as well as production of a stable, 125 GeV, Higgs accompanied by zero or one jet. In the presence of additional jets in the hard process we use FastJet [67,68] to perform the generation cuts; analyses are performed throughout using the Rivet framework [69], with analysis modules based on existing experimental and generic Monte Carlo implementations. In e + e − collisions, where we choose a centre of mass energy of √ s = 100 GeV as baseline, we reconstruct jets with the Durham algorithm [70], while the hadron collider setup reconstructs anti-k ⊥ jets with a radius of R = 0.4 within a rapidity range |y| < 5 and a transverse momentum threshold of p ⊥ > 20 GeV. Parton level without multiple interactions and hadronisation is employed, and partons up to and including b-quarks are treated as massless objects. Both parton showers mentioned above use a p ⊥ cutoff prescription with a value of µ IR = 1 GeV. Electroweak parameters are kept at their default values.

Consistency Checks
The ability to compare different algorithms puts us into the unique position of performing a number of consistency checks for the uncertainty estimate that we advocate. In particular, perturbative error bands should cover algorithmic discrepancies, if these algorithms are expected to deliver the same accuracy. If that is not the case then the algorithm at hand is questionable. Furthermore, by construction the shower approximates emissions in the soft and collinear region. If we force the shower to produce hard emissions, larger uncertainties are to be expected by a controllable prescription. Another point is the possibility of double counting hard emissions. The shower should not cover phase-space regions that are already covered by the hard process input. This property is typically reflected in demanding that observables that receive input at fixed order are not significantly altered by subsequent showering. Clearly, the definition of 'region', which in this case is covered by the veto scale on hard emissions (see Sec. 3 for a more detailed discussion), is again only precise to the level of accuracy covered by the parton shower and varying this boundary should serve as a measure of missing logarithmic orders. We emphasise that a boundary chosen to be far away from the correct ordering behaviour may introduce severe double counting issues, ultimately impacting on a resummation of a tower of logarithms which is not typical to the process, i.e. not encountered in any higher order corrections to an observable considered. Furthermore, the perturbative uncertainties for observables in phase-space regions that do not receive logarithmically enhanced contributions should be driven by the hard scale alone, while the other scales have negligible impact. Logarithmically sensitive observables, on the other hand, should be altered by the parton shower and the uncertainties should be driven by all possible scale variations together. The setting where this is least clear is pure jet production, which we will address amongst other 'jetty' processes in Sec. 5.

Phase-Space Restrictions and Profile Choices
The quantity central to parton showers is the splitting kernel. Its exponentiation gives rise to the Sudakov form factor, which regulates the divergence of the splitting kernel for soft and/or collinear emissions. On top of this, there are two further crucial ingredients (besides formally subleading, though not necessarily small issues like kinematic parametrisations): The evolution variable chosen, and the phase space accessible at a fixed value of the evolution variable. Emissions are typically further subject to an upper bound on their hardness. This cannot be directly deduced from a priori principles but should be chosen in the order of magnitude of the typical hardness scale of the process being evolved to avoid the double counting issues mentioned before.
The central point we are concerned with in this section shall be summarised in a simplified model of final state radiation. Quite generally, we have to consider three different scales: a hard scale K ⊥ defining the phase space available to emissions at a fixed transverse momentum; a veto scale Q ⊥ defining the maximum transverse momentum available to emissions; and the kinematic limit of transverse momentum, R ⊥ . We consider the p ⊥ spectrum of a single soft emission with splitting kernel (possibly after an appropriate transformation of the evolution variable into a transverse momentum) where C i is the colour factor associated with the emitting leg. The longitudinal momentum fraction, z, has limits that read in the presence of a hard scale K 2 ⊥ . With emissions weighted by κ, an arbitrary function of a veto scale Q 2 ⊥ , we find a p ⊥ spectrum of the form dP dp 2 ⊥ dz with the Sudakov form factor R ⊥ denotes the scale that makes all of phase space available to emissions, while we denote the infrared cutoff by µ 2 IR (we have not shown the zero p ⊥ , non-radiating event contribution). Once a hard cutoff κ( is chosen, this setup is known to reproduce the right anomalous dimensions. It has to be applied to a full evolution in a hierarchy Q 2 ⊥ is chosen and the form of the z boundaries being crucial to produce the correct logarithmic pattern [25,45]. Instead, if one desires to make all of the phase space available to parton shower emissions, K 2 ⊥ = R 2 ⊥ is chosen and no other than the kinematic constraint p 2 ⊥ < R 2 ⊥ is in place. 4 We have here considered the freedom of ensuring suppression of such emissions by an arbitrary function κ. We call this weighting function a profile scale choice. One of the subjects of the present study is to identify sensible profile scale choices; we stress that such a choice is of algorithmic nature and not an intrinsic source of uncertainty. We will consider the following choices, depicted in Fig. 2: , which is expected to reproduce the correct tower of logarithms;

and quadratically interpolating in between.
This profile is expected to reproduce the correct towers of logarithms, and switches off the resummation smoothly towards the hard region (currently we use ρ = 0.3 5 ): which is also referred to as damping factor within the POWHEG-BOX implementation [7]; and power shower: imposing nothing but the phase-space restrictions inherent to the shower algorithm considered.
Different combinations of R 2 ⊥ and K 2 ⊥ can be achieved within the two showers. In particular, the dipole shower is able to populate the region up to K 2 ⊥ = R 2 ⊥ ('power 4 Typically, the splitting kernel for exact phase-space factorisation is then accompanied by a damping factor ∼ 1 − p 2 ⊥ /R 2 ⊥ towards the edge of phase space. 5 In principle ρ should be varied with a reasonable range, though we do not expect a big effect from this variation, given the similarities between ρ = 0.3 and ρ = 0 corresponding to the theta profile; see the following sections. shower'), while, for 2 → 1 processes at hadron colliders the angular-ordered phase space, by construction, imposes K 2 ⊥ = Q 2 ⊥ to be the mass of the singlet which is produced. The leading logarithmic contribution of the z integration at this simple qualitative level is given by We shall illustrate the impact of the profile scale choice κ on the Sudakov form factor by considering a fixed α s , and evaluate To obtain the desired resummation properties, namely a number of limitations on κ and the other scale choices need to be imposed. Clearly, the limiting cases for small and large transverse momenta need to be reproduced; While this is the case for all of the profiles we considered in this study, it is not sufficient to produce the desired tower of logarithms. Imposing the former restriction we still require that: ⊥ for the term involving the derivative of κ to become subleading.
Specifically the first restriction is only guaranteed by either the angular-ordered phase space which naturally imposes this restriction, or the restricted phase space chosen for the dipole shower. The second restriction also excludes choices of κ providing a ratio of logarithms to effectively replace K 2 ⊥ by Q 2 ⊥ in the first term in Eq. 7. To this extent, we conclude that only those profiles that are narrow smeared versions (in the sense of varying only in a region where Q 2 ⊥ /q 2 ⊥ is of order one) of a theta-type cutoff will provide the proper tower of logarithms. Choices such as the resummation profile are desirable to avoid discontinuities introduced by the theta-type cutoff which are beyond the accuracy considered, while keeping the resummation properties of the parton shower; the profile we consider here is only one such kind, and there is no restriction on the exact form considered. The name 'profile' is chosen since the treatment of the hard scale we consider here closely resembles prescriptions on scale variations within the analytic resummation context [71].

Identifying a 'Resummation Scale'
The hard veto scale Q 2 ⊥ is the scale that, when considering transverse momentum spectra as outlined in the previous section, is closest in role to a resummation scale in analytical resummation. However, it is not typically the same as a shower starting scale. Though in our case this statement is true for the dipole shower, no such notion exists for the angular-ordered shower where typically the masses of the emitting dipoles set the shower starting scale owing to the angular-ordered phase space. In the latter case, emissions exceeding the transverse momenta of jets present in the hard process are possible and an additional veto on transverse momenta generated by the shower is applied; the value of this veto is, in this case, the analogue of the starting scale of the dipole shower. The resummation scale of the typical q ⊥ -resummation can thus not directly be related to an analogue hard scale present in different shower algorithms, especially when they evolve in a variable different from the transverse momentum and hence built up the full spectrum from multiple, differently ordered emissions.
For both the showers considered here, transverse momenta of parton shower emissions are expected to be limited or suppressed by the scale Q 2 ⊥ , which on very general grounds should thus be of the order of a typical scale of the hard process, i.e. the factorisation scale. As with fixed-order calculations the residual dependence on Q 2 ⊥ is expected to become smaller as more and more logarithmic orders are incorporated. This implies a pattern of scale compensation through successive logarithmic orders, which a parton shower can typically only guarantee at the level of at most next-to-leading logarithms (NLL).

Scale Variations
Having chosen a reasonable profile scale and value of K 2 ⊥ ∼ Q 2 ⊥ , the leading behaviour of the Sudakov exponent takes the well-known form where the highest level of accuracy one can hope for with coherent evolution is NLL accuracy, neglecting subleading colour correlations, at least for some observables and typically limited phase-space regions [64], along with a two-loop running of α s . In addition to variations of the scales in the hard process, µ R/F = ξ H µ R/F , we vary both the hard veto scale, µ Q = ξ Q Q ⊥ , and the arguments of α s and the PDFs in the parton shower splitting kernels, µ S = ξ S q ⊥ . We constrain the number of possible variations to be ξ ∈ [1/2, 1, 2]. This spans a cube of, ξ H ⊗ ξ S ⊗ ξ Q , 27 combinations. All these choices are connected to logarithmic scale choices. There is therefore no a priori way of reducing their number. We emphasise that in principle only the full 27-point envelope constitutes a comprehensive uncertainty measure. We therefore always produce the full envelope along with envelopes for each of the individual variations. Using this it is possible to observe which scale drives the overall uncertainty in a particular region of phase space. While one clearly expects the variation of Q 2 ⊥ to cancel out to the level of NLL accuracy [72] (if this is indeed resembled by the parton shower), the situation is less clear for the other variation and different proposals have been made as to what extent the contribution at the level of NLL contributions should be canceled (see e.g. [73] for a discussion) or otherwise considered as a probe of where precisely higher accuracy of the shower is missing. We do not consider introducing any terms that cancel these variations to the NLL order, and postpone a detailed analysis of this issue to future work. We do, however, analyse these variations as we are convinced that they are another clean handle on controlling where we expect, specifically, soft emissions and contributions by the hadronisation model to dominate. A recent Les Houches study [74] has also shown that, when not taking into account the full variations of this kind, discrepancies between different shower algorithms, which are expected to be similar, are not covered within these variations.

Real-life Constraints
Besides the unclear definition of a resummation scale in the context of different shower algorithms, another word of caution needs to be raised when considering the hard veto scales: There are cases in which there is no meaningful variation as the hard scale is a fixed quantity such as the mass of an independently evolving final state emitting system, e.g. showers in e + e − → hadrons. It is not clear how one would quantify the respective shower uncertainty in this case, besides looking at shape differences encountered at different centre-of-mass energies of the e + e − collider to quantify the scaling of the predictions with respect to ratios of the hard scale to the infrared sensitive quantity considered. Already this observation clearly marks the fact that no claim of a full and well-understood uncertainty recipe can be made at this point, but only are we able to perform initial steps in this direction. Similarly for the power shower there is no meaningful variation of µ Q . It can also happen, as is the case for the angular-ordered shower, that the algorithm chosen naturally imposes an upper bound on the hardness of the emission. In the case of Drell-Yan type processes, the angular-ordered shower, for example, will only allow for a down-variation of Q 2 ⊥ and is thus questionable as to whether this variation in these cases is the right measure.
As with the small scales probed in the evolution of the parton shower, µ R,F variations in the shower may actually encounter regions where typically some cutoff or freezinglike behaviour is imposed to both, the running of α s and the parton distributions functions, which may result in interesting dynamics when variation of such small scales is used to infer uncertainties -a variation of the freezing prescription may thus be desirable, as well.

Clean Benchmarks
To begin exploring the uncertainties that arise from the considerations of Sec. 3 we start by studying 'clean benchmarks', i.e. hard processes with the least number of legs: e + e − annihilation, and Drell-Yan-type 2 → 1 processes producing either a Z or Higgs boson. For the case of e + e − collisions, the notion of a hard veto scale does not directly exist owing to the fact that the phase-space boundary and relevant hard scale coincide. However, we can compare variations of the collision energy and quantify this impact at the level of normalised distributions to acquire a handle on variations of the logarithmic structure similar to hadron-hadron collisions 6 . On top of the three scales µ H,S,Q described above, we vary the infrared cutoff of the shower by a factor of 1/2 and 2 for the e + e − setting, in order to obtain a first indication of how much dynamics of the shower is expected to be absorbed into hadronisation effects; notice that varying the argument of α s may serve a similar purpose.

Final State Showers
e + e − → qq provides the clean environment to study final state radiation. Note that in this case the power and theta profile coincide, which is also our choice in the following.
The Thrust distribution, Fig. 3, shows good agreement between showers; this is true both for the central prediction and its variations, and shows that they possess the same resummation accuracy. Differences that do emerge between the showers are related to cutoff effects and nonradiating events in the region towards T = 1; these offer no insight into the resummation properties. A further difference emerges from the dead-zone of the QTilde shower, however this is a region that can be supplemented by using matching or ME corrections. For this observable we note that the √ s and µ S variations are similar in magnitude. In Fig. 4 we show results for the integrated two-jet rate; the uncertainties are dominated by √ s as well as cutoff variations at small y cut . Again, the overall uncertainties are comparable between the showers; as expected, we obtain large uncertainties in the small y cut region, which is dominated by hadronisation effects.

Initial State Showers
As far as initial-state showering is concerned, we investigate a gluon-initiated process pp → H (in the large-m t effective theory), and a quark-initiated process pp → Z; these particles are set stable for simplicity. Inclusive observables, such as the rapidity of the resonance in this case, are quantities expected to be well described by the matrix element, and thus should be unmodified by the parton shower; this is reflected in Figs. 5 and 7 where both showers display good agreement, with uncertainties mainly driven by the hard process variation. The differences in magnitude should be attributed to different cou-  plings for each process, with envelope shape differences attributed to the PDFs.
The jets in these samples are generated solely from the parton shower; therefore the p ⊥ of the leading (hardest) jet directly probes the impact of the profile scales.   Comparing Figs. 6-10, we find that the different profile choices exhibit significantly different behaviours, both amongst themselves as well as between different showers.  The resummation and theta profiles, as intended, yield comparable results in terms of central predictions and uncertainties and across the different shower algorithms. This clearly shows that we can indeed expect the same resummation accuracy using these profiles. The variations towards high p ⊥ for the theta profile expose the effect of the different phase-space limitations. In the QTilde shower the upward variation of the scales (µ Q ) is ultimately irrelevant, as there are no possible emissions at this scale; looking at the dipole shower one sees the effect of such variations. However, this is not the case for the resummation profile whose interpolating region is sensitive to such variations, and displays similar variations between showers. For large transverse momenta, the uncertainties should reflect the case that parton-shower emissions in these regions are unreliable. We observe this for both the theta and resummation profiles and to some extent for the hfact choice, though the variation is considerably smaller than indicated by the theta-type choices. The power shower, however, shows no increased uncertainty and in fact is dominated by variations of µ H . Given the marked differences in the hardness of jets between the two showers, the power shower seems to offer no handle towards the assessment of shower uncertainties. We can also clearly observe the intrinsic limitation of the QTilde shower phase space,  that in this case is not able to populate high-p ⊥ emissions which ultimately needs to be supplied by matching and/or matrix element corrections similarly to the 'dead zone' effect in e + e − collisions. We therefore conclude that within this basic setting the showers and profile scale choices do admit the expected behaviour, and the two showers using theta-type profiles exhibit similar central predictions and uncertainties.

Jetty Processes
Having established shower uncertainties using simple benchmark processes, the next simplest examples are the processes studied in Sec. 4 with an additional hard emission off the hard process, e.g. H/Z plus one (inclusive) jet. In addition, pure di-jet production is investigated because of the absence of a colour singlet setting a hard scale and the related ambiguities in possible hard scale choices. We do not investigate the shower cutoff as we shall now focus on properties which are not expected to be significantly altered by hadronisation effects.
As with the clean benchmarks presented in Sec. 4, we consider variations of the three relevant scales discussed in Sec. 3, changing them by factors 1/2 and 2, respectively, to span a cube of a total of 27 variations; we will also perform  cross-validations between both available showers. From arguments given in Sec. 3 we expect observables and/or regions in phase space where the uncertainty is mainly driven by ξ H , i.e. in the case of inclusive observables. As all uncertainties connected with scale choices stem from logarithmic arguments there is no a priori way to exclude any of the possible variations when determining shower uncertainties, unless one is able to identify scale compensation patterns between the different scales for which we see no evidence in the setting considered in this study. For the rapidity distributions of the Higgs and Z boson, shown in Fig. 11 and 12, respectively, we find that the distributions are consistent with the prediction of the hard matrix element, as is expected from such inclusive quantities; this applies to all of the profile scales considered, with the power shower showing larger deviations in the forward region. Scale variations affect these observable mainly through variations present in the hard process.
Similarly to the rapidity distributions, we expect the p ⊥ -spectra of the leading jet to be predicted mainly by the hard matrix element, according to the consistency conditions discussed in Sec. 2.5. In Fig. 13 and 14 (H and Z production, respectively) we show the results for the QTilde shower, while Fig. 15 and 16 contain our findings for the Dipole shower. We again find that the uncertain- ties are dominated by the variation of the hard scale. For both showers the resummation profile is consistent with the hard matrix element prediction, except for jets close to the threshold where cut migration effects are being probed 7 . For the hfact profile with the QTilde shower we find a spectrum compatible with the one anticipated by the matrix element; for the dipole shower, a significantly harder spectrum is obtained. A similar, but even more dramatic picture emerges for the power shower setting. The spread of predictions for the QTilde shower is smaller than the spread for the dipole shower, owing to the intrinsic limitations of the phase-space volume available to angular-ordered emissions as already pointed out in the previous sections. The combinations QTilde plus power, and Dipole plus hfact or power contradict the criterion of controllable showering, which in this case is expected to not significantly alter the jet p ⊥ spectrum. Combined with the empirical findings of Sec. 4, we will therefore not consider the power shower profile choice any further. Turning to more exclusive observables 8 , we consider the angular separation between the boson and the lead-  LO ⊕ PS (pow) ME Fig. 13. Transverse momentum of the leading jet for Higgs plus one (inclusive) jet as computed by the QTilde shower for resummation (red), hfact (lime) and power (brown) profile compared to the ME (black) prediction. Top ratio plot: same as before. Other ratio plots: resummation, hfact respectively power profile with full error band vs. variation of only either µH , µQ or µS.
ing jet ∆R (H/Z)j , which probes both matrix element and LO ⊕ PS (pow) ME Fig. 14. Transverse momentum of the leading jet for Z plus one (inclusive) jet as computed by the QTilde shower for resummation (red), hfact (lime) and power (brown) profile compared to the ME (black) prediction. Top ratio plot: same as before. Other ratio plots: resummation, hfact respectively power profile with full error band vs. variation of only either µH , µQ or µS. LO ⊕ Dipoles (pow) ME Fig. 15. Transverse momentum of the leading jet for Higgs plus one (inclusive) jet as computed by the Dipole shower for resummation (red), hfact (lime) and power (brown) profile compared to the ME (black) prediction. Top ratio plot: same as before. Other ratio plots: resummation, hfact respectively power profile with full error band vs. variation of only either µH , µQ or µS. LO ⊕ Dipoles (Pow) ME Fig. 16. Transverse momentum of the leading jet for Z plus one (inclusive) jet as computed by the Dipole shower for resummation (red), hfact (lime) and power (brown) profile compared to the ME (black) prediction. Top ratio plot: same as before. Other ratio plots: resummation, hfact respectively power profile with full error band vs. variation of only either µH , µQ or µS.
shower dominated regions in a continuous observable: Matrix element emissions in this case can only populate the phase-space region ∆R (H/Z)j ≥ π. The region below is solely filled by the parton shower, typically operating at the boundary of validity of the underlying approximation as this phase space requires the shower to produce a hard, large-angle emission. Within the definition of controllable and consistent uncertainties, we therefore expect large uncertainties for ∆R (H/Z)j ≤ π, while the shower should reproduce the matrix element dynamics above. Results for the QTilde shower are shown in Fig. 17 and 19 (H and Z production, respectively) and for the Dipole shower in Fig. 18 and 20. We place particular emphasis on the comparison of the resummation and hfact profiles. For all processes/showers we find that hfact predicts a small uncertainty band and produces slightly more hard jets; the latter can be attributed to the available phase space, while the former can be obtained by analysing Eq. 7, stressing the fact that the region in which the derivative of the profile is varying significantly extends over a larger region than for the other profiles, though with less overall variation implied. Contrary, and matching the expectations motivated by the logarithmic structure, the uncertainty for the resummation profile in the small ∆R ( breakdown shows how different subsets of variations constitute the full uncertainty band. Besides the simple domination of the uncertainty by one variation, other regions of phase space show that the uncertainty is strongly underestimated by considering the variations separately. We therefore argue that only the full, combined, scale variation produces a reliable error band. As another probe of the interaction of shower emissions with the hardest jet, we consider k ⊥ -splitting scales, particularly the one in which an event with two jets would turn into an event with one jet as the jet p ⊥ threshold passes through the scale obtained. These observables have also been proven to be accessible to analytic considerations [75], for which comparisons to full parton showers are highly desirable though are beyond the scope of this paper. In Fig. 21 we show our results for the QTilde shower for Higgs production 9 . Once again we compare the resummation profile choice with the hfact profile. It is noteworthy that the hfact profile introduces a strong change in the shape of the Sudakov peak, on top of the harder spectrum already observed for the first jet; besides the tail effects we are therefore concerned that profile scale choices along these lines may significantly impact the resummation properties of the parton shower, as may already be expected from the  arguments presented in Sec. 3. We therefore conclude that, even with intrinsically restricted phase space, the hfact profile does not provide controllable uncertainties and will not be taken further into account in this study. We also use Fig. 21 to perform an comprehensive breakdown of the different variation directions in the 'cube' of possible variations, showing that no individual variation actually covers the full dynamics present. For LO plus PS simulations, we therefore argue that the full band is taken into consideration and improvements in the context of matching and merging will be subject to future work. We have so far considered processes with a colourless, massive object that dominates the scale hierarchy at hand, and, even in the presence of an additional jet, makes the dynamics rather insensitive to additional radiation (as far as this radiation is confined to reasonable phase-space regions as identified above). A process where this is clearly not the case is pure jet production in hadron collisions, which also probes different colour structures that have not been encountered in the hard processes considered thus far. Owing to the back-to-back configuration at lowest order, we expect considerable parton-shower effects in comparison to the hard matrix element for a number of observables and expect to make a more detailed comparison to fixed order only once NLO improvement has been incorporated. Nevertheless, we can still test as to what extent the shower variations match up to expectations  in signalling regions where the prediction should generally be considered unreliable. We also test, once more, if the two showers are comparable within their uncertainties. Following the previous arguments, we only consider the resummation profile, with a hard scale again given by the jet p ⊥ . Sample results comparing to the hard matrix element are shown in Figs. 22, 23 and 25, which show that the two showers preform in a very similar way both in their central predictions and variations; they also show that qualitatively we find a behaviour similar to the singlet plus jet benchmarks as if we had replaced the hard, colourless, object with a jet as hard probe. Quantitatively, however, we observe significant changes in rates for the second jet, which need to be confronted with the impact of cut migration as well as the impact of higher order corrections. We also note that choosing the hard veto scale in this setting has a significant impact on showered results.
With the transverse momentum of the third jet and the 2 → 3 resolution shown in Figs. 24 and 26 we consider purely shower driven quantities; both of these nicely reveal that the two showers, together with the resummation profile, are perfectly compatible with each other, exhibiting the same resummation accuracy.

Conclusions and Outlook
We have performed a comprehensive and detailed study of the sources of uncertainty in parton showers, utilising the two parton-shower algorithms available in Herwig 7. We have investigated different choices of profile scales to approach the boundary of hard emissions, as these are highly relevant to effects that appear in the context of NLO plus PS matching. We have systematically categorised the sources of uncertainty and outlined their interplay with other simulation components, putting this study into context of a bigger work programme to eventually establish uncertainties for event generators in total. Focussing on the perturbative, parton-shower part, of the simulation, we have deliberately chosen LO plus PS calculations to establish a baseline of controllable and consistent variations that will allow us to subsequently identify improvements and reduction in these uncertainties as higher order corrections are included. We have found that profile scale choices are very constrained when applying consistency conditions on both central predictions (which should not alter input distributions of the hard process) and uncertainties (with large uncertainties to be expected in unreliable regions or regions dominated by hadronisation corrections), as well as stable results in the Sudakov region. Particularly the hfact and power shower configurations do not admit results compatible with these cri- The subsequent ratio plots compare the full error band to certain subsets of scale variation choices, namely: variation of µH , µQ, µS, µH with ξQ and ξS fixed to 2, µQ with ξH and ξS fixed to 2, µS with ξQ and ξH fixed to 2, µH with ξQ and ξS fixed to 0.5, µQ with ξH and ξS fixed to 0.5, µS with ξQ and ξH fixed to 0.5, µS and µH in a correlated manner and µS and µH in an anti-correlated manner. Fig. 22. Transverse momentum of the first jet in (inclusive) di-jet production as computed by the QTilde (red) and Dipole (blue) shower with the resummation profile compared to the ME (black) prediction. Top ratio plot: same as before. Subsequent ratio plots: QTilde (second) and Dipole (third) with full error band vs. variation of only either µH , µQ or µS. Transverse momentum of second jet dσ/dp ⊥ (jet

2)
[pb/GeV]  Fig. 23. Transverse momentum of the second jet in (inclusive) di-jet production as computed by the QTilde (red) and Dipole (blue) shower with the resummation profile compared to the ME (black) prediction. Top ratio plot: same as before. Subsequent ratio plots: QTilde (second) and Dipole (third) with full error band vs. variation of only either µH , µQ or µS.
teria. Utilising a resummation profile, which is very close to the theta cutoff for hard emissions as implemented in previous algorithms, we find that the angular-ordered and dipole-based shower algorithms are compatible with each other, both in central predictions and uncertainty claims, despite their very different nature.