Revisiting Radiation Patterns in $e^+e^-$ Collisions

We propose four simple event-shape variables for semi-inclusive $e^+e^- \to 4$-jet events. The observables and cuts are designed to be especially sensitive to subleading aspects of the event structure, and allow to test the reliability of phenomenological QCD models in greater detail. Three of them, $\theta_{14}$, $\theta^*$, and $C_2^{(1/5)}$, focus on soft emissions off three-jet topologies with a small opening angle, for which coherence effects beyond the leading QCD dipole pattern are expected to be enhanced. A complementary variable, $M_L^2/M_H^2$, measures the ratio of the hemisphere masses in 4-jet events with a compressed scale hierarchy (Durham $y_{23}\sim y_{34}$), for which subleading $1\to 3$ splitting effects are expected to be enhanced. We consider several different parton-shower models, spanning both conventional and dipole/antenna ones, all tuned to the same $e^+e^-$ reference data, and show that a measurement of the proposed observables would allow for additional significant discriminating power between the models.


Introduction
General-purpose event generators (see [1][2][3][4] for recent reviews) aim to give a complete description of high-energy interactions, down to the level of individual particles. They are extensively used as research vessels for exploring new approaches to phenomenological questions within and beyond the Standard Model, and they are relied upon to provide explicit simulations of high-energy reactions in a broad variety of contexts. The achievable accuracy depends both on the inclusiveness of the chosen observable and on the sophistication of the calculation itself. An important driver for the latter is obviously the development of improved theoretical models; but it also depends crucially on the available constraints on the remaining free parameters. Using existing data to constrain these is referred to as generator tuning.
The main experimental reference for final-state radiation and fragmentation studies is the process e + e − → Z/γ * → hadrons. Prior to and during the LEP era, a large set of event measurements were performed (see, e.g., [5][6][7][8]) and used to constrain the shower and hadronization models of the day, such as HERWIG [9], JETSET/PYTHIA [10], and ARIADNE [11]. Most of the relevant analyses were corrected to the particle level and have subsequently been encoded in RIVET [12]. This makes it straightforward to apply almost the same comprehensive battery of tests to any model today 1 . A main question we wish to examine in this study is whether the existing constraints are sufficient in the context of present-day models. The reasons to ask this question are threefold.
Firstly, current parton-shower models are, in fact, quite sophisticated, at least as far as pure finalstate radiation effects are concerned. For instance, they all include colour coherence (though the way this is achieved differs from model to model), the inclusion of dominant contributions of two-loop splitting kernels by suitable renormalization-scale choices (e.g., µ R ∝ p ⊥ ), and effects of momentum conservation (again with individual models employing different "recoil" strategies), and several even incorporate further subleading aspects such as gluon-polarization or helicity-conservation effects. Their precision is therefore typically much better than their nominal "leading-logarithmic" (LL) labels indicate; in comparison with the experimental uncertainties at LEP, differences on observables dominated by LL effects are typically too small to show up clearly (cf., e.g., [24]). It is therefore interesting to study whether more information can be extracted from variables designed to remove LL contributions and isolate specific subleading aspects.
Thirdly, the desire for reliable descriptions of jet production and jet substructure for signal and background estimates at the LHC is causing the subleading aspects of shower models and matrixelement matching strategies to come under increasing scrutiny, in particular in the context of the interplay between matching and tuning. While all shower and matching strategies are designed to have the same leading behaviours, they do exhibit differences at subleading levels, making subleadingsensitive observables especially interesting for cross checks.
In this paper, we are interested mainly in inclusive four-jet observables sensitive to coherence properties and to effective 1 → 3 splittings. The starting points are the θ * variable proposed in [31], θ 14 and M 2 L /M 2 H proposed in [45], and the energy correlation functions proposed in [46]. The former two, θ * and θ 14 , are designed to be sensitive to the coherent emission of a soft fourth jet from a threeparton state (with cuts restricting the opening angles of the jets, as will be described below), with a radiation pattern dictated by colour coherence. In particular, they can be used to test whether the angular distribution of the fourth jet is well described by a three-parton system represented by partons / dipoles / antennae, and how this description depends upon the choice of shower ordering variable. The latter two variables, M 2 L /M 2 H (the ratio of hemisphere masses) and the energy correlation functions, have sensitivity to the effective description of 1 → 3 splittings and the energy spectrum of the fourth jet, respectively, as will be discussed below. For all observables, we impose an explicit cut on the Durham k T resolution scale of the fourth jet, y 34 > 0.0045 (corresponding to ln(y 34 ) > −5.4), thus restricting it to be in the perturbative domain and avoiding possible contamination from B decays.
The salient properties of each shower model will be summarized briefly in section 2. As a cross check, and to ensure a fair comparison between the models, we tune all of them to the same reference data in section 3. The main study of soft-jet and event-shape variables is presented in section 4. Finally, we round off with conclusions in section 5.

Theory Models
Parton showers are not guaranteed to respect coherence. For example, in a traditional shower based on the collinear DGLAP formalism [47][48][49], the linear sum of n DGLAP splitting kernels (one for each parton in an n-parton state) can substantially overcount the amount of wide-angle soft radiation in comparison, e.g. [50], with (n+1)-parton matrix elements. Physically speaking, if we approximate the radiation from an n-parton ("colour-multipole") state by the incoherent sum of n monopole terms, there is a substantial risk that highly important destructive-interference effects will be neglected, leading to double counting of soft gluon emission.
It was found in the early eighties [51], that DGLAP-based parton showers can nonetheless be brought to agree with the correct soft limits of QCD (up to azimuthal averaging effects), by choosing the shower ordering variable to be proportional to energy times angle. This is the basis of the angularordered showers [25] in HERWIG++, which is the first shower model we include in our study.
An alternative DGLAP-based shower model is that of PYTHIA 8, the second model included in our study. In this framework [26], small opening angles are reinterpreted as corresponding to highly boosted colour dipoles. The resulting Lorentz-boosted DGLAP radiation patterns combined with an ordering in transverse momentum of the dipoles are used to obtain approximately coherent results.
A more formal definition of showers based on colour dipoles can be obtained by replacing the DGLAP splitting kernels by intrinsically coherent radiation functions such as Catani-Seymour (CS) dipole functions [39] or QCD antenna functions (also called Lund dipoles) [38,40]. These reproduce the leading collinear and soft singularities of QCD amplitudes for each single emission without the need of a particular phase-space restriction as present in angular-ordered showers. They can, however, differ in the ordering variable, affecting multiple emissions and hence potentially higher-order coherence properties. Another difference is the recoil strategy taken, which can lead to differences at the level of next-to-leading logarithms or beyond. In order to explore these ambiguities more fully, we include four different variants of dipole-antenna shower models in our study, two based on a dipole formalism and two based on antennae, with differences as follows.
For each radiation term, the dipole formalism identifies a single parton as the emitter, with a colour partner assigned to be the spectator. The recoil is constrained to be purely longitudinal, in the rest frame of the dipole pair. By itself, the dipole radiation function only accounts for half of the soft singularity of the dipole pair, and there is no collinear singularity associated with the spectator. There is a separate radiation term in which the roles of the two are reversed, such that the sum is correct in all the infrared limits. The preferred choice of ordering variable is transverse momentum, p ⊥dip , the relative transverse momentum of the splitting products with collinear direction defined by the spectator. This defines the third model included in this study. As a fourth option, we consider ordering in the virtuality of the splitting products, q dip (see table 1 for precise definitions).
In the antenna formalism, there is no unique distinction between emitters and spectators. Instead, a single antenna radiation function captures the collinear limits of both of the colour partners together with their full soft singularity, and a 2 → 3 kinematics map is used, which smoothly interpolates between the two collinear limits (both parents generally acquire some recoil). In this context, it has been shown explicitly [44] that the choice of p ⊥ as evolution variable absorbs all logarithms through

Radiation Kinematics
Ordering variable functions (a.k.a. recoils) for gluon emissions Table 1: The six shower models considered in this paper. The ordering variables shown correspond to I → ij for the DGLAP models, the same with K as the spectator for the CS dipole models, and to IK → ijk for the antenna ones. We use the notation Q 2 the fraction of the light-cone momentum of parton I carried by parton i, in the DGLAP functions, P (z).
second order in α s (i.e., up to and including α 2 s ln Q 2 corrections), hence this is the preferred choice, defining the fifth shower model in our study. As an alternative, we also consider ordering in antenna mass, which is known to exhibit an α 2 s ln Q 2 discrepancy with respect to second-order QCD [44]. A systematic comparison of the salient differences between these six different shower models is given in Tab. 1. Contours of constant value of each of the corresponding evolution variables are shown in Fig. 1, over the triangular dipole branching phase space. Labelling the pre-and post-branching partons by IK → ijk, the axes of the plots are defined by the dimensionless branching invariants Q 2 I /M 2 IK and Q 2 K /M 2 IK , so that the collinear singularities lie along the axes and the soft singularity lies at (0,0). Note that the DGLAP-and dipole-based evolution variables, 2-4, correspond to the evolution of a single parton, I, hence the corresponding radiation functions only have collinear singularities along the y axis; the antenna evolution variables, 5-6, correspond to the evolution of the IK antenna, with collinear singularities along both axes.
In order to focus on the pure shower aspects and make the models more directly comparable, a few non-default choices have been made in the context of our study. In particular for VINCIA, ME corrections at both LO [43] and NLO [44] were switched off, and we use the smoothly-ordered showers [43] with a one-loop running of α s . The HERWIG++ dipole-shower simulations likewise used a one-loop running and no matrix-element corrections nor NLO matching has been applied. For the default shower models (angular-ordered in HERWIG++ and p ⊥evol -ordered in PYTHIA), we use the respective default settings, which includes matrix-element corrections for the first emission, for both  Figure 1: Illustration of the progression of the shower evolution variables over the dipole phase space, for each of the models listed in Tab. 1. Note that 2-4 correspond to radiation functions whose only singularities lie along the y axis, while 1, 5, and 6 have singularities along both the x and y axes.
codes, and one-loop (two-loop) running for PYTHIA (HERWIG++), respectively. As a cross-check, we investigated the effect of including NLO matching for the p ⊥dip -ordered dipole shower of HERWIG++ and found that the four-jet observables, which we study here, are not sensitive to these corrections. An enlarged set of results, including plots of the last-mentioned study and strong vs. smooth ordering, will be included in [52].

Tuning
In order to compare the models on as equal a footing as possible, we first adjust ("tune") the shower and hadronization parameters of each model to the same set of existing LEP measurements. We perform this tuning with the PROFESSOR [53] tuning system, via analyses that are encoded in RIVET [12], for all shower models. This relatively agnostic (automated) tuning approach also makes it possible to make (relatively) objective statements concerning whether each shower model is able to describe the existing data with a similar quality 2 .
The goodness-of-fit per degree of freedom provides information about how well data measure-ments are described by the predictions of Monte Carlo (MC) event generators. It is defined as

Observables and Parameters
As observables for the tuning we use event shapes, identified-particle spectra, jet rates, particle multiplicities and b-quark fragmentation functions, provided by the ALEPH [5,54], DELPHI [6] and OPAL [55] experiments and by the Particle Data Group PDG [56]. The observables and their weights can be found in Tabs. 5-7 in the appendix. The parameters for the hadronization and shower models of HERWIG++, PYTHIA 8 and VINCIA, that we readjust here, can be found in Tabs. 9 and 10 in the appendix, together with a short description.
After performing a first tune with HERWIG++ we obtain flat distributions in χ 2 for two parameters, the soft scale µ soft,F F and the smearing parameter Cl smr . Therefore, we keep Cl smr fixed at its default value and set µ soft,F F to zero for a slight increase of the value of the shower cutoff. This approach leads to slightly smaller values in the goodness-of-fit values since the minimization works better due to the reduction of the dimensionality of the parameter space.
To get a good description of the MC response by the interpolation function of PROFESSOR, we use a fourth-order polynomial. Due to fixing those parameters which exhibit flat distributions in χ 2 , as explained above, we remain with six parameters for each combination of shower and hadronization model. The minimal possible number of MC runs needed for the tuning is defined by the number of coefficients for the polynomial; here we need at least 210 runs. To get reasonable results we perform oversampling of about a factor 3, leading to 650 MC runs with different randomly selected values of the parameters that are tuned. We use 500 randomly selected runs 300 times to interpolate the generator response and check the quality of the interpolation by comparing the χ 2 of the interpolation response with real MC runs at certain parameter values. By removing parameter regions where the interpolation did not work sufficiently well we increase the quality of the interpolation. Unfortunately we cannot remove all bad regions for HERWIG++ since the values of some observables are not a smooth function of the gluon mass in the region where the MC predictions fit the data well. This is backed by the possibility of new splitting processes for higher gluon masses. We use the 300 different run combinations again in the tuning step where the goodness-of-fit is minimized in order to obtain the parameters that describe the observables best. Afterwards we perform real MC runs for these different parameter sets and calculate the real χ 2 /N dof to get the best tune.

Tuning Results
This section presents the results of the tuning process, starting with a short overview in terms of the total χ 2 /N dof values for the different shower models. In order to validate the results of the tuning, we apply different analysis tools. The results for the p 2 ⊥dip -ordered dipole shower are presented as an example for HERWIG++ and for the p 2 ⊥ant -ordered shower as an example for VINCIA. The parameter values obtained by the best tune are listed in the appendix, in Tab. 11 for HERWIG++ and in Tab. 12

Quality of the Overall Description
The goodness-of-fit function per degree of freedom, χ 2 /N dof , is listed in Tab. 2 for each of the shower models included in the study, before and after tuning. The previous (default) tunes of VINCIA and PYTHIA 8 already describe the existing LEP measurements very well. The description of the LEP data by the default angular-ordered tune of HERWIG++ is fine as well. Therefore only small improvements in the quality of the description of LEP data are achieved. Note that the angular-ordered shower is the only one that describes the mean particle multiplicities better than the other observables. In the context of the string-based models, one would presumably need to include the spin-and flavoursensitive parameters in the tuning as well, to reoptimize the agreement with the mean identifiedparticle multiplicities. We did not look into this here, since the four-jet observables we investigate are not sensitive to the particle composition, and since including these parameters would have greatly inflated the dimensionality of the parameter space. For the HERWIG++ dipole shower, for ordering in transverse momentum as well as for ordering in virtuality, the tuning greatly improved the quality of the description of the LEP data. The goodnessof-fit values are reduced by factors up to 17.
In terms of the overall description of the LEP data, VINCIA with ordering in transverse momentum fits the data the best, followed by PYTHIA 8 and VINCIA with m 2 ant -ordering. Especially the two p ⊥ordered models achieve very similar χ 2 /N dof values and hence cannot be told apart using the present data, nor does the mass-ordered version of VINCIA stand out very clearly after retuning. (Among the event shapes, the in-and out-of-plane p ⊥ distributions exhibit the most significant individual discrepancies with the data. We suspect colour-reconnection effects may play a role for these distributions, an issue which is still very actively investigated [32,[57][58][59][60][61].) The three shower models interfaced to the cluster hadronization model in HERWIG++ come in at somewhat higher overall χ 2 /N dof values.
We note that all the LEP measurements used PYTHIA [62] or JETSET [62] to generate MC event samples for the detector correction, hence there may be a small systematic bias favouring the stringbased models (here PYTHIA 8 and VINCIA). HERWIG event samples were used as well, to estimate the systematic uncertainties. Therefore, the experiments claim that the observable distributions are independent of the underlying MC generator for the detector corrections within the experimental systematics.

Validation
The distribution of the χ 2 /N dof values of the 300 tunes, each based on 500 randomly selected runs at different parameter points, are plotted in Figs. 2 and 3 for two parameters for the HERWIG++ p 2 ⊥dip -ordered dipole shower and for VINCIA with ordering in p 2 ⊥ant . Narrow distributions indicate that the observables are very sensitive to this parameter. Broader distributions are obtained if either the observables are less sensitive to a parameter or, as for the Lund parameters a L and b L , if two parameters are highly correlated.
In order to verify the result of the generator tuning with PROFESSOR we perform real MC runs where we change only one parameter with randomly distributed values and set all other parameters to their new tuned value. We reproduce the histograms at the same parameter points by using the interpolation function calculated by PROFESSOR to model the MC response. The distribution of the goodness-of-fit is shown with respect to the parameter value for two different parameters for the HERWIG++ p 2 ⊥dip -ordered dipole shower and for VINCIA with ordering in p 2 ⊥ant in Figs. 4 and 5. The χ 2 /N dof value is split for the different groups of observables where the lines correspond to the interpolation result and the points to the real MC runs. The single observables enter in the calculation of the goodness-of-fit for a group of observables with the same weight as for the calculation of the overall χ 2 /N dof . By comparing the interpolation with the real generator response, the quality of the interpolation function can be evaluated as well. Figs. 4 and 5 show that the parameter values of the best tune, marked by the vertical line, are clearly favoured, mostly driven by event shapes. As mentioned above, we were not able to remove all regions for HERWIG++ where the interpolation did not work sufficiently well. This leads to the different χ 2 /N dof values for interpolation and MC runs. Since the quality of the interpolation is disrupted by the possibility for new splitting processes for higher gluon masses, identified particle spectra and mean multiplicities cannot be described very well. This affects of course also the other parameters. As shown in Fig. 5, the interpolation works better for VINCIA, where interpolation and generator response agree perfectly.
Besides the χ 2 /N dof distribution of the parameters we have shown here, we obtain parameters with flatter distributions as well. In addition some parameters prefer to be at the limit of the scanned range as occurring for example for the strong coupling α S within the tuning of PYTHIA 8 and VINCIA with m 2 ant -ordering.

Eigentunes
To estimate the uncertainty in the MC predictions in connection with changing the parameter values during the tuning, so-called eigentunes are performed. The parameters are varied along the eigenvectors in parameter space where the eigenvectors are obtained by certain changes, ∆χ 2 /N dof , in χ 2 /N dof . For each parameter two eigenvectors, one in the "+" and one in the "−" direction, exist.
If the goodness-of-fit were distributed as a true χ 2 function ∆χ 2 /N dof = 1 would correspond to a one sigma deviation and ∆χ 2 /N dof = 4 to a two sigma deviation from the minimum (i.e., the central tune), etc. Given, however, that none of the models achieves a χ 2 /N dof ≤ 1, the eigentunes can at most be used to give a rough indication of the range of accessible model variations near the respective minimum for each model. This is still valuable, as it can help us determine whether the central tunes of two (or more) different theory models could easily be retuned to give the same result or not, on a given observable (overlapping versus non-overlapping eigentune variation ranges).
We calculate two sets of eigentunes with PROFESSOR, corresponding to one-and two-sigma deviations, and perform MC runs to obtain envelopes around the central tune for each of the six different   theory models. For the four-jet observables we propose (see section 4), we find that the model differences are larger than the individual eigentune envelopes. Hence we conclude that these observables do have sensitivity to distinguish between the theory models within the limits of the tuning uncertainty. For all further studies we will use the central tunes.

Results
We consider hadronic Z events (photon ISR is switched off) and use the Durham k T clustering algorithm [63] to cluster all events back to two jets, keeping track of the intermediate clustering scales along the way. The 3 → 2 clustering scale is denoted y 23 = k 2 T 3 /m 2 Z , and so on for higher jet numbers. We require both y 23 and y 34 to be greater than 0.0045, to obtain an inclusive 4-jet event sample with minimal contamination from B decays and lower (non-perturbative) scales.
Strong ordering corresponds to y 23 y 34 . . ., while events with, e.g., y 34 ∼ y 23 should be more sensitive to the ordering condition and to the effective 1 → 3 splitting kernels. Further, we may in principle also keep track of which "side" each clustering step happens on, which can give us an additional handle on the relative contributions to the four-jet rate from "opposite-side" 1 → 2 ⊗ 1 → 2 splittings versus "same-side" 1 → 3 ones. Within the context of this study, however, we only explicitly used the former requirement (y 23 ∼ y 34 ), in the context of the definition of the M 2 L /M 2 H variable, though we note that the latter (same-side vs opposite-side sequential clusterings) is implicitly present along the M 2 L /M 2 H axis.

Observable 1: θ 14
We consider the event at the stage when it has been clustered back to four jets, and order the jets in hardness. To be sensitive to coherence we constrain the angles between the jets such that the first (hardest) jet lies back-to-back to a near-collinear jet pair, formed by the second and third jet; θ 12 > 2π/3, θ 13 > 2π/3 and θ 23 < π/6. From this near-collinear three-jet state we probe the emission angle of the soft fourth jet with respect to the first jet, θ 14 . Before presenting the main results for θ 14 , we note that, for HERWIG++ with default shower and hadronization parameters an enhancement for small values of θ 14 shows up due to surprisingly large non-perturbative effects. This enhancement decreases for the dipole shower due to changing the values of the hadronization parameters throughout the tuning, but unfortunately not for the angular-ordered shower. By checking the influence of the hadronization parameters on the shape of the distribution of θ 14 , we identify the mass exponent for daughter clusters, P split , as the cause of the enhancement. By keeping it fixed at a value of 0.6 during the tuning, we achieve a better agreement between hadron level and parton level for the normalized distribution of θ 14 , which we regard as physically more reasonable given the cut of y 34 > 0.0045. This distribution is shown in the upper row of Fig. 6 for keeping P split fixed on the right and for no constraints on the left. We see that the influence of hadronization is reduced strongly by keeping the hadronization parameter fixed. However, some sensitivity to nonperturbative effects is still left for small values of the observable θ 14 . The HERWIG++ dipole shower gives similar results in the comparison of the angular observables on hadron and parton level. In addition we show this comparison in the lower row of Fig. 6 for PYTHIA 8 and the p 2 ⊥ant -ordered shower of VINCIA. For PYTHIA 8 we can see a small enhancement for small values of θ 14 as well, whereas the predictions of VINCIA agree well with each other.
To compare the predictions of the different theory models, Fig. 7 shows the normalized distribution of θ 14 in the upper left plot. To show the differences more clearly and reduce the observable to a simpler quantity with better statistics, we divide the full θ 14 range into three regions labelled "Towards" (small θ 14 ), "Central" (intermediate θ 14 ), and "Away" (large θ 14 ). We may then consider the ratio between regions, In the "Towards" region, the first and fourth jet are collinear, while they are back-to-back in the "Away" region. Events where the fourth jet is a wide-angle emission from the three-jet system populate the "Central" region. We consider nine different possibilities for the exact divisions between the regions, listed in Tab. 3 and corresponding roughly to looser or tighter cuts. The ratios of the integrated θ 14 rates for each of the nine different region definitions are shown in Fig. 7.
Since large non-perturbative effects occur in the towards region for the HERWIG++ shower models, we consider the ratio of the central to away region to be the most robust observable. This ratio reflects the relative amount of soft wide-angle emissions to emissions where the first jet lies back-toback to all other jets in the event. Compared to the angular-ordered shower, the HERWIG++ dipole shower with q 2 dip -ordering predicts up to 30% higher values for this ratio; a very significant differ-  Table 3: Definition of the different regions for the asymmetry of θ 14 . Columns 2-5 specify the limits for the regions and the first column gives the numbering. The ratio of the central to towards region is built with the 2nd and 3rd column, central to away with the 2nd and 4th and towards to away uses the 4th and 5th column.
ence. The predictions of the p 2 ⊥dip -ordered dipole shower of HERWIG++ are very similar to the ones of PYTHIA 8 and lower than the predictions of all other shower models. VINCIA with both ordering variables agrees with the result of the HERWIG++ angular-ordered shower within the statistical errors.
To distinguish between the two evolution variables of VINCIA we can use the ratio of the central to towards region. This ratio reflects the relative amount of soft wide-angle emission compared to collinear emission. The predictions of the m 2 ant -ordered shower are about 35% higher than the ones of the p 2 ⊥ant -ordered shower. We expect this behaviour since the m 2 ant -ordered shower prefers wideangle soft over collinear emissions, see Fig. 1, whereas the p 2 ⊥ant -ordered shower prefers the opposite. The third ratio, shown in the lower right plot of Fig. 7, is the towards over away region, which is predicted very similarly by PYTHIA 8, HERWIG++ with p 2 ⊥dip -ordering and VINCIA with m 2 antordering. The predictions of these theory models are 20% to 30% smaller than the one of the angularordered shower of HERWIG++. The q 2 dip -ordered shower on the other hand produces values up to 40% higher. In both ratios including the away region, we see a 10% to 20% difference between the predictions of PYTHIA 8 and the p 2 ant -ordered shower of VINCIA, where PYTHIA 8 produces more events populating the away region. Thus, we conclude that these ratios have significant discriminating power between the models, including between PYTHIA 8 and VINCIA which appeared very similar in the global analysis.

Observable 2: θ *
In addition to the cuts for the previous observable, we require the fourth jet to be close in angle to the near-collinear (23) jet pair, θ 24 < π/2, in order to enhance the sensitivity to coherent emission off the (23) jet system. We then define our second angular observable as the difference in opening angles, θ * = θ 24 − θ 23 , and, similarly to above, we introduce the asymmetry, with respect to an arbitrary dividing point, θ * = x 0 , which separates the small-θ * region from the large-θ * one. The normalized distribution of θ * and the asymmetry (as a function of the dividing point x 0 ) are shown in Fig. 8. Due to the additional cut on θ 24 for this observable, the error bars are higher and thus the statistical power in discriminating the different theory models smaller. The only shower model which can be distinguished from the others is the q 2 dip -ordered dipole shower of HERWIG++. This model tends to predict more events where the difference in opening angles of the fourth and third jet is large, compared to the angular-order shower. The p 2 ⊥dip -ordered dipole shower of HERWIG++ predicts larger values for the asymmetry and therefore more events with a smaller difference in opening angles.

Observable 3: C
(1/5) 2 Ref. [46] defines the N -point energy correlation function (ECF) as where the sum runs over all particles of a jet. To be sensitive to the global event structure we replace this sum by the sum over all jets in the event. Thus, θ i 1 i 2 denotes the angle between two jets i 1 and i 2 . The ECFs are used to build double ratios We choose a value of β = 1/5 to give all angles about equal weights and to be sensitive to soft configurations. Sensitivity to collinear configurations can be achieved by choosing β = 2 and giving greater angles more weight. , on the left and its asymmetry with respect to the asymmetry axis, x 0 , on the right.
In the 4-jet events described in section 4.1 we use the 2-point double ratio where the sums run over the four jets. Due to the cuts on the angles between the jets, the events look like three-jet systems to the observable. This system contains two hard jets, jet 1 and jet (23) 3 , lying approximately back-to-back and a third soft jet, jet 4. With this notation Eq. (6) can approximately be written as Taking the small energy of jet 4 and the large angle θ 1 23 > 2π/3 into account, the denominator can be reduced to its first term, This leaves only the angles relative to the fourth jet and the energies as free parameters. For β = 1/5 all angles are weighted relatively equal and hence C (1/5) 2 is proportional to the energy of the fourth jet, relative to the remaining energy of the event.
For the normalized distribution of the 2-point double ratio, C (1/5) 2 , we see non-perturbative effects for HERWIG++ and the m 2 ant -ordered shower of VINCIA. For all of these shower models, hadronization and decays enlarge the number of events with a harder fourth jet, hence higher values of C (1/5) 2 . 3 The combination of the second and third jet. , and the according asymmetry are shown in Fig. 9. As indicated by the asymmetry, the p 2 ⊥ -ordered shower models of HERWIG++, PYTHIA 8 and VINCIA give similar predictions. Compared to that, the prediction of the m 2 ant -ordered shower of VINCIA is higher, as expected due to the preference of soft over collinear emissions during the population of phase-space.

Observable 4:
To force a "compressed" scale hierarchy, we impose the cut y 34 > 0.5 y 23 , and plot the ratio M 2 L /M 2 H of the invariant masses (squared) of the jets at the end of the clustering, ordered so that M 2 L ≤ M 2 H . With four partons at LO, the light jet mass, M L , is zero if both the 4 → 3 and 3 → 2 clusterings happen in the same jet, while it is non-zero otherwise. Thus, the region close to zero is sensitive to events with a 1 → 3 splitting occurring in one of the jets, while the region above ∼ 0.25 is dominated by opposite-side 1 → 2 splittings.
The normalized distribution of the mass ratio is shown on the left side in Fig. 10. The ratio plot shows that the difference between the theory models mainly occurs in the region for values M 2 L /M 2 H 0.3, leaving smaller differences per bin for higher values due to the normalization. To condense these difference we use the asymmetry, as defined in Eq. (3), whose values are shown on the right in Fig. 10 with respect to the asymmetry axis, x 0 . The asymmetry roughly reflects the relative amount of events with a 1 → 3 splitting occurring in one of the jets, divided by events with opposite-side 1 → 2 splittings.
Compared to the angular-ordered shower of HERWIG++, the p 2 ⊥dip -ordered dipole shower and PYTHIA 8 predict a higher value for the asymmetry, whereas the the predictions of VINCIA with both ordering variables and the q 2 dip -ordered dipole shower of HERWIG++ are smaller. Both evolution variables of VINCIA result in the same value for the asymmetry, whereas a difference of 5% to 12% occurs between PYTHIA 8 and VINCIA. For PYTHIA 8 and the p 2 ⊥ant -ordered shower of VINCIA we also see differences up to nearly 20% in the bins for small mass ratios. Thus, this observable can be used to tell these theory models apart.
Note that we obtain a distribution similar to the mass ratio by using the ECF, defined in Eq. (4). The 1-point double ratio is defined as where the sum runs over all particles of a jet. By building a ratio similar to the mass ratio we get with C 1,H . With the expansion cos θ ≈ 1 − θ 2 /2, the invariant mass squared of two particles is By using a value of β = 2, Eq. (10) is approximately equal to the mass ratio and we thus obtain similar results for the two variables.

Conclusions
We have studied four event-shape variables, designed to be sensitive to subleading aspects of the event structure in semi-inclusive e + e − → 4-jet events, with a cut on y 34 > 0.0045. Six different partonshower models were compared, available through the HERWIG++, PYTHIA 8, and VINCIA Monte Carlo codes. These models span a wide range of theoretical ideas, from conventional parton showers to ones based on dipoles and antennae, with different ordering criteria, different recoil strategies, and different radiation functions.
To make the comparison as fair and unbiased as possible, we first tuned all the theory models to the same set of existing LEP measurements, using the PROFESSOR and RIVET tuning tools. We find that the existing data already provides some discriminating power, with the models using string hadronization achieving somewhat lower χ 2 values than those based on cluster hadronization 4 . Therefore, it is important that we limit ourselves to draw conclusions only from observables that are not very sensitive to non-perturbative effects. VINCIA with ordering in transverse momentum provides the best overall description of the LEP data. Using just the existing data, however, it is nearly impossible to tell, e.g., PYTHIA and VINCIA apart, despite significant differences between the shower models. Although the HERWIG++ models are easier to tell apart already using the existing data, we do also see larger differences between them in the variables proposed here, corroborating our conclusion that the new observables add significant discriminating power.
We have shown that the observables proposed here, which are sensitive to coherence properties and to effective 1 → 3 splittings, allow for additional significant discriminating power between the different models, given a sample size of order 500k events or more. The theory models for the shower implemented in HERWIG++ can clearly be told apart by most of the observables we propose. Depending on the tuning parameters, however, the cluster model may generate rather large non-perturbative corrections to the 4-jet rate, especially at low θ 14 . For the θ 14 variable, we therefore also highlighted the integrated "Central/Away" ratio as an observable that should be particularly robust against corrections at low θ 14 . As expected and measured with the 4-jet angular observable θ 14 , we see that the m 2 ant -ordered shower of VINCIA predicts a higher ratio of wide-angle to collinear emissions from a three-jet system, compared to the p 2 ⊥ant -ordered shower. With the same observables as well as with the ratio of jet masses, M 2 L /M 2 H , we can distinguish between VINCIA and PYTHIA 8. The shower model of the latter produces more events where one hard jets lies back-to-back to the remaining jets of the event.
We round off by emphasizing that a comparison against corrected LEP data would be extremely interesting, and, we believe, of great importance to constraining the subleading properties of modernday QCD models.

Observable Weight
In-plane p ⊥ in GeV w.r.t. sphericity axes 1.0 In-plane p ⊥ in GeV w.r.t. thrust axes 1.0 Out-of-plane p ⊥ in GeV w.r.t. sphericity axes 1.0 Out-of-plane p ⊥ in GeV w.r.t. thrust axes 1.0 Mean out-of-plane p ⊥ in GeV w.r.t. thrust axis vs.   Table 6: Jet rates and the associated weights, taken from Ref. [55].
Weight b quark fragmentation function f (x weak B ) 7.0 Mean of b quark fragmentation function f (x weak B ) 3.0 Table 7: Observables for b quarks and the associated weights, taken from Ref. [54].

PSplit
Mass exponent for daughter clusters  Shower cutoff Hadronization a L aLund Parameter of the Lund symmetric fragmentation function b L bLund Parameter of the Lund symmetric fragmentation function a ED aExtraDiquark a parameter for diquarks, with total a = a L + a ED σ PTsigma Total width of the fragmentation p ⊥