Measurement of observables sensitive to coherence effects in hadronic Z decays with the OPAL detector at LEP

A study of QCD coherence is presented based on a sample of about 397000 $e^+e^-$ hadronic annihilation events collected at $\sqrt{s}=91$ GeV with the OPAL detector at LEP. The study is based on four recently proposed observables that are sensitive to coherence effects in the perturbative regime. The measurement of these observables is presented, along with a comparison with the predictions of different parton shower models. The models include both conventional parton shower models and dipole antenna models. Different ordering variables are used to investigate their influence on the predictions.


Introduction
Processes involving the strong interaction, described in the standard model (SM) by quantum chromodynamics (QCD), dominate in high energy particle collisions. It is therefore important to account for QCD effects and to model them accurately. Colour coherence, the destructive interference effect between colour-connected partons, is an important aspect of high energy collisions and QCD parton cascades. Coherence is itself a subject of considerable interest, and QCD offers a situation in which coherence effects in a perturbative framework can be studied in a uniquely precise way. Furthermore, by testing different theoretical schemes for coherence, QCD Monte Carlo (MC) event generators (see Refs. [2][3][4][5] for recent reviews) can be modified to better describe the results of experiments. For example, in new-physics searches at the CERN LHC, QCD multijet events often represent the most difficult SM background to characterize. Improvements in the reliability of QCD event generators may help to better constrain this background.
The e + e − annihilation process offers a favorable environment to study colour coherence, because the lack of strong interactions in the initial state allows simple and conclusive comparisons between experiment and theory. Previous studies of coherence in e + e − annihilation events are presented, for example, in Refs. [6,7]. Within the context of a QCD shower, coherence implies an ordering condition, such as a requirement that each subsequent emission angle in the shower be smaller than the previous angle [8,9]. However, there are many ambiguities in the definition of the ordering variable and in its implementation. In this study, we present the first experimental tests of recently proposed [10] observables designed to discriminate between coherence schemes. The data were collected with the OPAL detector at the CERN LEP collider at a centre-of-mass energy of √ s = 91 GeV. The observables examined here are based on four-jet e + e − annihilation configurations in which a soft gluon is emitted in the context of a three-jet topology, with two of the three jets approximately collinear. This event configuration has been shown to be favorable for the manifestation of coherence [11] and sensitive to the choice of the ordering variable in the shower [12].
We examine six different models for coherence, which are implemented in currently available QCD MC event generator programs. Specifically, we compare the defaultq 2 parton shower of HER-WIG++ [13] with angular-ordering, the p 2 ⊥dip -and q 2 dip -ordered dipole showers of HERWIG++ [11], the default p 2 ⊥evol -ordered shower of PYTHIA 8 [14], and the p 2 ⊥ant -and m 2 ant -ordered showers of VINCIA [15], a plugin to the PYTHIA 8 event generator that replaces the PYTHIA 8 shower with a shower model based on antenna functions. The definitions of the ordering variables are given in the next section, with more details presented in Ref. [10].
This paper is organized as follows. In Section 2, we define the observables to be used in the analysis. Sections 3, 4 and 5 present the detector, the data sample and simulation, and the data analysis, respectively. The results are presented in Section 6 and our conclusions in Section 7.

.1 Observables
We consider hadronic events from e + e − annihilation at the Z boson peak and use the Durham k T clustering algorithm [16] to cluster all particles of an event into jets, keeping track of the clustering scales along the way. The algorithm begins by assigning all particles in an event to a list. Each entry in the list is called a jet. The algorithm then computes, for all pairs of four-momenta i and j in the event, the distance measure where E i and E j are the corresponding energies and θ ij is the angle between objects i and j. The center-of-mass energy-squared is denoted by s. The pair of objects with the smallest y ij is combined by summing their four-momenta and the sum is added to the list while the original four-momenta are removed. This procedure is iterated until one entry is left in the list. To obtain an inclusive fourjet event sample in the perturbative regime we impose an explicit requirement on the value of the clustering scale at which the event goes from having four to having three jets. Denoting this scale (given by the value of min(y ij ) evaluated at the stage when the event has been clustered to four jets) by y 4→3 , we require y 4→3 > 0.0045 (corresponding to ln(y 4→3 ) > −5.4), as in Ref. [10]. This value originates from a compromise; on the one hand a smaller value results in a data sample with greater statistical precision, and on the other hand a larger value provides a more direct representation of the shower properties. The event topologies resulting from four-jet events with requirements on the angles between the jets: θ 12 > 2π/3, θ 13 > 2π/3, and θ 23 < π/6. b) Illustration of the observable θ 14 , the angle between the first and fourth jet in the latter events. c) The event topologies resulting from fourjet events with requirements θ 12 > 2π/3, θ 13 > 2π/3, θ 23 < π/6, and θ 24 < π/2. d) Illustration of the observable θ * = θ 24 − θ 23 , the difference in opening angles. e) The sketch shows event topologies where the third and fourth clusterings occur within the same jet and hence the mass ratio M 2 L /M 2 H is small. f) Events with large mass ratios, where the third and fourth clusterings occur on opposite sides of the event.
We investigate four different observables, where for the first three we consider the event clustered into four jets, and order the jets in energy. To be sensitive to coherence, the angles between the jets are constrained such that the first (hardest) jet lies back-to-back to a nearly collinear jet pair, formed by the second and third jet: θ 12 > 2π/3, θ 13 > 2π/3, and θ 23 < π/6. The event topology resulting from these requirements is shown in Fig. 1 a). To investigate QCD colour coherence effects we examine the following observables: • θ 14 , proposed in Ref. [12]: The emission angle of the soft fourth jet with respect to the first jet; a sketch can be found in Fig. 1 b).
• θ * , proposed in Ref. [11]: A restriction on the angle between the second and fourth jet, θ 24 < π/2, is imposed in order to require the fourth jet to be close in angle to the nearly collinear (23) jet pair, see Fig. 1 c).
The observable is the difference in opening angles, θ * = θ 24 − θ 23 , and is sensitive to coherent emission from the (23) jet system. A sketch of this observable is shown in Fig. 1 d).
, proposed in Ref. [17]:  23 4 denotes the angle between the softest jet and the (23) jet pair and analogously for θ 1 23 . Therefore the 2-point double ratio is mainly sensitive to the relative energy of the fourth jet.
Strong ordering in the parton shower refers to strong ordering of the clustering scales, y 3→2 y 4→3 . . ., with y (n+1)→n the value of the jet distance parameter in the Durham algorithm for which the configuration changes from n + 1 to n jets. In contrast, events with, e.g., y 4→3 ∼ y 3→2 are more sensitive to the ordering condition and to situations where the same parton participates in two splitting processes, hence to effective 1 → 3 splittings. For the final observable, we cluster events into two jets and apply the restriction y 4→3 > 0.5 y 3→2 . This forces events into a compressed hierarchy, i.e., a hierarchy without strong ordering. The investigated observable is: [12]: The ratio of the invariant masses-squared of the jets at the end of the clustering process, ordered such that M 2 L ≤ M 2 H . For "same-side" events, where one 1 → 3 splitting occurs, the mass ratio is close or equal to zero, whereas for "opposite-side" events with 1 → 2 ⊗ 1 → 2 splittings, the mass ratio is larger. In Fig. 1 e) and f) we illustrate examples of these event topologies. For references to heavy jet masses see, e.g. Refs. [18,19].
To exhibit the differences between the theory models more clearly, we introduce the asymmetry for a given observable x, where n i is the number of events in histogram bin i and x(i) is the bin center. The dividing point x 0 separates the regions with small and large values of x. We use this asymmetry for three of the four observables, θ * , C (1/5) 2 , and ρ, and thus introduce three dividing points: θ * 0 , C (1/5) 2,0 , and ρ 0 . As in Ref. [10], we divide the full θ 14 range into three regions labelled "towards" (small θ 14 ), "central" (intermediate θ 14 ), and "away" (large θ 14 ), denoted "T", "C", and "A" respectively. In the towards region, the first and fourth jets are collinear, while they are back-to-back in the away region. Events in which the fourth jet represents a wide-angle emission from the three-jet system populate the central region. We then consider the ratio between regions R j and R k , We define 9 different versions of the ratio, with different definitions of the regions, which are given in Table 1.

Theory models
For parton showers based on 1 → 2 splittings of a parton I to daughters i and j, momentum conservation requires that the virtuality of the branching parton must be compensated for by a recoil somewhere Table 1: Definitions of the θ 14 intervals used for the asymmetry ratios defined in Eq. (3). We define 9 different versions of the ratio with the labeling of the regions given in the first column. The ratio between the results in the central and towards regions is based on the definitions in columns two and three, between the central and away regions on the definitions in columns two and four, and between the towards and away regions on the definitions in columns four and five. Taken from Ref. [10]. else in the event; we refer to parton I as the "emitter" and to the parton (system) absorbing the recoil as the "recoiler". The six different theory models for the parton shower, mentioned in Section 1, are based on different formalisms and radiation functions: • In the collinear DGLAP formalism [20][21][22], each parton is evolved separately and undergoes 1 → 2 like branchings, which we denote p I → p i p j . In order to respect QCD coherence properties, a specific choice for the evolution variable [13,23], or additional vetos [24], are applied. The momentum balancing can either include all partons of the event, which we refer to as global recoils, or only one recoiler parton, which we refer to as local recoil.
• Another formalism is based on the Catani-Seymour (CS) dipole functions [25], where a single parton emission from a pair of partons is considered. We denote the momenta involved in this splitting process with p I p K → p i p j p k . The full splitting probability is partitioned into two pieces, corresponding to partons I and K, respectively, acting as the emitter with the other acting as the recoiler. The recoil is limited to the longitudinal direction of the recoiler parton in the rest frame of I and K. If the dipole shower uses an evolution with ordering in transverse momentum, the shower correctly reproduces the soft properties of QCD.
• In the QCD antenna (also called Lund dipoles) [26,27] picture, there is no fundamental distinction between the emitter and the recoiler. Each colour-connected parton pair of an event is represented by an antenna and undergoes a splitting process of the form p I p K → p i p j p k with a 2 → 3 recoil prescription. A single antenna thereby accounts for the equivalent of two CS dipoles.
The theory models we investigate here span all the above formalisms. For the ordering variables we use the notation Q 2 I = (p i + p j ) 2 , Q 2 K = (p j + p k ) 2 , and M 2 for the splitting processes as stated above. For the DGLAP-based models, the parton K acts as the recoiler and can either represent a single parton (PYTHIA 8) or multiple partons (HERWIG++). In the following we briefly describe the main differences between the theory models used in this paper, mostly concentrated on the aspects described above. HERWIG++q 2 [13], a parton shower model based on DGLAP splitting kernels, uses global recoils. The evolution is ordered in a variable proportional to energy times angle,q .
The shower includes a matrix-element correction for the first emission and uses two-loop running of α s . The QCD coherence properties are respected due to the angular ordering of the parton branching cascade. The second shower model in the HERWIG++ event generator is HERWIG++ p 2 ⊥dip [11], which is based on partitioned CS dipoles with local recoils within dipoles. The ordering variable is the relative transverse momentum of the splitting pair, We do not apply matching or matrix-element corrections and use one-loop running of α s . The dipole shower with ordering in transverse momentum respects QCD coherence. As an alternative we use the same shower model, but with a different ordering variable. HERWIG++ q 2 dip [11] orders the shower cascade in virtuality of the splitting pair, As before we do not apply matching or matrix-element corrections and use one-loop running of α s . VINCIA p 2 ⊥ant [15] is a shower model based on antenna functions with local recoils within antennae. The ordering variable is the transverse momentum of the antenna, Matrix-element corrections at LO [28] and NLO [29] are switched off and we use one-loop running of α s . Colour coherence is respected, since it is an intrinsic property of the antenna functions. Transverse momentum as the evolution variable is the preferred choice in VINCIA, as has been shown in Ref. [29]. However, we also use VINCIA m 2 ant [15] as an alternative to the transverse momentum ordering, which orders the shower evolution in antenna mass, defined as The last shower model is PYTHIA 8 p 2 ⊥evol [14], a parton shower based on DGLAP splitting kernels and ordered in transverse momentum, defined as In contrast to the angular ordered HERWIG++ shower, local recoils within dipoles are applied. A matrix-element correction for the first emission is included and we use one-loop running of α s . To obtain QCD coherence properties, the shower applies angular vetoes.
Besides the shower models used in this paper, there are several other models: ARIADNE [30], based on antenna functions, which is very similar to VINCIA; the CS dipole shower models of Weinzierl et al. [31], and SHERPA [32], which are similar to the HERWIG++ dipole shower; the deductor by Nagy and Soper [33], which is not interfaced with a hadronization model; and the virtualityordered final-state showers of PYTHIA [24,34], NLLJET [35] and HERWIRI [36].
To compare the models on as equal a footing as possible, the shower and hadronization parameters have been readjusted with the PROFESSOR [37] tuning system, utilizing LEP data available through RIVET [38]. The tuning procedure is described in Ref. [10], where the resulting parameter values can also be found.

OPAL experiment
The OPAL experiment at LEP operated between August 1989 and November 2000. The detector components were arranged around the beam pipe, in a layered structure. A detailed description can be found in Refs. [39][40][41]. The tracking system consisted of a silicon microvertex detector, an inner vertex chamber, a jet chamber, and chambers outside the jet chambers to improve the precision of the z-coordinate 1 measurement. The jet chamber was approximately 4 m long and had an outer radius of about 1.85 m. This device had 24 sectors each containing 159 sense wires spaced by 1 cm. All tracking systems were located inside a solenoidal magnet, which provided a uniform axial magnetic field of 0.435 T along the beam axis. The magnet was surrounded by a lead glass electromagnetic calorimeter and a sampling hadron calorimeter. The electromagnetic calorimeter consisted of 11704 lead glass blocks, divided into barrel and endcap sections, covering 98% of the solid angle. Outside the hadron calorimeter, the detector was surrounded by a system of muon chambers. Similar layers of instrumentation were located in the endcap regions.
Since the energy resolution of the electromagnetic calorimeter is better then that of the hadron calorimeter, the resolution of jet directions and energies is not significantly improved by incorporating hadron calorimeter information. Thus, our analysis relies exclusively on charged particle information recorded in the tracking detectors and on clusters of energy deposited in the electromagnetic calorimeter.

Data and MC samples
In the first phase of LEP operation, denoted LEP1 (1989 to 1995), the e + e − center-of-mass energy was chosen to lie at or near the mass of the Z boson, √ s ≈ 91 GeV. During the second phase of operation, denoted LEP2 (1995)(1996)(1997)(1998)(1999)(2000), the center-of-mass energy was increased in successive steps from 130 to 209 GeV. Interspersed at various times during the LEP2 operation, calibration runs were collected at the Z boson peak. In this analysis, we utilize data collected at √ s = 91.2 GeV during the the LEP2 calibration runs. This allows us to exploit conditions when the detector was operating in its final, most advanced configuration. In addition, this will facilitate possible future comparisons with data collected under essentially identical conditions at higher energies. We use a sample corresponding to an integrated luminosity of 14.7 pb −1 . This sample is of sufficient size that systematic uncertainties dominate the statistical terms. To correct the data in order to account for experimental acceptance and efficiency, simulated event samples produced with MC event generators are used. The process e + e − → qq is simulated using PYTHIA 6.1 [42] at √ s = 91.2 GeV. Corresponding samples using HERWIG 6.2 [43,44] are used for systematic checks. We examine the MC events at two levels. We refer to "hadron level" as events without event selection, and without simulation of the detector acceptance and resolution, for which all particles with lifetimes less than 300 ps decay. In contrast, "detector level" refers to MC events that are processed through the GEANT4-based [45] simulation of the OPAL detector, called GOPAL [46], and that have been reconstructed using the same software procedures that are applied to the data. The MC events generated for the detector-level samples are the same as the hadron-level samples except that K 0 S mesons and weakly decaying hyperons are declared to be stable, as these particles can interact with detector material before decaying, and so their decays are handled within the GEANT framework.
In addition, for comparisons with the corrected data, large samples of hadron-level MC events are employed, using the event generators HERWIG++ 2.7.0 [47], PYTHIA 8.176 [48], and VIN-CIA 1.1.0 [15] interfaced with the hadronization model of PYTHIA 8.176.

Selection of events
The same criteria for the selection of charged tracks and electromagnetic clusters are applied as described in Ref. [49]. Charged tracks are required to have transverse momentum relative to the beam axis larger than 0.15 GeV, and photons to have energies larger than 0.10 GeV (0.25 GeV) in the barrel (endcap) region of the electromagnetic calorimeter. The selection of hadronic annihilation events is the same as described in Ref. [50]. Briefly, a minimum of five charged tracks is required, and a containment condition | cos θ T | < 0.90 is applied, where θ T is the polar angle of the thrust axis [51,52] with respect to the beam axis, calculated using all accepted charged tracks and electromagnetic clusters. A total of 397 452 candidate hadronic annihilation events are selected, with a negligible expected background.
Since the energy loss due to initial-state radiation is highly suppressed at the Z peak, we do not apply a cut to that effect. However, radiative corrections are applied by requiring √ s − √ s < 1 GeV for the MC detector-level samples used to correct the data, where √ s is the effective center-of-mass energy after initial-state radiation.

Reconstruction and correction
For each of the accepted events, the values of all observables described in Section 1 are computed. To avoid double-counting of energy between tracks and electromagnetic clusters, an energy-flow algorithm [53,54] is applied, which matches the tracks and clusters and retains only those clusters that are not associated with a track. Figure 2 shows a comparison of the uncorrected data with the detector-level predictions of PYTHIA 6 and HERWIG 6 for the θ 14 , ρ, θ * , and C variables. The θ 14 and θ * variables are normalized by a factor of π. The simulations are seen to provide a generally adequate description of the measurements.
To correct the data for detector and resolution effects, we implement an unfolding procedure based on the ROOUNFOLD [55] framework. We use the iterative Bayes method, as proposed by D'Agostini [56], with four iterations, which is the recommendation from Ref. [55]. A necessary ingredient for the unfolding is the response matrix of the MC event generator used for the correction procedure. For the standard analysis, PYTHIA 6 is used to determine the response matrix. The response matrix gives the bin-to-bin migration from the hadron to the detector level, and vice versa. In order to obtain reliable results for the corrected distributions, we adjust the bin widths of the histograms such that the probability for a hadron-level event to migrate to a different bin at the detector level is less than 50%.
The corrected distributions are presented in Fig. 3 and Tables 2-3. Tables 2-3 include the covariance matrices calculated with ROOUNFOLD. The statistical uncertainties are given by the square root of the corresponding diagonal element in the covariance matrices. Systematic uncertainties are discussed in Section 5.3. Figure 3 includes the predictions of PYTHIA 6 and HERWIG 6 at the hadron level. The differences between the MC predictions and the data are seen to be similar to those observed at the detector level (Fig. 2), demonstrating that the correction procedure does not introduce a discernible bias.
The values of the derived distributions, i.e., the ratios of the different regions for θ 14 and the asymmetry for the other observables, are listed in Table 4. The quantities are determined by summing and dividing the histogram entries. The statistical uncertainties are evaluated from propagation of errors, while the systematic uncertainties are determined as described in Section 5.3.

Systematic uncertainties
Systematic uncertainties are evaluated by repeating the analysis with different selection requirements and with variations in the correction procedure. Specifically, we consider the following: • The requirement on the thrust angle direction is changed to | cos θ T | < 0.7 from the default | cos θ T | < 0.9.
• The minimum number of charged tracks is increased to seven from the default of five.
• Variation of the reconstruction procedure: All tracks and clusters are taken into account. In this case the detector correction takes care of the double counting.
• HERWIG 6 is used in place of PYTHIA 6 to determine the response matrix.
The systematic uncertainty is determined for each variation from the bin-by-bin difference in the corrected distributions with respect to the standard result. The total systematic uncertainty is given by the quadrature sum of the individual terms. The total uncertainty of the data is defined by summing the statistical and systematic contributions in quadrature. As additional systematic checks on the unfolding procedure, we consider the following variations: • We use the unfolding method with three and five instead of four iterations.
• Instead of the iterative method, we use the singular value decomposition, as proposed by Höcker and Kartvelishvili [57] and implemented in ROOUNFOLD.
We find the systematic variations that arise from these two checks to be smaller or comparable to the variation observed when using HERWIG 6 in place of PYTHIA 6. Since adding all these effects together would likely double count the uncertainty associated with the unfolding procedure we do not add the observed differences to the systematic uncertainty.

Comparison with Monte Carlo models
In this section, we present a comparison between the coherence schemes described in Section 2.   Table 2: The normalized corrected data and the correlation matrix at the hadron level for the emission angle θ 14 . The first uncertainty is statistical and the second systematic.    tuned parameter sets mentioned in Section 2.2. We present the predictions for the different schemes in terms of the observables defined in Section 1. As a measure of the level of agreement with data, we calculate the significance, defined as where MC i and D i represent the predicted and observed values in bin i of a distribution, with σ D i the corresponding uncertainty in D i . In the following, we present a distribution of the significance in a plot below the distribution of the variables. In addition we calculate the χ 2 values for the distribution as where V is the full covariance matrix, representing statistical terms, with systematic uncertainties added to the diagonal elements. We present the χ 2 results and corresponding p-values in Table 5. The latter are calculated with the ROOT [58] program and give the probability that the deviations of the MC predictions from the data are consistent with the evaluated uncertainties.

Angle between first and fourth jet: θ 14
In Figs. 4 a) and b) we show the normalized distribution of the emission angle of the soft fourth jet from the nearly collinear three-jet system, θ 14 . All models are found to provide adequate descriptions of the data, except that the HERWIG++ p 2 ⊥dip model lies about three standard deviations above the measurements for a narrow region around θ 14 ≈ 0.7π. Even for this model, the p-value is 0.42 (Table 5), implying overall compatibility with the experimental results.
We show the ratio C/T of the central-to-towards regions, which gives the relative amount of wideangle to collinear emissions, in Figs. 5 a) and b). For the p 2 ⊥dip -ordered dipole shower of HERWIG++ and the parton shower of PYTHIA 8 we find nearly perfect agreement with the data for all nine C/T regions ( Table 1). The HERWIG++q 2 model and the VINCIA p 2 ⊥ant model lie below the data by up to two standard deviations in some regions, while the VINCIA m 2 ant model lies about two standard deviations above the data in all regions. The two Vincia models exhibit the expected behavior: When the antenna mass is used as the evolution variable, soft wide-angle emissions are preferred over collinear ones, which leads to higher values for the relative level of wide-angle to collinear emissions. This demonstrates the sensitivity of the θ 14 variable to the choice of evolution scheme. The largest deviation from the data in Figs. 5 a) and b) is observed for the q 2 dip -ordered dipole shower of HERWIG++, for which the predictions lie up to around three standard deviations below the data in some regions. Thus, this model predicts too many collinear emissions compared to wide-angle emissions.
In Figs. 5 c) and d) we show a comparison of the MC predictions to the data for the ratio C/A of the central-to-away regions. This ratio measures the relative amount of wide-angle emissions to emissions in a backwards direction, away from the leading jet and near to the collinear (23) jet pair. For the HERWIG++q 2 model and for VINCIA, we find a good agreement with the data and observe small differences for the different evolution variables of VINCIA. PYTHIA 8 and the p 2 ⊥dip -ordered dipole shower of HERWIG++ lie below the data, by around one and two standard deviations, respectively, and thus predict too few wide-angle emissions compared to the backwards emissions. The HERWIG++ q 2 dip model lies around one standard deviation above the data and thus predicts relatively too many wide-angle emissions. The observations the ratios C/T and C/A are confirmed by the measurements of the ratio T/A of the towards-to-away regions, presented in Figs. 5 e) and f).  Figs. 6 a) and b) we show the normalized distribution of the difference in opening angles between the third and the fourth jet with respect to the second jet, θ * = θ 24 − θ 23 . All models are seen to provide an adequate description of the data, with the exception of the region around θ * ≈ 0.07π (second bin of Figs. 6 a) and b)), where the models predict somewhat fewer events than are observed. The largest discrepancy in this region arises from the HERWIG++ q 2 dip model.
We show the asymmetry as a function of the dividing point θ * 0 in the Figs. 6 c) and d). The largest discriminating power is found for θ 0 * = 0.16π, where the q 2 dip -ordered dipole shower of HERWIG++ generates a deviation of almost four standard deviations with respect to the data. The number of events with large differences in the opening angles of the third and fourth jets is overestimated by this non-coherent shower model. The p 2 ⊥dip -ordered HERWIG++ shower, based on the same shower kernels, but respecting coherence due to the choice of evolution variable, gives a better description of the asymmetry. This emphasizes the need for coherence in order to describe the data properly.    , shown in Figs. 7 a) and b), we find rather large deviations between the data and the MC prediction for most of the shower models. We again find that the q 2 dip -ordered HERWIG++ shower exhibits the largest discrepancies. Only two models, PYTHIA 8 and the VINCIA m 2 ant model, yield p-values larger than 50% (Table 5). In Figs. 7 c) and d), we show the asymmetry in the C  Figs. 8 a) and b). For the PYTHIA 8 and the two VINCIA models, we find reasonable overall agreement with the data, with differences on the level of two standard deviations or less. The HERWIG++ models demonstrate larger differences, with discrepancies reaching the level of four standard deviations for the HERWIG++ q 2 dip model.
In Figs. 8 c) and d) we show the asymmetry of the ρ variable as a function of the dividing point ρ 0 . This asymmetry is sensitive to the relative number of same-side versus opposite-side events, whose definitions were given in Section 2.1. This asymmetry is seen to provide discrimination between most of the shower models. The PYTHIA 8 and HERWIG++q 2 models yield predictions that lie within one standard deviation of the data. However, the HERWIG++ q 2 dip model predicts too small an asymmetry by about four standard deviations, meaning that there are too few same-side compared to oppositeside events. The two VINCIA models also predict too few same-side events, but only at the level of around one standard deviation. In contrast, the HERWIG++ p 2 ⊥dip model predicts relatively too many same-side events, at the level of two standard deviations.

Summary and conclusion
We have presented measurements of distributions in e + e − annihilations at √ s = 91.2 GeV that are sensitive to QCD colour coherence, the ordering parameter in parton showers, and to whether four-jet events arise from two separate 1 → 2 splittings or from a 1 → 3 splitting. The data, corresponding to a sample of about 397 000 hadronic annihilation events, were collected with the OPAL detector at LEP. The event selection criteria are defined in a way to minimize the influence of non-perturbative (hadronization) effects. We compared the data with six different models for the parton shower, based on the HERWIG++, PYTHIA 8, and VINCIA Monte Carlo event generator programs, which differ in the choice of the radiation function, ordering variable, and recoil strategy. Each of the six models was found to be in general agreement with the data. However, interesting differences between the models   The thin solid lines correspond to HERWIG++ with angular-ordering (q 2 ), the thick solid lines to the dipole shower of HERWIG++ with ordering in p 2 ⊥dip , and the dash-dotted lines to ordering in q 2 dip . VINCIA with ordering in p 2 ⊥ant is shown with medium solid lines, ordering in m 2 ant with dashed lines and PYTHIA 8 is shown with dotted lines. The error bars limited by the horizontal lines indicate the statistical uncertainties, while the total uncertainties correspond to the full error bars. The ratio plots show the deviation of the predictions from the data in units of the total uncertainty. and between some of the models and the data were observed when asymmetries in the distributions were examined.
Until now it was nearly impossible to distinguish between the predictions of PYTHIA 8 and VIN-CIA, or between the different variants of VINCIA. Our study of the asymmetry of the ratio of squared jet masses, shown in Fig. 8 d), shows that VINCIA predicts somewhat too many opposite-side events (i.e., events with two 1 → 2 splittings) compared to same-side events (i.e., events with a 1 → 3 splitting), and that the data prefer PYTHIA 8. We find that the different variants of VINCIA can be distinguished using the central-to-towards (Fig. 5 b)) and central-to-away (Fig. 5 d)) ratios in the θ 14 variable, which indicate that the VINCIA variant based on antenna mass-squared evolution predicts somewhat too many wide-angle emissions for the soft fourth jet, compared to collinear emissions.
To summarize the results of our study, we show the average p-value for each model, calculated from the four values of the single observables, in the bottom row of Table 5. The variant of HERWIG++ with a q 2 dip -ordered dipole shower is found to provide the least satisfactory description of the data. This model does not contain coherence; it has intentionally been introduced to confront it with coherent evolution. Thus our results emphasize the importance of incorporating coherence into the description of the QCD multijet process. Since HERWIG++ uses the cluster [23] and PYTHIA 8 and VINCIA use the Lund string [59,60] hadronization model, a direct comparison of the predictions from the two groups of shower models is somewhat ambiguous. It would be interesting to perform a comparison based on use of the same hadronization model for all models. However, when comparing all shower models together, we find PYTHIA 8 and VINCIA with evolution in transverse momentum to give the best description of the measurements presented here.