Dipole Showers and Automated NLO Matching in Herwig++

We report on the implementation of a coherent dipole shower algorithm along with an automated implementation for dipole subtraction and for performing POWHEG- and MC@NLO-type matching to next-to-leading order (NLO) calculations. Both programs are implemented as add-on modules to the event generator Herwig++. A preliminary tune of parameters to data acquired at LEP, HERA and Drell-Yan pair production at the Tevatron has been performed, and we find an overall very good description which is slightly improved by the NLO matching.


Introduction
Many physics analyses at the Large Hadron Collider (LHC) are nowadays based on Monte Carlo simulations [1][2][3][4][5], e.g. for acceptance determination or even for background subtraction. With the high precision aimed for in many analyses it is mandatory to provide many of the simulations with the highest possible theoretical accuracy. For most processes this is now next-to-leading order (NLO) in the perturbative expansion of Quantum Chromodynamics (QCD). During the last decade, enormous progress was made in the development of techniques to match NLO calculations on the one hand and to merge multiple jet tree level matrix elements on the other hand with parton shower algorithms.
First attempts to improve parton shower emission patterns with the information from the full matrix element for the hardest gluon emission were made with so-called matrix element corrections [6,7], that have long been implemented in the standard event generators. The next big improvement was made when matrix elements for multiple hard emissions were merged with parton shower algorithms, first for e + e − annihilation processes [8,9] and then also for hadronic collisions [10]. An alternative approach was proposed in [11], where different implementations have been systematically compared as well. The experience that was made with these algorithms over the last years [12] has lead to further improvements [13,14] such that now the systematic uncertainties due to e.g. matching scale dependence have been significantly reduced.
Matching to NLO matrix elements has been initiated first with a phase space slicing method [15][16][17]. A more systematic matching has then been introduced by Frixione and Webber in the MC@NLO approach [18]. This approach has then been generalised to include massive partons [19]. Many processes have been included in the meantime [20][21][22]. As the algorithm depends on subtraction terms for a specific parton shower implementation, the first versions of MC@NLO have been tailored to work with Herwig only. Now, it also works with Herwig++, i.e. as the subtraction scheme has been generalised towards the Herwig++ parton shower implementation, all processes available in the MC@NLO package can also be showered with Herwig++ to achieve formal accuracy at NLO [23].
As the matching of NLO matrix elements and parton shower algorithms takes place perturbatively to the specified order, i.e. the next-to-leading order, there is formally an ambiguity left that can be used to devise alternative matching schemes. One such scheme has been proposed by Nason [24] and now goes under the name powheg. The guiding principle of this algorithm is to allow for a matching algorithm that does not introduce events with negative weight, as the MC@NLO prescription does. This approach has also been very successfully established during the last years and implemented as a separate program package [25]. Many processes are available in this program package [26][27][28][29][30]. However, the method itself is also used by other groups to match NLO calculations with parton showers within a given shower package. Many processes are available with Herwig++ [31][32][33][34][35] or sherpa [36]. The internal implementations benefit from the inclusion of truncated showers (see below).
On the parton shower side, a number of new parton shower algorithms have been developed during the last years, partly together with the rewrite of old generators [37,38]. Many new developments have addressed the idea of implementing a shower that is directly related to the subtraction terms commonly used in NLO calculations. This led to the implementation of parton showers with splitting kernels based on the Catani-Seymour subtraction scheme [39,40] for NLO calculations [41,42], which was proposed in [43]. Similar ideas are followed with other subtraction schemes as e.g. in the vincia shower [44] where QCD antenna subtraction terms are facilitated.
With more and more NLO calculations being matched one-by-one the question arises whether this step can be automated. In fact, the powheg method is already a first step into this direction, as the method as such is independent of the showering algorithm. In particular, no specific subtraction terms or the like are needed in order to match a given NLO calculation to any shower. There are subtleties on the shower side, though. The powheg method guarantees to give the hardest emission within the parton evolution and ensures that this is generated according to the phase space weighting of the NLO matrix element. However, if the shower does not evolve in the same hardness measure as the powheg algorithm, one has to introduce so-called truncated showers. This has been discussed already in early powheg implementations [45] and is now part of Herwig++ [14] and sherpa [13].
Many NLO calculations are available as ready-to-use computer codes that often come as packages that include a number of processes at NLO already. Most of these codes use the Catani-Seymour subtraction method to regularise infrared divergences. More recently, also the complete automation of NLO calculations has been discussed with first tools readily available [46,47], based on the approach [48]. Some more calculations are already based on a fully automated tool chain [49][50][51][52][53]. Part of this progress relies on the automatic generation of Catani-Seymour subtraction terms [54][55][56] or FKS subtraction terms [57]. The latest developments unify the matching of multiple treelevel emissions and the matching of NLO corrections to the Born level [58,59].
In this paper we introduce an implementation of a parton shower based on the Catani-Seymour subtraction terms, similar to the showers introduced in [41,42]. The goal of the implementation is to provide a framework for an automatic matching of NLO computations to a parton shower. The use of the subtraction terms is highly beneficial as the MC@NLO like matching, that is based on a subtraction of the parton shower contribution to the NLO observable becomes trivial. Together with a framework to handle powheg like matching we will have the possibility to check systematics within a single implementation. By using a shower based framework we may directly make use of truncated showers in order to minimise systematic uncertainties inherent to the matching formalism. As a first step in this programme we present the shower implementation, which is embedded as a module in the Herwig++ event generator. In addition we present NLO matchings to the basic QCD processes.
The paper is organised as follows. In Sec. 2 we introduce the dipole shower in detail. Sec. 3 introduces the implementation of an automatic matching with this parton shower, that we call Matchbox. In Secs. 5, 6 and 7 we present comparisons to data from LEP, HERA and the Tevatron, respectively. In Sect. 8 we consider the matching of the Z 0 plus jet matrix element at NLO which contains less trivial colour structures as well.

Dipole Showers
The dipole shower algorithm outlined in [60] has been implemented as an add-on module to Herwig++, [1]. In this section we briefly review its properties and give a full description of the implementation.
The authors have shown that parton showers based on Catani-Seymour subtraction kernels [39] correctly reproduce the Sudakov anomalous dimensions and properly include effects of soft gluon coherence, upon using an ordering of emissions in transverse momenta as defined by the emitting dipoles. The simple inversion of the kinematic parametrisation used in the context of NLO subtraction, however, does not resemble a physical picture for initial state radiation. An alternative has been suggested and implemented in the simulation presented here.

Starting the Shower
The dipole shower starts evolving off a hard sub process, which is assigned colour flow information in the large-N c limit. This colour flow information is used to first sort all coloured partons attached to the hard sub process into colour singlets. Practically, this is done by making use of the fact that a colour singlet is 'simply connected' in the sense of its colour flow topology: Any parton i in a colour singlet can be reached from a parton j in the same singlet by just following colour lines and changing from a colour to an anti-colour line at an external gluon. Each colour singlet is now an independently evolving entity, and can only split into two colour singlets in the presence of a g → qq splitting. In the next step, the partons in each singlet are sorted such that colour connected partons are located at neighbouring positions, when representing the singlet group of partons as a sequence. Note that these sequences may be open or closed: We will call a sequence open, or non-circular, if there exists a circular permutation of the elements in it such that the partons at the first and last position are not colour connected. Conversely, if there does not exist such a permutation, the sequence is called circular or closed. The possible sequences are depicted in Fig. 1. Once this sorting has been accomplished, we will refer to these singlet sequences as dipole chains: each pair of subsequent partons in a singlet sequence forms a dipole, which may radiate. For each parton in each dipole, a hard scale is then determined as defined in [60], with the restriction that no transverse momenta larger than the ones present in the hard process are generated by the shower. This restriction can be switched off for the first emission within the context of NLO matching.

Kinematics
For completeness we here review the kinematics parametrization used for dipole splitting. We referr the reader to [60] for more details on the relevant phasespace measures.
For final state radiation, we employ a standard Sudakov parametrization of the splitting products, using the spectator momentum to absorb the longitudinal recoil of this splitting such that before and after the splitting all momenta are on their mass shell, while retaining exact energy-momentum conservation. The parametrizations for an emission with momentum q off a final state emitter i and in presence of a final state spectator j (initial state spectator a) thus take the form for lightlike p i , p j,a . The spacelike transverse momentum k ⊥ satisfies k ⊥ ·p i = k ⊥ ·p j,a = 0 and the other parameters are constrained by q 2 i = q 2 = 0 and q i +q ±q j,a = p i ±p j,a . For initial state radiation it is crucial to allow for nonzero transverse momenta of the incoming partons, such that any initial state emission will contribute to the transverse momentum transferred to the final state. In this case, for the emission q off an initial state emitter a in presence of a final state spectator j (initial state spectator b), we use for lightlike p a , p j,b . Similarly to the final state case, the spacelike transverse momentum k ⊥ satisfies k ⊥ · p a = k ⊥ · p j,b = 0 and we impose the constraints q 2 a = q 2 = 0 and q a − q ∓ q j,b = p a ∓ p j,b .
After the shower evolution has been terminated, the event needs to be re-aligned back to the beam axis, i.e. a Lorentz transfomration needs to be applied transforming the incoming partons with finite transverse momenta back to momenta collinear with the beam axis. Details of this procedure are discussed in section 2.5.

Modification of the Splitting Kernels
The Catani-Seymour dipole functions are not positive throughout, rendering a probabilistic interpretation as required by a parton shower algorithm problematic. The region where they turn negative is readily identified as the phasespace for large-angle, hard emission. As the shower approximation breaks down in this region anyway, one simple possibility to cure this problem is to set the kernel equal to zero when it becomes non-positive. For reasons of flexibility and future extensions, we choose a different approach by adding finite terms to render the dipole functions positive.
There is no first principle of what these finite terms should look like; indeed, the dipole functions themselfs contain finite, non-singular pieces, which one should abandon in a strict approach of exponentiating singular terms only. To this extent, we leave it to a comparison of data and Monte Carlo, wether exponentiating finite terms should be considered a bug or a feature. Ultimately, their impact should be included as a measure of uncertainty of the parton shower prediction.
To be precise, we have determined the finite terms for the initial-final and final-initial quark-gluon dipoles to symmetrically share the full real emission contribution (averaged over event orientation) of a gluon being emitted off a space-like quark-current, The remaining non-postive dipole function is the finalinitial gluon-gluon dipole, where a less motivated choice has been used, reflecting the z ↔ (1 − z) symmetry of this splitting. The modification of the dipole kernels have accordingly been implemented in the automated dipole subtraction to be discussed in section 3.

Evolution of the Parton Ensemble
The main shower algorithm acts on a set of dipole chains, and proceeds as long as this set is not empty. Dipole chains are removed from the list, if they stopped evolving, i.e. if there was no splitting selected with a p 2 ⊥ above the shower's infrared cutoff µ 2 IR . The first entry in the set of dipole chains is taken to be the current chain. For each dipole (i, j) in the current chain (with both possible emitter-spectator assignments, i.e. also considering (j, i) along with (i, j)), any possible splitting (i, j) → (i ′ , k, j) is considered to compete with all other possible splittings of the chain. For any such splitting, given a hard scale p 2 ⊥ associated to the emitter under consideration, a scale q 2 ⊥ is selected with probability given by the Sudakov form factor where P (i,j)→(i ′ ,k,j) (q 2 , z) is the appropriate splitting probability as defined in [60], using the respective dipole splitting function V i ′ ,k;j . The splitting with the largest selected value of q 2 ⊥ is then chosen to be the one to happen, except the largest q 2 ⊥ turned out to be below the infrared cutoff. In this case → → Gluon emission off a circular chain. The chain stays circular.

→ →
Gluon emission off a non-circular chain. The chain stays non-circular.
→ → g → qq splitting in a circular chain. The chain becomes non-circular.

→ →
g → qq splitting in a non-circular chain, triggering breakup of the chain. the current chain is removed from the set of dipole chains, inserted into the event record and the algorithm proceeds with the next chain. The momentum fraction z is chosen to be distributed according to dP (i,j)→(i ′ ,k,j) (q 2 ⊥ , z). Since for now we use azimuthally averaged splitting kernels, the azimuthal orientation of the transverse momentum is chosen to be distributed flat. The momenta of the splitting products and the spectator after emission are then calculated as specified in [60].
As the evolution factors into dipole chains as independently evolving objects, all possible emitters in the chain -after having inserted the generated splitting -now get the selected q 2 ⊥ assigned as their hard scale, or stay at the kinematically allowed scale p 2 ⊥,i,j if q 2 ⊥ > p 2 ⊥,i,j . If a g → qq splitting has been selected for a circular chain, this chain becomes non-circular. If it has been selected for an already non-circular chain, this chain breaks up into two independent chains exactly between the qq-pair, owing to the colour structure of this splitting. This situation, along with non-exceptional splittings is depicted in Fig. 1.

Finishing the Shower
After the shower evolution has terminated, the incoming partons with momenta p a,b in general have non-vanishing transverse momenta with respect to the beam directions. This necessitates a realignment of the complete event encountered at this stage. Following the arguments of [60], the momenta of the evolved incoming partons p a,b are taken to define the frame of the collision at hand, i.e. hadron momentaP a,b . We then seek a Lorentz transformation to takeP a,b to the externally fixed hadron momenta P a,b , which is in turn used to realign the complete event.
To construct the momenta of the incoming hadrons P a,b , we require the three-momenta ofP a,b being collinear with the respective partonic three-momenta and define momentum fractions The momentum fractions are further constrained by requiring that where S is the centre-of-mass energy squared of the collision, such that the desired Lorentz transformation exists. The second constraint is in principle to be chosen in such a way as to preserve the most relevant kinematic quantity of the hard process which initiated the showering. By default, we choose this to be the rapidity of a system X, which is either the system of non-coloured particles at the hard sub-process, or the complete final state in case of a pure QCD hard scattering.

Cluster Hadronization
The cluster hadronization model, originally proposed in [61], is the hadronization model used by the Herwig++ event generator. The model in its initial stage just after parton showering, performs a splitting of gluons into quark-antiquark pairs such that in the large-N c limit a set of colour singlet clusters emerge from the event under consideration.
These clusters are then subsequently converted into hadrons, by either splitting them into clusters of lower invariant mass or performing directly the decay to meson pairs, in case another qq pair is 'popped' from the vacuum inside the cluster, or baryon pairs, where the creation of a diquark-antidiquark pair is assumed. Further details of the model will not be discussed here.
The main assumption of the model is however, that both quarks are located on their constituent mass shell, and gluons are as well assigned a non-vanishing constituent mass, entering as a parameter of the model. In the standard Herwig++ parton shower, acting as a 1 → 2 cascade, only scales and momentum fractions of the splittings are determined during the evolution, the full kinematic information being constructed after the end of the perturbative evolution. This setup thus straightforwardly allows to include the constituent masses in this particular step. Since the dipole shower preserves momentum conservation locally to each splitting, ending up with a set of massless partons, such a treatment is not possible.
The way to perform the 'reshuffling' of the massless parton momenta to their constituent mass shells is chosen to be the following algorithm: Let Q c be the total momentum of all final state partons and perform a boost Λ c to the centre-of-mass system of Q c , Λ c Q c = (Q c , 0). The boosted parton momenta p i are now put on the constituent mass shell, including a global rescaling of their three-momenta, Momentum conservation then implies the following relation be satisfied, which may be solved numerically to yield a value for ξ. Finally the inverse boost Λ −1 c is applied to the new parton momenta p ′ i .

Comparison to other Dipole-type Showers
As has been extensively discussed in ref. [60], one of the main differences of our implementation as compared to similar approaches, [41,42], is the way intial state radiation is handled. The effect of the alternative scheme, in which every initial state emission can contribute to the final state transverse momentum, has been studied at parton level in [60]. For enabling a consistent subtractive NLO matching (cf. sec. 4.1), the shower is allowed to fill the available phase space. 1 Additionally, we include a simple modification of α s for low scales, such as to allow shower evolution to very small scales. Algorithms along these lines have been pointed out as an alternative modelling of intrinsic transverse momentum [62]. For the shower implementation at hand, this is of particular importance when considering deep inelastic scattering where no intrinsic p ⊥ can be generated at the end of the evolution without the scattered electron taking up parts of the resulting recoil.

The Matchbox Framework
Closely related to the dipole shower implementation, though technically independent of it, is the development of the Matchbox module. Matchbox is based on an extended version of ThePEG, the extensions providing functionality to perform hard process generation at the level of NLO QCD accuracy and easing the setup of run time interfaces to external codes for hard process generation. We have implemented an automated generation of subtraction terms based on the dipole subtraction formalism [39], based on the information available from ThePEG matrix element implementations, which will be discussed in further detail in section 3.4. A full NLO calculation to be run in the Matchbox framework only requires the implementation of tree-level and one-loop amplitudes, the presence of colour (and spin) correlated amplitudes for the Born process and the presence of a phase space generator appropriate to the process under consideration. Fig. 2 sketches the involved software modules and their interaction with an external implementation of a NLO calculation. Besides being capable of performing a Monte Carlo integration of 'plain' NLO corrections, the main purpose of Matchbox is to turn a NLO calculation into a matched calculation to be consistently combined with a parton shower. Here, functionality is especially provided to calculate the inclusive NLO cross section differential in the Born degrees of freedom, which, along with a matrix element correction to the shower, is the main ingredient to the powheg method of combining parton showers and NLO QCD corrections.
Matchbox is automatically generating matrix element corrections from the NLO real emission contribution. It further allows the possibility to overcome problems in the powheg matching owing to radiation zeroes in the Born matrix element. The matrix element correction splitting kernel, which is essentially defined by the ratio of real emission and Born matrix elements squared is turned into the corresponding distribution including the Sudakov form factor by using the ExSample library, [63]. ExSample allows the efficient sampling of distributions of this type, without having to provide any analytic knowledge on the splitting kernel or trying to estimate enhancement factors to simpler functions such as dipole splitting kernels. ExSample is also used to sample emissions in the dipole shower implementation.

Notation
We consider NLO calculations carried out using the dipole subtraction method, [39]. Instead of using the notation established there, we unify the indices of all possible dipoles to ease readability, as expressions become quite complicated especially when considering the powheg type matching. For the subtraction dipoles we choose the notation where the arguments are unified and we make explicit the dependence on either real emission or 'tilde' kinematics, e.g.
(11) In this notation, p n now refers to the whole phase space point, where we have added hat symbols to the momenta to distinguish a single momentum from a complete phase space point. The 'tilde' mapping and its inverse are denoted by . Differential cross sections are considered in collinear factorisation, where the partonic cross section is in general of the form Here F (p a ,p b ) is the appropriate flux factor and X(p n ) generically denotes any contribution to the cross section which can be cast in the above form, i.e. tree-level amplitudes squared, one-loop tree-level interferences, subtraction terms, or the 'deconvoluted' finite collinear terms to be discussed below. The phase space measure dφ(p n |Q) is given by In latter sections, it will turn out to be useful to rewrite this as where we dropped making explicit the factorisation scale dependence from now on. The finite collinear terms originating from counter terms to renormalise parton distribution functions and integrated subtraction terms are reported in [39]. These are given as convolutions of Born-type cross sections of colour correlated amplitudes with certain 'insertion operators', e.g. for the incoming parton a 1 0 dz C(p a n (z))dφ(p n |Q a (z))dF (x a , zp a , where the superscript a along with an argument z indicates, that parton a's momentum is rescaled by z. The insertion operators themselves include +-distributions, and events should be generated according to the rescaled incoming momentum zp a . A numerical implementation is at first sight not obvious. Considering however the integration over the momentum fraction x a , these contributions can be rewritten in terms of a Born-type cross section multiplied by modified PDFs along the lines of (19) and the +-distributions can be expressed in a way to allow for numerical implementation. All possible contributions for light quarks are implemented in Matchbox.
Any NLO cross section within the dipole subtraction thus takes the form where the insertion operators I are given in [39] and have been implemented for light quarks in full generality as well.P,K and dφ F denote the deconvoluted versions of the finite collinear terms originating from the insertion operators P,K given in [39]. Here, the test functions u(p n ) refer to the class of events to be generated by a Monte Carlo realisation of the above integrals, and M B,R denote the Born and real emission amplitudes, respectively. Since only the structure of the real emission and subtraction terms turns out to be relevant for matching purposes, we from now on collectively denote Born, virtual and insertion operator contributions by Since all the integrals will be dealt with by means of Monte Carlo methods, differentials are expressed in terms of a Jacobian expressing the physical variables in terms of random numbers and a volume element on the unit hypercube of these random numbers, e.g.
We identify ratios of differentials to actually mean the ratios of the corresponding functions multiplied by the Jacobian in use to express them in terms of random numbers, e.g. for two cross sections we define

Phasespace Generation and Matrix Elements
Matchbox organizes differential cross sections directly in terms of the physics quantities entering their definition to maintain a maximum of flexibility and transparency. The structure described in the following enables several levels of implementing new processes or interfacing external codes at runtime, while keeping the already existing steering of Herwig++ event generation in terms of subprocess selection and cuts unaltered. At leading order, a differential cross section is decomposed in terms of PDFs provided through existing infrastructure, phasespace generation and a squared matrix element for the process of interest. These contributions can be implemented in one single class. Alternatively, phasespace generation can be separated into an independent implementation, while matrix elements squared may still be provided for each subprocess directly; they can also be decomposed into helicity amplitudes multiplying a given colour structure. In the latter case, the relevant colour algebra needs to be provided through an independent class and amplitudes can be calculated independently of a given subprocess in a convention that all momenta are outgoing. Other conventions are possible, requiring information on how a physical subprocess is crossed to the amplitude's convention. The case of providing colour decomposed helicity amplitudes is most relevant for the fully automatic generation of subtraction terms to be discussed below.
Another important ingredient to generating events according to a given differential cross section is the information which (tree-level) Feynman diagrams do contribute to a given subprocess, and which large-N c colour flows are associated to these diagrams. Though the diagram information is unphysical, it has been included in ThePEG primarily for the purpose of setting up the event record in a most meaningful way. As for the modularity provided for the contributions to a differential cross section, diagram information and/or colour flows can be provided by either direct implementation 2 , or can be generated automatically by a dedicated tree-level diagram generator based on the vertex objects present in Herwig++, and the colour algebra implementation providing information on which partons are considered colour connected for a given colour structure. The weights driving selection of colour flows are then determined automatically.
The diagram information is used by Matchbox to determine subtraction terms for real emission contributions to NLO calculations, and can be used for efficient phasespace generation. A phasespace generator based on mapping out the peak structure relevant to the contributing diagrams is part of Matchbox, [65].
At NLO, differential cross sections receive contributions from subtracted real emission matrix elements, and the finite remainder of the sum of integrated subtraction terms and virtual corrections. Various conventions of defining this finite remainder are supported, as well as the choice between dimensional regularization and dimensional reduction (including conversion from DR to MS renormalized one-loop amplitudes). The finite piece of one-loop corrections can, similarly to the implementation of tree-level matrix elements, be either directly implemented in terms of the Born/one-loop interference or as colour decomposed helicity amplitudes. The contributions from integrated subtraction dipoles behave similar to the Born contribution as discussed in the previous section, with the exception that colour correlated matrix elements are required, which will be discussed in the following.

Handling of Colour Bases
Any QCD amplitude |M , considered here a vector in colour space with spin quantum numbers implicit, can be docomposed in terms of a finite set of colour structures |α as For a given choice of a colour basis {|α } it is instructive to consider the equivalent object of a plain complex vector, for a colour basis of dimension d c . Within this approach, and owing to the fact that the most common choice of colour bases are not orthogonal (with the notabe exception of [66]), calculating a squared matrix elemnt or similarly a Born/one-loop interference thus requires knowledge of a scalar product matrix S αβ = α|β , in terms of which a colour-summed, squared matrix element is calculated as Similarly, colour correlated matrix elements can be expressed as whereS is the scalar product matrix for a final state with an additional parton and the T i are appropriate representaions of the colour charge operators. A more detailed description of this paradigm is given in [67]. Within this context, we are mainly concerned with the fact that this treatment is blind to a particular choice of basis, and linear algebra otherwise which is performed with help of the linear algebra package of Boost. Based on this fact, Matchbox provides a very generic notion of a colour basis, implementing precisely this picture and requiring from a particular choice of colour basis solely the calculation of scalar products between colour structures, as well as the matrix elements of the T i .

Automated Dipole Subtraction
Matchbox provides an automated generation of subtraction terms according to the dipole subtraction formalism [39]. Similar implementations exist in other event generators, [54,68,69]. The Matchbox implementation is smoothly integrated with the hard process generation framework of ThePEG, and offers modified subtraction terms to match the evolution kernels used in the dipole shower, cf. sec. 2.3. All dipole kernels and insertion operators for massless quarks have been implemented, and the framework is general enough to straightforwardly include the contributions relevant for massive quarks, parts of which are already present [70]. The process dependent ingredients needed to set up subtraction terms, in particular colour-and spin-correlated matrix elements, can either be provided directly through a general set of interfaces, or methods providing colourordered subamplitudes may be implemented. In the latter case, the infrastructure outlined above is employed to evaluate colour correlations and spin correlations by means of translating the correlation tensors used in [39] to a correlation of amplitudes of different gluon helicity.
Which dipoles will contribute to a given process is determined from the diagram information discused in sec. 3.2. Subtraction dipoles are determined by a simple algorithm of checking, for any contributing diagram, if any two external coloured legs are attached to the same vertex. By removing this vertex from the diagram information, the diagram of the corresponding 'underlying Born process' is obtained along with a mapping of how the parton momenta need to be assigned to the underlying Born process. Conversely, the same pairing of diagrams provides a way to identify which real emission processes are to be considered given any Born process. This information is used when setting up the inclusive NLO cross section calculation and generating matrix element corrections for the parton shower. From a given matrix element object implementing a real emission contribution, Matchbox checks a set of Born matrix element objects provided along with the real emission ones for the underlying Born processes obtained and adds all matching pairs to the calculation if there exists a subtraction dipole object which claims responsibility for the given pairing. Similarly, all insertion operator implementations present are checked if they claim responsibility for a given Born process, thus completing the setup of a NLO calculation.

Summary of Fixed-order Cross Sections
Fixed-order cross sections at LO or NLO can be assembled with Matchbox through a series of interfaces at different levels such as (colour and/or spin correlated) squared matrix elements and Born-virtual interferences, or directly at the level of colour ordered helicity amplitudes. If the tree level contributions are available via the amplitude level interface, subtraction terms are setup in a completely automatic way requiring no user intervention or additional information. Hybrids of these interfaces are as well possible, allowing e.g. one-loop corrections to be provided at the level of the Born-virtual interference, while real emission matrix elements are given at the amplitude level.
Given a physical process determined by incoming hadrons or leptons, and a final state which can explicitly contain jets, all contributing subprocesses are determined and diagram information is generated for use in phase space generation and the setup of subtraction terms. The complete LO or NLO calculation is then injected as a ThePEG SubProcessHandler object into the stage of event generation.
For running unmatched calculations, a group of events consisting of real emission and 'tilde' phase space points is provided along with the relative weights of the individual contributions present in the group. The sum of these weights, i.e. real emission minus subtraction term contributions is driving the cross section integration and potentially event unweighting.
More details on the Matchbox framework, particularly the interfacing and/or implementation of contributions to a fixed-order cross section are given in appendix A.

Subtractive NLO Matching
Owing to the fact that the dipole shower implementation uses splitting kernels which precisely equal the dipole subtraction terms, following the steps leading to MC@NLO here results in a very simple matching. 3 This subtractive matching is basically identical to the NLO calculation itself, except that instead of event groups now a single real emission phase space point is generated from the subtracted real emission contribution. In an algorithmic manner, the matching may thus be expressed very simply: -Generate Born-type events p n with density generate real-emission type events q n+1 with density and feed either into the dipole shower.
A subtlety, however, arises here. Since we are interested in describing the hardest emission according to the exact real emission matrix element, the parton shower should not generate harder emissions than the one fixed from the NLO calculation. Practically, this is implemented by calculating the p α ⊥ as defined by the inverse 'tilde' mapping from each dipole configuration α, since the kinematics of the emission appears differently depending on the emitting dipole considered. p α ⊥ is communicated as a veto scale to the dipole shower, which is not allowed to generate emissions with p ⊥ > p α ⊥ off the emitter, emission and spectator partons used to evaluate D α . Another approach, in which the dipole shower is generally not allowed to emit at scales p ⊥ larger than final state transverse momenta can equivalently be used and may become the default in a future version. This treatment is then very similar to the Herwig shower in use with the traditional MC@NLO implementation.

NLO Matching with Matrix Element Corrections
The splitting kernels to be used for a matrix element correction are given by the ratio of real emission and Born matrix elements squared, weighted by (in principle) arbitrary weight functions for each kinematic mapping of a subtraction term, i.e. for each subtraction term. It is most simple to choose the subtraction terms themselves to define these weight functions. This has the advantage that all divergences but the divergence associated to the subtraction term D α are divided out from the real emission matrix element, and dynamical features of the Born matrix element, like peaks owing to unstable particles, are flattened out in the splitting kernel considered.
Within this procedure, one faces three major problems: -Some of the subtraction dipoles, in particular the ones with initial state emitter and final state spectator or vice versa, are not positive-definite. This makes a Monte Carlo treatment of the corresponding Sudakov-type distribution hard to implement. Since the regions, where these dipole kernels become negative correspond to hard, large angle parton emission, it is clear that this problem can be cured by changing the irrelevant finite terms of the subtraction dipoles, provided they are consistently taken into account in the integrated ones. Within the Matchbox implementation this has so far been carried out for the qq initial-final dipoles, which have been modified to reproduce the matrix element squared for gluon emission off the corresponding vector current and are thus positive by definition. -The Born matrix element squared may contain zeroes.
In this case, its inverse is obviously ill-defined. -The implementation of the parton densities at hand, which enter as a ratio in the splitting kernels as well, may not be stable in particular for large x in the sense that the interpolation used oscillates around zero rather than tending to zero smoothly. This poses a problem similar to the zeroes in the Born matrix element, however now without any physical interpretation.
The latter two problems can be solved by introducing an auxiliary cross section dσ screen (p n |Q; p 2 ⊥ ) which enters into the definition of the splitting kernels where we have already written the splitting kernel differential in the random numbers determining p 2 ⊥ , z and φ, and the dependence of q α n+1 = q α n+1 (p n ; p 2 ⊥ , z, φ) on the splitting variables is understood implicitly. In order not to change the divergence structure implying the resummation of large logarithms, the screening cross section needs to vanish as p 2 ⊥ → 0. Since Born zeroes cannot occur for p 2 ⊥ → 0 (the QCD singularities factor in this limit with respect to the Born process) Eq. (28) is free of these problems. If, in addition, the screening cross section does not depend on the parton distributions, the technical issues with PDFs becoming zero are cured as well.
The screening cross section has however to be taken into account for the fixed order calculation in order to reproduce the correct NLO cross section and will thereby spoil the original simplicity of using the NLO K-factor differential in the Born variables to generate events to enter the matrix element corrected shower. Including the screening cross section the fixed order cross section can then be calculated to be constructed of densities for Born-type and real emission type events. The densities for Born-type events closely resemble the K-factor modification, and .
To generate events according to these densities, a k + 3dimensional random number point is chosen, where the three additional degrees of freedom are discarded. Owing to the fact that the integration volume in terms of random numbers is the unit hypercube, this procedure produces the integration over the degrees of freedom of the parton emitted in the real emission on average. Events of real emission type are to be generated with density which is just a reweighting of the real emission contribution. Events of both classes can then be showered by a parton shower using a matrix element correction as defined at the beginning of this section, and a communication of veto scales applies to the real emission contribution along the same lines as for the subtractive matching. Note that the individual contributions are positive, as long as the screening cross section is bounded from above by a reasonable value. Since this type of matching is independent of the parton shower to act downstream, the actual implementation does not make any reference to the dipole parton shower, and real emission contributions according to the matrix element correction are generated outside any shower module, presenting a real emission sub process supplemented with proper veto scales, or a Born-type sub process to the shower, if radiation has been generated according to the matrix element correction or not, respectively.
Note that, when putting the screening cross section to zero, the original simplicity of the powheg-type matching is recovered. The matrix element corrections, inclusive and real-emission type contributions are all setup and calculated in an automated way within the Matchbox implementation. The screening cross section is by default chosen from the corresponding phase space and the dimensionality required by the phase space, i.e. dσ screen,α (p α n (q n+1 )|Q; where p α ⊥ is the transverse momentum associated to the mapping p α n (q n+1 ), s α (q n+1 ) is the appropriate mass squared of the emitter-spectator pair in p α n , and n out is the number of outgoing particles. Other choices may be possible.

Results at LEP
The variety of data acquired by the LEP experiments allow for a systematic fit of parameters of the parton shower and the hadronization model. In a preliminary fit, the parameters assumed to mainly determine the description of event shape variables and jet rates as measured by the DELPHI experiment [71] and jet observables as reported by the OPAL collaboration [72] have been fitted using the Rivet [73] and Professor [74] systems. The parameters and ranges considered are given in Tab. 2, along with a short description. Parameters which are known to mainly affect individual hadron multiplicities have not been varied, and fragmentation parameters for heavy quarks have been set equal to the values of those for light quarks. A simple modification of the running of α s in the infrared has been adopted by replacing its argument q 2 → q 2 + µ 2 soft . This modification has originally been motivated to supply another model for intrinsic transverse momentum generation by letting the initial state shower evolve down to very small scales along the lines of [62]. We see however no reason that it should not be considered for final state radiation as well.
Separate fits have been performed for LO and NLO predictions. LO predictions have been obtained by running just the parton shower, using a one-loop running α s . NLO prediction have been obtained by means of supplementing the shower with the matrix element correction matching without using the Born screening cross section and a two-loop running α s . In total we find that the NLO simulation gives a marginally better fit than the LO one, though the description of data is completely comparable within experimental uncertainties.
The fitted parameter values are displayed in Tab. 3. Most notably, the hadronization parameters for the LO and NLO fit do not significantly differ. For both predictions, a modification of the infrared running of α s seems not to be preferred. The infrared cutoff of the parton shower is determined more precisely by the NLO fit, which prefers a smaller cutoff. Also α s (M 2 Z ) is determined more precisely by the NLO fit. Both α s values obtained are compatible with the world average [75] of 0.1184, where the NLO result is closer to this value. Note that this should be regarded a coincidence at the level of the approximation considered and it is certainly not possible to uniquely relate the obtained value to one applying to the M S scheme. In Figs. 3 and 4 the LO and NLO simulation results are compared for selected observables. Fig. 5 shows the energy-energy-correlation, which has not been included in the fit.

Comparison of Matching Strategies
The Matchbox framework provides the facility to switch between the powheg-type matching with matrix element corrections including or excluding the auxiliary Born screening cross section, and subtractive matching. For reasons of systematics it is instructive to compare these approaches. No separate fit for the variants not considered so far has been performed and the NLO fit values as given in the previous section have been used. The different matching strategies give completely comparable results. If there are small visible differences, there is no clear tendency that either variant would give a better description than any of the others. Fig. 6 compares the matching strategies for the two jet rate. To this extent, the subtractive matching could be preferred amongst the powheg-type ones owing to its smaller computational complexity. This statement, however, not only includes that negative weighted events do not pose a major problem, but also has to be verified in a process dependent matter since there is no hint, if the behaviour observed here is a general feature -particularly at hadron colliders.

Results at HERA
Owing to the approximation underlying the dipole parton shower, diagrams contributing to parton emission of a given dipole (i, j) may be considered a gauge invariant subset in the soft and/or collinear limits for N c → ∞. This implies that the infrared cutoffs and soft scales entering the emission probabilities need not be the same for all dipoles. The emitter-spectator configurations forming gauge invariant quantities in this sense are the two emitter choices for final-final dipoles, initial-initial dipoles, and the combination of initial-final and final-initial configurations. Fitting DIS data therefore allows one to fix the infrared cutoff and soft scale for the latter, before finally constraining the same parameters for initial-initial dipoles at a hadron collider, which is considered in the next section.
For the fit described here, the same technique as for LEP, and data accumulated by the H1 experiment [76] have been used. For LO and NLO, the default Herwig++ PDFs, MSTW 2008 LO** [77,78] and MRST 2002 NLO [79], have been used. The same PDFs were considered for hadron collider data to be discussed in the next section. The NLO fit was obtained by running the matching with matrix element correction.
The findings are similar as for the fit to LEP data. We find a reasonable prediction of transverse energy flows over the whole range of (x, Q 2 ) plane. The matched NLO prediction gives a comparable fit to the LO simulation, while preferring both a smaller infrared cutoff and screening scale. The fitted parameters are given in Tab. 4. Fig. 7 shows the average transverse energy as a function of Q 2 in the central detector region. This observable is clearly improved by the NLO matching at small momentum transfers. A more detailed analysis of DIS data including inclusive jet and event shape data is currently underway.

Results at the Tevatron
After having determined the simulation parameters for hadronization, final state radiation, and radiation off a final-initial dipole by fitting LEP and HERA data, two parameters remain to be determined: the infrared cutoff and soft scale for radiation off an initial-initial dipole. We here consider the p ⊥ spectrum of e + e − Drell-Yan pair production as measured by the CDF collaboration [80]. Since the Drell-Yan process receives rather large QCD corrections from leading to next-to-leading order and a still consider- able correction at NNLO, both fits have been performed by normalising the simulation to the measured cross section. The matrix element matching including the Born screening cross section has been used here, as for the DIS data.
The Professor algorithm here turned out not to be applicable, as the cubic interpolation was not capable of describing the complete dynamics of letting the shower evolve to rather small infrared cutoffs, owing to the prescription of introducing a soft scale in α s as already described before. We have therefore performed a preliminary fit by generating 300 random points uniformly in parameter space, which here includes the infrared cutoff for initial-initial dipoles, the soft scale for initial-initial dipoles, as well as the widths of a Gaussian distribution for intrinsic transverse momentum, Λ ⊥ . The latter has been chosen to be potentially different for valence and sea partons.
Out of these random points we have picked the one with lowest χ 2 with respect to the data, again both for LO and NLO simulations. The resulting parameters are given in Tab. 5. Note that the p ⊥ distribution for sea partons is narrower, corresponding to a broader spatial distribution as can be motivated on different grounds.
We show the comparison of LO and NLO simulations in Fig. 8 showing similar systematics to the distributions discussed before. In order to determine the predictivity of the simulation already at this very coarse level of tuning, we additionally show the pseudo-rapidity distribution of a third jet in events with at least two hard jets, Fig. 9, as carried out at CDF [81]. Reasonable agreement with data is found. On top of the work presented in [60], this constitutes another crucial test of coherent parton evolution.  Fig. 9. The pseudo-rapidity distribution of a third jet in events with at least two jets. We here only show the leading order prediction in order to check the predictivity of the tune carried out so far.

Z 0 +jets
In addition to the processes discussed so far we have included the simulation of Z 0 /γ * production in association with a single hard jet at NLO. Matching of this process has been discussed in the literature in both, the powheg and the subtraction method [36,82,83]. As in the previous examples, the matrix element has been calculated and was included in the Matchbox framework as a built-in process. This time, however, the process was included on the amplitude level, providing spinand colour correlated matrix elements automatically. The generic building blocks needed to test our framework are just three amplitudes, implemented as complex functions, the Born, the virtual (one-loop) and the real emission amplitude. The implementation and integration of the NLO amplitude at parton level has been validated against mcfm [84] on the level of distributions and integrated cross section. In addition, the Born amplitude was validated against the internal matrix element in Herwig. All tests have been carried out at the example process pp → e + e − j. Different final states are only trivial modifications of the matrix elements and are available in the code via simple switches.
Regarding our framework, compared to the previously discussed examples, this is the first test of the matching machinery on the amplitude level and with non-trivial spin and colour correlations. Furthermore, with this process almost all of the possible emitter spectator combinations are included in order to test the automatic setup of Catani-Seymour subtraction terms, including splittings that involve the triple gluon coupling.
In order to study the effect of the NLO corrections, we have studied a number of observables for different setups of our code at parton level, namely -LO+PS, LO with parton showers, -NLO, NLO without parton showers, and -NLO+PS, NLO with parton showers using matrix element corrections.
We consider the invariant mass of the Z 0 /γ * boson via the electron pair in a mass window of 65 GeV < M ee < 115 GeV. In addition we ask for at least one jet with p ⊥ > 20 GeV in the pseudorapidity range 0 < |η| < 5. Jets are clustered with the k T algortihm (R = 0.7). In Fig. 10 we show the exclusive jet multiplicity for all the setups mentioned above. Clearly, the NLO calculation can only have up to two jets, while additional jets are produced by the parton shower. As the NLO+PS setup already start from harder configurations, here also the jet multiplicity increases towards more realistic values as compared to the LO+PS simulation.
The transverse momentum of the Z 0 boson is shown up to very large values in Fig. 11 as a check. This observable is described well by the LO matrix element already, which is not affected by parton showering in the region of high transverse momenta. At NLO only a small correction is applied in the normalization; similarly to the LO setup, the NLO behaviour is preserved by the matched simulation.
The azimuthal angle difference ∆φ 12 between the first and the second jet is displayed in Fig. 12. This observable is strictly leading order for the NLO matrix elements while the LO+PS setup only gets a second jet from the parton shower alone. Population at small values results from very soft and collinear emissions as given correctly by the parton shower evolution while large values towards π arise in events with widely separated jets as only generated by the hard matrix element. Here is the domain of the NLO calculation. A simulation that incorporates a matching between parton shower and NLO calculation is hence expected to interpolate smoothly between these two domains, following the parton shower on the left side of the plot and the NLO calculation on the right.
The transverse momenta of the hardest and second hardest jet are shown in Fig. 13. The behaviour of the  hardest jet is similar as for the transverse momentum of the Z 0 boson. The second jet, however, is only covered by the real correction part of the NLO matrix elements. Hence, the LO+PS result is too soft, the parton shower is not hard enough to produce as many hard second jets as given by the matrix elements.

Conclusions and Outlook
We have introduced a new dipole shower module for the event generator Herwig++ that allows for an automatic matching of NLO computations with a parton shower. A tune of the hadronization module to the most important data sets show that we can achieve very good results from this simulation already without the inclusion of NLO terms. Including NLO corrections at this relatively simple level only marginally improves the results. This effect is expected as it is known that the Catani-Seymour showers tend to mimic the behaviour of NLO matrix elements very well also in phase space regions well outside the collinear limits. However, the matching poses no technical problem and can be seen as a proof-of-concept for the idea to provide a framework for automatic matching. At this time with relatively simple matrix elements at NLO that are provided by internal code. Future work will concentrate in the inclusion of external code via a well defined interface, following the ideas in [85].

Note added
During the editorial process of this work another preprint, [83], appeared, studying in detail the systematics of NLO matching in a setting similar to ours, though for more complicated processes. While the present work focusses on presenting simple processes as proof of concept and thus refrains from giving an in-depth analysis of NLO matching, work is in progress towards processes involving several coloured partons. The Matchbox implementation particularly aims at an assesment of matching systematics and provides all tools needed for exploring established or newly developed matching schemes, details of which we leave to future work.

A Matchbox code structure and interfaces
In this appendix, we give a brief overview of Matchbox's code structure and the interfaces available tom implement fixed-order matrix elements. One of the major design criteria of Matchbox has been to retain the steering of hard matrix elements as already present in ThePEG/Herwig++, while allowing for flexible input of new processes. This structure is reflected in fig. 16. Table 1 gives an overview of what contributions to a fixed-order calculation are provided by Matchbox and/or may be provided by an external code.

B.1 Shower Splitting Kernels
The sampling of shower splitting kernels has been explicitly verified in situ, meaning using the full implementation as present in the simulation code, against an independent implementation using a numerical integration to obtain the Sudakov-type distributions. Fig. 14 shows an example for a final-final splitting kernel, proving correctness of this part of the code.

B.2 NLO QCD Corrections
All leading order matrix elements implemented in the Matchbox framework have been cross-checked against the Her-wig++ matrix elements.
The functionality of the automatically generated subtraction terms has been verified. Fig. 15 shows a typical examples of the ratio of subtraction to real emission cross section, plotted against each of the invariants entering the propagator denominators.
The 'plain' NLO cross section, and the inclusive one entering the matching with matrix element correction have been checked to agree, with and without the usage of the Born 'screening' cross section. The NLO cross section for  e + e − → jets has been validated against the analytically known K-factor of 1 + α s /π. The NLO cross section for DIS and Drell-Yan has been checked against the existing powheg implementation in Herwig++. For deep inelastic scattering, the subtraction terms have been modified in order to have positive definite dipole kernels, finite terms of the integrated subtraction terms have been changed accordingly. The functionality of the subtraction has been checked with both variants, and the NLO cross sections with and without modifications are found to agree.

B.3 NLO Matching with Matrix Element Corrections
A non-trivial cross check of the matrix element correction code and ExSample as the underlying 'working horse', is to consider the spectra for a gluon emission off a qq dipole as generated by the shower, which is validated against a numerical integration of the expected distribution implemented in a completely independent code. By putting the real emission matrix element entering the matching to be equal to the sum of dipoles (the correctness of which has been checked by verifying that the cross section of the subtracted real emission matrix element is consistent with zero), the matrix element correction must produce the same spectrum as the shower code. We have checked that this is indeed the case. It should be stressed that the machinery underlying the setup of the matrix element correction is much more complex than the shower implementation, and, that the splitting kernel entering the matrix element correction does depend on more parameters 4 than the one parameter of the shower kernel (corresponding to the dipole invariant mass).