Extending the Matrix Element Method beyond the Born approximation: Calculating event weights at next-to-leading order accuracy

In this article we illustrate how event weights for jet events can be calculated efficiently at next-to-leading order (NLO) accuracy in QCD. This is a crucial prerequisite for the application of the Matrix Element Method in NLO. We modify the recombination procedure used in jet algorithms, to allow a factorisation of the phase space for the real corrections into resolved and unresolved regions. Using an appropriate infrared regulator the latter can be integrated numerically. As illustration, we reproduce differential distributions at NLO for two sample processes. As further application and proof of concept, we apply the Matrix Element Method in NLO accuracy to the mass determination of top quarks produced in e+e- annihilation. This analysis is relevant for a future Linear Collider. We observe a significant shift in the extracted mass depending on whether the Matrix Element Method is used in leading or next-to-leading order.


Introduction
With the steadily increasing computing power multivariate methods are nowadays standard techniques in the experimental analysis. Initially introduced in experimental studies where the event rates are small and signals are difficult to disentangle from overwhelming backgrounds, multivariate methods have been proven useful also in a more general context. Among these methods the Matrix Element Method represents a prominent example, since it allows a direct comparison of observed event samples with expectations within a specific theoretical model. As originally introduced in refs. [1,2] the method is based on the assumption that the probability to observe a specific event can be calculated using the corresponding matrix element together with so-called transfer functions which model the probability for the observation of a given partonic event as a specific (hadronic) event at the detector level. It has been argued that if all the ingredients in this procedure are known with optimal accuracy, the event likelihood defined in this way represents an optimal statistical test. The method thus makes maximal use of the information contained in the single event. Based on this assumption the Matrix Element Method has been used for example to measure the top-quark mass at the Tevatron (see for example refs. [3][4][5]). Alternatively the Matrix Element Method can be used to distinguish different hypotheses like for example background versus signal hypothesis or SM versus BSM physics hypothesis for a given event sample (see for example ref. [6,7]). Let us also mention that the Matrix Element Method exists today in different flavors like for example the MELA approach as used for example in refs. [8,9]. The construction of optimal observables as used for example in refs. [10][11][12] may also be seen as a variation of the same theme. So far in most cases the method is only applied using leading-order (LO) matrix elements. To simplify the application of the Matrix Element Method the automated calculation of the required event weights has been studied recently in ref. [13]. Given the recent progress concerning the calculation of NLO corrections it is natural to include also NLO corrections in the evaluation of the matrix elements. A first attempt in this direction has been made in ref. [14] where the effect of QCD radiation has been studied. In refs. [15,16] the radiation pattern of boosted Higgs bosons and top quarks is studied and compared with the radiation profile of QCD jets. In ref. [17] the information from the hard matrix element and a parton shower is used for a signal versus background discrimination for the signal process Z decaying to boosted top quarks. In refs. [18][19][20] the impact of NLO corrections including also the virtual corrections is investigated and a possible extension of the Matrix Element Method beyond the Born approximation is discussed. A first detailed application has been presented in ref. [21] where Higgs production with subsequent decay into H → Zγ is investigated. So far the method presented in refs. [18,19] is restricted to the production of uncolored objects and the extension to the production of colored particles like for example top-quark pairs is still missing. In ref. [20] the extension to include hadronic production of jets is investigated by means of a longitudinal boost along the beam axis to remove the unbalanced transverse momentum and map NLO and LO jets.
The extension of the final state phase space encountered in the generalisation of the Matrix Element Method beyond the Born approximation is an intrinsic problem which makes the NLO extension non-trivial. Real corrections which appear as additional contribution, when higher order corrections are taken into account, allow for the emission of an additional parton. In addition to the 2 → n Born kinematics also contributions living on an n+1 parton phase space need thus to be considered. Restricting the attention to jet physics, one may argue that only regions of the extended phase space in which the additional parton is clustered/merged into a jet contribute to n-jet observables and we end up again with a 2 → n configuration-now in terms of jets instead of partons. However, applying standard jet algorithms, the resulting jets typically do not satisfy the kinematical constraints of the Born process. In particular, the clustering will in general create a non vanishing mass for the jets even for massless partons. Also momentum conservation is not necessarily required by all jet algorithms. It is thus not clear how the contribution from the real corrections can be unambiguously combined with the virtual corrections which respect the Born kinematics. Furthermore, the practical calculation of next-to-leading order event weights-which must take into account virtual as well as real corrections-is non-trivial. As we shall describe in more detail in the next section, the problem is related to the phase space integration of real corrections which lead to an n-particle final state after clustering. Defining event weights for 'jet-events' in NLO accuracy thus requires two aspects to be addressed: 1. In the theoretical predictions a modified clustering may be used which guarantees momentum conservation and keeps the clustered jets on-shell. More precisely, the mass of a jet which is formed through a merging of a massless parton with another parton should be equal to the mass of the radiating parton. Merging two massless partons should result in a massless jet. It is worth stressing that also for established jet algorithms, which rely on a 2 → 1 clustering, the jet masses created in the perturbative calculation have very little to do (in case of massless partons) with the jet masses observed in the experimentally measured jets in case of 'light' jets. 1 The latter are mostly related to non-perturbative effects. In that sense, a modified clustering as proposed here is equally well motivated as what is currently used. In fact, one may even argue that a clustering fulfilling the above constraints may lead to a better separation of perturbative and non-perturbative physics.

2.
A method needs to be constructed allowing the efficient integration of the regions in the n + 1-parton phase space contributing to the considered n-jet configuration. (In principle an inclusive observable may also receive contributions from n + 1 jet configurations requiring the evaluation of the corresponding weights. In this case the problem is however very similar to the leading-order situation.) In this article we illustrate that both aspects can be addressed by using an appropriate jet clustering which is intimately related to a factorised parameterisation of the n + 1-parton phase space into an n-particle phase space of the n jets times the phase space related to the 'clustered' (unobserved) parton. The general idea is to extend the typical 2 → 1 clustering to a 3 → 2 clustering as it is well known for example from the dipole subtraction method [22,23]. This clustering satisfies momentum conservation as well as on-shell conditions at the expense of introducing an additional spectator which allows to guarantee momentum conservation which would be otherwise violated by enforcing the on-shell condition. Some freedom exists how to choose the additional 'spectator'. For example, to minimise the difference to the traditional clustering, one may choose the spectator such that the momentum reshuffling is minimised. Having chosen the spectator, a recombination according to the Catani-Seymour phase space factorisation [22,23] is applied. In this way, the reduced kinematics appearing in the Catani-Seymour subtraction formalism can be identified with the final state jets. At the same time, the factorised phase space can be integrated over the unre-solved regions to obtain the contributions from the real corrections to the event weight. We will give more details in the next section.
It should be noted that using the four mappings given in the Catani-Seymour subtraction algorithm as clustering prescriptions in the proposed 3 → 2 jet algorithm is sufficient, to construct appropriate jet algorithms covering most of the relevant collider experiments: production of at least 2 jets at a lepton collider, deep inelastic scattering and hadronic production of electroweak final states and/or jets. The respective final states can have arbitrary masses.
Let us briefly compare the method outlined above with some existing work where similar ideas have been applied in a different context. For example, the ideas presented here share some features with ref. [24]. There are however important differences. In ref. [24] an additional resolution parameter is introduced to define 'resolved' partons similar to what is done in the phase space slicing method [25,26]. Using the resolved partons in an intermediate step any physical jet algorithm should in principle be applicable. Care has to be taken that the two cuts-the artificial one to define resolved partons and the physical one to define jets within a given jet algorithm-do not interfere.
In the approach described above we work directly with jets defined by the jet algorithm used in the experimental analysis. In ref. [24] only final state singularities in e + e − annihilation are considered.
Here we include initial state singularities as well. Technically, only one phase space mapping is required in ref. [24]. As we will see in the next section, in general more mappings are required and the application of the Catani-Seymour subtraction method is less obvious. In refs. [27,28] the implementation of a parton shower based on the Catani-Seymour subtraction method is studied.
Although the aim of this work is rather different, the parameterisation of the phase space is similar to what is used in this article and many useful results which we collect in the following can be found also in refs. [27,28]. In ref. [29] a method to generate the phase space of n + 1 massless particles by forward branching of configurations of n massless particles is presented. This method which is applied in refs. [18][19][20][21] employs two 3 → 2 prescriptions to cluster three massless partons to two massless jets. The method that we present in this article can be seen as a generalisation of this approach with two further prescriptions for the clustering and the extension to massive particles. As a proof of concept, especially for the aforementioned generalisations, we apply the Matrix Element Method in NLO accuracy to a process with massive colored particles in the final state.
The article is organised as follows. In section 3 we present the phase space parameterisation used in the numerical integration. Roughly speaking the Catani-Seymour mapping is inverted. In section 4 we validate the approach by various cross checks. As a proof of concept we illustrate in section 5 the Matrix Element Method including NLO corrections applied to top-quark pair production in e + e − annihilation. We summarise our main findings in the conclusion in section 6.

Matrix Element Method and event weights at next-to-leading order
The Matrix Element Method tries to make maximal use of the information provided by an individual event. Instead of considering distributions, calculated for event samples, the probability of the event in the context of a given theory is investigated. In what follows we assume that all experimentally available information of the event is collected in the variable x. In the ideal situation that all momenta have been reconstructed, one may think of x as the collection of the observed momenta which we label with J 1 , . . . , J n : However, since some particles may escape detection or are only partially reconstructed, the experimentally accessible information may be in practice only a subset of this information.
Putting aside for the moment higher order corrections, one may interpret the partonic cross section calculated for a specific model-for example the Standard Model-as the probability distribution to observe a partonic event. A model-dependent likelihood, with model parameters Ω, for observing an event x is than given schematically by where the differential cross section is denoted by d n σ/dy 1 . . . dy n and the so-called transfer function W( x, y) describes the probability that a partonic event y is measured in the detector as the event x.
In principle, the variables collected in y may be chosen independently from the variables in x. Even the dimension of the two vectors does not need to agree. However, it may prove beneficial to chose the two sets as closely related as possible. Assuming that the two can be identified, an ideal detector would than correspond to the situation in which the transfer function is given by a delta function: Roughly speaking, maximising the likelihood with respect to Ω for a given event sample gives an estimator for the model parameter. This is the essence of the so-called Matrix Element Method (MEM). (More details can be found in refs. [1-3, 5, 7, 13, 30-35].) Since all the information available in the single measurement is retained, this approach is believed to make maximal use of the information content of the single event.
While in principle the integration over the transfer functions looks straightforward, in practice it is not trivial due to the peak structure of the transfer functions. In addition, we note that the transfer functions need to be determined within the experimental analysis, which may also represent a nontrivial task. In the following we do not consider this issue any further since current experimental analyses using the Matrix Element Method are already used to this type of problem. The focus of this article is the extension of the MEM beyond the Born approximation. To be specific we assume y in the following to be the collection of final state momenta-which may be obtained in the case of the real corrections through the merging of collinear or soft partons according to a specific clustering procedure. To distinguish these momenta from the partonic momenta we call them jet-momenta in what follows. They may be seen as the perturbative approximation of the jets observed in the experiments. (More details will be given in the next subsection.) The motivation to use the jet-momenta is threefold: 1. Being differential in the jet-momenta all relevant information about the differential cross section is kept, allowing in a second step also to use a different set of variables.
2. Using the jet momenta, identifying the transfer functions with delta-functions may provide a reasonable first approximation.
3. Being able to calculate event weights for 'jet events' including higher order corrections is interesting on its own right.
In the rest of this subsection we illustrate the main obstacle in the calculation of event weights for jet events when higher order corrections are included. To start we consider first the cross section in Born approximation. Although not useful in practice, we may write the differential cross section in terms of jet-momenta by introducing delta-functions: where s denotes the center of mass energy squared, |M| 2 is the squared matrix element and dR n is the Lorentz invariant phase space measure dR n (P, p 1 , . . . , where m i denotes the mass of the i-th parton. The incoming particles with momenta p a and p b are assumed colorless. In case strongly interacting particles are considered in the initial state additional convolutions with the parton distribution functions need to be introduced. The momenta of the final state partons are given by p 1 , . . . , p n . The functionsJ i (p 1 , . . . , p n ) describe how the jet momenta are calculated from the parton momenta. Since in leading-order no recombination is possible, the jet momenta are identified with the parton momenta: Obviously, it is than straightforward to evaluate the delta-functions and obtain the differential cross section in terms of the jet-momenta. Including next-to-leading order corrections, we need to consider the contribution from virtual corrections as well as real corrections. Ignoring for the moment the fact that both contributions are individually divergent due to soft and collinear singularities, we may apply the same argument as above to calculate the virtual corrections to the differential cross section. For the real corrections, however, the situation becomes more complicated. If we ask for precisely n jets in the final state, we need to integrate over the regions of the n + 1 parton phase space in which n + 1 partons are clustered to n jets. More precisely we need to evaluate integrals of the form dR n+1 (p a + p b , p 1 , . . . , p n+1 )|M(p a , p b , p 1 , . . . , p n+1 )| 2 ×δ(J 1 −J 1 (p 1 , . . . , p n+1 )) . . . δ(J n −J n (p 1 , . . . , p n+1 ))Θ n-jet (2.6) where the functionsJ n encode now how the n + 1 partons are clustered to n jets and Θ n-jet restricts the integration to the n jet region. (We note that the inclusion of the n + 1 jet region is straightforward since no clustering occurs.) The functionsJ n depend on the phase space region through the recombination procedure since in different phase space regions different partons are merged to form a jet. Evidently, the delta functions cannot be integrated numerically and an analytic approach is required. This is one facet of the problem we address in the next subsection. There is, however, a further problem: Using the standard recombination procedure, which is often simply the sum of the four momenta of the merged objects, we obtain in general massive jets even in case that we started with massless objects. As mentioned in the introduction it is thus a priori not clear how the contribution of the real corrections can be combined point-wise with the virtual corrections where the jets may have different masses. This is, however, required to define an event-weight with NLO accuracy. In the next section we will show how the two issues are connected and can be addressed by a modification of the clustering prescription. In particular, we show how-by using a modified recombination procedure-the 'real' phase space can be factorised into an n jet phase space and a remainder with the property that the n jet phase space preserves the Born kinematics. As long as the transfer function is not approximated by a δ-function one could in principle relax the requirement to map the unresolved regions of the real corrections onto the Born kinematics, since eq. (2.2) may be calculated for an arbitrary set of partonic variables used to describe virtual or real corrections. However, using the aforementioned identification of the phase space for real and virtual corrections allows to define point wise event weights in NLO accuracy. It is then straightforward to generalise the Matrix Element Method to NLO: The set of y variables in eq. (2.2) are just reinterpreted as describing 'theory jets' as introduced before. No further extension of eq. (2.2) is required. Note that this approach is meant to be applied in fixed order: Parton shower corrections which partially resum higher order corrections would lead to a double counting if naively included in this approach.

A modified recombination prescription and phase space factorisation
Using the resolution y i j to define the jet function F n J 1 ,...,J n (p 1 , . . . , p n+1 ) we may write for the n-parton final state F n J 1 ,...,J n (p 1 , . . . , The resolution depends on the final state objects: partons in case of the perturbative calculation and hadrons in case of the experimental analysis. y cut defines a preset value for the jet resolution. The momenta p 1 , . . . , p n refer to the parton momenta, while J 1 , . . . , J n refer to the jet momenta. We have included them in the definition of the jet function since every jet algorithm includes in addition to the resolution also a prescription how to define the momenta of a jet in case it is formed by the merging of two finale state objects. In leading-order the jet momenta are identified with the parton momenta, since no merging is possible. In NLO we also need the jet function for the case that n + 1 partons form n jets. Since for soft and collinear configurations the jet function F n+1 J 1 ,...,J n needs to reproduce F n J 1 ,...,J n to ensure the cancellation of soft and collinear divergencies, the jet function may be written as for soft or collinear configurations. (For the moment we ignore initial state singularities. As we shall see the extension to include them as well is straightforward.) The step functions assure that a recombination of two partons into one jet occurs. For each possible combination i, j with y i j < y cut the momenta J i are obtained through the respective recombination procedure from the original momenta p i . As mentioned before the mapping should respect momentum conservation and keep the recombined particle on the respective mass-shell in the sense defined above. This can be achieved by using the mapping introduced in the Catani-Seymour subtraction method. Depending on the unresolved partons and the chosen spectator four different mappings are given in ref. [22,23].
For each unresolved configuration we may choose for example the combination with the smallest momentum transfer to the spectator parton. More general we may define functions with the requirement that to select a specific mapping. For the numerical phase space integration using Monte Carlo methods it might be useful to use smooth functions Θ k instead of step functions.
For the jet cross section the contribution from the real corrections reads with the phase space measure as defined in eq. (2.4). In what follows we consider the functions Θ k as part of the jet algorithm. The role of the Θ k is to select in each phase space region where partons/jets are merged the appropriate clustering. Since in each region Θ k selects a mapping (p 1 , . . . , p n+1 ) → (J 1 , . . . , J n ) we may change the integration variables accordingly using the respective Catani-Seymour parameterisation of the phase space: where dR i j,k denotes the respective phase space measure introduced in ref. [22,23]. Note that p 1 , . . . , p n+1 appearing in the jet function and the matrix elements should be expressed in terms of the the momenta J 1 , . . . , J n and the integration variables used in dR i j,k . Using the factorised phase space it is straightforward to calculate the contribution of the real corrections to the event weight in NLO accuracy: In the integration the momenta p 1 , . . . , p n+1 are determined from the jet momenta J 1 , . . . , J n and the variables used in dR i j,k . The inversion of the mapping (p 1 , . . . , p n+1 ) → (J 1 . . . , J n , Φ) where Φ denotes the collection of variables used in the (unresolved) phase space measure dR i j,k is discussed in section 3.
We have ignored so far that the phase space integration is in general divergent due to soft and collinear singularities. Since the Catani-Seymour phase space factorisation is valid in d dimension it is straightforward to regularise the divergencies within dimensional regularisation. Conceptually the easiest way to deal with the singularities is to apply a phase space slicing [25,26]. In the numerical integration the integration over the unresolved parton is cut-off to avoid the collinear and soft configurations. In the singular regions, soft and collinear factorisation can be used to simplify the matrix elements such that the integration can be done analytically. The singularities obtained in this way are then combined with the virtual corrections.
As far as the application of the Catani-Seymour subtraction method is concerned the situation is more involved: To allow the combination of the (integrated) subtraction term with the virtual corrections the jet algorithm or in general the observable is evaluated for the reduced kinematics in the Catani-Seymour formalism. The term which is added (and subtracted) thus reads: where D i j,k denote the dipoles defined in the Catani-Seymour subtraction method. Note that the mapping to obtain the jet momentaJ i from the parton momenta p i is encoded in the dipole. We are not free to chose the mapping in this case as this would result in a mismatch with the contribution integrated analytically and combined with the virtual corrections. The contribution from the subtracted dipoles can thus not be combined point wise with the real corrections calculated using eq. (2.15). In ref. [20] a similar conclusion regarding the application of Catani-Seymour dipole subtraction within that method is drawn.
So far we have assumed only final state singularities. The above approach can be easily extended to initial state singularities. The jet function needs to be extended to cover also initial state singularities. Using the different mappings as introduced in refs. [22,23] is sufficient to handle all different cases.

Phase space parameterisation
The parameterisation of the n + 1 particle phase space in terms of an n particle phase space times an 'unresolved' phase space follows the phase space factorisation as given in the context of the Catani-Seymour subtraction method [22,23]. As mentioned before, the mapping of the real phase space to the reduced kinematics defines the clustering prescription for the 3 → 2 jet algorithm, generalising the 2 → 1 clustering normally used. Since in the modified jet algorithm the resolution is not affected and the recombined jet momenta reproduce the naive soft and collinear limits 2 , the modified jet algorithm automatically fulfills infrared safety, factorisation of initial state collinear singularities, and momentum conservation while keeping the resulting jets on-shell. Furthermore, the phase space factorisation allows to span the respective real phase space associated with each point in the n-jet phase space in a straightforward manner.
Each combination of unresolved partons i, j picked by the resolution of the jet algorithm (Θ(y cut − y i j (p 1 , . . . , p n+1 )) = 1) and the spectator k selected through Θ k defines a specific mapping to cluster n + 1 partons to n jets (p 1 , . . . , p n+1 ) Depending on whether j and k are final or initial state particles (i is always a final state parton) there are four qualitatively different types of mappings which can be formulated for massless or massive particles [22,23]. To apply the method outlined in section 2 we need to invert these mappings. For a given set of on-shell jet momenta (J 1 , . . . , J n ) and a set of variables describing the unresolved phase space (Φ) we need the mapping to generate the n + 1 parton phase space. In the following subsections we collect the required formulae. As mentioned in the introduction related formulae can be found in refs. [24,27,28].
3.1 Final state clustering with final state spectator

Massless particles
As described in ref. [22] the phase space of n + 1 massless partons can be factorised in terms of a phase space of n massless momenta-which we identify with the jet momenta J i -and the dipole phase space measure dR i j,k related to the emission of an additional parton: with the n-particle phase space as defined in eq. (2.4). The incoming momenta are given by p a and p b . The dipole phase space measure as given in Ref. [22, (5.20)] reads in four space time dimensions The set of variables Φ = {φ, z, y} used to parameterise the phase space dR i j,k is discussed below. The phase space parameterisation corresponds to the following clustering of n + 1 partons to n jets which fulfill momentum conservation ( n i=1 J i = P) and the respective on-shell conditions (J 2 i = 0, i = 1, . . . , n). To invert the mapping (p 1 , . . . , p n+1 ) → (J 1 , . . . , J n , Φ) we first observe that the momentum p k and the momenta p m (m i, j, k) can be obtained in terms of the momenta J i and the variable y through To determine the missing momenta p i and p j we use The momenta p i and p j can be easily expressed in the rest frame of p i j rotated such that J j points along the positive z-axis. Using 2(J j · p i j ) = s i j the momentum J j in this particular frame is given by The momentum J j is obtained from the given J j by a boost into the rest frame of p i j and two subsequent rotations to annihilate the x and y component of J j : with the Lorentz transformations Λ r x (φ x ), Λ r y (θ y ), and Λ b (p i j ) given in the appendix. We used the hat to denote the parity transform of a four vector x:x = (x 0 , − x). The angles θ y and φ x are determined through (3.14) In this frame the momenta p i and p j read where we have used the definition [22, (5.6)] The momenta p i , p j follow from p i , p j by inverting the Lorentz transformations:

Massive particles
For massive partons i, j, k the phase space can again be factorised in terms of a phase space of n jets and the dipole phase space measure dR i j,k related to the clustered parton [23]: The n-jet phase space is again given by eq. (2.4), where some of the m i are non-zero now. The dipole phase space measure as taken from Ref. [23, (5.11)] reads in four dimensions with the Källén function defined by and The integration boundaries are given by [23, (5.13)] with the relative velocities between p i + p j and p i or p k [23, (5.14)] . (3.28) The phase space parameterisation corresponds to the following clustering of n + 1 partons to n jets [23, (5.9)] where we have used together with the definition Similar to the massless case it is convenient to express p i and p j in the rest frame of p i j = p i + p j rotated such that Q in the respective frame points along the positive z-axis. Using 2(Q · p i j ) = Q 2 + s i j − m 2 k the momentum Q in this particular frame is then given by Again Q is obtained from Q through a boost to the rest frame of p i j and subsequent rotations: where the angles are similar to eq. (3.13) and eq. (3.14). In this frame the momenta p i and p j read Using Under the exchange µ 2 i ↔ µ 2 j , z → 1 − z we have cos(θ ) → − cos(θ ) as it should be.
The momenta p i , p j follow again from p i , p j by The dipole phase space measure in four dimensions is given by [22, (5.48)] Note that dR i j,a includes an integration over x leading to a convolution of the measures given in eq. (3.44). The space parameterisation corresponds to the clustering of n + 1 partons to n jets J i = xp a + p b ) and the on-shell conditions (J 2 l = 0, l = 1, . . . , n). Inverting this clustering allows to parameterise n + 1 partons by means of the n jet momenta and three integration variables x, z, φ as follows and because of

50)
p i and p j can be calculated using the steps outlined in eq. (3.11) -eq. (3.19).

Massive particles
Using ref. [23] the phase space of n + 1 massive partons can be expressed as a phase space of n particles convoluted with the dipole phase space dR i j,a : dR n+1 p a + p b , p 1 , . . . , p n−1 , p j , p i = dR n xp a + p b , J 1 , . . . , J n−1 , J j dR i j,a . (3.51) The dipole phase space measure dR i j,a reads [23, (5.48)]: The integration boundaries are given by The phase space parameterisation corresponds to the clustering of n + 1 partons to n jets as in eq. (3.46) and eq. (3.47) but satisfying now the on-shell conditions J 2 j = m 2 i j and J 2 l = m 2 l for l j. To invert the mapping (p 1 , . . . , p n+1 ) → (J 1 , . . . , J n , Φ) (Φ = {x, z, φ}), we start again in the rest frame of p i j = p i + p j rotated such, that the momentum J j points along the positive z-axis. Using (p i j · J j ) = 1 2 (s i j − m 2 i j ) the corresponding momenta J j in the rest frame of p i j is given by The relation to J j is again given by a sequence of one Lorentz boost and two rotations: the momenta p i and p j are given by (3.60) p i and p j follow from p i and p j by inverting the Lorentz transformations given in eq. (3.57).

Massless particles
The phase space of n + 1 massless partons can be expressed as a phase space convolution of a phase space of n massless jets and the dipole phase space measure dR ia,k for the emission of an additional massless parton from the initial state with a massless final state spectator. Most statements from section 3.2 can be carried over by the replacements a → k and j → a (see Ref. [22]). However, since we now are dealing with clustering in the initial state, collinear singularities must be factorisable into the parton distribution functions. Because

Massive particles
The phase space of n massive partons and one massless parton can be expressed as a phase space convolution of a phase space of n massive jets and the dipole phase space measure dR ia,k for the emission of an additional massless parton from the initial state with a massive final state spectator. All statements from section 3.2 can be carried over by the replacements a → k and j → a, m i → 0 and m i j → m k (see ref. [23]). The argument from eq. (3.61) also holds.

Initial state clustering with initial state spectator
In case of initial state clustering with an initial state spectator the phase space can again be written as a convolution [22, (5.149)]: dR n+1 (p a + p b , p 1 , . . . , p n , p i ) = dR n (xp a + p b , J 1 , . . . , J n ) dR ia,b , (3.62) with the n-particle phase space given in eq. (2.4). The dipole phase space measure dR ia,b reads [22, (5.151)]: The phase space parameterisation corresponds to the following clustering of n+1 (massless/massive) partons to n (massless/massive) jets: and the Lorentz boost transforming K into K given by [22, (5.144)]: The inverse boost is obtained by exchanging K and K. All outgoing momenta p i are transformed to balance the transverse momentum. Momentum conservation ( n i=1 J i = xp a + p b ) and on-shell conditions (J 2 l = m 2 l , l = 1, . . . , n) are not affected by the boost. Inverting this clustering allows to parameterise n + 1 partons by means of the n jet momenta and three integration variables x, , φ as follows. The momenta p m (m = 1, . . . , n) are obtained by inverting the boost: it is straightforward to express the momentum p i in the rest frame of p a + p b rotated such that p a points along the z-axis. In this particular frame the momentum p i is given by (1, sin θ i cos φ, sin θ i sin φ, cos θ i ). the angle θ can be read off The momenta p i is obtained according to as in the previous cases with J j → p a . Note that no massive case as in the previous sections needs to be studied since the two incoming partons and the collinear parton are always assumed to be massless.

Consistency checks
To validate the approach, we apply the procedure outlined in the previous sections to two processes.
As an example, where one has to deal with initial state singularities, we study Drell-Yan production in hadronic collisions. More precisely, we calculate Drell-Yan production at the LHC running at a center-of-mass energy of √ s = 13 TeV. We apply phase space cuts similar to what is used in the LHC experiments. For simplicity, we veto any additional jet in the final state since we are only interested in the case where recombination occurs in the real corrections. The inclusion of the contribution due to an additional jet is straightforward since no recombination occurs. For the final state electrons, we require the invariant mass m ee of the electron pair in the region defined by 116 GeV< m ee < 3 TeV. Furthermore, we demand a minimum transverse momentum p ⊥ e > 25 GeV for the electron and restrict the rapidity of the electron to |η e | < 2.5. Unresolved initial state radiation is clustered with the beam according to section 3.4. As a second example, where one has to deal with final state radiation, we analyse top-quark pair production in e + e − annihilation. Similar to the Drell-Yan case, we veto again the emission of an additional jet. For the top-quark mass we use m t = 174 GeV. For the center-of-mass energy we choose √ s = 500 GeV relevant for a future linear collider. We do not include the decay of the top quarks, instead, we treat them as tagged top-jets. These jets are obtained with a kt-jet algorithm with the resolution criteria defined by and the resolution y cut set to y cut = 0.1. For the recombination of unresolved particles the modified 3 → 2 clustering prescription according to section 3.1 is used.
Although very simple, these two examples cover essentially all relevant cases. Furthermore, compact analytic results are available for the higher order corrections and it is straightforward to apply the ideas outlined in this article. For details on the NLO calculations using phase space slicing we refer to refs. [36,37].
Exclusively demanding n jets in the final state allows to define a differential n-jet event weight at NLO accuracy: We use the superscripts B, V and R to indicate the contributions from the Born matrix elements, the virtual corrections and the real corrections. In case of the real corrections a regularisation of the soft-and collinear singularities using the phase space slicing method is understood. The 'unresolved' contribution is included in the virtual corrections and cancels the respective soft and collinear singularities. We note, that the real corrections are calculated using eq. (2.15) which means that for each phase space point (J 1 , . . . , J n ) an additional three dimensional integration is required to obtain dσ R . To check the approach and the numerical implementation we use eq. (4.2) integrated over the phase space to calculate the total cross section. The results can be compared with the ones obtained by a standard parton level Monte Carlo. We have checked that the results using eq. (2.15) are in perfect agreement with the results of a conventional parton MC. One may argue that the comparison of the total cross section is not very sensitive to the details of the calculation and inconsistencies in specific phase space regions could escape detection. In fact eq. (4.2) can also be used to calculate arbitrary distributions: Again these contributions can be compared with the outcome of a parton level Monte Carlo. This comparison allows a detailed check of the entire phase space. We stress that in the parton level Monte Carlo the same modified jet algorithm (3 → 2 clustering!) has to be used. In figure 1 0  we collect various distributions calculated for Drell-Yan production. In particular, we show the angular and the energy distribution of the scattered electron/positron. In addition, the invariant mass distribution and the rapidity distribution of the e + e − system are given. The blue solid lines show the results obtained with a conventional parton level Monte Carlo. The red dashed lines show the results using the factorised jet phase space as illustrated in the previous sections. In the lower part of the plots we show the discrepancy between the two approaches in terms of standard deviations, where the uncertainty of each approach is due to the limited statistics of the Monte Carlo integrations. For the parton distribution functions we use the CT10nlo pdf set [38]. The center-ofmass energy is set to 13 TeV and we applied the aforementioned cuts. For all four distributions, we find perfect agreement between the two approaches. In most cases the discrepancy is less than one standard deviation. In figure 2 a similar analysis is shown for top-quark pair production in e + e − annihilation. In particular, we show distributions with respect to the cosine of the azimuthal angle of the outgoing top quark, the polar angle distribution, the transverse momentum distribution and the rapidity distribution. Again the blue solid curves show the results of a conventional parton level Monte Carlo while the red dashed curves give the results using the factorised jet phase space. Note that in both cases the modified 3 → 2 clustering is employed. Again we find perfect agreement between the two approaches.

Impact of NLO correctionsk-factors
It has been pointed out in ref. [18] that restricting the NLO analysis to the Born level kinematics may lead to rather moderate k-factors in general . In figure 3 we show the respective k-factors for the previously studied differential distributions. In case of the angular distribution of the outgoing  Figure 3. Impact of NLO corrections on differential distributions for Drell-Yan production at a hadron collider.
electron, NLO corrections at the level of only a few percent are observed. For the rapidity distributions they are slightly larger but still small in absolute size. For the energy distribution of the electron and the invariant mass distribution of the lepton pair the corrections seem to be larger. However, the k-factor suffers from statistical uncertainties and shows large fluctuations. In regions where the statistical fluctuations are small we find again a moderate k-factor. In figure 4 the k-factor for top-quark pair production in e + e − annihilation is shown. In all cases we find NLO corrections of a few per cent only and thus k-factors very close to one. We thus extend the observations of ref. [18] also to final state radiation.

Impact of modified clustering / jet algorithms
All the previously shown differential distributions have been obtained using the modified jet algorithm: in the conventional parton level Monte Carlo as well as in the alternative approach using a factorised jet phase space. As pointed out in the introduction we consider the modification of the clustering as part of the intrinsic ambiguities of jet algorithms. As a consequence we do not expect a large effect of the clustering. If in contrast a large effect is observed one should question the definition of the observable since it shows a large sensitivity on an aspect of the jet algorithm tt-prod. e + e − → γ * /Z * → tt √ s = 500 GeV, kt3→2-Alg., y cut = 0.1 Figure 4. Impact of NLO corrections on differential distributions for top-quark pair production in e + e − annihilation.
which is not well defined. Since it is difficult to make general statements about the size of possible effects, one should investigate the impact of the new clustering on a case by case study to make sure that no large deviations are observed. In figure 5 the conventional 2 → 1 clustering which does not respect the on-shell condition of the clustered objects is compared for Drell-Yan production with the 3 → 2 clustering advocated here. Note that both results have been obtained using a conventional parton-level MC. The blue solid curves show the result using the 2 → 1 clustering, while the red dashed lines give the results for the 3 → 2 clustering. Since for Drell-Yan production the clustering never includes the outgoing electron we do not expect a major effect. Indeed figure 5 shows essentially no difference within the statistical uncertainty. A minor effect is visible in the angular distribution and in the rapidity distribution. This can be related to the initial state clustering which may introduce an additional boost orthogonal to the beam axis which can influence the angular distributions. In figure 6 the corresponding result is shown for top-quark pair production. For the angular distribution the two different algorithms give the same result within the statistical uncertainties. For the transverse momentum distribution a large effect is visible at large transverse momentum. This is not surprising since at phase space boundaries we expect to become sensitive to the details of the clustering. Below 160 GeV we observe that the 3 → 2 clustering leads to distributions which are between two and five per cent below the traditional 2 → 1 combination. In p ⊥ e > 25 GeV, |ηe| < 2.5, CT10nlo Figure 5. Impact of 3 → 2 clustering with respect to 2 → 1 clustering on differential distributions for Drell-Yan production at a hadron collider.
general the 2 → 1 clustering leads to an increase of the mass of the clustered object which could be responsible for the observed pattern. To analyse this effect we show in figure 7 the distribution of the mass of the top-quark jet using the conventional 2 → 1 clustering. As one can see most of the events have a jet-mass close to the nominal top-quark mass. However, there are also events with jet masses up to 280 GeV. Note that using the modified clustering the jet mass is fixed to the top-quark mass. In particular at phase space boundaries, the difference in the jet mass may result in distortions of distributions which are sensitive to mass effects. This is precisely what we observe in the lower plots of figure 6 where one can see that indeed the largest effects arise at the phase space boundary. This is not surprising since minor changes in the mass of the clustered objects become important. Since the distributions are normalised this effect introduces also a modification in the distribution away from the phase space boundary. Using cuts to avoid the phase space boundaries should thus result in smaller differences between the two different clustering prescriptions. This is illustrated in figure 8 where we show the same distributions but now using additional cuts. Again the blue solid line shows the conventional 2 → 1 clustering while the red dashed line shows the alternative 3 → 2 clustering. Indeed we find that the difference becomes smaller and is of the order of 1% only, which might be seen as an intrinsic uncertainty. To close this section we stress that minor differences between the two clusterings are not per se problematic as long as everything is GeV, kt-Alg., y cut = 0.1 Figure 6. Impact of 3 → 2 clustering with respect to 2 → 1 clustering on differential distributions for topquark pair production in e + e − annihilation. done consistently and the same clustering is used in the experimental analysis. tt-prod. e + e − → γ * /Z * → tt √ s = 500 GeV, kt-Alg., y cut = 0.1, |yt| < 0.65 Figure 8. Same as figure 6 but with additional cuts to avoid the phase space boundaries.

Application
In this section we apply the MEM to top-quark pair production in e + e − annihilation. First we use the aforementioned procedure to generate unweighted events with NLO accuracy. These events are then analysed using the MEM in LO and NLO accuracy. In particular, we illustrate the extraction of the top-quark mass from the event sample.

Generating unweighted jet events
For a chosen jet algorithm with a preset value of the resolution y cut it is now straightforward to generate unweighted jet events using an 'acceptance-rejection' algorithm. The respective NLO jet weight is given by where the right hand side is evaluated according to eq. (4.2). The acceptance-rejection method requires an upper boundary ρ max of the weight ρ (J 1 , ..., J n ). This can be obtained for example within a phase space integration. An n-jet candidate event is then constructed using (3n−4) random numbers. As a measure for the probability the weight introduced in eq. (5.1) is calculated for the candidate event. Note that a three dimensional integration must be performed to do so. Generating an additional uniformly distributed random number r u between 0 and ρ max the candidate event is accepted if r u is below the aforementioned weight. In principle it is also possible to generate unweighted NLO n-jet events (J 1 , ..., J n ) together with n + 1-jet events (J 1 , ..., J n+1 ) from n + 1 partons (p 1 , ..., p n+1 ) by augmenting the definition of ρ: The jet functions F n+1 J 1 ,...,J n and F n+1 J 1 ,...,J n+1 decide whether n or n + 1 jets are resolved and how the momenta p i are clustered into the jets. The n + 1-jet events (J 1 , ..., J n+1 ) are obtained by the identification The n-jet events (J 1 , ..., J n ) follow from clustering by the 3 → 2 jet algorithm (p 1 , ..., p n+1 ) → (J 1 (p 1 , ..., p n+1 ), ..., J n (p 1 , ..., p n+1 )) .
The main difference with respect to the previously described event generation is, that now n + 1 parton momenta are generated using (3(n + 1) − 4) random numbers. While this method will generated n-jet events with NLO accuracy we stress that the generated n + 1-jet events have only LO accuracy.
To validate the generation of unweighted events, we reproduce the differential distributions calculated in section 4. In total we generated 73128 events with NLO accuracy. As in section 4 we veto the emission of an additional jet. In figure 9 we the parton-level Monte Carlo while the red dashed lines show the distributions calculated from the unweighted jet events generated as described above. Below we show the difference between the two distribution in units of one standard deviation. As one can see we find perfect agreement within the statistical uncertainties.

Matrix Element Method
The possibility to generate unweighted jet events with NLO accuracy together with the possibility to assign NLO event weights to them, allows to perform a validation of the Matrix Element Method at NLO using the generated events as input to a toy experiment. As a concrete example we illustrate the extraction of the top-quark mass in e + e − → tt employing the MEM at NLO. We note that this study may be relevant for the top-quark mass measurements at a future linear collider.
As mentioned before we ignore for simplicity the top-quark decay and assume that top-(anti)quark jets are observed. An event is than defined by the energies and angles of the respective jets (E t , cos θ t , φ t , Et, cos θt, φt). This fixes the jet momenta depending on the top quark mass m t as Jt = (Et, |pt| cos φt sin θt, |pt| sin φt sin θt, |pt| cos θt) , Exclusively demanding a top-quark pair without an additional jet fixes the energies E t = Et = √ s/2. A 2-jet NLO event weight for x = (cos θ t , φ t , cos θt, φt) can be obtained according to is generated for some 'true" top-quark mass m true t = 174 GeV. The NLO likelihood L NLO for the sample can be constructed from the differential 2-jet cross section as follows where the dependence on m t is shown explicitly and J t and Jt follow from x i according to eq. (5.5). Note that the jet momenta when evaluated for the event x i depend on the mass m t . The negative logarithm of the likelihood (or "Log-likelihood") therefore reads Maximising (minimising) this likelihood (Log-likelihood) with respect to m t yields an estimator m t The lefthand plot of figure 10 illustrates the Log-likelihood for 73128 events generated with NLO in perfect agreement with m true t . We stress that in both cases we use the same unweighted events generated with NLO accuracy. In principle the discrepancy between m LO t and m NLO t is not surprising since we used in m LO t LO predictions to analyse events generated with NLO accuracy. What is however remarkable is the large size of the effect. As has been illustrated in section 4 the NLO corrections are usually small for most of the distributions. Nevertheless we observe a large effect using LO or NLO predictions within the Matrix Element Method. From the above results we may conclude that the Born matrix element evaluated for m t = 178 GeV gives a better approximation of the NLO corrections evaluated for m t = 174 GeV than the Born approximation evaluated for 174 GeV. To investigate this point further we show in figure 11 the comparison of the two predictions. Obviously the NLO corrections cannot be completely absorbed by changing the mass in the LO predictions. Comparing however the black with the gray line in figure 11 we find that indeed m t = 178 GeV gives a slightly better description of the NLO result. In the range −1 < cos θ t 0.75 the difference is below 1% and the maximal deviation at cos θ t = 1 is 4%. The difference is below 1% in the range −1 < cos θ t 0.38 and the maximal deviation at cos θ t = 1 is 6% when m t = 174 GeV is used in the Born approximation.
In view of a future linear collider we stress that the renormalisation scheme is well defined in the above procedure. Applying the above procedure to realistic data, the top-quark mass within the pole mass scheme would be determined. In figure 12 we show the consistency of the approach. In particular, we illustrate that m NLO t provides indeed an unbiased maximum likelihood estimator (ref. [40]). The lefthand plot of figure 12 shows the distribution of the estimator m NLO t if we interpret our event sample with 73128 events as 203 independent toy experiments with 360 events each. The dashed line shows a gaussian fitted to the data. The righthand plot illustrates how increasing the number of events results in a tightening of the dip in the Log-likelihood around the true value of m t and the approximate scaling of the error of the estimator ∆ m t ∝ N − 1 2 (see bottom of righthand plot).
As a final remark we comment on the impact of the modified jet algorithm. Top-quark pair production in e + e − annihilation is highly constrained through momentum conservation and the underlying symmetries of the interaction. Most of the sensitivity to the top-quark mass stems essentially from information contained already in in the cos(θ t ) distribution. On the other hand for this distribution the two clustering algorithms give the same result at the permille level as we have shown in figure 6. As a consequence we do not expect major differences in case the conventional clustering would be used in the experimental analysis.

Conclusion
In this article we have shown how to calculate event weights for jet events at NLO accuracy. The ability to define event weights at NLO is a necessary prerequisite to extend the Matrix Element Method beyond the Born approximation. The basic ingredient of the method presented here is a modification of the clustering prescription used in jet algorithms. Instead of using the conventional 2 → 1 clustering, where the momentum of the clustered object is just the sum of the two initial jet candidates, we use a recombination inspired by the phase space mapping used in the Catani-Seymour subtraction method. This leads naturally to a factorisation of the phase space for the real corrections into resolved and unresolved contributions. Furthermore, the factorisation allows to in-tegrate numerically the contribution of the unresolved configurations after an appropriate regulator to handle the mass and soft singularities has been chosen. Similar ideas have been investigated in a different context already in refs. [24,27,28]. The major difference is that we consider the new clustering not only as a technical trick but as an essential part of a modified jet algorithm. Using this modified clustering no artifical jet mass is generated and it is straightforward to map the events obtained onto the born kinematics. As validation of the proposed method, we have successfully reproduced differential distributions at NLO accuracy in Drell-Yan production and top-quark pair production in e + e − annihilation. Although simple, these two examples cover essentially all relevant cases. We have also investigated the impact of the modified jet algorithm. At phase space boundaries the effects can be large. Additional cuts can be used to reduce the impact. The remaining effect may be considered as an intrinsic uncertainty inherent to jet algorithms. For the examples studied here the effect is reduced to the per cent level, after applying cuts. We stress however that in hadronic collisions the situation could be different and one needs to investigate the impact of the new clustering on a case by case study. As a further application we have studied as a toy example the Matrix Element Method at NLO applied to top-quark pair production in e + e − annihilation.
More precisely, we investigated the determination of the top-quark mass. This study is relevant for a possible future Linear Collider. Applying the Matrix Element Method to events generated with NLO accuracy we observe that the MEM in LO fails to reproduce the input value. While the NLO analysis correctly reproduces the input with an uncertainty of about 1 GeV for about 70000 simulated events, the LO analysis leads to a value off by 4 GeV. These findings should be taken into account, when top-quark mass measurements at the Tevatron using the MEM are discussed. Let us end with a final remark concerning parton shower corrections. As mentioned in section 2 the naive inclusion of corrections due to the parton shower would lead to a double counting. Further studies are required to extend the method presented here in this direction.

Note added:
While we were in the process of writing this article ref. [41] appeared, where an extension of the jet algorithm in e + e − annihilation to massless quarks similar to what is discussed here has been presented.

A Explicit form of the Lorentz transformations
For a given four vector X with X 2 0 the rotational free boost from the rest frame of X to the system where X takes the form as given reads when applied to the four momentum y: DefiningX = (X 0 , − X) the boost from the frame in which X is given to the rest frame is given by Λ b (X). In fact, eq. (A.1) is a special case of the more general boost given in eq. (3.66): The Lorentz transformations for rotations around the x and the y axis are given by (A.4) For the product we have 1 0 0 0 0 cos θ sin θ sin φ − sin θ cos φ 0 0 cos φ sin φ 0 sin θ − cos θ sin φ cos θ cos φ (A.5)