The Matrix Element Method at next-to-leading order QCD for hadronic collisions: Single top-quark production at the LHC as an example application

In a recent work the authors have presented a general algorithm to extend the Matrix Element Method (MEM) to the hadronic production of coloured partons taking into account next-to-leading-order (NLO) corrections in quantum chromodynamics (QCD). In this article, the general algorithm is applied to the production of single top quarks at the LHC. In particular, the generation of unweighted events following the NLO predictions is presented. Treating these events as the result of a toy experiment we illustrate the first application of the Matrix Element Method at NLO QCD for hadronic jet production. As a concrete example, we study the determination of the top-quark mass. We show that the inclusion of the NLO corrections can lead to sizeable effects compared to the Matrix Element Method relying on leading-order predictions only and that the incorporation of the NLO corrections is mandatory to obtain reliable estimates of the theoretical uncertainties. In addition, we find that measuring the top-quark mass using the MEM in single top-quark production offers an interesting alternative to mass measurements in top-quark pair production.


Introduction
In recent years multivariate methods have proven to be instrumental for data analysis at the Large Hadron Collider (LHC). Neural networks (NN), boosted decision trees (BDT) and the Matrix Element Method (MEM), as specific examples, belong to the standard techniques employed in LHC analyses today. Among these methods the Matrix Element Method is of particular interest since the approach provides a very clean statistical interpretation based on a direct link between theory and event reconstruction. It can be used for signal-background discrimination as well as for parameter extraction. The Matrix Element Method as introduced in refs. [1,2] relies on the assumption that the differential cross section can be interpreted as the probability distribution to observe a particular event signature. Under this assumption, the cross section is used to define a likelihood function. Parameter determinations can be performed by maximising this likelihood with respect to the model parameters used in the theoretical predictions. Signal-background discrimination can be performed by defining likelihood ratios for the signal-plus-background and background-only hypotheses. Both aspects have been pioneered in top-quark analyses at the Tevatron. In the early days of the top-quark discovery the Matrix Element Method was used to determine the top-quark mass based on only O(100) events [3,4,5]. See ref. [6] for an introduction to the Matrix Element Method in the context of top-quark mass measurements. In single top-quark studies the method was instrumental in separating the signal from the overwhelming background (see e.g. ref. [7]). More recent applications can be found in refs. [8,9,10,11,12,13,14,15,16]. An automation of the Matrix Element Method at leading order is presented in ref. [17].
The strength of the Matrix Element Method, namely offering a clean statistical interpretation, also points to a potential weakness: the method will fail to produce reliable results if the theoretically calculated cross section does not provide a decent description of the underlying probability distribution. In this case the statistical analysis shows that, in general, parameter extraction with the Matrix Element Method no longer provides an unbiased estimator. In principle, this does not preclude the usage of the Matrix Element Method for parameter extraction, since the bias can be removed by an additional calibration procedure. However, it would make the application of the Matrix Element Method less attractive because the calibration introduces additional uncertainties and might rely on auxiliary modelling. Facing the increasing precision current experiments are aiming for, it is thus mandatory to include higher-order QCD corrections in the cross section predictions used in the Matrix Element Method. So far, however, the Matrix Element Method as applied in experimental analyses relies on leading-order cross section predictions only. In recent years various attempts have been made to extend the Matrix Element Method to NLO. In ref. [18] the effect of QCD radiation is studied as a first step in this direction. In refs. [19,20] the information from the hard LO matrix element is combined with a parton shower. In ref. [21] a method is presented to consider NLO QCD corrections within the Matrix Element Method for the production of uncoloured final states. A first application is studied in ref. [22]. In ref. [23] the inclusion of hadronic jet production is investigated by mapping NLO onto LO kinematics with a longitudinal boost along the beam axis to remove the unbalanced transverse momentum. However, none of these works presents a general algorithm to include NLO QCD corrections for the hadronic production of coloured partons within the Matrix Element Method.
Only very recently a complete algorithm to systematically include NLO QCD corrections within the Matrix Element Method has been presented in ref. [24]. As a proof of concept, the Drell-Yan process and top-quark pair production in e+e− annihilation have been studied in ref. [24]. While ref. [24] relies on a well-motivated modification of the clustering prescription used within the jet algorithm, it was shown in ref. [25] how the method can be extended to avoid this modification. However, the extension presented in ref. [25] relies on the numerical solution of a system of non-linear equations, and a proof of the feasibility of this approach in practical applications is still missing.
The examples studied in ref. [24] were chosen to offer sufficient complexity to test all the aspects of the proposed algorithm while still being simple enough not to pose any challenge to the numerical implementation. However, phenomenologically the two examples are of limited interest, although the studies of top-quark pair production in e+e− annihilation may provide useful information for a future linear collider. In this article we apply the Matrix Element Method at NLO QCD to hadronic single top-quark production as a particular example of the general case in which coloured partons occur in the initial as well as the final state. Because of the large backgrounds single top-quark production is very challenging and the usage of the Matrix Element Method is well motivated. Although top-quark production in hadronic collisions is dominated by top-quark pair production, single top-quark production plays a central role in top-quark physics at the LHC, since it provides complementary information. In particular, single top-quark production is a unique source of polarised top quarks, allows detailed tests of the V−A structure of the coupling to the W boson, and gives access to the Cabibbo-Kobayashi-Maskawa matrix element V_tb.
At leading order three different production modes are distinguished in the Standard Model (SM): t-channel and s-channel production as well as associated Wt production. At the LHC single top-quark production is dominated by the t-channel, followed by the Wt-channel. The t-channel total cross section has been precisely measured at the LHC at 7, 8 and 13 TeV by both CMS and ATLAS [26,27,28,29,30,31,32,33,34].
The s-channel production gives only a very small contribution at the LHC. Because of the complicated backgrounds the signal is difficult to extract, and only recently evidence has been reported by the ATLAS collaboration [11]. In ref. [11] the usage of the Matrix Element Method has proven instrumental to extract the signal. The independent measurement of t- and s-channel production provides useful information to constrain physics beyond the Standard Model (see for example ref. [7] for an overview of recent results). Anticipating further experimental progress, we present in this article the application of the Matrix Element Method to s- and t-channel production including NLO QCD corrections, relying on existing results for the NLO corrections. Next-to-leading-order results for the s- and t-channel have been known for quite a while (see e.g. refs. [35,36]) and are available in numerical implementations like MCFM (see ref. [37]) or HATHOR (see ref. [38]).
In this article we apply the algorithm presented in ref. [24] to single top-quark production at the LHC. We present the first application of the MEM at NLO to the hadronic production of jets originating from coloured final-state partons. As a proof of concept, we generate unweighted s- and t-channel single top-quark events following the NLO predictions (with and without a veto on an additional resolved jet) and apply the MEM at NLO to reproduce the top-quark input mass from these events.
The article is organised as follows. In section 2 we give a brief review of the Matrix Element Method. In particular, we focus on aspects relevant for incorporating NLO QCD corrections.
Furthermore, we give a short description of the approach presented in ref. [24] to extend the Matrix Element Method to NLO accuracy. In section 3 we discuss the generation of unweighted events following the NLO cross section predictions. By comparing with a traditional parton-level Monte Carlo we also validate the entire procedure. We study an inclusive as well as an exclusive event definition. The exclusive event definition, in which we veto additional jet emission, is used to allow a more detailed test of the procedure. Interpreting the unweighted events generated in section 3 as the result of a pseudo-experiment, we illustrate the application of the Matrix Element Method including NLO QCD corrections in section 4. As a detailed test of the method, we simulate a measurement of the top-quark mass. The main results are summarised in section 5.
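The generation of unweighted events mentioned above is, at its core, standard acceptance-rejection ('hit-or-miss') unweighting. The following Python sketch illustrates the idea on a toy one-dimensional density standing in for the NLO differential cross section; the density w(x) = 3x² and all numbers are illustrative assumptions, not taken from the analysis:

```python
import random

random.seed(7)

def hit_or_miss(proposals, weight, w_max):
    # Accept each proposed event x with probability weight(x)/w_max;
    # the surviving events carry unit weight and follow the density
    # proportional to weight(x) (here: the stand-in for dsigma/dx).
    return [x for x in proposals if random.random() < weight(x) / w_max]

# Toy normalised density on [0, 1]: w(x) = 3 x^2, maximum w_max = 3.
proposals = [random.random() for _ in range(200000)]
events = hit_or_miss(proposals, lambda x: 3.0 * x * x, 3.0)

# Sample mean of the unweighted events; the analytic mean of 3 x^2 is 3/4.
mean = sum(events) / len(events)
```

Events kept by the accept-reject step carry unit weight and are distributed according to w(x)/∫w dx; in the article the same logic is applied to the NLO differential cross section, with the maximum weight determined numerically.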

The Matrix Element Method in the Born approximation
To set up the notation used in this article, we briefly review the essence of the Matrix Element Method in the Born approximation. A detailed description focusing on top-quark mass measurements with the Matrix Element Method is given for example in ref. [6]. The basic idea is to treat the normalised differential cross section calculated from the squared matrix elements as the probability distribution to observe a specific event in the final state. We may think of a parton event as a collection of partonic momenta. However, for the following discussion it turns out to be useful to generalise the event definition to an arbitrary collection of variables used to describe the event. This set of variables, which we may collect into a vector x, does not need to be complete in the sense that all available information about the event can be reconstructed. The dimension r of the vector x may thus be smaller than the number of independent variables 3n − 4, where n denotes the number of outgoing partons. In the spirit of the Matrix Element Method the probability distribution for the production of an event described by x is then given by the (parameter-dependent) differential cross section

  P(\vec{x}, \vec{\alpha}) = \frac{1}{\sigma(\vec{\alpha})} \frac{d\sigma(\vec{\alpha})}{d\vec{x}} .   (2.1)

The vector α collects all the parameters, e.g. couplings and masses, on which the cross section depends. The normalisation factor σ(α) is given by the (parameter-dependent) fiducial cross section

  \sigma(\vec{\alpha}) = \int d\vec{x}\, \frac{d\sigma(\vec{\alpha})}{d\vec{x}} .   (2.2)

The probability distribution P(x, α) is thus normalised to one. The differential cross section is defined schematically through

  \frac{d\sigma(\vec{\alpha})}{d\vec{x}} = \sum_{(ij)} \int dx_1\, dx_2\, dR_n(\vec{y})\, F_{i/H_1}(x_1)\, F_{j/H_2}(x_2)\, \frac{1}{2 x_1 x_2 s_{\rm had}}\, |\mathcal{M}_{ij}(\vec{y}, \vec{\alpha})|^2\, \delta\big(\vec{x} - \vec{x}(\vec{y}, x_1, x_2)\big) .   (2.3)

As usual, the F_{i/H}(x) denote the parton distribution functions and s_had the hadronic centre-of-mass energy squared. The sum runs over all possible parton configurations (ij) in the initial state. If more than one sub-process contributes to a given event signature (e.g. background), the right-hand side of eq. (2.3) needs to be extended to include a sum over all contributing sub-processes.
The partonic variables used to parameterise the n-parton phase space measure dR_n as well as the squared matrix elements |M_ij|² are collected in y. The function x(y, x_1, x_2) maps the integration variables to the variables x used to describe the partonic event. In experiments we do not observe partons but hadrons and leptons. Using again a collection of variables X to describe the hadronic event as observed in the experiment, the probability distribution reads

  P(\vec{X}, \vec{\alpha}) = \int d\vec{x}\; W(\vec{x}, \vec{X})\, P(\vec{x}, \vec{\alpha}) ,   (2.4)

where the 'transfer function' W(x, X) describes the probability to observe a partonic event x as a hadronic event X. The transfer functions have to be normalised in order to ensure the correct normalisation of the probability density. In most applications one assumes that W factorises into a product of transfer functions describing the transition of individual partons and leptons. For leptons, where the energy and the direction of flight can be measured with good accuracy, the transfer functions are often modelled using narrow Gaussian distributions. For unobserved particles the transfer function is replaced by 1, leading to an integration of the related variables over the full phase space and the aforementioned reduction of the variables used to describe the differential cross section. For simplicity, we assume in the following an ideal detector where the transfer functions are replaced by δ-functions. Note that the transfer functions model a variety of different effects. They form an interface between theoretical predictions and experimental observations and thus bridge the gap between the two. Their impact may be reduced by improving the theoretical modelling (e.g. considering W⁺W⁻bb̄ instead of tt̄ final states) or including higher-order corrections in the predictions. Furthermore, it is conceivable that the experimental data could be unfolded to some intermediate level, further reducing the impact of the transfer functions. Using δ-functions may thus provide a useful first approximation.
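The convolution of the partonic density with a transfer function, and its δ-function (ideal-detector) limit, can be made concrete in a toy one-dimensional model. The exponential 'partonic' density and the Gaussian transfer function below are illustrative assumptions only:

```python
import math

def parton_density(x, alpha):
    # Toy normalised 'partonic' density: alpha * exp(-alpha x) on x >= 0.
    return alpha * math.exp(-alpha * x) if x >= 0.0 else 0.0

def transfer(x, X, sigma):
    # Gaussian transfer function W(x, X), normalised in X for every x.
    return math.exp(-0.5 * ((X - x) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def hadronic_density(X, alpha, sigma, n=4000, x_max=20.0):
    # P(X|alpha) = int dx W(x, X) P(x|alpha), via a simple midpoint sum.
    dx = x_max / n
    return sum(transfer((i + 0.5) * dx, X, sigma) * parton_density((i + 0.5) * dx, alpha)
               for i in range(n)) * dx

# A narrow transfer function approaches the delta-function (ideal-detector)
# limit, in which the smeared density equals the partonic one.
p_narrow = hadronic_density(2.0, alpha=1.0, sigma=0.01)
p_parton = parton_density(2.0, alpha=1.0)
```

For σ → 0 the Gaussian acts as a δ-function and the smeared density reduces to the partonic one, which is the approximation adopted in the text.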
As far as the conceptual aspects of the method used in this article are concerned, the inclusion of non-trivial transfer functions does not change any of the arguments presented here. Non-trivial transfer functions would only add additional integrations. In practice, however, the inclusion of more realistic transfer functions may lead to additional technical problems as far as the numerical integration is concerned. This is well known from the application of the Matrix Element Method at leading order. We thus consider the results presented in this article only as a first step towards a more realistic analysis. Additional technical improvements may be required to include transfer functions as determined in the experimental analyses.
We now briefly review the possibilities to use the Matrix Element Method for signal-background discrimination and parameter estimation. In addition, we introduce the concept of the extended likelihood and collect some useful properties of estimators extracted with the Matrix Element Method.
Signal-background discrimination

As mentioned before, eq. (2.3) may include signal as well as background processes. In ref. [39] it has been argued that, by calculating the probability distribution independently for the signal and the signal-plus-background hypotheses, an optimal discriminator can be constructed from their ratio. Thus, the Matrix Element Method makes optimal use of the information contained in an event sample. Recent experimental work on establishing s-channel single top-quark production provides a nice example [11] of the usefulness of the Matrix Element Method in establishing "signals".
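Schematically, the per-event discriminator built from the two likelihoods can be sketched as follows; the one-dimensional Gaussian 'signal' and 'background' densities and the signal fraction are illustrative assumptions:

```python
import math

def gauss_density(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def p_sig(x):
    # Toy normalised signal density (narrow peak).
    return gauss_density(x, 0.0, 0.5)

def p_bkg(x):
    # Toy normalised background density (broad).
    return gauss_density(x, 0.0, 2.0)

def discriminant(x, f_sig=0.5):
    # D(x) = f P_S(x) / (f P_S(x) + (1 - f) P_B(x)):
    # the likelihood-ratio discriminator built from the two densities.
    s, b = f_sig * p_sig(x), (1.0 - f_sig) * p_bkg(x)
    return s / (s + b)

d_peak = discriminant(0.0)  # signal-like region
d_tail = discriminant(4.0)  # background-like region
```

Events with D close to 1 are signal-like, events with D close to 0 background-like; by the Neyman-Pearson lemma, no other function of the event variables separates the two hypotheses better.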
Parameter extraction

Signal-background discrimination is, however, not the only application of the Matrix Element Method.
Since P(X, α) in eq. (2.4) depends on the theory parameters α, the Matrix Element Method can also be used to determine the model parameters. For a given event sample {X_1, ..., X_N} the likelihood

  \mathcal{L}(\vec{\alpha}) = \prod_{i=1}^{N} P(\vec{X}_i, \vec{\alpha})   (2.6)

is a function of the model parameters. Maximising the likelihood with respect to the model parameters α yields an estimator for the true parameters. This technique has been applied with great success to the top-quark mass determination at the Tevatron collider (see refs. [3,4,5]).
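A minimal numerical illustration of this maximum-likelihood parameter extraction, using a toy exponential density in place of P(X, α) and a simple grid scan in place of a proper minimiser:

```python
import math
import random

random.seed(1)

alpha_true = 2.0
# Pseudo-data drawn from the 'true' density alpha * exp(-alpha x).
events = [-math.log(random.random()) / alpha_true for _ in range(5000)]

def neg_log_likelihood(alpha, sample):
    # -log L(alpha) = -sum_i log P(x_i|alpha) for the normalised density.
    return -sum(math.log(alpha) - alpha * x for x in sample)

# Maximise the likelihood by a simple grid scan over the parameter.
grid = [0.5 + 0.01 * i for i in range(300)]
alpha_hat = min(grid, key=lambda a: neg_log_likelihood(a, events))
```

For the exponential toy density the analytic maximum-likelihood estimate is the inverse sample mean, which the scan reproduces within the grid resolution.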

Extended likelihood
The likelihood as defined in eq. (2.6) depends only on the normalised differential cross sections and therefore only on the relative distribution of the events in the event sample {X_1, ..., X_N}. The total cross section σ(α), and therefore the expected number of observed events, may also depend on the model parameters. To make use of the additional information contained in the size N of the event sample, the so-called extended likelihood function can be applied (see e.g. ref. [40]). Here the number of observed events N is assumed to be a random number distributed according to a Poisson distribution with expectation value ν(α) = σ(α)L, where L denotes the integrated luminosity of the collider. The extended likelihood function is given by the likelihood in eq. (2.6) multiplied by the Poisson probability to observe an event sample of size N under the parameter-dependent model assumption (here incorporated in σ(α)):

  \mathcal{L}_{\rm ext}(\vec{\alpha}) = \frac{\nu(\vec{\alpha})^N}{N!} e^{-\nu(\vec{\alpha})} \prod_{i=1}^{N} P(\vec{X}_i, \vec{\alpha}) = \frac{L^N}{N!}\, e^{-\sigma(\vec{\alpha})L} \prod_{i=1}^{N} \frac{d\sigma(\vec{\alpha})}{d\vec{X}}\bigg|_{\vec{X}=\vec{X}_i} .   (2.7)

In the extended likelihood the normalisation in front of the differential cross section cancels. The Matrix Element Method employing extended likelihoods is thus not only sensitive to the relative distribution but also to the total number of observed events. Using the additional information contained in the absolute number of observed events may result in a more efficient estimator when the Matrix Element Method is used within the context of parameter determination. However, the integrated luminosity L is needed in eq. (2.7), which may introduce an additional uncertainty, potentially spoiling the gain in efficiency. We shall come back to this issue when we discuss the top-quark mass extraction in detail in section 4.2.
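The extended likelihood can be illustrated by augmenting the toy exponential fit with a Poisson term; the assumed yield ν(α) = αL and the luminosity value are arbitrary toy choices, not related to the analysis of the article:

```python
import math
import random

random.seed(2)

alpha_true, lumi = 2.0, 1000.0

def nu(alpha):
    # Toy parameter-dependent yield nu(alpha) = sigma(alpha) * L with the
    # assumed total cross section sigma(alpha) = alpha (arbitrary choice).
    return alpha * lumi

# Pseudo-experiment: Poisson-fluctuated sample size, then event shapes.
n_events = sum(1 for _ in range(100000) if random.random() < nu(alpha_true) / 100000.0)
events = [-math.log(random.random()) / alpha_true for _ in range(n_events)]

def extended_nll(alpha):
    # -log L_ext = nu(alpha) - N log nu(alpha) - sum_i log P(x_i|alpha)
    # (up to the alpha-independent log N! term).
    shape = sum(math.log(alpha) - alpha * x for x in events)
    return nu(alpha) - n_events * math.log(nu(alpha)) - shape

grid = [1.5 + 0.005 * i for i in range(200)]
alpha_hat = min(grid, key=extended_nll)
```

Here both the event count N and the event shapes carry information on α, so the extended fit is more efficient than the shape-only fit, provided the luminosity L is known.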

Properties of Matrix Element Method estimators
Since the likelihood as defined in eq. (2.6) depends on the measured event sample {X_1, ..., X_N}, which follows a probability distribution and can thus be seen as a collection of random numbers, the Maximum Likelihood estimator α̂({X_1, ..., X_N}) is itself a random variable with expectation value E[α̂] and variance V[α̂].
The variance of an estimator is bounded from below by the Rao-Cramér-Fréchet inequality (see ref. [41]). In the large-sample limit Maximum Likelihood estimators approach the Rao-Cramér-Fréchet bound, making them in a sense maximally efficient. Furthermore, in this limit Maximum Likelihood estimators exhibit 'asymptotic normality', meaning that they are distributed according to a Gaussian normal distribution with µ = E[α̂] and σ² = V[α̂] (see ref. [41]). A potential disadvantage of parameter extraction based on Maximum Likelihood is that the estimator is prone to have a non-vanishing bias if the analysed data is not distributed exactly according to the assumed probability density.
In practice, such biases can be accounted for by the experiments through a calibration of the method, at the cost, however, of introducing associated uncertainties. It is therefore preferable to reduce the bias by modelling the true probability density as accurately as possible. This is an important motivation to extend the Matrix Element Method to NLO accuracy.
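The bias induced by an inadequate probability model can be demonstrated with a small pseudo-experiment study; the Gaussian toy model and the size of the distortion are illustrative assumptions:

```python
import random
import statistics

random.seed(3)

mu_true, distortion = 172.5, 0.8
n_experiments, n_events = 200, 500

def pseudo_experiment():
    # The data follow the 'true' density: a unit-width Gaussian whose mean is
    # shifted by a distortion the assumed (LO-like) model does not describe.
    sample = [random.gauss(mu_true + distortion, 1.0) for _ in range(n_events)]
    # Under the assumed model N(mu, 1) the maximum-likelihood estimator of mu
    # is the sample mean.
    return statistics.fmean(sample)

estimates = [pseudo_experiment() for _ in range(n_experiments)]
bias = statistics.fmean(estimates) - mu_true
```

The estimator reproduces the distorted mean rather than the true parameter: the fit is biased by exactly the part of the data that the assumed density fails to describe, which is what a calibration would have to correct for.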

The Matrix Element Method at next-to-leading order accuracy in QCD
It is well known that for many processes the Born approximation gives only a rough estimate and higher-order corrections are sizeable. Furthermore, high-energy hadron scattering leads to an increase in jet activity which needs to be modelled appropriately, unless the additional jet activity is vetoed; such a veto would, however, reduce the event sample and may also spoil the convergence of the perturbative expansion. Higher-order corrections also lead to a better modelling of the jets and are mandatory to obtain more realistic predictions. In the Born approximation each parton is identified with a jet. Only beyond leading order does the recombination of two partons into a jet occur. It is thus highly desirable to include higher-order corrections in the Matrix Element Method. When the Matrix Element Method is used for parameter determinations, higher-order corrections are crucial: in general, next-to-leading-order corrections are required to unambiguously define the renormalisation scheme. As long as only leading-order calculations are used, the renormalisation scheme is not fixed.
Extending the Matrix Element Method to include higher-order corrections is however non-trivial. To understand the origin of the difficulties we start with the general structure of NLO corrections. Schematically, the differential cross section including NLO corrections reads

  \frac{d\sigma^{\rm NLO}(\vec{\alpha})}{d\vec{x}} = \int d\sigma^{\rm BV}\, F^{n}_{J_1,\dots,J_n}\, \delta\big(\vec{x}-\vec{x}(J_1,\dots,J_n)\big) + \int d\sigma^{\rm R}\, F^{n+1}_{J_1,\dots,J_n}\, \delta\big(\vec{x}-\vec{x}(J_1,\dots,J_n)\big) + \int d\sigma^{\rm R}\, F^{n+1}_{J_1,\dots,J_{n+1}}\, \delta\big(\vec{x}-\vec{x}(J_1,\dots,J_{n+1})\big) .   (2.9)

We denote by dσ^BV (dσ^R) the Born and virtual (real) contributions to the next-to-leading-order differential cross section. F^n_{J_1,...,J_n} (F^{n+1}_{J_1,...,J_n}) defines the jet function which implicitly introduces a mapping of the n (n+1) parton momenta to the n jet momenta and implements phase space cuts depending on the jet momenta. F^{n+1}_{J_1,...,J_{n+1}} defines the jet function for the (n+1)-parton phase space in which all partons are resolved as individual jets and no recombination occurs. It is in fact through F^{n+1}_{J_1,...,J_n} that a jet is modelled for the first time in a non-trivial way within perturbation theory. Together with the contribution due to F^{n+1}_{J_1,...,J_{n+1}}, this leads to the aforementioned improved description of additional jet activity.
As stated above, no recombination can occur at leading order and the jet momenta are identified with the parton momenta:

  J_i = p_i ,   i = 1, ..., n .   (2.10)

The jet functions identifying the n partons with n jets have the form

  F^{n}_{J_1,\dots,J_n} = \prod_{i} \Theta_i(J_1, \dots, J_n) ,   (2.11)

with functions Θ_i which encode whether the n jet momenta are resolved as an n-jet final state.
(The same is also true for the contribution involving F^{n+1}_{J_1,...,J_{n+1}}.) Since at leading order the mapping is trivial, it has been omitted in eq. (2.3), although the jet function might be required to render the cross section finite in cases in which the leading-order predictions already contain soft and/or collinear singularities (e.g. pp → tt̄j). The virtual corrections can thus be treated in a similar way as the Born approximation, ignoring for the moment that they may contain soft and collinear singularities which must be cancelled. In the case of the real corrections, the situation is more involved since now recombination can occur and the jet momenta are in general non-trivial functions of the partonic momenta, with the mapping depending on the phase space region:

  J_i = J_i(p_1, ..., p_{n+1}) ,   i = 1, ..., n .   (2.12)

The jet functions for the recombination of n+1 partons to form n jets have the form

  F^{n+1}_{J_1,\dots,J_n} = \prod_i \Theta_i\big(J_1(p_1, ..., p_{n+1}), ..., J_n(p_1, ..., p_{n+1})\big) ,   (2.13)

where the mapping from the n+1 parton momenta to the n jet momenta is given by the functional dependence of the jet momenta on the parton momenta in eq. (2.12). Note that the differential cross section is differential in 'jet variables' and not in partonic variables: we use the function x(J_1, ..., J_n) to relate the variables used to describe the differential cross section to the jet momenta. Strictly speaking this detour is not required, and one could avoid introducing jets as long as the definition of the variables x used to describe the final state is infrared safe. Using jet momenta in intermediate steps, however, simplifies the construction of infrared-safe distributions and leads to a general algorithm to extend the Matrix Element Method to NLO accuracy. Inspecting eq. (2.9) in more detail, various problems preventing a straightforward application become obvious:

1. The first two contributions are individually ill-defined because of soft and collinear singularities.
Only in the sum of the two is a finite result obtained.
2. Being differential in the measured jet observables x introduces constraints on the recombination of unresolved real parton momenta into jet momenta corresponding to the fixed values in x. These constraints prevent an efficient numerical integration of the real contribution.
3. In the n-parton contribution the jets are identified with the partons. As a consequence the jet momenta satisfy the n-parton kinematics, i.e. the jet momenta respect the on-shell conditions as well as four-momentum conservation. For the recombined jets obtained from n+1 parton momenta this is in general not the case. In particular, defining the momentum p_j of the recombined jet as the sum of the combined parton momenta violates the on-shell condition unless the recombined momenta are strictly collinear. As a consequence the jet momenta cannot be used to evaluate dσ^BV, which would be the naive way to establish a pointwise cancellation of the soft and collinear divergences.¹

In ref. [24] it has been argued that the last two obstacles can be overcome using modified jet algorithms relying on a 3 → 2 clustering instead of the usually applied 2 → 1 recombination. The additional spectator particle makes it possible to guarantee four-momentum conservation and the on-shell conditions at the same time. In practice, a recombination inspired by the phase space parameterisation used in the Catani-Seymour dipole subtraction method [42,43] is used in ref. [24]. This technique allows a factorisation of the (n+1)-parton phase space measure into an unresolved and a resolved contribution. Schematically, this factorisation reads

  dR_{n+1} = dR_n(\vec{y}) \times d\Phi ,

where dR_n(y) denotes the phase space of the n jets obtained after recombination, and dΦ parameterises the unresolved phase space leading to a clustering of two partons. The phase space measure dR_n(y) can thus be identified with the one occurring in the leading-order calculation. Using this factorisation, the second and third issues are solved. To address the first issue we advocated in ref. [24] the use of the so-called Phase Space Slicing Method. For more details we refer to ref. [24]. At this point a further remark concerning the use of the Phase Space Slicing Method might be in order.
It is well known that the Phase Space Slicing Method is often numerically challenging and requires significant computing time unless the phase space parameterisation is carefully chosen and optimised for the specific process. For more involved processes this may lead to severe restrictions concerning the applicability of the method applied here. We stress, however, that the usage of the Phase Space Slicing Method is not mandatory and could be avoided. In general, subtraction methods may be used as well, as long as the method itself does not interfere with the phase space factorisation used here. This excludes the dipole subtraction method à la Catani and Seymour (see refs. [42,43]), as has been pointed out in ref. [24]. However, the Frixione-Kunszt-Signer type subtraction (see refs. [44,45]) should be applicable. This shall be studied in the future.
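The logic of the Phase Space Slicing Method can be illustrated on a one-dimensional toy integral with a 1/x singularity that cancels against a 'virtual' piece. Below the slicing parameter δ the integrand is approximated by its soft limit and combined analytically with the divergent virtual contribution, leaving a log(δ) remainder; above δ the integration is purely numerical. The choice f(x) = exp(−x) is an illustrative assumption:

```python
import math

def hard_part(delta, n=20000):
    # 'Resolved' contribution: int_delta^1 dx exp(-x)/x, integrated with the
    # midpoint rule on a logarithmic grid u = log(x), where the integrand
    # exp(-exp(u)) is smooth.
    u_min = math.log(delta)
    h = -u_min / n
    return sum(math.exp(-math.exp(u_min + (i + 0.5) * h)) for i in range(n)) * h

def sliced_result(delta):
    # Below the slicing parameter the integrand is approximated by its soft
    # limit f(0) = 1 and combined analytically with the (divergent) 'virtual'
    # piece, leaving the finite remainder f(0) * log(delta).
    return hard_part(delta) + math.log(delta)

r_coarse = sliced_result(1.0e-3)
r_fine = sliced_result(1.0e-5)
```

The key practical check, relevant for the full calculation as well, is that the result is independent of δ up to power-suppressed terms: varying δ by two orders of magnitude changes the toy result only at the 10⁻³ level.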
In the following section we provide some further details on the modified jet algorithm and the phase space parameterisation used in this article. In ref. [25] an alternative method to circumvent the last two issues, avoiding the introduction of modified jet algorithms, has been presented. However, no implementation of this method exists so far, and it remains to be shown that the practical application of the approach is not limited by the required computer resources. For the purpose of this article we stick to the method presented in ref. [24] to explore the potential of the Matrix Element Method extended to NLO accuracy and leave the implementation of the method presented in ref. [25] for future studies.

Modified jet algorithm and phase space parameterisation
As mentioned before, we augment an existing 2 → 1 algorithm by a modified 3 → 2 clustering. As an example we study the k_t-jet algorithm for hadron colliders with the resolution criteria (see ref. [46])

  d_{ij} = \min(p_{\perp,i}^2, p_{\perp,j}^2)\, \frac{\Delta R_{ij}^2}{R^2} ,   d_{iB} = p_{\perp,i}^2 ,

where ΔR²_{ij} denotes the separation of partons i and j in rapidity and azimuthal angle and d_{iB} the distance to the beam. Based on these resolution criteria we define a 3 → 2 clustering algorithm as follows:

1. Pick final-state partons i and j or final-state parton i and beam B with minimal d_{ij} or d_{iB} to be clustered.
2. a) If d i j is minimal, pick spectator parton from final state (k) or beam (a).
b) If d iB is minimal, pick beam particle (a) and spectator parton from final state (k) or beam (b).
After the clustering, only jets that pass the experimental cuts

  p_⊥ > p_⊥^min ,   |η| < η^max   (2.16)

are kept in the list of resolved jets. In contrast to the widely used anti-k_t-jet algorithm (see ref. [47]), the k_t-jet algorithm can be formulated in an 'exclusive variant' in which exactly a desired number of jets is required to be resolved; this is strongly discouraged for the anti-k_t-jet algorithm.² For example, we define 'exclusive single top-quark production' by requiring a resolved top-tagged jet³ accompanied by exactly one light jet (eq. (2.17)). We stress that by requiring the signal signature in eq. (2.17) we focus on the part of the NLO corrections which leads to the same signal signature as the Born process. On the contrary, in the 'inclusive variant' of the jet algorithm events with higher jet multiplicities are also included. Correspondingly, we define 'inclusive single top-quark production' by requiring a resolved top-tagged jet accompanied by at least one light jet (eq. (2.18)). In contrast to the exclusive case, the signal signature in eq. (2.18) allows the inclusion of the real contribution corresponding to an additional resolved jet X in the NLO corrections.
Compared to the usual definition of sequential 2 → 1 jet algorithms (cf. e.g. ref. [49]), only the recombination step is modified in the 3 → 2 jet algorithm used here. Instead of the four-momentum sum p_i + p_j, the partons are clustered using the 3 → 2 mapping introduced within the dipole subtraction method to form the jet momentum J_j(p_i, p_j, p_k). Note that the clustering offers the additional freedom to choose a spectator particle. In principle, it is possible to always choose the same spectator particle (as long as the particle itself is not collinear or soft). However, by making different choices for the spectator particle, this additional freedom can be used to reduce the difference between the modified clustering prescription and the conventional 2 → 1 recombination p_i + p_j. To quantify the difference between the two schemes, we study the difference between the respective clustered jet momenta for a recombination of two unresolved final-state partons i and j (eq. (2.19)). It is thus possible to try different final- or initial-state particles as spectator k and to choose the one which minimises the quantity given on the right-hand side of eq. (2.19). In the case of unresolved radiation i associated with the beam, we have the freedom to choose not only a final- or initial-state spectator k but also the beam particle a to be clustered with. In 2 → 1 clustering prescriptions, additional radiation too close to the beam is usually just omitted from the event without altering the other final-state momenta. We can quantify the difference between the two schemes by studying the difference in the resulting final states (eq. (2.20)). For given unresolved radiation i we can choose the beam particle a together with the initial- or final-state spectator k to minimise the quantity given on the right-hand side of eq. (2.20). We stress that, regarding the computing time of the jet algorithm, the additional loop over all possible spectator particles in eqs. (2.19) and (2.20) could be avoided by sticking to simpler criteria for choosing the spectator. As noted above, the choice of the spectator is arbitrary, and one could also always choose the same (non-soft/non-collinear) particle (e.g. from the beam). The actual choice of the spectator has no influence on the validity of the presented algorithm. We use the specific setup to minimise the differences with respect to commonly used 2 → 1 algorithms.
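For orientation, a compact Python sketch of the sequential k_t algorithm in its standard form, with the conventional 2 → 1 momentum-sum recombination; the 3 → 2 clustering of this article would replace the momentum sum by the dipole-type mapping with a spectator, and all momenta below are toy values:

```python
import math

def pt2(p):
    # p = (px, py, pz, E); transverse momentum squared
    return p[0] ** 2 + p[1] ** 2

def rapidity(p):
    return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))

def delta_r2(p, q):
    dy = rapidity(p) - rapidity(q)
    dphi = math.atan2(p[1], p[0]) - math.atan2(q[1], q[0])
    dphi = math.atan2(math.sin(dphi), math.cos(dphi))  # wrap into (-pi, pi]
    return dy * dy + dphi * dphi

def kt_cluster(partons, R=0.7):
    # Sequential k_t algorithm: d_iB = pT_i^2, d_ij = min(pT_i^2, pT_j^2) dR^2 / R^2.
    # The standard 2 -> 1 momentum-sum recombination is used here; the 3 -> 2
    # clustering of the text would replace the sum by a dipole mapping
    # involving a spectator momentum.
    parts, jets = [tuple(map(float, p)) for p in partons], []
    while parts:
        n = len(parts)
        cands = [(pt2(parts[i]), i, None) for i in range(n)]
        cands += [(min(pt2(parts[i]), pt2(parts[j])) * delta_r2(parts[i], parts[j]) / R ** 2, i, j)
                  for i in range(n) for j in range(i + 1, n)]
        d, i, j = min(cands, key=lambda c: c[0])
        if j is None:
            jets.append(parts.pop(i))  # beam distance minimal: i becomes a jet
        else:
            pj, pi = parts.pop(j), parts.pop(i)  # i < j: pop j first
            parts.append(tuple(a + b for a, b in zip(pi, pj)))
    return jets

# Two nearly collinear partons plus one well-separated parton -> two jets.
partons = [(30.0, 0.0, 10.0, 31.6228),
           (31.0, 1.0, 11.0, 32.9089),
           (-50.0, 5.0, -20.0, 54.0833)]
jets = kt_cluster(partons)
```

The two nearly collinear partons are merged first (smallest d_ij), after which the remaining beam distances promote both surviving objects to jets.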
While the modified clustering is mainly introduced to address the aforementioned problems, it has been argued in ref. [24] that it might also provide a cleaner separation of perturbative and non-perturbative effects: the jet mass produced by applying the momentum sum has in general very little to do with the observed jet masses, which are mostly due to non-perturbative effects. Since the modified clustering is a crucial part of the algorithm, we discuss it in more detail. As a concrete example we illustrate in the following the case in which two final-state partons i and j are clustered with a final-state spectator k.
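As a concrete sketch of such a dipole-type 3 → 2 clustering, the massless final-final mapping of Catani-Seymour type can be written in a few lines. Note that this is the simplified massless case, shown here only for illustration, whereas the article uses the massive generalisation of [43, (5.11)]:

```python
def dot(p, q):
    # Minkowski product; p = (px, py, pz, E), i.e. p.p = E^2 - |p|^2
    return p[3] * q[3] - p[0] * q[0] - p[1] * q[1] - p[2] * q[2]

def cluster_ff(pi, pj, pk):
    # Massless final-final dipole mapping (Catani-Seymour type):
    #   y    = pi.pj / (pi.pj + pi.pk + pj.pk)
    #   J_ij = pi + pj - y/(1 - y) * pk     (clustered jet, on shell)
    #   J_k  = pk / (1 - y)                 (rescaled spectator, on shell)
    # with exact four-momentum conservation J_ij + J_k = pi + pj + pk.
    y = dot(pi, pj) / (dot(pi, pj) + dot(pi, pk) + dot(pj, pk))
    jij = tuple(a + b - y / (1.0 - y) * c for a, b, c in zip(pi, pj, pk))
    jk = tuple(c / (1.0 - y) for c in pk)
    return jij, jk

# Three massless final-state momenta (toy values).
pi, pj, pk = (10.0, 0.0, 0.0, 10.0), (0.0, 10.0, 0.0, 10.0), (0.0, 0.0, 20.0, 20.0)
jij, jk = cluster_ff(pi, pj, pk)
```

Unlike the plain momentum sum p_i + p_j, the clustered jet J_ij is exactly on shell, with the recoil absorbed by the rescaled spectator J_k, while four-momentum conservation holds exactly.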
As a technical detail we add that we have used in the current analysis a slightly different parameterisation of the unresolved phase-space measure compared to ref. [24]. This leads to an improved convergence of the numerical integration over the unresolved phase space in the context of the so-called Phase Space Slicing Method [50,51], which we adopt to regularise soft and collinear divergences. The details are given in appendix A. In case the spectator k (with mass m_k) is chosen from the final state, we may factorise the phase-space measure using the parameterisation described in detail in refs. [42,43] within the context of the dipole subtraction method. For example, for two massive partons i and j with masses m_i and m_j the parameterisation reads [43, eq. (5.11)]

  dR_{n+1}(p_1, …, p_{n+1}) = dR_n(J_1, …, J_n) dΦ,   (2.21)

where Φ denotes the collection of variables used to parameterise the 'unresolved' phase space.
As unresolved regions we define the phase-space regions in which two partons can no longer be resolved as separate partons (according to our jet resolution criteria) and are thus clustered to form a single jet. The n-body phase-space measure is defined as usual by

  dR_n(p_1, …, p_n) = (2π)^4 δ^(4)(P − Σ_{i=1}^n p_i) ∏_{i=1}^n d³p_i / ((2π)³ 2E_i).   (2.22)

The mapping

  p_1, …, p_n, p_{n+1} → J_1, …, J_n, Φ   (2.23)

induces a clustering prescription (p_k and J_k denote the momenta of the spectator before and after the clustering). For eq. (2.21) the mapping is given explicitly in the appendix (see eq. (A.7)). As has been shown in ref. [24] the mapping is invertible. Given the jet momenta J_1, …, J_n it is thus straightforward to identify the unresolved phase-space regions leading to the recombined jets J_1, …, J_n (see also appendix A). Identifying in eq. (2.21) the phase space of the n recombined jets with the n-body phase space used in the Born and virtual contribution in eq. (2.9), the integration of the unresolved contributions of the real corrections can be simplified, allowing a direct integration (eq. (2.25)). The variables z_1, …, z_t are expressed in terms of the jet momenta J_1, …, J_n (or equivalently the variables y = (y_1, …, y_s)) and the variables in Φ used to parameterise the unresolved phase space. We may define the last part of eq. (2.25) as the real part of a differential jet cross section at NLO accuracy (eq. (2.26)). Combining this contribution (a similar factorisation holds for all unresolved contributions) with the virtual corrections and the Born contribution allows us to define a differential jet cross section at NLO accuracy (cf. eq. (2.9)).
Note that the formalism presented here allows as an important application the generation of unweighted jet events following the NLO cross section.
To end this section let us add a comment on the use of non-standard jet algorithms. Although the modified jet clustering advocated here may have theoretical advantages, for example potentially providing a cleaner separation of perturbative and non-perturbative aspects, a significant amount of work is required from the experimental side before the modified algorithms can be used in practice. Tuning as well as studies of experimental uncertainties need to be redone using the modified setting. Obviously, this effort is only justified if an increased precision can be reached in the end. One purpose of this article is therefore to explore the achievable accuracy. We also note (as will be shown in a future publication) that by restricting the kinematic variables used in the analysis, the modification of the jet algorithm can be avoided.

Event generation with NLO event weights
In this section we use the differential jet cross section from eq. (2.27) as an event weight to generate unweighted jet events which are distributed strictly according to the fixed-order NLO cross section. Note that the possibility to generate unweighted jet events following the NLO cross section represents substantial progress compared to established methods. As concrete example processes we study s- and t-channel single top-quark production at the LHC. After a brief review of the calculational setup we define exclusive (cf. eq. (2.17)) and inclusive (cf. eq. (2.18)) events by specifying jet observables to characterise the events. The construction (and implementation) of the differential jet cross section (eq. (2.27)) including full NLO effects is a non-trivial, error-prone task. Hence, we start by thoroughly scrutinising the phase-space factorisations entering eq. (2.27) as well as the consistency of the Phase Space Slicing Method used to regulate the IR divergences of the differential NLO calculation. We then show that the generated events are indeed distributed according to the NLO cross section.

Setup
For this work the NLO corrections for s- and t-channel single top-quark production, employing the Phase Space Slicing Method to subtract the infrared divergences, are taken from ref. [36].
If not stated otherwise, we use the following settings throughout this work. All calculations are done for proton-proton collisions at the LHC with a centre-of-mass energy of √s = 13 TeV. For the parton distribution functions (PDFs) we use the PDF set 'MSTW2008nnlo68cl' [52]. To be consistent with the PDF evolution, the QCD coupling constant α_s is taken from the PDF set. The electromagnetic fine-structure constant is evaluated at the Z-boson mass. Note that α(m_Z) appears as an overall factor which cancels in cross-section ratios or can easily be adjusted to other values in the cross-section predictions. For the masses of the electroweak gauge bosons we use m_Z = 91.1876 GeV and m_W = 80.385 GeV.
For the weak mixing angle we use the on-shell value defined through sin²θ_W = 1 − m_W²/m_Z². The top-quark mass, renormalised in the pole-mass scheme, is set to m_t = 173.2 GeV. For the renormalisation (factorisation) scale µ_R (µ_F) we choose a dynamical scale; the central scale choice is given in eq. (3.5), where the transverse energy is defined as E_⊥ = E sin(θ). The sum in eq. (3.5) runs over all resolved final-state jets (cf. eq. (2.17) or eq. (2.18)). The jets are defined according to the jet algorithm described in section 2.3 (with R = 1 as a common setting for the exclusive formulation, see e.g. ref. [48]). For the experimental cuts on resolved final-state objects we use fixed values (cf. eq. (2.16)). We assume that the detector is blind outside these cuts.

Phase Space Slicing parameter dependence
Virtual and real contributions in eq. (2.9) are individually IR divergent; only their sum is finite for infrared-safe observables, as guaranteed by the Kinoshita-Lee-Nauenberg theorem. In practical calculations we thus need to regularise the related divergences and cancel them when combining real and virtual corrections. Following ref. [36] we employ the Phase Space Slicing Method (see also refs. [50,51]) to achieve this cancellation. Within the Phase Space Slicing Method the phase space of the real corrections is split into regions where the squared matrix element contains soft and collinear divergences and the remaining phase space. The latter can be integrated numerically in 4 dimensions. Using soft and collinear approximations, the unresolved contributions can be integrated analytically in a process-independent manner, and the singularities can be cancelled against the ones in the virtual corrections. Note that the separation of the phase space into 'singular' and 'finite' regions should not interfere with the jet clustering. We employ the so-called one-cutoff Phase Space Slicing Method in the form presented in ref. [36], with the slicing parameter s_min controlling the separation into singular and finite phase-space regions. Since the slicing is to some extent arbitrary, the result must be independent of s_min. However, because of the soft and collinear approximations applied in the singular regions, a systematic error is introduced and the result is no longer exactly independent of the slicing parameter. Since the error scales with s_min, it can be neglected for sufficiently small s_min.
On the other hand, since the singular and finite regions individually depend logarithmically on s_min (only in their sum do we find the aforementioned linear dependence), the statistical error of the numerical Monte-Carlo integration grows with smaller s_min and it takes more time to perform the numerical integration. An important test within the Phase Space Slicing Method is thus to establish numerically the approximate independence of the final results from the choice of s_min for sufficiently small values, and to find a compromise between statistical and systematic uncertainties.
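The competition between the linear systematic error and the logarithmic s_min dependence of the individual pieces can be made explicit in a one-dimensional toy model, where a logarithmically divergent 'real' integrand is sliced at s_min and the region below s_min is integrated analytically in its 'soft' approximation. This toy is purely illustrative and not part of the actual calculation:

```python
import math

def sliced_result(s_min, n=200000):
    """Toy phase-space-slicing check: the finite quantity
    T = integral_0^1 [f(s) - f(0)]/s ds with f(s) = 1 + s (so T = 1)
    is computed as a numerical 'hard' integral of f(s)/s above s_min
    plus the analytically integrated soft approximation f(0)/s below
    s_min combined with the 'virtual' piece, leaving f(0)*log(s_min)."""
    # midpoint rule for the hard region [s_min, 1]
    h = (1.0 - s_min) / n
    real = sum((1.0 + (s_min + (k + 0.5)*h)) / (s_min + (k + 0.5)*h)
               for k in range(n)) * h
    soft_plus_virtual = math.log(s_min)   # f(0) = 1
    return real + soft_plus_virtual
```

In this toy the exact answer is 1 while the sliced result is 1 − s_min: the individual pieces diverge like log(s_min), but their sum carries only the linear slicing error described in the text.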
Because the s-channel and t-channel cross sections differ largely in size, we perform this analysis for each channel separately, since otherwise potential problems in the s-channel could be hidden by the large t-channel contribution. The fact that the s-channel gives only a small contribution to single top-quark production, which is difficult to establish experimentally, also makes the application of the Matrix Element Method to the s-channel particularly interesting: the Matrix Element Method might help in separating the s-channel contribution from t-channel production. Fig. 3.1 suggests choosing a value for the slicing parameter well below 1000 GeV² for the s-channel and 100 GeV² for the t-channel. As a compromise between statistical and systematic uncertainties we choose s_min = 5 GeV² for the s-channel and s_min = 1 GeV² for the t-channel. Concerning the total cross sections this seems rather small given the results shown in fig. 3.1. However, differential distributions, which we study next, may introduce additional scales and may be more sensitive to s_min. In this context 5 GeV² and 1 GeV² seem to be a good choice since they should be well below all relevant physical scales and are still large enough to prevent large statistical fluctuations in the numerical integration. Fig. 3.2 shows distributions of the top-tagged jet and the light jet for three different values of s_min. In the plots we show results obtained with the nominal values s_min = 5 GeV² (s-channel) and s_min = 1 GeV² (t-channel) as discussed above, as well as results obtained for lower values of s_min = 0.01 GeV² (respectively s_min = 0.001 GeV²) and higher values of s_min = 100 GeV² (respectively s_min = 50 GeV²). At the bottom of each plot the differences of the results obtained with different s_min are shown normalised to the statistical uncertainty ('pull'). As we can see from fig. 3.2 (and various other distributions which we also checked but do not show here), any s_min dependence of the differential distributions is at most of the order of the statistical uncertainties and thus negligible. While this conclusion is already obvious from the distribution of the pull in the lower plots, we also calculated, as a more quantitative measure, the corresponding p-value and the reduced χ² for the comparison of the histograms, as described in ref. [53] and implemented in refs. [54,55,56]. From the results we conclude that within the uncertainties the distributions obtained for different s_min agree with each other. The chosen s_min values are thus sufficiently small also for differential cross sections. Note that for the aforementioned studies we used the factorised phase-space parameterisation as described in section 2.3. Since this involves different parameterisations in different (singular) phase-space regions, the approximate slicing-parameter independence of the differential distributions also serves as a first consistency check of their implementation.
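The bin-by-bin pulls and the reduced χ² used in these histogram comparisons can be computed as follows. This is a generic sketch assuming independent Gaussian bin errors, not the exact implementation of refs. [54,55,56]:

```python
import math

def pulls(a, da, b, db):
    """Bin-by-bin pulls between two histograms a, b with statistical
    errors da, db: (a_i - b_i) / sqrt(da_i^2 + db_i^2)."""
    return [(x - y) / math.sqrt(dx*dx + dy*dy)
            for x, y, dx, dy in zip(a, b, da, db)]

def reduced_chi2(a, da, b, db):
    """Mean squared pull; close to 1 for statistically compatible
    histograms, much larger for incompatible ones."""
    p = pulls(a, da, b, db)
    return sum(t*t for t in p) / len(p)
```

Two identical histograms give a reduced χ² of zero; histograms shifted by one standard deviation in every bin give 0.5 (since both errors enter the denominator in quadrature).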

Validation of the phase space parameterisation
As mentioned above, depending on which partons are clustered and how the spectator is chosen, different phase-space parameterisations are used in different phase-space regions in case of the real corrections. The implementation involves some combinatorics and the coding of the formulae as presented in appendix A and in ref. [24]. To validate the phase-space parameterisation we reproduce various distributions of n-jet observables at NLO accuracy that have been calculated with a conventional parton-level Monte-Carlo generator in combination with the 3 → 2 jet algorithm. Using the factorised phase-space parameterisation, any distribution of an n-jet observable O(J_1, …, J_n) can also be calculated at NLO by integrating the differential jet cross section as defined in eq. (2.27) over the n jet momenta (eq. (3.7)). Comparing the two approaches allows us to check the implementation of the phase-space parameterisations. In addition, the identification of the n jet momenta given by the n-body phase space in the factorised phase space in eq. (3.7) with the momenta of the jets obtained from the corresponding 3 → 2 jet algorithm is checked. As examples, we study the energy distribution of the top-tagged jet for the s-channel and the polar angle distribution of the light jet for the t-channel in fig. 3.3, with the pulls shown at the bottom of each plot. In addition, we show again the p-value and the reduced χ² as introduced in the previous section. Since the distributions have not been normalised, the cross-check of the differential cross sections also serves as a validation of the fiducial cross sections. We study the exclusive case since we are especially interested in the situation where real radiation is unresolved and there is a non-trivial mapping from partonic to jet momenta. (The contribution from events with an additional jet would be calculated according to the last line of eq. (2.9) and does not involve any technical or conceptual problems.) Inspecting the plots in fig. 3.3 (and various other distributions which we also checked but do not show here) we find perfect agreement within the statistical uncertainties between the two approaches. The results for the p-values and the reduced χ² further support this interpretation.

Generation of unweighted single top-quark events distributed according to the NLO cross section
Interpreting the differential jet cross section as an event weight allows us to use eq. (2.27) to generate unweighted jet events following the NLO cross sections. As described in ref. [24], we use a simple acceptance-rejection algorithm to unweight the NLO events. To validate this procedure, we first generate unweighted events and recalculate distributions of n-jet observables that have been calculated in the previous section using the conventional parton-level Monte Carlo.
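A minimal sketch of such an acceptance-rejection step, with hypothetical inputs (a list of events and their NLO weights), could look as follows; the actual implementation follows ref. [24]:

```python
import random

def unweight(events, weights, rng=None):
    """Simple acceptance-rejection unweighting: an event with weight w
    is kept with probability w / w_max, where w_max is the maximal
    weight in the sample.  The surviving events are unweighted, i.e.
    distributed according to the weight density."""
    rng = rng or random.Random(42)
    w_max = max(weights)
    return [x for x, w in zip(events, weights)
            if rng.random() < w / w_max]
```

Events with the maximal weight are always kept, zero-weight events never; the efficiency of the procedure is the mean weight divided by w_max.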

Exclusive event definition
Having in mind the extraction of the top-quark mass from our generated events with the Matrix Element Method, we have to specify an event definition that does not make any assumption on the top-quark mass. For example, one may use energies and angles to define the measured events in our pseudo experiment. Again, we use the 3 → 2 clustering prescription introduced in the previous section. Because current experiments use a 2 → 1 recombination, we first study the impact of the modified recombination procedure. Since the difference in the results using different recombination procedures is an NLO effect, this study also provides valuable information for minimising the impact of NLO corrections by choosing a sensible event definition. We first study the exclusive case, where we demand precisely one additional light jet and veto events with more than one light jet.

Impact of the new jet clustering
The Matrix Element Method at NLO, as described in ref. [24], is only sensitive to normalised distributions (shapes) but not to the total number of measured events. Obviously, the distributions are affected by the jet algorithm. In fig. 3.4 the energy distributions obtained with the two clustering prescriptions are compared: they differ by up to 50% in bins near the kinematical threshold. This is easily understood by the fact that the 3 → 2 prescription strictly keeps the jets on their mass shell, while the masses of the 2 → 1 clustered jets can differ severely from the masses of the parent partons. This is studied in detail in fig. 3.6, where we show the mass distribution of the jet containing the top quark and of the light jet for the 2 → 1 clustering. In the 3 → 2 clustering the distributions would be proportional to delta functions: δ(J_t² − m_t²) in case of the top jet and δ(J_j²) in case of the light jet. In particular, at the phase-space boundaries this can lead to significant differences, explaining the large effects in the threshold region. In addition, the 3 → 2 clustering always maintains exact four-momentum conservation, and in particular transverse-momentum balance, also in the case of an unresolved jet. (Note that using the P-scheme, where the recombined momentum is defined as the sum of the four-momenta of the recombined particles, four-momentum conservation is also guaranteed in the recombination. However, particles close to the beam are simply dropped and not recombined; in this case the transverse momentum is no longer balanced.) The angular distributions do not have a pronounced mass dependence, and as a consequence the corresponding differential distributions show only minor differences of a few per cent in all bins (cf. fig. 3.5). It is worth mentioning that the aforementioned differences are formally due to higher orders in perturbation theory. The large differences in specific phase-space regions may also signal that the NLO corrections are large in these regions and fixed-order perturbation theory might become unreliable. Defining the events in terms of variables with only a weak dependence on the clustering prescription may thus also help in improving the reliability of the perturbative description of measured observables.

To define an exclusive single top-quark event in pp → t j, we use the pseudorapidity of the measured top-quark jet (t) and the energy, pseudorapidity and azimuthal angle of the light jet (j). Since the decay of the top quark is not considered in the calculation presented here, we use only the rapidity of the top quark and information related to the light jet. (In all applications usually some reduction of the used variables occurs: either because some variables are in principle unmeasurable, e.g. neutrino variables, or because the experimental accuracy is not good. The definition of events in terms of four-momenta is then accomplished from the variables by imposing certain kinematics, cf. below.) From fig. 3.4 and fig. 3.5 (and also fig. B.1 to fig. B.4) we conclude that for these variables the impact of the modified clustering prescription is small. The vector x introduced in the previous section, collecting the hadronic variables, is thus given by x = (η_t, E_j, η_j, φ_j). In terms of these variables, using the on-shell conditions and transverse-momentum balance, the two (resolved) jet momenta read

  J_j = E_j (1, sin θ_j cos φ_j, sin θ_j sin φ_j, cos θ_j),   (3.8a)
  J_t = (√(m_t² + p_⊥,j² cosh² η_t), −p_⊥,j cos φ_j, −p_⊥,j sin φ_j, p_⊥,j sinh η_t),   (3.8b)

with p_⊥,j = E_j sin θ_j and the polar angles fixed through the pseudorapidities. Note that the parameterisation of the jet momenta depends on the top-quark mass, which occurs as a free parameter. Using eq. (3.9) and dx_a dx_b dR_2(J_t, J_j), the event weight including NLO corrections reads as given in eq. (3.11). Again we have also checked various other distributions which we do not show here. In all cases we find perfect agreement between the two approaches. We can thus conclude that the generation of unweighted events with NLO accuracy is successfully validated.
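The kinematic reconstruction implied by this parameterisation can be sketched as follows, assuming a massless light jet, an on-shell top jet and transverse-momentum balance of the exclusive two-jet final state; the variable names are ours and the sketch ignores the parton momentum fractions x_a, x_b:

```python
import math

def jet_momenta(eta_t, E_j, eta_j, phi_j, m_t):
    """Jet momenta (E, px, py, pz) from x = (eta_t, E_j, eta_j, phi_j),
    assuming a massless light jet, an on-shell top jet and transverse
    momentum balance.  The top-quark mass m_t enters as a free
    parameter, as noted in the text."""
    theta_j = 2.0 * math.atan(math.exp(-eta_j))
    pT = E_j * math.sin(theta_j)
    J_j = (E_j,
           pT * math.cos(phi_j),
           pT * math.sin(phi_j),
           E_j * math.cos(theta_j))
    # top jet: back-to-back in the transverse plane, pseudorapidity eta_t
    pz_t = pT * math.sinh(eta_t)
    E_t = math.sqrt(m_t**2 + pT**2 + pz_t**2)
    J_t = (E_t, -J_j[1], -J_j[2], pz_t)
    return J_t, J_j
```

By construction J_j² = 0 and J_t² = m_t², and the transverse momenta of the two jets balance exactly, reflecting the properties of the 3 → 2 clustering discussed above.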

Inclusive event definition
In this section we use an inclusive event definition: we study pp → t jX without a veto on additional jet emission. We note that in the contributions with an additional resolved jet no clustering takes place at NLO accuracy. Taking these contributions into account can therefore only reduce the impact of the 3 → 2 clustering studied in section 3.2.1, since the relative weight of clustered events is reduced. Hence, we use the same set of variables as before: the pseudorapidity of the top-quark jet (t) and the energy, pseudorapidity and azimuthal angle of the hardest light jet (j): x = (η_t, E_j, η_j, φ_j). For events with only one top-tagged jet and one light jet the same parameterisation as described in the previous section can be used. In case an additional jet is resolved (denoted by J_X, while J_j denotes the hardest light jet) we use an extended parameterisation. Since we are inclusive in the additional jet, the related variables have to be integrated over the allowed range, whose boundaries follow from the definition of the inclusive event sample. For the inclusive event definition we choose the central renormalisation scale µ_R and factorisation scale µ_F on an event-by-event basis as the total transverse energy. Using again eq. (3.9), the 3-jet contribution to the NLO jet event weight can be obtained accordingly. The NLO jet event weight for the inclusive event sample is the sum of the exclusive 2-jet contribution from eq. (3.11) and the 3-jet contribution defined above (eq. (3.16)). Again we validate the NLO event generation by comparing differential distributions calculated using unweighted events with the distributions determined through a conventional Monte-Carlo integration. Fig. 3.8 shows results of the two approaches for s- and t-channel single top-quark production. As one can see from fig. 3.8, the two results are in perfect agreement with each other. We have also scrutinised the other distributions of the observables in the inclusive event definition, which we do not show here.
Again, we find perfect agreement between the two approaches. We thus conclude that the generation of unweighted t jX events including NLO corrections works.

Application: MEM at NLO for single top-quark production
In this section we apply the Matrix Element Method including NLO corrections to a hadron collider process. As an application, we analyse the potential to measure the top-quark mass using single top-quark events. To do so, we interpret the unweighted events generated in the previous section as pseudo data and use the Matrix Element Method to extract the top-quark mass. Note that in the unweighting, the NLO event weight has been used. The simulated event sample thus incorporates the NLO corrections.

Impact of NLO corrections
Before applying the Matrix Element Method to simulate a top-quark mass measurement, it is instructive to investigate the size of the NLO corrections. Tab. 1 summarises the fiducial Born and NLO cross sections (with and without additional jet veto) for the s- and t-channel production of a single top quark in association with a light jet. From tab. 1 we see that the fiducial cross sections receive sizeable NLO corrections, and we conclude that the scale variation of the leading-order results does not offer a reliable estimate for the theoretical uncertainty due to missing higher-order corrections in general. As can be seen from fig. 4.3, not vetoing the additional resolved jet results in somewhat flatter k-factors in regions where most of the events are expected, except for the t-channel pseudorapidity distribution of the light jet, which receives NLO corrections of up to +30% in the central region. When using the extended likelihood in the Matrix Element Method the unnormalised distributions are relevant. For completeness they are given in fig. B.7 and fig. B.8 in appendix B.2.

Top-quark mass extraction
The NLO jet event weights defined in eq. (3.11) and eq. (3.16) can be used to set up likelihood functions as defined in eq. (2.6) and extended likelihood functions as defined in eq. (2.7) at NLO accuracy. Interpreting the unweighted events generated in sections 3.2.1 and 3.2.2 as pseudo data, we now apply the Matrix Element Method to them. To investigate the impact of the additional 3-jet events on the inclusive analysis, we choose the relative sizes of the event samples to reflect the fiducial cross sections (cf. table 1 and eq. (4.1)). In addition, we estimate the uncertainty of the determined mass value. Repeating the mass determination using µ = 2µ_0 and µ = µ_0/2, a measure for the theoretical uncertainty is calculated from the difference with respect to the result for µ = µ_0. The shifts for µ = 2µ_0 (µ = µ_0/2) are indicated in the legend as superscripts ∆2µ_0 (subscripts ∆µ_0/2). The scale uncertainty represents a limiting factor on the reachable accuracy. In addition, we quote the statistical uncertainty of the estimator ∆m_t determined from the fit. The results are illustrated as data points in the lower part of the plots, with the uncertainties due to the scale variation shown as thick error bars and the statistical uncertainties shown as thin error bars. While for the t-channel the mass values extracted using LO and NLO predictions, respectively, are marginally consistent (taking into account the large scale uncertainty), we find a significant difference of about −8.3% in case of the s-channel, which is not covered by the scale variation. As mentioned before, the Maximum Likelihood estimator is prone to introduce a bias in case the data is not well described by the assumed probability distribution. To make sure that this is indeed the origin of the aforementioned discrepancy, we investigate the consistency of the procedure by splitting the data sample into 20 subsamples, treating each subsample as an individual experiment.
Binning the results based on the 20 subsamples allows us to check whether the bias is of systematic origin, as suggested before, or due to statistical outliers. The result for the s-channel with 637 events per subsample is shown in fig. 4.6. As expected, when the NLO corrections are included in the evaluation of the event weights, fig. 4.6 shows that no bias is introduced in the estimator. In contrast, employing only the Born approximation in the calculation of the event weights, a significant bias is introduced. Fig. 4.6 illustrates that the shift observed using LO predictions is indeed a consequence of assuming the 'wrong' probability distribution for the event sample and not due to a statistical fluctuation, since the results of the 20 pseudo experiments give a consistent picture. We stress that this shift per se does not preclude the application of the Matrix Element Method in LO: the shift can be accounted for by an additional calibration procedure. However, since this calibration introduces additional uncertainties, it is preferable to make the assumed probability distribution as accurate as possible. While the occurrence of a mass shift between LO and NLO predictions is not surprising, we stress that the size of the effect has never been precisely quantified before. Furthermore, as mentioned above, estimating the leading-order theory uncertainties using scale variation does not provide a reliable estimate of higher-order effects. While the calibration may reproduce the true mass value, the uncertainty estimates would be unreliable. In particular, compared to the NLO result the uncertainties could be underestimated.
We shall also comment on the scale uncertainty observed using LO and NLO predictions. While in the t-channel the uncertainty due to the scale variation is reduced when going from LO to NLO predictions, this is not the case for the s-channel. Varying the scale in the likelihood used to extract the top-quark mass results in a shift of the NLO estimator between −2.2% and +3.3%, while the Born estimator varies between −1.6% and +1.5%. For the s-channel, the related uncertainties thus increase when going from LO to NLO, in contrast to the naive expectation. However, the LO prediction is of purely electroweak origin, and the scale enters only through the factorisation-scale dependence of the parton distribution functions. Furthermore, since the event weight is calculated from normalised distributions, this dependence cancels to some extent in the ratio. As a consequence, the scale variation of the leading-order result does not provide a reliable method to estimate the effect of higher-order corrections. It is thus not surprising that the picture we observe differs from the naive expectation and that the uncertainties based on the LO scale variation underestimate the true uncertainty. We also point out that for both s-channel and t-channel production the scale uncertainties dominate. The statistical uncertainties amount to 0.9 GeV for the s-channel and 1.6 GeV for the t-channel (in the exclusive s-channel (t-channel) analysis the event sample contains only 12755 (24088) events).
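The mass extraction itself amounts to scanning the negative log-likelihood over trial masses and reading off the statistical uncertainty from the interval where −ln L rises by 1/2 above its minimum. The following toy replaces the NLO event weight by a simple Gaussian density, so it only illustrates the fitting procedure, not the physics:

```python
import math
import random

def extract_mass(events, masses, sigma=10.0):
    """Maximum-likelihood scan: evaluate -ln L(m) on a grid of trial
    masses and estimate the statistical uncertainty from the points
    where -ln L exceeds its minimum by at most 1/2 (toy Gaussian event
    weight standing in for the NLO jet event weight)."""
    def nll(m):
        return sum((x - m)**2 / (2.0 * sigma**2) for x in events)
    values = [(nll(m), m) for m in masses]
    nll_min, m_hat = min(values)
    inside = [m for v, m in values if v <= nll_min + 0.5]
    dm = 0.5 * (max(inside) - min(inside))
    return m_hat, dm

# toy pseudo data: 2000 events around a 'true' mass of 173.2 GeV
rng = random.Random(1)
events = [rng.gauss(173.2, 10.0) for _ in range(2000)]
masses = [172.0 + 0.01*k for k in range(241)]   # 172.0 ... 174.4 GeV
m_hat, dm = extract_mass(events, masses)
```

For Gaussian events the statistical uncertainty approaches sigma/sqrt(N) (about 0.22 GeV here), which is the behaviour exploited in the statistical-uncertainty estimates quoted in the text.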

Inclusive event definition
Note that vetoing additional resolved jets requires phase-space cuts on the additional radiation. These cuts introduce additional scales into the NLO calculation, leading to potentially uncancelled logarithms which might spoil the convergence of perturbation theory. In the inclusive event definition this veto is dropped. Fig. 4.7 shows the corresponding analysis using the inclusive event definition. In contrast to the exclusive case discussed before, we observe for both production channels an improvement of the scale uncertainties when going from LO to NLO. For the t-channel, the uncertainties are reduced by almost a factor of ten. In case of the s-channel the improvement is less impressive: the scale variation of −1.7% and +1.4% for the Born estimator is reduced to −0.9% and +1.3% when the NLO likelihood is used. We also note that for the inclusive event definition the difference of −1.3% between the central values in LO and NLO is now well covered by the scale variation. For both production channels, we find that the extracted top-quark mass is consistent with the true value used in the event generation. While in the s-channel the scale uncertainty dominates, the scale uncertainty in the t-channel translates into less than 1 GeV uncertainty on the extracted top-quark mass, and the total uncertainty is dominated by the statistical uncertainty. The latter could be reduced using a larger event sample. We also note that the increased sample size of the inclusive events with respect to the exclusive event sample (cf. eq. (4.1)) does not result in improved relative uncertainties of the analyses. As presented further down in eq. (4.2) and eq. (4.3), the inclusive cross section is slightly less sensitive to the top-quark mass, which compensates the statistical gain of the larger sample sizes.

Extended likelihood
In the previous section we argued that with extended likelihoods the information contained in the total event number may be used to improve the parameter determination. Fig. 4.8 (fig. 4.9) shows the extended log-likelihoods for s-channel and t-channel production using the exclusive (inclusive) event definition. As before, we observe that the NLO likelihood correctly reproduces the true mass value used in the event generation. Using the extended Born likelihood, the values of the estimators are driven by two competing demands: reproducing the NLO distributions and reproducing the NLO fiducial cross sections using only Born predictions. Comparing the analyses presented in fig. 4.5 and fig. 4.7 with the fiducial cross sections given in table 1, the analyses based on extended likelihoods show that the impact of the expected total event numbers dominates this compromise: in the exclusive case the fiducial s-channel cross section receives only a small NLO correction, which leaves little room for a shift of the top-quark mass. On the other hand, the exclusive fiducial t-channel cross section receives a large negative NLO correction. Hence, the estimator extracted with the extended Born likelihood gets pushed to higher top-quark mass values in order to reduce the Born cross section accordingly. In the inclusive analysis the 3-jet contribution significantly increases the NLO correction to the fiducial s-channel cross section. As a result, the Born estimator is pushed to smaller top-quark mass values to open up more phase space for the Born cross section. In the inclusive t-channel the 3-jet contribution compensates the large negative exclusive NLO correction, leaving only little room for a shift in the Born estimator.
The shifts of the NLO and Born estimators due to scale variation in the extended likelihoods largely follow the pattern deduced from table 1, which overall leads to a considerable reduction of the theoretical uncertainty estimates when taking NLO corrections into account. Concentrating on the statistical uncertainties, we observe a significant improvement in case the extended likelihood is used: the uncertainty is roughly reduced by a factor of two. The relative uncertainty fits nicely with the naive expectation based on the assumption that the cross section will be measured with a relative uncertainty of the order of 1/√N, where N denotes the total event number. In fig. 4.10 we show the dependence of the fiducial exclusive and inclusive cross sections, calculated including NLO corrections, on the top-quark mass, and we infer the top-quark mass sensitivities by approximating this dependence by a polynomial as presented in refs. [38,57].
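The rate-based sensitivity can be estimated by fitting the mass dependence of the cross section locally and propagating the 1/√N statistical uncertainty of the rate through the slope dσ/dm_t. In this sketch the three cross-section values and the event count are illustrative numbers, not those of the paper:

```python
import math

def mass_from_rate_sensitivity(m_points, sigma_points, n_events):
    """Statistical top-quark-mass uncertainty from the rate alone:
    central-difference slope of sigma(m_t) through three equidistant
    mass points, combined with the 1/sqrt(N) uncertainty of the
    measured cross section."""
    (m0, m1, m2) = m_points
    (s0, s1, s2) = sigma_points
    h = m1 - m0                        # equidistant mass points assumed
    dsigma_dm = (s2 - s0) / (2.0 * h)  # slope at the central mass m1
    delta_sigma = s1 / math.sqrt(n_events)
    return abs(delta_sigma / dsigma_dm)

# illustrative: a 2%/GeV falling cross section measured with 10^4 events
dm = mass_from_rate_sensitivity((172.2, 173.2, 174.2), (10.2, 10.0, 9.8), 10000)
```

With a 2%/GeV slope and 10^4 events, the 1% rate uncertainty translates into a mass uncertainty of 0.5 GeV, illustrating how the cross-section polynomials of fig. 4.10 turn into mass sensitivities.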
To estimate the impact of the information contained in the event sample sizes (N_excl. and N_incl.) we form weighted averages of the results given in eq. (4.4) and eq. (4.5) and the results of the analyses presented in fig. 4.5 and fig. 4.7 (which are only sensitive to normalised differential cross sections). We find relative uncertainties for the respective weighted averages of ∆m_t/m_t = ±0.23% (±0.40%) in case of the exclusive s-channel (t-channel) and ∆m_t/m_t = ±0.21% (±0.37%) in case of the inclusive s-channel (t-channel). The improved top-quark mass sensitivities achieved by this simple combination match the relative uncertainties of the extended likelihood analysis presented in fig. 4.8 and fig. 4.9. As mentioned before, this gain in sensitivity has to be weighed against the additional systematic uncertainty due to the imperfect knowledge of the integrated luminosity. To estimate this effect, we have repeated the analysis varying the luminosity by ±∆L/L ≈ ±2% (see e.g. ref. [58]). We identify the observed shifts of the extracted top-quark masses of about ±1 GeV (±2 GeV) for the s-channel (t-channel) as an additional systematic uncertainty. Because of this additional uncertainty the extended likelihood does not lead to a more precise measurement unless the uncertainty of the integrated luminosity is significantly reduced.
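The weighted averages referred to above are standard inverse-variance combinations; a minimal sketch (the input values and uncertainties are placeholders, not the results of eq. (4.4) and eq. (4.5)):

```python
def weighted_average(values, uncertainties):
    """Inverse-variance weighted average of independent measurements,
    returning the combined value and its uncertainty."""
    weights = [1.0 / u**2 for u in uncertainties]
    wsum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / wsum
    return mean, wsum ** -0.5

# illustrative: combine a cross-section based and a shape based
# determination of m_t (numbers are placeholders, not the paper's results)
m, dm = weighted_average([173.4, 173.0], [0.6, 0.5])
```

The combined uncertainty is always smaller than the smallest input uncertainty, which is why adding the rate information improves the sensitivity.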
The results from the various analyses shown in fig. 4.5 to fig. 4.9 are summarised in table 2 and table 3. As mentioned before, the NLO likelihood always reproduces the true mass value m_t^true = 173.2 GeV used in the event generation. Using instead the LO likelihood to extract the top-quark mass can lead to significant deviations from the true value. These deviations tend to be smaller for the inclusive event definition. In most cases the inclusion of the NLO corrections reduces the uncertainty related to the scale variation. However, as mentioned before, one should keep in mind that the LO process is purely electroweak, so the variation of the factorisation scale alone does not give a reliable estimate of higher-order corrections. This is also reflected in the observation that for the s-channel the mass values extracted from the exclusive event sample using LO and NLO predictions do not agree within the scale uncertainty. In fig. 4.11 we compare the sensitivity to the top-quark mass for s- and t-channel single top-quark production for a fixed number of events. The s-channel shows a higher mass sensitivity and thus, as far as the statistical uncertainties are concerned, leads to a more precise mass measurement for the same number of events. Because of the higher mass sensitivity, the s-channel log-likelihood is significantly narrower than the t-channel one, reducing the statistical uncertainty by roughly a factor 2.3 compared to the t-channel. This factor is also consistent with the results presented for the fiducial cross sections in eq. (4.2). In practice, the higher sensitivity of the s-channel is compensated by the 20 times larger event rate of the t-channel. It is thus interesting to study the impact of the s-channel on a mass measurement assuming a realistic mixture of s- and t-channel events.
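The interplay between the higher s-channel sensitivity and the larger t-channel rate can be illustrated with a small estimate: scale the per-channel statistical uncertainties with 1/√N and combine them by inverse variances. The absolute uncertainty scale and the sample sizes are invented; only the factors 2.3 and 20 are taken from the discussion above:

```python
import math

# per-channel statistical uncertainty on m_t for a reference sample of
# N0 events each; the factor 2.3 is the sensitivity ratio quoted above,
# the absolute scale is a placeholder
N0 = 5000
dm_t_ref = 1.0            # hypothetical t-channel uncertainty at N0 events [GeV]
dm_s_ref = dm_t_ref / 2.3  # s-channel is more sensitive at equal event numbers

def scale(dm_ref, n):
    """1/sqrt(N) scaling of a statistical uncertainty from N0 to n events."""
    return dm_ref * math.sqrt(N0 / n)

# realistic mixture: t-channel rate about 20 times the s-channel rate
n_s, n_t = N0, 20 * N0
dm_s, dm_t = scale(dm_s_ref, n_s), scale(dm_t_ref, n_t)

# inverse-variance combination of the two channels
dm_comb = (dm_s**-2 + dm_t**-2) ** -0.5
```

In this toy setup the t-channel dominates the combination, but the s-channel still tightens it noticeably, in line with the motivation to study a realistic channel mixture.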

Conclusion
In this article we have presented the first application of the MEM at NLO to the hadronic production of jets using the example of single top-quark production at the LHC.
Modifying the recombination prescription of conventional 2 → 1 jet algorithms by using a 3 → 2 clustering inspired by the Catani-Seymour dipole formalism makes it possible to factorise the real phase space such that the unresolved phase space regions can be integrated numerically in an efficient way. Because of the 3 → 2 clustering the clustered jet momenta satisfy four-momentum conservation and the on-shell conditions. The momenta of the resolved jets can thus be used in the virtual matrix element, enabling a point-wise cancellation of IR divergences in combination with the real contribution, assuming an IR regularisation scheme has been used in intermediate steps.
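Why a plain 2 → 1 recombination is insufficient for the virtual matrix element can be seen in a two-line numeric check: the sum of two massless on-shell momenta is in general off shell, so naively clustered momenta would violate the on-shell conditions, which the spectator of the 3 → 2 prescription restores:

```python
# two massless partons (E, px, py, pz)
p_i = (10.0, 0.0, 0.0, 10.0)
p_j = (5.0, 3.0, 4.0, 0.0)

def mass2(p):
    """Invariant mass squared E^2 - |p|^2 of a four-momentum."""
    return p[0]**2 - p[1]**2 - p[2]**2 - p[3]**2

# both partons are on shell ...
assert mass2(p_i) == 0.0 and mass2(p_j) == 0.0

# ... but their 2 -> 1 recombination is not: the clustered "jet"
# acquires an invariant mass, which a 3 -> 2 clustering with a
# spectator redistributes to restore the on-shell condition
p_sum = tuple(a + b for a, b in zip(p_i, p_j))
m2 = mass2(p_sum)  # nonzero
```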
It is thus possible to define a fully differential event weight for jets instead of partons. As an important application, the event weight can be used to generate unweighted jet events distributed according to the NLO cross section. It is also possible to attribute NLO weights to jet events observed in experiments, as needed for the MEM at NLO.
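Unweighted event generation from a differential event weight typically proceeds by hit-or-miss sampling: accept a candidate with probability w/w_max. A minimal sketch with a hypothetical one-dimensional weight (the actual weights here would be the jet-level NLO event weights):

```python
import random

def unweight(sample_point, weight, w_max, rng=random.random):
    """Hit-or-miss unweighting: keep a candidate event with
    probability weight(x) / w_max, otherwise reject it."""
    x = sample_point()
    return x if rng() * w_max <= weight(x) else None

# toy setup: flat candidates on [0, 1], weight w(x) = x, w_max = 1
random.seed(1)
events = []
while len(events) < 1000:
    x = unweight(lambda: random.random(), lambda x: x, 1.0)
    if x is not None:
        events.append(x)

# accepted events follow the density 2x on [0, 1] (mean near 2/3)
mean = sum(events) / len(events)
```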
As a proof of concept, we have generated events following the NLO cross section for the hadronic s- and t-channel production of a single top quark in association with a light jet (with an optional veto on additional jet activity). We have checked that the unweighted events reproduce the differential distributions obtained using a conventional parton-level Monte Carlo integration. As a further consistency check, the top-quark mass has been successfully reproduced from the event samples using the MEM at NLO. When extracting the top-quark mass using Born matrix elements in the MEM we find significant biases in the estimators. In particular, we observe that in general the shift is not covered by the theoretical uncertainties estimated through scale variation. While the shift could be taken into account through a calibration procedure, the theoretical uncertainties would most likely be underestimated when relying on LO results only. Using the Matrix Element Method at NLO accuracy might thus help in reducing the required calibration (together with the associated uncertainty) and in providing a reliable estimate of the theoretical uncertainty. Furthermore, when using the Matrix Element Method at NLO accuracy the renormalisation scheme of the extracted parameters is uniquely defined.
We have applied the general algorithm presented in ref. [24], which relies on a modification of the clustering prescription to cover the most general event definitions in terms of any observables that can be constructed from the four-momenta of the observed jets.
We stress that the analyses presented in this paper represent only the starting point for the MEM at NLO machinery. An important purpose of this article is to serve as a proof of concept and to illustrate the potential gain of using the Matrix Element Method at NLO accuracy. Further studies are required to describe more realistic final states: While the inclusion of the top-quark decay in the calculation of the likelihood should be conceptually straightforward, more work is needed to also incorporate the effect of a parton shower. (For example, the possibility to combine the approach pursued in refs. [19,20] with the MEM at NLO framework presented here could be investigated.) For more complicated processes the Phase Space Slicing Method used in this article to cancel the IR singularities should be replaced by a subtraction-type method. As already stated in section 2.1, considering non-trivial transfer functions in the framework of the Matrix Element Method at NLO accuracy might require careful studies and dedicated research of its own. With all that in mind, let us point out that the performance of the Matrix Element Method at NLO and LO accuracy should ultimately be compared in the application to events that mimic realistic experimental outcomes as closely as possible, to get the best estimate of what would happen with real data. Nevertheless, despite some limitations, the studies presented in this work illustrate that missing higher-order corrections in the theoretical predictions may not only introduce biases in the extracted estimators but may also render the inferred systematic uncertainty estimates unreliable. Under the assumption that fixed-order NLO calculations give a better approximation of real data than LO ones, we expect these effects to play a role in the experimental application of the Matrix Element Method as well.
For future studies it will be worthwhile to repeat the corresponding analyses with events obtained from an independent, state-of-the-art NLO event generator with a parton shower that can be tuned to data, e.g. POWHEG (see ref. [59]) or MG5_aMC@NLO (see ref. [60]). This will make it possible to further quantify whether effects of a size similar to the above findings are also to be expected in parameter extractions when the method is applied to real data.
When employing the Phase Space Slicing Method it is useful to express the integration via invariants. The Källén function λ appearing in them is defined as

λ(x, y, z) = x² + y² + z² − 2xy − 2yz − 2zx.

The integration boundaries are expressed through the relative velocities between p_i + p_j and p_i or p_k.
The phase space parameterisation corresponds to the following clustering of n + 1 partons to n jets, which fulfils momentum conservation (∑_{i=1}^{n} J_i = P) and the on-shell conditions (J_j² = m_ij², J_l² = m_l² for l ≠ j).
The inversion of the clustering is the same as in ref. [24] with y and z as defined above.
In a 2 → 1 clustering the recombination of two partons would be achieved by summing the two momenta while leaving all other momenta untouched. The difference between the 3 → 2 clustering and the 2 → 1 clustering as defined in eq. (2.19) can be expressed accordingly. For a given unresolved final-state pair i, j the spectator k may be chosen such that it minimises ||J_j − (p_i + p_j)||.
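The spectator choice can be sketched generically: given some 3 → 2 clustering map (here a hypothetical stand-in for the prescription of ref. [24]), loop over the candidate spectators and keep the one that distorts the clustered jet least relative to the naive momentum sum:

```python
import math

def euclid_norm(p):
    """Euclidean norm of a four-momentum tuple (E, px, py, pz)."""
    return math.sqrt(sum(c * c for c in p))

def choose_spectator(p_i, p_j, spectators, cluster):
    """Pick the spectator k minimising ||J_j - (p_i + p_j)||, where
    cluster(p_i, p_j, p_k) -> J_j is the 3 -> 2 clustering map
    (a stand-in for the actual prescription of ref. [24])."""
    naive = tuple(a + b for a, b in zip(p_i, p_j))

    def distortion(p_k):
        J_j = cluster(p_i, p_j, p_k)
        return euclid_norm(tuple(a - b for a, b in zip(J_j, naive)))

    return min(spectators, key=distortion)
```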

A.2. Final-state clustering with initial-state spectator
The phase space of n + 1 massive partons can be expressed as a phase space of n particles convoluted with the dipole phase space dR_{ij,a}:

dR_{n+1}(p_a + p_b; p_1, …, p_{n−1}, p_j, p_i) = dR_n(xp_a + p_b; J_1, …, J_{n−1}, J_j) dR_{ij,a}. (A.9)

Using the Phase Space Slicing Method it is again useful to use a slightly different parameterisation than the one used in ref. [24]; we use the invariants x and z. The phase space measure (after convolution with the PDFs and the flux factor) is given in eq. (A.13), together with the corresponding integration boundaries. The phase space parameterisation corresponds to the clustering of n + 1 partons to n jets, which fulfils momentum conservation (∑_{i=1}^{n} J_i = xp_a + p_b) and the on-shell conditions (J_j² = m_ij² and J_l² = m_l² for l ≠ j). The inversion of the clustering is the same as in ref. [24] with x and z as defined above.
From eq. (A.14) the difference between the 3 → 2 clustering and the 2 → 1 clustering defined in eq. (2.19) follows directly. For a given unresolved final-state pair i, j the spectator a may be chosen such that it minimises ||J_j − (p_i + p_j)||.
If both final-state clusterings are possible the spectator should be chosen either from the final or the initial state according to the minimum of ||J_j − (p_i + p_j)|| given in eq. (A.8) and eq. (A.16).

A.3. Initial-state clustering with final-state spectator
The phase space of n massive and one massless parton can be expressed as a convolution of the phase space of n massive jets and the dipole phase space dR_{ia,k} for the emission of an additional massless parton from the initial state with a massive final-state spectator. All statements from section A.2 can be carried over by the replacements a → k and j → a, m_i → 0 and m_ij → m_k (see also ref. [24]).
The phase space parameterisation corresponds to the clustering of n + 1 partons to n jets, which fulfils momentum conservation (∑_{i=1}^{n} J_i = xp_a + p_b) and the on-shell conditions (J_l² = m_l², l = 1, …, n).
In a 2 → 1 clustering the unresolved additional radiation associated with the beam would simply be removed from the list of momenta without changing the other final-state momenta. The difference between the 3 → 2 clustering defined in eq. (2.20) and the 2 → 1 clustering is given in eq. (A.17). For a given unresolved final-state parton i the beam particle a and the spectator k may be chosen such that ||J_k − p_k|| is minimised.

A.4. Initial-state clustering with initial-state spectator
In case of initial-state clustering with an initial-state spectator the phase space can again be written as a convolution. When employing the Phase Space Slicing Method it is useful to express the integration (after convolution with the PDFs and the flux factor) via the invariants x and

v = s_ia / (Q² + s_ia + s_ib). (A.22)

The phase space measure (after convolution with the PDFs and the flux factor) is given by

dx_A dx_B f_a(x_A) f_b(x_B) / (2 x_A x_B s) dR_{n+1}(p_a + p_b; p_1, …, p_{n−1}, p_j, p_i)
= dx_A dx_B f_a(x_A/x) f_b(x_B) / (2 x_A x_B s) dR_n(xp_a + p_b; J_1, …, J_n)
× Q² / (32π³ (Q² + s_ia + s_ib)²) dφ ds_ia ds_ib Θ(φ(2π − φ)) Θ(s_ia(s⁺_ia − s_ia)) Θ(s_ik(s⁺_ik − s_ik)), (A.23)

together with the corresponding integration boundaries. The phase space parameterisation corresponds to the following clustering of n + 1 (massless/massive) partons to n (massless/massive) jets, with

K = p_a + p_b − p_i, K̃ = xp_a + p_b, (A.25)

and the Lorentz boost transforming K into K̃. The inverse boost is obtained by exchanging K and K̃. All outgoing momenta p_i are transformed to balance the transverse momentum. Momentum conservation (∑_{i=1}^{n} J_i = xp_a + p_b) and the on-shell conditions (J_l² = m_l², l = 1, …, n) are not affected by the boost. The inversion of the clustering is the same as in ref. [24] with x and v as defined above.
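The explicit form of this boost is not reproduced above; in the dipole formalism it is usually written as follows (a standard form, quoted here for orientation; the paper's exact conventions may differ):

```latex
\Lambda^{\mu}{}_{\nu} \,=\, g^{\mu}{}_{\nu}
  \,-\, \frac{2\,(K+\widetilde K)^{\mu}(K+\widetilde K)_{\nu}}{(K+\widetilde K)^{2}}
  \,+\, \frac{2\,\widetilde K^{\mu} K_{\nu}}{K^{2}}
```

One checks directly that Λ K = K̃, using K̃² = K².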
The sum of the jet momenta in this 3 → 2 clustering follows from the boost. For a given unresolved final-state parton i the beam particle a and the spectator b may be chosen such that ||J̃_m − p_m|| is minimised.
If both initial-state clusterings are possible the beam particle and the spectator (either from the final or the initial state) should be chosen according to the minimum of the corresponding distance measures.