Automation of one-loop QCD corrections

We present the complete automation of the computation of one-loop QCD corrections, including UV renormalization, to an arbitrary scattering process in the Standard Model. This is achieved by embedding the OPP integrand reduction technique, as implemented in CutTools, into the MadGraph framework. By interfacing the tool so constructed, which we dub MadLoop, with MadFKS, the fully automatic computation of any infrared-safe observable at the next-to-leading order in QCD is attained. We demonstrate the flexibility and the reach of our method by calculating the production rates for a variety of processes at the 7 TeV LHC.


Introduction
The capability of improving systematically the predictions for any given observable by means of perturbative techniques has been of great importance in helping establish the Standard Model as the correct theory of electroweak and strong interactions for sub-TeV energies. Although there exist many different choices for the expansion parameter of the perturbative series, the most common one is by far that of the coupling constant, which we identify here with that of QCD, α S . A power series in α S is expected to converge well (asymptotically) for observables that are sufficiently inclusive (such as total rates), or are associated with small-probability events (such as large-angle, large-energy emissions). The computation of contributions of increasingly higher orders in α S for a given observable is analogous to, and goes hand-in-hand with, the accumulation of data in a real experiment. While the latter results in the decrease of the statistical error associated with the measurement, the former implies the reduction of the theoretical uncertainty affecting the prediction, as determined by the dependence on the unphysical mass scales that enter the computation. Precise determinations on both the theoretical and experimental sides have been essential in the success of physics programmes at particle colliders, and continue playing a very important role in pinning down potential discrepancies between predictions and observations. It should be further stressed that some discovery strategies pursued by LHC experiments (which are thus analogous, in this sense, e.g. to the single-top analysis of CDF and D0) make extensive use of theoretical predictions, whose accuracy is crucial to reduce as much as possible any theoretical bias on evidence of new physics.
The reduction of the uncertainties that affect theoretical predictions is only one of the consequences of computing higher orders in the perturbative series. In general, one expects to find corrections that are non-trivial, in the sense that they cannot possibly be obtained by simply rescaling the leading-order (LO) results by a constant. Hadroproduction processes have a particularly rich structure from this point of view, since typically at each order in perturbation theory new partonic channels open up, which bring about an involved dependence on parton density functions (PDFs). However, it is easy to realize that the vast majority of these "dynamical" effects can be approximated in an excellent manner by just keeping the contributions to the perturbative expansion due to tree-level processes, subject to suitable cuts (that prevent them from diverging when integrated over the phase space). The fact that such an approximation is actually fairly good is the main reason why these tree-level computations (which are equivalent, process-by-process, to LO results) have enjoyed an enormous success in recent years: from the technical viewpoint, they are incredibly simpler than proper higher-order computations.
It is therefore clear that the decision on whether to perform a fully-fledged higher order computation for a given observable must be taken after careful consideration of the benefits versus costs. It is obvious that an increased precision in the theoretical predictions is always beneficial. Unfortunately, most of the physics we are interested in studying at the LHC involves high-multiplicity processes for which even the first non-trivial order in perturbation theory, i.e. the next-to-leading order (NLO) one, is horrendously complicated to compute. Furthermore, owing to the presence of many different mass scales, the results will display a non-negligible theoretical uncertainty, although in general much reduced w.r.t. that of the LO predictions.
An easy way to solve the issue above is that of reducing to zero the costs of performing higher-order computations, by letting a computer do the job through a complete automation of the necessary procedures. The ultimate goal of the work presented in this paper is that of showing that this is an actual possibility, as far as NLO results are concerned. As is well known, NLO calculations can be achieved by performing two tasks of overall comparable complexity. One of them is the computation of the one-loop matrix elements. The second of them is the computation of the tree-level matrix elements, followed by their combination with the one-loop ones, their recasting into terms which are locally non-divergent, and finally their integration over the phase space. The complete automation of the latter task was presented in ref. [1]; that of the former task is the subject of this paper.
The present work has therefore two immediate aims.
• A technical aim. Namely, the complete automation of the computation of QCD oneloop corrections for any user-defined process in the Standard Model, independently of external particle identities (in particular, whether they are massless or massive). This is achieved by embedding the Ossola-Papadopolous-Pittau integrand reduction technique [2] (through its incarnation in CutTools) into the MadGraph framework. The resulting tool, which we call MadLoop, is an independent software module that, given the process, returns the finite part of the one-loop corrections (UV renormalized), and the residues of the double and single infrared poles.
• A physics aim. By making use of MadLoop and of MadFKS (the automated software developed in ref. [1]), we present results for the total cross sections for a variety of processes at the 7 TeV LHC.
We point out that the second item here demonstrates the achievement of the ultimate goal we have stated above. Namely, that one can compute fully-physical NLO results at zero cost, given that the only human interventions necessary to obtain the cross sections (and any other infrared-safe observable) are those of specifying the process, of defining the input parameters, and possibly of imposing final-state cuts where necessary.
This paper is organized as follows. In sect. 2 we present a few representative physics results. In sect. 3 we illustrate the main features of the way in which we perform our computations. The current version of MadLoop has a few limitations, which we describe in sect. 4. The steps to be taken in order to remove these limitations, and other future improvements, are reported in sect. 5. We give our conclusions in sect. 6. Appendix A shows all comparisons between the results obtained with MadLoop and those available in the literature. Appendix B reports some technical details.

Selected physics results
In this section, we present a sample of the results that one can obtain with MadFKS and MadLoop. Although one typically discusses technical details before reporting their phenomenological implications, the ordering adopted in this paper underscores our belief that, thanks to the automation achieved here, NLO QCD corrections to Standard Model processes are as trivial to calculate as LO results, and must thus be treated on equal footing with the latter. This fact has two main implications. Firstly, technicalities are independent of the process, and can be understood once and for all. Secondly, the choice of which processes to consider is essentially limited only by CPU-time considerations 1 .
It is important to keep in mind that MadFKS and MadLoop are fully independent modules, which communicate with a standardized interface as prescribed by the Binoth-Les Houches accord [3]. MadFKS has already employed several programs other than MadLoop for the computation of one-loop contributions: BlackHat [4], Rocket [5], and Samurai [6] (see e.g. ref. [7] for a recent application). Likewise, MadLoop can be called by any code compatible with the accord of ref. [3]. It is also appropriate to remind the reader that in the first paper on MadFKS (ref. [1]) numerical results were presented only for e + e − collisions. However, all the formulae necessary for dealing with hadronic collisions were laid out in that paper, and their subsequent implementation in MadFKS has not posed any problem. In this work, we present for the first time hadroproduction results obtained with MadFKS. We remind the reader that MadFKS is based on the FKS universal subtraction formalism [8,9]. The FKS method is very effective in limiting the total number of subtractions, which scale at most as n 2 , with n the number of strongly-interacting particles that enter the process. Furthermore, the method renders it particularly easy to further reduce the subtractions to perform, by efficiently exploiting the symmetries of the matrix elements in the case of identical final-state particles. In the FKS formalism one introduces three arbitrary parameters (ξ cut , δ I , and δ O , associated with soft, initial-state collinear, and finalstate collinear singularities respectively) that control the subtractions, and of which any physical observable is strictly independent. That such an independence does indeed occur is a powerful check of the correct implementation of the method, and we have extensively verified it. We have also found that in hadroproduction the same feature holds as we have documented in ref. [1] for the case of e + e − collisions. Namely, that the convergence of phase-space integration depends very mildly on ξ cut , δ I , and δ O . This is the signal that such   an integration is very robust, and implies that it is not necessary to search for the values of the arbitrary parameters introduced above that are "optimal" for physical predictions, since essentially any values will do.
Given that our goal here is not that of performing any phenomenological studies, we have considered processes that feature a fair diversity (pure QCD, with EW vector bosons, with SM Higgs, with massive/massless b quarks, and at different jet multiplicities) in order to fully test and prove the flexibility of our setup. We have limited ourselves to presenting results for total rates. For reasons of space, it is impossible to report here even a small sample of differential distributions. We remark that we have checked a few standard ones (such as transverse momenta, rapidities, and pair invariant masses), and have found that the statistics used to obtain fairly accurate results for total rates is also sufficient to get reasonably smooth distributions. This is consistent with past experience with FKS subtraction, and is ultimately due to the fact that in this formalism, for any given integration channel, the total number of kinematic configurations associated with subtraction counterterms is always equal to one, thereby reducing to a minimum the probability of mis-binning (see ref. [1]).
Physics results are simply obtained by giving in input to MadFKS and MadLoop the process type, and the QCD and EW parameters to be used in the runs. Code-writing is limited to defining the observable(s) one wants to predict, and to imposing cuts (including the appropriate jet finders). We simulate pp collisions at 7 TeV. Masses, couplings, and widths are chosen as reported in table 1, with some process-specific exceptions, to be described below. For NLO (LO) results with five light flavours, we use the MSTW2008nlo (MSTW2008lo) PDF set [10], while in the case of four light flavours we adopt MSTW2008nlo nf4 (MSTW2008lo nf4). Each of these sets is associated with a different value of α S , which we report in table 1. Jets are defined using the k T -clustering algorithm [11] (as implemented in FastJet [12]), with p (jet) T > 25 GeV and pseudo-cone size ∆R = 0.7. Renormalization and factorization scales are set equal to a common value, Since we present results for total cross sections, it is appropriate to assign a fixed (i.e., that does not depend on the kinematics) value to µ, which is process-dependent as reported in table 2. The results are therefore easily reproducible and can be used as a standard reference. In table 2, by n lf we have denoted the number of quarks whose mass is equal to zero. Thus n lf is equal to five or to four when the b quark is considered to be massless or massive, respectively. In all cases, all six quark flavours have been included in the loops.
As discussed in ref. [1], MadFKS allows one to integrate all contributions to the NLO cross section in one single computation, regardless of whether they have a real-emission or a Born-type kinematics. For the results presented here, however, we have integrated the oneloop contributions separately from the other ones (i.e., the Born and real-emission matrix elements, and the subtraction counterterms). This is because for a given phase-space point the evaluation of virtual corrections performed by MadLoop takes much longer than all the other operations carried out by MadFKS. On the other hand, no phase-space subtraction is done on virtual corrections, and therefore the numerical computations are inherently more stable than those relevant to the subtracted real-emission contributions. Hence, it turns out to be more efficient to integrate the one-loop contributions separately from all the others, using a reduced statistics (on average, about one-tenth 2 of that employed for real corrections). Even so, for the processes with the highest multiplicities considered here, the virtual corrections require more computing time than the rest of the calculation, in order to attain similar integration uncertainties. This is actually good news since, as discussed in sects. 4 and 5, the amount of optimization included in MadLoop is so far very minimal, and hence we expect to reduce the CPU-time load in a significant way in the near future.
We finally mention a few technical points relevant to phase-space integration. All contributions to the cross sections are integrated using multi-channel techniques, following the procedure outlined in ref. [1]. The sums over colours and helicities are performed explicitly (we point out that both MadFKS and MadLoop are equipped to carry out helicity sums with Monte Carlo methods, but this was simply not necessary for the processes considered here). The virtual contributions are integrated in a direct manner, i.e. no reweighting by the Born matrix elements has been performed. Further process-specific comments are given in what follows.   • In the case of process c.5, the photon has been isolated with the prescription of ref. [13], with parameters and parton-parton or parton-photon distances defined in the η, ϕ plane. The photon is also required to be hard and central: • In the case of processes a.3, a.4, a.5, and e.7, diagrams with EW vector bosons in the loops have been removed, which is necessary given that a complex-mass scheme is presently not implemented (see sect. 4). These contributions are colour-suppressed.
• In the case of processes a.3, a.4, and a.5, diagrams with s-channel W 's have been removed, in order to avoid tt-type resonances. In the narrow width approximation this is a well-defined procedure, and for consistency we thus set Γ W = 0 for these processes.
• In the case of processes a.4, a.5, c.1, c.3, and e.6, we do not apply any cuts on b quarks, which is possible since m b = 0 implies the possibility of integrating down to zero transverse momenta. This is important, firstly because it allows us to test the robustness of phase-space integration in a very demanding situation (the b quark being very light), and secondly in view of the matching of these results with parton shower Monte Carlos, where it gives the possibility of studying b-flavoured hadrons also at small transverse momenta.
All the results reported in table 2 can be computed by employing up to two hundreds machines running simultaneously for two weeks. This running time does not include that required for actually generating the codes to be run. This latter operation is not parallelized in the current versions of MadLoop and MadFKS, and one must use one machine per process 3 . For the most complicated among the processes considered in table 2, the running time of the generation phase amounts to a few hours. The uncertainties reported in table 2 are of statistical origin. In a fully-numerical approach as the one adopted here, another source of uncertainty is that associated with potential numerical instabilities. We shall discuss in sect. 3.4 the procedure adopted by MadLoop to treat such instabilities. Here, we limit ourselves to reporting that they are very rare, and that the corresponding uncertainties are completely negligible w.r.t. the statistical errors, being at least two orders of magnitude smaller than the latter.

Automation
In this section, we describe the techniques we have employed in order to obtain the oneloop contributions to the results given in sect. 2. As discussed in the introduction, the automation of the computation of one-loop amplitudes has been achieved by means of a computer program that we call MadLoop. The core of the procedure followed by MadLoop is the Ossola-Papadopolous-Pittau (OPP) reduction technique. We devote the next section to summarizing it, and will then describe MadLoop proper in more details.
Before proceeding, it is worth stressing that the very idea of automating virtual corrections would have been unthinkable without the introduction of procedures for the computations of loop tensor integrals that are alternative to the traditional ones based on analytic techniques. Although the latter, by a clever use of tensor-reduction methods [14,15,16], have helped obtain remarkable results (see e.g. refs. [17,18] for some recent ones), they do not constitute an effective starting point for automation, the main drawbacks being the need of heavy symbolic manipulations, and that of special treatments of unstable decompositions (in particular, the analytic approach obliges one to guess a priori where numerical instabilities could occur, before taking actions such as Taylor-expanding small Gram determinants). With some degree of arbitrariness, we may classify the modern procedures alluded to before into two classes, that we call Generalized Unitarity (GU) [19,20,21] and Integrand Reduction (IR) [2,22,6]. Both have obtained very significant results: so far, GU-and IR-based efforts have focused primarily on studies of large-multiplicity final states [7,23] and of massive final states [24,25] respectively. As can be seen from sect. 3.1, Integrand Reduction is a procedure independent of the identities of the particles entering in the process (i.e. if they are fermions or bosons, or if they are massless or massive). It is thus perfectly suited to our goal of performing computations in the most flexible way, which is the reason why it has been adopted in MadLoop.

The Ossola-Papadopolous-Pittau reduction
Let us consider an UV-unrenormalized, n-point one-loop amplitude A (n,1) U . We have: where the sum runs over all Feynman diagrams relevant to A (n,1) U , and C α is the contribution of a given Feynman diagram after loop integration. The OPP procedure can be viewed as a linear operator, and therefore in what follows we shall consider only a given C αhence, the index α will be dropped in order to simplify the notation. The quantity C is in general a tensor in Lorentz and colour spaces. The following arguments are however independent of the nature of C, which will therefore be understood; this is equivalent to fixing the Lorentz and colour indices in C, and manipulate the resulting scalar quantity. It simplifies the present discussion to consider all external momenta as outgoing: We consider a diagram with m propagators in the loop; the value of m need not be specified here, and it suffices to say that it satisfies the constraint 1 ≤ m ≤ n. It is not restrictive to assume that the external momenta are in the same order as in fig. 1 (since such a configuration can always be obtained through a relabeling). We denote the loop momentum in d = 4 − 2ǫ dimensions byl, and decompose it into the sum of a 4-dimensional and of a (−2ǫ)-dimensional components, which we denote by ℓ andl respectively. Hence: We introduce the partial sums that enter the propagators that form the loop (see fig. 1): The values of the integers M i depend on the particular diagram considered (e.g. in fig. 1 we have M 1 = 1, M 2 = 3, M 3 = 6), but they must always fulfill the following conditions: where the last equality of eq. (3.5) follows from eq. (3.2). The inverses of the loop propagators in d and four dimensions we denote byD and D respectively. Hence: which follows from eq. (3.3), and from the fact that the (−2ǫ)-dimensional parts of the external four-vectors are equal to zero, since the 't Hooft-Veltman scheme is adopted. Note that m i is the mass of the particle flowing in the i th propagator, and therefore in general p 2 i = m 2 i . As is known [14], the one-loop integral C can be expressed as a cut-constructible part, i.e. a linear combination of scalar boxes, triangles, bubbles, and tadpoles, plus a (non cut-constructible) remainder term R, called rational part: The essence of the OPP method is that of computing C by determining (in a numerical manner) the set of coefficients and the rational part and by using the well-known expressions for scalar loop integrals [26,27,28]. The determination of the quantities in eq. (3.8) is achieved by working at the integrand level. We start by writing which follows from fig. 1 and eq. (3.6), and implicitly definesN (l). Next, we decompose the numerator functionN into the sum of a four-dimensional part and an extra piece (which by definition contains all the dependence onl and on ǫ; note that there is no dependence on the (−2ǫ)-dimensional parts of the external four-vectors in the 't Hooft-Veltman scheme): We get:C (3.14) Here, R 2 contributes only to the rational part R, while C cc+R 1 is the sum of a cutconstructible and of a rational term (called R 1 ); we discuss how to disentangle the latter two in what follows. One starts by showing [2] that the numerator function N (ℓ) can always be cast in the following form: where the terms proportional tod,ĉ,b andâ (called spurious terms) vanish upon integration. By exploiting the fact that D i =D i −l 2 (see eq. (3.6)) in eq. (3.15), one obtains the identity: where we have defined (3.17) As the notation suggests, N cc is identical to eq. (3.15), except for the formal replacements of all D i 's with the correspondingD i 's (i.e., the four-dimensional denominators by their d-dimensional counterparts). Equation (3.16) implicitly defines N R 1 . From eq. (3.6) we see that: We can now define the cut-constructible and R 1 contributions separately: The rational part R introduced in eq. (3.7) is the sum of R 1 and R 2 , defined in eqs. (3.21) and (3.14) respectively.
The key point is that the coefficients d, c, b, and a which appear in eq. (3.15) are the same as those which appear in eq. (3.7), as can be easily seen by inserting eq. (3.15) into eq. (3.17), and by using the result so obtained in eq. (3.20). This is ultimately the reason why the OPP procedure is carried out at the integrand level, where the loop momentum (ℓ,l) is just an external and fixed quantity. Thus, eq. (3.15) is the master equation used in the OPP method for the determination of the coefficients d, c, b, and a, which is achieved by solving numerically a system of linear equations. The idea is that of computing N (ℓ) for suitable values of ℓ, which render the just-mentioned linear system easy to solve. A pre-condition for this to happen is the fact that the spurious terms can be determined fully as functions of the external momenta; this has been proved in ref. [22]. At this point, the easiest way to proceed is that of computing the cut-constructible and R 1 contributions separately. One starts with the former, by settingl 2 = 0 and using eq. (3.18). Then, one determines the two solutions 4 ℓ ± of the equations: for given i 0 , i 1 , i 2 and i 3 . Equation (3.15) then becomes (3.23) and one can prove [2] that the coefficients of the box diagrams are simply given by . (3.24) We point out that eq. (3.22) is nothing but the application of quadruple unitarity cuts. Once the solutions for the d coefficients are known thanks to eq. (3.24), the corresponding terms in eq. (3.15) are moved to the l.h.s. there. The procedure is then iterated by considering triple, double, and single unitarity cuts in succession, i.e. values of ℓ such that three, two, and one denominators vanish respectively. In exactly the same way one deals with the computation of R 1 [29]. The only difference w.r.t. the case of the cut-constructible part is that the basis of the scalar loop integrals used in the two cases is not the same (with that relevant to R 1 being almost trivial). The procedure described above is implemented in the computer program CutTools which, being given in input the function N (ℓ), the momenta defined in the partial sums of eq. (3.4), and the masses m i of the corresponding propagators, returns the numerical values of the cut-constructible part and of R 1 . Note that by giving to CutTools the momenta and the masses entering the loop propagators, rather than the numerical values of the propagators themselves, one is allowed to bypass the problem of introducing d-dimensional quantities in MadLoop -these are completely dealt with internally in CutTools. As far as R 2 is concerned, this quantity is not returned by CutTools. On the other hand, its computation can be performed by considering tree-level Feynman diagrams, which get contributions both from the usual rules of the theory under consideration, and from special R 2 functions with up to four external lines (in any renormalizable theory), as we discuss in sect. 3.2.2. These special rules can be worked out once and for all from the Lagrangian of the theory. Both the use of CutTools and the calculation of the R 2 contribution for a given one-loop amplitude are controlled by MadLoop, in a way which we outline in the next section.

Organization of the calculation
The input to MadLoop is a 2 → n Standard Model partonic process 5 r = (I 1 , I 2 , I 3 , . . . I n+2 ) , (3.25) which can be either user-defined, or generated by a third-party code such as MadFKS; examples of the two uses are given in appendix A and in sect. 2 respectively. The main output of MadLoop is the finite part 6 of the quantity: with A (n,0) and A (n,1) being the tree-level and one-loop amplitudes of the process r, after the latter has been UV-renormalized. Note that A (n,1) is the same quantity as that in eq. (3.1), except for the fact that in the latter equation the amplitude was not UV-renormalized, and that by n we had denoted there the total number of particles entering the process (and not only those in the final state as we do here). The bar over the sum symbol on the r.h.s of eq. (3.26) understands the average factors relevant to the colour and spin degrees of freedom of initial-state particles. The finite part of V is convention dependent and, unless otherwise specified, we have adopted here the same as in ref. [1], namely CDRsee appendix A for further discussion on scheme choices. As a by product, MadLoop also returns the residues of the double and single infrared poles. The complete information on A (n,1) is available internally in MadLoop (see app. A.2.7 for a case in which we have used such an information), and may be given as an additional output if so desired -this is useful e.g. for computing the LO cross section of a loop-induced process, which is proportional to A (n,1) 2 . Schematically, for a given input process r, MadLoop goes through the following steps.
1. Generates the diagrams relevant to A (n,1) (r). There are two classes of them, one for the cut-constructible plus R 1 contribution, and one for the R 2 contribution and UV renormalization.
2. Constructs the two integrands associated with the diagrams determined in item 1.
The one relevant to the cut-constructible plus R 1 contribution is the linear combination of eq. (3.1) at the integrand level, whose components are in the form of the first term 7 on the r.h.s. of eq. (3.11).
3. For a given 2 → n kinematic configuration (user-defined or generated by MadFKS), applies the OPP reduction to each of the terms of the linear combination determined in item 2. This is achieved by passing to CutTools the function N (ℓ) and any other inputs it needs (see sect. 3.1), and is done separately for each helicity configuration. After summing over all diagrams and helicities, one thus obtains the cut-constructible plus R 1 contribution C cc+R 1 .
4. For the same kinematic configuration as in item 3, computes the rational part R 2 (which is not returned by CutTools), and performs UV renormalization if necessary. These calculations are also carried out at fixed helicities, which are summed over as the final step.

Performs sanity checks.
Items 1 and 2 only involve symbolic manipulations, and construct the functions whose numerical evaluations will be performed in items 3-5. In the case of the computation of a physical cross section (when MadLoop is called by MadFKS or by an analogous program), a loop over items 3-5 is performed, with each iteration of the loop using a different kinematic configuration.
We give a brief description of the least-trivial aspects of this procedure in sects. 3 is now available, and the work to make MadLoop compatible with it has already started. When completed, this will allow us to remove most of these limitations -see sects. 4 and 5 for a discussion on this point.

Generation of one-loop amplitudes from tree amplitudes
Given the fact that MadLoop is based on the MadGraph framework, it is clear that the most economic way of generating one-loop amplitudes is that of exploiting as much as possible the capabilities of the latter code. These are however limited to constructing tree-level quantities, and therefore MadLoop must be able to perform some non-trivial operations on top on those available from MadGraph in order for us to achieve our goal. We start by observing that any one-loop diagram can be turned into a tree-level diagram by simply cutting one (and only one) of the propagators entering the loop. It must be clear that this cut has nothing to do with the cuts performed when computing one-loop integrals with unitarity methods. Thus, in order to avoid any confusion, we shall call L-cut the cut we are talking about here. The tree-level diagram obtained by L-cutting a one-loop diagram will be called L-cut diagram. In an L-cut diagram, there will be two particles (that we consider as being in the final state by definition) which arise from L-cutting the chosen propagator in the loop: their identities will be denoted by q ⋆ andq ⋆ , and they will be called L-cut particles. If we consider one-loop corrections to the process in eq. (3.25), the L-cut diagrams we obtain with the L-cut operation will be a subset of those relevant to the process: This discussion suggests to define a procedure which is the inverse of L-cutting. Namely, for a given 2 → n process such as that in eq. (3.25), we consider all possible L-cut processes of the kind of that in eq. (3.27), use MadGraph to construct the corresponding amplitudes, and sew together the two L-cut particles. In this way, we achieve the construction of one-loop diagrams without actually having to start from one-loop topologies, as done for example by FeynArts [32]. This idea is also at the basis of the one-loop computations performed by HELAC-1Loop (see ref. [33]).
This construction of one-loop amplitudes by sewing tree-level ones poses several problems. Firstly, we have to define a minimal set of L-cut processes so as not to miss any contributions to the one-loop amplitude we are seeking to construct. Secondly, for a given L-cut process, when sewing together the L-cut particles we shall obtain one-loop diagrams with an incorrect multiplicity: one particular diagram may appear more times than prescribed by perturbation theory. We have therefore to discard the one-loop diagrams in excess after sewing. We call this operation diagram filtering. Finally, after filtering the amplitudes for the L-cut diagrams we are left with are constructed. However, these amplitudes will not coincide with the corresponding one-loop amplitudes, because of the special roles played by the L-cut particles. These are associated with external wave functions in Lcut diagrams, and with an internal propagator in one-loop diagrams (therefore, technically the sewing operation corresponds to removing the wave functions of the L-cut particles, and to replacing them with a suitable propagator). MadGraph must therefore be instructed to treat L-cut particles in a special way -this includes the fact that such particles are off-shell and carry complex momenta.
It is easy to convince oneself that when computing QCD corrections the L-cut processes one needs to consider correspond to the following choices of the L-cut particles: Here, Q denotes the heaviest flavour one wants to circulate in the loop (in physical applications, Q is typically either a bottom or a top quark). Diagram filtering is performed in the following way. We start from observing that any L-cut diagram can be depicted as in fig. 2. There, T i denotes a tree-level structure 8 . The particles whose propagators enter in the loop have been denoted by p ⋆ j , to make the distinction clear from those that contribute to the tree-level structures T i ; the notation is also consistent with that used for the L-cut particles. MadLoop starts from obtaining the necessary information on T i and p ⋆ j from MadGraph (such as particle identities and four momenta); this is done for each L-cut diagram. In this way, each diagram has an unambiguous representation (its "identity"), internal to MadLoop, that we can for example identify with the string for the case of fig. 2. As we shall explain in appendix B.1, when writing diagram identities one need not distinguish between fermions and antifermions in the case of particles circulating in the loop, and hence no overlines appear in eq. (3.31) or its analogues (despite the fact that one of the two L-cut particles has been correctly denoted byq ⋆ on the l.h.s. of eq. (3.28)). At the end of the day, MadLoop will loop over all L-cut diagrams, checking their identities. If a diagram identity has not been previously found, the diagram is kept, otherwise it is discarded (i.e., it is filtered out). It must be stressed that two diagram identities must be considered equivalent if they are identical up to a cyclic permutation, or to mirror symmetry, or to a cyclic permutation plus mirror symmetry; however, mirror symmetry must not be considered in the cases of loops that contain only fermions or only ghosts, when one only checks equivalence under cyclic permutations. This is because two L-cut diagrams that differ by a cyclic permutation correspond to the same one-loop diagram which was L-cut in two different propagators (see appendix B.1 for more details). For example, a cyclic permutation equivalent to eq. (3.31) could read: Likewise, two L-cut diagrams that differ by mirror symmetry correspond to the same oneloop diagram with the loop momentum flowing in opposite directions. The identity of the L-cut diagram obtained by mirror symmetry acting on eq. (3.31) reads: Equivalence under cyclic permutations and mirror symmetry can be used as a powerful self-consistency check by MadLoop -see sect. 3.3.

R 2 contribution and UV renormalization
As it was discussed in sect. 3.1, MadLoop computes by calling CutTools the cut-constructible part of, and the R 1 contribution to, the one-loop amplitude A (n,1) (r) that appears in eq. (3.26) (item 3 of the list in sect. 3.2). Therefore, in order to obtain the full V (r), we still need to include the R 2 contribution, and the UV renormalization (item 4 of the list in sect. 3.2). The key observation here is that, thanks to the fact that the R 2 part can be seen as arising from a set of finite counterterms, its automated computation proceeds through the same steps as UV renormalization. In essence, both the R 2 and UV-renormalization contributions to V (r) can be cast in the following form: that is, an interference between the Born amplitude, A (n,0) (r), and another tree-level amplitude, A (n,R 2 ) (r) or A (n,UV) (r). The latter are the amplitudes of the 2 → n process r, constructed with the standard Feynman rules, plus the rules relevant to either the R 2 or the UV-renormalization contributions, with the condition that the amplitude be exactly one order in α S higher than the Born-level one. The R 2 rules are obtained by explicit loop calculations, and can be cast in the form of n-point functions, with 2 ≤ n ≤ 4 (see ref. [34]). As far as UV renormalization is concerned, the additional rules are simply read off the one-loop UV counterterms of QCD. The consequence of these facts, together with the condition on the power of α S that must appear in the amplitudes, is that A (n,R 2 ) (r) and A (n,UV) (r) are constructed using the standard vertices and propagators, plus one and only one R 2 or UV-renormalization n-point function. It is clear, therefore, that in order to compute the quantities in eq. (3.34) we can exploit the ability of MadGraph to construct tree amplitudes, and the functionalities of MadLoop to calculate interference terms, which we already employ when computing the cut-constructible and R 1 contributions to eq. (3.26). Note that the above procedure to compute eq. (3.34) should be applied to truncated one-loop amplitudes. When generating tree-level diagrams with MadGraph, we also get the contributions due to two-point functions on external legs, which are discarded. As is known, wave-function renormalization must be carried out through the multiplication of the Z factors, which we effectively do according to eqs. In what follows, we mention briefly a few technical details related to its current implementation in MadLoop, and its being based on MadGraph v4.
In order to compute the R 2 contribution, we have implemented new HELAS routines, that correspond to the necessary R 2 n-point functions 9 . We have restricted ourselves to the functions relevant to the Standard Model (with a few limitations, see sect. 4), in keeping with the scope of the present work. The condition on the presence of one R 2 function in A (n,R 2 ) (r) is conveniently rephrased in the MadGraph language by associating to each R 2 function one power of a new coupling (whose numerical value is equal to one), which does not appear in regular functions, and by requiring the full amplitude to have exactly one power of such coupling.
As discussed before, the automation of UV renormalization may be achieved in exactly the same way as done for the R 2 contribution. In practice, the cases we consider in this paper allow us to make some further simplifications. Before going into that, let us mention that we automated the UV renormalization of the Standard Model, using a scheme which subtracts the massless modes according to MS, and the massive ones at zero momentum. The few relevant counterterms have been typed in once and for all, as done for the R 2 functions; their explicit forms are reported in appendix B.2. As can be seen there, all UV counterterms except the one relevant to mass renormalization give contributions proportional to the Born amplitude squared (we have assumed that all contributions to such an amplitude factorize the same powers of α S and α. See sect. 4 for further comments on this point). We then gather from MadGraph the information on the power of α S , on the number of Yukawa vertices with massless and massive particles, and on the identities of the external particles; these will serve for (QCD) charge, Yukawa, and wave-function renormalizations respectively.
A simple observation allows one to solve the problem of mass renormalization, which is a procedure that does not factorize the Born. It turns out that the structure of the UV counterterm due to mass insertion (eq. (B.9)) is basically identical to that of the two-point R 2 functions that originate from bubble diagrams (in the HELAS language, they both correspond to attaching an extra term -a fermion propagator times a pre-factor -to the propagator present in the HELAS routines that give an off-shell fermion current; this is a work-around due to the absence of two-point vertices in MadGraph v4). Hence, we simply define an R 2 -plus-UV function, and therefore we piggyback UV mass renormalization on the computation of the R 2 contribution.

Summary of MadLoop features
We list here the features that required substantial writing of computer code, either in MadLoop proper, or in one of the modules that MadLoop uses.
• The generation of loop diagrams in a way that exploits as much as possible the capabilities of MadGraph (through L-cut-diagram construction and subsequent filtering).
• The computation of the resulting one-loop amplitude, in a form suited to being given in input to CutTools. This implies, in particular, the removal of the denominators of the propagators of loop particles, and the reconstruction of the propagator of the sewed L-cut particles.
• The algorithm that allows MadGraph to deal with two processes simultaneously (the L-cut and Born ones), in order to compute their interference, including the relevant colour algebra (in all cases except one-loop computations, MadGraph needs to treat one process at a time, since only an amplitude squared is relevant).
• The possibility of using complex four-momenta (that circulate in the loops).
• The implementation of ghosts.
• The implementation of special R 2 functions and of UV counterterms, and the computation of the amplitudes relevant to the R 2 and UV-renormalization contributions, which use the former functions.
In general, we have strived to make MadLoop a black box for MadFKS (or an equivalent calling program) -the only talk-to is in the form dictated by the Binoth-Les Houches accord [3]. Likewise, CutTools is a black box for MadLoop, so any future upgrades of the former code (with the condition that its inputs be those introduced in sect. 3.1) will be compatible with the latter. On the other hand, MadLoop needs information from Mad-Graph which are not available to the ordinary user of the latter program. However, this poses no problem, firstly because the amount of code-writing on the MadGraph side is next-to-minimal, and secondly (and more importantly) because MadLoop has to be considered a module of MadGraph, to become part of the official release in due time (hence, new versions of the two codes are being and will be developed together).
In the same spirit, we point out that the implementation of new HELAS routines specific to one-loop computations has been done in a way which is fully independent of the MadLoop code proper. By exploiting the capabilities of MadGraph v5, in the future we shall generate these routines automatically, but this will not imply the modification or the writing anew of any piece of code in MadLoop.

Checks
One of the main advantages of the automation of computations as performed by MadLoop is that results for individual processes (irrespective of their complication) are basically guaranteed to be correct. The key point is that the computer code which returns the relevant one-loop amplitude is written by (in other words, is an output of) MadLoop, which uses a fixed number of pre-defined building blocks (the HELAS routines), without any involvement from the user. Therefore, the correctness of any one-loop amplitude follows from establishing the correctness of MadLoop. In turn, this entails two kinds of checks.
a) The building blocks used by MadLoop must be proven correct.
b) The way in which the building blocks are combined must be proven correct.
We point out that this structure of checks is fully analogous to that one has in the case of MadGraph, although of course the technical aspects are not identical.
It is easy to realize that the checks of item a) are process-independent, while those of item b) do depend on the particular one-loop amplitude one wants to compute. In spite of this, the most straightforward way to carry out the checks of item a) is that of comparing the results of MadLoop with those available in the literature, for a suitable list of processes. Since the number of building blocks used by MadLoop is finite, so is such a list. Clearly, when following this procedure one also performs the checks of item b) for the processes of the list. However, this does not guarantee that the same checks will be successful for processes which do not belong to the list (although it is a powerful hint that this is indeed the case). Hence, in order to carry out fully the program of item b), we have implemented several self-consistency checks, that MadLoop will perform for any generated process and for any desired 2 → n kinematic configurations among those chosen by the user or by the calling program. We shall describe these checks later in this section.
The list of processes that we have used in order to compare MadLoop results with their analogues available in the literature, and in computer codes other than MadLoop, is reported in appendix A. We stress that this list is actually redundant as far as the goal of item a) is concerned (i.e. each HELAS "one-loop" function is checked at least once, but typically more than once). For each process, we compare the finite part and the residues of the infrared poles of the quantity V defined in eq. (3.26) with those of other computations. It should be pointed out that, although in some cases codes other than MadLoop will not return the residues of the infrared poles, these can in any case be checked against their analytically-known expressions, which we get from the implementation in MadFKS of eq. (B.2) of ref. [1]. As can be seen from appendix A, the agreement between MadLoop and previously-known results is excellent.
We now turn to discussing the self-consistency checks we have mentioned previously. Some of them are not applicable to specific processes; the general strategy we have followed is that of testing each process generated by MadLoop in the largest possible number of ways. The checks we have set up are listed in what follows.
• Alternative diagram filtering. L-cut diagrams whose identities differ by cyclic permutations or mirror symmetry correspond to codes which are not written by MadLoop in the same way, in spite of the fact that they will eventually give exactly the same loop-integrated results 10 . We check that this is indeed what happens. An easy way to do this is that of choosing the L-cut particles in an order different from that of eqs. (3.28)- (3.30). This is because L-cut-diagram identities not filtered out when looping over all diagrams will not be the same as those kept with the "canonical order" of eqs. (3.28)-(3.30), but rather their equivalents under cyclic permutations and/or mirror symmetry (for example, this corresponds to keeping eq. (3.31) and discarding eq. (3.32), versus keeping eq. (3.32) and discarding eq. (3.31)).
• Crossing. We consider the two processes For a given 2 → n kinematic configuration we check that the corresponding one-loop amplitudes fulfill the following equation: with ω a constant that only depends on the identities of the particles that are crossed. This is non trivial, given that MadLoop constructs A (n,1) (r 1 ) and A (n,1) (r 2 ) in two different ways (which includes the fact that the HELAS routines used in the constructions of the two amplitudes may not even be the same). We point out that crossing is a very powerful method during the debugging phase, since eq. (3.37) holds diagram by diagram.
• Dependence on the mass of a heavy quark Q. When the one-loop amplitude for a given process includes diagrams that feature a closed fermion loop, we can study the dependence of the result on m Q (by default, such a dependence is included exactly by MadLoop). We typically identify the heavy quark with the top, but this is not mandatory. In particular, we may check the following two regimes.
1. Decoupling limit: m Q → ∞. We compute the amplitude with the full m Q dependence for increasingly large values of m Q , and compare it to the one we obtain by excluding altogether the heavy quark from contributing to the loops (in the case Q = t, the latter is a five-light-flavour amplitude).
2. Zero-mass limit: m Q → 0. We compute the amplitude with the full m Q dependence for increasingly small values of m Q , and compare it to the one we obtain by replacing the heavy quark with an additional massless quark in the loops.
We stress that the comparisons between these two limits of the amplitude and the n lf -and (n lf + 1)-light-flavours results are not straightforward, owing to the possible presence of anomalies and to the UV-renormalization scheme adopted here respectively. Each process is to be studied as a case on its own, which is what we do in appendix A.
• Gauge invariance. If the process r under study involves at least a gluon, we check that the corresponding one-loop amplitude satisfies the gauge-invariant condition: for any gluon momentum k i . The same kind of check is performed for photons as well.
• Infrared pole residues. The form of the double and single infrared poles is analytically known for any process r (see e.g. eq. (B.2) of ref. [1]). We compare the residues returned by MadLoop with those computed with the analytic formulae, implemented in MadFKS. We point out that by checking the single pole result we also indirectly test the correctness of the UV renormalization procedure.
We stress that all these five checks are local in phase space, i.e. they are performed for a given 2 → n kinematic configuration. In principle, they can therefore be carried out for each kinematic configuration when integrating over the phase space; in practice, this would significantly increase the load on CPU. Therefore, we have chosen to do these tests for only a few (i.e., less than five) kinematic configurations before starting the actual integration. It is easy to realize that this is not restrictive, since given the nature of the checks it is basically impossible for an one-loop amplitude to pass them for some configurations, and to fail them for others. An exception to this situation is a failure due to some numerical instability occuring during the calculation: such a case is discussed in sect. 3.4.

Robustness against numerical instabilities
Once a process has been generated and the checks described in sect. 3.3 have been performed, the process is deemed correct by MadLoop, and phase-space integration can be carried out. While doing so, kinematic configurations may be encountered that render it particularly difficult to execute the necessary numerical manipulations. Numerical instabilities are typically related either to inaccuracies in the solution of the system of linear equations performed by CutTools which determines the quantities in eq. (3.8), or to the occasional occurrence of almost linearly-dependent subsets of external four momenta. For each 2 → n kinematic and helicity configuration given in input to MadLoop, the detection of numerical instabilities relies on two self-diagnostic tests performed by CutTools. When potential problems are encountered, MadLoop adopts a procedure that either fixes them, or provides one with an estimate of the resulting uncertainties. We describe this procedure in what follows. Before proceeding, however, we would like to stress that numerical instabilities are a fairly rare occurrence indeed. For all but two of the processes that we consider in table 2, their fraction is less than 10 −5 of the total number of kinematic configurations generated during phase-space integration. Processes b.3 and b.6 have slightly larger fractions (3 · 10 −4 and 3 · 10 −3 respectively), but as was mentioned in sect. 2 the associated uncertainties are completely negligible w.r.t. the statistical errors on the final results.
The first test performed by CutTools is based on the fact that the set in eq. (3.8) can also be obtained in an independent way by first shifting all masses in the D i 's that appear in eq. (3.15) by a common amount (see ref. [35]): The value of M is arbitrary, but one finds it convenient to relate it to the typical scale of the process; we choose M = O( √ŝ /10), with √ŝ the partonic c.m. energy. This results in a new set that allows an independent determination of the cut-constructible and R 1 contributions, which we denote by C ′ cc+R 1 . CutTools then checks whether the following inequality is satisfied: When this is not the case, the sets in eqs. (3.8) and (3.40) are determined again by Cut-Tools, but using routines written in multi-precision (except for the function N (ℓ), which is still in double precision). The inequality in eq. (3.41) is then checked with these multiprecision results. It turns out that the vast majority of kinematic configurations that give results that do not pass the test of eq. (3.41) in double precision, do so when multi-precision calculations are carried out. As second test, by using the coefficients d, c, b, and a previously determined, CutTools chooses an arbitrary loop momentum ℓ (again related to the typical scale of the process, e.g. ℓ i = O( √ŝ /5) and ℓ 2 = O(ŝ/100)), and computes the two sides eq. (3.15), which must agree at a user-defined level. This second test is typically able to detect instabilities that would not appear if also the numerator function N (ℓ) were computed in multiple precision.
If a kinematic configuration passes both these tests it is defined stable by CutTools, and the computation proceeds to the next step. Otherwise, i.e. when at least one of the two tests fails, the kinematic configuration is called an exceptional phase-space point (EPS), and the problem then becomes that of estimating its contribution to the one-loop amplitude.
It should be clear that a given kinematic configuration is an EPS only in connection with a given diagram: other diagrams typically display no problems, and are reliably computed by CutTools. One can therefore trust the result for the total one-loop integral in the case in which the unstable diagram does not contribute significantly to the total. This can be determined by checking that the resulting one-loop amplitude is gauge invariant (which is possible only if at least one gluon is involved in the process), i.e. that eq. (3.38) is fulfilled to an accuracy which is the same as that which holds for stable kinematic configurations, and which we determine at the beginning of the run. When this is the case, we say that the EPS is rescued to stability, MadLoop accepts the result given by CutTools, and proceeds to the next kinematic configuration.
In the present version of the code, in order to be very conservative we have chosen not to rescue to stability any EPS's at this stage. What we do, instead, is to estimate the contribution of the EPS's to the one-loop amplitude, and to associate an uncertainty with this procedure. We start by observing that in basically all cases EPS's originate from the inability of performing all the relevant computations in multiple precision 11 (or, perhaps more importantly, from the unwillingness to do so, since multi-precision calculations are very demanding from the CPU viewpoint). This is because EPS's are the result of the lack of cancellations between pairs of numbers, which would take place beyond double-precision accuracy. These cancellations are so delicate that even a small change in the kinematic configuration can bring them back into the double-accuracy regime. This suggests that, given an EPS, one can slightly deform it, and use the (now generally stable) result given by CutTools as an estimate of the true result associated with the EPS.
In order to carry out this deformation, we consider the EPS in the partonic c.m. frame, and rescale there the z components 12 of the particle three-momenta: The transverse momenta are left invariant, and the energy components (and √ŝ ) are adjusted so as to impose on-shellness conditions and four-momentum conservation. In eq. (3.42), λ ± are two small and arbitrary real numbers. Typically, λ − = −λ + , but this is not strictly necessary. In this way, we obtain two kinematic configurations, that we call shifted EPS's, and compute the associated Born and one-loop contributions. Let us denote by the Born amplitude computed with the original EPS, the two Born amplitudes computed with the shifted EPS's, and the two finite parts of the quantity defined in eq. (3.26) computed with the shifted EPS's, respectively (needless to say, the computation of the Born can always be fully trusted, regardless of whether a kinematic configuration is an EPS). It will also be convenient to define the one-loop contributions normalized to the corresponding Born amplitudes squared, i.e.
We write our estimate V FIN λ=0 for the finite part of V computed with the EPS as follows: We point out that eq. (3.45) is used with all stable kinematic configurations in the evaluation of the total rate and differental distributions. In doing so, we compute three integrals (associated with the central value c and the extrema c ± ∆), whose spread will be interpreted as the uncertainty associated with the presence of EPS's. In this procedure, stable configurations can be thought of having the same form as in eq. λ ± , according to the following three possibilities.
• Both shifted EPS's are classified by CutTools as stable points, or can be rescued to stability by the gauge-invariance check. We then set: (3.47) • One of the shifted EPS's (say, that with λ + to be definite) is classified by CutTools as stable, or can be rescued to stability by the gauge-invariance check, while the other shifted EPS is an EPS that cannot be rescued to stability by the gauge-invariance check. We then set: The value of ∆ is set in the same way as for the case immediately below, and we shall give its definition later.
• Both shifted EPS's are classified by CutTools as EPS's, and cannot be rescued to stability by the gauge-invariance check. In this case, we set: where med() denotes the median of its argument, which is a set of real numbers. Such a set in eq. (3.49) is that of the values of the ratios of the finite part of V over the Born amplitude squared, computed for all the stable phase-space points encountered so far.
As far as the uncertainty ∆ is concerned, associated with the central values defined in eqs. (3.48) and (3.49), we define it as follows: which is the median absolute deviation of the same set as that used in eq. (3.49). The use of the median and of the median absolute deviation in eqs. (3.49) and (3.50) in place of the more common mean and standard deviation is discussed below.
A couple of comments are in order. Firstly, the effect of shifting the kinematic configuration has been tested on stable points. One can see that the ratio V / A (n,0) 2 depends very weakly on the value of λ ± , while the dependence on λ ± of either of the two quantities that appear in this ratio is larger. This is one of the motivations for considering this ratio (rather than V ) when computing the r.h.s. of eq. (3.45). Consistently with this, we point out that the phase-space weight associated with V FIN λ=0 is that of the original (unshifted) EPS -by using the shifted-EPS phase-space weights one would introduce an unnecessary dependence on λ ± . Secondly, the ratio V / A (n,0) 2 allow us to use stable-point results for estimating EPS results, as done in eqs. (3.48)-(3.50), because by dividing by the Born amplitude squared one becomes largely insensitive to the structure of the Born itself. The actual values of λ ± must be chosen so as the shifted EPS's have a good chance of being classified as stable by CutTools, or of being rescued by means of the gauge-invariance test, while still being small. Empirically, we have determined that λ ± = ±O(0.01) are sufficient to this end.
Owing to the logarithms that appear in one-loop amplitudes, the ratio V / A (n,0) 2 may be very large in certain corners of the phase space. This is not a problem in the course of a numerical integration, because these corners have small measures (which is equivalent to saying that the logarithm has an integrable singularity), and hence large ratios will appear quite infrequently, thus giving an overall small contribution to the total integral. In spite of being rare, however, arbitrarily large ratios can still drive the mean of all ratios to arbitrarily large values. In a statistical language, this implies that the mean is not a robust measure of central tendency, being sensitive to outliers (i.e., to measurements that lie in the tails of distributions, or to deviations w.r.t. the assumed probability distribution). This would not be an issue if one were able to estimate the mean of the stable values by using large statistics, and before encountering the first EPS: unfortunately, neither of these conditions is true. The quantity that measures the central tendency with the largest robustness is the median, which is the reason why we have chosen it in eq. (3.49). The analogous quantity that measures the variability of a sample is the median absolute deviation (which is the analogue of the standard deviation), adopted in eq. (3.50). It should be kept in mind that the median and the mean are always less than one standard deviation apart, and this implies that they are statistically equivalent for our purposes. Furthermore, since the median absolute deviation is larger than the standard deviation, eq. (3.50) represents a conservative choice for the estimation of the uncertainty associated with the procedure described here. It should be finally remarked that by choosing c in eq. (3.45) equal to the mean, the total integral (i.e., the sum over all kinematic configurations) would be strictly equal to the exact one, provided that the Riemann sums computed for the numerical evaluation of such an integral were associated with phase-space partitions of equal-volume cells. This is not the case when adaptive-integration algorithms are used, as in the module Vegas [36] that we employ. However, it is easy to realize that when using Vegas the total integral is equal to the mean of the integrand, times the Vegas weight (which plays the role of the jacobian associated with a change of variables). Therefore, in MadLoop what we actually use are the quantities defined in eq. (3.44), times the Vegas weight. We have refrained from introducing the latter in the equations above just in order to simplify the discussion and the notation.
In conclusion, with the procedure described in this section we are able to detect, cure or approximate on the fly all possible numerical instabilities. In the case in which an approximation is necessary, we give an estimate of its impact. We point out again that EPS's almost never appear, and those which are still EPS after the kinematic shifts of eq. (3.42) are extremely rare. As mentioned at the beginning of this section, among the processes considered in table 2, only b.3 and b.6 have a fraction of EPS's larger than 10 −4 , of which 99.9% are recovered to stability after shifting. The uncertainties due to EPS's on the cross sections of processes b.3 and b.6 are equal to 4·10 −4 pb and 2·10 −4 pb respectively.

Limitations of the current implementation
We have presented in sect. 2 a few examples that prove that MadLoop is able to cope with a large variety of process. There are obviously computations that MadLoop cannot perform, the reason being due either to the physics of the process, or to computer-related issues. Before going into the details, let us remind the reader that MadLoop currently computes QCD one-loop corrections to tree-level SM processes; in other words, other kind of corrections (e.g. EW ones), and in general corrections to processes defined in other theories (e.g. SUSY) are not yet possible 13 -see sect. 5 for an outlook on this point.
We start with physics-related limitations. There are four of them 14 , with the first three that prevent one from generating some classes of processes (in the sense that results cannot be obtained for any phase-space point), and a fourth one that does not allow one to integrate straightforwardly over the phase space. We list them in what follows.

1.
A process cannot be generated if, at the Born level, it contains a four-gluon vertex.

The presence of EW massive vector bosons in the loops imply that certain processes
cannot be generated, depending on the number of the bosons and on the identities of the other loop particles.
3. A process cannot be generated if all contributions to the Born amplitude squared do not factorize the same powers of α S and α.
4. Finite-width effects, due to intermediate massive unstable (i.e., that can decay) particles that can also enter in the loops, are not implemented.
We now briefly comment on these four points in turn. The condition in item 1 is due to the fact that the four-gluon vertex is represented internally by MadGraph in a very involved way, owing to the non-trivial mixture between its Lorentz and colour structures. This implies that the R 2 vertices with an analogous form would require a substantial amount of debugging before being deemed correct since, although their implementation does not pose any problems of principle, it is technically error-prone 15 . It appears that the investment in time required for the implementation of four-gluon-type R 2 vertices is not justified in light of the fact that MadGraph v5 is now available, and when MadLoop will be made compatible with it, this task will become trivial. The restriction in item 2 is due to the fact that CutTools gives a sensible answer only if the largest power of the loop momentum ℓ in N (ℓ) is smaller than or equal to the number of denominators (propagators are always manipulated so as their denominators are quadratic in the loop momentum). It is therefore clear that the term k µ k ν /m 2 that appears in the propagators of the massive EW vector bosons may lead to violations of the above condition. Obviously, this happens because MadGraph v4 adopts the unitarity gauge. This difficulty will be lifted when MadGraph v5 will be used, since this will allow us to use the Feynman gauge, in which the term mentioned above is not present.
The case of item 3 applies to processes that feature interference between QCD and EW contributions at the Born level (e.g., in the qq → tt process one may want to include the diagrams that contain Z and γ * exchanges, which then interfere with the diagram that contains a gluon exchange). This implies that the counterterms needed for UV renormalization cannot be proportional to the Born amplitude squared, as assumed in appendix B.2 (except for the case of mass insertion). Hence, in this situation the UV renormalization procedure as set up in the current version of MadLoop fails. Although it is not difficult to fix this 16 , in practice it is not particularly useful, since the case discussed here is that of a double perturbative expansion in α S and α. Hence, consistency would dictate that EW corrections be computed as well, which is something that neither MadLoop nor MadFKS are equipped to do at the moment.
Finally, SM cases relevant to item 4 are those of the top quark and of the massive EW vector bosons. When considering diagrams which include one or more propagators of one or more of these particles, there may be configurations of external momenta which end up in putting some of them on their mass shells. This thus results in a divergence, which can be avoided by giving finite widths to the unstable particles. In the case in which the relevant particles also enter in the loops, the use of non-zero widths is non trivial, and consistency dictates the use of a scheme like the complex-mass one [37]. At present, such a scheme is not implemented in MadGraph (on the other hand, CutTools is already able to treat complex masses).
Let us now turn to computer-related limitations. A minor one is the condition that each L-cut process have less than 10 4 diagrams; this condition will be removed in future versions of MadLoop. Furthermore, we cannot restrict the particle types flowing in the internal lines of the diagrams through the input cards, as done in MadGraph. More relevant is the fact that, since the present one is the first version of MadLoop, it has been constructed with a very minimal amout of optimization in order not to complicate the code structure with features not dictated by physics. This implies that large-multiplicity processes 17 cannot be computed even in a few days on a O(100)-machine cluster. At present, the optimization is limited to caching the wave functions of external particles, and to scanning (diagram-by-diagram) the helicity space for the first few phase-space points, in order not to consider in the rest of the run those helicity-configuration/diagram combinations that give a contribution exactly equal to zero to the one-loop matrix elements. The possibility of summing over helicities using a Monte Carlo procedure is also implemented.
As a general and concluding remark on CPU-driven issues, it is clear that, regardless of the amount of optimization done on the code, the use made of Feynman diagrams in the current version introduces a factorially-growing complexity in the calculation. However, it is easy to realize that the FKS subtraction method is completely independent of Feynmandiagram techniques, and that to a large extent this is also the case for the OPP reduction procedure. For comments on this point and in general on the optimization of future versions of MadLoop, see sect. 5.

Outlook
What we have discussed so far proves that the combination of OPP reduction and FKS subtraction is sufficient to deal with any kind of SM process in a fully-automated manner. The limitations of the present version of MadLoop, which we have listed in sect. 4, are due either to features inherited from MadGraph v4, or to the lack of optimization of the code. We shall now sketch our short-and mid-term plans for improving MadLoop.
From the physics viewpoint, the most serious limitations at present are those reported in items 1 and 2 of sect. 4. These limitations will be fully removed when MadLoop will use MadGraph v5 rather than v4, which we expect to happen in the next few months, given that all the necessary ingredients are already available and reasonably well tested. The reason for this is due to the capability of MadGraph v5 of constructing HELAS routines 18 starting from a set of rules. Therefore, all R 2 HELAS routines (which removes the limitation of item 1) and all regular HELAS EW routines defined in any R ξ gauge (and in particular in the Feynman gauge, which removes the limitation of item 2) will become 17 Exactly how large depends on the nature of the final state. A large number of jets will use a lot more CPU than the same number of e.g. vector bosons, mainly because of gluon-dominated subprocesses. 18 This has to be compared to v4 and earlier versions, in which those routines had to be written by hand.
available in a straightforward way 19 . The chief advantage of this procedure is that it will render the construction of the building blocks of an amplitude virtually error-free. From the computing viewpoint, the most notable lack of optimization at present is due to the insufficient caching of information. In particular, for any given 2 → n kinematic configuration it is only the numerical values of the external wave functions that are saved in memory. This implies that the internal propagators and vertices that enter the tree structures T i 's attached to the loops (see sect. 3.2.1) are recomputed each time CutTools changes the loop momentum, in spite of the fact that they do not depend on such momentum. The storage of the values of the T i 's is technically almost trivial and we shall do that in the near future, again exploiting the features of MadGraph v5, whose framework is more naturally suited than that of MadGraph v4 to caching operations. We finally point out that at present the contributions due to different massless quarks circulating in closed-fermion loops are computed independently from each other, which is quite inefficient in view of the fact that some of them will give identical results (all of them in the case of pure-QCD processes). The optimization of this aspect of the computation does not pose significant problems.
Our plans for the mid-term future are described in what follows. First of all, the efficiency of MadLoop will be further increased, by using CutTools in a more rational way than at present. One simply observes that the OPP procedure need not be necessarily applied on a diagram-by-diagram basis, but one can sum diagrams whose amplitudes have the same denominators, and perform the OPP reduction on this sum (this amounts to identify all theC's of eq. (3.9) with the same combination ofD i 's, and to give in input to CutTools the sum of the corresponding N (ℓ) functions). It is easy to realize that the diagrams that can be summed together have the same number of T i 's, and that their colour structures along the loop are identical (these colour structures are equivalent to the set of the colour indices of the T i 's and of the loop particles that appear in the L-cut-diagram identities). Hence, the sum of the N (ℓ) functions will contain the sum of all T i 's functions with the same index i (i.e α T (α) i , with α running over the relevant Feynman diagrams), whose numerical values are different, but which have the same external four-momenta, and the same colour indices on the leg attached to the loop. At this point, one will cache the values of α T (α) i for all i's, and the caching procedure will be identical to the one discussed above for the case of a diagram-by-diagram OPP reduction. We expect a significant reduction in computing time w.r.t. that of the singlediagram approach, especially in the case of "small" loops with "large" T i 's.
This would be the end of story in the context of a calculation entirely based on Feynman diagrams. On the other hand, it is easy to realize that the α T (α) i combinations can be interpreted as (tree-level) currents attached to the loops. Therefore, their computations can be performed with more efficient methods than Feynman diagrammatics. In particular, in the case of many-gluon T i 's we plan to employ recursion relations, whose implementation in MadGraph v5 is well under way. However, apart from processes with very large multiplicities, we expect that the gain in efficiency due to the caching of tree structures will be larger than that due to the use of recursion relations. This is because recursion relations cannot be applied to the L-cut diagrams as a whole but only to the T i 's, since the loop propagators play a special role in CutTools, and must always be treated in a Feynman-diagrammatic way. On the other hand, recursion relations are a very powerful tool to reduce the computing time of the real-emission contributions to the cross section. We point out that their use is fully compatible with the implementation of the FKS subtraction done in MadFKS, since the latter only depends on the knowledge of the identities of the particles entering the process, and treats the matrix elements as black boxes.
Turning now to plans more directly related to physics, we point out that the use of MadGraph v5 will render the implementation of a complex-mass scheme an easy task. This will eliminate the limitation reported as item 4 in sect. 4. Although such a scheme is not expected to pose any technical problems, it has not yet been tested in MadGraph (while it is fully operational in CutTools). It is also worth mentioning that the possibility of computing NLO corrections to processes that involve a ggH effective vertex only requires the implementation of a few trivial R 2 functions -as shown in ref. [1], MadFKS is already capable of handling such a situation.
In a longer-term perspective, the possibility of using complex masses and a renormalizable gauge will pave the way to automate the computation of EW corrections to any SM process. This will therefore allow us to treat consistently a double perturbative expansion (in α S and α), and ultimately to remove the limitation reported as item 3 in sect. 4. In order to achieve this goal, some technical work is still necessary on both MadLoop and MadFKS (where it is essentially trivial, since it just amounts to including the subtraction of QED singularities).
It is also interesting to notice that the automation of one-loop computations achieved here is based on a few components that have required analytical work: the scalar integrals, and the UV and R 2 counterterms. While the former are not related to any particular theory, and their validity is thus universal, the latter are theory-specific: e.g., QCD and EW counterterms are different. It would be extremely useful to be able to obtain the counterterms directly from the Lagrangian of the theory, in the same way as is done now for the usual Feynman rules. This will clearly open up the possibility of automating oneloop corrections in any renormalizable theory.

Conclusions
The work we have presented in this paper is based on the strategic assumption that, for the word "automation" to have its proper meaning, the only operation required from a user is that of typing-in the process to be computed, and other analysis-related information (such as final-state cuts). In particular, the code that achieves the automation may only differentiate between processes depending on their general characteristics, but must never work on a case-by-case basis. Furthermore, such a differentiation must be hidden to the user. This approach guarantees a maximum amount of flexibility, which we believe is clearly shown by the variety of the physics results presented in sect. 2. It also ensures that, after an initial validation phase (which we have described in details here -see in particular sect. 3.3 and appendix A), the computations of new processes will be much more likely to be correct than if they were performed from scratch using traditional methods.
An essential component of the physics results of sect. 2 is the calculation of the subtracted real-emission contribution, its combination with the one-loop part, and their integration over the phase space: we have performed these operations with MadFKS. We point out that MadFKS and MadLoop are fully independent, and can be used in connection with other codes.
The drawback of the automation strategy adopted here is that, for any given process, the amount of computer time required for evaluating the corresponding cross section will be typically larger than that one would have needed by using dedicated codes. Furthermore, one may not be able to compute classes of processes. We have discussed the limitations of our code in sect. 4. It is very encouraging that they are quite few, and the majority of them will be eliminated in the next version of MadLoop, which is foreseen to appear in the near future, as we have discussed in sect. 5.
In summary, we have shown that NLO computations in QCD are essentially on equal footing with LO ones, up to some minor improvements that we have illustrated above. This highly non-trivial situation has been achieved thanks to a detailed understanding of perturbative techniques, which has allowed one to define process-and multiplicity-independent subtraction formalisms, and to new methods for the computation of the virtual contributions to cross sections.

Acknowledgments
We thank Tim Stelzer for his help with technical aspects of MadGraph. We are grateful to John Campbell for his assistance with MCFM, to Adrian Signer for having provided us with the code MENLO PARK, and to Stefan Dittmaier for discussions concerning ttj production. All the authors who are not based at CERN would like to thank the CERN TH group for the continuous hospitality during the course of this work. This

A. Comparisons with existing results
In this appendix, we present the comparisons between MadLoop results, and those obtained either with public computer codes, or by implementing ourselves analytical results published in the literature. As discussed in sect. 3.3, these comparisons are an essential part of the validation of MadLoop, and allow us to check all the building blocks used by the code to construct the one-loop amplitudes relevant to arbitrary processes.
We compute the quantity 20 defined in eq. (3.26), which can be re-expressed as follows: We shall also denote by the Born matrix element squared, summed/averaged over spin and colour degrees of freedom. The constant F in eq. (A.1) may clearly be absorbed into the coefficients c i ; we have introduced it in order to facilitate the comparison between MadLoop results and those of other codes. In the following, where not explicitly indicated otherwise, its value has been set equal to one. The expression in eq. (A.1) understands that we may treat as independent the Ellis-Sexton scale Q, the factorization scale µ F , and the renormalization scale µ R (which is the argument of α S , contained implicitly in c i ). As explained in ref. [1] (see in particular appendices B and C there), this is most conveniently done by computing the one-loop contribution by setting all scales equal to the Ellis-Sexton scale there, and by introducing in the short-distance cross section a compensating contribution, proportional to the Born amplitude squared and which depends linearly on log µ F /Q and on log µ F /µ R . In our setup, we have included the latter contribution in MadFKS, and consistently with this choice only the Ellis-Sexton scale Q enters the computations performed by MadLoop. It should be remarked that most of the codes we have used in this appendix as benchmarks for the validation of MadLoop are forced in any case to set and hence the compensating factor in MadFKS is equal to zero. In a few cases we were able to relax the condition of eq. (A. 3), and thus to test the full scale dependence of our computations. All the comparisons we present here are local, i.e. performed at fixed 2 → n configurations. We typically show the results for one such configuration, although as a safety measure we have checked a few more of them. We did not attempt to choose the same set of input parameters for different processes, since priority was given to using the codes we compare our results to with their defaults, in order to limit as much as is possible the number of operations we perform on them. This consideration is particularly important when it comes to choosing the kinematic configuration(s) we use in our comparisons. When using a public code which embeds the one-loop amplitudes in a cross-section integrator, we run the code and save in a file the kinematic configurations and the corresponding one-loop results. We then pick up at random a few of them, and give them in input to MadLoop. This procedure guarantees that the public code is not modified except for a few trivial output statements.
As discussed in sect. 3.3, it is not always possible to compare our results for c −2 and c −1 with those of public codes. However, we do always compare them with their known analytic forms 21 , and in doing so the agreement we find is at the level of the 12 th digit or better for all processes. The corresponding comparisons with public codes are often worse than this, being at the level of single-precision computations of real-number algebra. The reason for this is indeed a single-precision to double-precision conversion done by the computer, since some of the parameters in the input cards of MadLoop are in single precision (whereas all computations are performed in double precision). These differences are obviously completely irrelevant in cross section calculations, and are mentioned here for the sake of completeness.
In the context of the computation of a cross section, the infrared divergences of V (r) cancel those present in the subtracted real-emission contribution. Since the latter is computed using Conventional Dimensional Regularization (CDR), this is the scheme of choice for V (r) as well. On the other hand, other schemes are more suitable to one-loop computations. The 't Hooft-Veltman scheme is used by CutTools: the finite part c 0 computed in such a scheme is identical to that computed in CDR. The residues c −2 and c −1 in the 't Hooft-Veltman scheme can be obtained from those in CDR by setting the number of space-time dimensions equal to four there (in CDR one has d = 4 − 2ǫ). Another popular infrared scheme is Dimensional Reduction (DR). The difference between the finite parts c 0 computed in CDR and DR is proportional to the Born, and can thus be easily accounted for. The relevant formulae can be found in eqs. (B.3) and (B.4) of ref. [1]. For more detailed discussions on infrared schemes, see e.g. ref. [40]. In this appendix, we shall use either the 't Hooft-Veltman or the DR scheme for the finite parts, while pole residues will always be given in the 't Hooft-Veltman scheme.
When considering closed fermion loops with one EW vector boson leg there is possibly an anomalous contribution. By default, we shall always include complete quark families in the loops, thereby avoiding this problem. In this appendix, we have studied cases in which the anomaly cancellation may not be immediately evident (this occurs in connection with the decoupling limit -see sect. 3.3). We have discussed the relevant processes in some details.
Unless otherwise indicated, all dimensionful quantities that appear in this appendix are given in GeV.

A.1 QCD processes
All processes considered in this section are pure-QCD ones, i.e. EW-boson exchanges are excluded from computations.

A.1.1 The process uū → dd
The following set of parameters is used:

A.1.3 The processes dd → tt and gg → tt
We compare the MadLoop results for top-pair production with those of MCFM [42]. The following input parameters are used:

Parameter value
Parameter value Note that the top quark is considered a stable particle, and hence its width is set to zero. The phase-space point used for this check is: with p 1 and p 2 the momenta of the initial-state partons coming from the left (g or d) and from the right (g ord) respectively. The finite part is given in the DR scheme. We obtain what follows: The residues of the double and single poles have been checked against those returned by MadFKS, and perfect agreement has been found. This is not entirely trivial, because of the role played by UV renormalization, and in particular by the insertion of the UV mass counterterm on the virtual top-quark line that appears in the gg channel, which could not be checked in the case of dijet production.
Recently the results of this calculation have been verified by HELAC-1Loop [45] and by Melnikov and Schultze [46]. Here we consider the ug → ttu channel, and compare our results with those of ref. [44]. The parameters we use are:

Parameter value
Parameter value S a 0 where the value of F has been chosen in order to follow the conventions of the appendix of ref. [44]. The kinematic configuration we adopt is: This six-quark amplitude is one of the results presented in ref. [33] by A. van Hameren et al. 22 . The results by MadLoop are thus a check of the implementation of the Cut-Tools software in a framework different from that of HELAC-1Loop [45]. The following parameters are used: Parameter value Parameter value where the value of m top reminds one that top quarks enter the loops. The kinematic configuration considered is: The finite part is given in the 't Hooft-Veltman scheme. For consistency with ref. [33], no UV counterterms are included 23 , and the residue of the single pole. We obtain: uū → bbbb MadLoop Ref. [33] a 0 5.75329342809431E-009 5.753293428094391E-009 c −2 -9.205269484950836E-008 -9.205269484951069E-008 We have also considered different crossings of this process, moving either a b or ab quark to the initial state, and found full consistency among MadLoop results.
A.1.6 The process uū → ttbb with massless b This process has been first computed by Bredenstein et al. in ref. [47]. Here, we compare MadLoop with the result of A. van Hameren et al. [33], similarly to what was done in sect. A.1.5. The following set of parameters is used: Parameter value Parameter value The top width and the mass of the b quark are set equal to zero. The kinematic configuration considered is: The finite part is given in the 't Hooft-Veltman scheme. For consistency with ref. [33], no UV counterterms are included except the one relevant to top-mass renormalization. We obtain: It is a matter of trivial algebra to show that for this process one has: where s is the parton center-of-mass energy squared. For this simple case, it suffices thus to compute the ratio c 0 /a 0 , which only depends on s, µ and α S . These parameters are chosen as follows: The finite part is given in the DR scheme. We obtain what follows: The residues of the double and single poles have been checked against those returned by MadFKS, and perfect agreement has been found.
A.2.3 The processes dd(→ γ * /Z) → e − e + g and dg(→ γ * /Z) → e − e + d These two processes contribute to the NLO corrections to Z + 1 jet. We compare the MadLoop results with the implementation of MCFM [42]. The amplitudes in MCFM do not include closed fermion loops attached to the vector boson, which simplifies the calculation slightly. This means that in MadLoop we have to remove these contributions (with the UserFilter function in the Diagram.cpp file). It is straightforward to remove the triangle loop diagrams attached to the EW vector boson. There is, however, one subtlety: UserFilter does not affect the R 2 contributions. Even though there is no R 2 contribution coming from closed fermion loops when summing over a complete family (as one should do to have a proper anomaly cancellation), when including only five massless quarks as done in MCFM, the R 2 contribution is non zero. We therefore have to set the one R 2 diagram with the ggV interactions to zero by hand.
The following input parameters are used: Parameter value Parameter value The kinematic configuration is: The finite part is given in the DR scheme. We obtain what follows: In order to further test the internal consistency of MadLoop, we have redone the calculation by including the contributions of the closed fermion loops, that have been neglected in the comparison with MCFM. In doing so, we are also able to perform the m topdependence studies introduced in sect. 3.3. To be definite, we consider only the process dd(→ γ * /Z) → e − e + g. In fig. 3, we present the results for the finite part c 0 as a function of m top , for the kinematic configuration given above. We compare the five-flavour calculation (which is independent of m top , and from which we remove the contribution of closed fermion loops) with the six-flavour calculation that retains the full m top dependence (and where closed fermion loops are included). We see that in the decoupling limit (m top → ∞) the six-flavour result does not agree with the five-flavour one. This difference is due to the non-anomalous part of the b-quark triangle diagrams, which contribute only to the six-flavour result (having been excluded by hand from the five-flavour one), since in the decoupling limit the third fermion family effectively includes only the b quark (the anomalous part, being mass-independent, is zero also in the decoupling limit). The limit for m top → 0 is equal to the five-flavour result even though we are using a renormalization scheme in which we subtract the top quark loop at zero momentum. This means that the heavy-flavour contributions to the UV counterterms relevant to strong coupling and gluon wave function renormalization diverge, but for this particular process there are an equal number of powers of α S as of external gluons, so that these divergences cancel (see eqs. (B.5) and (B.7)). Furthermore, due to the fact that the mass difference between the top quark and the bottom quark goes to zero as well, the contribution of triangle diagrams is exactly zero at m top = 0, and hence the six-flavour result coincides with the five-flavour one.
A.2.4 The processes dc(→ W − ) → e −ν e uc and dg(→ W − ) → e −ν e ug The virtual corrections to these processes can be inferred from those relevant to e + e − → 4 partons that have been first calculated by Bern, Dixon and Kosower (BDK) [48]. Here we compare the MadLoop results against the implementation of the crossings of the BDK amplitudes in MCFM [49]. The BDK amplitudes and their implementations in MCFM assume five massless quark flavours, plus the top quark which is taken to be massive. However, only terms up to 1/m 2 top have been kept, with higher inverse powers of m top being neglected. The MadLoop implementation also features five massless quarks plus a massive top quark, but the dependence on m top is retained in full. Therefore, we only expect agreement with MCFM for large top-quark masses. The following input parameters are used:  ref. [48]). However, from the plot it is clear that the two results converge to the same number in the limit m top → ∞. For m top ≈ 1.7 TeV, the relative difference between MadLoop and MCFM is smaller than 10 −5 . For yet larger top masses numerical instabilities in MadLoop render it impossible to reach such level of precision. These instabilities are due to large cancellations between the R 2 and the cut-constructible-plus-R 1 contributions. For a sensible computation of their sum one would need to obtain these two contributions with an accuracy beyond double precision.
For the process dg(→ W − ) → e −ν e ug, fig. 4(b), the situation is similar. At m top = 600 GeV, the relative difference between MadLoop and MCFM is of the order of 10 −7 . For larger masses, MadLoop displays the same numerical instabilities as those mentioned above.
We conclude this section by pointing out that the results presented here do not allow one to assess the impact of terms of order 1/m 4 top and higher on observable cross sections. In fact, although for the physical value of the top mass these terms seem generally to have a small effect (O(1%)), they do depend on the partonic channel and the kinematic configuration chosen. A firm conclusion can thus be reached only through a comparison at the level of integrated cross sections.
A.2.5 The process ug(→ Z/γ * ) → e − e + ug The virtual corrections to this process can be inferred from those relevant to e + e − → 4 partons that have been first calculated by Bern, Dixon and Kosower (BDK) [48]. Here we compare the MadLoop results against the implementation of the crossings of the BDK amplitudes in MCFM [49], where they contribute to the pp → Z/γ ⋆ + 2j cross section. We start by restricting ourselves to testing the pure vector coupling, and hence we can switch off completely the Z-boson exchange contributions in both MadLoop  The situation of fig. 5(b) is different, since the MadLoop and MCFM results are in clear disagreement. The contribution to c 0 considered here is due to six box diagrams, whose sum is gauge invariant. Moreover, it is a finite contribution, so it does not require UV counterterms (nor terms to switch from the 't Hooft-Veltman scheme to the DR scheme). We have checked that for m top → ∞ the ratio between MadLoop and MCFM is not an overall constant, but depends on the kinematic configuration chosen. We have also checked that in such a decoupling limit the MadLoop result agrees with the one that we obtain by running MadLoop with five massless flavours only. Furthermore, in the limit m top → 0 the MadLoop result coincides with that obtained by running MadLoop with six massless flavours. These tests clearly suggest that the box contribution is either wrongly implemented in MCFM, or has been wrongly extracted from MCFM by us. We have also carried out the analogue of the comparison shown in fig. 5(b), by considering the axial (for a Z exchange) rather than the vector coupling. We have found the same pattern as for the vector coupling, namely a disagreement with MCFM 24 . In order to confirm that there is no problem with the computation of these box diagrams in Mad-Loop, we have checked them against a different implementation of the BDK amplitudessee sect. A.2.6.
A.2.6 Closed fermion loops contributing to e + e − → ddgg As anticipated in sect. A.2.5, we compare here the MadLoop results for the BDK box and triangle diagrams [48] (with the latter contributing only to the axial part), with that of their implementation in the program MENLO PARC [50,51]. We do so by considering the process e + e − → ddgg. We remind the reader that the BDK amplitudes retain the top-mass dependence in the loops only up to terms of order 1/m 2 top . For our comparisons with use µ = m Z = 91.187 and α S = 0.118. Furthermore, we set the coupling constants of the photon to the fermions (both quarks and leptons) equal to one. For the Z boson couplings to fermions, we adopt a pure axial coupling, with magnitude 1 for leptons and up-type quarks, and −1 for down-type quarks. We choose the following kinematic configuration: and present the result in the 't Hooft-Veltman scheme.
We start by considering the intermediate (off-shell) photon case. In such a way we test only the vectorial couplings and we need only include the six boxes in MadLoop, which correspond to the A v 6;4 term of ref. [48]. In the program MENLO PARC, A v 6;4 can be easily computed by setting the color part parameter equal to vect. As can be seen from fig. 5(b), this term is independent of the top mass. Thus, in order to compare it to the prediction of MadLoop, we can either consider the decoupling limit, or exclude the top-quark contribution to the loops; we adopt the latter option here 25 . We find an excellent agreement: Refs. [50,51] (closed fermion loops) We next consider the case of a purely-axial intermediate Z boson. This corresponds to the A ax 6;4 and A ax 6;5 terms of ref. [48], which can be obtained from MENLO PARC by setting the parameter color part equal to axal. The results are presented in fig. 6 as a function of the top mass, for the same kinematic configuration as was used above. As in the case of the purely vector couplings, we find excellent agreement between the two results. In particular, for large m top the relative difference between MadLoop and MENLO PARK is of the order of 10 −4 . When m top = O(1 TeV) numerical instabilities in MadLoop spoil the accuracy of the comparison; as in previous cases, these instabilities are due to large cancellations between the cut-constructible-plus-R 1 and R 2 contributions. For m top → 0, 25 We have also computed the decoupling limit, and found a perfectly regular numerical behaviour.  the BDK amplitudes diverge (due to the fact that terms are kept only up to 1/m 2 top ), while MadLoop reproduces the correct result. This is equal to zero, since in this limit the bottom and top quark loop contributions exactly cancel each other.
We have explicitly checked that MadLoop is self-consistent, and consistent with the results given in sect. A.2.5, by crossing the antiquark and one of the gluons to the initial state, and the e + e − pair to the final state.
A.2.7 The gg → Zg one-loop amplitude squared As a final check on the computations of closed fermion loop diagrams with one external EW boson leg, we consider the process gg → Zg, whose amplitude was first computed by Van der Bij and Glover in ref. [52] 26 . At variance with the cases discussed in sect. A.2.5 and A.2.6, this process does not have a Born-level contribution. So we shall not compute here the quantity V defined in eq. (3.26), but rather the one-loop amplitude squared ( A (n,1) 2 ), summed/averaged over spins and colours 27 . This implies that for the present case we had to hack the MadLoop code, in order for it to compute the square of an amplitude rather than the interference of two amplitudes. As far as the computation of ref. [52] is concerned, the helicity amplitudes presented in appendix B of that paper are typed in Mathematica, which is then used to perform all subsequent analytical and numerical manipulations, and to obtain the results denoted by "Ref. [52]" in what follows. We stress that the results of ref. [52] are given for one quark flavour circulating in the loop.
We use the following input parameters: 26 The results also agree with an independent calculation by F. Tramontano. 27 We also point out that in the present case the Z is on-shell, while the intermediate vector bosons were off-shell in the cases discussed in sect. A.2. 5  We choose a kinematic configuration such that the Mandelstam variables are t = u = − 3 2 M 2 Z , as defined in ref. [52]. This can be obtained using e.g. the following four-momenta: Note that √ s = 2M Z , and hence we have adopted a value of m top that is not effectively close to the decoupling limit -this is useful lest we have to deal with very small amplitude values. In any case, the dependence on m top has been studied as well, as can be seen in fig. 7. The axial and the vector coupling lead to amplitudes separately gauge independent, which do not interfere (simply because of the symmetry and antisymmetry of their colour factors respectively); we thus check them independently. We start here with the vector coupling, for which we can limit ourselves to considering only the top quark running in the loop, since the corresponing amplitude is not anomalous. The resulting Feynman diagrams are then six (finite) massive fermion-loop boxes. We obtain: gg → Zg vector MadLoop Ref. [52] c 0 1.41346852305044352E-006 1.4134685231123695E-006 We have also computed these results by varying m top . The comparison between MadLoop and ref. [52] is presented in fig. 7. The thresholds at m top /m Z = 0.5 and m top /m Z = 1 can be easily understood in terms of the optical theorem. When one considers the axial couplings of the Z boson, there is an anomalous contribution independent of the mass of the fermion that circulates in the loop. The most straightforward way to circumvent the problem of the anomaly is that of considering both the top and the bottom quarks as loop particles. The bottom mass is kept fixed and equal to 20 GeV. We cannot set m b = 0 since the computation of ref. [52] is performed assuming non-zero quark masses. On the MadLoop side, we have studied the bottom-mass dependence, and found no peculiar numerical behaviours. We have chosen m b = 20 GeV when presenting our results in order to be able to explore both regions m top ≫ m b and m top ≪ m b (see fig. 8). By setting m top = 80 GeV we obtain: Ref. [52] c 0 1.17192023257760489E-004 1.

1719202325756625E-004
Analogously to what was done in fig. 7, we present in fig. 8 the results for the axial part as a function of m top . In the decoupling limit m top → ∞, we are left with the non-anomalous part of the contribution due to the bottom quark. As is expected, the contributions due to top and bottom quarks exactly cancel each other when m top = m b . We have also checked that this is the case for various value of m b (including the large-mass limit). We conclude this section by mentioning that we have checked all possible gluon crossings of the process considered in this section. As for all other cases in which gluons appear, gauge independence (eq. (3.38)) is automatically checked.
A.2.8 The process ug → tbd (four-flavour t-channel single-top production) The t-channel single-top process in the four-flavour scheme is an interesting case from the point of view of NLO computations [53,54]. Even though it is a 2 → 3 process at the The finite part is given in the DR scheme. The b quark mass is set equal to zero in closed fermion loops as well as for α S renormalization (while the exact top-quark-mass dependence is kept everywhere). We obtain: The residues of the double and single poles have been checked against those returned by MadFKS, and perfect agreement has been found.

A.3 Processes with two vector bosons
This process, that contributes to the NLO corrections to fully-decayed W + W − production (i.e, all spin correlations are included), has been implemented in MCFM [42], using the virtual amplitudes calculated in ref. [57]. The computation of ref. [57] includes the singlyresonant contributions where an off-shell photon or Z boson "decays" into a pair of W bosons. However, it does not include diagrams in which the off-shell photon or Z boson "decays" to leptons and one of the leptons radiates a W boson. This latter contribution is however kinematically highly suppressed w.r.t. the others, and its neglect is a very good approximation for most physics applications. Such a contribution is included in the MadLoop result, but in order not to consider it and be consistent with ref. [57] we simply set the couplings of photons and Z bosons to leptons equal to zero. We have used the following input parameters: The CKM matrix is diagonal and the Higgs channel (i.e., diagrams that contain the "decay" H → W + W − ) is not included 29 . We have chosen the following kinematic configuration: with p 1 and p 2 the momenta of the initial-state partons coming from the left (u or g) and from the right (ū or g) respectively. The finite part is given in the 't Hooft-Veltman scheme. For consistency with ref. [33], no UV counterterms are included except the one relevant to top-mass renormalization. We obtain: The Z-boson decay width is not specified in ref. [33], and this is most likely the reason for which we find a relative difference of O(10 −6 ) between MadLoop and ref. [33] 30 . Since the agreement is however already quite satisfactory, we have refrained from investigating this point further.
As a final remark it should be noted that this process is not yet suitable for full phasespace integration with MadLoop. Due to possible intermediate top quarks, which can go on-shell, the top width should be taken into account. To do this in a gauge-independent and consistent way, a scheme such as e.g. the complex-mass one needs to be implemented 31 . For a discussion on this point, see sects. 4 and 5. 29 MadLoop can easily compute this contribution, but it is very small and anyhow not included in the computation performed by Van Hameren et al.. 30 We point out, in fact, that the two results for the residue of the double pole can be made to coincide by multiplying the MadLoop results by σBornHELAC/σ BornMadLoop . 31 Using the complex-mass scheme, the phase-space integration for the process pp → W + W − bb at the NLO has recently been performed by two groups [17,25]. with a diagonal CKM matrix. We point out that the R 2 SM vertices are sufficient for the computation of this process. We have chosen the following kinematic configuration: The residues of the double and single poles have been checked against those returned by MadFKS, and perfect agreement has been found.
A.4.3 The processes uū → ttH and gg → ttH Two groups have calculated these two subprocesses that contribute to ttH hadroproduction at the NLO in QCD -see refs. [62,63] and refs. [64,65]. However, their codes are not publicly available. We therefore compare MadLoop results with those obtained with the HELAC-1Loop code [33]. We use the following input parameters: We point out that it is likely that the agreement between the two codes could be further improved, since we are aware of differences beyond single-precision accuracy between the input parameters used in the two codes.

B.1 Example of filtering of loop diagrams
In this appendix, we illustrate a simple example of the procedure implemented in MadLoop for the generation of one-loop diagrams through L-cut diagrams. We work in QCD with one light flavour, which we identify with the u quark, and consider the process since ghosts do not contribute. The L-cut diagrams are shown in fig. 9, and the corresponding diagram identities are reported in tables 3 and 4. These diagram identities are constructed in the following way. Firstly, the L-cut particles q ⋆ andq ⋆ (with (q ⋆ ,q ⋆ ) = (g ⋆ , g ⋆ ) and (q ⋆ ,q ⋆ ) = (u ⋆ ,ū ⋆ ) for eqs. (B.2) and (B.3) respectively) are assigned momenta equal to ℓ and −ℓ respectively. This implies that the loop momentum flows from particlesq ⋆ to particle q ⋆ . Secondly, for any given L-cut diagram, one starts from particle q ⋆ and, by following the loop flow backwards, writes down either the identity of a loop particle, or a symbol T associated unambiguously with a tree structure attached to the loop. The diagram identity is completed when particleq ⋆ is encountered. As already anticipated in sect. 3.2.1, it is not necessary to keep track of whether a loop particle is a fermion or an antifermion when defining diagram identities. In fact, this distinction is meaningless if a fermion is not attached to an external fermion line (i.e., in the case of a closed fermion loop), since in such a case it depends solely on the orientation of the loop momentum, which is easier to keep track of than the fermion/antifermion identity. This observation obviously applies to the case of ghost loops as well. On the other hand, when a fermion in the loop is attached to an external fermion line, the information on its particle/antiparticle identity is implicitly included in that relevant to the tree structure T to which the external fermion belongs 32 . The difference between these two situations is the reason why in the case of a purely-fermion (or ghost) loop one must not consider as being equivalent two diagrams whose identities are equal up to mirror symmetry -such two diagrams differ by the orientation of the loop momentum, and for fermion or ghost loops both orientations must be considered 33 . Finally, as described in sect. 3.2.1, diagrams are filtered out according to the properties of their identities under cyclic permutation and (possibly) mirror symmetry. It is not difficult to see that this procedure allows one to associate symmetry factors with Diagram # ID Topology Action g.1 g ⋆ T 1 u ⋆ T 2 u ⋆ T 3 g ⋆ triangle keep g.2 g ⋆ T 1 u ⋆ T 4 g ⋆ bubble discard g.3 g ⋆ T 3 u ⋆ T 2 u ⋆ T 1 g ⋆ triangle discard (≡ g.1) g.4 g ⋆ T 4 u ⋆ T 1 g ⋆ bubble discard g.5 g ⋆ T 3 u ⋆ T 5 g ⋆ bubble discard g.6 g ⋆ T 5 u ⋆ T 3 g ⋆ bubble discard g.7 g ⋆ T 6 g ⋆ tadpole discard g.8 g ⋆ T 7 g ⋆ tadpole discard Table 3: Identities of L-cut diagrams for the process in eq. (B.2).
loop diagrams as customary in QCD -they are all equal to one, except in the case of a gluon bubble, where such factor is equal to one-half. This renders it trivial to take them into account in MadLoop. By following the general rules outlined above, the reader can work out the diagram identities reported in table 3 (table 4) using the diagrams depicted in the left (right) panel of fig. 9. By doing so, one is naturally led to introduce the tree structures reported in table 5. When filtering, we (arbitrarily) begin from diagram #1 of the process in eq. (B.2), moving eventually on to diagrams associated with the process in eq. (B.3). A diagram is kept or filtered out as indicated in the last columns of tables 3 and 4. All bubbles on external lines and tadpoles are discarded by definition. Diagram g.3 is identical to diagram g.1 up to mirror symmetry, while diagrams q.1 and q.3 are identical to diagram g.1 up to cyclic permutations. Diagram q.6 is identical to diagram q.5 up to a cyclic permutation. Diagram q.5 represents a closed fermion bubble on an internal line; it must be taken into account, but its contribution is equal to zero being proportional to the trace of a single Gell-Mann matrix. We are thus left out with one triangle loop diagram (arising from sewing Tree Particle content T 5ū (γe + e − )u T 6 gu u(γe + e − )ū T 7 gū ū(γe + e − )u T 8 guū Table 5: Tree structures used in tables 3 and 4. The brackets indicate sub-trees, and are inserted here for the sole purpose of simplifying the reading of the diagrams. diagram g.1) that contributes to the one-loop corrections to eq. (B.1), which is of course the well-known result.

B.2 UV renormalization counterterms
As anticipated in sect. 3.2.2, for UV renormalization we use a scheme which subtracts the massless modes according to MS, and the massive ones at zero momentum (see e.g. ref. [65]). In sect. 3.2.2, we also pointed out that for the cases we consider all UV counterterms except that relevant to mass renormalization give a contribution to eq. (3.34) which is proportional to the Born amplitude squared. We denote by C A , C F and T F the usual colour factors. N c is the number of colours, and n lf and n hf are the numbers of light and heavy flavours respectively that circulate in the loops (a quark is by definition heavy if it has a non-zero mass). We make use of the prefactor N ǫ defined as follows: N ǫ = 1 16π 2 (4π) ǫ Γ(1 + ǫ) .
(B.4) If the Born cross section is of order α b S , the contribution to eq. (3.34) due to strong-coupling renormalization reads: where the sum in the third term on the r.h.s. runs over all heavy flavours that circulate in the loops. The contribution due to the renormalization of the Yukawa couplings reads: with n yuk,l and n yuk,h the number of Yukawa vertices with massless and massive particles respectively. Colour singlets and massless colour triplets do not require any wave-function renormalization. The gluon wave function is renormalized only if there are massive colour triplets fermions running in the loop. Denoting by n g the number of external gluons at Born level, the contribution to eq. (3.34) due to gluon wave-function renormalization reads: The wave-function renormalization of the external massive quarks (denoted ext hf ) gives the following contribution: Finally, for carrying out mass renormalization for a quark line with momentum k, mass m, and colour indices i and j, one uses the mass insertion given by the following equation: (B.10)