OpenLoops 2

We present the new version of OpenLoops, an automated generator of tree and one-loop scattering amplitudes based on the open-loop recursion. One main novelty of OpenLoops 2 is the extension of the original algorithm from NLO QCD to the full Standard Model, including electroweak (EW) corrections from gauge, Higgs and Yukawa interactions. In this context, among several new features, we discuss the systematic bookkeeping of QCD-EW interferences, a flexible implementation of the complex-mass scheme for processes with on-shell and off-shell unstable particles, a special treatment of on-shell and off-shell external photons, and efficient scale variations. The other main novelty is the implementation of the recently proposed on-the-fly reduction algorithm, which supersedes the usage of external reduction libraries for the calculation of tree-loop interferences. This new algorithm is equipped with an automated system that avoids Gram-determinant instabilities through analytic methods in combination with a new hybrid-precision approach based on a highly targeted usage of quadruple precision with minimal CPU overhead. The resulting significant speed and stability improvements are especially relevant for challenging NLO multi-leg calculations and for NNLO applications.

In this paper we present the new version of OpenLoops, 1 an automated tool for the calculation of tree and one-loop scattering amplitudes within the Standard Model (SM). The OpenLoops algorithm is based on a numerical recursion 2 that generates loop amplitudes in terms of cut-open loop diagrams [9,33]. Such objects, called open loops, are characterised by a tree topology but depend on the loop momentum.
In the original version of the algorithm [9], implemented in OpenLoops 1 [16], loop amplitudes are built in two phases. In the first phase, Feynman diagrams are constructed in terms of tensor integrals using the open-loop recursion, while in the second phase, loop amplitudes are reduced to scalar integrals using external libraries such as Collier [19] or CutTools [10]. The main strengths of this approach are the high speed of the open-loop recursion and the possibility of curing numerical instabilities through the tensor-reduction techniques [4,34] implemented in Collier [19].
In the original open-loop algorithm [9], the rank of open loops increases at each step of the recursion. As a consequence, the CPU time required for their processing, the memory footprint, and also numerical instabilities, tend to grow rather fast with the number of scattering particles. For these reasons, in OpenLoops 2 the construction of loop amplitudes and their reduction have been unified in a single recursive algorithm [33] that makes it possible to avoid high-rank objects at all stages of the amplitude calculations. This is achieved by interleaving single steps of the construction of open loops with reduction operations at the integrand level [2]. The implementation of this method, called on-the-fly reduction, is one of the main novelties of OpenLoops 2. So far it is restricted to tree-loop interferences at NLO, while squared loop amplitudes are still processed in the same way as in OpenLoops 1.
The on-the-fly reduction algorithm in OpenLoops 2 is equipped with an automated system that avoids numerical instabilities in a highly efficient way. This stability system makes use of analytic techniques that have been introduced in [33 ] and have meanwhile been extended in various directions, and supplemented by a novel hybrid-precision system. The latter monitors the level of stability by exploiting information on the analytic structure of the reduction identities, and residual instabilities are stabilised on-the-fly through quadruple precision (qp). This system is implemented at the level of individual operations. In this way, the usage of qp is restricted to a minimal part of the calculations, which results in a huge speed-up as compared to complete qp re-evaluations. Thanks to these features, the on-the-fly reduction method makes it possible to achieve an unprecedented level of numerical stability, both for multi-leg NLO calculations with hard kinematics and for NNLO applications with unresolved partons.
The structure of the open-loop recursion [9,33] is model independent, and the explicit form of its kernels depends only on the Lagrangian of the model at hand. The original implementation [16] was applicable to any SM process at NLO QCD, and the other major novelty of OpenLoops 2 is the extension of NLO automation to the full SM [35,36], including any correction effect of O(α s ) and O(α). 3 In this respect, in this paper we present a detailed discussion of the interplay of QCD and EW effects in scattering amplitudes with more than one quark chain, which are relevant for LHC processes with two or more light jets. In that case, Born amplitudes consist of towers of terms of order α p s α q with fixed total power p + q but variable powers in the QCD and EW couplings. In such cases, as is well known, QCD and EW interactions mix through interference effects and, in general, NLO terms of fixed order α P s α Q involve correction effects of QCD and EW kind. However, as we will point out, each NLO term of order α P s α Q is always dominated either by QCD corrections to Born terms of order α P −1 s α Q or by EW corrections to Born terms of order α P s α Q−1 . In this paper the renormalisation of the SM and its implementation in OpenLoops are discussed in detail. In the QCD sector, quark masses and Yukawa couplings can be renormalised in the on-shell and MS schemes, and the α s counterterm can be flexibly adapted to any flavour-number scheme. The renormalisation of masses and couplings at O(α) is based on the on-shell scheme [37] and its extension to complex masses [38] for off-shell unstable particles. More precisely, in OpenLoops 2 these two approaches are unified in a generic scheme that can address processes with combinations of on-shell and off-shell unstable particles, such as for pp → tt + − , where Z-bosons occur as internal resonances, while top quarks are on-shell external states. Besides UV counterterms, OpenLoops 2 implements also Catani-Seymour's I-operator for the subtraction of infrared (IR) singularities at O(α s ) [39,40] and O(α) [36,[41][42][43][44].
For the definition of EW couplings, three different schemes based on the the input parameters α(0), α(M 2 Z ) and G µ are supported. Moreover, OpenLoops 2 implements an automated system for the optimal choice of the coupling of on-shell and off-shell external photons. Concerning the choice of α s and the renormalisation scale µ R , a new automated scale-variation mechanism makes it possible to re-evaluate scattering amplitudes for multiple values of α s and µ R with minimal CPU cost.
The OpenLoops 2 program can be combined with any other code by means of its native Fortran and C/C ++ interfaces, which allow one to exploit its functionalities in a flexible way. Besides the choice of processes and parameters, the interfaces support the calculation of LO, NLO, and loopinduced matrix elements and building blocks thereof, as well as various colour and spin correlators relevant for the subtraction of IR singularities at NLO and NNLO. Additional interface functions give access to the SU(3) colour basis and the colour flow of tree amplitudes. Besides its native interfaces, OpenLoops offers also a standard interface in the BLHA format [45,46].
The OpenLoops program can be used as a plug-in by the Monte Carlo programs Sherpa [26,47], Powheg-Box [27], Herwig ++ [32], Geneva [48], and Whizard [49], which possess built-in interfaces that control all relevant OpenLoops functionalities in a largely automated way, requiring only little user intervention. Moreover, OpenLoops is used as a building block of Matrix [50] for the calculation of NNLO QCD observables. In this context, the automation of EW corrections in OpenLoops 2 opens the door to ubiquitous NLO QCD+NLO EW simulations in Sherpa [51,52] and NNLO QCD+NLO EW calculations in Matrix [53].
The OpenLoops 2 code is publicly available on the Hepforge webpage https://openloops.hepforge.org and via the Git repository https://gitlab.com/openloops/OpenLoops. It consists of a processindependent base code and a process library that covers several hundred partonic processes, including essentially all relevant processes at the LHC. The desired processes can be easily accessed through an automated download mechanism. The set of available processes is continuously extended, and possible missing processes can be promptly generated by the authors upon request.
The paper is organised as follows. Section 2 presents the structure of the original open-loop recursion and the new on-the-fly reduction algorithm. Numerical instabilities and the new hybrid-precision system are discussed in detail. Section 3 deals with general aspects of NLO calculations and their automation in OpenLoops. This includes the bookkeeping of towers of terms of variable order α p s α q , the treatment of input parameters, optimal couplings for external photons, the renormalisation of the SM at O(α s ) and O(α), the on-shell and complex-mass schemes, and the I-operator. Section 4 provides instructions on how to use the program, starting from installation and process selection, and including the various interfaces for the calculation of matrix elements, colour/spin correlators, and tree amplitudes in colour space. Technical benchmarks concerning the speed and numerical stability of OpenLoops 2 are presented in Section 5. A detailed description of the syntax and usage of the OpenLoops interfaces can be found in the appendices.
While the paper as a whole serves as a detailed documentation of the algorithms implemented in OpenLoops 2, Section 4 together with Appendix A can be used alone as a manual.

The OpenLoops algorithm
The calculation of loop amplitudes in OpenLoops proceeds through the recursive construction of open loops and their reduction to master integrals. In this section we outline two variants of this procedure: the original open-loop method [9], which was used throughout in OpenLoops 1 and is still used for loop-induced processes in OpenLoops 2, and the new on-the-fly method [33] used for tree-loop interferences in OpenLoops 2.

Scattering amplitudes and probability densities
The main task carried out by OpenLoops is the computation of the colour and helicity-summed scattering probability densities which consist of the various interference terms that involve the Born amplitude M 0 and the oneloop amplitude M 1 for a certain process selected by the user. The usual summations and averaging over external helicities 4 and colours, as well as symmetry factors for identical particles, are included throughout and implicitly understood in the bra-ket notation in (2.1)-(2.3). The relevant average factors are encoded in (2.4) where S in denotes the set of initial-state particles, while N hel,i and N col,i are the number of helicity and colour states of particle i. The symmetry factors depend on the number n p of identical finalstate particles. They are applied to all types of final-state particles, p ∈ P out , treating particles and anti-particles as different types.
For standard processes with M 0 = 0, leading-order (LO) cross sections involve only squared tree contributions W 00 , while at next-to-leading order (NLO) virtual one-loop contributions W 01 and real-emission contributions of type W 00 with one additional parton are needed. The squared one-loop probability density W 11 is the main LO building block for loop-induced processes, i.e. processes with M 0 = 0. For the calculation of such processes at NLO also W 11 -type densities with one additional parton are needed. Otherwise W 11 is relevant as ingredient of next-to-next-to-leading order (NNLO) calculations.
In OpenLoops, L-loop matrix elements M L are computed in terms of Feynman diagrams, whose structures are generated with Feynarts [54]. The Feynman diagrams are expressed as helicity amplitudes, where Ω L is the set of all L-loop Feynman diagrams, h describes a specific helicity configuration of the external particles, and each diagram I is factorised into a colour factor C(I) and a colourstripped diagram amplitude 5 A L (I, h). The colour structures C(I) are algebraically reduced to a standard colour basis {C i } (see Section 4.5), where scattering amplitudes take the form and colour-summed interferences in (2.1)-(2.3) are built by means of the colour-interference matrix In the following we focus on the construction of the colour-stripped amplitudes A L (I, h).

Tree amplitudes
At tree level, each colour-stripped Feynman diagram is built by contracting two subtrees that are connected through a certain cut propagator, 6 A 0 (I, h) = Here k a = −k b and σ a , σ b are the momenta and spinor/Lorentz indices of the subtrees, while h a , h b denote the helicity configurations of the external particles connected to the respective subtrees. 7 The tilde in w b marks the absence of the cut propagator, which is included in w a . All relevant subtrees are generated through a numerical recursion that starts from the external wave functions and connects an increasing number of external particles through operations of the form w σa a (k a , h a ) = σa (2.10) The tensor X σa σ b σc corresponds to the triple vertex that connects w a , w b , w c , combined with the numerator of the propagator attached to w a . For quartic vertices an analogous relation is used. Each step needs to be carried out for all independent helicity configurations h b , h c . The resulting tree recursion is implemented in an efficient way by caching the amplitudes of subtrees that contribute to multiple Feynman diagrams.
The remaining part, M 1,4D , is constructed in terms of colour-stripped loop diagrams, with four-dimensional numerators N (I N , q, h) and denominatorsD i (q) = (q + p i ) 2 − m 2 i , where the bar is used for quantities in D dimensions, and the (D − 4)-dimensional part of the loop momentum is denotedq =q − q. The trace represents the contraction of spinor/Lorentz indices along the loop, and the index I N stands for the N -point topology at hand. The numerator is computed by cut-opening the loop at a certain propagator, which results into a tree-like structure consisting of a product of loop segments, where β 0 , β N are the spinor/Lorentz indices of the cut propagator. Loop segments that are connected to the loop via triple vertices have the form where an external subtree w i is connected to a loop vertex and to the adjacent loop propagator. The latter correspond to a rank-one polynomial in the loop momentum with coefficients Y and Z. A similar relation is used for quartic vertices.
The loop numerator is constructed by attaching the various segments to each other through recursive dressing steps, N k (q,ĥ k ) = N k−1 (q,ĥ k−1 )S k (q, h k ), for k = 1, . . . , N , (2.15) starting from the initial condition N 0 = 1. The labels h k andĥ k correspond, respectively, to the helicity configuration of the external legs entering the k th segments and the first k segments. The partially dressed numerator (2.15) is called an open loop. Schematically it can be represented as where blue and grey blobs correspond, respectively, to those loop segments that are already dressed and remain to be dressed. Each open loop is a polynomial in q,

Reduction to master integrals
In OpenLoops the reduction of loop amplitudes to master integrals is carried out with two different methods. Squared loop amplitudes and tree-loop interferences in the Higgs Effective Field Theory (HEFT) 8 are handled along the lines of the original open-loop approach [9], where the reduction is performed a posteriori of the dressing recursion. Since every dressing step can increase the tensor rank by one (see Fig. 1 a), this generates intermediate objects of high tensor rank, i.e. high complexity, with a negative impact on CPU speed. In contrast, all other tree-loop interferences are computed with the on-the-fly reduction approach [33], where dressing steps are interleaved with integrand reduction steps in such a way that the tensor rank, and thus the complexity, remain low at all stages of the calculation (see Fig. 1 b).

A posteriori reduction
The a posteriori reduction to scalar integrals is done by means of external tools. By default, the reduction is performed at the level of tensor integrals, 8 By HEFT we mean effective Higgs-gluon and Higgs-quark interactions in the heavy-top limit.  using the Collier library [19], which implements the Denner-Dittmaier reduction techniques [4,34] as well as the scalar integrals of [64]. Alternatively, the reduction can be performed at the integrand level using CutTools [10], which implements the OPP reduction method [5], in combination with the OneLOop library [65] for scalar integrals.

On-the-fly reduction
In the on-the-fly approach, the dressing of open loops is interleaved with reduction steps. The latter are applied in such a way that the tensor rank never exceeds two.
For objects with more then three loop propagators, D 0 , D 1 , D 2 , D 3 , . . . , the tensor rank is reduced using an integrand-reduction identity [2] of the form where the coefficients A µν i and B µν i,λ depend on the internal masses and external momenta. The four-dimensional D i terms on the rhs of (2.19) are cancelled against the D-dimensional loop denominators. This gives rise toq 2 dependent terms, D i /D j = 1−q 2 /D j , which are consistently taken into account and result into rational contributions of kind R 1 [2,33]. Note that the reduction (2.19) and the pinching of propagators can be carried out as soon as rank two is reached, irrespective of which loop segments are still undressed. Every reduction step generates four new pinched sub-topologies, and the proliferation of pinched objects is avoided by means of the merging approach described in Section 2.5.
Rank-two open loops with only three loop denominators can be reduced on-the-fly in a similar way as open loops with more than three propagators [33]. The remaining reducible integrals have the following number of propagators N and tensor rank R: N ≥ 5 and R = 1, 0; N = 4, 3 and R = 1; N = 2 and R = 2, 1. For their reduction to master integrals we use a combination of integral reduction and OPP reduction identities [33]. Master integrals are evaluated with Collier [19], which is the default in double precision, or OneLOop [65], which is the default in quadruple precision. Figure 2: Example of parent-child relation between open loops. The parent N -point diagram I N and the child (N − 1)-point diagramĨ N −1 share the first k segments (blue blobs). Thus N k (I N , q) and N k (Ĩ N −1 , q) are identical and need to be constructed only once.

Tree-loop interference
In the following we outline the calculation of tree-loop interferences (2.2) according to the original open-loop algorithm and with the on-the-fly approach [33]. The latter is used by default in OpenLoops 2. In both cases, the colour treatment is based on the factorisation of colour structures at the level of individual loop diagrams, M 1 (I, h) = C(I) A 1 (I, h). This makes it possible to cast the interference of loop diagrams with the Born amplitude into the form where A 1 (I, h) is the colour-stripped loop amplitude, and the colour information is entirely absorbed into the colour-summed interference factor where a j (I), A (i) 0 (h), and K ij are defined in (2.6)- (2.8). In this way, as detailed below, the full tree-loop interference can be constructed in terms of colour-stripped or colour-summed objects.

Parent-child algorithm
In the original open-loop approach, tree-loop interference contributions of type (2.20) are constructed as follows.
(i) The numerator of a colour-stripped N -point loop diagram (2.12) is constructed as outlined in Section 2.3, i.e. starting from N 0 = 1 and applying N dressing steps of type (2.15).
(ii) In general, open loops with higher number N of loop propagators do not need to be built from scratch, but can be constructed starting form pre-computed open loops with lower N exploiting parent-child relations [9] as illustrated in Fig. 2. The efficiency of the parent-child approach is maximised by means of cutting rules that set the position of the cut propagator and the dressing direction in a way that favours parent-child matching (for details see [9,33]).
(iii) After the last dressing step, the loop numerator is closed by taking the trace and, for every helicity state h, the colour-summed Born interference (2.20) is built as (iv) Helicity sums are performed, and the set of loop diagrams with the same one-loop topology t = {D 0 , . . . , D N −1 }, denoted Ω N (t), is combined to form a single numerator, Figure 3: Schematic representation of on-the-fly merging. Open loops with the same loop topology and the same undressed segments (grey blobs) are combined in a single object.
(v) The corresponding loop integral, is reduced to master integrals as described in Section 2.4.1, and all topologies are summed.
All operations in (i)-(v) are performed at the level of open-loop tensor coefficients.

On-the-fly algorithm
The on-the-fly construction of Born-loop interferences proceeds through objects of type where the partially dressed open loops, N k (I N , q,ĥ k ), are always interfered with the Born amplitude, summed over colours, and also over the helicitiesĥ k of all segments that are already dressed. The helicities of the remaining undressed segments are labelled with the indexȟ k . As outlined in the following, the algorithm interleaves dressing, merging and reduction operations in a way that keeps the tensor rank always low and avoids the proliferation of pinched objects that arise from the reduction. For a detailed description see [33].
(i) The generalised open loops (2.25) are constructed through subsequent dressing steps starting from U 0 (I N , q,ȟ 0 ) = U 0 (I N , h). The summation over the helicities h k is performed on-the-fly after the dressing of the related segment. This results in a reduction of helicity degrees of freedom, and thus of the number of required operations, at each dressing step.
(ii) Before each new dressing step, the set N } of open loops with the same loop topology and the same undressed segments is combined into a single object, (2.27) In this way, the remaining dressing operations for the objects in Ω N need to be performed only once. This procedure, called on-the-fly merging, is illustrated in Fig. 3. It plays an analogous role as the parent-child approach in Section 2.5.1, and its efficiency is maximised by means of cutting rules tailored to the needs of merging.
(iii) Open-loop objects of type (2.27) with more than three loop propagators are reduced on-the-fly using the integrand-reduction identity (2.19). This generates new open loops of the form where D j denotes a pinched propagator. This reduction is applied to rank-two objects directly before dressing steps that would otherwise increase the rank to three. In order to avoid the proliferation of new objects, pinched open loops are merged on-the-fly with other open loops stemming from lower-point Feynman diagrams or from other pinched open loops [33]. The numerators in (2.28) have the form whereq 2 terms that arise from pinched propagators (see Section 2.4.2) are retained in all UV divergent integrals and lead to R 1 rational terms.
Steps (i)-(iii) are iterated until the loop is entirely dressed. 9 (iv) At this stage, the loops are closed by taking the trace, and the resulting loop integrals, are reduced to master integrals upon extraction of R 1 terms, as described at the end of Section 2.4.2. Finally, all topologies are summed.
As demonstrated in Section 5, the on-the-fly approach yields significant efficiency improvements wrt the original open-loop algorithm. Moreover, based on the one-the-fly reduction algorithm, OpenLoops 2 has been equipped with an automated stability system that cures Gram-determinant instabilities with unprecedented efficiency (see Section 2.7).

Squared loop amplitudes
As outlined in the following, the calculation of squared loop amplitudes (2.3) is organised along the same lines of the parent-child algorithm of Section 2.5.1 but with a different colour treatment.
(i) The numerators of colour-stripped loop diagrams are constructed with the dressing recursion (2.15) exploiting parent-child relations.
(ii) After the last dressing step, loop numerators are closed by taking the trace, and colour-stripped diagrams expressed in terms of integrals T µ 1 ···µr N (2.18), which are then computed with Collier. While the N µ 1 ...µr (I N , h) coefficients need to be evaluated for every helicity state h, the reduction is done only once -and thus very efficiently -at the level of the h-independent tensor integrals.
(iii) Individual colour-stripped diagram amplitudes are combined with the corresponding colour structure and converted into colour vectors in the colour basis {C i }, Then, summing all diagrams yields the full one-loop colour vector (iv) Finally, the helicity/colour summed squared loop amplitude is built though the colour-interference matrix (2.8) as

Numerical stability
The reduction of one-loop amplitudes to scalar integrals suffers from numerical instabilities in exceptional phase-space regions. Such instabilities are related to small Gram determinants of the form where p k are the external momenta in the loop propagators D k . In regions where rank-two and rankthree Gram determinants become small, the objects that result from the pinching of propagators can be enhanced by spurious 1/∆ singularities. At the end, when all pinched objects are combined and the integrals evaluated, such singularities disappear. However, this cancellation can be so severe that all significant digits are lost, and the amplitude output can be inflated in an uncontrolled way by orders of magnitude. This calls for an automated system capable of detecting and curing all relevant instabilities in a reliable way. This is especially important for multi-particle and multi-scale NLO calculations, and even more for NNLO applications, which require high numerical accuracy in regions where one external parton becomes unresolved, thereby inflating spurious poles.
In principle, numerical accuracy can be augmented through quadruple precision (qp) arithmetic. But the resulting CPU overhead, of about two orders of magnitude, is often prohibitive. In Open-Loops, numerical instabilities are thus addressed as much as possible in double precision (dp) using analytic methods. In OpenLoops 1, as detailed below, numerical instabilities are avoided by means of the Collier library [19] in combination with a stability rescue system that makes use of Cut-Tools [10] in qp. In OpenLoops 2, loop-induced processes are handled along the same lines, while standard NLO calculations are carried out with the new on-the-fly reduction algorithm, which is equipped with its own stability system (see Section 2.7.2). The latter combines analytic techniques together with a new hybrid-precision system that uses qp in a highly targeted way, requiring only a tiny CPU overhead as compared to a complete qp re-evaluation.
An additional source of numerical instabilities originates from the violation of on-shell relations or total momentum conservation of external particles, i.e. due to the quality of the provided phase-space point. To this end before amplitude evaluation on-shell conditions and momentum conservation are checked. A warning is printed when these conditions are violated beyond a certain relative threshold, which can be altered via the parameter psp_tolerance (default=10 −9 ). Additionally, we apply a "cleaning procedure" which ensures kinematic constraints of the phase-space up to double precision, rsp. qp where applicable.

Stability rescue system
In the original open-loop algorithm-which was used throughout in OpenLoops 1 and is still used in OpenLoops 2 for squared loop amplitudes and tree-loop interferences in the HEFT-the reduction to scalar integrals is entirely based on external libraries, and the best option is to carry out the reduction of tensor integrals using the Collier library [19]. In the vicinity of spurious poles, Collier cures numerical instabilities by means of expansions in the Gram determinants and alternative reduction methods [4,34]. Such analytic techniques are applied in a fully automated way, and the resulting level of numerical stability is generally very good. Alternatively, the reduction can be performed at the integrand level using CutTools [10], but this option is mainly used as rescue system in qp, since CutTools does not dispose of any mechanism to avoid instabilities in dp.
In the calculation of tree-loop interferences, numerical instabilities are monitored and cured by means of an automated rescue system based on the following strategy.
(i) The stability of tensor integrals is assessed by comparing the two independent Collier implementations of the tensor reduction, Coli-Collier (default) and DD-Collier. This test can be applied to all phase-space points or restricted to a certain fraction of points with the highest virtual K-factor 10 Given the desired fraction, the points to be tested are automatically selected by sampling the distribution in the K-factor at runtime.
(ii) Points that are classified as unstable are re-evaluated in qp using CutTools and OneLOop.
(iii) In CutTools, numerical instabilities can remain significant even in qp. Their magnitude is estimated through a so-called rescaling test, where one-loop amplitudes are computed with rescaled masses, dimensionful couplings and momenta and scaled back according to the mass dimensionality of the amplitude.
In this approach, the re-evaluation of the amplitude for stability tests causes a non-negligible CPU overhead. Moreover, additional re-evaluations of the full amplitude in qp are very CPU intensive. Fortunately, thanks to the high stability of Collier, they are typically needed only for a tiny fraction of phase-space points. However, the usage of qp strongly depends on the complexity of the process, and for challenging multi-scale NLO calculations and NNLO applications it can become quite significant.
In the case of squared loop amplitudes, the qp rescue with CutTools is disabled, because of the inefficiency of OPP reduction for loop-squared amplitudes. This is due to the fact that all helicity and colour configurations must be reduced independently. Thus the above stability system is restricted to stage (i). Moreover, due to the fact that a K-factor is not available for loop-squared amplitudes, the comparison of Coli-Collier versus DD-Collier to assess numerical stability is extended to all phase-space points. Details on the usage of the stability rescue system can be found in Section 4.6.

On-the-fly stability system
The on-the-fly reduction methods [33] implemented in OpenLoops 2 are supplemented by a new stability system, which is based on the analysis of the analytic structure of spurious singularities in the employed reduction identities. In general, the reduction of loop objects with four or more propagators, D 0 , D 1 , D 2 , D 3 . . . , can give rise to spurious singularities in the rank-three Gram determinant ∆ 123 , and in the rank-two Gram determinants ∆ 12 , ∆ 13 and ∆ 23 . In the case of the on-the-fly reduction (2.19), the reduction coefficients associated with a D i pinch generate spurious singularities of the form with a clear hierarchical pattern: very strong instabilities in ∆ 12 , mild instabilities in ∆ 123 , and no instability in ∆ 13 and ∆ 23 . The on-the-fly reduction of objects with only three loop propagators involve only ∆ 12 and yields similar spurious singularities as in (2.36), but without the ∆ 123 term.
Rank-two Gram determinants Instabilities from rank-two Gram determinants are completely avoided in OpenLoops 2. In topologies with four or more propagators, this is achieved via permutations of the loop denominators, , in the reduction identities. Such permutations are applied on an event-by-event basis in order to guarantee so that the reduction is always protected from the smallest rank-two Gram determinant.
In this way, rank-two Gram instabilities are delayed to later stages of the reduction, where threepoint objects with a single Gram determinant ∆ 12 are encountered. In this case, instabilities at small ∆ 12 are cured by means of an analytic ∆ 12 -expansion, which have been introduced in [33] for the first few orders in ∆ 12 and are meanwhile available to any order [67].
Such expansions have been worked out for those topologies and regions that can lead to ∆ 12 → 0 in hard scattering processes. This can happen only in t-channel triangle configurations, where two external momenta k 1 , k 2 are space-like, and (k 1 +k 2 ) 2 = 0. The relevant virtualities are parametrised as k 2 1 = −Q 2 and k 2 2 = −(1 + δ)Q 2 , where Q 2 is a (high) energy scale, and the Gram determinant is related to δ via √ ∆ 12 = Q 2 δ/2. The corresponding three-point tensor integrals are expanded in δ based on covariant decompositions of type where L µ 1 ...µr i are Lorentz structures made of metric tensors and external momenta. Their coefficients C i (δ) are reduced to scalar tadpole, bubble and triangle integrals, i.e. 40) where c N i (δ) are rational functions containing 1/δ K poles, while the C i (δ) coefficients are regular at δ → 0. Numerically stable δ-expansions for C i (δ) are obtained via Taylor expansions of the scalar integrals. The required coefficients, To stabilise the tensor coefficients (2.40), singular terms of the form δ −K T N 0 (δ) are separated via partial fractioning and replaced by The singular parts cancel exactly when combining the contributions from A 0 , B 0 and C 0 functions as well as the rational terms. Thus only the finite series T N,K 0,fin (δ) need to be evaluated. The fact that all tensor integrals are stabilised using only C 0 and B 0 expansions makes it possible to expand with excellent CPU efficiency up to very high orders in δ, thereby controlling a broad δ-range. In practice, the δ-expansions are applied for δ < δ thr , with a threshold δ thr that is large enough to avoid significant instabilities for δ > δ thr , while below δ thr the expansions are carried out up to a relative accuracy of 10 −16 (10 −32 ) in dp (qp). By default δ thr is set to 10 −2 .
Rank-three Gram determinants The on-the-fly reduction coefficients (2.36) associated with D i pinches with i = 1, 2, 3 are proportional to 1/ √ ∆ 123 and read [33] , and µ 1,2,3 are auxiliary momenta used to parametrise the loop momentum [33]. In topologies with more than four propagators, D 0 , D 1 , D 2 , D 3 , D 4 , . . . , such rankthree Gram instabilities are avoided by performing the reduction in terms of four of the first five propagators, D i 0 D i 1 D i 2 D i 3 , which are chosen by maximising |∆ i 1 i 2 |, to avoid rank-two instabilities, and by subsequently minimising max{|K i 1 |, |K i 2 |, |K i 3 |}. In this way, small rank-three (-two) Gram determinants can largely be avoided until later stages of the recursion, where box (triangle) topologies have to be reduced.
OPP reduction The OPP method, used for five-and higher-point objects of rank smaller than two, is based on the same auxiliary momenta i mentioned above. Related rank-two Gram instabilities are avoided by permuting the propagators of the resulting scalar boxes according to (2.37).
IR regions In order to mitigate numerical instabilities in the context of NNLO calculations, Open-Loops implements additional improvements targeted at phase-space regions where one external parton becomes soft or collinear. Such improvements include: • global and numerically stable implementation of all kinematic quantities, including the basis momenta µ i used for the reduction, in special regions; • analytic expressions for renormalised self-energies to avoid numerical cancellations between bare self-energies and counterterms in the limit of small p 2 . This is relevant for self-energy insertions into propagators that are connected to two external partons via soft or collinear branchings.
Such dedicated treatments for unresolved regions will be documented in [67] and further extended in the future.
Hybrid precision system In order to cure residual instabilities that cannot be avoided with the methods described above, the on-the-fly reduction is equipped with a hybrid-precision (hp) system [67] that monitors all potentially unstable types of reduction identities and switches from dp to qp dynamically when a numerical instability is encountered. This system is fully automated and acts locally, at the level of individual operations. This makes it possible to restrict the usage of qp to a minimal part of the calculation, thereby obtaining a speed-up of orders of magnitude as compared to brute-force qp re-evaluations of the full amplitude. Typically, the extra time spent in qp is only a modest fraction of the standard dp evaluation time. The main features of the hp system are as follows.
• Quad precision is triggered and used at the level of individual reduction steps, based on the kinematics of the actual phase-space point and the loop topology of the individual open-loop object that is being processes at a given stage of the recursion.
• Reduction steps that are identified as unstable and all consecutive connected operations are carried out in quad precision until spurious singularities are cancelled. Quad precision is thus used for all subsequent operations (dressing, merging, reduction, master integrals) that are connected to an instability.
• For each type of reduction step, the magnitude of potential instabilities is estimated based on the actual kinematics and the analytical form of the reduction identity. This information leads to an error estimate that is attributed to each processed object and is propagated and updated through all steps of the algorithm.
• Quad precision is triggered when the cumulative error esimate for a certain object exceeds a global accuracy threshold, which can be adjusted by the user (see Section 4.6) depending on the required numerical accuracy.
The hp system is based on two parallel dp/qp channels for each generic operation (reduction, dressing, merging) and a twofold dp/qp representation of each object that undergoes such operations. By default the dp channel is used, and when an instability is detected the object at hand is moved to the qp channel, which is used for all its subsequent manipulations. At the end, when spurious singularities are cancelled, qp output is converted back to dp.
The efficiency of the hp system strongly benefits from the above mentioned analytical treatments of Gram determinants and soft regions, which avoid most of the instabilities and delay the remaining ones to later stages of the recursion, minimising the number of subsequent qp steps. As a result, for one-loop calculations with hard kinematics qp is typically needed only for a tiny fraction of the phase-space points, and for a very small part of the calculation of an amplitude. The usage of qp can become significantly more important in NNLO calculations, especially when local subtraction methods are used. In this case, one-loop amplitudes need to be evaluated in deep IR regions, where new types of instabilities occur for which no analytic solution is available at the moment. Such instabilities are automatically detected and cured by the hp system. This may lead, depending on the process and kinematic region, to a significant CPU overhead. In such cases, the accuracy threshold parameter should be tuned such as to achieve an optimal trade-off between performance and numerical stability.
Technical details and usage of the on-the-fly stability system are described in Section 4.6.
External libraries Finally, OpenLoops 2 benefits from improvements in Collier 1.2.3 [19], which is used for dp evaluations of scalar integrals and for tensor reduction in loop-induced processes, as well as in OneLOop [65], which is used to evaluate scalar integrals in qp.
3 Automation of tree-and one-loop amplitudes in the full SM

Power counting
In the Standard Model, scattering amplitudes can be classified based on power counting in the strong and electroweak coupling constants, 12 g s = √ 4πα s and e = √ 4πα. At LO in QCD, tree amplitudes have the simple form where n and m are, respectively, the maximally allowed power in g s and the minimally allowed power in e. The total coupling power is fixed by the number of scattering particles, n + m = N p − 2, where N p is the number of scattering particles.
In the SM, the general coupling structure of scattering amplitudes depends on the number n qq of external quark-antiquark pairs. For processes with n qq ≤ 1, the LO QCD term (3.1) is the only tree contribution, while processes with n qq ≥ 2 involve also sub-leading EW contributions of order g p s e q with p + q = N p − 2 and variable power q > m. Such contributions reflect the freedom of connecting quark lines either through EW or QCD interactions. As a result, tree amplitudes consist of a tower of QCD-EW contributions, For n qq ≥ 2, the Born amplitude (3.2) involves n qq terms, while the squared Born amplitude consists of a tower of 2n qq − 1 terms, with k = k and k = k are denoted, respectively, as Born-Born interferences and squared Born terms. The former are typically strongly suppressed with respect to the latter. This is due to the fact that physical observables are typically dominated by contributions involving propagators that are enhanced in certain kinematic regions. Squared amplitudes that involve such propagators are thus maximally enhanced. In contrast, since the propagators of Born amplitudes with k = k are typically peaked in different regions, Born-Born interferences tend to be much less enhanced. In addition, the interference between diagrams with gluon and photon propagators, which are enhanced in the same regions, turn out to be suppressed as a result of colour interference.
Based on these considerations, it is interesting to note that each term (3.4), with fixed order in α s and α, contains at most one squared-Born contribution with r − s = s. In fact this is possible only for even values of r. Thus the tower (3.3) consists of an alternating series of n qq squared Born terms 13 with r = 2R and (n qq − 1) pure interference terms with r = 2R + 1, ⊃ interference only for 0 ≤ R ≤ñ qq − 1 .
(3.5) 12 For simplicity, here we regard Yukawa and Higgs couplings as parameters of order e, keeping in mind that a separate power counting in λY and λH is possible. 13 In the following, for convenience, we refer to the the full amplitude M (2R) 0 as squared Born term.
The tower of Born terms (3.3) is illustrated in the upper row of Fig. 4. Squared Born terms are shown as large dark grey blobs, while interference terms are depicted as smaller light grey blobs.
At one loop, for processes that are not free from external QCD partons, 14 the leading QCD contributions have the form Here NLO QCD should be understood as the O(α s ) correction wrt the LO QCD term (3.1). For processes with n qq ≥ 2, the leading QCD terms are accompanied by a tower of sub-leading EW contributions, and the general form of one-loop SM amplitudes is Here and in the following, the inclusion of all counterterm contributions of UV and R 2 kind as in (2.11) is implicitly understood. One-loop terms of fixed order in g s and e in (3.7) can be regarded either as the result of O(g 2 s ) or O(e 2 ) insertions into corresponding Born amplitudes. In this perspective, denoting matrix elements of fixed order as we can define where δ QCD and δ EW should be understood as operators that transform an O(g p s e q ) Born matrix element into the complete one-loop matrix elements of O(g p+2 s e q ) and O(g p s e q+2 ), respectively. For processes with n qq ≤ 1, only one Born term and two one-loop terms exist, and the latter can unambiguously be identified as NLO QCD and NLO EW corrections, (3.10) In contrast, processes with n qq ≥ 2 involveñ qq + 1 = n qq terms of variable order g P s e Q , which can in general be regarded either as QCD corrections to Born terms of relative order g −2 s or EW corrections to Born terms of relative order e −2 , i.e. (3.11) More precisely, one-loop terms with maximal QCD order, P max = n + 2, represent pure QCD corrections, since Born terms of relative order e −2 do not exist. Similarly, one-loop terms of maximal EW order, Q max = m + 2 + 2ñ qq , are pure EW corrections, since Born terms of relative order g −2 s do not exist. In contrast, the remaining n qq − 2 terms with P < P max and Q < Q max have a mixed QCD-EW character, in the sense that they involve corrections of QCD and EW type, which coexist at the level of individual Feynman diagrams, such as in loop diagrams where two quark lines are connected by a virtual gluon and a virtual EW boson. This kind of one-loop terms cannot be split into contributions of pure QCD or pure EW type. Thus, in general only the full set of one-loop diagrams containing all mixed QCD-EW terms of order g P s e Q represents a well defined and gauge-invariant perturbative contribution. Keeping this in mind, as far as the terminology is concerned, it is often convenient to refer to (3.11) either as QCD correction wrt to O(g P −2 s e Q ) or EW correction wrt O(g P s e Q−2 ).
14 In the absence of extenal quarks and gluons, tree and one-loop amplitudes have a trivial purely EW coupling structure, M0 = e m M Squaring one-loop amplitudes with n qq ≥ 2 results in a similar tower of 2n qq − 1 mixed QCD-EW terms as in (3.3)- (3.4). In contrast, the interference of tree and one-loop amplitudes yields a tower of 2n qq terms, (3.12) Each term of fixed order in α s and α involves the interference between Born and one-loop terms of variable order, where t min = max(0, r −ñ qq ) and t max = min(r,ñ qq + 1). 15 In general, the one-loop amplitudes that enter (3.13) consist of mixed QCD-EW corrections in the sense of (3.11), i.e. (3.14) In practice, as discussed above, the one-loop terms with maximal QCD or maximal EW order consist of pure QCD or pure EW corrections. In (3.12)-(3.13) they correspond to r = 0 and r = 2ñ qq + 1, and they read These contributions are shown as the outer most blobs in the second row of Fig. 4. They emerge as pure O(α s ) and pure O(α) corrections as indicated by the red and blue arrows respectively. The remaining (2n qq − 2) terms cannot be regarded as pure QCD or pure EW corrections. Nevertheless, due to the fact that the squared Born tower is an alternating series consisting of n qq squared Born terms and (n qq − 1) pure interference terms, see (3.3)-(3.5), the tree-loop interference (3.12) corresponds to an alternating series of n qq + n qq terms that can be interpreted, respectively, as QCD and EW corrections with respect to squared Born terms. Specifically, the terms (3.13) with even indices, r = 2R with 0 ≤ R ≤ n qq − 1, can be written in the form where the terms with t = R, represent QCD corrections to squared Born amplitudes. In contrast, the alternative representation shows that EW corrections arise only in connection with interference Born terms, which are typically strongly sub-leading. Vice versa, for terms with odd indices, r = 2R + 1 with 0 ≤ R ≤ñ qq , the representation  Figure 4: Schematic representation of the towers of mixed QCD-EW terms at LO and NLO. The first row represents the LO tower (3.3)-(3.5), which consists of an alternating series of dominant squared Born terms (dark grey blobs) and sub-leading pure interference terms (light grey blobs). The second row corresponds to the NLO tower (3.12)-(3.21). Each LO term is connected to two NLO terms via QCD (red) and EW (blue) corrections, while each NLO term is connected to a unique squared Born term either via QCD or EW corrections. Apart from the outer most NLO terms of pure QCD and pure EW kind, QCD (EW) corrections to squared Born terms mix with EW (QCD) corrections to adjacent interference terms.
involves terms with t = R + 1, which represent EW corrections to squared Born amplitudes, while writing where 2R+1−t = t for all t, shows that QCD correction effects enter only through pure interference Born terms and are typically suppressed.
In summary, apart from the leading QCD and EW terms, NLO SM contributions at a given order α n+1−r s α m+r cannot be regarded as pure QCD or pure EW corrections. Nevertheless, the orders r = 2R and 2R + 1 are typically dominated, respectively, by QCD and EW corrections to the squared Born amplitude W 2R 00 ∼ M R 0 |M R 0 . Thus, keeping in mind that all relevant EW-QCD mixing and interference effects must always be included, each NLO order can be labelled in a natural and unambiguous way either as QCD or EW correction as illustrated in Fig. 4.
As detailed in Section 4.2, OpenLoops supports the calculation of tree and one-loop contributions of any desired order in α s and α. In practice, scattering probability densities at different orders in α s and α, . Alternatively, tree-loop interferences of order α P s α Q can be selected directly through the corresponding one-loop powers P or Q.

Input schemes and parameters
In this section we discuss the different input schemes and the SM input parameters that are used for the calculation of scattering amplitudes in OpenLoops. All parameters are initialised with physical default values, and can be adapted by the user by calling the Fortran routine set_parameter or the related C/C ++ functions as detailed in Appendix A.2. Tab. 10 in Appendix C summarises input parameters and switchers that can be controlled through set_parameter. Parameters with mass dimension should be entered in GeV units. The values of specific parameters in OpenLoops can be obtained by calling the routine get_parameter, and the full list of parameter values can be printed to a file by calling the function printparameter (see Appendix A.2).

Masses and widths
The OpenLoops parameters mass(PID) and width(PID) correspond, respectively, to the on-shell mass M i and the width Γ i of the particle with PDG particle number PID (see Tab. 6). Masses and widths are treated as independent inputs. For unstable particles, when Γ i > 0, the complex-mass scheme [38] is used. In this approach, particle masses are replaced throughout by the complex-valued parameters This guarantees a gauge-invariant description of resonances and related off-shell effects. By default, Γ i = 0 and µ i = M i ∈ R for all SM particles, i.e. unstable particles are treated as on-shell states, while setting Γ i > 0 for one or more unstable particles automatically activates the complex-mass scheme for the particles at hand. By default, M i > 0 only for i = W, Z, H, t.
For performance reasons, the public OpenLoops libraries are typically generated with m e = m µ = m τ = 0 and m u = m d = m s = m c = 0, while generic mass parameters m q are used for the heavy quarks q = b, t. By default, heavy-quark masses are set to m b = 0 and m t = 172 GeV, but their values can be changed by the user as desired. Dedicated process libraries with additional fermion-mass effects (any masses at NLO QCD and finite m τ at NLO EW) can be easily generated upon request. For efficiency reasons, when m Q is set to zero for a certain heavy quark, whenever possible amplitudes that involve Q as external particle are internally mapped to corresponding (faster) massless amplitudes. To this end the desired fermion masses have to be specified before any process is registered, see Section 4.2.
Strong coupling The values of the strong coupling α s (µ 2 R ) and the related renormalisation scale µ R can be controlled through the parameters alphas and muren, respectively. These parameters can be set dynamically on an event-by-event basis, 16 and OpenLoops 2 implements an automated scalevariation system that makes it possible to re-evaluate the same scattering amplitude at multiple values of µ R and/or α s (µ 2 R ) with high efficiency (see Section 4.3).

Number of colours By default, in
OpenLoops colour effects and related interferences are included throughout, i.e. scattering amplitudes are evaluated by retaining the exact dependence on the number of colours N c . In addition, dedicated process libraries with large-N c expansions can be generated by the authors upon request. When available, leading-colour amplitudes can be selected at the level of process registration (see Section 4.2) via the parameter leading_colour = 1 (default=0).
EW gauge couplings The U(1) and SU(2) gauge couplings g 1 , g 2 are derived from where e = √ 4πα and θ w denotes the weak mixing angle. The latter is always defined through the ratio of the weak-boson masses [69], (3.25) 16 For historical reasons their default values are µR = 100 GeV and αs = 0.1258086856923967.  If Γ W = Γ Z = 0, then cos θ w = M W /M Z is real valued. But in general the mixing angle is complex valued. For the electromagnetic coupling three different definitions are supported: (i) α(0)-scheme: as input for α the parameter alpha_qed_0 is used, which corresponds to the QED coupling in the Q 2 → 0 limit. This scheme is appropriate for pure QED interactions at scales Q 2 M 2 W , and for the production of on-shell photons (see below).
(ii) G µ -scheme: the input value of α is derived from the matching condition which relates squared matrix elements for the muon decay in the Fermi theory to corresponding W -exchange matrix elements in the low-energy limit. This results into 17 As input for α| Gµ the parameter Gmu is used, which corresponds to the Fermi constant G µ . The G µ -scheme resums large logarithms associated with α(M 2 Z ) as well as universal M 2 t /M 2 W enhanced corrections associated with the ρ parameter. This guarantees an optimal description of the strength of the SU(2) coupling, i.e. W -interactions, at the EW scale.
(iii) α(M 2 Z )-scheme: as input for α the parameter alpha_qed_mz is used, which corresponds to the QED coupling at Q 2 = M 2 Z . This scheme is appropriate for hard EW interactions around the EW scale, where it guarantees an optimal description of the strength of QED interactions and a decent description of the strength of weak interactions.
The choice of α-input scheme is controlled by the OpenLoops parameter ew_scheme as detailed in Tab. 1, where also the default input values are specified. Note that α(0) and α(M 2 Z ) are described by means of two distinct parameters in OpenLoops. Depending on the selected scheme, the appropriate parameter should be set.

External photons
The high-energy couplings α| Gµ and α(M 2 Z ) are appropriate for the interactions of EW gauge bosons with virtualities of the order of the EW scale. In contrast, the appropriate coupling for external high-energy photons is α(0) [71]. More precisely, for photons of virtuality Q 2 γ the coupling α(Q 2 γ ) should be used. For initial-or final-state on-shell photons this corresponds to α(0). However, in photon-induced hadronic collisions, initial-state photons inside the hadrons effectively couple as off-shell partons with virtuality Q 2 γ = µ 2 F , where µ F is the factorisation scale 17 In the literature, α|G µ is often defined as √ 2/π GµRe µ 2 W sin θ 2 w , where the truncation of the imaginary part is an ad-hoc prescription aimed at keeping α ∈ R in the complex-mass scheme. However, from the matching condition (3.26) it should be clear that (3.27) is the natural way of defining α|G µ as real-valued parameter.
of the parton distribution functions (see Appendix A.3 of [36]), Thus, at high µ 2 F the high-energy couplings α| Gµ or α(M 2 Z ) should be used. Based on these considerations, for processes with n on-shell and n * "off-shell" hard external photons plus a possible unresolved photon, the scattering probability densities W = W 00 , W 01 , W 11 are automatically rescaled as 18 with LSZ-like coupling correction factors Here α should be understood as the QED coupling in the input scheme selected by the user, while the value of α(0) correspond to the parameter alpha_qed_0 and is independent of the scheme choice. The coupling of off-shell external photons and the resulting R (off) γ factor are set internally as In this way α off is guaranteed to be a high-energy coupling. Note that unresolved photons, i.e. additional photons emitted at NLO EW, need to be treated in a different way. In this case, in order to guarantee the correct cancellation of IR singularities, real and EW corrections should be computed with the same QED coupling. This implies that the coupling α of unresolved photons should not receive any R γ rescaling.
The relevant information to determine the number of on-shell and off-shell external photons in Here "hard" should be understood as the opposite of "unresolved", i.e. it refers to all photons that are present as external particles starting from LO.

Yukawa and Higgs couplings The interactions of Higgs bosons with massive fermions is described by the Yukawa couplings
Here v corresponds to the vacuum expectation value, while µ f,Y is a Yukawa mass parameter. At LO and NLO QCD, the complex-valued Yukawa masses can be freely adapted through the parameters yuk(PID) and yukw(PID), which play the role of real Yukawa masses M i,Y and widths Γ i,Y . More explicitly, in analogy with (3.23), At NLO QCD, as discussed in Section 3.3.1, Yukawa couplings can be renormalised in the MS scheme or, alternatively, as on-shell fermion masses. The triple and quartic Higgs self-couplings are implemented as where µ H denotes the Higgs mass. By default κ H =κ (4) H = 1, consistently with the SM. At NLO QCD, and also at NLO EW for processes that are independent of λ (3,4) H at tree level, the Higgs selfcouplings can be modified through the naive real-valued rescaling parameters lambda_hhh≡ κ Higgs effective couplings Effective Higgs interactions in the M t → ∞ limit are parametrised in such a way that the Feynman rule for the vertices with two gluons and n Higgs bosons read where p µ 1 and p ν 2 are the incoming momenta of the gluons. The power counting in the coupling constants is done in e and g s as in the SM. In the Higgs Effective Field Theory, only QCD corrections are currently available.
CKM matrix The OpenLoops program can generate scattering amplitudes with a generic CKM matrix V ij . However, for efficiency reasons, most process libraries are generated with a trivial CKM matrix, V ij = δ ij . Process libraries with a generic CKM matrix are publicly available for selected processes, such as charged-current Drell-Yan production in association with jets, and further libraries of this kind can be generated upon request. When available, such libraries can be used by setting ckmorder=1 before the registration of the process at hand (see Section 4.2). In this case the default values of V ij remain equal to δ ij , but the real and imaginary parts of the CKM matrix can be set to any desired value by means of the input parameters VCKMdu, VCKMsu, VCKMbu, VCKMdc, VCKMsc, VCKMbc, VCKMdt, VCKMst, VCKMbt for Re(V ij ) and VCKMIdu, VCKMIsu, etc. for Im(V ij ).

Renormalisation
Divergences of UV and IR type are regularised in D = 4 − 2ε dimensions and are expressed as poles of the form C µ 2ε D /ε n , where µ D is the scale of dimensional regularisation, and is the conventional MS normalisation factor. For a systematic bookkeeping of the different kinds of divergences, UV and IR poles are parametrised in terms of independent dimensional factors (ε UV , ε IR ) and scales (µ UV , µ IR ). Thus, one-loop amplitudes involve three types of poles, Renormalised one-loop amplitudes computed by OpenLoops are free of UV divergences. Yet, bare amplitudes with explicit UV poles can also be obtained (see Section 4.3). The remaining IR divergences are universal and can be cancelled through appropriate subtraction terms (see Section 3.4).
For the renormalisation of UV divergences we apply the following generic transformations of masses, fields and coupling parameters, where µ 2 i,0 , ϕ i,0 , g i,0 denote bare quantities, and δµ 2 i , δZ ϕ i ϕ j , δZ g i the respective counterterms. For unstable particles, as discussed in Section 3.3.2, OpenLoops implements a flexible combination of the on-shell scheme [37] and the complex-mass scheme [38]. In this approach, the width parameters Γ i of the various unstable particles can be set to non-zero or zero values independently of each other. Depending on this choice, the corresponding particles are consistently renormalised as resonances with complex masses or as on-shell external states with real masses.
In the following, we discuss the various counterterms needed at NLO QCD and NLO EW. In general, as discussed in Section 3.

QCD renormalisation
The SM parameters that involve one-loop counterterms of O(α s ) are the strong coupling, the quark masses, and the related Yukawa couplings.
Strong coupling The renormalisation of the strong coupling constant is carried out in the MS scheme, and can be matched in a flexible way to the different flavour-number schemes that are commonly used in NLO QCD calculations. To this end, the full set of light and heavy quarks that contribute to one-loop amplitudes and counterterms is split into a subset of active quarks (q ∈ Q active ) and a remaining subset of decoupled quarks (q / ∈ Q active ). Active quarks with mass m q ≥ 0 are assumed to contribute to the evolution of α s (µ 2 R ) above threshold. Thus they are renormalised via MS subtraction at the scale µ = max(µ R , m q ). The remaining heavy quarks (q / ∈ Q active ) are assumed to contribute only to loop amplitudes and counterterms, but not to the running of α s (µ 2 R ). Thus, they are renormalised in the so-called decoupling scheme, which corresponds to a subtraction at zero momentum transfer.
The explicit form of the g s counterterm reads where C A = 3 and T F = 1/2, while µ R and µ UV are the renormalisation and dimensional regularisation scales for UV divergences, respectively. The logarithmic terms associated with quark loops read The number of active and decoupled quarks included in (3.41) is determined as explained in the following.
Choice of flavour-number scheme In NLO QCD calculations, the logarithms of µ R in the counterterm (3.41)-(3.42) should cancel the leading-order µ R dependence associated with α s (µ 2 R ). To this end, the number N q,active of active quark flavours in (3.41) should be set equal to the number N F corresponding to the flavour-number scheme of the calculation at hand. More precisely, when using a running α s (µ 2 R ) with N F quark flavours, the user 20 should set N q,active = N F . In variable-flavour number schemes, N F corresponds to the maximum number of quark flavours in the evolution, and typically N F = 4, 5 or 6. 21 In practice, the number of active quarks in OpenLoops is determined as 43) where N F corresponds to the desired flavour-number scheme and can be specified by the user through the parameter nf_alphasrun, while N q,m=0 is determined from the number of quarks with m q = 0 at runtime. By default nf_alphasrun=0, and all massless quarks are treated as active, while massive quarks are decoupled. In contrast, if nf_alphasrun is set to a value N F > N q,m=0 , the first N F massless or massive quarks are treated as active above threshold, and only the remaining heavy quarks are decoupled. For example, when m b = 0 the default value of N q,active is 5, and nf_alphasrun should be set to 6 in case a 6-flavour α s is used. In contrast, for m b = 0 the default value of N q,active is 4, and nf_alphasrun should be set to N F in case a N F -flavour α s with N F > 4 is used.
Total number of quark flavours By default, most public OpenLoops libraries involve quark-loop contributions with N q,loop = 6 quark flavours. Such libraries can be used for NLO calculations in any flavour-number scheme with N F = N q,loop or N F < N q,loop . In the latter case, heavy-quark loop contributions that do not contribute to the evolution of α s (µ 2 R ) are consistently accounted for by the N q,loop − N F decoupled quarks in the one-loop matrix elements. 20 Note that αs(µ 2 R ) and µR are separate input parameters controlled by the user, i.e. OpenLoops does not control the evolution of αs(µ 2 R ) but only the related counterterm. Thus it is the role of the user to set Nq,active to the correct value NF. 21 In case the running of αs is obtained from LHAPDF the information about the number of quark flavours contributing to the evolution of αs is available in the PDF info file as the tag NumFlavors for LHAPDF versions ≥ 6.0.
Extra libraries without top-quark loops (N q,loop = 5) can be easily generated upon request and are publicly available for selected processes. When available, libraries with N q,loop < 6 can be used by setting the parameter nf (default=6) to the desired value of N q,loop at the moment of the process registration.
Quark masses At NLO QCD, quark masses can be renormalised in the on-shell scheme (default) or in the MS scheme. The general form of mass counterterms is where C F = 3/4, and logarithms of the complex mass µ q are complex valued when Γ q > 0. The scheme-dependent finite part reads in the on-shell scheme (Λ q = 0), Here Λ q denotes the MS renormalisation scale for the mass of the quark q. This scale is controlled by the (real-valued) parameter LambdaM(PID), which plays also the role of scheme setter for the mass counterterm of the quark at hand. For Λ q = 0 (default) the on-shell scheme is used, while setting Λ q > 0 activates the MS scheme.
Yukawa couplings According to (3.32), Yukawa couplings are defined in terms of related Yukawa masses. Their ratio µ q,Y /λ q = v/ √ 2 depends only on the vacuum expectation value, which does not receive O(α s ) corrections. This implies the trivial counterterm relation Similarly as for the quark masses µ q , also Yukawa masses can be renormalised on-shell (default) or via MS subtraction. The counterterms read with X(µ q,Y , Λ q,Y ) as defined in (3.45). The MS renormalisation scale Λ q,Y for the Yukawa mass of the quark q is controlled by the independent parameter LambdaY(PID). By default Λ q,Y = 0, and the on-shell counterterm is used, while setting Λ q,Y > 0 activates the MS renormalisation. Similarly as for Yukawa masses (3.33), the values of Λ q,Y are automatically synchronised with Λ q when the latter is changed, but not vice versa. Thus the order in which LambdaM(PID) and LambdaY(PID) are set matters. As for Yukawa masses, this interplay can be deactivated by setting freeyuk_on=1 (default=0).
Wave functions The QCD counterterms for gluon and quark wave functions read where µ IR is the dimensional regularisation scale for IR divergences, Θ(M ) is the step function with Θ(0) = 0 and Higgs effective couplings The QCD counterterm associated with the Higgs effective vertex (3.35) reads where the last term originates from the two-loop matching of the Higgs effective coupling [72,73]. For double-(and multi-) Higgs production at the same order as the NLO QCD corrections also double-operator insertions with the same total number of Higgs bosons contribute. In OpenLoops these contributions are automatically included as pseudo-counterterms together with the virtual amplitudes.
Renormalisation and regularisation scales At the level of the user interface, the UV and IR regularisation scales are treated as a common scale µ D = µ UV = µ IR , and the logarithms of µ 2 UV /µ 2 IR in (3.50) are set to zero. In the literature, also the logarithms of µ 2 UV /µ 2 R in (3.41)-(3.42) are often omitted by assuming µ UV = µ R in the MS scheme. On the contrary, in OpenLoops the values of µ D and µ R are controlled by two independent parameters, mureg and muren, respectively. Their default values are µ D = µ R = 100 GeV. For convenience it is possible to simultaneously set µ R = µ D = µ by means of the auxiliary OpenLoops parameter mu. As described in Section 4.3, variations of µ R and α s (µ 2 R ) can be carried out in a very efficient way in OpenLoops 2.

EW renormalisation
The renormalisation of UV divergences in the EW sector is based on the scheme of [37] for on-shell particles, and on the complex-mass scheme [38] for the treatment of off-shell unstable particles. In OpenLoops 2 these two schemes are combined into a flexible renormalisation scheme that makes it possible to deal with processes such as pp → tt + − , where some unstable particles (t,t) are treated as on-shell external states, while other ones (Z) play the role of intermediate resonances. This is achieved through a refined definition of field-renormalisation constants, and by adapting the mass-renormalisation prescriptions for unstable particles on a particle-by-particle basis, depending on whether the individual width parameters Γ i are set to non-zero values or not by the user. As explained in the following, the O(α) renormalisation in OpenLoops involves also a non-standard treatment of ∆α(M 2 Z ) and special features related to external photons.
Counterterms for complex masses The propagators of unstable particles with Γ i = 0 are renormalised in the complex-mass scheme [38], where the renormalised self-energy is defined aŝ The counterterm δµ 2 i associated with the complex mass (3.23) corresponds to a subtraction of the full complex-valued self-energy at p 2 = µ 2 i . In particular, the counterterm δµ 2 i includes also the imaginary part of the self-energy, which is related to the width through and is already included in the imaginary part of µ 2 i . Thus the subtraction of Im Σ in the complexmass scheme is mandatory in order to avoid double counting. Since the renormalised self-energy In this context, self-energy graphs involving massless photons require a special treatment due to the presence of a threshold at p 2 = µ 2 . In this case, the correct expansion of the scalar two-point function reads where the additional −iΓ/M term accounts for the non-analytic behaviour at p 2 = µ 2 . The related expansion formula for generic self-energies reads where the non-analytic expansion coefficient is given by and depends only on the electromagnetic charge Q i of the particle at hand. This is due to the fact that (3.56) originates only from photon-exchange diagrams and is related to the presence of an infrared singularity in B 0 at p 2 → µ 2 i . The expanded mass counterterms for Higgs (i = H) and vector bosons (i = V = W, Z) read where Σ T denotes the transverse part of the gauge-boson propagator. The renormalisation of fermion masses depends on the following combination of left-handed (L), right-handed (R) and scalar (S) self-energy contributions, 59) and the expanded counterterm reads Counterterms for real masses When Γ i is set to zero, unstable and stable particles are described as on-shell states with a real-valued mass parameter, µ i = M i . In this case a conventional on-shell renormalisation is applied, Here the subtraction is restricted to the real part of the self-energy, while the Im Σ contribution must be retained, since it is not included in the renormalised parameter M 2 i . More precisely, the Re operator in (3.61) truncates only the imaginary parts associated with the UV-finite absorptive parts of two-point integrals, 22 while, in order to ensure a consistent cancellation of UV divergences, 22 In practice, the truncation of absorptive contributions is implemented at the level of the scalar two-point integrals through and in the same way for B 0 . For the derivative of self-energies also the following formulas for B1 and B 1 functions are used

63)
Re all other imaginary parts associated with complex-valued couplings or complex masses inside the loop are kept throughout. The explicit on-shell mass counterterms for Higgs bosons, vector bosons and fermions read Yukawa couplings At NLO EW, Yukawa couplings (3.32) are always related to fermion masses as predicted by the SM. Thus Yukawa masses and physical fermion masses, as well as the respective counterterms, are equal to each other. This implies For the renormalisation of the fermion masses µ f only the on-shell scheme, or its complex-mass scheme variant, are supported.
Wave functions The wave-function renormalisation constants (WFRCs) δZ ij are defined in a way that one-loop propagators do not mix, and their residues are normalised to one. Thus renormalised amplitudes correspond directly to S-matrix elements and do not require additional LSZ factors. On the one hand, due to the presence of absorptive contributions and complex parameters, in the complex-mass scheme the δZ ij constants can acquire complex values. On the other hand, the WFRCS for on-shell particles are usually defined as real parameters [37]. As explained in detail below, in OpenLoops these two approaches are reconciled by implementing WFRCs in a way that is consistent with [37] when the width parameters Γ i are set to zero for all particles, while imaginary δZ ij contributions are taken into account wherever they are strictly needed for the consistency of the complex-mass scheme at O(α).
At NLO, the renormalisation of the field ϕ i associated with a certain external leg yields Since the imaginary parts of the diagonal WFRCs δZ ii contribute only at O(α 2 ), in OpenLoops we omit them by defining In contrast, the non-diagonal WFRCs associated with γ-Z mixing are defined as where Σ AZ T (Q 2 ) denotes the transverse part of the γ-Z mixing energy. Here the imaginary part of µ Z in the denominator is retained in order to ensure UV cancellations in the complex-mass scheme, while absorptive parts are truncated 23 in order to match the conventional on-shell scheme when all Γ i are set to zero. For δZ AZ the mixing energy at p 2 = µ 2 Z is expressed through an expansion 23 Note that Re Σ AZ around p 2 = M 2 Z neglecting terms of O(Γ 2 /M 2 ). However, in practice this expansion is irrelevant, since δZ AZ only contributes for processes with external Z-bosons, where Γ Z = 0 is required.
At NLO EW, the independent renormalisation of left-and right-chiral fields corresponds to a diagonal renormalisation matrix in chiral space, For massless fermions, the matrix (3.73) is diagonal also in helicity space, and imaginary parts can be amputated similarly as for the diagonal WFRCs (3.70). In contrast, for massive fermions the matrix (3.73) mixes left-and right-handed helicity states. Thus, in this case imaginary parts are treated in a similar way as for the non-diagonal WFRCs (3.71). Thus the explicit form of the fermionic WFRCs δZ f R,L reads Variations of the complex-mass scheme Certain aspects of the complex-mass scheme at O(α) can be changed using the parameter complex_mass_scheme as detailed in the following. In OpenLoops such a dependence is systematically avoided by replacing Here Π γγ (Q 2 ) is split into a "heavy" contribution due to W -boson and top-quark loops, plus a remnant "light" contribution. The latter is subtracted at Q 2 = M 2 Z . In this way the sensitivity to light-fermion masses is isolated in ∆α(M 2 Z ), which describes the running of α from Q 2 = 0 to M 2 Z . The explicit light-fermion mass dependence is avoided by expressing ∆α(M 2 Z ) as where α(0) and α(M 2 Z ) are evaluated using the numerical values of the parameters alpha_qed_0 and alpha_qed_mz introduced in Section 3.2. By default, (3.77) is used throughout apart for the ∆α(M 2 Z ) terms associated with external off-shell photons. In that case, as discussed in the context of eq. (3.88), the following explicit expression with dimensionally regularised mass singularities is used, (3.78) Here γ γ = γ QED γ is the anomalous dimension defined in Tab. 3, and F m is the set of light fermions with 0 < m f < M Z . For later convenience, we also define the ∆α conversion term EW coupling counterterms The renormalisation of the EW gauge couplings (3.24) is implemented through counterterms for the photon coupling e and the weak mixing angle θ w . The latter is defined in terms of the weak-boson masses by imposing the relation (3.25) to all orders. The resulting counterterm reads Here, for Γ W,Z > 0 and Γ W,Z = 0, the mass counterterms δµ 2 W,Z are computed according to (3.58) and (3.66), respectively. As discussed in Section 3.2, in OpenLoops the photon coupling e can be defined according to three different schemes, which correspond to different renormalisation conditions. The form of the related counterterm δZ e in the various schemes is as follows.
(i) α(0)-scheme: the parameter α is identified with the strength of the photon coupling at Q 2 → 0. The resulting counterterm reads (3.81) (ii) G µ -scheme: the QED coupling is related to the Fermi constant through (3.27). This relation can be connected to the α(0)-scheme via where ∆r represents the radiative corrections to the muon decay, i.e. to the Fermi constant, in the α(0)-scheme [37]. This leads to the G µ -scheme counterterm Note that, since α| Gµ is effectively defined at the EW scale, its counterterm (3.83) does not depend on Π γγ (0).
(iii) α(M 2 Z )-scheme: the photon coupling is defined as the strength of the pure QED interaction at Q 2 = M 2 Z . This corresponds to the counterterm Also in this case Π γγ (0) drops out.
In OpenLoops the appropriate counterterm δZ e is selected automatically based on the choice of the α-input scheme. The latter is controlled by the parameter ew_scheme as detailed in Tab. 1. (i) For on-shell photons the coupling α(0) is used. Thus,

External photons
and δK (on) γ = 2 δZ e | α(0) + δZ AA yields the correct coupling counterterm δZ e | α(0) . Note that, as a result of the choice of a low-energy coupling, the ∆α(M 2 Z ) contributions to δZ AA and δZ e | α(0) cancel out in δK (ii) For off-shell photons the high-energy coupling α off defined in (3.31) is used. As a result, the ∆α(M 2 Z ) contribution to δZ AA remains uncancelled, and the renormalised scattering amplitude depends on large logarithms of the light-fermion masses. In photon-induced hadronic collisions, such logarithmic mass singularities are cancelled by collinear singularities associated with virtual γ → ff splitting contributions to the photon-PDF counterterm [36] (see Section 3.4). The latter are typically handled in dimensional regularisation with massless light fermions, which results in collinear singularities of the form 1/ε IR . For consistency, the same photon type iPDG switcher (1=on, 0=off) coupling Table 2: PDG identifiers for photons and switchers that control the coupling factors and renormalisation constants for the different types of external photons introduced in Section 3.2.
The high-energy coupling α off is defined in (3.31). If the switchers are set to zero (de-fault=1) the standard user-defined coupling α is used, and the related δZ (on/off) factors are deactivated. As indicated in the last column, contributions from collinear γ → ff splittings are included in Catani-Seymour's I-operator (see Section 3.4) only for off-shell photons.
regularisation must be used also for the related light-fermion contributions from ∆α(M 2 Z ). To this end, the finite renormalisation factor for off-shell photons is defined as where the counterterm δZ e | α off corresponds to the renormalisation scheme associated with α off according to (3.31), while Dα (reg) (M 2 Z ), defined in (3.79), converts ∆α(M 2 Z ) into its dimensionally regularised variant (3.78). The resulting overall renormalisation factor for offshell photons reads

Infrared subtraction
One-loop matrix elements with on-shell external legs involve divergences of IR (soft and collinear) origin, which take the form of double and single 1/ε IR poles in D = 4 − 2ε IR dimensions. In OpenLoops such divergences can be subtracted through an automated implementation of Catani-Seymour's I-operator that accounts for QCD singularities [39,40] as well as for singularities of QED origin [36,[41][42][43][44]. The singular part of the I-operator is universal and can be used to check the cancellation of IR poles in any one-loop calculation. Moreover, the full I-operator provides a useful building block for NLO calculations based on Catani-Seymour's dipole subtraction.
In addition to the I-operator, as documented in Section 4.3 and Appendix A.5, OpenLoops provides also routines for more general building blocks of IR divergences, namely colour-and gluonhelicity correlated Born matrix elements for QCD singularities, and corresponding charge-and photon-helicity correlations for QED singularities.
In OpenLoops it is possible to calculate the I-operator contributions that are required for the NLO corrections to conventional processes with M 0 = 0 and for loop-induced processes. The relevant OpenLoops functions are evaluate_iop and evaluate_iop2 (see Appendix A.5). At a certain order α P s α Q , their output corresponds to where the I-operator consists of the following IR insertions of order α s and α into LO contributions of order α P −1 s α Q and α P s α Q−1 , if j is an on-shell photon or any other neutral parton. (3.94) Here Q i denotes the electromagnetic charge of parton i, while n I,j is the number of initial-state partons in S in \{j}. By definition, on-shell photons do not undergo collinear splittings at NLO. Thus, T QED jk vanishes when the emitter j is an on-shell photon. Vice versa, off-shell photons are subject to final-state γ → ff and initial-state f → f γ splittings at NLO. The related −1/n I,j term in (3.94) is such that the recoil of the collinear radiation is shared by all initial-state partons that belong to S in \{j} [36].
The functions V jk (ε IR ) in (3.92) contain single and double poles in ε IR . They depend on the kinematic quantities s jk = |2p j p k | and Using the constants defined in Tab. 3, they can be written as [40] V QCD/QED ij Table 3: Here N f,u , N f,d , N f,l are the numbers of massless up-type quarks, down-type quarks and leptons, respectively, while N f = N f,u + N f,d . Since massive external legs induce only soft singularities, external W ± -bosons are treated in the same way as massive fermions with mass M W and charge ±1.
The singularities are contained in the functions and for M i > 0 .
where Ω jk = controlled through its native interfaces for Fortran and C/C ++ codes, or using the standard BLHA interface [45,46]. In the following, we introduce various functionalities of the OpenLoops interfaces, such as the registration of processes, the setting of parameters, and the evaluation of different types of matrix elements. In doing so we will always refer to the names of the relevant Fortran interface functions. The corresponding C functions are named in the same way with an extra ol_ prefix.
Further technical aspects, such as the signatures of the interfaces, can be found in Appendix A and Appendix B. As discussed there, the multi-purpose Monte Carlo programs Munich/Matrix [50], Sherpa [26,47], Herwig ++ [32], Powheg-Box [27], Whizard [49] and Geneva The OpenLoops program itself is written in Fortran and consists of process-independent main code and process-dependent code provided in the form of process libraries, which can be downloaded and automatically installed within the OpenLoops program for a wide range of processes in the Standard Model (SM) and Higgs effective theory (HEFT), as detailed in the following. The process libraries are automatically generated based on a (private) process generator implemented in Mathematica.

Installation of the main program
This section describes the installation of the process-independent part of the OpenLoops program, which is denoted as base code. The calculation of specific scattering amplitudes requires additional process-specific libraries, denoted as process code. Their installation is discussed in Section 4.1.2.
Prerequisites To install OpenLoops a Fortran compiler (gfortran 4.6 or later, or ifort) and Python 2.7 or 3.5 or later are needed.
Download The process-independent part of the OpenLoops program is available on the Git repository gitlab.com/openloops/OpenLoops. The latest release version can be found in the master branch and downloaded via git clone https://gitlab.com/openloops/OpenLoops.git Older and newer versions are available as git tags. The latest beta version available in the branch "public_beta" that can be downloaded via where the user can also find a detailed list of the available process libraries and extra documentation, as well as an up-to-date version of this paper.
Installation The compilation of the process-independent OpenLoops library is managed by the SCons build system 26 and is easily carried out by running ./scons in the OpenLoops directory. By default, Scons utilises all available CPU cores, while running ./scons -j<n> restricts the number of employed cores to <n>. The compiled library is placed in the lib subdirectory. 27 The default compiler is gfortran, alternatively ifort can be used. To change the compiler and set various other options, rename the sample configuration file openloops.cfg.tmpl in the OpenLoops directory to openloops.cfg and set the options in there. The sample configuration file lists various available options and describes their usage.

Installation of process libraries
The calculation of scattering amplitudes for specific processes requires the installation of corresponding process libraries. The available collection of OpenLoops process libraries supports the calculation of QCD and EW corrections for a few hundred different partonic reactions, which cover essentially all interesting processes at the LHC, as well as several lepton-collider processes. This includes pp → jets, tt+jets, V +jets, V V +jets, HV +jets, H+jets and various other classes of processes with a variable number of extra jets. Process libraries for a large variety of loop-induced processes such as gg → +jets, gg → HV +jets, gg → HH(H)+jets, etc. are also available.
New processes libraries, especially with EW corrections, are continuously added to the collection by the authors. Moreover, extra processes libraries can be easily made available upon request, either through an online form at https://openloops.hepforge.org/process_library.php or by contacting the authors. In particular this allows for the generation of dedicated process libraries tailored to specific user requirements. For example, it is possible to generate dedicated process libraries with special filters for the selection of certain classes of diagrams/topologies or various approximations related to the treatment of heavy-quark flavours, the expansion in the number of colours, the selection of resonances, non-diagonal CKM matrix elements, and so on.

Download and Installation
The web page https://openloops.hepforge.org/process_library.php provides a complete list of process libraries available in the public process repository, with a description of their content and the relevant process-library names to be used for download. The needed process libraries can be downloaded and compiled via

./openloops libinstall <processes> <options>
where <processes> is either a predefined process collection (see below) or a list of white-space or comma separated names of process libraries. A single process library typically contains the full set of parton-level scattering amplitudes that is needed for the calculation of a certain family of hadron-collider processes, either at NLO QCD or including EW corrections. For instance, the libraries named ppllll and ppllll_ew include, respectively, the NLO QCD and NLO EW matrix elements for the production of four leptons, i.e. the processes pp → + Each process library includes all relevant LO and NLO ingredients for the partonic processes at hand, i.e. all Born, one-loop and real-emission amplitudes at the specified order. More precisely, NLO QCD libraries contain LO contributions of a given order α p s α q and corrections of order α p+1 s α q , while NLO EW libraries contain the full tower of LO and NLO contributions apart from the NLO terms with the highest possible order in α s . Real-emission matrix elements are available throughout, but are not installed by default. This can be changed by using the option compile_extra=1 (default=0) 27 An installation routine to move the library to a different location is currently not available.
when installing the process. This option can also be set in the openloops.cfg file in order to enable real corrections for every process installation.
With the libinstall command it is also possible to install pre-defined or user-defined process collections. The pre-defined collection lhc.coll covers the most relevant LHC processes. 28 In particular, it includes matrix elements for V +jets, V V +jets, tt+jets, HV +jets and H+jets (for finite and infinite m t ), where V stands for photons as well as for the various leptonic decay products of off-shell Z and W ± bosons. Additional user-defined collections can be created as plain text files with the file extension .coll, listing the desired process-library names, one per line.
Updates When a new version of OpenLoops is available, it is recommended to update both the base code and the process code. 29 If OpenLoops was installed from Git, this is easily achieved by running ./openloops update while git pull && ./scons would update only the base code. Instead, if OpenLoops was not installed from Git, the installed processes can be updated by running ./openloops update --processes while the base code should be updated manually.

Selection of processes and perturbative orders
The OpenLoops program supports the calculation of scattering probability densities for a variety of processes at different orders in α s and α. Before starting the calculations, the user should register all needed scattering amplitudes, which are automatically labelled with integer identifiers for the bookkeeping of the various partonic channels and perturbative orders. As described in detail below, each desired matrix element should be registered in two steps. First, the user should select the desired order in the QCD and EW couplings, model parameters and specify possible approximations. In the second step, called process registration, the user should specify the list of external scattering particles, and select one of the available types of perturbative contributions. The three possible types, denoted in the following as amplitude types (amptype), are specified in Tab. 4 together with the list of corresponding objects of LO and NLO kind that can be evaluated in OpenLoops. As explained in the following, the classification into LO and NLO kinds is relevant for the selection of the desired order in α s and α. Note that squared-loop objects are classified as LO quantities, since they are assumed to describe loop-induced processes.
Selection of QCD and EW power As discussed in Section 3.1, the general form of scattering probability densities in the SM is a tower of terms of order α p s α q with fixed perturbative order p + q but variable powers p, q in the QCD and EW couplings. In OpenLoops, contributions with different orders in α s and α should be registered as separate (sub)processes. Under each amptype, the various objects that can be calculated are classified into output of LO and NLO kind as indicated in Tab. 4. All objects of LO type are evaluated at a certain power α p s α q , while all NLO objects are evaluated at a related power α P s α Q . The desired powers p, q, P, Q, and the relation between (p, q) 28 The collection all.coll makes it possible to download the full set of available processes libraries at once. However, due to the large overall number of processes and the presence of several complex processes, this is requires a very large amount of disk space and very long CPU time for compilation. Thus all.coll should not be used for standard applications. 29 In general, base code and process code can be combined in a rather flexible way, but care must be taken that they remain mutually consistent. The API compatibility between base code and process code is typically guaranteed across many sub-versions, both in the forward and backward directions. To this end, all mutually consistent versions are labelled with the same (internal) API version number, and OpenLoops accepts to use only combinations of process code and base code that belong to the same API version.  Table 4: Values of amptype to register different types of perturbative contributions and corresponding probability densities that can be computed by OpenLoops. Objects of LO and NLO kind are evaluated at order α p s α q and α P s α Q , respectively, according to the values p, q, P, Q of the LO and NLO power selectors in Tab (b) Similarly, order_qcd = p selects a fixed QCD order, i.e. LO terms of O(α p s α q ) and NLO EW corrections of O(α p s α q+1 ). In this case, q is automatically derived from p + q = N p − 2.
(c) Alternatively, NLO terms of O(α P s α Q ) can be selected by setting loop_order_qcd = P or loop_order_ew = Q. This option is supported only for the evaluation of tree-loop interferences (amptype=11). In that case, the output includes also the dominant underlying Born contribution of O(α p s α q ), which is chosen between O(α P s α Q−1 ) and O(α P s α Q−1 ) as indicated in Fig. 4. When the loop order P or Q is specified, the complementary order Q or P is fixed internally according to P + Q = N p − 1.
The desired order parameter should be set through the set_parameter routine before the registration of the process at hand. As explained above, it is sufficient to specify the QCD or the EW order, and only one of the order selectors in Tab. 5 should be used. If more than one order parameter is set by the user only the last setting before registration is considered.
Before registering a process, also various approximations can be specified by setting OpenLoops parameters such as nf, to control the number of active quarks, ckmorder, to activate non-diagonal CKM matrix elements, etc. A list of such parameters can be found in Tab. 9 (see Appendix C).
Process registration Each (sub)process should be registered by means of the native interface function 30 register_process, which automatically assigns a unique process identifier, as detailed in Appendix A.3. The syntax to specify the external particles of a generic n → m scattering process with n ≥ 1 is The particle identifier (PID) can be specified either using the PDG numbering scheme [70] or the string identifiers listed in Tab. 6.
Together with the external particles, also a specific type of perturbative output (amptype) should be selected. As summarised in Tab. 4, the available options correspond to the various scattering probability densities defined in (2.1)-(2.3), i.e. squared tree amplitude (W 00 ) tree-loop interference (W 01 ), and squared one-loop amplitude (W 11 ), but each amptype supports also the calculation of various related objects. Table 5: Selection of the orders α p s α q and α P s α Q for the LO and NLO objects defined in Tab. 4. Each selector takes one of the powers p, q, P, Q as input and derives all other powers as indicated in columns 2-5. The QCD and EW coupling powers at LO and NLO are related through p+q = N p −2 and P +Q = N p −1, where N p is the number of external particles. The loop_order selectors are supported only for amptype=11. They return the desired loop-tree interference of O(α P s α Q ) together with the dominant underlying squared Born term of O(α p s α q ) whose powers, (p, q) = (p Born , q Born ) = (P − 1, Q) or (P, Q − 1), are selected in a unique way as indicated in Fig. 4.  Table 6: Particle identifiers (PID) for process specification in OpenLoops. The numerical and string PID representations can be mixed. As explained in Section 3.2, for an optimal treatment of the coupling of on-shell and off-shell hard external photons the special PIDs ±2002 should be used.

Evaluation of scattering amplitudes
In this section we introduce various OpenLoops interface functions for the evaluation of the scattering probability densities ( where ε = ε UV = ε IR . In general, the W (1) residues receive contributions from IR and UV divergences, but UV-renormalised results contain only IR poles. By default, the normalisation factor C is defined as in (3.36), which corresponds to the BLHA convention [45]. Alternatively, by setting polenorm=1 (default=0) it can be changed into 31  The I-operator can be activated by setting iop_on=1 (default=0). The counterterm and the R 2 contributions can be deactivated by setting, respectively, ct_on=0 (default=1) and r2_on=0 (de-fault=1 where the IR and UV poles are replaced by numerical constants 32 (C /ε 2 IR → ∆ 2 , C /ε IR → ∆ IR,1 , C /ε UV → ∆ UV,1 ) and W can be reconstructed through three evaluations of (4.5) with different ∆ i values. However, the most efficient approach it to restrict the calculation of the most CPU expensive objects to their finite parts by setting all ∆ i = 0 (default), and to reconstruct the poles by exploiting the fact that UV and IR subtracted results are finite. In practice, when the I-operator is active, all poles are simply set to zero in (4.4), and only finite parts are computed. Also when the I-operator is switched off in (4.4), only the finite part of the right-hand-side of (4.4) is explicitly computed, while IR poles are reconstructed from the I-operator, i.e.
The explicit calculation of all poles in W 01 through multiple evaluations of (4.5) can be enforced by setting truepoles_on=1 (default=0). Thus, the correct cancellation of UV and IR poles can be explicitly checked by calling evaluate_loop with truepoles_on=1 and iop_on=1.
The individual building blocks of W 01 can be evaluated by various dedicated interfaces: (i) The bare loop amplitudes W 01,4D , with four-dimensional numerator, are evaluated by evaluate_loopbare, which returns a Laurent series similar to (4.2). As for evaluate_loop, pole residues are derived from the related UV and IR counterterms (default) or explicitly reconstructed, depending on the value of truepoles_on.
(ii) The UV counterterms W 01,CT are evaluated by evaluate_loopct, which returns a Laurent series similar to (4.2). In this case, UV pole coefficients are always obtained via two-fold evaluation. The more efficient function evaluate_ct restricts the calculation of the counterterm to its finite part W (0) 01,CT .
(iii) The R 2 rational part W 01,R 2 is free from UV and IR divergences. It is evaluated by evaluate_r2, which returns a single-valued output.
(v) The poles of all divergent building blocks of (4.4) can be accessed with a single call of evaluate_poles, which returns the residues of the 1/ε UV , 1/ε IR and 1/ε 2 IR poles for each building block. In this case, irrespectively of the value of truepoles_on, all residues are always computed explicitly.
Note that, for efficiency reasons, the combination (4.4) should always be computed via a call of evaluate_loop rather than separate calls for its building blocks.
Squared loop amplitudes W 11 = M 1 |M 1 are evaluated by the function evaluate_loop2. Since we assume that it is used for loop-squared processes, which are free from UV and IR divergences at LO, evaluate_loop2 returns a single-valued finite output. The calculation of I-operator insertions in loop-squared amplitudes, W 11,I-op = M 1 |I({p}; ε IR )|M 1 , is supported by evaluate_loop2iop. Since we assume loop-induced processes, the output is a Laurent series of type (4.2) with poles up to order 1/ε 2 . In general, W 11 and W 11,I-op are evaluated using only the finite part of M 1 , and possible UV and IR poles are simply amputated at the level of M 1 .
Efficient QCD scale variations OpenLoops 2 implements a new automated system for the efficient assessment of QCD scale uncertainties. This system is designed for the case where scattering amplitudes are re-evaluated multiple times with different values of µ R and α s , while all other input and kinematic parameters are kept fixed. This type of variations are automatically detected by keeping track, on a process-by process basis, of the pre-evaluated phase-space points, and possible variations of parameters. For each new phase-space point, matrix elements are computed from scratch and stored in a cache, which is used for (µ R , α s ) variations. In that case, the previously computed bare amplitude is reused upon appropriate rescaling of α s , and only the µ R -dependent QCD counterterms are explicitly recomputed. This mechanism is implemented for both types of loop contributions (2.2)-(2.3).

Colour-and spin-correlators
This section presents interface functions for the evaluation of colour-and helicity-correlated quantities that are needed in the context of NLO and NNLO subtraction methods, both for tree-and loop-induced processes. For efficiency reasons, colour/spin correlations are always computed in combination with the related squared tree or loop matrix elements, in such a way that the former are obtained with a minimal CPU overhead.
Colour and charge correlators The exchange of soft gluons/photons between two external legs, j and k, gives rise to colour/charge correlations of the form where T a i and Q i denote SU(3) and charge operators acting on the i-th external particle. 33 Tree-tree correlators correspond to LL = 00 in (4.7)-(4.8) and can be evaluated by the interface functions evaluate_ccmatrix and evaluate_ccewmatrix, which return the full matrices C (p,q|jk) 00 as twodimensional arrays. Alternatively, the N (N − 1)/2 independent colour correlators in (4.7) can be obtained in the form of one-dimensional arrays using evaluate_cc. Loop-loop correlators (LL = 11) can be evaluated in a similar way using the functions evaluate_ccmatrix2, evaluate_ccewmatrix2 and evaluate_cc2.
In amptype = 11 mode, also the tree-loop colour correlators Spin-colour correlators The emission of soft-collinear radiation off external gluons/photons generates also spin-correlation effects. For their description we use the notation where M is a generic helicity amplitude, and j is a gluon or photon emitter with helicity λ. The helicity states of all other external particles are kept implicit. With this notation, unpolarised squared matrix elements can be expressed as where the normalisation conventions of Eqs. (2.1)-(2.3) are implicitly understood. Spin-correlation effects arise as terms of type M|P j |M with spin correlators of the form P j = P µν j |µ, j ν, j| . They can be evaluated in a convenient way in terms of the spin-correlation tensor which allows one to write (4.14) Alternatively, spin correlations can be implemented in a more efficient way by exploiting the fact that, in NLO calculations, they arise only through operators of the form where k µ ⊥ is a certain vector 34 with k ⊥ · p j = 0. Since M|G j |M = − M|M , all non-trivial spin-correlation effects can be encoded into the scalar quantity where k ⊥ , j|M corresponds to the helicity amplitude (4.10) with ε µ λ (p j ) replaced by k µ ⊥ . In NLO calculations, spin correlations arise in combination with colour correlations through operators of the type T a j T a k |k ⊥ , j k ⊥ , j|, where j and k are called emitter and spectator. In OpenLoops, such spin-colour correlators are implemented in the form is available via the functions evaluate_stensor (for LL = 00) and evaluate_stensor2 (for LL = 11). All implementations (4.17)-(4.19) are well suited for the subtraction of IR singularities with the Catani-Seymour [39,40] and FKS [75] methods. The tensor representations (4.18)-(4.19) are more general, while the scalar form (4.17) is more efficient, but should be used only if k ⊥ · p j = 0 is fulfilled. 35 In amptype = 11 mode, also the tree-loop spin correlators  are available. They are evaluated by the functions evaluate_loopsc, evaluate_loopsctensor and evaluate_loopsctensor respectively, which return only the finite part, similarly as for (4.9).

Tree-level amplitudes in colour space
Besides calculating squared matrix elements, OpenLoops also provides full tree-level colour information at the amplitude level. Such information is relevant in the context of parton-shower matching in order to determine the probabilities with which a parton shower should start from a specific colour configuration. Moreover it can be used to determine colour correlations with more than two colour insertions, as needed within NNLO subtraction schemes. 35 OpenLoops automatically amputates possible non-orthogonal parts of k ⊥ by projecting k µ ⊥ onto ε µ ± (pj).
Colour vector As indicated in (2.7), any tree-level amplitude is represented as a vector {A For a process with n external gluons and m external qq pairs, each element of the basis has the general colour structure where the particle labels α k , β k , σ k , and the corresponding colour indices i α k ,j β k , a σ k , are attributed according to the labelling scheme defined in Tab. 7.
Trace basis In OpenLoops the colour basis is chosen as a so-called trace basis, where each basis element (4.24) is a product of chains of fundamental generators and traces thereof. More precisely, each basis element is a product of building blocks of type L(k, . . . , l, β, α) = (T a k · · · T a l )j β iα , (4.26) L(k, . . . , l) = Tr (T a k · · · T a l ) . (4.27) As indicated on the lhs of the above equations, each building block is uniquely identified through a sequence of integer particle labels. Sequences terminating with gluon labels and antiquark-quark labels correspond, respectively, to traces (4. i.e. the individual sequences are concatenated using zeros as separators. With this notation each element of the colour basis can be encoded as an array of integers. For instance, for qq → γqqZggg (see Tab. 7) we have L (8, 2, 5, 0, 7, 9, 0, 4, 1) = (T a 8 )j 2 i 5 Tr(T a 7 T a 9 ) δj 4 i 1 . (4.29) The explicit colour basis for a given scattering process can be accessed through the interface functions tree_colbasis_dim and tree_colbasis. The former yields the number of elements of the basis, as well as the number of helicity configurations, while tree_colbasis returns the basis vectors in a format corresponding to (4.25)-(4.28). The complex-valued colour vector {A (i) 0 (h)} in (4.23) can be obtained through the function evaluate_tree_colvect. Using {A (i) 0 (h)} it is possible to calculate the LO probability density (2.1) as where K ij is the colour-interference matrix defined in (2.8).
Colour-flow basis For the purpose of parton-shower matching in leading-colour approximation, it is more convenient to use the colour-flow representation [76,77], where gluon fields are handled as 3 × 3 matrices (A µ )j i = 1 √ 2 A a µ (T a )j i , and the colour structures of tree amplitudes with m external quark-antiquark pairs and n external gluons take the form 31) with N = m + n. The elements of the colour-flow basis have the form where α k →α k = π(α k ) is a permutation of the quark particle labels, which encodes the colour connections between antiquarks (β k ) and quarks (α k ) in (4.32).
A basis element of the form (4.32) is represented by an array of N p integer pairs defined as (α k , 0) for an incoming quark (outgoing anti-quark) with particle label α k , (0,α k ) for an incoming anti-quark (outgoing quark) with particle label β k , (α k ,α k ) for a gluon with particle label α k , (0, 0) for an uncoloured particle. (4.33) The pairs are ordered according to the sequence of scattering particles as registered by the user. Each non-zero index will appear twice, indicating which particles are colour connected.
In leading-colour approximation, the trace and colour-flow bases are related through the identities iα + sub-leading colour, which imply a one-to-one correspondence between the elements of the two bases, i.e.
Squared colour vector In leading-colour approximation, the colour-correlation matrices in the trace and colour-flow basis are equivalent to each other and proportional to the identity matrix, where n and m are defined as above. Thus the LO probability density (4.30) can be written as This squared colour vector can be evaluated through the interface function evaluate_tree_colvect2.
Since each component of (4.38) is associated with a given colour flow according to (4.35), in the context of parton-shower matching the ratio can be used as the probability with which the shower starts from the colour-flow configuration C flow i .
The explicit form of the colour-flow basis for a given process can be accessed through the interface function tree_colourflow, which returns an array of basis elements {C flow i } in a format corresponding to (4.33).
The interface functions described in this section are supported under amptype=1,11. So far they are implemented in a way that guarantees consistent results only for leading-QCD Born quantities, i.e. terms of order α p s α q with maximal power p, which involve a single Born term of order g p s e q .

Reduction methods and stability system
As discussed in Section 2.7, tree-loop interferences and squared loop amplitudes are computed using different methods for the reduction to scalar integrals and the treatment of related instabilities.
For all types of amplitudes, OpenLoops chooses default settings for the stability system that require adjustments only in very rare cases.
On-the-fly stability system For tree-loop interferences, with the only exception of the Higgs Effective Field Theory, the reduction to scalar integrals is based on the on-the-fly method and the stability system described in Section 2.7.2. Each processed object carries a cumulative instability estimator 37 that is propagated through the algorithm and updated when necessary. If the estimated instability exceeds a threshold value, the object at hand and all subsequent operations connected to it are processed through the qp channel. The stability threshold is controlled by the interface parameter hp_loopacc, which plays the role of target numerical accuracy for the whole Born-loop interference W 01 . Its default value is 8 and corresponds to δW 01 /W 01 ∼ 10 −8 .
In order to find an optimal balance between CPU performance and numerical accuracy, certain aspects of the stability system can be activated or deactivated using the parameter hp_mode. Setting hp_mode=1 (default) enables all stability improvements described in Section 2.7.2 and is recommended for NLO calculations with hard kinematics. Setting hp_mode=2 activates qp also for additional types of rank-two Gram-determinant instabilities that occur exclusively in IR regions. This mode is supported only for QCD corrections and is recommended for real-virtual NNLO calculations. Finally, hp_mode=0 deactivates the usage of qp through the hybrid-precision system, while keeping all stability improvements of analytic type in dp.
Stability rescue system For tree-loop interferences in the Higgs Effective Field Theory, the reduction to scalar integrals is based on external libraries. The primary reduction library redlib1 (default: Coli-Collier) is used to evaluate all points in dp. The fraction stability_triggerratio (default: 0.2, meaning 20 %) of the points with the largest K-factor is re-evaluated with the secondary reduction library redlib2 (default: DD-Collier). If the relative deviation of the two results exceeds stability_unstable (default: 0.01, meaning 1 %), the point is re-evaluated in qp with CutTools including a qp scaling test to estimate the resulting accuracy. If the estimated relative accuracy δW 01 /W 01 in qp is less than stability_kill (default: 1, meaning 100 %), the result is set to zero, otherwise the smaller of the scaled and unscaled qp results is returned. The accuracy argument of the matrix element routines (e.g. evaluate_loop) returns the relative deviation of the Coli-Collier and DD-Collier results or, if qp was triggered, of the scaled and unscaled qp result. In case of a single dp evaluation, the accuracy argument is set to −1.
Also squared loop amplitudes are reduced to scalar integrals using external libraries. To asses related instabilities, for all phase-space points the reduction is carried out twice, using redlib1 and redlib2. The option stability_kill2 (default: 10) sets the relative deviation of the two results beyond which the result is set to zero. Due to the double evaluation of all points, an accuracy estimate is always returned by the matrix element routine evaluate_loop2.
Setting redlib1 and redlib2, as well as various other options to control the stability system, is only possible in the so-called "expert mode". Further details can be obtained from the authors upon request.

Technical benchmarks
In this section we present speed and stability benchmarks obtained with OpenLoops 2 and compare them with the performance of OpenLoops 1.

CPU performance
The speed at which one-loop matrix elements are evaluated plays a key role for the feasibility and efficiency of non-trivial NLO Monte Carlo simulations. In Tab. 8 we present CPU timings for the calculation of one-loop QCD and EW corrections for several processes of interest at the LHC. Specifically, we consider the production of single W bosons, W + W − pairs and tt pairs in association with a variable number of additional gluons and quarks. For W production we consider final states with on-shell bosons and, alternatively, off-shell ν decay products.
The observed timings are roughly proportional to the number of one-loop Feynman diagrams, which ranges from O(10) for the simplest 2 → 2 processes to O(10 5 ) for the most complex 2 → 5 processes. Absolute timings correspond to OpenLoops 2 with default settings, i.e. with all stability improvements in dp plus the hybrid-precision system with a target accuracy of 8 digits. Augmenting the target accuracy to 11 digits causes a CPU overhead of 1% to 50%, depending on the process, while we have checked that switching off hybrid precision (hp_mode=0) yields only a speed-up of order one percent.
Comparing QCD to EW corrections, for processes without leptonic weak-boson decays we observe timings of the same order. More precisely, the QCD (EW) corrections tend to be comparatively more expensive in the presence of more external gluons (weak bosons). In contrast, in processes with off-shell weak bosons decaying into leptons EW corrections are drastically more expensive than QCD corrections. This is due to the fact that, for each off-shell W/Z decay to leptons, at NLO EW the maximum number of loop propagators increases by one, while at NLO QCD it remains unchanged. Due to Yukawa interactions, also the presence of massive quarks tends to increase the CPU cost of EW corrections.
Timings of OpenLoops 2 are compared against OpenLoops 1 with recommended stability settings (preset=2, preset is deprecated in OpenLoops 2) and, alternatively, with the stability rescue system switched off ("no stab") in OpenLoops 1. The difference reflects the cost of stability checks in OpenLoops 1, which is significantly higher than in OpenLoops 2. Note that this cost depends very strongly on the kinematics of the considered phase-space sample, and the values reported in Tab. 8 should be understood as a lower bound.
Apart from few exceptions, OpenLoops 2 is similarly fast or significantly faster than OpenLoops 1.  illustrates the CPU overhead caused by augmenting the hybrid-precision target accuracy from 8 to 11 digits. Default OpenLoops 1 timings (t preset2 OL1 ) correspond to the recommended stability setting (preset=2), where tensor reduction is done with Coli-Collier and compared against DD-Collier for 20% of the points with the largest K-factor; differences beyond one percent between Coli-Collier and DD-Collier trigger qp re-evaluations with Cut-Tools +OneLOop and a further stability test via qp-rescaling. For comparison, also OpenLoops 1 timings with disabled stability system (t no stab OL1 ) are shown within parentheses.
In particular, for the most complex and time consuming processes the new on-the-fly approach yields speed-up factors between two and three.

Numerical stability
As discussed in Section 2.7, the stability of one-loop amplitudes in exceptional phase-space regions is of crucial importance for challenging multi-particle and multi-scale NLO calculations, as well as for NNLO applications. In the following we present OpenLoops 2 stability benchmarks for NLO QCD and NLO EW virtual corrections. The level of numerical stability is quantified by comparing output in double (dp) or hybrid (hp) precision (W dp/hp 01 ) against quadruple-precision (qp) benchmarks (W qp 01 ). The latter are obtained using OpenLoops 2 in combination with the OneLOop library for scalar integrals. More precisely, we define the numerical instability of a certain result W X 01 as The numerical stability of OpenLoops 2 in the hard regions is illustrated in Fig. 5 for two nontrivial 2 → 4 processes at NLO QCD and NLO EW. The plots correspond to 10 6 homogeneously distributed Rambo points at √ s = 1 TeV with p i,T > 50 GeV and ∆R ij > 0.5 for all massless final-state particles. As demonstrated by the reference qp curve, running OpenLoops 2 in pure qp makes it possible to produce one-loop results with up to 32 stable digits. Such high-precision qp benchmarks can be obtained as a by-product of the hybrid-precision system and allow one to quantify the level of stability with better than 16-digit resolution in the full phase space. The results of OpenLoops 1 with CutTools in dp illustrate the impact of Gram-determinant instabilities, which result in a probability of one percent of finding less than two stable digits in gg → ttgg. 38 Using Collier reduces this probability by 3-4 orders of magnitudes, while OpenLoops 2 with one-the-fly reduction and hp-system leads to a further dramatic suppression of instabilities by four orders of magnitude, which corresponds to five extra stable digits. The effect of hybrid-precision alone corresponds to about two digits or, equivalently, a factor 100 suppression of the tail. The EW corrections toūu → e + e − µ + µ − feature a qualitatively similar behaviour but a generally lower level of instability, which is most likely a consequence of the lower tensor rank.
Example stability benchmarks relevant for 2 → 2 calculations at NNLO are shown in Fig. 6 for the case of the real-virtual QCD corrections to tt and W + W − hadron production. The instability A is estimated using a sequence of gg → ttg and uū → W + W − g samples with increasing degree of softness and collinearity, defined as Here Q denotes the center-of-mass energy, E j is the energy of the soft particle, and θ ij is the angle of a certain collinear branching. The parameters ξ soft/coll are defined in such a way that the denominators of soft and collinear enhanced propagators scale like (p i + p j ) 2 ∝ ξ soft/coll Q 2 . In practice, starting from a sample of 10 4 hard 2 → 2 events with Q = 1 TeV, we have supplemented each event by an additional soft or collinear emission with ξ soft/coll = 10 −1 , 10 −2 , . . . , 10 −9 .
In Fig. 6 the average level of instability and its spread are plotted versus ξ coll in gg → tt and ξ soft in uū → W + W − g. The stability of qp benchmarks is again very high in the whole phase space. In the deep IR regions numerical instabilities grow at a speed that depends on the process, the type of region (soft/collinear), and the employed method. For initial-state collinear radiation in gg → ttg, OL1+CutTools dp OL1+Collier dp OL2 dp OL2 hp default OL2 hp 11 digits OL1+CutTools dp OL1+Collier dp OL2 dp OL2 hp default OL2 hp 11 digits OL2 qp Figure 5: Probability of finding an instability A > A min as a function of A min in a sample of 10 6 events for gg → ttgg at NLO QCD (upper plot) andūu → e + e − µ + µ − at NLO EW (lower plot). The stability of quad-precision benchmarks (blue) is compared to different variants of the OpenLoops 2 on-the-fly reduction (green, black, red) and to the OpenLoops 1 algorithm interfaced with Collier (yellow) or CutTools (turquoise). For OpenLoops 2, besides default stability settings (black) we show the effect of increasing the hybrid-precision target from 8 to 11 digits (hp_loopacc=11, red), or disabling the hybrid precision system (hp_mode=0, green). The OpenLoops 1 curves correspond to the level of stability that is obtained in dp without full re-evaluations of unstable points in qp. OL1+CutTools dp OL1+Collier dp OL2 hp mode 2 OL2 qp OL1+CutTools dp OL1+Collier dp OL2 hp mode 2 OL2 qp Figure 6: Relative numerical accuracy A for gg → ttg (upper plot) and uū → W + W − g (lower plot) at NLO QCD versus the degree of collinear (ξ coll ) or soft singularity (ξ soft ) as defined in (5.2). For each value of ξ coll/soft the numerical accuracy is estimated with a sample of 10 4 randomly distributed underlying 2 → 2 hard events. The plotted central points and variation bands correspond, respectively, to the average and 99.9% confidence interval of A. Quad-precision benchmarks (blue) are compared to OpenLoops 2 with additional hybrid-precision improvements for IR regions (hp_mode=2, red) and also to OpenLoops 1 with Collier (yellow) or CutTools (turquoise) in dp.
CutTools loses three digits per order of magnitude in ξ coll , resulting in huge average instabilities of O(10 10 ) in the deep unresolved regime. Using the Collier library in dp we observe a more favourable scaling, with losses of only one digit per order of magnitude in ξ coll , and an average of three stable digits in the tail. Thanks to the hybrid-precision system, the level of stability of OpenLoops 2 is even much higher. It stays always above 10 digits and is roughly independent of ξ coll . For soft radiation in uū → W + W − g, apart from the fact that numerical instabilities are generally milder, the various tools behave in a qualitatively similar way.
Similar tests of the OpenLoops 2 stability system as the ones presented here have been carried out for various 2 → 3, 4, 5 hard processes and 2 → 3 processes with an unresolved parton, finding similar stability curves as shown here, and not a single fully unstable result, i.e. one with zero correct digits. A more comprehensive study on numerical instabilities will be presented in a follow-up paper [67].

Summary and conclusions
We have presented OpenLoops 2, the latest version of the OpenLoops tree and one-loop amplitude provider based on the open-loop recursion. This new version introduces two significant novelties highly relevant for state-of-the art precision simulations at high-energy colliders. First, the original algorithm has been extended to provide one-loop amplitudes in the full SM, i.e. including, besides QCD corrections, also EW corrections from gauge, Higgs and Yukawa interactions. The inclusion of EW corrections becomes mandatory for the control of cross sections at the percent level, and even more importantly in the tails of distributions at energies well above the EW scale. Second, the original algorithm has been extended to include the recently proposed on-the-fly reduction method, which supersedes the usage of external reduction libraries for the calculation of tree-loop interferences. In this approach, loop amplitudes are constructed in a way that avoids high tensorial rank at all stages of the calculation, thereby preserving and often ameliorating (by up to a factor of three) the excellent CPU performance of OpenLoops 1. The on-the-fly reduction algorithm has opened the door to a series of new techniques that have reduced the level of numerical instabilities in exceptional phase-space regions by up to four orders of magnitude. These speed and stability improvements are especially significant for challenging multi-leg NLO calculations and for realvirtual contributions in NNLO computations.
In this paper we have presented the algorithms implemented in OpenLoops 2 for the calculation of squared tree, tree-loop interference and squared loop amplitudes. This entails a summary of the on-the-fly reduction method [33] and its stability system, which automatically identifies and cures numerical instabilities in exceptional phase-space regions. This is achieved by means of Gramdeterminant expansions and other analytic methods in combination with a hybrid double-quadruple precision system. The latter ensures an unprecedented level of numerical stability, while making use of quadruple precision only for very small parts of the amplitude construction. Details of these stability improvements and hybrid precision system will be presented in an upcoming publication [67].
In the context of the extension to calculations in the full SM, we presented a systematic discussion of the bookkeeping of QCD-EW interferences and sub-leading one-loop contributions, which are relevant for processes with multiple final-state jets. We also detailed the input parameter schemes and one-loop O(α s ) and O(α) renormalisation as implemented in OpenLoops 2. Here we emphasised crucial details in the implementation of the complex-mass scheme for the description of off-shell unstable particles. The flexible implementation of the complex-mass scheme in OpenLoops 2 is applicable to processes with both on-shell and off-shell unstable particles at NLO. We also introduced a special treatment of processes with external photons, handling photons of on-shell and off-shell type in different ways, which is inherently required by the cancellation of fermion-mass singularities associated with the photon propagator and with collinear splitting processes.
While this manuscript as a whole provides detailed documentation of the algorithms implemented in OpenLoops 2, Section 4 together with Appendix A can be used as a manual, both in order to use OpenLoops 2 as a standalone program or to interface it to any Monte Carlo framework.
Calculations at NLO and beyond require, besides squared amplitude information, also spin and colour correlators for the construction of infrared subtraction terms. To this end we documented the available correlators and conventions available in OpenLoops 2, which comprise tree-tree and loop-loop correlators as well as tree-loop correlators. The former are necessary for the construction of NLO subtraction terms for standard and loop-induced processes. The latter are necessary in NNLO subtraction schemes. Furthermore, conventions and interfaces for the extraction of full tree amplitude vectors in colour space are given. These are necessary ingredients for parton shower matching at NLO.
The new functionalities of OpenLoops 2 and their future improvement will open the door to a wide range of new precision calculations in the High-Luminosity era of the LHC.

A Native Fortran and C/C++ interfaces
OpenLoops can easily be integrated into Monte Carlo tools via its native interfaces in Fortran and C or via the BLHA interface [45,46]. The C interface can of course be used from C ++ as well. We recommend to use the native interface, because it is easier to use, provides more functionality and does not require exchanging files between the tools. In this Appendix we present the various functionalities of the native OpenLoops interface. In doing so we will always refer to the names of the relevant Fortran interface functions. The corresponding C functions are named in the same way with an extra ol_ prefix. In Appendix A.1 we detail necessary modules to be loaded (Fortran) and required header files (C/C ++ ) together with conventions for the format of phasespace points for the evaluation of scattering amplitudes. In Appendix A.2 the setting of parameters is discussed and in Appendix A.3 the registration of processes. In Sections A.4-A.8 we detail the various interfaces for the evaluation of squared scattering amplitudes, amplitude correlators and amplitude colour vectors. Finally, in Appendix A.9 we give a basic example for the usage of the native OpenLoops interface in Fortran and C.
The implementation of the BLHA interface and the usage of OpenLoops together with Sherpa and Powheg-Box are discussed in Appendix B.

A.1 Generalities
Fortran In order to use the native Fortran interface, the module openloops must be included with The module files are located in the directory lib_src/openloops/mod, which should be added to the include path of the Fortran compiler.
Floating point numbers used in the interface are in double precision, denoted here by the kind type dp which can be obtained as follows: integer , parameter : : dp = selected_real_kind ( 1 5 ) Phase space points p_ex are passed as two-dimensional arrays declared as r e a l ( dp ) : : p_ex ( 0 : 3 ,N) Here and in the following N stands for the number of incoming plus outgoing external particles of the considered process. External particles are numbered from 1 to N and are interpreted as incoming or outgoing according to the process registration. See (4.1) and below. The entries p_ex(i,K) correspond to the energy (i=0) and the three physical momentum components (i=1,2,3) of particle K in GeV units. The fifth component is currently not used within OpenLoops.

A.2 Parameter setting
In order to set the OpenLoops parameter with name key to the value val, call where TYPE is integer, real(dp) or character(*) depending on the type of the parameter. It is possible to set parameters of integer or real(dp) type by passing the value in string representation. The error code err will be zero on success.
In C, the function to set a parameter depends on the parameter type: right after setting a parameter. A return value of 0 means that no error occured in the preceeding call.
With the default settings, the program will terminate in case of an error. This can be changed by adjusting the warning level using the function where level=0 means that errors are silently ignored, level=1 means that a warning message is printed, and level=2 (default) means that the program will be terminated on error.
The current value of a parameter can be retrieved by calling For an empty file name, i.e. file=" ", the output is written to stdout.

A.3 Process registration
As detailed in Section 4.2 before evaluation a process has to be registered. This proceeds via . . PID f,m " for a n → m process, where the various particle identifiers (PID) are enetered in either of the two particle labelling schemes specified in Tab. 6. Alternatively, 2 → N − 2 processes can be registered by entering process as an array of integers of length N , where the first two entries are interpreted as initial-state particles. Additionally the amplitude type amptype has to be passed as argument.
For the possible values of amptype see Tab. 4. The function register_process returns the process ID to be used in the routines to evaluate matrix elements, where it is denoted as id.
In the corresponding C interface for process registration the process can only be passed as a string. Again, the process ID is returned.
When all processes are registered the following function must be called before calculating matrix elements.
Fortran subroutine s t a r t ( )

s t a r t ( ) ;
When the calculation is finished, i.e. no more matrix elements will be calculated, the following function should be called. While these calls are not strictly necessary, if log files are used, the files may not be updated at the end of the run and therefore lack information. Additionally, dynamically allocated memory will be deallocated upon the finish call.

A.4 Scattering amplitudes
The following interface functions evaluate the scattering probability densities (2.1)-(2.3) and their building blocks described in Section 4.3. The required inputs are the integer identifier id of the desired process and the phase-space point p_ex (Fortran) / pp (C ++ ), as defined in A.1.
Tree-level amplitudes The function evaluate_tree evaluates the tree-tree probability density (2.1) returning m2l0=W 00 as output.
Fortran subroutine e v a l u a t e _ t r e e ( id , p_ex , m2l0 ) integer , intent ( in ) : : i d r e a l ( dp ) , intent ( in ) : : p_ex ( 4 ,N) r e a l ( dp ) , intent ( out ) : : m2l0 C/C ++ void o l _ e v a l u a t e _ t r e e ( int id , const double * pp , double * m2l0 ) ; One-loop NLO amplitudes The function evaluate_loop evaluates the UV renormalised Bornone-loop interference (2.2) returning m2l0=W 00 and m2l1={W 01 } as output. The three values in m2l1 represent the finite part, and the coefficients of the IR single and double poles 39 of the Born-one-loop interference, as defined in Eq. (4.2). Together with the one-loop amplitude an accuracy estimate is returned (depending on the employed stability system) as acc with acc=-1 in case no stability estimate is available. When available, acc quantifies the relative accuracy δW  integer , intent ( in ) : : i d r e a l ( dp ) , intent ( in ) : : p_ex ( 4 ,N) r e a l ( dp ) , intent ( out ) : : m2l0 r e a l ( dp ) , intent ( out ) : : m2l1 ( 0 : 2 ) r e a l ( dp ) , intent ( out ) : : a c c 01,4D } and an accuracy estimate (see above) acc as output. The three values in m2l1bare represent the finite part and the coefficients of the (UV and IR) single and the double poles. 40 Fortran subroutine e v a l u a t e _ l o o p b a r e ( id , p_ex , m2l0 , m2l1bare , a c c ) integer , intent ( in ) : : i d r e a l ( dp ) , intent ( in ) : : p_ex ( 4 ,N) r e a l ( dp ) , intent ( out ) : : m2l0 r e a l ( dp ) , intent ( out ) : : m2l1bare ( 0 : 2 ) r e a l ( dp ) , intent (  For performance reasons we also provide the function evaluate_ct, which evaluates only the finite part of the UV counterterm, defined in (4.4), returning m2ct0=W (0) 01,CT and m2l0=W 00 as output.

A.8 Colour basis and tree amplitudes in colour space
Besides calculating squared and colour-summed matrix elements, OpenLoops also provides treelevel amplitudes with full colour information, see Section 4.5, required for the matching of parton showers to matrix elements. In the following we describes how to retrieve the colour basis used for a process and the amplitude as a vector in the colour space which is spanned by these basis elements.
Dimension of colour basis and number of helicities The colour basis elements are encoded as integer arrays and must be retrieved once for each process. First one must obtain the following information: • ncolb: the number of basis elements, • colelemsz: the size of the longest basis element, • nheltot: the total number of helicity configurations (including vanishing configurations).
These are returned by the function tree_colbasis_dim for a given process. Tree amplitudes in colour space Now, the function evaluate_tree_colvect returns the (complex) tree-level amplitude amp={A (i) 0 (h)}, defined in (4.23), as a vector in the colour space spanned by the colour basis elements for each of the nhelnonv non-vanishing helicity configurations, which may be smaller than the total number of helicity configurations nheltot returned by tree_colbasis_dim(). In Fortran amp(:,h) for h=1..nhelnonv is an array of ncolb complex numbers such that the element amp(i,h) corresponds to the colour basis element basis(:,i). In C amp[h] for h=0..nhelnonv-1 is an array of 2*ncolb real numbers such that the elements amp[h][2*i] and amp[h] [2*i+1] are the real and imaginary parts of the amplitude which corresponds to the colour basis element basis[i].
Note that colour and helicity average factors and symmetry factors must still be applied when the squared amplitude is built from these results. See

B Other interfaces
OpenLoops has been integrated in a number of Monte Carlo frameworks. In particular Open-Loops can be used in conjunction with Sherpa [26,47], Munich/Matrix [50], Herwig ++ [32], Powheg-Box [27], Whizard [49] and Geneva [48]. In Appendix B.1 we detail the BLHA interface within OpenLoops, and in Appendices B.2 and B.3 the usage of OpenLoops within Sherpa and Powheg-Box respectively. Finally in Appendix B.4 we briefly introduce the OpenLoops Python command line tool.

B.1 BLHA interface
OpenLoops offers an interface in the Binoth-Les-Houches-Accord in both versions BLHA1 [45] and BLHA2 [46]. In order to use the Fortran BLHA interface, the module openloops_blha must be included with Fortran use openloops_blha The module files are located in the directory lib_src/openloops/mod, which should be added to the include path of the Fortran compiler. In a C/C ++ program the openloops.h header has to be included. In the following we list the scope of the BLHA interface within a C ++ program. Usage within a Fortran program proceeds analogous.
Within a C ++ program an BLHA contract file is read by OpenLoops via C/C ++ OLP_Start ( char * c on tr ac t_ f il e_ na m e , int * e r r o r ) ; The answer file is either written to the same file or in a file specified in the contract file via At runtime the tree and loop amplitudes for a phase-space point of N external particles with momenta pp, as specified in the BLHA1/BLHA2 standards, are obtained via C/C ++ OLP_EvalSubProcess ( int * id , const double * pp , double * muren , double * alphaS , double * r e s u l t ) ; Here, id is the ID of the corresponding subprocess (specified in the answer file), muren the renormalisation scale and alphaS the strong coupling constant. The result is written into the array result, where result [3] gives the tree amplitude and result [2] the finite part, result [1] the single pole and result[0] the double pole of the one-loop amplitude W 01 .
A corresponding routine of the BLHA2 standard is also implemented: C/C ++ OLP_EvalSubProcess2 ( int * id , double * pp , double * mu, double * r e s u l t , double * a c c ) ; Here, additionally an accuracy measure of the corresponding amplitude is returned as acc. When not available acc=-1 is returned. For further details see the specification of the BLHA1 [45] and BLHA2 [46] standards. An example illustrating the usage of the BLHA interface with OpenLoops is shipped as ./examples/OL_blha.cpp.

B.2 Sherpa
OpenLoops can be used as a plug-in of Sherpa 2.1.0 or later. Within upcoming releases of Sherpa also the EW subtraction [44] will become publicly available. For the installation of Sherpa and the usage of Sherpa+OpenLoops please also refer to the Sherpa documentation available at https://sherpa.hepforge.org.
In order to use OpenLoops together with Sherpa the Sherpa+OpenLoops interface has to be compiled together with Sherpa passing the --enable-openloops option together with the OpenLoops installation path to the Sherpa configure script. The OpenLoops installation path can be modified at runtime by setting (in the Sherpa run card or command line):

OL_PREFIX=PATH_TO_OPENLOOPS
In order to run Sherpa in combination with OpenLoops it is sufficient to add to the Sherpa run card the statement

ME_SIGNAL_GENERATOR Comix Amegic OpenLoops;
which includes OpenLoops in the list of available matrix element generators, and to set in the processes section of the Sherpa run card the flag Loop_Generator OpenLoops; Sherpa will now automatically use the one-loop matrix elements from OpenLoops when for example a parton-shower matched simulation is requested via (in the processes section of the run card)

NLO_QCD_Mode MC@NLO;
For details on these modes and many other options we refer to the Sherpa documentation.
An example run card illustrating the use of Sherpa+OpenLoops can be found within the installation of Sherpa in the file

PATH_TO_SHERPA/AddOns/OpenLoops/example/Run.dat
Additional examples of Sherpa+OpenLoops run cards can be found in the Sherpa manual.
In general Sherpa automatically handles all the necessary parameter initialisation of OpenLoops. However, user-defined parameters can be passed from the Sherpa run card (or command line) to OpenLoops via OL_PARAMETERS FIRST_PARAM_NAME FIRST_PARAM_VAL SECOND_PARAM_NAME SECOND_PARAM_VAL ...;
The interface provides the subroutines openloops_born, openloops_real, and openloops_virtual with interfaces identical to the corresponding Powheg-Box routines setborn, setreal, and setvirtual including colour-and spin-correlated tree-level amplitudes in the format required by the Powheg-Box. Additionally, the interface provides the routines openloops_init, openloops_borncolour and openloops_realcolour. The former synchronises all parameters between OpenLoops and the Powheg-Box and should be called at the end of the init processes subroutine of the Powheg-Box. The latter two provide colour information required for parton-shower matching, i.e. they return a colour-flow of the squared Born and real matrix elements in leading-colour approximation, on a probabilistic basis. Further details are given in Appendix A.3 of [79].

B.4 Python
OpenLoops provides a Python module openloops.py in the directory pyol/tools that wraps a subset of the functionality of the native interface. Its main application is to provide a simple command line tool to evaluate matrix elements. The documentation of the command line tool can be obtained via ./openloops run --help For example the following command evaluates the tree and one-loop amplitudes for n = 10 random phasespace points with a center-of-mass energy √ŝ = 500 GeV for the process uū → Zgg using M Z = 91 GeV and prints the result to the screen: ./openloops run "u u~> Z g g" order_ew=1 mass\(23\)=91 -e 500 -n 10 The random phase-space points are generated with Rambo [78].

C List of input parameters
In Tabs. 9-11 we list all input parameters and switches available in OpenLoops. Within the general purpose Monte Carlo frameworks (e.g. Sherpa, Powheg-Box and Herwig ++ ) these parameters are synchronised automatically. In Tab. 9 input parameters relevant for the process registration are listed, in Tab. 10 model input parameters are listed and in Tab. 9 input parameters relevant for the stability system are summarised. non-diagonal CKM matrix model str, default="sm" model selection. Available models: "sm", "heft" install_path str, default=" " set installation path of process libraries if different from OpenLoops default installation approx str, default=" " approximation allowed_libs str, default=" " whitespace separated list of allowed libraries check_poles int, default=0 1: print pole cancellation checks upon amplitude registration Table 9: Available input parameters and switches in OpenLoops relevant for the process registration. Possible input types include int: integer or str: string. For details see Section 4.2.
stability system: general parameter options decription psp_tolerance dp + , default=10 −9 Tolerance for warnings triggered by phase-space consistency checks (momentum conservation and on-shell conditions) stability system: born-loop interferences parameter options description hp_mode 1 (default) hybrid precision mode for hard regions 2 hybrid precision mode for IR regions (restricted to NLO QCD) 0 hybrid precision mode turned off hp_loopacc dp + , default=8. target precision in number of correct digits stability system: HEFT and loop-loop interferences parameter options description stability_triggerratio dp + , default=0.2 The fraction of points with the largest K-factor to be re-evaluated with the secondary reduction library stability_unstable dp + , default=0.01 Relative deviation of two Born-loop interference results for the same point above which the qp evaluation is triggered stability_kill dp + , default=1. Accuracy below which an unstable point is discarded after qp evaluation for Born-loop interferences. stability_kill2 dp + , default=10. Accuracy below which an unstable point is discarded in loop-loop interferences stability_log 0 (default) no stability logs are written 1 stability logs written on finish() call 2 stability logs written adaptively 3 stability logs written for every phase-space point stability_logdir str set the (relative) path for the stability log files Table 11: Available input parameters and switches in OpenLoops relevant for the stability system. Possible inputs include dp + : positive double, int: integer, str: string. For details see Section 4.6.