A general framework for implementing NLO calculations in shower Monte Carlo programs: the POWHEG BOX

In this work we illustrate the POWHEG BOX, a general computer code framework for implementing NLO calculations in shower Monte Carlo programs according to the POWHEG method. Aim of this work is to provide an illustration of the needed theoretical ingredients, a view of how the code is organized and a description of what a user should provide in order to use it.


Introduction
The POWHEG method is a prescription for interfacing NLO calculations with parton shower generators. It was first suggested in ref. [1], and was described in great detail in ref. [2]. Until now, the POWHEG method has been applied to the following processes: Z pair hadroproduction [3], heavy-flavour production [4], e + e − annihilation into hadrons [5] and into top pairs [6], Drell-Yan vector boson production [7,8], W ′ production [9], Higgs boson production via gluon fusion [10,11], Higgs boson production associated with a vector boson (Higgs-strahlung) [11], single-top production [12], Z + 1 jet production [13], and, very recently, Higgs production in vector boson fusion [14]. Unlike MC@NLO [15], POWHEG produces events with positive (constant) weight, and, furthermore, does not depend on the Monte Carlo program used for subsequent showering. It can be easily interfaced to any modern shower generator and, in fact, it has been interfaced to HERWIG [16,17] and PYTHIA [18] in refs. [3,4,7,10,12,14]. The present work introduces a computer framework that implements in practice the theoretical construction of ref. [2]. We call this framework the POWHEG BOX. The aim of the POWHEG BOX is to construct a POWHEG implementation of an NLO process, given the following ingredients: i. The list of all flavour structures of the Born processes.
ii. The list of all flavour structures of the real processes.
iii. The Born phase space.
iv. The Born squared amplitudes B, the color correlated ones B ij and spin correlated ones B µν . These are common ingredients of NLO calculations performed with a subtraction method.
v. The real matrix elements squared for all relevant partonic processes.
vi. The finite part of the virtual corrections computed in dimensional regularization or in dimensional reduction.
vii. The Born colour structures in the limit of a large number of colours.
With the exception of the virtual corrections, all these ingredients are nowadays easily obtained. A matrix element program (like MADGRAPH) can be used to obtain (iv) and (v). The colour-correlated and spin-correlated Born amplitudes are also generated automatically by programs like MadDipole [19] and AutoDipole [20]. Recent progress in the automatization of the virtual cross section calculation may lead to developments where even the virtual contribution (vi) may be obtained in a painless way [21,22,23,24,25,26,27,28,29,30]. Given the ingredients listed above, the POWHEG BOX does all the rest. It automatically finds all the singular regions, builds the soft and collinear counterterms and the soft and collinear remnants, and then generates the radiation using the POWHEG Sudakov form factor.
The purpose of this work is twofold. Our first aim is to complete here the theoretical work of ref. [2], by explaining several variants of the procedure illustrated there, that have turned out to be necessary to fulfill our goal. In doing so, we will refer often to formulae given in ref. [2], that is thus a prerequisite for reading the present work. Our second aim is to illustrate specifically how the code really works. It is true to some extent that well-written codes are self explanatory, and, in fact, we tried to write the POWHEG BOX as transparently as possible. However, what may not be so easy to understand from the code is its global organization. We believe that this document, together with the source code, could be used by others to understand the code up to the point of being able to modify it.
Strictly speaking, the present work is neither the documentation of the POWHEG BOX code, nor a description of its theoretical basis. So, for example, we do not include a rigorous description of all the subroutines in the code, and, as we already said, we refer to ref. [2] for the theoretical bases of our method. In general, one expects that a theoretical paper should include the description of how a given calculation has been performed. This normally includes the illustration of some algebraic steps, and, maybe, an indication of what part of the calculation was performed numerically, so that the reader should be able, on the basis of the given indications, to verify its content. In the present case, the calculation is performed relying heavily on computer algorithms, and the only realistic way to verify its content is to understand the code. Thus, here we explain the algorithms and give sufficient indications on the code structure for the reader to understand it and verify its correctness. We believe that a detailed documentation of the POWHEG BOX is not necessary for this purpose, since the details are better understood by directly studying the code. Furthermore, the code itself will unavoidably evolve with time, so that detailed documentation may not be so helpful after all. In summary, this paper should be simply seen as a description of our calculations, that, being performed essentially by a computer program, must, to some extent, coincide with a description of the program itself.
Researchers wishing to use the POWHEG BOX should not need, in principle, to study or understand this whole paper. They only need to know in which format they have to supply the ingredients that are listed above. These are summarized in section 2. Looking at the various implementation examples included in the code should also help with this task. In the near future, we will provide a manual that documents in detail the user interface. The remaining part of this work should be useful in order to understand better certain features that the POWHEG BOX provides, and that the user may need, or to implement new features that are not yet available.
The paper is organized as follows: in section 2 we report all the information needed to interface an NLO program to the POWHEG BOX. Thus, for example, we specify how the flavour structure of scattering processes is represented, how to specify their kinematics, and the format that the Born, virtual and real cross section subroutines must have. In section 3 we describe the algorithm used to generate all the singular regions of the process, and how these are represented in the computer code. A simple example is also discussed in detail. In section 4 we describe how theB function is constructed. This function, when integrated over the radiation variables, yields the inclusive cross section at fixed underlying Born configuration, and thus is crucial for the first stage of the event generation, i.e. the generation of the underlying Born structure. The computation ofB is quite complex, so this section is divided into several subsections, dealing with the Born contribution, the soft-virtual contribution, the real contribution and its soft and collinear limits. In section 5 we describe the mechanism provided in POWHEG BOX for tuning the part of the real cross section that is dealt with using the shower Monte Carlo method, and how the remaining finite part is treated. This separation into a singular and a finite part, besides being useful for tuning the POWHEG output, also provides a method to improve generation efficiency in certain cases. In section 6 we give an overview of the initialization stage of POWHEG. At this stage the inclusive cross section is computed, and the appropriate grids are set up for the generation of the underlying Born configurations. In section 7 we describe the second stage of the event generation, i.e. the generation of radiation, and in section 8 we describe how the POWHEG-generated event is prepared for further showering by a standard shower Monte Carlo program. Finally, in section 9, we give our conclusions.
Several appendixes collect further analytical and technical details. In appendix A we report the formulae for the soft integrals that are used in the POWHEG BOX. In appendix B we report the collinear limits of the real cross section, that are used in the POWHEG BOX to build the collinear counterterms. In appendixes C and D we describe the upper bounding functions used in the generation of radiation, both for final-state and for initial-state radiation. In appendix E we describe how the renormalization and factorization scales are set in the POWHEG BOX, and how the strong coupling constant is computed. Finally, in appendix F, we give a discussion of a few miscellaneous topics that are useful for using and understanding the program.

The format of the user subroutines
By flavour structure of a process we mean the type of all incoming and outgoing particles. In the program, the flavour type is denoted by an integer number, so that the flavour structure is a list of integers. The ordering and numbering of particles follow the rules: 1. first particle: incoming particle with positive rapidity 2. second particle: incoming particle with negative rapidity 3. from the third particle onward: final-state particles ordered as follows − colourless particles first, − massive coloured particles, − massless coloured particles.
The flavour is taken incoming for the two incoming particles and outgoing for the final-state particles.
The flavour index is assigned according to the Particle Data Group conventions [31], except for the gluons, where 0 (rather than 21) is used.
Example: if we are interested in the associated production of a t quark and a vector boson Z plus two jets pp → Z t + 2 jets , (2.1) then one of its contributing subprocesses is b u → Z t s g , (2.2) whose flavour structure, according to the previous rules, is given by [5,2,23,6,3,0] .
In QCD calculations, the colourless particles and the massive coloured particles will remain the same at the Born and NLO level. In the NLO calculation, the flavour structure of real graphs will have one more light parton in the final state. The virtual term, being the interference of the Born and of the one-loop amplitude, has the same flavour structure of the Born term.
In the POWHEG BOX, the header file pwhg flst.h, in the include subdirectory, contains all arrays and parameters having to do with flavour structures. They depend upon the parameters nlegborn and nlegreal that are set to the number of legs (incoming plus outgoing) of the Born (or virtual) and real graphs. The user of the POWHEG BOX will have to set explicitly nlegborn in the nlegborn.h file, that is thus included before the pwhg flst.h file in all program units that need to access the flavour structures. In the example (2.2), the user should set nlegborn=6. The variable nlegreal is always set to nlegborn+1. The user should also set the variables flst nborn and flst nreal to the number of inequivalent flavour structures for the Born and real graphs, and should also fill the arrays flst born(k=1:nlegborn, j=1:flst nborn) flst real(k=1:nlegreal, j=1:flst nreal) in an appropriate initialization subroutine, named init processes. Notice that flavour structures that are equivalent under a permutation of final-state particles should never appear in the list. Thus, in the example of eqs. (2.2) and (2.3), either [5,2,23,6,3,0] or [5,2,23,6,0,3] may appear as flavour structures, but not both.
The user should also set the value of flst lightpart, the position of the first light coloured parton. Then, in the example previously described, flst lightpart=5.

Tagging parton lines
At times it is convenient to treat lines with the same flavour as if they were different.
One such example is Higgs boson production via vector boson fusion (VBF). The fermion lines attached to the vector bosons may be treated as being distinct. Of course, in W + W − fusion, they may also be effectively distinct, with the W + coming from a u quark turning into a d, and the W − coming from a c quark turning into an s. Consider however the real graph depicted in fig. 1. It corresponds to a gluon-initiated next-to-leading correction to VBF Higgs boson production: gu → Hūdd. It is clear that the two d quarks in the final state have a very different role, and should be kept distinct. However, as far as the flavour combinatorics is concerned, they are considered identical in the POWHEG BOX, that assumes that the graphs are already symmetrized with respect to identical finalstate particles. Thus, the combinatoric algorithm will generate two regions for this graph, corresponding to either d being collinear to the incoming gluon. In order to overcome this problem, the POWHEG BOX allows the possibility to attribute a tag to each line, so that lines with the same flavour but different tags will be treated differently from the combinatoric point of view. In the example at hand, one assigns the tags according to the scheme in fig. 2. We arbitrarily assign a tag equal to zero to particles that we do not need to tag (the initial-state gluon and the produced Higgs boson). Within this scheme, the two final-state d quarks will be treated as different from the combinatoric point of view. Only the quark tagged as 1 in the real graph will generate a singular region, since if quark 2 were collinear to the incoming gluon, the associated underlying Born would have an incoming antiquark tagged as 2, and thus would not be present.
In the code, tagged lines are treated by generating an internal flavour index that replaces the real flavour index, in such a way that the internal flavour is different for lines with different flavour or different tag. The combinatorics is carried out with these internal flavour numbers. At the end, internal flavour numbers are replaced with the original real flavour numbers. From this point on, tags are ignored. The tags are set to zero during POWHEG initialization. If needed, the user's initialization routine should appropriately set the arrays flst borntags and flst realtags.
The task of changing the flavour indexes to account for different tags, and of changing them back to the original state, are carried out by the routines mapflavours and unmapflavours, that are called near the beginning and near the end of the genflavreglist routine, the subroutine that finds the different singular regions, as described in section 3.

The Born phase space
The Born phase space (provided by the user of the POWHEG BOX) is a subroutine named born phsp(xborn) that fills the four-momenta of Born-process particles (both in the laboratory and in the center-of-mass frame), the Bjorken x of the two incoming partons, the minimal mass of the Born system and the phase space volume.
It receives as input xborn, an array of real numbers, in the range [0, 1]. If no resonances are present in the final state, the dimension of this array is 3 n − 2, n being the number of final-state particles, i.e. n =nlegborn−2 . In case some resonances are present, their virtuality will require one more variable, and the user will have to take account of them too, increasing accordingly the numbers of elements of this array.
In the paper, we will refer to this set of numbers as X born . We recall that the Born phase space Φ n , defined in [2], is given by The born phsp routine should perform the following tasks: 1. Set kn pborn(mu=0:3, k=1:nlegborn) and kn cmpborn(mu=0:3, k=1:nlegborn) 1 to the Born momenta in the laboratory frame and in the center-of-mass (CM) frame.

The Born and Born-correlated squared amplitudes
The user of the POWHEG BOX should provide the routine setborn(p(0:3,1:nlegborn), bflav(1:nlegborn), born, bornjk(1:nlegborn,1:nlegborn), bmunu(0:3,0:3,1:nlegborn)). Given the four-momenta p and the flavour structure bflav of a Born subprocess, the routine should return the Born squared matrix element 2s b B in born, the colour correlated one in bornjk and the spin correlated one in bmunu. The flux factor 1/(2 s b ) =1/(2*kn sborn) (where s b is the center-of-mass energy squared of the Born process) should not be included, 2 since it is supplied by the POWHEG BOX.
The colour correlated Born amplitude is defined in eq. (2.97) of ref. [2]. We report it here for completeness (2.6) Here M {c k } is the Born amplitude, and {c k } stands for the colour indexes of all external coloured particles in the amplitude. The suffix on the parentheses that enclose M † {c k } 1 All variables with the kn prefix are defined in the header file pwhg kn.h. 2 In the notation of ref [2], B includes the flux factor indicates that the colour indexes of partons i, j are substituted with primed indexes in M † {c k } . The factor N is the appropriate normalization factor including averages over initial spin and colour and symmetry factors. We assume summation over repeated colour indexes (c k , c ′ i , c ′ j and a) and spin indexes. For gluons T a cb = if cab , where f abc are the structure constants of the SU (3) algebra. For incoming quarks T a αβ = t a αβ , where t are the colour matrices in the fundamental representation (normalized as Tr[t t] = 1/2). For antiquarks T a αβ = −t a βα . It follows from colour conservation that B ij satisfy where i runs over all coloured particles entering or exiting the process, and C f j is the Casimir constant for the colour representation of particle j. The spin correlated Born squared amplitude B µν j is defined to be non-zero if the j th Born leg is a gluon, and is basically the Born cross section obtained by leaving the gluon indexes of the j th leg uncontracted. More precisely, we can write where M ({i}, s j ) is the Born amplitude, {i} represent collectively all remaining spins and colours of the incoming and outgoing particles, and s j represents the spin of the j th particle. The ǫ µ s j are polarization vectors, normalized as Notice that the Born squared amplitude is requested for each individual flavour structure of the contributing subprocesses. Many different flavour structures will return identical or proportional values of the Born cross section. For example dd → Z is identical to ss → Z, and uū → γ * is proportional to dd → γ * . The POWHEG BOX identifies these identical contributions initially, and stores the proportionality constants. When computing the Born cross section for all needed flavour structures, it computes only the minimum number of squared amplitudes it needs, and obtains the others using the proportionality relations found initially.

The virtual amplitudes
The user should provide a subroutine setvirtual(p(0:3,1:nlegborn),vflav(1:nlegborn),virtual), that returns in virtual the finite part V fin of the virtual cross section for the process with flavour structure vflav and external momenta p. The V fin contribution is defined, in conventional dimensional regularization (after renormalization), by where the a and c ij coefficients do not depend upon ǫ. Their explicit form can be found, for example, in ref. [32]. The normalization factor is (2.12) Here it is important to remark that, in the conventional dimensional regularization, B and B ij in formula (2.11) are evaluated in (4 − 2ǫ) dimensions, and thus do depend upon ǫ. In fact, all formulas for the soft contributions and the collinear remnants used in the POWHEG BOX are computed in the MS scheme, dropping the 1/ǫ 2 and 1/ǫ contributions written in terms of the (4 − 2ǫ)-dimensional expression of the Born squared amplitudes and with the normalization factor of eq. (2.12) (see for example appendix A). Thus, if the virtual contribution is written as in formula (2.11), the divergent terms dropped in the POWHEG BOX cancel exactly the 1/ǫ and 1/ǫ 2 terms in (2.11), leaving the finite contribution V fin . If the NLO calculation has been performed in the Dimensional Reduction (DR) scheme, in order to use it within the POWHEG BOX, we do the following. First of all, we assume that the NLO result is expressed in terms of the standard MS coupling constant. If this is not the case, the appropriate straightforward corrections need to be applied. Then one defines V DR fin using the same formula (2.11) where now B and B ij are evaluated in four space-time dimensions. In other words, V DR fin is defined as the zeroth order coefficient of the Taylor expansion of V DR b 2π/(α S N ) in ǫ. The expression to be used in the POWHEG BOX is [33] whereγ(g) = N c /6, andγ(q) = (N 2 c − 1)/(4N c ), with the index i running over all coloured massless partons of the amplitude and where N c is the number of colours.
The scale Q in eq. (2.12) is completely arbitrary in this context. It may be chosen for convenience equal to s or equal to the renormalization/factorization scale. Within the POWHEG BOX it has been fixed to be equal to the renormalization scale µ R , and so also the user should set it to this value.

The real squared amplitudes
Together with the Born and Born-correlated squared amplitudes, the user of the POWHEG BOX should provide the routine setreal(p(0:3,nlegreal), rflav(1:nlegreal), amp2) that computes, given the momenta p of the external particles, the squared amplitude 3 for the real process specified by the flavours rflav, stripped off by a factor α S /(2π).
As for the Born contribution, also the real squared amplitude is requested for each individual flavour structure contributing to the real cross section. Many different flavour structures will return identical or proportional values for the real squared amplitude. The POWHEG BOX identifies these identical contributions initially, and stores the proportionality constants. When computing the real cross section for all needed flavour structures, it calculates only the minimum number of squared amplitudes it needs, and obtains the others using the proportionality relations found initially.

Analysis routines
In the analysis file, pwhg analysis.f, the user provides her/his own analysis routine to plot kinematic distributions.
In order to have a unique analysis file, able to deal with events at the partonic or at the hadronic level (i.e. after the shower), we pass all the kinematics and properties of the particles via the HEPEVT common block. In this way, the analysis subroutine has only to have access to this common block and to the value of the differential cross section. This last subroutine is then completely independent from the program that has filled the common block, and can be called by PYTHIA or HERWIG too.
The POWHEG BOX, during the integration of theB function, can produce plots with fixed NLO accuracy. This is done via a call to the routine analysis driver, that fills the common block with the kinematic momenta. This routine receives as input parameters the value of the cross section at that specific kinematic phase-space point and the integer variable ikin. If ikin is set to 0, the event is treated as a Born-like one, and the nlegborn momenta in the kn pborn kinematic common block are copied on the HEPEVT common block. Otherwise, the subroutine copies on the common block the nlegreal momenta stored in kn preal. As final step, it invokes the routine analysis, that receives as input parameter only the value of the cross section. 4

The singular regions
For each real flavour structure, the POWHEG method requires that one decomposes the real cross section into the sum of contributions that are divergent in one singular region only. In the notation of ref. [2] we write Each α r is thus associated with a single flavour structure, and a single singular region. Sometimes we will refer to it as α r region (or alr region, which is the name of the variable that we often use to represent it in the code). The separation in eq. (3.1) is illustrated in detail in section 2.4 of ref. [2]. We have investigated two methods for obtaining such separation: the first one is based on the subtraction method proposed by Catani and Seymour (CS) [34], and the second one on the method proposed by Frixione, Kunszt and Signer (FKS) [35,36]. It was found that the FKS subtraction method is better suited for our purpose, and so we focused upon it. The difficulties with the CS dipole method are due to the large number of dipoles and dipole types, and to the fact that it is not easy to separate the real cross section into a sum of terms having the same singularity structure of the CS dipoles. It is in fact not possible to simply weight the real cross sections with factors that vanish in all but one singular region, since for each singular region there are several CS dipoles, depending upon the choice of the spectator.
The FKS method is slightly more cumbersome than the CS method when counterterms are computed using the collinear and soft plus-distributions. It turns out, however, that this difficulty remains the same for all processes. The procedure to disentangle the plusdistributions can be coded simply in a general way once and for all.
In the FKS framework, singular regions are characterized as follows: • A final-state parton i is becoming collinear to either initial-state partons j = 1, 2, or soft • A final-state parton i is becoming collinear to a final-state parton j, or soft.
We will call i the emitted or radiated parton, and j the emitter. If we replace the pair emitted-emitter with a single parton of the appropriate flavour, we obtain the flavour configuration of the underlying Born. Given the list of flavours of the real graphs, one is faced with the combinatoric problem of finding all singular regions. In the POWHEG framework, this task is carried out keeping in mind that we should be able to group easily all singular regions that share the same underlying Born. In order to ease this task, it is convenient to choose a standard ordering for the flavour structure of each alr. One can easily demonstrate that the flavour of all the alr can be ordered in such a way that the two following properties are satisfied: The emitted parton is always the last parton. Property 2 is non-trivial, since in general the underlying Born structure obtained from a generic alr will be equivalent up to a permutation to a flavour structure present in the flst born list.
It is clear that putting all the alr in a standard form with the properties 1 and 2 simplifies the POWHEG implementation. In fact, since real contributions sharing the same underlying Born are often grouped together in POWHEG, it is better if the underlying Born flavour structures are unique. This will be illustrated more clearly in the example described in section 3.1.
In case of initial-state collinear singularities, the emitter will be assigned the value 1 (2) to distinguish collinear emissions from incoming line 1 (2) (the ⊕ and ⊖ directions of ref. [2]). If the emitted parton is a gluon, the emitter will be assigned the conventional value 0, that means that both 1 and 2 may have emitted it. These distinctions are such to minimize the number of regions, maintaining however the fact that, to each region, we associate a unique underlying Born configuration.
The FKS framework, for final-state radiation (FSR), distinguishes between the emitter and the emitted parton (often called the FKS parton), in the fact that only the emitted parton leads to soft singularities. Thus, the emitter can be a quark, with the emitted parton being a gluon, but not viceversa. On the other hand, the emitter-emitted can be a quarkantiquark pair. In this case, it does not matter what we choose to be the emitted parton. By convention, we will always choose it to be the antiquark. If the emitter and radiated partons are both gluons, when computing the corresponding α r , we supply a damping factor that removes the soft singularity of the emitter, with an appropriate compensating coefficient. For example, if we call E em and E r the CM energies of the emitter and emitted parton, the damping factor may be chosen equal to E em /(E em + E r ), and an extra factor of 2 is supplied to account for the region where the role of emitter and emitted partons are exchanged.
In the POWHEG BOX, the task of finding all regions associated with a given flavour structure is performed by the routine find regions(nleg,rflav,nregions,iregions) where nleg=nlegreal and rflav(1:nlegreal) is the input flavour structure. It returns in nregions the number of regions found, and, for every found region j, it returns the positions (in the string of flavours) of the emitter and of the emitted parton (arbitrarily ordered) in iregions(1:2,j).
The algorithm for finding the final-state regions is the following: 1. Loop over all massless parton pairs i=flst lightpart:nlegreal, j=i+1:nlegreal.
2. Check if i and j can come from the splitting of the same parton (i.e. if they have opposite flavours, or if they are both gluons, or one of them is a gluon).
3. If they cannot come from the same parton flavour, skip them. The initial-state regions are treated similarly. We check, for each final-state light coloured parton j, if it may come from the splitting of an initial-state parton. If it comes from the first (second) incoming line, then the number of regions nregions is increased and we set iregions(1,nregions) = 1 (2) and iregions(1,nregions) = j. If the emitted parton j is a gluon, then only one region is generated, and we set iregions(1,nregions) = 0. The first task of the POWHEG BOX is to build the list of all α r in the standard form specified in Properties 1 and 2. This task is performed by the subroutine genflavreglist, that, more specifically, does the following: • It sets the variable flst nalr to the total number of inequivalent singular regions α r found.
• It sets the variable flst nregular to the number of real flavour structures that do not have any singular region, 5 and it fills the array flst regular(k=1:nlegreal, alr=1:flst nregular) with the corresponding flavour structures. If flst nregular is greater than 0, also the flag flg withreg (defined in the header file pwhg flg.h) is set to true.
• It fills the array flst alr(k=1:nlegreal, alr=1:flst nalr) with the flavour structure corresponding to the given alr region. The ordering is guaranteed to respect Properties 1 and 2.
• It sets the array flst emitter(alr) to the emitter of the alr structure.
• It sets the flst mult(alr) array to the multiplicity of the alr th structure. This number arises because regions may be found that are equivalent by permutations of the external legs. Only one is retained in this case, with the correct multiplicity.
• It sets the array flst uborn(k=1:nlegborn, alr) to the flavour structure of the underlying Born of the alr th structure.
• It sets the array flst alr2born(alr) to an index in the flst born array, that points to the underlying Born flavour structure of the alr th region.
• It sets up a pointer structure from the underlying Born index to the set of alr's that share it: it sets the array flst born2alr(0,jborn) to the number of alr regions that have jborn as underlying Born, i.e. flst alr2born(alr)=jborn, and sets flst born2alr(i,jborn), with i=1:flst born2alr(0,jborn), to the corresponding alr index.
• For each alr, a list of all singular regions for the corresponding flavour structure is also build. These are needed in order to compute R αr , as we will see further on. The array element flst allreg(1,0,alr) is set to the number of singular regions of the flavour structure of the alr th region, i.e. flst allreg(1,0,alr)=nregions of that particular alr. Then flst allreg(i=1:2, k=1:nregions,alr) is set to the list of pairs of indexes characterizing the emitter and the emitted parton for each singular region.
In order to perform this task, the subroutine genflavreglist loops through the flst real list of real flavour structures, calling for each of them the routine find regions. Each region found is first transformed with a permutation, in such a way that the emitted parton is always the last in the list. In the case of final-state radiation, the emitter is also moved near the emitted parton (i.e. at the nlegreal-1 position) with a permutation. At this stage, one also makes sure that, if the emitter-emitted pair is made by a quark (antiquark) and a gluon, the gluon is always the emitted parton, and if the pair is a quark and antiquark, the antiquark is always the emitted parton (if this is not the case, emitter and emitted partons are exchanged). The lists flst alr and flst emitter are updated accordingly. Once this procedure is completed, the alr list is complete, but each element may appear more than once. The list is thus searched for equivalent elements, it is collapsed in such a way that each element appears only once, and a multiplicity factor flst mult is setup to keep track of how many occurrences of a given contribution are present. At the end of this procedure, only inequivalent alr's remain, with an associated multiplicity factor. But it may still happen that the same underlying Born configuration may appear in different alr with different ordering. At this stage the alr list is scanned again. If at any point one finds and underlying Born that differs from one appearing in the flst born list, the flavour structure of the current arl (and its underlying Born flavour structure) are permuted, in such a way that one recovers the same ordering of one of the elements appearing in the flst born list. In case of final-state radiation, the emitter may end up to be no longer the nlegreal-1 leg of the process, and so, the array flst emitter is updated accordingly.

Example
Due to the intrinsic complexity of the procedure for finding the singular regions, it might be useful to examine it on a simple example. Let us consider a process that, at the Born level, has only two flavour structures (flst nborn=2) and five coloured massless partons (nlegborn=5 and flst lightpart=3). In the second column of table 1, we list two partonic Born subprocesses, with the corresponding POWHEG BOX flavour structure flst born.
Suppose now that the real-process contributions are only the four ones in table 2, so that flst nreal=4. Since the real contributions have one more parton with respect to the Born diagrams, we have also nlegreal=6. Note that we deliberately have not included subprocesses such as sg → gudc, gg →sscū and gu →sscg, in order to keep the example short.
jreal processes Table 2: Flavour structure of four real subprocesses and the corresponding POWHEG BOX notation, in the third column.
The subroutine find regions is called on each flavour structure of the list flst real. This subroutine returns the list of emitter-radiated pairs, and their total number. For example, when the first flavour structure [3, 4, 0, 2, 1, 0] is passed to the subroutine, this returns the list of 7 pairs: {(3, 4), (3,5), (3,6), (4, 6), (5, 6), (0, 3), (0, 6)}. The (3,4) pair means that the gluon in the third position can be emitted by the up quark in the fourth position, and that, by removing this gluon, the so obtained underlying Born is a valid Born, since its flavour structure is equivalent (up to a permutation of the final-state lines) to the first flavour structure in flst born. The (0, 3) pair represents the singular region associated with gluon 3 being emitted by both initial-state partons, and, once removed, we get a valid flavour Born. When the subroutine is called on the second flavour structure in flst real, [3, 4, 3, −3, 2, 1], it returns a single pair (3,4), meaning that the ss pair can come from the splitting of a gluon, and the diagram obtained by replacing the ss pair by a gluon is a valid Born. Similarly the call of find regions on the third element of flst real returns the pair (1, 6) that signals the fact that this diagram is compatible with the splitting of an initial-state gluon into an ss pair.  Table 3: List of all the 10 singular regions found up to this stage of the program, of the emitterradiated pairs, of the position of the emitter, flst emitter, and of the corresponding flavour structure, flst alr.
In total we have then 10 singular regions, so that, at this stage of the program, flst nalr=10. We have listed them in the second column of table 3. The flavour structure of these singular regions is then saved in the array flst alr after a suitable permutation of the final-state partons, such that the emitted parton is in the last position (nlegreal), and, in case of final-state radiation, the emitter is in the nlegreal-1 position. At the same time, the position of the emitter is recorded in the array flst emitter (see the third and fourth column of   At this stage, the list of singular regions is scanned, and, if two elements are equivalent (up to a permutation of the final-state partons) and have equal emitted and radiated parton, one of them is removed from the list and the multiplicity factor flst mult of that singular region is increased. By referring to table 3, the fourth alr flavour list, [3, 4, 0, 1, 2, 0], is equivalent to the first one, [3, 4, 1, 0, 2, 0], so that it is removed and the multiplicity factor of the first singular region is set to 2. This is illustrated in table 4, where the fifth and the seventh singular regions of table 3 have been removed and the multiplicity factors of the second and fourth singular region in table 4 are set to 2.
In the last column of table 4, we collect the underlying Born flavour structures, flst uborn, corresponding to each alr singular region. Each of these underlying Born must be equivalent (up to a permutation of the final-state partons) to one of the Born in the list of valid Born flavour structures, flst born.
At this point, we scan the list of the underlying Born flavour structures, flst uborn, and permute the flavours in such a way to obtain exactly one element of the flst born list. Correspondingly, we reorder the flavours of the corresponding alr singular region. We perform the same exchanges on the first element of flst alr, i.e. [3, 4, 1, 0, 2, 0], and we obtain [3, 4, 0, 2, 1, 0], and, after these permutations, the emitter is the parton in the fourth position, so that flst emitter=4. By performing this task on every underlying Born flavour structure, we obtain the second and third column of table 5.
As final tasks, we fill the arrays with the pointers to go from an alr region to its underlying Born flavour structure and viceversa. The first 6 elements in the flst alr list have    as underlying Born, while the seventh has the second one, so that flst born2alr(0,1)=6 and flst born2alr(0,2)=1 (see table 6). The list of these singular regions is shown in the third column: flst born2alr(1:6,1)={1, 2, 3, 4, 5, 6} and flst born2alr(1:1,2)={7}. Finally, for every flavour structure in flst alr, we build the list of all the associated singular regions. This is done by simply calling again find regions on every element of the second column of table 5. For every alr, the total number of singular regions, nreg, is saved in flst allreg(1,0,alr) (see the fifth column of table 5), and all the pairs of emitted-radiated partons is saved in the array flst allreg(1:2,1:nreg,alr) (see sixth column).

TheB function
In order to generate an event, POWHEG first generates a Born kinematic and flavour configuration, with a probability proportional to (see eq. (4.13) in ref. [2]) The square brackets with a suffix represent a context: everything inside is relative to that suffix. The symbol f b labels a Born flavour structure. Thus, due to this context notation, B and V in the square bracket refer to the contribution of the Born and soft-virtual cross section having the flavour structure f b . The suffix in the first summation represents a sum over all α r that have f b as underlying Born. The corresponding square bracket under the integral sign means that we integrate in the radiation phase space of the current α r , keeping the underlying Born variablesΦ αr n fixed and equal to Φ n . The real contribution R has been properly regularized using the plus distributions (see eq. (2.88) and (2.89) in [2]), so that it is written with a hat, and has to be handled properly. The way we deal with the generation of the underlying Born kinematics in POWHEG is the following. For each singular region, we parametrize the radiation variables Φ rad as function of three variables (xrad in the code) in the unit cube, and we also parametrize the z variable in eq. (4.2) as a function of X (1) rad in the [0, 1] interval. We then introduce theB function, defined as One now integratesB(Φ n , X rad ) in the full (Φ n , X rad ) phase space, using an integration routine that implements the possibility of generating the integrand with a uniform weight after a single integration. The routine that we use is mint [37], that was explicitly built for application in POWHEG. Since mint is designed to integrate in the unit hypercube, also the Born phase space has to be mapped in a hypercube, spanned by the variables X born , as described in section 2.2.

The btilde function
The btilde(xx,www0,ifirst) function implements the functionB in eq. (4.4). The first elements of the array xx correspond to xborn and the last 3 correspond to xrad. The weight www0 is passed by the integration routine, and equals the weight factor (arising from integration volume and importance sampling) supplied by the integration routine. The flag ifirst has a rather involved use (described in detail in ref. [37]) that is needed in order to implement the folding of some integration variable. The subroutine mint may be in fact requested to use more points for some selected integration variable, keeping fixed all the others, for each single random contribution to the integral. In practice, one may require that more points for the variables X rad are used for a single value of X born . Upon the first call with a new value of X born , the function btilde is called with ifirst=0. In all subsequent calls with the same X born , but different X rad , btilde is called with ifirst=1. After all calls with ifirst=1, a last call with ifirst=2 is performed, where btilde accumulates all the values computed since the last ifirst=0 call. The quantities that depend only upon X born are computed only once when the call with ifirst=0 is performed. Thus, the Born phase space is generated at this stage. The Born cross sections for all Born flavour structures are computed and made available in appropriate common blocks at this stage, by calling the subroutine allborn. The virtual contributions to btilde are also computed here. The subroutines btildeborn(resborn,www) and btildevirt(resvirt,www) fill the arrays resborn and resvirt with the contributions for each Born flavour structure. The contributions from the collinear remnants and from the real cross section (btildecoll and btildereal) are computed both for ifirst=0 and ifirst=1. Notice that, in these cases, the index in the arrays btildecoll and btildereal refers to a given underlying Born. Thus, for each array entry in btildereal, for example, several contributions from different α r regions (sharing the same underlying Born) are summed up. All the contributions to btilde are accumulated in the array results. When called with ifirst=2, the behaviour of the program depends upon the setting of the flag negflag. If true, btilde should only compute the contribution of negative weights. Thus, the positive entries in results are zeroed, and the negative entries are replaced with their absolute value. If negflag is false (normal behaviour), the negative entries are zeroed. The array results is also stored in the rad btilde arr array, defined in the header file pwhg rad.h. It is needed at the stage of event generation, where the underlying Born flavour structure will be chosen with a probability proportional to its entries.

The Born cross section
The Born contribution to the btilde function is evaluated as follows. When btilde is called with ifirst=0, the Born phase space is generated with a call to the gen born phsp subroutine, that, in turn, calls the user provided born phsp subroutine. For cases when the Born cross section is itself infrared divergent (for example, in the Z + jet production case), we need either a generation cut for the underlying Born configuration, or a Born suppression factor. The POWHEG BOX provides a standard form for the latter. We do not further discuss this issue here; in the forthcoming reference [13] we will illustrate this problem in great detail. The factorization and renormalization scales are set with a call to setscalesbtilde, and then the routine allborn is called. The routine computes the Born contributions for all Born flavour structures, and stores them in the arrays br born for the cross section, br bornjk for the colour correlated Born cross section, and br bmunu for the spin correlated one. Many subsequent calls make use of the Born cross sections, so it is mandatory that this call is performed first. In particular, soft and collinear remnants need the Born terms, and so do the singular limits of the real cross section. In order to understand the code of the allborn routine, it is better to assume first that the flag flg smartsig is set to false. The role of this flag is explained in detail in section 4.8.
The call to btildeborn(resborn), in the btilde function, fills the array resborn with the Born cross section for each Born flavour configuration, including the corresponding parton distribution function (pdf) factors. In case the flg nlotest is set, while computing the integral of the btilde function, an analysis routine is also called to perform a bare NLO calculation for several user-provided distributions. In the case of the Born result, this analysis routine is called within btilde at the end of a given folding sequence.

The soft-virtual cross section
The soft-virtual amplitude is described in detail in section 2.4.2 of ref. [2] in the case of massless coloured partons. We complete here the formulae given in [2] in order to include the case of massive partons. All these formulae are implemented in the POWHEG BOX, that can also deal with massive coloured partons. When massive coloured partons are present, formula (2.99) in [2] becomes The quantity I ij is non vanishing for i and j denoting any coloured initial and final-state partons. I i is non zero for i denoting any massive coloured parton. They both arise from soft radiation, and are reported in appendix A in the same form that appears in the code.
The Q term has the same expression given in ref.
E i denotes the energy of parton i in the partonic CM frame, and f i denotes the flavour, i.e. g for a gluon, q for a quark andq for an antiquark. In addition we have We stress that now the index i in the sum of eq. (4.7) runs only over the massless coloured final-state partons. In fact, the contributions in square bracket arise from collinear (rather than soft) final-state singularities, and thus apply only to massless partons. The last line arises from initial-state collinear singularities. The parameters δ 0 and ξ c are arbitrary. In the framework of the POWHEG BOX, we have set δ 0 = 2 and ξ c = 1. An analogous parameter for initial-state collinear singularities, δ I , that appears in the collinear remnants, is also set to 2. These parameters are hardwired in the code, since there is no reason to change them.
The soft-virtual contribution of eq. (4.6) is implemented as follows. The POWHEG BOX has access to the user-provided Born cross section, including colour correlations. It thus builds automatically the Q, I ij and I i contributions of eq. (4.6) to the soft-virtual cross section. The corresponding code is found in the btildevirt subroutine, in the sigsoftvirt.f file. As already stated in section 2.4, the scale Q is chosen equal to the renormalization scale µ R . The user-provided virtual cross section should then have Q = µ R . The subroutine btildevirt fills the array resvirt with the contribution of the soft-virtual cross section for all possible underlying Born configurations.

The collinear remnants
The collinear remnants G α ⊕ ⊕ and G α ⊖ ⊖ of eq. (4.2) are the finite leftover from the subtraction of collinear singularities. Their general form, in the FKS framework, is given in eq. (2.102) of ref. [2]. They are implemented in the POWHEG BOX in the MS scheme. 6 Their implementation in the btilde function has to properly handle the distributions in the z variable. This is done using the identities where x stands for either x ⊕ or x ⊖ . The z integration extends from x to 1, and thus it is performed using the rules where we have always set ξ c = 1 in the code. In this case too, a loop over all underlying Born configurations is performed, and, for each of them, all singular remnant contributions appropriate to the flavours of the corresponding incoming legs are computed. The btildecoll function requires an extra integration variable, given as its first argument, to generate a value for z. Within the btildecoll function, if the flg nlotest flag is set, a call to the analysis routines is performed to output the contribution of the collinear remnants to the user-defined kinematic plots.

The real contribution to btilde
The real contribution to the btilde function has a certain complexity, mostly due to the handling of the distributions in the ξ and y variables. The function btilde simply calls the function btildereal to get the contribution of the real cross section for each underlying Born configuration. The complex task of evaluating the real integrand is carried out in the subroutine btildereal. In this subroutine, there is a loop over all possible emitters, that envelopes its whole body. All contributions are then evaluated for a given emitter.
With this choice we avoid repeating continuously complex phase-space evaluations. For each emitter, for given underlying Born and radiation variables, there is only one real kinematics to consider. If the emitter is a final-state parton, the real phase space is generated by a call to the subroutine gen real phsp fsr. For initial-state emission, a call to gen real phsp isr is done. The program then calls the subroutine sigreal btl, that fills its array argument with the contributions of all regions (i.e. alr) that have as emitter the current emitter kn emitter. This subroutine returns an array whose elements are R α (1 − y i )ξ 2 , with i = 1 in the FSR case and i = 2 in the ISR case, rather than R α alone (R α is defined in the notation of [2]). The quantity R α (1− y i )ξ 2 has well defined soft, collinear and soft-collinear limits, that are obtained by a call to the subroutines soft, collfsr (for final-state radiation) or collisrp and collisrm (for initial-state ⊕ and ⊖ collinear regions), softcollfsr for soft-collinear limit in final-state radiation, and softcollisrp and softcollisrm for soft-collinear initial-state radiation. The handling of the ξ and y distributions require some care, and we describe it here in some detail.
We begin by looking at the final-state radiation case. The integral that we would like to perform has the form the Jacobian being given by formula (5.40) in ref. [2]. We have indicated explicitly the dependence of the Jacobian upon the radiation variables, but one should keep in mind that it also depends upon the underlying Born variables. We have assumed, in all generality, that the upper limit in the ξ integration in eq. (4.15) may depend upon y, and we have denoted it as X(y). This is in fact not the case in our choice of the final-state radiation kinematics, but it is the case for initial-state radiation (ISR). The gen real phsp fsr subroutine returns the Jacobian for the integration of the radiation variables divided by ξ, in the variable jac over csi.
In eq. (4.15) we introduce a new rescaled variableξ, whose upper bound does not depend upon y ξ = X(y)ξ . We can easily show that We now rewrite eq. (4.15) as where we should not forget that ξ is now also a function of y through eq. (4.16). Defining we get We now see that both (1 − y) ξ 2 R α and J(ξ, y, φ)/ξ in eq. (4.19) should be computed also with ξ = 0 (the soft limit), y = 1 at fixedξ (the collinear limit) and both ξ = 0 and y = 1 (the soft-collinear limit). The case of initial-state radiation is handled similarly, except that now our starting formula isB and one proceeds as before, by treating the y = 1 and y = −1 regions independently.

The btildereal subroutine
We now give a more detailed description of the way btildereal is implemented. First of all, the generation of the radiation phase space is performed according to the description given in section 5.1.1 of ref. [2] for initial-state radiation, and in section 5.2.1 for final-state radiation. The subroutines gen real phsp fsr and gen real phsp isr generate the phase space as a function of the underlying Born kinematics, and of three real variables xrad (3), that assume random values between zero and one. Since we use eq. (4.20), a common Jacobian factor xjac for the transformation xrad(3) intoξ, y and φ is also provided. The program also sets the variables jac over csi, jac over csi coll and jac over csi soft to J(ξ, y, φ)/ξ, to its collinear limit and to its soft limit respectively, and multiplies them by xjac.
In btildereal, these limits are obtained through the calls to soft, collfsr and softcollfsr, and the limits for J(ξ, y, φ)/ξ are provided by the phase space subroutines gen real phsp fsr, in the variables jac over csi coll and jac over csi soft. Notice also that, while the soft limit of J(ξ, y, φ)/ξ is y independent, this may not be the case for (1 − y) ξ 2 R α . The computed values of the real contribution, the soft, collinear and soft-collinear counterterms are divided by (1 − y)ξ (as in eq. (4.20)) and accumulated with the appropriate sign in the array resreal, indexed by the underlying Born index of the current alr. In the case of final-state radiation, the upper limit for ξ does not depend upon y, being given by formula (5.49) of ref. [2], and is set by the phase space program gen real phsp fsr in the common variable kn csimax. Its value is taken from the array kn csimax arr, indexed by kn emitter, that is filled by the routine gen born phsp when the underlying Born phase space is generated in btilde. In case of initial-state radiation, kn csimax is y dependent, and is computed by an appropriate routine when the real phase space is generated. The last two terms in eq. (4.20),ξ-independent, are also accumulated in the resreal array.
After the appropriate calls to the routine that generates the real contributions and its various limits, the program loops through all real regions (i.e. alr), and accumulates the real contributions according to eq. (4.20) or its initial-state radiation version, in an array indexed by the index of the underlying Born of the current alr. This was set in the combinatoric routines, in the array flst alr2born. The accumulated values include the underlying Born Jacobian, the real contribution or one of its various limits, jac over csi or one of its limits, and the remaining factor of 1/(ξ(1 − y)) for FSR, or 1/(2ξ(1 ± y)) in the ISR case. The sign of each contribution can be read out from eq. (4.20). Besides filling the output array resreal, if the flag flg nlotest is set, the result is also output to NLO analysis routines, that perform a parton-level NLO calculation of user-defined distributions. The analysis driver for the NLO output, i.e. the subroutine analysis driver, described in section 2.6, is then called with the flag set to 1 for the true real contribution, and 0 for all remaining terms.

The subroutine sigreal btl
The subroutine sigreal btl fills its output array argument, indexed by the alr, with the real contributions that have as emitter kn emitter. The real contribution should also be multiplied by ξ 2 (1 − y) for FSR, or ξ 2 (1 − y 2 ) for ISR, and, furthermore, should also be multiplied by the S α functions, described in section 2.4 of ref. [2]. Two flags control the behaviour of this function: flg smartsig and flg withdamp. An explanation of the role of flg smartsig is given in section 4.8. The flg withdamp flag will instead be explained in section 5.
The code is better understood if one assumes, to begin with, that these flags are set to false. In this case, the program loops over all alr. For those that have emitter equal to the current emitter, it calls the subroutine realgr, passing as argument the list of flavours of the current configuration, and the real momenta. The function is supposed to return, in its last argument, the value of R, the real matrix element squared. Next, each contribution should be multiplied by its S α factor. In the framework of the POWHEG BOX, we define the S α factor in the following way. We consider the flavour structure of the α region under consideration, and call it f α . We call R f α the corresponding real contribution. R f α can have several singular regions (see, for example, the list of pairs of indexes in the last column of table 5). Each such region is characterized by a pair of indexes in the legs of the real process, of the form (i, j). These can be the indexes of two final-state lines becoming collinear, or of an initial-and final-state line becoming collinear. According to the POWHEG conventions, one can also set i = 0, meaning that there are initial-state collinear singularities in both directions (gluon emission from initial-state partons), and they share the same underlying Born. We call I α the array of all singular regions, i.e. I α is an array of pairs of indexes. In particular, the emitter associated with the α region, together with the last parton (i.e. the nlegreal parton) form a pair that belongs to I α . Let us call it the (k, n) pair. Then, S α is given by where d ij are appropriate kinematic functions that vanish when lines i and j become collinear. The choice of the d ij function implemented in the POWHEG BOX can be found in the compdij subroutine, in the gen real phsp.f file. When the phase space is generated, compdij is called, and the array kn dijterm is filled. It is an ordered array, i.e. one always assumes i < j. For initial-state singularities it is given by the expressions where y j is the cosine of the emission direction of parton j, in the real CM frame, relative to the positive collision direction. Notice that we assume, by convention, that k = 0 means that there are collinear singularities from both the positive and negative direction with the same underlying Born configuration, as is the case when a gluon is emitted. The positive and negative collinear directions need to be considered separately if the corresponding underlying Born differs. The parameter p 1 corresponds to the parameter par diexp, defined in the pwhg kn.h include file. Its default value is 1. For FSR regions we define where p 2 corresponds to par dijexp, and is set to 1 by default. The sigreal btl subroutine builds the S α factor for each non vanishing alr. At the combinatoric stage, for each alr, a list of the singular regions associated with its flavour structure was built. This list was stored in the flst allreg array, and is used to find the indexes ij for all the singular regions of the given alr (see table 5 for an example).
One extra factor is supplied for final-state singularities if both the emitter and radiated partons are gluons. One multiplies the result by where E em and E r are the energy of the emitter and of the radiated parton, evaluated in the partonic CM. This does not change the cross section, because of the symmetry in the exchange of the two gluons, but guarantees that only when the radiated gluon becomes soft we can have a soft singularity. As a final step, the multiplicity of the current alr and the ξ 2 (1 − y 2 ) (for ISR) or ξ 2 (1 − y) (for FSR) factors are included.

The flg smartsig flag
In several processes, organizing the program in the way described in the previous section would lead to several calls to the same matrix elements, with a consequent waste of computing time. In the process of W production, for example, the matrix element is the same whether it is a ud or a cs collision. Or it may differ only by a Cabibbo-Kobayashi-Maskawa matrix element factor. If the flag flg smartsig is set to true, upon the first call to the sigreal btl subroutine, the routine finds matrix elements that differ only by a constant factor, and builds appropriate array of pointers and proportionality constants. Upon subsequent entries in the program, multiple calls to proportional matrix elements will thus be avoided. All subprograms that invoke user routines for matrix element calculations are affected by the setting of flg smartsig, and implement the same mechanism for avoiding useless calls to user routines. By closely examining the code, the reader can find out how this works. The output of the program, however, should be independent on the setting of this flag. Only speed will be affected. In fact, the random number generator, used to set up the random kinematics to check matrix elements for proportionality, is reset to its original value after the equivalent matrix elements are found. In order to understand how the programs operate when the flg smartsig flag is turned on (i.e. is set to true) it is better to examine the allborn routine. Upon the first call to allborn, the current random number is saved, and then the Born cross section for all flavour components is computed, for several value of randomly chosen external momenta. An integer array equivto is set up, its value being -1 by default. If the Born contribution for the j th Born flavour configuration is found to be proportional to a previous k th Born flavour configuration (with k < j), the value of equivto(j) is set equal to k, and the array of real numbers equivcoeff(j) is set to the proportionality constant. Upon subsequent calls to allborn, this information is used to avoid further calls to setborn, whenever possible. Notice the use of randomsave and randomrestore. By enclosing a set of instructions between a randomsave and a randomrestore call, we make sure that the random number sequence is not altered by the inserted instructions.
If the flag flg smartsig is set to false, all calls to the matrix element routines are performed, but, thanks to the saving and restoring of the random number sequence, the output of the program should be independent of it. In other words, using flg smartsig=true should only accelerate the program, without altering the output. This feature can be used to check that nothing weird has happened in the setup phase of the equivto and equivcoef arrays.

The soft, collinear and soft-collinear limit functions
These limit functions could be obtained, in principle, by numerical methods, using the full real contribution. We have, however, preferred to compute them using the factorization formulae for collinear singularities, and the eikonal formulae for soft emission, to avoid numerical instabilities. Furthermore, the real contributions, and all the manipulations performed by the combinatoric package can be tested for consistency (see appendix F.1).
These soft, collinear and soft-collinear routines are collected in the file sigcollsoft.f. The subroutines relevant for the computation of the btilde function are reported in table 7. The basic formulae for the collinear limits are collected in appendix B. Here we illustrate soft soft limit collfsr collinear limit for FSR  the code of the collfsr routine. This routine in turns calls collfsrnopdf, and provides the luminosity factor to its output. Another ingredient that is necessary to build the collinear limit functions is the direction of the transverse momentumk T of the radiated parton with respect to the emitter in the collinear limit (see appendix B), defined in the partonic CM frame. This is a function of the emitter direction and of the azimuthal angle φ. The origin of the azimuth. i.e. the plane along which φ = 0 (or π) should be consistent with what gen real phsp fsr does in the collinear limit. A change of φ → φ + π is instead irrelevant. Our convention for the origin of φ is to take a plane containingk em (the momentum, in the CM frame, of the parton that will undergo the splitting in the underlying Born, see ref. [2]), and the third axis. The subroutine buildkperp, called from the collfsrnopdf routine, constructs the 4-vector kperp(0:3). This vector is normalized arbitrarily, and has zero time component. Its only requirement is that it should be parallel tok T . Its modulus squared is also returned by the buildkperp routine. For each alr sharing the current emitter, the subroutine collfsralr is called, with kperp also passed as an argument. The values of ξ and x, defined as (4.28) and x/ξ are all passed to the subroutine, that is meant to work also if ξ and x vanish, with their ratio remaining finite. The subroutine collfsralr implements the formulae given in appendix B in a straightforward way, multiplying them by ξ 2 (1 − y), and taking care to use the x/ξ variable when necessary, in such a way that one never divides by x or ξ. There is only one caveat to keep in mind. The collinear approximation is meant to reproduce an R α contribution. In the collinear limit of the α region, this coincides with R, since the S α factor becomes 1 in this limit. We should remember, however, that, in case the emitter and radiated partons are both gluons, we have also supplied a factor 2E em /(E em + E r ), which becomes 2(1 − x) in the collinear limit (see eq. (4.27)). In addition, in this case, we supply a factor of 1/2 to account for the two identical partons in the final state. Thus, an extra factor of (1 − x) is supplied.
The case of initial-state collinear singularities is handled similarly. In this case we simply have x = 1 − ξ, and, as before, one should evaluate all contributions taking care of never dividing by 1 − x. The routine collisralr implements the formulae in appendix B. It carries an integer argument i that corresponds to 1 for the collinear ⊕ direction and 2 for the ⊖ direction.
In order to get the soft-collinear limits, the subroutines softcollfsr, softcollisrp and softcollisrm simply call the corresponding collinear subroutines setting temporarily kn csi equal to zero.
The soft limit is obtained with the subroutine soft, that in turn calls softalr to get the soft contribution of a single alr. It implements formula (A.1) (for ǫ = 0), multiplied by ξ 2 (1 − y) for an FSR region, or ξ 2 (1 − y 2 ) for an ISR region. In formula (A.1) there are two powers of the soft momentum k in the denominator. We then factorize k 0 in front of the four-vector k and define k = k 0k , so that (ξ = 2k 0 / √ s) is finite. The vectork carries the information of the direction of the radiated parton. In practice, we thus replace k →k in eq. (A.1), and supply the factor 4(1−y)/s or 4(1−y 2 )/s. The vectork should equal k/k 0 in the limit ξ → 0, keeping y and φ fixed, for the given underlying Born kinematics. It is computed in the subroutine gen real phsp fsr and gen real phsp isr, with a call to the subroutine setsoftvec. Furthermore, we should remember that the functions S α have non-trivial soft limits. They should be computed in the soft limit and multiplied by the result. For this purpose, the routine compdijsoft, in the gen real phsp.f file, computes the soft limit of the d ij functions, and stores them in the array kn dijterm soft. This array has a single index, since the second one is the index of the soft parton. There is no need to consider the other d ij terms (those not involving the soft parton) since in the soft limit they are finite, and do not contribute to S α . Observe that compdijsoft assumes that the d ij terms are homogeneous in k 0 , which is the case if the two parameters par diexp and par dijexp are equal. If they differ, the fastest vanishing one (in the k 0 → 0 limit) will dominate S α . It is in principle possible to experiment with settings that have different par diexp and par dijexp, provided that the slowest vanishing ones are excluded in some way from the list. At the moment, this possibility has not been investigated. As a last point, special treatment was required for the FSR collinear limit of two outgoing gluons. Here, in fact, no action should be taken: one should provide a factor (1 − x), that equals one in the soft limit.

Tuning the real cross section in POWHEG
In POWHEG it is possible to tune the contribution to the real cross section that is treated with the Monte Carlo shower technique. This was pointed out first in ref. [1], where the POWHEG method was formulated, and it was first implemented in ref. [10]. In POWHEG there is the possibility to separate the real cross section, in a given singular region α, as follows where R α f has no singularities and only R α s is singular in the corresponding region. In practice, the separation may be achieved, for example, using a function of the transverse momentum of the radiation 0 F k 2 T 1, that approaches 1 when its argument vanishes, and define One carries out the whole POWHEG-style generation using R α s rather than R α . The contribution R α f , being finite, is generated with standard techniques, and fed into a shower Monte Carlo as is.
More generally F can be chosen as a general function of the kinematic variables, provided it approaches 1 in the singular region. This turns out to be useful in all cases when the ratio R/B in the POWHEG Sudakov exponent becomes too large with respect to its corresponding collinear or soft approximation (see for example ref. [7]). In this case, radiation generation becomes highly inefficient. A general solution to this problem (which has already been implemented in refs. [14] and [13]) is to chose the function F in the following way: if the real squared amplitude (no parton distribution functions included), in a particular singular region, is greater than five times its soft and collinear approximation, then F is set to zero, otherwise is set to one. We also stress that this procedure remedies automatically to the Born zeros problem examined in ref. [7].
This feature is implemented in the POWHEG BOX. By setting the flag flg withdamp to true, this behaviour is turned on. When computing the btilde function, the real contribution will always be multiplied by a damping factor, supplied by the routine bornzerodamp. The damping factor is not necessary in the soft and collinear counterterm contributions, since, in these cases, we will certainly have F = 1. The routine bornzerodamp takes as argument the α-region index (i.e. the alr), the value of R α (i.e. the real cross section without the parton distribution function factors) and the value of its collinear and soft limits (also without pdf factors). It returns the damping factor as its last argument. The presence of the collinear and soft limits of R α in the arguments of the subroutine, allows the user to set a damping factor that depends upon the distance of the real contribution from its collinear or soft approximation, as stated previously. This routine can be easily modified by the user: for example, the sharp theta function adopted in the POWHEG BOX can be replaced by a smoother function, the factor of five can be changed, and so on.
One can see the effects of setting the flg withdamp flag in the sigreal btl subroutine. The soft and collinear limits of the real contribution are obtained with calls to the subroutines collbtl and softbtl (the btl ending standing for btilde), that make use of the subroutines already described previously for the calculation of the soft and collinear limits.
If a damping factor is used, the leftover term R α f of eq. (5.3) needs to be handled independently. The subroutine sigremnants deals with this term together with the real terms that do not have any associated singular region, if there are any. It has the same calling sequence of btilde. It is meant to be integrated using the mint integration program, that allows for the possibility of generating phase space kinematics distributed with a probability proportional to the integrand, after a single integration. Within sigremnant, the contribution from the regular real graphs can be integrated with an arbitrary phasespace parametrization, that we choose to be the initial-state radiation parametrization, i.e. the gen real phsp isr subroutine. Both the underlying Born configuration and the real phase space are generated within sigremnant. The regular contributions to the real cross section are returned by the subroutine sigreal reg. Within sigreal reg, by making use of the list of regular real contributions (that is setup when the combinatorics is carried out), the regular contributions to the cross section are computed. The contribution of the R α f terms is more delicate. This is computed with a loop through all possible emitter values using the global variable kn emitter. The real phase space is set according to it. Then the subroutine sigreal damp rem (where damp rem stands for damp remnants) is invoked. This subroutine is very similar to the sigreal btl subroutine. For all alr that share the current emitter, the corresponding R α is computed, the damping factor dampfac is computed, and the real result is multiplied by (1-dampfac) (in sigreal btl the result was instead multiplied by dampfac).
Notice that sigreal damp rem and sigreal btl carry out very similar tasks, the only difference being the presence of the factor (1 − F ) in the first, and F in the second. This fact is exploited in the POWHEG BOX by implementing both of them via a call to a single subroutine sigreal btl0, that carries an extra integer argument. When the extra argument is zero, the multiplication factor is set equal to F , and when it is 1, it is set to (1 − F ).

The initialization phase
The preparation of the grids for the generation of the events is carried out in the subroutine bbinit. Its most important task is to execute the integration of the btilde function, determine the fraction of negative weights, compute the total cross section, and, if required, plot the NLO distributions. At the first step, the subroutine mint [37] is invoked with imode set to 0. In this mode, mint integrates the absolute value of btilde, and sets up the importance-sampling grid. Next, mint is invoked with imode set to 1, and the flag negflag set to true. In this mode, mint computes the negative contribution to the btilde function. No histograms for the NLO results are generated up to now. At this stage, negflag is set to false, the flg nlotest is set to true, and mint is invoked again on btilde to compute the positive contribution to the integral. At this stage, the NLO histograms are filled. We stress that also negative weights, if present, will end up in the histograms, so that the NLO histograms should exactly correspond to a standard NLO calculation. The positive weight total cross section computed by mint is combined with the negative weight part, and stored in a variable rad sigbtl, defined in the header file pwhg rad.h. After this, the contribution to the cross section from the real remnants is also computed. These are terms that arise either because there are real contributions with no associated singular regions, or because flg withdamp is set to true (see section 5). The remnant cross section calculation is performed with an independent set of grids. Also the remnant contributions will end up in the NLO histograms. The remnant cross section is stored in the variable rad sigrm, and the total in rad sigtot=rad sigrm+rad sigbtl.
When mint is called with imode equal to 1, the upper bounding envelope of the integrated function is also computed, and stored in an array. This upper bounding envelope will be used later for the generation of unweighted events. The arrays xgrid, ymax, xmmm are all necessary for the generation of the events, and they can be saved in a file, so that the time consuming initialization phase does not need to be repeated if one wishes to generate more events in the same context.
The final important task of the bbinit routine is the call to the do maxrat subroutine, that sets up the normalization of the upper bounding function for radiation, thus preparing the system for the generation of full events. This will be described in section 7.1.1.
In bbinit an initialization call to the function gen, that generates the underlying Born configuration, is also performed.

The generation of radiation
There are two components that contribute to the generation of radiation: one arises from theB term and the other from the remnant. The total cross section for the two contributions is stored in the global variables rad sigbtl and rad sigrm. When radiation is generated, one begins by picking one of the two cases with a probability proportional to the respective cross section. In the POWHEG BOX, the generation of radiation is carried out in the subroutine pwhgevent, that begins precisely by performing this random choice. We describe, in turn, the two components.

Radiation from theB component
This begins with the generation of an underlying Born configuration distributed according to theB function. Radiation is generated using the POWHEG Sudakov form factor (see eq. (4.21) of ref. [2]) If R has been separated into a regular and singular part, according to eq. (5.1), only the singular part will appear in the Sudakov form factor. According to the notation of ref. [2], the square bracket with the α r suffix indicates that all quantities inside the bracket should be taken relative to the α r region. So, the n + 1 phase space in eq. (7.2) is given as a function of the underlying Born phase spaceΦ αr n , taken at the point Φ n , and of the radiation variables Φ rad , according to the phase-space mapping defined for the α r region. In POWHEG, the individual Sudakov form factors for each α r are further assembled into a product of form factors sharing the same underlying Born and the same radiation region. As we have seen, each singular region is characterized by an emitter and an emitted parton. Within POWHEG, the emitted parton is always the last one, while the emitter can be any light coloured parton in the initial or final state. There is one single initial-state radiation kinematics, independent of which incoming parton is emitting. The final-state radiation kinematics depends instead upon the index of the emitter. We introduce here the label rr to specify the radiation region kinematics: rr = 1, if kn emitter=0, 1 or 2, and rr = kn emitter -flst lightparton + 2, if kn emitter ≥ flst lightparton. In fact, within the POWHEG BOX framework, the radiation kinematics is the same for kn emitter=0, 1 or 2. We write and the notation {α r |f b , rr} indicates the ensemble of all α r that share the same underlying Born f b and the same radiation region kinematics rr. It makes then sense to define since the phase space only depends upon the radiation region kinematics rr, and not on the specific α r . With this definition we have In order to generate the radiation, the POWHEG BOX uses the highest-bid algorithm. For each rr, it generates a p T value with a probability distribution equal to The program then selects the highest p T value, and thus fixes the corresponding rr region. The α r value is picked in the ensemble {α r |f b , rr}, with a probability proportional to the corresponding R αr . The individual p T values for each ∆ f b rr are generated with the veto method. We define dΦ rad = J rr dξ dy dφ, (7.8) where J rr is the Jacobian of the Φ rad phase space, when written as function the three radiation variables: ξ, y and φ. We then introduce a suitable upper bounding function U rr (ξ, y), and determine its normalization N rr f b by requiring for all Φ n and Φ rad . In order to generate the radiation, one first generates a p T value according to the probability distribution and then generates the corresponding values for the radiation variables ξ, y and φ, distributed with a probability proportional to U rr . At this point one accepts the event with a probability 1 N rr If the event is rejected, one goes back to the beginning, and generates a new p ′ T value, smaller than the current one, using the probability Ideally, the function U rr should be chosen in such a way that the equation r = ∆ U rr (p T ) can be easily solved for p T , and a set of radiation variables, distributed according to U rr , are easily generated. In practice we only require this second feature, and solve for the equation r = ∆ U rr (p T ) numerically (in this way, if r is a uniform random number between 0 and 1, the corresponding p T is distributed according to eq. (7.10)). Several choices are possible. In appendixes C and D we describe the functions used in the POWHEG BOX.
In the POWHEG BOX, a straightforward variant of the veto method is used several times. Suppose that we know a function F (Φ rad ) such that Then we can first use the veto method accepting events with a probability If the event is accepted, we go through a second veto, accepting the event with a probability This is obviously the same as accepting according to the probability of eq. (7.12), but it has the advantage that, in many instances, when a veto is imposed, one only evaluates the function F , without the need to compute the real and Born contributions. This method is also applied in the POWHEG BOX by replacing eq. (7.9) with 17) where N rr f b (ξ, y) is a stepwise function in the ξ and y radiation variables, and where One determines the step function N rr f b (ξ, y) so to have the smallest values that satisfy the first bound of eq. (7.17). Then, the method described above is used, where, according to eq. (7.14), so that events are first accepted with a probability N rr f b (ξ, y)/N rr f b . Within the POWHEG BOX, the generation of the underlying Born kinematics is performed by the routines gen btilde, that invokes gen with the appropriate arguments. After that, the subroutine gen uborn idx is called and it generates the underlying Born flavour configuration. The purpose of this call is to pick a random f b configuration, with a probability proportional to its contribution to theB value at the given kinematic point. By inspecting the gen uborn idx subroutine, we see how this task is performed. We recall that when gen returns, the last call to the btilde function has been performed at the generated Born kinematics configuration. The contribution of each flavour component of the B cross section is stored in the array rad btilde arr. In gen uborn idx a generic utility subroutine pick random is invoked with arguments flst nborn, rad btilde arr and rad ubornidx. The pick random subroutine returns the value rad ubornidx with a probability proportional to rad btilde arr(rad ubornidx). The variable rad ubornidx represents the index of the currently generated underlying Born configuration. There are several other rad prefixed global variables that need to be set, in order to perform the generation of radiation from the current underlying Born. First of all, a list of all alr, that share the current underlying Born structure, should be constructed. This is done by filling the array rad alr list, of length rad alr nlist, using flst born2alr, that was constructed at the combinatoric stage of the program.
The variable denoting the singular region rr is represented by the global variable rad kinreg in the POWHEG BOX. As already stated, it takes the value 1, for initial-state radiation, and the value rad kinreg = kn emitter -flst lightparton + 2 for final-state radiation. Not all values of rad kinreg may be associated with an active radiation region for the given underlying Born. A logical array rad kinreg on is set up, with its entries indexed by the rad kinreg values. The entries set to true correspond to active radiation regions. The array rad kinreg on is set in the subroutine gen uborn idx. In table 8 we summarize the global variables relevant to the generation of radiation. We do need all these variables, because we typically need to consider the alr that share the current underlying rad ubornidx index in the current underlying Born flavour structure rad alr list list of alr's that share the current underlying Born rad kinreg on marks the active singular regions for the current underlying Born rad kinreg current singular region Table 8: Global variables that characterize the generation of radiation for the given underlying Born configuration.
Born and the current kinematic region. This is done by going through the rad alr list, and considering only the alr's whose emitter is compatible with the current rad kinreg value.

Normalization of the upper bounding function
Before the radiation is generated, the normalization of the upper bounding functions should be computed. This task is carried out by the subroutine do maxrat, which, in turn, is invoked in the bbinit subroutine. The normalizations N rr f b (ξ, y) and N rr f b are stored in the arrays rad csiynorms(rad ncsinorms, rad nynorms, rad nkinreg, flst nborn), rad norms(rad nkinreg, flst nborn). By inspecting the do maxrat routine, one can see that there is a mechanism (that is better understood by studying the code) to store and retrieve previously computed values for these arrays. The core of the do maxrat routine is a loop repeated nubound times, where nubound is a parameter set in the POWHEG BOX data file. Within this loop, gen btilde and gen uborn idx are called in sequence. After that, radiation kinematic variables are set up randomly. The program then loops over all valid radiation regions (i.e. rad kinreg values). For the ISR radiation region, the initial-state radiation phase space is generated, with a call to gen real phsp isr rad0, and for the final state a call to gen real phsp fsr rad0 is performed. The task of effectively increasing the norms is performed in the routine inc norms. The phase space generation routines are slight variants of the phase space routines previously encountered. They perform a similar task, but are dependent upon the rad kinreg setting (rather than the kn emitter value) and furthermore they compute the kinematics starting from the values of kn csitilde, kn y and kn azi (details are found in the gen real phsp.f code). The inc norms subroutine first sets factorization and renormalization scales for radiation (the set rad scales call), then computes the Born and real cross section. The real cross section is multiplied by the Jacobian J rr . The upper bounding function U rr is returned by the function pwhg upperb rad(), and the ratio is formed. Its maximum gives the N rr f b (ξ, y) normalization. Notice that the subroutines sigborn rad and sigreal rad return respectively the value of the Born cross section for the current underlying Born (i.e. for the underlying Born indexed by rad ubornidx), and the R rr real cross section. Thus, the subroutine sigreal rad is similar to sigreal btl, but it only computes the cross section contributions that share the underlying Born stored in rad ubornidx and the singular region stored in rad kinreg.
The first two indexes, ξ i and y i , in the array rad csiynorms represent the step number of the stepwise function N rr f b (ξ, y) (i.e. they are integer functions of ξ and y respectively). Their form can be found in the code, but may be subject to future modifications.
For the purpose of tuning the choice of the upper bounding function, each evaluation of formula (7.20) is histogrammed, and the histogram is printed in TOPDRAWER format at the end of the upper-bound evaluation, in the file pwghistnorms.top. The efficiency in the generation of radiation will depend upon the shape of this histogram. It is roughly estimated by the ratio of the average value of formula (7.20) over its maximum. Highest efficiencies are achieved if the histogram goes sharply to zero near the maximum value of the abscissa. Lowest efficiencies are characterized by histograms with long tiny tails.

The gen radiation routine
This routine is invoked from pwhgevent, after the call to gen btilde and gen uborn idx. It loops through the valid radiation regions (i.e. the allowed rad kinreg values) and it calls either the gen rad isr or the gen rad fsr routines, that generate and store in the global variables kn csi, kn y and kn azi a set of kinematics radiation variables. It also returns, in its argument, the value of the radiation transverse momentum squared, t, that is defined in the function pwhg pt2. If t is the largest generated so far, the kinematics radiation variables and the rad kinreg value are saved in local variables, because they are the candidate for the highest bid method (discussed in appendix B of ref. [2]). At the end of the loop, if no call has generated any radiation, the routine exits, after setting kn csi to zero, which signals the generation of a Born-like event. If radiation was generated, the saved values of the radiation variables are restored in global variables, and the appropriate phase-space generation routine is invoked. The sigreal rad routine is invoked again, followed by a gen real idx call. Besides returning R rr , sig real rad also stores each cross section contribution, so that, after the radiation kinematics is generated, the corresponding flavour structure can also be generated with a probability proportional to each cross section contribution. This is what the gen real idx call does. The index (i.e. the alr) of the corresponding real flavour structure is stored in the variable rad realalr.
The subroutine sigreal rad, as stated earlier, is similar to sigreal btl, but it only computes the cross section contributions that share the underlying Born stored in rad ubornidx and the singular region stored in rad kinreg. It also takes care to avoid generating gluon splittings into heavy quark pairs below threshold. More precisely, if the radiation k 2 T , returned by the function pwhg pt2(), is below rad charmthr2 for g → cc or below rad bottomthr2 for g → bb, the corresponding result is set to zero.

The gen rad isr and gen rad fsr routines
These routines generate a p T value according to the Sudakov form factors in eq. (7.6). They make essential use of the function pt2solve(pt2,i), that represents the function log∆(p old T )/∆(p T ). The expression of log∆(p T ) is as given in eq. (D.18), in the case of initial-state radiation, or as given in eqs. (C.6) or (C.10) (in this last case, depending upon the form of the upper-bounding function used, controlled by the value of the integer variable iupperfsr). The value of p old T corresponds to the last vetoed p T . The value log∆(p old T ) is represented by the variable xlr. Thus, solving pt2solve for zeros represents the first step of each iteration of the veto procedure.
We examine now in detail the case of final-state radiation, with iupperfsr=1, that also illustrates how the other cases work. Looking at the function pwhg upperb rad, we see that, for iupperfsr=1, the upper bounding function (up to the normalization factor) has the form U rr N rr where α PW S is the variable-flavour two loop expression for the strong coupling constant used for radiation generation throughout the POWHEG BOX program. It is set by a call to set rad scales (the choice of scales are discussed in detail in appendix E). In appendix C, it is shown how to deal with this form of the upper bounding function for the case of one loop, constant flavour α S , and for constant N rr (i.e. not dependent upon ξ and y). We thus begin by considering the upper bounding functioñ We must choose b rad 0 and Λ rad in such a way that α rad S (µ) α PW S (µ) , (7.24) in all the range µ > p min T , where p min T is the minimum allowed p T for radiation. If this inequality is fulfilled, we will haveŨ rr U rr in all the relevant range. The value of b rad 0 is taken equal to while Λ rad is computed and stored in the global variable rad lamll by a call to the subroutine init rad lambda at initialization stage, from the routine init phys.
The routine gen rad isr proceeds by initializing the variable unorm to the value of N rr f b , stored in the rad norms array. The variable unorm is also made available, via a common block, to the function pt2solve. In the same way, the value of kt2max, appropriate to the current kinematics and radiation region, is computed and made available to the pt2solve routine, together with the value of Λ rad and the number of flavour (i.e. 5) to be used in b rad 0 . At this stage the function pt2solve is in the condition to operate properly. Its return value, for iupperfsr=1, corresponds to formula (C.6). The veto loop is started, with the variable xlr set to the log of a uniform random number 0 < r < 1. The zero of the pt2solve function is found (using the dzero CERNLIB routine), which thus corresponds to a p T value that solves the equation log(r) − log ∆ (Ũ rr ) (p T ) = 0. (7.26) If p 2 T (denoted by t in the POWHEG BOX) is below the allowed minimum value, a negative t is returned to signal the generation of an event with no radiation (i.e. with Born-like kinematics). Otherwise a sequence of vetoes is applied. First, the event is accepted with a probability α PW , (7.27) and vetoed otherwise. After this veto is passed, the distribution of eq. (7.22) has been corrected for the use of α rad S , instead of the correct one, α PW S . At this stage, ξ is generated (at fixed p T ): its probability distribution is uniform in its logarithm, as can be evinced from eq. (C.6). The value of y is obtained by solving for y the k 2 T = p 2 T definition for FSR, i.e. eq. (C.2). At this stage we can compute a further veto, accepting the event with a probability N rr which is the number returned by the subroutine uboundfct. After this veto, ξ and y have been generated with probability exp − U rr θ(k T − p T ) dξ dy dφ 2π U rr dξ dy . (7.29) At this stage, the Born cross section is computed with a call to sigborn rad, a uniform azimuth for radiation is also generated and also sigreal rad is called to compute the real cross section. One now vetoes again accepting the event with a probability and, after this, the ξ, y and φ variables have been generated according to the probability which is the desired result.

Remnant radiation
Within the subroutine pwhgevent, the generation of an event with the remnant component of the cross section is carried out as follows. First the subroutine gen sigremnant is invoked. This subroutine uses the routine gen to generate a point in the full phase space, distributed with a probability proportional to the sigremnant cross section, using the grids previously prepared, as described in section 6. The phase space point remains stored in the kinematics global variables. After that, the gen remnant subroutine is invoked.
This subroutine generates the flavour structure of the current event with the appropriate probability. This is possible because the subroutine gen remnant stores in the global arrays rad damp rem arr and rad reg arr (defined in pwhg rad.h) each contribution to the cross section for the last kinematics point computed. The gen remnant subroutine picks a contribution with a probability proportional to the values stored in these arrays. If the contribution is a remnant (described in section 5), its index is stored in rad realalr, and the corresponding underlying Born and emitter is found. The radiation phase space is thus generated again with this value of the emitter, and the same values for the three parameters used to parametrize the radiation phase space in sigremnant. These parameters are stored by the sigremnant subroutine in the global array rad xradremn. The recalculation of the radiation phase space is necessary, since only in the case when gen remnant picks the last contribution computed, the phase space would already have the appropriate settings.
The gen remnant subroutine also returns in its integer argument the value 1 for remnant contributions or the value 2 for regular contributions. If the contribution is from a regular part, its index is retrieved and stored in rad realreg, and the ISR phase space is used, since this is the one we have chosen to use for all regular contributions.

Completion of the event
For simplicity, in the POWHEG BOX, one always assumes that there is an azimuthal symmetry, so that, in the generation of the Born phase space, one can always require that some reference particle in the final state lies on the xz (or yz) plane, where z is the direction of the beam axis. At the end of the event generation, a random azimuthal rotation of the whole event is performed. This is done within the pwhgevent routine, through a call to the subroutine add azimuth.
Besides setting up the kinematics and the flavour structure, in order to pass the event to the Les Houches Interface for User Processes [38] (LHIUP from now on), we must also decide up to which scale the subsequent (SMC generated) shower should start. In case of a btilde generated event, this scale should coincide with the radiation transverse momentum. In case of remnant or regular contribution, this choice is to some extent ambiguous. In order to maintain some continuity of the remnant events with the btilde events, we also set this scale to the radiation transverse momentum. For regular contributions, this value is better decided on the basis of the specific process, and an appropriate function pt2max regular should be provided by the user, in the file pt2maxreg.f. The global variable rad pt2max is set to the maximum p T for the subsequent shower. It will be used in the LHIUP interface to set the variable SCALUP.

The Les Houches interface for user processes
At last, the generated event is put on the LHIUP interface. The scale for subsequent radiation is setup, and colours are assigned to the incoming and outgoing partons. ForB generated and remnant events, this task is carried out by the subroutine gen leshouches. For regular remnants, a special routine, gen leshouches reg, does the job and should be provided by the user in the file LesHouchesreg.f. The different treatment in the two cases is due to the fact that, in the case ofB and remnant events, we have a standard method to assign colour, that is correct in the singular region. For regular contributions, instead, other methods should be used, like resorting, for example, to the planar limit of the cross section formulae.
In general, there is much room for improvement in the technique used for colour assignment [2]. We do not consider this a crucial problem at the present stage. However, if the need of a better colour treatment will emerge, it is clear that the user should provide more colour information. We thus limit ourselves, in the present case, to an approximate colour implementation that is general enough to be process independent. What we do is to assign colour on the basis of the underlying Born configuration first. Then, depending upon the region we are considering, we assign the colour of the real emitter and radiated parton as if a collinear splitting process had really taken place. In the planar limit, this yields a unique colour prescription for the emitter and the radiated parton, except for the case of a gluon splitting into two gluons, that yields two possible colour assignments with equal probabilities. In figures 3 and 4 two particular cases are illustrated.  The routine gen leshouches begins with a call to born lh, that sets up a few eventspecific LHIUP parameters, like the flavour, status and mothers of the underlying Born incoming and outgoing particles. In order to understand this part of the code, the reader should refer to the specifications of the LHIUP [38]. The born lh subroutine sets up also the colours of the underlying Born process, by calling the borncolour lh subroutine. This subroutine is process dependent, and must be provided by the user. It should return, in the LHIUP, a planar colour connection, with a probability proportional to its Born contribution in the planar limit. For the most simple processes, like for example Z production, there is one single planar colour connection, i.e. the incoming quark and antiquark should have complementary colours. For more complicated processes, more colour structures can arise. In the planar limit, different colour structures do not interfere, so it is possible to generate a single colour structure with a probability equal to the corresponding contribution to the cross section. In some cases, it may be useful to include also some colour-suppressed contributions. This is the case, for example, in heavy flavour production, where the leading colour configuration always leads to a heavy-flavour pair in an octet colour state. Singlet production may be more suited to the direct production of bound states, and so, it may be better to include it. This can be done, as long as different colour configurations do not interfere with each other.
In the case when ξ = 0, a Born event is produced, with a very low value of the SCALUP variable, so that no further radiation is generated by the SMC. Within the gen leshouches subroutine, this is achieved by calling born lh, by copying the kn pborn momenta on the LHIUP (a task carried out by the subroutine momenta lh) and by setting SCALUP to the minimum radiation transverse momentum. If ξ = 0, radiation has taken place. The routine born lh is called first, and one more parton is added with the flavour of the radiated parton; the leg corresponding to the emitter in the real graph is assigned the correct flavour. The colours are assigned on the basis of the Born colour already stored in the LHIUP. The subroutine setcolour rad is used to perform this task for both initial-and final-state radiation.
A final task of the LHIUP subroutine is to put on the interface the intermediate resonances.
The LHIUP specifies how resonances should be put in the interface. This information should be made available to the shower program, since resonance masses must be preserved by the shower. This is achieved by calling the user routine resonances lh, which calls the add resonance routine for each particle id of intermediate resonances, specifying also its decay products.

Conclusions
In this work, we have introduced and documented the POWHEG BOX, a computer framework for the construction of a POWHEG implementation of any given NLO process. The POWHEG BOX code is available at the http://moby.mib.infn.it/~nason/POWHEG/.
In the POWHEG BOX package, the user can found three directories: W, Z and VBF H. They contain the code for W , Z and Higgs boson production in vector boson fusion, and can serve as template for any further process that a user may want to implement.
We would like to emphasize that the POWHEG BOX is a tool to develop programs. It is not something ready to run out of the box. Thus, even in order to compile the examples, the user should examine the Makefile, and make sure that the pdf and jet libraries are available in the system and are linked with the correct path. The copy of the code present in the repository is the SVN current version, marked with its version number. From time to time, we will make new SVN versions available as we implement new processes ourselves, or following important code improvements.
A byproduct of our work is the implementation of the NLO corrections for an arbitrary hadronic collision process, within the Frixione, Kunszt and Signer (FKS) subtraction scheme. The authors of ref. [32] have also proposed a general FKS implementation for e + e − processes, and they are currently extending it to hadronic collisions [39]. We stress that in our approach, however, the role of the NLO calculation is only meant to test the consistency of the implementation, and no effort was made to improve its efficiency.

Acknowledgments
The work of S.A. has been supported in part by the Deutsche Forschungsgemeinschaft in SFB/TR 9.

A. Soft contributions
In this appendix we document the calculation of the soft contribution in the FKS subtraction framework. The real cross section in the soft approximation is given by where B ij is the colour correlated Born cross section of eq. (2.6), and C i is the Casimir invariant for the i th leg. R f has no singularities as the momentum of the radiated parton, k, goes to zero.
(A.6) s b being the underlying Born CM energy squared. The range of U (x, y) must cover the range of the radiation variables for the given underlying Born configuration. A practical restriction for the range of U (x, y) is We want to generate p T uniformly in ∆ (U ) (p T ) = exp − U (x, y) θ(k T − p T ) dx dy dφ , (D. 5) in the given range. We assume 0 φ 2π. Trading y for k 2 T we find (D.8) The x integration can be performed to yield dx dy 2π 0 dφ U (x, y) θ(k T − p T ) = We observe that In ref. [3], it is suggested to use the bound with q 2 = k 2 T max + s b , (D. 12) and to defineṼ MS . If called with n < 0 it uses, as number of flavours, the number of quarks with mass less than the input scale. In the evaluation of theB function, the number of flavours st nlight is used. This is set by the user in the init couplings routine.

E.2 Scales and couplings for radiation
The choice of scales and couplings in the generation of radiation requires particular attention, since the shape of the Sudakov peak, in the radiation transverse momentum, is deeply affected by it. It is discussed in detail in ref. [2]. Here we report how this is actually done in the POWHEG BOX. The relevant code is in the subroutine set rad scales(ptsq). It is typically invoked with the radiation transverse momentum as argument. With this choice one can achieve, in some cases, complete next-to-leading logarithmic (NLL) accuracy in the POWHEG Sudakov form factor [3,2]. By inspecting the code, we can see that it sets st mufact2 and st muren2 to ptsq. The factorization and renormalization scale factors, st facfact and st renfact, are not used in this context. The program also takes care that the factorization scale never goes below the minimum allowed value in the pdf's. The strong coupling is then evaluated, the number of flavours is taken as the number of quarks with mass below the ptsq scale. Furthermore, the strong coupling constant is multiplied by the factor given in formula (4.32) in ref. [2]. This factor improves the NLL accuracy of the Sudakov form factor.
During the generation of radiation, we need a simplified one loop expression for the running coupling that is an upper bound of the running coupling that we use. We choose and we fix Λ ll by requiring where the scale µ 0 is the minimum allowed value for the renormalization scale, and is taken equal to 2Λ (5) MS . The value of Λ ll is stored in the global variable rad lamll.

F.1 Checking the soft, collinear and soft-collinear limits
In the POWHEG BOX, the routine checklims, through a call to the subroutines checksoft and checkcoll, allows the user to check the limiting behaviours of the real squared amplitudes, against their soft and collinear approximations. This routine has been used as a debug feature in the developing stages of the implementation of specific processes. If activated by the flags dbg softtest and dbg colltest in the init phys.f file, it can be used now to check if the Born, the colour-correlated and spin-correlated Born amplitudes (used to build the limiting expressions of the real amplitudes) are consistent with the real contributions, computed in the kinematic configurations where a gluon becomes soft or when it becomes collinear to another parton or when two quarks of opposite flavours become collinear. The double soft-collinear and collinear-soft limits are also tested. They do not depend upon the real squared amplitudes, but only upon the Born amplitudes. They have been used initially to check the consistency of the POWHEG BOX code, but they can also be used to perform some checks of the colour correlated Born amplitudes during the development of new processes.

F.2 Names
In the present work, we have made little references to the names of the fortran files where the various subroutines are stored. We assume that the reader can find them out by using standard command-line tools. However, while we have not spent much energy in deciding how to organize fortran files, the include files are rather well organized. For example, the include file pwhg flst.h declares common-block variables that refer to flavour structures. All these variables start with a prefix flst . Other important global variables are the kinematics ones (kn ), those involving the scales and the strong coupling (st ), those involving the generation of radiation (rad ), and so on. In this way, when finding such variables in the code, by inspecting the include files, the reader can better follow what is their purpose.

F.3 Input variables
The POWHEG BOX gets its input data from the routine powheginput. Upon its first invocation, this routine looks for a file named powheg.input. If it does not find it, asks interactively to enter a prefix, and then looks for the file "prefix"-powheg.input. It reads all the input file at once. Then, if invoked in the form powheginput("string") it returns the (real) value associated to "string" in the input file. If no matching string is found, it prints a message and aborts the program. If invoked in the form "#string", in case no matching string is found, it returns a very unlikely value −10 6 . This last mechanism is used in the POWHEG BOX to set default values.

F.4 User files
The files that contain the user routines are organized in the same way in all the examples provided with the code of the POWHEG BOX. They are: • nlegborn.h contains the number of legs of the Born process, nlegborn; • init processes.f contains the subroutine init processes, whose major task is to fill the list flst born and flst real; • init couplings.f contains the subroutine init couplings, that initializes processdependent couplings; • in PhysPars.h, there is a collection of physical variables that are in common with many subroutines (masses, electroweak couplings, widths. . . ); • Born.f contains the setborn subroutine; • in Born phsp.f the user can found the born phsp subroutine; • in real.f, the setreal routine; • and finally, in virtual.f, the setvirtual routine.
A template for the analysis subroutine can be found in pwhg analysis.f.