Fooling around with the Sudakov veto algorithm

The Sudakov veto algorithm for generating emission and no-emission probabilities in parton showers is revisited and some oversampling and reweighting techniques are suggested, mainly to improve statistics. Specifically we consider the generation of Sudakov form factors in algorithms of CKKW type for matrix element/parton shower merging, both at tree- and loop-level, and the generation of highly suppressed splittings inside standard parton showers.


Introduction
Sudakov form factors [1], or no-emission probabilities, are used in all parton-shower (PS) programs to ensure exclusive final states and, at the same time, to resum leading (and subleading) logarithmic virtual contributions to all orders. They enter the shower through the ordering of emissions, typically in transverse momentum or angle. In the standard case we have an inclusive splitting probability of a parton i into partons j and k given by

dP_{i,jk}(t, z) = (α_s/2π) P_{i,jk}(z) dt dz,

where t = log(p_⊥²/Λ²_QCD) is the ordering variable, z represents the energy sharing between j and k, and where we have integrated over the azimuth angle in the Altarelli–Parisi splitting function P_{i,jk}(z). Starting from some maximum scale t_0 we then want to find the exclusive probability of the first emission, which we get from the inclusive splitting probability by multiplying with the probability that there is no emission before the first emission,

dP^first_{i,jk}(t, z) = (α_s/2π) P_{i,jk}(z) dt dz × Δ(t_0, t).

Here, Δ(t_0, t) is this no-emission probability, or the Sudakov form factor, given by

Δ(t_0, t) = exp( −∫_t^{t_0} dt′ ∫ dz (α_s/2π) P_{i,jk}(z) ).

In principle the Sudakov form factor can be calculated analytically. However, the integration region of the z-integral is often non-trivial, and most PS programs today prefer to calculate it numerically using the so-called Sudakov veto algorithm [2]. The trick here is to find a simple function which is everywhere larger than P_{i,jk} and which is easy to integrate; by systematically vetoing emissions generated according to this overestimating function, the correct no-emission probability is obtained.
The Sudakov veto algorithm (SVA) is normally used for purely probabilistic processes, but it has recently been generalized to cases where the function being exponentiated is not positive definite [3,4], and other improvements have been introduced to allow for more flexible generation [5,6].
In this article we shall investigate other modifications of the Sudakov veto algorithm, where we try to increase the statistical precision in some special cases by oversampling techniques, but we will also briefly discuss the issue of negative contributions to splitting functions.
In Sect. 3 we will investigate the usage of the SVA in CKKW-like [7,8] algorithms, where parton showers are matched with fixed-order inclusive matrix elements (MEs). Here, the SVA is used to make the inclusive MEs exclusive by multiplying them with no-emission probabilities taken from a parton shower. One problem with this procedure is that every event produced with the matrix-element generator is given either the weight zero or one, which becomes very inefficient if the cutoff used in the ME generation is small. We will find that by introducing oversampling, a weight can be calculated which is never zero, but which nevertheless gives the correct no-emission probability. In Sect. 3 we will also discuss an extension of the CKKW algorithm to include next-to-leading-order (NLO) MEs [9], where the SVA is used to extract fixed orders of α_s from the parton shower to avoid double counting of the corresponding terms in the NLO calculation.
Then, in Sect. 4 we will consider cases where a parton shower includes different competing processes, some of which are very unlikely. This is the case in e.g. the PYTHIA parton shower, where photon emissions off quarks are included together with standard QCD splittings. Since α_EM is much smaller than α_s, it is very time consuming to produce enough events containing hard photons to get reasonable statistics. We shall see that a naive oversampling of the photon emissions has unwanted effects on the total no-emission probability, and that a slightly more involved procedure is needed. The method presented is different from the one introduced by Höche et al. in [5], but is equally valid. It turns out that both of these methods can be used to include negative terms in the splitting functions.
But first we shall revisit the derivation of the SVA, as we will use many of the steps from there when we investigate the different oversampling techniques.

The Sudakov veto algorithm
Here we follow the derivation found in [2] and [10]. Although we normally have competing processes, we will first simplify the notation by considering just one possible splitting function, which we integrate over z,

Γ(t) = ∫ dz (α_s/2π) P(z).

The no-emission probability then simply becomes

Δ(t_0, t) = exp( −∫_t^{t_0} dt′ Γ(t′) ).

If Γ(t) can be integrated analytically, and if the primitive function, G(t) = ∫ dt Γ(t), has a simple inverse, it is easy to see that we can generate the t-value of the first emission by simply generating a random number, R, between zero and unity, solving Δ(t_0, t) = R, and obtaining

t = G^{−1}( G(t_0) + log R ),   (6)

where we have assumed that Γ(t) is divergent for small t, such that Δ(t, 0) = 0, an assumption we will come back to below. Now, in most cases the integration of Γ cannot be done analytically, and even when it can, the inverse function may be non-trivial. This is the case which is solved by the SVA. All we need to do is to find a nicer function, Γ̂, with an analytic primitive function, which in turn has a simple inverse, such that it overestimates Γ everywhere,

Γ̂(t) ≥ Γ(t).

With this function we can construct a new no-emission probability

Δ̂(t_0, t) = exp( −∫_t^{t_0} dt′ Γ̂(t′) ),   (8)

which is everywhere an underestimate of Δ(t_0, t), and we can generate the first t according to it. As in the standard accept–reject method, we now accept the generated value with a probability Γ(t)/Γ̂(t) < 1. However, contrary to the standard method, if we reject the emission, we replace t_0 with the rejected t-value before we generate a new t.
Loosely speaking, we have underestimated the probability that the emission was not made above t, so we need not consider that region again. We now continue generating downwards in t until we either accept a t-value, or until the generated t drops below t_c, at which point we give up and conclude that there was no emission above t_c. To see how this works more precisely, we look at the total probability of not having an emission above t_c,

P_no-emission(t_0, t_c) = Σ_{n=0}^∞ P_n(t_0, t_c),

where P_n is the probability that we have rejected n intermediate t-values. To start with, we have

P_0 = Δ̂(t_0, t_c),

where for P_1 we first have the probability that we generate a value t and then throw it away with probability 1 − Γ(t)/Γ̂(t), and then the probability that we do not generate anything below t,

P_1 = ∫_{t_c}^{t_0} dt Δ̂(t_0, t) Γ̂(t) [ 1 − Γ(t)/Γ̂(t) ] Δ̂(t, t_c).

Similarly we get P_2 with two nested integrals, and since Δ̂(t_0, t)Δ̂(t, t_c) = Δ̂(t_0, t_c), the nested integral can be easily factorized. This is easily generalized to an arbitrary number of vetoed emissions,

P_n = Δ̂(t_0, t_c) (1/n!) [ ∫_{t_c}^{t_0} dt ( Γ̂(t) − Γ(t) ) ]^n,

and summing over n we get

P_no-emission(t_0, t_c) = Δ̂(t_0, t_c) exp( ∫_{t_c}^{t_0} dt ( Γ̂(t) − Γ(t) ) ) = exp( −∫_{t_c}^{t_0} dt Γ(t) ) = Δ(t_0, t_c),

which is the no-emission probability we want.
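As a concrete illustration (a toy of this author's own, not taken from the references), the algorithm can be run for the kernel Γ(t) = a/t with the overestimate Γ̂(t) = c/t, c ≥ a, for which Δ̂(t_prev, t) = (t/t_prev)^c inverts to t = t_prev R^{1/c}. All names and parameter values below are arbitrary:

```python
import random

def sudakov_veto(t0, tc, a, c, rng):
    """One trial of the Sudakov veto algorithm for Gamma(t) = a/t,
    using the overestimate c/t (c >= a).  Returns the scale of the
    first emission, or None if it falls below the cutoff tc."""
    t = t0
    while True:
        # invert Delta-hat(t_prev, t) = R  =>  t = t_prev * R**(1/c)
        t *= rng.random() ** (1.0 / c)
        if t < tc:
            return None                  # no emission above the cutoff
        if rng.random() < a / c:         # accept with Gamma/Gamma-hat
            return t
        # rejected: t becomes the new starting scale, loop again

rng = random.Random(1)
t0, tc, a, c = 1.0, 0.1, 0.8, 2.0
n = 200_000
frac = sum(sudakov_veto(t0, tc, a, c, rng) is None for _ in range(n)) / n
print(frac)   # ≈ Delta(t0, tc) = (tc/t0)**a ≈ 0.158
```

The fraction of events with no emission above t_c reproduces the analytic Sudakov factor (t_c/t_0)^a, even though only the overestimate was ever integrated.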
Here we have ignored the additional variables involved in the emissions. It is easy to see that if we have a simple overestimate of the splitting function, P̂(z) ≥ P(z), we can construct our Γ̂ as

Γ̂(t) = ∫_{ẑ_min(t)}^{ẑ_max(t)} dz (α_s/2π) P̂(z),

where the interval [ẑ_min(t), ẑ_max(t)] overestimates the integration region in z. For each t we generate, we then also generate a z in the interval [ẑ_min(t), ẑ_max(t)] according to the probability distribution P̂(z)/∫ dz′ P̂(z′), and we then only keep the emission with the probability

P(z)/P̂(z),

i.e. simply setting P to zero outside the integration region, while keeping P̂ finite. Although the formulae become more cluttered, it is straightforward to show, by going through the steps above, that this will give the correct distributions of emissions.
If we now go back to Eq. (6), we there assumed that Γ diverges at zero such that Δ(t, 0) = 0. This is, of course, not necessarily the case, as pointed out in [3]. However, all PS models have some kind of non-perturbative cutoff, below which all emissions are assumed to be unresolved w.r.t. the typical scales in the subsequent hadronization model, and we therefore only need to concern ourselves with emissions above some cutoff t_c > 0. This means that we can always add to our overestimate a term which is zero above t_c but which diverges at t = 0. Since nothing in the veto algorithm depends on the form of Γ̂ below t_c, we do not even need to specify how it diverges; it is enough to assume that it does.¹

In parton showers one always has many different emission probabilities. We typically have several possible emissions for each parton. The SVA is easily extended to this situation by noting that the no-emission probability factorizes for the sum of different splittings,

Δ(t_0, t) = exp( −∫_t^{t_0} dt′ Σ_a Γ_a(t′) ) = Π_a Δ_a(t_0, t).

The first emission is then generated with a t-value given according to the sum Σ_a Γ_a, and one of the processes is then randomly selected with weight

Γ_a(t) / Σ_b Γ_b(t),

such that the next emission is of type a with a probability

dP_a(t) = Γ_a(t) dt Π_b Δ_b(t_0, t).

Alternatively we can generate one t_a for each possible splitting according to

Δ_a(t_0, t_a) = R_a,

and then select the splitting which gave the largest t_a. Since the probability that any other splitting, b, was not above t_a is precisely Δ_b(t_0, t_a), we therefore again get the probability that the first splitting was of type a at t = t_a.

¹ Note that we must also require Γ̂ > 0 everywhere to be able to reach below t_c.
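The highest-scale-wins alternative can be checked in the same toy model (again the author's own illustration with invented parameters): with Γ_a(t) = a/t and Γ_b(t) = b/t, each first-emission scale can be generated directly by inverting its Sudakov factor, and the channel giving the largest t wins. The fraction of a-type first emissions should come out as a/(a + b):

```python
import random

def first_scale(t0, coeff, rng):
    """Invert Delta(t0, t) = (t/t0)**coeff = R for Gamma(t) = coeff/t."""
    return t0 * rng.random() ** (1.0 / coeff)

rng = random.Random(2)
t0, tc, a, b = 1.0, 1e-4, 1.0, 3.0
wins_a = events = 0
for _ in range(200_000):
    ta = first_scale(t0, a, rng)     # trial scale for splitting a
    tb = first_scale(t0, b, rng)     # trial scale for splitting b
    if max(ta, tb) > tc:             # some emission above the cutoff
        events += 1
        wins_a += ta > tb            # channel a gave the largest scale
print(wins_a / events)               # ≈ a/(a + b) = 0.25
```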

Reweighting in CKKW-like procedures
In CKKW we want to multiply partonic states generated by a ME generator with Sudakov form factors. We take the partonic state and project it onto a parton-shower history. An n-parton state is then reconstructed as a series of parton-shower emissions with emission scales {t_1, …, t_n} and the corresponding intermediate states {S_0, …, S_n}, where S_n is the one generated by the ME. We then want to multiply by the no-emission factors

Δ_i(t_i, t_{i+1}) = exp( −∫_{t_{i+1}}^{t_i} dt Γ_i(t) ),

where Γ_i is the sum of the z-integrated splitting functions from the partons in state S_i. What we can do is to simply put the state S_i into the parton-shower program and ask it to generate one emission starting from the scale t_i. If the generated emission has a scale t > t_{i+1} we throw the whole partonic event away and ask the ME generator to produce a new state. The probability for this not to happen is exactly Δ_i(t_i, t_{i+1}), and the procedure corresponds to reweighting the ME state with the desired no-emission probability.
The problem is that if the ME state corresponds to very low scales, we will throw away a large fraction of the events, which is inefficient and may result in poor statistics.
A way to improve this situation is to introduce a boost factor for the splittings, Γ_i → Γ̃_i = CΓ_i with C > 1, and to multiply the overestimate Γ̂_i by the same factor. As before, this just gives a simple overestimate of the splitting function, which we know how to handle from Sect. 2. But rather than throwing an emission away with an extra probability 1/C (and not vetoing the event), we can always reject the emission while at the same time multiplying the whole event by a weight 1 − 1/C. The average weight of the event is then the sum over all possible numbers of vetoed emissions (we here assume that the normal rejection procedure has already been applied),

⟨w⟩ = Σ_n exp( −∫_{t_{i+1}}^{t_i} dt CΓ_i(t) ) (1/n!) [ (1 − 1/C) ∫_{t_{i+1}}^{t_i} dt CΓ_i(t) ]^n = exp( −∫_{t_{i+1}}^{t_i} dt Γ_i(t) ) = Δ_i(t_i, t_{i+1}).

In this way we get the right weight on average, but we never throw away an event.
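A toy check of this never-reject reweighting (the author's own illustration, with an invented kernel Γ_i(t) = a/t and arbitrary parameters): every accepted trial emission between t_{i+1} and t_i multiplies the event weight by 1 − 1/C instead of killing the event, and the average weight reproduces Δ_i(t_i, t_{i+1}):

```python
import random

def noemission_weight(ti, ti1, a, ahat, C, rng):
    """Event weight for the boosted veto: trial scales from the
    overestimate C*ahat/t, accepted with probability a/ahat, and each
    accepted emission reweights by (1 - 1/C) instead of vetoing the event."""
    w, t = 1.0, ti
    while True:
        t *= rng.random() ** (1.0 / (C * ahat))
        if t < ti1:
            return w
        if rng.random() < a / ahat:      # the normal rejection step
            w *= 1.0 - 1.0 / C           # reweight instead of discarding

rng = random.Random(3)
ti, ti1, a, ahat, C = 1.0, 0.2, 1.0, 1.5, 4.0
n = 200_000
mean_w = sum(noemission_weight(ti, ti1, a, ahat, C, rng) for _ in range(n)) / n
print(mean_w)   # ≈ Delta_i(ti, ti1) = (ti1/ti)**a = 0.2, with no event weighted zero
```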
In the NLO version of CKKW-L [9,11] we also want to calculate the integral of the z-integrated splitting function, which is used as a way of subtracting the fixed first-order result from the exponentiation and then replacing it with the correct NLO result. The way this was done in [9] was similar to the procedure above. The shower is started and each emission is vetoed, but the number of emissions above t_{i+1} is counted, noting that the average number of vetoed emissions is given by

⟨n⟩ = ∫_{t_{i+1}}^{t_i} dt Γ_i(t).

To factor out other fixed-order terms we note also that the number of vetoed emissions is Poisson-distributed, so that e.g.

⟨n(n − 1)⟩ = ( ∫_{t_{i+1}}^{t_i} dt Γ_i(t) )²,

which means we can pick out higher-order terms in the α_s used in the shower. Again, the statistics can become a bit poor if most events yield the weight zero (which is the case for large merging scales, when the no-emission probability is close to unity) and only a few have non-zero values. We can instead again introduce the boost factor, C, and rather than simply counting the number of emissions we take the weight n/C.
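The counting of vetoed emissions can be checked in the same toy model (again Γ_i(t) = a/t with invented parameters): vetoing every accepted emission and averaging n/C should reproduce ∫ dt Γ_i(t) = a log(t_i/t_{i+1}):

```python
import random, math

def counted_emissions(ti, ti1, a, ahat, C, rng):
    """Veto every accepted trial emission between ti1 and ti, but
    count them; with the boost factor C the weight n/C estimates
    the integral of Gamma_i(t) = a/t over [ti1, ti]."""
    n, t = 0, ti
    while True:
        t *= rng.random() ** (1.0 / (C * ahat))
        if t < ti1:
            return n / C
        if rng.random() < a / ahat:
            n += 1                       # vetoed, but counted

rng = random.Random(4)
ti, ti1, a, ahat, C = 1.0, 0.2, 1.0, 1.5, 4.0
n_ev = 200_000
mean = sum(counted_emissions(ti, ti1, a, ahat, C, rng) for _ in range(n_ev)) / n_ev
print(mean, a * math.log(ti / ti1))   # both ≈ 1.61
```

The boost factor also reduces the variance of the estimate, since more (fractionally weighted) emissions are counted per event.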
We note that in general this C need not be a simple constant; it can be a function of the scale (or of any other variable in the splitting). This is used in the NLO version of CKKW-L, where the leading-order α_s term in the expansion of the no-emission probability is needed at a fixed renormalization scale, μ_R, while in the shower we have a coupling running with the transverse momentum. Therefore, rather than counting the number of emissions, we sum up ratios of fixed and running α_s for the emissions which are generated and discarded. Introducing, for each generated and vetoed emission at a scale t_k, the weight

w_k = α_s(μ_R)/α_s(t_k),

we then get on the average a weight

⟨ Σ_k w_k ⟩ = ∫_{t_{i+1}}^{t_i} dt Γ_i(t) α_s(μ_R)/α_s(t),   (28)

by working a bit on the symmetrized nested integrals of different functions,² and we get what we desired. To obtain higher powers of the integral with fixed α_s, it is easy to show that e.g. the average sum over triplets of vetoed emissions gives

⟨ Σ_{k<l<m} w_k w_l w_m ⟩ = (1/3!) ( ∫_{t_{i+1}}^{t_i} dt Γ_i(t) α_s(μ_R)/α_s(t) )³.

We also note that for initial-state splittings, the integral over splitting functions also contains ratios of parton density functions, f_b/f_a, where a is the incoming parton before, and b is the parton after, the splitting. What is needed in the NLO version of CKKW-L in this case is the integral for a given factorization scale, which is obtained by simply changing the α_s-weight in Eq. (28) to also include the corresponding ratio of PDF factors evaluated at the fixed factorization scale and at the scale of the vetoed splitting, where z is the energy fraction of the vetoed generated splitting. The derivation in Eq. (28) then becomes a bit more cumbersome, but is straightforward.

Reweighting competing processes
Often we have many different competing splitting processes. The example we shall use here is the process of a quark radiating a gluon (Γ g ) competing with the process of the same quark radiating a photon (Γ γ ). Since generating the latter is much less likely because of the smallness of α EM as compared to α s , the generation may become very inefficient if we are interested in observables related to an emitted photon.
In principle we could again consider introducing a boost factor C > 1 and replacing Γ_γ(t) with Γ̃_γ(t) = CΓ_γ(t), doing the same with the overestimate Γ̂_γ. As long as CΓ_γ(t) ≪ Γ_g(t), we can reweight each event containing n photons with a factor 1/C^n and get approximately the correct results for the observables. However, this only gives the right emission probability, not the correct no-emission probability.
Instead we adopt a different procedure. Every time we generate a photon emission (accepted with probability Γ_γ/Γ̂_γ), we veto it anyway with a probability 0.5. If we veto it, we also reweight the whole event with a factor 2 − 2/C, while if we keep it, we reweight the whole event with a factor 2/C. Clearly the emissions will still be correctly weighted, since the kept fraction and weight combine to 0.5 × 2/C = 1/C, compensating the boost factor C, but now we also get the correct no-emission probabilities. Loosely speaking, we are half the time reweighting the event to compensate for the boosting of the emission, and half the time compensating for the corresponding underestimate of the no-emission probability.
To see this, we again look at all possible ways of not emitting anything between two scales, given by the modified no-emission probability

Δ̃(t_0, t_c) = Σ_n P̃_n(t_0, t_c),

where P̃_n contains n intermediate photon emissions which have been vetoed (each with probability 1/2, assuming we have already taken care of the acceptance factor Γ_γ/Γ̂_γ) and the product of weights from all of them,

P̃_n = exp( −∫_{t_c}^{t_0} dt CΓ_γ(t) ) (1/n!) [ ∫_{t_c}^{t_0} dt CΓ_γ(t) × (1/2) × (2 − 2/C) ]^n,

so that summing over n gives

Δ̃(t_0, t_c) = exp( −∫_{t_c}^{t_0} dt CΓ_γ(t) ) exp( (C − 1) ∫_{t_c}^{t_0} dt Γ_γ(t) ) = Δ_γ(t_0, t_c).

We note that we could, of course, have replaced the probability one half with any b, vetoing the emissions with probability b and reweighting with (1 − 1/C)/b, while reweighting with 1/((1 − b)C) if not vetoed, and still obtain the correct result.
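As a numerical sanity check (a toy of the author's own, not the PYTHIA8 study below): a single rare channel Γ_γ(t) = q/t, oversampled by C, with accepted emissions kept with probability 1/2 and weight 2/C, or vetoed with weight 2 − 2/C. Both the weighted no-emission probability and the weighted emission probability then come out right:

```python
import random

def photon_event(t0, tc, q, qhat, C, rng):
    """Oversampled generation of a rare splitting Gamma(t) = q/t.
    Returns (scale or None, event weight)."""
    w, t = 1.0, t0
    while True:
        t *= rng.random() ** (1.0 / (C * qhat))
        if t < tc:
            return None, w               # no emission above the cutoff
        if rng.random() < q / qhat:      # normal acceptance step
            if rng.random() < 0.5:
                w *= 2.0 - 2.0 / C       # veto it anyway, reweight
            else:
                return t, w * 2.0 / C    # keep the photon, reweight

rng = random.Random(5)
t0, tc, q, qhat, C = 1.0, 0.1, 0.3, 0.4, 4.0
n = 400_000
w_no = w_em = 0.0
for _ in range(n):
    t, w = photon_event(t0, tc, q, qhat, C, rng)
    if t is None:
        w_no += w
    else:
        w_em += w
print(w_no / n, w_em / n)   # ≈ (tc/t0)**q ≈ 0.50 and 1 - (tc/t0)**q ≈ 0.50
```

Naively weighting only the kept emissions by 1/C would instead leave the no-emission fraction at its boosted (too small) value, which is exactly the systematic shift seen below.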
We also note that while this solves the same problem as was addressed in [5], the solution is technically different. In that algorithm, only the standard overestimate Γ̂_γ is multiplied by a factor C, while the acceptance of a generated emission at scale t is still done with probability Γ_γ(t)/Γ̂_γ(t); an accepted emission reweights the event by a factor 1/C, and a rejected emission instead reweights the event by a factor

( CΓ̂_γ(t) − Γ_γ(t) ) / ( C ( Γ̂_γ(t) − Γ_γ(t) ) ).

The end result is the same as for the method presented here. In fact, if one could choose a Γ̂_γ = 2Γ_γ and let C → C/2, the reweighting of the events would be exactly the same. While we gain in efficiency for the emissions, we will also lose in precision for the no-emission probability due to fluctuating weights. It is easy to calculate the variance of the weights, but it is maybe more instructive to look at a real example. As an illustration we let PYTHIA8 [12] generate standard LEP events, with photon emission included in the shower, and we compare the default generation with weighting the photon emission cross section with a factor C. We show for different C the effect of using the full reweighting procedure (proper weighting), but also show for comparison the case of just using event weights with a factor 1/C^{n_γ} (naive weighting).
In Fig. 1a we show the transverse momentum distribution of the most energetic photon in an event using C = 1 (i.e. the default), C = 2 for the naive, and C = 4 for the proper weighting. The error bands indicate the statistical error using 10⁸ events, and the results are shown as a ratio to the result of a high-statistics run (3 × 10⁹ events) with PYTHIA8. We see that the statistical error is somewhat reduced in the reweighted samples, but we also see what seems to be a systematic shift in the naive reweighting, due to the mistreatment of the no-emission probability. This shift becomes very pronounced if we increase C, as seen in Fig. 1b, where we use C = 32 for the naive and C = 64 for the proper reweighting. Here we see that the statistical errors are very much reduced for both reweightings, but the naive procedure is basically useless due to the large systematic shift.
If we require two photons in each event, the gain from the reweighting becomes more obvious. In Fig. 2 we show the distribution in invariant mass of the two most energetic photons in an event. Here the gain in statistics is significant also for the case of modest boost factors (a), and for the large boost factors in (b) the gain in statistics is enormous, while, again, the naive reweighting suffers from a large systematic shift.
To isolate the effect on the no-emission probability, Fig. 3 shows the inclusive thrust distribution for the same runs as before. Here we see that especially with forceful proper reweighting the statistical error is increased because of the fluctuating weights, and we see again that the naive reweighting will give a systematic shift due to the mistreatment of the no-emission probability.
So far we have implicitly assumed that C > 1, since we motivated the whole procedure by the desire to increase the number of rare splittings. Note, however, that the proof of the procedure does not at all depend on the size of C. In fact it can even in some cases be taken negative.
Consider a case where there are negative contributions to the total splitting probability. One of the simplest cases is the emission of a second gluon in a final-state dipole shower in e⁺e⁻ annihilation into jets. Once a gluon has been radiated from the original qq̄ pair, it can be shown that the distribution of a second gluon is well described by independent emissions from two dipoles, one between the quark and the first gluon and one between the gluon and the anti-quark. However, examining the e⁺e⁻ → qggq̄ matrix element one finds that there is a colour-suppressed negative contribution corresponding to emissions from the dipole between the q and the q̄. This contribution is normally ignored completely in parton showers, mainly because it is difficult to handle in a probabilistic way in the SVA. It may even result in a no-emission probability above unity.
Fig. 1 The transverse momentum (w.r.t. the thrust axis) of the most energetic photon, given as a ratio to the result of a high-statistics run with default PYTHIA8 (see text)

Fig. 2 The invariant mass of the two most energetic photons, given as a ratio to the result of a high-statistics run with default PYTHIA8 (see text). In (b) the ratio is w.r.t. the result for the proper C = 64 reweighting, as even with 3 × 10⁹ events, the statistical error from the default PYTHIA8 run is too large in comparison

Fig. 3 The inclusive thrust distribution, given as a ratio to the result of a high-statistics run with default PYTHIA8 (see text)

In the reweighting scheme introduced here we can easily include the negative contribution to the splitting functions, and apply a boost factor of C = −1 for the qq̄ dipole. If a gluon emission is generated from such a dipole, it is then either accepted and the event is given a negative weight, or it is rejected, in which case the event weight is multiplied by a factor of four. We note that in this way it is in principle conceivable to implement a parton shower which includes all possible interference effects. We will, of course, have even larger issues with statistics, compared to the photon emission case above, as we now have potentially large weights that must cancel each other, but this procedure could still be an interesting alternative to the ones presented in [3] and [4] (an extension of [5] to negative weights).
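The C = −1 trick can also be checked in a toy model of the author's own devising (a single "negative dipole" with Γ(t) = −a/t and arbitrary parameters): trial emissions are generated from |Γ| = a/t, and an accepted emission is either kept with weight −2 (i.e. 2/C) or vetoed with weight 4 (i.e. 2 − 2/C). The weighted no-emission probability then averages to exp(−∫Γ) = (t_0/t_c)^a, which is above unity, while the total weight averages to one:

```python
import random

def negative_dipole(t0, tc, a, rng):
    """Toy negative splitting Gamma(t) = -a/t with boost C = -1:
    generate from |Gamma| = a/t, then either keep the emission with
    weight -2 or veto it with weight 4 (the b = 1/2 scheme).
    Returns (scale or None, event weight)."""
    w, t = 1.0, t0
    while True:
        t *= rng.random() ** (1.0 / a)   # a/t used as its own overestimate
        if t < tc:
            return None, w
        if rng.random() < 0.5:
            w *= 4.0                     # vetoed: factor 2 - 2/C with C = -1
        else:
            return t, -2.0 * w           # kept: factor 2/C = -2

rng = random.Random(6)
t0, tc, a = 1.0, 0.5, 0.2
n = 400_000
w_no = w_tot = 0.0
for _ in range(n):
    t, w = negative_dipole(t0, tc, a, rng)
    w_tot += w
    if t is None:
        w_no += w
print(w_no / n, w_tot / n)   # ≈ (t0/tc)**a ≈ 1.15 (above unity!) and ≈ 1.0
```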

Conclusions
This article does not claim to present innovative new physics results. Rather, it presents a number of methods collected by the author during a couple of decades of working with parton showers in general and with the Sudakov veto algorithm in particular. They are presented here in the hope that they may come in handy for the community now that more and more effort is put into the merging and matching of parton showers with matrix elements. Especially in the case of matching with next-to-leading-order matrix elements (and beyond), a thorough understanding of how parton showers work, and knowledge of how to manipulate them, is necessary, and these kinds of methods may become increasingly important.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Appendix: Short note on symmetrized sums of nested integrals of different functions
The reader should be familiar with the standard factorization of nested integrals, resulting in e.g.

I ≡ ∫_{t_c}^{t_0} dt_1 f(t_1) ∫_{t_c}^{t_1} dt_2 f(t_2)
  = ∫_{t_c}^{t_0} dt_2 f(t_2) ∫_{t_2}^{t_0} dt_1 f(t_1)
  = ∫_{t_c}^{t_0} dt_1 f(t_1) ∫_{t_1}^{t_0} dt_2 f(t_2),

so that 2I = ∫_{t_c}^{t_0} dt_1 ∫_{t_c}^{t_0} dt_2 f(t_1) f(t_2), i.e.

I = (1/2) ( ∫_{t_c}^{t_0} dt f(t) )²,

where we in the third line have simply relabeled the integration variables. For nested integrals of two different functions we get in general

∫_{t_c}^{t_0} dt_1 f(t_1) ∫_{t_c}^{t_1} dt_2 g(t_2) + ∫_{t_c}^{t_0} dt_1 g(t_1) ∫_{t_c}^{t_1} dt_2 f(t_2) = ( ∫_{t_c}^{t_0} dt f(t) ) ( ∫_{t_c}^{t_0} dt g(t) ).
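A quick numerical check of the last identity (simple midpoint quadrature on [0, 1], with arbitrarily chosen f and g):

```python
def nested(f, g, n=2000):
    """Midpoint-rule estimate of  int_0^1 dx f(x) int_0^x dy g(y)."""
    h = 1.0 / n
    G = 0.0                  # running inner integral of g
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += f(x) * G * h
        G += g(x) * h
    return total

f = lambda x: x              # int_0^1 f = 1/2
g = lambda x: x * x          # int_0^1 g = 1/3
lhs = nested(f, g) + nested(g, f)
print(lhs)                   # ≈ (1/2)*(1/3) = 1/6
```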