Resampling Algorithms for High Energy Physics Simulations

We demonstrate that the method of interleaved resampling, applied in the context of parton showers, can dramatically improve the statistical convergence of weighted parton shower evolution algorithms. We illustrate this with several examples showing significant statistical improvement.


Introduction
Event generators are indispensable simulation tools for understanding high energy collisions in particle physics. A core part of event generators are parton showers, in which hard scattering events, calculated perturbatively, are iteratively dressed with more and more partons, typically starting from some high energy scale Q and evolving down towards smaller and smaller scales until an infrared cutoff Q_0 is reached.
Branchings within the parton shower evolution occur with a rate P(q, z, x), which determines the probability density of the dynamic variables associated with the branching. These are the scale q, additional splitting variables z, and additional variables x on which a certain type of branching depends. The latter can be collections of discrete and continuous values. Branchings are ordered in decreasing scale q in the evolution interval Q to Q_0. The probability for a certain splitting not to occur between the scales Q and q is given by the so-called Sudakov form factor

  ∆_P(q|Q, x) = exp( − ∫_q^Q dk ∫ dz P(k, z, x) ).   (1)

The density for the dynamic variables describing an individual branching, which is to occur below a scale Q of a previous branching (or an initial condition set by the hard process), is then given by

  dS_P(q|Q, z, x) = [ ∆_P(Q_0|Q, x) δ(q − Q_0) δ(z − z_0) + ∆_P(q|Q, x) P(q, z, x) θ(Q − q) θ(q − Q_0) ] dq dz ,   (2)

where ∆_P(Q_0|Q, x) δ(q − Q_0) δ(z − z_0) represents events which did not radiate before reaching the cut-off scale Q_0, and z_0 is an irrelevant parameter associating fixed branching variables to the cutoff scale, where no actual branching occurs. Different branching types x_1, …, x_n, with a total rate given by Σ_{i=1}^{n} P(q, z, x_i), can be drawn algorithmically using the competition algorithm (see e.g. [1,2]), which we will not discuss in more detail in this letter.
An individual splitting kernel P(q, z, x) cannot generally be integrated to an invertible function, which would allow one to draw from dS_P using sampling by inversion. What is typically done instead is to find an overestimate R(q, z, x) (such that R(q, z, x) ≥ P(q, z, x) for all q, z, x) which can be integrated to an invertible function. Equating the normalized integrated splitting kernel to a random number r ∈ (0, 1) and solving for q then generates a new scale q distributed according to R(q, z, x), and it can be proved that accepting the proposed branching variables with probability P(q, z, x)/R(q, z, x) generates the desired density P(q, z, x) instead [1–5].
In practice, in the case of a positive definite splitting kernel P(q, z, x) with a known overestimate R(q, z, x), the algorithm to draw from dS_P is given by Alg. 1.

Algorithm 1 The Sudakov veto algorithm, defined using an overestimate R such that R(q, z, x) ≥ P(q, z, x) for all q, z, x.
  loop
    solve r = ∆_R(q|Q, x) for q, with r uniform in (0, 1)
    if q ≤ Q_0 then
      return Q_0 (no emission)
    end if
    select z in proportion to R(q, z, x)
    return q, z with probability P(q, z, x)/R(q, z, x)
    Q ← q
  end loop

arXiv:1912.02436v1 [hep-ph] 5 Dec 2019

This algorithm is statistically perfect in the sense that it produces the correct distribution using only (positive) unit weights. In the case of splitting kernels of indefinite sign, or in an attempt to include variations of the splitting rate P attached to a fixed sequence of branching variables, weights different from unity are unavoidable; see [2,5–8] for some of these applications.
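As a concrete illustration (our own toy example, not taken from this letter), consider a one-dimensional kernel P(q) = a/q with overestimate R(q) = b/q, b ≥ a. Then ∆_R(q|Q) = (q/Q)^b is analytically invertible, and a minimal Python sketch of Alg. 1 reads:

```python
import random

def veto_algorithm(Q, Q0, a, b, rng):
    """Draw the next emission scale below Q for the toy kernel
    P(q) = a/q, using the overestimate R(q) = b/q with b >= a.
    Returns Q0 if no emission occurs above the cutoff."""
    assert b >= a > 0
    while True:
        # Solve r = Delta_R(q|Q) = (q/Q)**b for q.
        r = rng.random()
        q = Q * r ** (1.0 / b)
        if q <= Q0:
            return Q0              # evolution reached the cutoff: no emission
        if rng.random() <= a / b:  # accept with probability P(q)/R(q) = a/b
            return q
        Q = q                      # vetoed: continue the evolution below q
```

A quick consistency check is that the fraction of events reaching the cutoff reproduces the exact Sudakov factor ∆_P(Q_0|Q) = (Q_0/Q)^a.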
Especially when extending parton showers to include corrections beyond leading order [9–11] as well as subleading N_c effects, and in attempts to perform the evolution at the amplitude level [8,12–18], negative contributions to the splitting rates arise. These contributions require the weighted Sudakov veto algorithm, Alg. 2, in order to be included in the simulation.

Algorithm 2 The weighted Sudakov veto algorithm from [7], starting with an initial non-zero weight w_0 and a scale Q. Here S_R can be any density according to which we can generate branching variables, not necessarily defined by an overestimate of the target splitting kernel P. The acceptance probability, 0 < ε < 1, may in principle depend on the splitting variables, but for the purpose of this letter we will treat it as a constant, though this is not a conceptual limitation.
  loop
    generate a trial splitting scale and variables q, z according to dS_R(q|Q, z, x), for example using Alg. 1
    if q = Q_0 then
      there is no emission; the cut-off scale Q_0 is returned and the event weight is kept at w
    else if rnd ≤ ε then
      the trial splitting variables q, z are accepted, and the weight is updated as w ← w × P(q, z, x)/(ε R(q, z, x))
    else
      the emission is rejected, the weight is updated as w ← w × (1 − P(q, z, x)/R(q, z, x))/(1 − ε), and the algorithm continues with Q ← q
    end if
  end loop
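A minimal Python sketch of Alg. 2 for the same hypothetical one-dimensional kernel as above (P(q) = a/q with proposal R(q) = b/q), assuming the standard weight factors P/(εR) on acceptance and (1 − P/R)/(1 − ε) on rejection; all names and parameter values are illustrative:

```python
import random

def weighted_veto(Q, Q0, a, b, eps, rng, w=1.0):
    """Weighted Sudakov veto algorithm (sketch): the proposal R(q) = b/q
    need not overestimate the target P(q) = a/q; any mismatch is absorbed
    into the event weight.  Returns (scale, weight)."""
    while True:
        r = rng.random()
        q = Q * r ** (1.0 / b)            # propose from Delta_R(q|Q)
        if q <= Q0:
            return Q0, w                  # no emission; weight unchanged
        ratio = a / b                     # P(q)/R(q) for this toy kernel
        if rng.random() <= eps:
            return q, w * ratio / eps     # accept, reweight
        w *= (1.0 - ratio) / (1.0 - eps)  # reject, reweight, continue
        Q = q
```

The weighted average over events still reproduces the exact no-emission probability (Q_0/Q)^a, but now at the cost of a spread in the weights.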
While the weighted Sudakov veto algorithm can be shown to correctly account for negative contributions, an issue with this algorithm, and with weighting procedures in general, is widely varying weight distributions, which accumulate multiplicatively during the simulation of an event, especially in the presence of the competition algorithm. In [8] the weight degeneration was partly reduced by only keeping event weights down to the winning scale in the competition step. This greatly reduced the weight spread, but did not fully resolve the issue, in the sense that incorporating yet more emissions with negative weights would have produced unmanageable weights.
Approaching the parton shower precision era, we expect the issue of negative or varying weights to be a severe obstacle that has to be overcome, preferably by methods more general than monitoring weights and adjusting weighting algorithms case by case. We therefore suggest utilizing the method of resampling [19,20] in the context of Monte Carlo event generators. This letter is organized as follows: in section 2 we introduce the basics of the resampling method, while a benchmark example, illustrating the power of resampling by applying the proposed approach to a simplified parton shower, is given in section 3. We also show the efficiency of resampling by taking a more realistic parton shower [21] with unit weights, destroying its statistical convergence beyond recognition using the weighted Sudakov algorithm, Alg. 2, and recovering good statistical convergence using the resampling method. Finally, we provide an outlook and discuss improvements in section 4.

Resampling
The key idea behind resampling is, rather than to reweight N weighted events, to select n events among the N weighted ones in proportion to their weights. This is done in an interleaved procedure which is supposed to keep the weight distribution as narrow as possible at each intermediate step. After each weight change (in our case induced by the shower), all selected events are assigned the weight N/n and are further evolved separately. It is important to mention that this method will introduce correlated events; some of them will be duplicated or have partly identical evolution histories. The number n of selected events may well be chosen equal to the number N of events, but can also be chosen differently.
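The selection step described above can be sketched in a few lines of Python (a minimal multinomial-resampling sketch with illustrative names; each survivor is assigned the same weight, chosen here to preserve the total weight, which equals N/n per event when the average weight is one):

```python
import random

def resample(events, weights, n, rng):
    """Multinomial resampling sketch: draw n events with replacement,
    in proportion to their weights, and assign each survivor the same
    weight so that the total weight sum(weights) is preserved."""
    total = sum(weights)
    # random.choices draws with replacement, proportional to `weights`
    survivors = rng.choices(events, weights=weights, k=n)
    return survivors, [total / n] * n
```

Note that duplicates among the survivors are precisely the correlated events mentioned above.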
Note that the resampling procedure will not alter the flexibility of the simulation when it comes to predicting arbitrarily many differential cross sections in one run of the program; in fact the appearance of correlated events is also known in approaches using Markov Chain Monte Carlo (MCMC) methods as explored for the hard process cross sections in [22].
In the more commonly used reweighting procedure, events with small weights are selected with a low probability; however, if selected, they are assigned the same nominal weight (typically 1) as other events. Using resampling, the large weights are also kept under control: instead of being allowed to increase without limit, all event weights can be kept at unity through duplication of events with large weights. This greatly improves the statistical convergence properties, and mitigates the risk of obtaining distributions that appear statistically wrong for a small number of events.
In an implementation of a parton shower, a natural way of implementing the resampling procedure is to resample after each emission step, i.e. to let each event radiate according to whatever versions of Sudakov factors and splitting kernels are used, and then, among the, say, N evolved events, pick N events in proportion to the weights w_i, meaning that some events will be duplicated and some will be forgotten. This can even be extended to apply to intermediate proposals within the veto and competition algorithms themselves; we will explore both options in Sec. 3. Events which have reached the cut-off scale Q_0 can be ignored in the resampling procedure, since these are not evolved further and will hence not acquire any further weights in the shower evolution. Rather, resampling among events which already have unit weights can only impair the statistical convergence, since some events will be "forgotten", whereas others will be duplicated.
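The interleaving of emission steps and resampling can be sketched as follows. Here `emit` is a placeholder for any weighted emission step (e.g. one pass of Alg. 2) returning a new scale and a weight factor; all names are our own illustrative choices:

```python
import random

def evolve_ensemble(N, Q, Q0, emit, rng):
    """Interleaved evolution sketch: each active event takes one weighted
    emission step via emit(scale, rng) -> (new_scale, weight_factor);
    events at the cutoff are set aside, and the remaining events are
    resampled to a flat weight after every step."""
    active = [(Q, 1.0)] * N
    done = []
    while active:
        # one weighted emission step per still-active event
        stepped = []
        for scale, w in active:
            new_scale, wf = emit(scale, rng)
            stepped.append((new_scale, w * wf))
        # events that reached the cutoff acquire no further weights
        done += [e for e in stepped if e[0] <= Q0]
        active = [e for e in stepped if e[0] > Q0]
        if active:
            # resample the evolving events in proportion to |weight|
            ws = [abs(w) for _, w in active]
            total = sum(ws)
            picks = rng.choices(active, weights=ws, k=len(active))
            active = [(s, total / len(active) * (1 if w >= 0 else -1))
                      for s, w in picks]
    return done
```

With unit weight factors the resampling is a no-op, consistent with the remark that resampling unit-weight events can only impair convergence.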
In the broader context of Monte Carlo simulation, resampling was first introduced in [19,23] as a means for transforming Monte Carlo samples of weighted simulations into uniformly weighted samples. Later, [20] proposed resampling as a tool for controlling weight degeneracy in sequential importance sampling methods [24]. In sequential importance sampling, weighted Monte Carlo samples targeting sequences of probability distributions are formed by means of recursive sampling and multiplicative weight updating operations. The observation in [20] that recurrent resampling guarantees the numerical stability of sequential importance sampling estimators over large time horizons-a finding that was later confirmed theoretically in [25]-led to an explosive interest in such sequential Monte Carlo methods over the last decades. Today sequential Monte Carlo methods constitute a standard tool in the statistician's toolbox and are applied in a variety of scientific and engineering disciplines such as computer vision, robotics, machine learning, automatic control, image/signal processing, optimization, and finance; see e.g. the collection [26].
We emphasize that when implemented properly, the resampling procedure does not involve any significant time penalty. The selection of n events among N in proportion to their weights w_i is equivalent to drawing n independent and identically distributed indices i_1, …, i_n among {1, …, N} such that the probability that i_ℓ = j is w_j / Σ_{ℓ'=1}^{N} w_ℓ'. Naively, one may, using the "table-look-up" method, generate each such index i_ℓ by simulating a uniform (i.e., a uniformly distributed number) u over (0, 1), finding k such that Σ_{j=1}^{k−1} w_j < u Σ_{j=1}^{N} w_j ≤ Σ_{j=1}^{k} w_j, and letting i_ℓ = k. Assume that n = N for simplicity; then, since determining k using a simple binary tree search requires, on average, log_2 N comparisons, the overall complexity for generating {i_ℓ} is N log_2 N. However, it is easily seen that if the uniforms {u_ℓ} are ordered, u_(1) ≤ u_(2) ≤ … ≤ u_(N), generating the N indices {i_ℓ} using the table-look-up method requires only N comparisons. Hence, once a sample {u_(ℓ)} of N ordered uniforms can be simulated at a cost increasing only linearly in N, the overall complexity of the resampling procedure is linear in N.
In the following we review one such simulation approach, which is based on the observation that the distribution of the order statistics (u_(1), …, u_(n)) of n independent uniforms coincides with the distribution of

  ( ∏_{i=1}^{n} v_i^{1/i}, ∏_{i=2}^{n} v_i^{1/i}, …, v_n^{1/n} ),

where the n random variables {v_i} are again independent and uniformly distributed over (0, 1); see [27, Chapter 5]. As a consequence, a strategy for simulating {u_(ℓ)} is given by the following algorithm, which indeed has a complexity linear in n.

Algorithm 3 Algorithm generating a sample {u_(ℓ)} of n ordered uniforms. The uniforms {v_i} are supposed to be independent.
  u_(n) ← v_n^{1/n}
  for ℓ = n − 1 down to 1 do
    u_(ℓ) ← u_(ℓ+1) × v_ℓ^{1/ℓ}
  end for
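A Python sketch of this ordered-uniforms recursion, together with the linear-time table look-up it enables (function names are our own):

```python
import random

def ordered_uniforms(n, rng):
    """Generate n sorted uniforms in O(n) without sorting, via the
    backward recursion u_(n) = v_n**(1/n), u_(l) = u_(l+1) * v_l**(1/l)."""
    u = [0.0] * n
    u[n - 1] = rng.random() ** (1.0 / n)
    for l in range(n - 2, -1, -1):
        u[l] = u[l + 1] * rng.random() ** (1.0 / (l + 1))
    return u

def linear_resample_indices(weights, n, rng):
    """Draw n i.i.d. indices in proportion to `weights` in O(N + n) by
    walking the ordered uniforms through the cumulative weights."""
    total = sum(weights)
    indices, cum, k = [], weights[0], 0
    for u in ordered_uniforms(n, rng):
        while u * total > cum:
            k += 1
            cum += weights[k]
        indices.append(k)
    return indices
```

Because the uniforms arrive in increasing order, the cumulative-weight pointer k never moves backwards, which is what makes the look-up linear.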
Alternatively, one may apply the method of uniform spacings; see again [27,Chapter 5] for details. A caveat of the resampling procedure that one must be aware of is that it typically leads to event path depletion for long simulations. Indeed, when the number of iterations is large the ancestral paths of the events will typically, if resampling is performed systematically, coincide before a certain random point; in other words, the event paths will form an ancestral tree with a common ancestor. For n = N one may establish a bound on the expected height of the "crown" of this ancestral tree, i.e., the expected time distance from the last generation back to the most recent common ancestor, which is proportional to N ln N ; see [28]. Thus, when the number of iterations increases, the ratio of the height of the "crown" to that of the "trunk" tends to zero, implying high variance of path space Monte Carlo estimators.
The simplest remedies for the path depletion phenomenon are to increase the sample size N or to reduce the number of resampling operations. In [29] it is suggested that resampling should be triggered adaptively and only when the coefficient of variation C_V exceeds a given threshold. The coefficient of variation detects weight skewness in the sense that C_V^2 is zero if all weights are equal. On the other hand, if the total weight is carried by a single draw (corresponding to the situation of maximal weight skewness), then C_V^2 is maximal and equal to N − 1. A closely related measure of weight skewness is the effective sample size

  ESS = ( Σ_{i=1}^{N} w_i )² / Σ_{i=1}^{N} w_i² = N / (1 + C_V^2).

Heuristically, ESS measures, as its name suggests, the number of samples that contribute effectively to the approximation. It takes on its maximal value N when all the weights are equal and its minimal value 1 when all weights are zero except for a single one. Thus, a possible adaptive resampling scheme could be obtained by executing resampling only when ESS falls below, say, 50% of N. We refer to [30] for a discussion and a theoretical analysis of these measures in the context of sequential Monte Carlo sampling.
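The ESS formula and the adaptive trigger translate directly into code (a minimal sketch; the 50% threshold is the example value mentioned above):

```python
def effective_sample_size(weights):
    """ESS = (sum w)^2 / sum w^2: equals N for equal weights and 1 when
    a single weight carries all of the mass."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2

def should_resample(weights, threshold=0.5):
    """Adaptive trigger: resample only when ESS drops below a given
    fraction (here 50%) of the sample size."""
    return effective_sample_size(weights) < threshold * len(weights)
```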
With respect to negative contributions to the radiation probability, showing up as negatively weighted events, we note that events may well keep a negative weight. In this case, resampling is simply carried through in proportion to |w_i|, whereas events with w_i < 0 contribute negatively when added to histograms of observables. Negative contributions, for example from subleading color or NLO corrections, thus do not pose an additional problem from the resampling perspective.
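A sketch of this signed variant, assuming survivors keep the sign of the parent weight and the common magnitude (Σ|w_i|)/n, so that the signed total is preserved in expectation:

```python
import random

def signed_resample(events, weights, n, rng):
    """Resample in proportion to |w_i|; each survivor carries the sign
    of its parent weight, so negative histogram contributions survive."""
    mags = [abs(w) for w in weights]
    total = sum(mags)
    picks = rng.choices(range(len(events)), weights=mags, k=n)
    new_w = [total / n * (1 if weights[i] >= 0 else -1) for i in picks]
    return [events[i] for i in picks], new_w
```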

Benchmark examples
In this section we illustrate the impact of the resampling procedure both in a simplified setup and within a full parton shower algorithm. As a toy model we consider a prototype shower algorithm with infrared cutoff Q_0 and splitting kernel

  dP(q, z, x) = (a/q) (1 + z²)/(1 − z) dq dz,

with a > 0 being some parameter (similar to a coupling constant), and x an external parameter in the form of an initial "momentum fraction" (first arbitrarily set to 0.1), later changing in the shower with each splitting. Starting from a scale Q (initially we make the arbitrary choice Q = 1, and pick a cut-off value of Q_0 = 0.01) we obtain a new lower scale q and a momentum fraction z (of the previous momentum), sampled according to the Sudakov-type density dS_P(q|Q, z, x) associated with the splitting kernel above.
The momentum fraction parameter x will be increased to x/z after the emission, such that for an emission characterized by (q_i, z_i, x_i) the variables for the next emission will be drawn according to dS_P(q_{i+1}|q_i, z_{i+1}, x_{i+1} = x_i/z_i). This algorithm resembles "backward evolution" to some extent, but here we rather want to define a model which features both the infrared cutoff (upper bound 1 − Q_0/Q on z and lower bound q > Q_0 on the evolution scale) and the dynamically evolving phase space for later emissions (lower bound on z), while still being a realistic example of a parton shower, with double and single logarithmic enhancement in Q/q.
Proposals for the dynamic variables q and z are generated using a splitting kernel R(q, z, x) = (a/q) · 2/(1 − z) in place of P(q, z, x) = (a/q) · (1 + z²)/(1 − z), and with simplified phase space boundaries 0 < z < 1 − Q_0/q. We consider the distributions of the intermediate variables q and z, as well as the generated overall momentum fraction x, for up to eight emissions.
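For this toy model the proposal can be generated by exact inversion: integrating R over 0 < z < 1 − Q_0/q gives the rate (2a/q) ln(q/Q_0), so ∆_R(q|Q) = exp(−a [ln²(Q/Q_0) − ln²(q/Q_0)]) is invertible in closed form, and z follows from inverting the normalized CDF of 2/(1 − z). A minimal Python sketch of one emission step (our own illustrative implementation, not the letter's code):

```python
import math
import random

def toy_emission(Q, Q0, a, rng):
    """One emission of the toy shower: propose (q, z) from the simplified
    kernel R(q, z) = (a/q) * 2/(1 - z), 0 < z < 1 - Q0/q, then accept
    with probability P/R = (1 + z**2)/2.
    Returns (Q0, None) if the cutoff is reached without emission."""
    L2 = math.log(Q / Q0) ** 2          # ln^2(Q/Q0), updated as we evolve
    while True:
        # invert r = Delta_R(q|Q) = exp(-a*(ln^2(Q/Q0) - ln^2(q/Q0)))
        arg = L2 + math.log(rng.random()) / a
        if arg <= 0.0:
            return Q0, None             # no proposal above the cutoff
        q = Q0 * math.exp(math.sqrt(arg))
        # z from density 2/(1-z) on (0, 1 - Q0/q): z = 1 - (Q0/q)**v
        z = 1.0 - (Q0 / q) ** rng.random()
        if rng.random() <= 0.5 * (1.0 + z * z):
            return q, z                 # accepted with probability P/R
        L2 = arg                        # vetoed: continue evolution below q
```

In the weighted variant used below, the accept/reject step is instead performed with a fixed ε and compensating weight factors as in Alg. 2.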
To consider a realistic scenario, similar to what is encountered in actual parton shower algorithms, where many emission channels typically compete, we generate benchmark distributions by splitting the parameter a into a sum of different values (arbitrarily [0.01, 0.002, 0.003, 0.001, 0.01, 0.03, 0.002, 0.002, 0.02, 0.02]) corresponding to competing channels. The usage of weighted shower algorithms, amplified by competition, is a major source of weight degeneration, as the weights from the individually competing channels need to be multiplied together.
We perform resampling after the weighted veto algorithm (choosing a constant ε = 0.5 in this example) and the competition algorithm have proposed a new transition, as schematically indicated in the left panel of Fig. 1. In Fig. 2 we illustrate the improvement due to the resampling algorithm.
We also try out the resampling strategy in a more realistic parton shower setting, using the Python dipole parton shower from [21]. In this case, we run 10 000 events in parallel, and each event is first assigned a new emission scale (or the termination scale Q_0) using the standard overestimate Sudakov veto algorithm combined with the competition algorithm. For each shower separately, we then use one veto step as in Alg. 2 such that some emissions are rejected. In the ratio P/R, P contains the actual splitting kernel and coupling constant, whereas R contains an overestimate built from the singular part of the splitting kernel and an overestimate of the strong coupling constant; see [21] for details. The constant ε in Alg. 2 is set to 0.5. This results in weighted shower events, and the resampling step, where 10 000 showers among the 10 000 are randomly kept in proportion to their weights, is performed, again resulting in unweighted showers. This procedure is illustrated in the right panel of Fig. 1, and we remark that the resampling step can be combined with the shower in many different ways, for example, as in the first example above (to the left in the figure), after each actual emission (or reaching of Q_0), or, as in the second example (to the right), also after rejected emission attempts.

Fig. 1: Illustration of the different algorithms we consider as benchmark algorithms; we depict the algorithm as executed in each shower instance. Each dS_Ri block corresponds to a proposal density for one of n competing channels, diamonds depict the acceptance/rejection step with the weighting procedure of the Sudakov veto algorithm, and the selection of the highest scale within the competition algorithm. Red dots indicate when the algorithm in each shower instance is interrupted and resampling between the different showers is performed. Depending on the resampling strategy, events which reached the cutoff can be included in the resampling, or put aside as is done in our benchmark studies. The left flow diagram corresponds to the benchmark example discussed, while the right flow diagram corresponds to the implementation in the Python dipole shower. Re-entering after a veto step (the 'veto' branches) will start the proposal from the scale just vetoed. Note that in the first case (left), resampling is performed only after emission (or shower termination), whereas in the second case it is performed after each (sometimes rejected) emission attempt.
The result for the Durham y_45 jet observable [31] is shown in Fig. 3. The smooth curve in red gives the unaltered result; the jagged blue curve represents the same parton shower with equally many events, but where the weighted Sudakov algorithm, Alg. 2, has been used. The resulting distribution is so jagged and distorted that the two appear incompatible at first sight. Only when interleaved resampling is added in between the emission steps (in orange) does it become clear that the curves actually represent the same statistical distribution. This beautifully illustrates the power of the resampling method. To avoid oversampling, the resampling is turned off for events for which the shower has terminated.
We stress that, while the usage of the weighted Sudakov algorithm in the above examples is completely unmotivated from a statistical perspective, since the default starting curve already has optimal statistics, the resampling procedure can equally well be applied in scenarios with real weight problems, coming for example from applying the weighted Sudakov algorithm in a subleading-N_c parton shower.

Fig. 2: Distribution of the scale of the 4th emission, for sampling the distribution with the competition algorithm splitting a into a sum across 10 different competing channels, with red being the direct algorithm, blue the weighted Sudakov algorithm, and orange the resampled, weighted Sudakov algorithm. The bands in the distribution reflect the minimum and maximum estimates encountered in 300 runs with different random seeds and thus give a rough estimate of the width of the distribution of estimates.

Fig. 3: The Durham y_45 jet observable for the parton shower of [21] (default); the same observable when sampled with Alg. 2 without resampling (no resampling), and when sampled with interleaved resampling after each emission or rejection step in Alg. 2 (resampling).

Conclusion and outlook
In this note we advocate the use of the resampling method for drastically improving the statistical convergence of parton shower algorithms in which weighted algorithms are used.
The implementation tests, performed in the most simplistic way, beautifully illustrate the power of the basic resampling method, and prove its usefulness for parton showers. Nevertheless, we argue that further improvement could be achievable through various extensions of the algorithm to mitigate the ancestor depletion problem. One possible approach is to furnish the proposed resampling-based algorithm with an additional so-called backward simulation pass, randomly generating new, non-collapsed ancestral paths on the stochastic grid formed by the events generated by the algorithm [32,33]. Another direction of improvement is the parallelization of the algorithm, which is essential in scenarios with large sample sizes. Parallelization of sequential Monte Carlo methods is non-trivial due to the "global" sample interaction imposed by the resampling operation. Nevertheless, a possibility is to divide the full sample into a number of batches, or islands, handled by different processors, and to subject, "locally", each island to sequential weighting and resampling. Unfortunately, the division of the sample into islands introduces additional bias, which may be non-negligible for a moderate number of islands. Following [34,35], this bias may be coped with by interleaving the evolution of the distinct islands with island-level selection operations, in which the islands are resampled according to their average weights.
At a more conceptual level, let us remark that the method we suggest represents a first step in a change of paradigm, where parton showers are viewed as tools for generating correct statistical distributions, rather than as tools for simulating independent events. We believe that the most advanced and precise high energy physics simulations, if they are to run efficiently, will have no chance of avoiding weighted algorithms, and as such will heavily need to rely on methods like the resampling method outlined in this note. We will further investigate the method, including the possibility of making cancellations in higher order corrections explicit already in intermediate steps of such a Monte Carlo algorithm, rather than by adding large weights at the very end. We will also address the structural challenges of implementing these methods in existing event generators, which we hope will help to inform design decisions, keeping in mind the necessity of methods like the one presented here.