Jet transition values for the anti-$k_{\bot}$ algorithm

We define jet transition values for the anti-$k_{\bot}$ algorithm for both hadron and $e^+e^-$ colliders. We show how these transition values can be computed and how they can be used to improve the performance of clusterization when jet resolution parameters are varied over a larger set of values. Finally we present a simple performance test to illustrate the behavior of the new method compared to the original one.


Introduction
The production of hadronic jets is a common feature of particle collisions. Jets are widely studied: they can be used to test the Standard Model and measure its parameters, they can signal new physics, and they constitute an important background for new physics searches.
Jets are defined through jet clustering algorithms, which therefore play an important part in jet studies. Jet algorithms take final state particles as input and combine them, according to their prescription, into larger objects, which we then call jets. Each algorithm has a set of resolution parameters that determines the jet structure. Although jet algorithms are required in all kinds of jet analyses, one particularly important observable is the so-called jet rate, because it is directly connected to the clustering algorithm and shows how the number of jets depends on the choice of parameters.
The jet rate measures the relative production rate of $n$-jet events compared to all hadronic events. It is given by the ratio of the $n$-jet cross section $\sigma_{n\text{-jet}}$ and the total hadronic cross section $\sigma_{\rm tot}$ at center-of-mass energy $\sqrt{Q^2}$:
$$R_n(a) = \frac{\sigma_{n\text{-jet}}(a)}{\sigma_{\rm tot}} , \tag{1}$$
where $a$ denotes the set of jet resolution parameters characteristic of a given jet algorithm. Jet rates are mostly studied as a function of one or more of their resolution parameters. This means clustering the same momentum configuration with a wide range of values of the jet resolution parameters. Since repeated clusterization is usually computationally inefficient, in practice one tries to exploit the properties of the algorithm to enhance performance. This is also important on the theory side, since making higher order predictions in perturbation theory typically requires the generation of millions of phase space points, which all need to be clustered individually.
In the case of $e^+e^-$ colliders the most common jet clustering algorithm is the $k_{\bot}$ (or Durham) algorithm [1], whose properties allow computations to be done efficiently, as illustrated in the next section. Today, in the LHC era, the commonly used jet algorithm is the anti-$k_{\bot}$ algorithm [2]. Jet rate studies similar to the $k_{\bot}$ ones are not prevalent for this algorithm, and they are also hampered by the fact that it lacks the properties the $k_{\bot}$ algorithm has. Hence computations can be slowed down significantly by the clusterization alone.
In this paper we present a reformulation of the anti-$k_{\bot}$ algorithm, equivalent to the original, which makes it possible to define transition values in a similar fashion to the $k_{\bot}$ algorithm. Furthermore, we show how these transition values can be computed and used to speed up calculations. Our method can be used for both hadron and $e^+e^-$ colliders.

The $k_{\bot}$ algorithm
We start with a short review of the $k_{\bot}$ algorithm and discuss how it is used in practical calculations. The algorithm depends on a single jet resolution parameter $y_{\rm cut}$, and the distance measure is defined as
$$y_{ij} = \frac{2\min(E_i^2, E_j^2)\,(1 - \cos\theta_{ij})}{Q^2} , \tag{2}$$
where $E_i$ and $E_j$ denote the energies of particles $i$ and $j$ respectively, while $\theta_{ij}$ is the angle between the three-momenta $p_i$ and $p_j$. During clusterization we compute $y_{ij}$ for each pair of particles and find the smallest one, $y_{kl} = \min y_{ij}$. If $y_{kl} < y_{\rm cut}$ holds, we combine particles $k$ and $l$, then start the procedure again with the new list of objects. Otherwise we stop the clusterization and the resulting objects are considered jets.
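The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: the function names `durham_y` and `cluster_durham`, the `(E, px, py, pz)` tuple layout and the E-scheme recombination (adding four-momenta) are assumptions made here, since the text does not specify a recombination scheme.

```python
import math

def durham_y(a, b, Q2):
    """Durham measure y_ij = 2 min(E_i^2, E_j^2) (1 - cos theta_ij) / Q^2."""
    dot = a[1]*b[1] + a[2]*b[2] + a[3]*b[3]
    norm_a = math.sqrt(a[1]**2 + a[2]**2 + a[3]**2)
    norm_b = math.sqrt(b[1]**2 + b[2]**2 + b[3]**2)
    cos_theta = dot / (norm_a * norm_b)
    return 2.0 * min(a[0], b[0])**2 * (1.0 - cos_theta) / Q2

def cluster_durham(momenta, y_cut, Q2):
    """Cluster (E, px, py, pz) tuples at a fixed y_cut and return the jets."""
    objs = [tuple(m) for m in momenta]
    while len(objs) > 1:
        # smallest y_ij over all pairs, together with the pair indices
        y_min, k, l = min(
            (durham_y(objs[i], objs[j], Q2), i, j)
            for i in range(len(objs)) for j in range(i + 1, len(objs))
        )
        if y_min >= y_cut:
            break  # nothing left to combine: the remaining objects are jets
        # assumed E-scheme recombination: add the four-momenta
        merged = tuple(x + y for x, y in zip(objs[k], objs[l]))
        objs = [o for n, o in enumerate(objs) if n not in (k, l)] + [merged]
    return objs
```

Calling `cluster_durham` once per histogram bin is exactly the repeated clusterization that the transition values discussed next are meant to avoid.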
In the case of the $k_{\bot}$ algorithm one can uniquely define transition values. Transition values $y_{i-1\leftarrow i}$ are those values of $y_{\rm cut}$ at which the number of jets changes from $i$ to $i-1$ for a given final state configuration. The distribution of a transition value behaves as an event shape observable. With the $k_{\bot}$ algorithm every transition value $y_{i-1\leftarrow i}$ can be computed by performing the clusterization only once, independently of $y_{\rm cut}$: in every clusterization step the smallest value $y_{kl}$ provides the corresponding transition value $y_{i-1\leftarrow i}$. We repeat the steps until all particles are clustered into two jets. When jets are defined through the $k_{\bot}$ algorithm, the number of jets is a monotonically decreasing function of $y_{\rm cut}$ for every possible phase space point.
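A single clustering pass that records the transition values might look as follows. This is a sketch under stated assumptions: `durham_transitions` and `n_jets` are hypothetical names, momenta are `(E, px, py, pz)` tuples, and four-momenta are added on recombination (an assumed E-scheme).

```python
import math

def durham_y(a, b, Q2):
    """Durham measure y_ij = 2 min(E_i^2, E_j^2) (1 - cos theta_ij) / Q^2."""
    dot = a[1]*b[1] + a[2]*b[2] + a[3]*b[3]
    norm_a = math.sqrt(a[1]**2 + a[2]**2 + a[3]**2)
    norm_b = math.sqrt(b[1]**2 + b[2]**2 + b[3]**2)
    return 2.0 * min(a[0], b[0])**2 * (1.0 - dot / (norm_a * norm_b)) / Q2

def durham_transitions(momenta, Q2):
    """One pass, no y_cut needed: returns {n: y_{n-1 <- n}} for n = N, ..., 3."""
    objs = [tuple(m) for m in momenta]
    transitions = {}
    while len(objs) > 2:
        y_min, k, l = min(
            (durham_y(objs[i], objs[j], Q2), i, j)
            for i in range(len(objs)) for j in range(i + 1, len(objs))
        )
        transitions[len(objs)] = y_min  # smallest y_kl of this step
        merged = tuple(x + y for x, y in zip(objs[k], objs[l]))  # assumed E-scheme
        objs = [o for n, o in enumerate(objs) if n not in (k, l)] + [merged]
    return transitions

def n_jets(transitions, y_cut, n_particles):
    """Number of jets at y_cut, read off the stored values (never below two)."""
    n = n_particles
    while n > 2 and transitions[n] < y_cut:
        n -= 1
    return n
```

Once `durham_transitions` has run, `n_jets` can be evaluated for any number of `y_cut` values at negligible cost, which is the property exploited in the rest of this section.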
These two properties of the algorithm make it possible to connect the differential distributions $d\sigma/dy_{i-1\leftarrow i}$ and the cross section $\sigma_{n\text{-jet}}(y_{\rm cut})$. For example, the three-jet cross section can be computed as
$$\sigma_{3\text{-jet}}(y_{\rm cut}) = \int_{y_{\rm cut}}^{1} dy_{2\leftarrow 3}\, \frac{d\sigma}{dy_{2\leftarrow 3}} - \int_{y_{\rm cut}}^{1} dy_{3\leftarrow 4}\, \frac{d\sigma}{dy_{3\leftarrow 4}} . \tag{3}$$
The meaning of the two terms is the following. The three-jet cross section for a chosen $y_{\rm cut}$ gets contributions from the differential cross section $d\sigma/dy_{2\leftarrow 3}$ for every value of $y_{2\leftarrow 3}$ that is greater than $y_{\rm cut}$. This gives the first term in Eq. (3). However, the resulting quantity on its own would include all events with $y_{3\leftarrow 4} \in [0, 1]$. Events with $y_{3\leftarrow 4} \in [0, y_{\rm cut}]$ are indeed four-jet configurations that cluster into three jets, but events with $y_{3\leftarrow 4} \in [y_{\rm cut}, 1]$ do not cluster into three jets. Thus we need to subtract the $d\sigma/dy_{3\leftarrow 4}$ distribution integrated over $[y_{\rm cut}, 1]$ to get the correct three-jet cross section, which gives the second term. Eq. (3) provides a very useful relation to speed up numerical calculations. One has to perform the clusterization only once per phase space point, calculate the differential cross sections, then do a simple integration with the desired $y_{\rm cut}$ according to the formula to obtain the $n$-jet cross section.
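At the level of individual weighted events, the two integrals in Eq. (3) reduce to two sums over events. The sketch below illustrates this and compares it with a direct event-by-event count; the function names and the `(weight, y23, y34)` event layout are hypothetical, and it is assumed that each event satisfies $y_{3\leftarrow 4} \le y_{2\leftarrow 3}$ (events with fewer than four particles can carry `y34 = 0.0`).

```python
def three_jet_from_transitions(events, y_cut):
    """Event-level analogue of Eq. (3).

    events: iterable of (weight, y23, y34), where y23 and y34 stand for the
    transition values y_{2<-3} and y_{3<-4} of one phase space point.
    """
    first = sum(w for w, y23, _ in events if y23 >= y_cut)   # integral over y_{2<-3}
    second = sum(w for w, _, y34 in events if y34 >= y_cut)  # integral over y_{3<-4}
    return first - second

def three_jet_direct(events, y_cut):
    """Reference: sum the weights of events that have exactly three jets at y_cut."""
    return sum(w for w, y23, y34 in events if y34 < y_cut <= y23)
```

Both functions return the same number for any `y_cut`, but only the first can be fed from histograms of the transition values that were filled once, without looking at the events again.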

The anti-$k_{\bot}$ algorithm
Now we turn to the anti-$k_{\bot}$ algorithm and discuss its shortcomings in computational time compared to the $k_{\bot}$ algorithm. The anti-$k_{\bot}$ algorithm uses two different measures: a two-particle measure $d_{ij}$ and a beam jet measure $d_{iB}$. They are defined as
$$d_{ij} = \min(k_{\bot,i}^{2p}, k_{\bot,j}^{2p})\, \frac{(y_i - y_j)^2 + (\phi_i - \phi_j)^2}{R^2} , \qquad d_{iB} = k_{\bot,i}^{2p} \tag{4}$$
for hadron colliders, where $k_{\bot,i}$, $y_i$ and $\phi_i$ denote the transverse momentum, rapidity and azimuth of particle $i$ respectively. In the case of $e^+e^-$ colliders we have
$$d_{ij} = \min(E_i^{2p}, E_j^{2p})\, \frac{1 - \cos\theta_{ij}}{1 - \cos R} , \qquad d_{iB} = E_i^{2p} , \tag{5}$$
with the notation being identical to the one introduced in the previous section. Choosing $p = -1, 0, 1$ we obtain the anti-$k_{\bot}$ [2], the Cambridge/Aachen [3] and the inclusive $k_{\bot}$ [1] algorithms respectively. Collectively they are referred to as the general inclusive $k_{\bot}$ algorithm. The anti-$k_{\bot}$ algorithm has two jet resolution parameters: $R$ and $E_{\rm cut}$. During clustering we calculate $d_{iB}$ for every particle $i$ and $d_{ij}$ for every particle pair $i, j$. If $d_{kl}$ is the smallest measure, we combine particles $k$ and $l$, but if $d_{kB}$ is the smallest one, particle $k$ is considered a jet candidate and we remove it from the list of objects. We repeat these steps until every particle becomes part of a jet candidate. Finally we apply the energy cut(s), and every jet candidate with $E_i > E_{\rm cut}$ is a resolved jet.
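A compact sketch of the $e^+e^-$ version of this procedure is given below. It is meant only as an illustration: the name `inclusive_ee`, the `(E, px, py, pz)` tuple layout, the E-scheme recombination and the convention that a tie between a pair measure and the beam measure goes to the beam are all assumptions made here.

```python
import math

def cos_angle(a, b):
    """Cosine of the angle between the three-momenta of a and b."""
    dot = a[1]*b[1] + a[2]*b[2] + a[3]*b[3]
    return dot / (math.sqrt(a[1]**2 + a[2]**2 + a[3]**2)
                  * math.sqrt(b[1]**2 + b[2]**2 + b[3]**2))

def inclusive_ee(momenta, R, E_cut, p=-1):
    """General inclusive k_T clustering for e+e-; p = -1 gives anti-k_T.

    Returns the resolved jets, i.e. the jet candidates with E > E_cut.
    """
    objs = [tuple(m) for m in momenta]
    candidates = []
    while objs:
        # smallest beam measure d_kB = E_k^(2p) and its index k
        d_beam, k = min((o[0]**(2*p), n) for n, o in enumerate(objs))
        # smallest pair measure d_ij (None if a single object is left)
        best = min(
            ((min(objs[i][0]**(2*p), objs[j][0]**(2*p))
              * (1.0 - cos_angle(objs[i], objs[j])) / (1.0 - math.cos(R)), i, j)
             for i in range(len(objs)) for j in range(i + 1, len(objs))),
            default=None,
        )
        if best is not None and best[0] < d_beam:
            _, i, j = best
            merged = tuple(x + y for x, y in zip(objs[i], objs[j]))  # assumed E-scheme
            objs = [o for n, o in enumerate(objs) if n not in (i, j)] + [merged]
        else:
            candidates.append(objs.pop(k))  # particle k becomes a jet candidate
    return [c for c in candidates if c[0] > E_cut]
```

Note that `R` enters every pair measure, so changing `R` can change the order in which objects are combined or removed, not just the point where the procedure stops.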
The anti-$k_{\bot}$ algorithm has characteristics and properties that make it preferable for experimental use [2], for example cone-like jet shapes. However, the algorithm has certain other properties that unfortunately rule out computational shortcuts like Eq. (3), which makes clusterization more expensive in the study of the jet rate observable. This is because in general the number of jets is not a monotonic function of $R^2$ or $1 - \cos R$, as can be seen in Fig. 1. The reason is partially the presence of the additional $E_{\rm cut}$ parameter. Although we obtain more and more jet candidates when we increase the spatial resolution, many of them would not survive the final cut on the energy. Furthermore, the presence of the beam jet measure $d_{iB}$ prevents the same definition of jet transition values as in the case of the $k_{\bot}$ algorithm.
This would leave us in an unfortunate situation where clustering would need to be done for each different choice of the jet resolution parameters, in particular when we vary $R$. Fortunately, we can still define jet transition values that can be used in calculations.

Transition values
We start with an equivalent reformulation of the anti-$k_{\bot}$ algorithm, which is more suitable for defining and finding transition values. First we combine the two measures $d_{ij}$ and $d_{iB}$ in the following way:
$$y_{ijk} = y_{\rm cut}\, \frac{\min_{i,j} d_{ij}}{\min_{k} d_{kB}} , \tag{6}$$
where we define $y_{\rm cut} \equiv R^2$ and $y_{\rm cut} \equiv 1 - \cos R$ for hadron and $e^+e^-$ colliders respectively. Note that $y_{ijk}$ is independent of $y_{\rm cut}$. Clusterization is now done as follows: first we calculate $y_{ijk}$. If $y_{ijk} < y_{\rm cut}$, we combine particles $i$ and $j$; otherwise we consider particle $k$ to be a jet candidate and remove it from the list. We repeat the procedure until the list is empty. Finally we apply the energy cuts on our jet candidates.
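To see why this reformulation makes the same decisions as the original algorithm, one can rearrange the comparison between the smallest pair measure and the smallest beam measure so that all dependence on the resolution parameter sits on one side. The lines below spell this out for the $e^+e^-$ measures; they assume that $y_{ijk}$ is built from the smallest pair measure and the smallest beam measure, and the index labels on the minima are our own notation.

```latex
\begin{align*}
\min_{i,j} d_{ij} < \min_{k} d_{kB}
 &\iff \frac{\min_{i,j}\left[\min(E_i^{2p}, E_j^{2p})\,(1-\cos\theta_{ij})\right]}{1-\cos R}
       < \min_{k} E_k^{2p} \\
 &\iff \underbrace{\frac{\min_{i,j}\left[\min(E_i^{2p}, E_j^{2p})\,(1-\cos\theta_{ij})\right]}
       {\min_{k} E_k^{2p}}}_{y_{ijk}}
       < 1-\cos R \equiv y_{\rm cut}
\end{align*}
```

Both steps only multiply by positive quantities ($1-\cos R > 0$ and $E_k^{2p} > 0$), so the direction of the inequality is preserved, and the left-hand side of the last line contains no reference to $R$.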
The clustering procedure is now similar to that of the $k_{\bot}$ algorithm, hence we can define jet transition values in a similar fashion. We call a value $y_t$ of $y_{\rm cut}$ a transition value if the clustered particle configuration changes there. It is important to notice that this does not necessarily imply a change in the number of jet candidates; moreover, the final number of resolved jets depends on the chosen value of $E_{\rm cut}$ as well. Two different $y_{\rm cut}$ values can result in the same number of jet candidates, but these candidates may differ in their momentum configuration. For example, for one choice of $y_{\rm cut}$ all four resulting jet candidates might be hard and considered resolved in the end, whereas for a different value one of the four could be soft and would therefore not be counted as a resolved jet. Both configurations have the same number of jet candidates, but they are in two different regions separated by a transition value.
Using this definition is convenient in practice. The transition values need to be calculated only once, after which one can apply as many different energy cuts as desired without repeating the clusterization. Nevertheless, the calculation of the $y_t$ values is not straightforward. For the $k_{\bot}$ algorithm the sequence of clustering is independent of $y_{\rm cut}$, and the relevant information can be fully retrieved for any $y_{\rm cut}$ value from one complete clusterization. In contrast, the clusterization sequence of the anti-$k_{\bot}$ algorithm depends on the actual choice of $y_{\rm cut}$, due to the presence of the two different distance measures.
It was shown that in the Cambridge algorithm one faces a similar problem, but transition values can still be found systematically [4]. We can adopt the method of Ref. [4] to find transition values for the anti-$k_{\bot}$ algorithm in the following way:
1. Set an initial value $y_{\rm ini}$ and set $y_{\rm cut} = y_{\rm ini}$.
2. If $y_{\rm cut}$ is less than some preset lower limit $y_{\rm stop}$, stop the algorithm.
3. Perform clusterization with the chosen $y_{\rm cut}$, and find the maximum value $y_{ijk}^{\max}$ of $y_{ijk}$ during the process.
4. Store the transition value $y_t = y_{ijk}^{\max}$ and apply energy cuts to obtain the corresponding number of jets.
5. Set $y_{\rm cut} = y_{ijk}^{\max}$ and go to Step 2.
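The five steps can be sketched for the $e^+e^-$ measures as follows. This is an illustration under several assumptions of ours, not the Fortran90 code of the paper: the names `cluster_at` and `transition_scan` are hypothetical, four-momenta are added on recombination, the maximum in Step 3 is taken over the steps in which a pair was actually combined (so that it lies below the current $y_{\rm cut}$ and the scan moves downward), and each clustering result is stored together with the $y_{\rm cut}$ interval for which it is valid.

```python
import math

def cos_angle(a, b):
    """Cosine of the angle between the three-momenta of a and b, given as (E, px, py, pz)."""
    dot = a[1]*b[1] + a[2]*b[2] + a[3]*b[3]
    return dot / (math.sqrt(a[1]**2 + a[2]**2 + a[3]**2)
                  * math.sqrt(b[1]**2 + b[2]**2 + b[3]**2))

def cluster_at(momenta, y_cut, E_cut, p=-1):
    """One full clustering at fixed y_cut.

    Returns (number of resolved jets, largest y_ijk that triggered a merge);
    the second entry is None when no pair was merged.
    """
    objs = [tuple(m) for m in momenta]
    candidates, y_max = [], None
    while objs:
        # smallest beam measure E_k^(2p) and the index k of its particle
        e_beam, k = min((o[0]**(2*p), n) for n, o in enumerate(objs))
        # smallest pair measure divided by the beam measure: y_ijk
        best = min(
            ((min(objs[i][0]**(2*p), objs[j][0]**(2*p))
              * (1.0 - cos_angle(objs[i], objs[j])) / e_beam, i, j)
             for i in range(len(objs)) for j in range(i + 1, len(objs))),
            default=None,
        )
        if best is not None and best[0] < y_cut:
            y, i, j = best
            y_max = y if y_max is None else max(y_max, y)
            merged = tuple(a + b for a, b in zip(objs[i], objs[j]))  # assumed E-scheme
            objs = [o for n, o in enumerate(objs) if n not in (i, j)] + [merged]
        else:
            candidates.append(objs.pop(k))
    return sum(1 for c in candidates if c[0] > E_cut), y_max

def transition_scan(momenta, y_ini, y_stop, E_cut, p=-1):
    """Steps 1-5: list of (y_low, y_high, n_jets), valid for y_low < y_cut <= y_high."""
    intervals, y_cut = [], y_ini           # Step 1
    while y_cut >= y_stop:                 # Step 2
        n_jets, y_max = cluster_at(momenta, y_cut, E_cut, p)  # Step 3
        if y_max is None:                  # nothing merges: no lower transition value
            intervals.append((0.0, y_cut, n_jets))
            break
        intervals.append((y_max, y_cut, n_jets))  # Step 4
        y_cut = y_max                      # Step 5
    return intervals
```

Each pass through the loop costs one full clusterization, so the total cost is set by the number of transition values between `y_stop` and `y_ini` rather than by the number of histogram bins.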
Clusterization between two transition values is completely determined: any two values of $y_{\rm cut}$ chosen between the same pair of adjacent transition values lead to the same jet configuration. This leads to an improvement in the calculation of jet rates. We can fill histograms more easily between two transition values, since we do not have to consider each bin separately and perform the clusterization over and over again. It is worth mentioning that the method is independent of the definition of $d_{ij}$ and $d_{iB}$, therefore it can be used in both the hadron and the $e^+e^-$ collider version of the anti-$k_{\bot}$ algorithm, and in fact for any version of the general inclusive $k_{\bot}$ algorithm.
Figure 1: The number of jets as a function of $y_{\rm cut}$ obtained from clustering a randomly generated phase space point with 10 particles in two different ways. The two approaches provide identical results, and the non-monotonic behavior of the function is also visible.
In Fig. 1 we show the number of jets as a function of $y_{\rm cut}$. We used a randomly generated phase space point with 10 particles in the final state at $\sqrt{Q^2} = 100$ GeV center-of-mass energy. $E_{\rm cut}$ was chosen to be 8 GeV. The 10-particle configuration was clustered with the $e^+e^-$ version of the anti-$k_{\bot}$ algorithm using both approaches: bin-by-bin with 30 $y_{\rm cut}$ values, denoted by blue dots, and via the transition values method, denoted by the red line. Both methods produce identical results, but while the bin-by-bin method required 30 repeated full clusterizations, the red curve was reproduced from 14 transition values. Fig. 1 also illustrates the general non-monotonic behavior of the number of jets as a function of $y_{\rm cut}$.
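Once the transition values are known, filling the bins of such a histogram is a lookup rather than a clustering. The sketch below assumes a hypothetical storage layout, a list of `(y_low, y_high, n_jets)` tuples meaning that `n_jets` jets are found for `y_low < y_cut <= y_high`; the function names and the numbers in the usage example are made up for illustration.

```python
def n_jets_at(intervals, y_cut):
    """Look up the number of jets at y_cut from stored transition intervals."""
    for y_low, y_high, n_jets in intervals:
        if y_low < y_cut <= y_high:
            return n_jets
    raise ValueError("y_cut lies outside the scanned range")

def fill_bins(intervals, bin_values):
    """Number of jets for every y_cut bin value, without any reclustering."""
    return [n_jets_at(intervals, y) for y in bin_values]

# Usage with three made-up intervals; note the non-monotonic jet count.
intervals = [(0.20, 0.50, 2), (0.05, 0.20, 3), (0.0, 0.05, 2)]
counts = fill_bins(intervals, [0.01, 0.1, 0.3])
```

For a large number of bins the linear search in `n_jets_at` could be replaced by a binary search over the sorted interval boundaries, but for a handful of transition values the simple loop is sufficient.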

Performance
Finally, we explore the performance of the new method compared to the original approach. We refer to the method that computes the number of jets over a wide range of $y_{\rm cut}$ through transition values as the transition method, while the bin-by-bin version is dubbed the direct method. We implemented both methods in a Fortran90 program. For simplicity we chose the $e^+e^-$ collider version of the anti-$k_{\bot}$ algorithm. Using RAMBO [5] we generated 1000 phase space points with 5, 10, 15 and 20 particles in the final state, and clustered them with both methods. We checked that both methods give the same results, as illustrated in Fig. 1. To perform clusterization with the direct method we selected 30, 60, 90 and 120 bins for $y_{\rm cut}$, the first number of bins being closer to experimental setups, while the last one is more typical for theoretical predictions. In the transition method $y_{\rm ini}$ was always set to the largest value of $y_{\rm cut}$ in the histogram, while $y_{\rm stop}$ was chosen to be the smallest. This way we ensured that the range of the search for transition values coincides with the range of the histograms.
We summarize our results in Table 1; the timings were obtained on a laptop. We emphasize that the numbers in Table 1 are shown only to illustrate the behavior of the new method compared to the usual one; this is not an exhaustive performance study. For example, fluctuations in computational time were not taken into account. Nevertheless, Table 1 still provides useful information about how the transition method performs.
As we can see, the timing of the direct method scales with the number of bins, as one would expect, and the numbers indicate a linear relation. The transition method depends non-linearly on the number of particles, as more particles introduce more possible final jet configurations and hence more transition values to compute. This method also depends on the range of $y_{\rm cut}$ values: although a large number of particles would mean plenty of transition values, many of them could fall outside the range of interest and would therefore not be computed. The transition method clearly outperforms the direct method when the multiplicity is small, where a speed-up of an order of magnitude can be achieved, and the gain is even more pronounced when the number of bins is larger. However, when the multiplicity is increased but the number of bins is kept low, the direct method is preferable. Nevertheless, if we also increase the number of bins, the transition method is again ahead, by a factor of two. In general, if the number of bins is large, a significant improvement in speed can be achieved using the transition method.
Table 1 clearly shows that the transition method can be used to improve computational performance in fixed order parton level Monte Carlo event generators. The calculation of fixed order predictions typically involves only a small number of strongly interacting final state particles, but a large number of bins in order to produce smooth histogram curves. In addition, millions of phase space points are generated, which all require clusterization, therefore faster methods are preferred.

Summary
In this paper we defined transition values for the anti-$k_{\bot}$ algorithm and presented a way to compute them. The knowledge of these values can speed up computations that involve a large number of variations of the $y_{\rm cut}$ jet parameter. Our simple performance test shows that the new method is best suited to improving performance in the calculation of fixed order predictions for jet rates. Our method can be used for both the hadron and the $e^+e^-$ collider version of the anti-$k_{\bot}$ algorithm, and in fact for any version of the general inclusive $k_{\bot}$ algorithm.