Bisection-based asymmetric MT2 computation: a higher precision calculator than existing symmetric methods

An MT2 calculation algorithm is described. It is shown to achieve better precision than the fastest and most popular existing bisection-based methods. Most importantly, it is also the first algorithm to be able to reliably calculate asymmetric MT2 to machine-precision, at speeds comparable to the fastest commonly used symmetric calculators.


arXiv:1411.4312v1 [hep-ph] 16 Nov 2014

Introduction

For the purposes of this document, we will define the kinematic variable MT2 in the following way:

    M_{T2}(m_s, \vec{s}, m_t, \vec{t}, \vec{\slashed{p}}\,;\, \chi_1, \chi_2)
        = \min_{\vec{p} + \vec{q} = \vec{\slashed{p}}} \max\left\{ M_T(m_s, \vec{s}, \chi_1, \vec{p}),\; M_T(m_t, \vec{t}, \chi_2, \vec{q}) \right\},

where the transverse mass is given by

    M_T^2(m, \vec{v}, \chi, \vec{u}) = m^2 + \chi^2 + 2\left( \sqrt{m^2 + |\vec{v}|^2}\,\sqrt{\chi^2 + |\vec{u}|^2} - \vec{v}\cdot\vec{u} \right),

in which s, t, p, q and the missing momentum \slashed{p} are all real two-vectors, and the remaining quantities are real scalars which may all be assumed to be non-negative as they only enter through their squares. Until fairly recently, the majority of MT2 usage in the experimental literature concerned itself only with the so-called 'symmetric' case, χ_1 = χ_2, which is also the form in which MT2 was first proposed [1]. However, there is growing interest in the 'asymmetric' case, χ_1 ≠ χ_2 [2,3], which is a powerful tool for reducing asymmetric backgrounds to symmetric signal processes. For example, it was recently suggested to use asymmetric MT2 to suppress the dominant two-lepton t t̄ background in searches for supersymmetric top quark partners decaying into one charged lepton [4]. Such searches tend to require a large transverse mass, MT > MW, from the selected charged lepton and the missing momentum from the undetected neutrino (neutralinos in the signal can make MT exceed MW). This eliminates most of the one-charged-lepton t t̄ background, for which MT ≤ MW, and what is left is a two-lepton t t̄ background with an unidentified electron/muon or hadronically decaying τ lepton. The natural choices for χ_1 and χ_2 are then M_ν and M_W. When the rest of the t t̄ topology is correctly assigned, MT2 has an endpoint at m_top for the background. This property helped to extend the most stringent limits on t̃ → t χ̃₁⁰ production [5]. Asymmetric MT2 has been used to search for many more topologies, including cases in which the signal processes are expected to have low values of MT2, as in three-body t̃ → b W χ̃₁⁰ decays. For these cases, the fact that a Jacobian peak pushes probability near the kinematic maximum of the MT2 distribution is crucial.
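The definition above can be illustrated with a deliberately naive brute-force evaluation that grid-scans the partitions of the missing momentum and takes the min-max directly. All names here (mtSq, mt2BruteForce) are illustrative, and the grid scan is exactly the kind of slow, imprecise approach the rest of this paper improves upon; it is a sketch of the definition, not of the released implementation.

```cpp
#include <cmath>
#include <algorithm>

// Transverse mass squared:
//   MT^2(m, v, chi, u) = m^2 + chi^2
//       + 2*(sqrt(m^2+|v|^2)*sqrt(chi^2+|u|^2) - v.u)
double mtSq(double m, double vx, double vy,
            double chi, double ux, double uy) {
    double ev = std::sqrt(m * m + vx * vx + vy * vy);
    double eu = std::sqrt(chi * chi + ux * ux + uy * uy);
    return m * m + chi * chi + 2.0 * (ev * eu - (vx * ux + vy * uy));
}

// Naive MT2: scan a grid of splittings p + q = pmiss and take
// min over splittings of max(MT_1, MT_2).  Illustration only;
// real calculators bisect instead of scanning.
double mt2BruteForce(double ms, double sx, double sy,
                     double mt, double tx, double ty,
                     double pmx, double pmy,
                     double chi1, double chi2) {
    double best = INFINITY;
    const int N = 400;       // coarse grid resolution
    const double R = 500.0;  // scan window for p components (GeV)
    for (int i = 0; i <= N; ++i) {
        for (int j = 0; j <= N; ++j) {
            double px = -R + 2.0 * R * i / N;
            double py = -R + 2.0 * R * j / N;
            double qx = pmx - px, qy = pmy - py;  // enforce p+q = pmiss
            double worse = std::max(mtSq(ms, sx, sy, chi1, px, py),
                                    mtSq(mt, tx, ty, chi2, qx, qy));
            best = std::min(best, worse);
        }
    }
    return std::sqrt(best);
}
```

Note that each MT obeys MT ≥ m + χ, so the result is always at least max(m_s + χ_1, m_t + χ_2), the kinematic minimum referred to throughout this paper.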
While it is a relatively trivial matter to construct an algorithm capable of evaluating MT2 from the definition above with an off-the-shelf numerical minimiser, such methods are usually very slow (since the minimum is often on a singular fold or crease) and error prone (as the minimum can be near infinity, or the crease may be steep-sided but have a very shallow gradient along the fold). Therefore, ever since MT2's creation there has been a need for fast and accurate methods of evaluating it, many of which are collated in [6]. The publication in 2008 of the 'elliptic bisection' method of Cheng and Han [7] caused a minor revolution in the world of MT2 algorithms. Their implementation (which we will refer to as 'CH') has since been regarded as the de facto fastest and most reliable method of evaluating the symmetric form of the variable.
MT2 is, from a theoretical perspective, no harder to compute in the asymmetric case than in the symmetric one. However, as a result of the historical accident that no-one was interested in asymmetric MT2 in 2008, the classic CH algorithm is only capable of evaluating symmetric MT2. Consequently, physicists needing to evaluate asymmetric MT2 have had no option but to either: (1) attempt to compute the variable themselves with generic minimisers, exposing themselves to the dangers and severe performance penalties mentioned above, or (2) attempt to modify the CH algorithm to cope with the asymmetric case. Neither of these options has been achieved satisfactorily. Ample evidence that the latter route (CH modification) is much harder than one might naively expect may be found in Walker's Table 1 [9], which shows that CH modifications which do not pay enough attention to the many areas of numerical instability carefully worked around in the CH algorithm can easily lead to errors of hundreds of GeV in MT2 evaluations. It is possible to find ways around the majority of these issues, as Walker [9] demonstrates elegantly, but only at some considerable cost in algorithm complexity. Given the above, the primary goal of the work written up herein was to produce a reliable, fast MT2 calculator capable of serving the previously un-served needs of those wanting to use the asymmetric case. Rather than attempt to adapt the CH implementation itself, it was thought better to return to first principles, and ask why previous attempts to adapt CH had been so unsuccessful, complex or error prone.

The good and bad bits of bisection algorithms
A key insight within CH [7] was this: if it were possible to determine relatively quickly whether a 'trial' value M were greater than or less than the actual value of MT2, without actually computing that actual value, then by repeatedly bisecting a finite interval known to enclose the true value, it would be possible to determine MT2 in a cost-effective manner. This is very important, and will be heavily relied upon in the new algorithm described herein.
A second important result of [7], and later works, was that the trial value M could indeed be compared to the true value of MT2 without knowing the value of the latter. We will not describe the origin of this test in detail here as it is described elsewhere. However, the main points we will need to understand are the following:

1. A trial value M, together with m_s, s and χ_1, defines a (possibly empty) subset R_1(M) of points within R².

2. A trial value M, together with m_t, t, χ_2 and \slashed{p}, defines another (possibly empty) subset R_2(M) of points within R².

3. Whenever R_i(M) (for i = 1 or 2) is non-empty, it comprises either (i) a set of points within or on the boundary of a non-degenerate ellipse, or (ii) a set of points within or on the boundary of a non-degenerate parabola, or (iii) a set of points lying on a straight 'ray', starting at a point in R² and radiating out to infinity in some direction (this being a limiting case of progressively narrower and narrower parabolas, under certain circumstances), or (iv) a single point (this being a limiting case of progressively smaller and smaller ellipses, under certain circumstances).

4. The two singular cases (iii) and (iv) described above occur if and only if M is equal to the kinematic minimum of MT2.

5. The boundary of R_1(M) (respectively R_2(M)) switches from elliptical to parabolic character (or from point to ray, in the singular case) when the mass m_s (respectively m_t) goes from non-zero to zero.

6. The value of M is greater than or equal to the true value of MT2 if and only if R_1(M) and R_2(M) have non-empty intersection.

Item 6 above is the key beneficial feature that allows the bisection algorithms to work, and is used in the new algorithm we propose. Item 5, on the other hand, is the cause of most of the trouble in the existing implementations. It causes problems for three reasons, which we will describe in turn: (a) co-ordinate instability, (b) special case proliferation, and (c) Lorentz-vector choices. We will describe the effects in terms of R_1(M) only, but the same arguments apply to R_2(M).
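A toy analogue makes the role of item 6 concrete. Take R_i(M) to be the disc of radius M around a point P_i; the discs first overlap when 2M equals the distance |P1 − P2|, and the minimum over x of max(|x − P1|, |x − P2|) is exactly |P1 − P2|/2. Bisecting on the overlap test therefore recovers the min-max value without ever performing the minimisation, just as the MT2 algorithms do with their (far more complicated) conic regions. All names here are illustrative.

```cpp
#include <cmath>

// Toy min-max problem: f(x) = max(|x-P1|, |x-P2|) over the plane.
// Its minimum is |P1-P2|/2, but we compute it the bisection way:
// R_i(M) = {x : |x - P_i| <= M}, and M >= min f  iff the discs overlap.
struct Pt { double x, y; };

bool discsOverlap(Pt p1, Pt p2, double M) {
    double dx = p1.x - p2.x, dy = p1.y - p2.y;
    return 2.0 * M >= std::sqrt(dx * dx + dy * dy);  // region overlap test
}

double minMaxByBisection(Pt p1, Pt p2, double tol) {
    double lo = 0.0, hi = 1.0;
    while (!discsOverlap(p1, p2, hi)) hi *= 2.0;  // grow the bracket
    while (hi - lo > tol) {
        double mid = 0.5 * (lo + hi);
        if (discsOverlap(p1, p2, mid)) hi = mid; else lo = mid;
    }
    return 0.5 * (lo + hi);
}
```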

Co-ordinate instability
As m_s → 0, but before 0 is actually reached, the ellipses bounding R_1(M) grow progressively longer (and wider at the same time) in a 'one-sided' manner: one of the two ends of the ellipse stays more or less where it is, eventually forming the pointy part of the parabola, while the other end grows and swells to infinite length and width in order to 'open up' into the infinite region that is the inside of the parabola. Because of this growth, the coordinates of the centre of the ellipse, and the lengths of its semi-major and semi-minor axes, all tend to infinity. Many existing implementations use these unstable coordinates internally, and have only limited protections against their being used in unstable regions. For example, the CH method is aware of this potential problem, and so first compares the magnitude of m_s to an ad-hoc threshold, and if m_s is dangerously light, flips over to treating m_s as if it were identically zero (even though it isn't), since a custom massless algorithm knowing only about parabolae need not share the elliptical parametrisation. The need to flip at an arbitrary point, however, limits the ability to calculate precise values for any m_s below that threshold, and the precise positioning of the threshold might need considerable investigation and tuning, which complicates the development and use of the algorithm.

Special case proliferation
So far as the author can tell, when testing for the intersection in item 6 of section 2, all existing implementations examine some sort of discriminant, or Sturm sequence, which is able to make a statement about the number or character of the points which lie on the two conics bounding R_1(M) and R_2(M). This is probably the largest single inefficiency such methods introduce. The basic problem is this: the intersection of R_1(M) and R_2(M) may be non-empty without any intersection of their boundaries, for example when R_1(M) is bounded by a small ellipse entirely contained within the interior of a larger ellipse bounding R_2(M). So if these discriminants determine that the boundaries do not intersect, additional special-case code then has to determine whether this is due to a trivial enclosure, or to non-intersection of the interiors. Such special-case code is complicated by the fact that different tests may be needed for parabola-nested-within-parabola, ellipse-nested-within-parabola, ray-nested-within-ellipse, and ellipse-nested-within-ellipse!

Lorentz-vector implementations
If the dangerous region occurs when m_s is close to zero, but not exactly so, why should we care? We don't expect to measure non-zero jet masses of 0.0001 GeV at the LHC. Alas, many end users use computer libraries that store Lorentz-vectors internally in (E, p_x, p_y, p_z) format. If massless vectors are stored in such formats, and the mass of the vector is subsequently requested, the package will often return a number whose magnitude is sqrt(|E² − p_x² − p_y² − p_z²|), which can easily fail to be zero due to finite-precision and rounding effects. When this magnitude is not zero, it is frequently very small, just where it can cause co-ordinate instabilities, unless protected against.
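The effect is easy to reproduce without any particular library: build a nominally massless vector in (E, p_x, p_y, p_z) form, with E derived from the momentum components, and ask for the mass-squared back. The helper names below are illustrative.

```cpp
#include <cmath>

// Mass-squared as an (E, px, py, pz) container would compute it.
double massSq(double E, double px, double py, double pz) {
    return E * E - px * px - py * py - pz * pz;
}

// Construct a nominally massless vector: E is *derived* from the
// momentum components, and the rounding inside sqrt means E*E need
// not reproduce px*px + py*py + pz*pz exactly, leaving a tiny
// (usually non-zero) residual mass-squared.
double residualMassSq(double px, double py, double pz) {
    double E = std::sqrt(px * px + py * py + pz * pz);
    return massSq(E, px, py, pz);
}
```

For specially chosen exact values (a 3-4-5 momentum, say) the residual is exactly zero, but for generic momenta it is a small rounding-scale number of either sign, which is precisely the dangerous near-zero mass regime of Section 2.1.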

The new algorithm
The principal innovation in the new algorithm is that it finds a new way of approaching and performing the intersection test in item 6 of section 2. Whereas existing methods largely examine the boundaries of R_1(M) and R_2(M) for intersection, leading to the complications described in Section 2.2, the new method uses the direct test for the intersection of the interiors of conics described in [10]. Interestingly, [10] only claims that their test is applicable to testing for the overlaps of ellipse interiors. It is conjectured by CGL that the test in [10], without modification, is also valid for testing for the overlap of the interior of a non-singular parabola with that of another non-singular parabola or a non-singular ellipse. Two of the authors of [10] indicate that this conjecture is likely to be true, and may provide a formal proof later if it is not already a known result (private communication). While no mathematical proof is offered here, the validity of the conjecture, to the extent to which it is needed to support the functioning of the new algorithm, is experimentally verified by the computation of millions of MT2 values and their comparison to existing fast, slow, and analytic computations.
The benefits of this test are not only that a single method is equally suited to working with parabolae or ellipses (or mixtures thereof), but also that it represents the conics in a matrix form whose coefficients can remain bounded even as quantities like m_s → 0. No special change in the representation of these conics occurs when m_s or m_t move in small amounts near zero. It is these features that allow for easy machine-precision calculations of MT2, as no ad-hoc selection criteria (cuts) are needed to deal with any of the transitions described in Sections 2.1, 2.2 and 2.3.
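The representation in question writes each conic as a symmetric 3x3 matrix acting on homogeneous coordinates (x, y, 1), so that the region is the set of points where the quadratic form is non-positive. The sketch below is a toy version of that shared representation (it is not the actual matrices used in the implementation): note that the ellipse and the parabola live in the same finite-coefficient data structure, with no quantity forced to diverge as one shape degenerates towards the other.

```cpp
// A conic a*x^2 + 2*b*x*y + c*y^2 + 2*d*x + 2*e*y + f = 0 stored as
// the symmetric homogeneous matrix  [a b d; b c e; d e f].
struct Conic { double a, b, c, d, e, f; };

// Evaluate the quadratic form at (x, y, 1): negative inside the region.
double side(const Conic& k, double x, double y) {
    return k.a * x * x + 2 * k.b * x * y + k.c * y * y
         + 2 * k.d * x + 2 * k.e * y + k.f;
}

// Ellipse x^2/ax^2 + y^2/by^2 = 1 in matrix form.
Conic ellipse(double ax, double by) {
    return {1.0 / (ax * ax), 0.0, 1.0 / (by * by), 0.0, 0.0, -1.0};
}

// Parabola y = x^2 (interior y > x^2) in the *same* representation:
// the coefficients stay finite even though the shape is unbounded.
Conic parabola() {
    return {1.0, 0.0, 0.0, 0.0, -0.5, 0.0};
}
```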
The second alteration is small but no less important, and concerns item 4 of section 2. Since the one place the test of [10] can become singular is the one place where the trial value has reached the kinematic limit (or as close to it as machine precision will allow), we can use this as a stopping condition for the bisection, provided that the lower end of the initial bisection interval sits on the kinematic limit. In other words, the lower end of the bisection interval is started, by fiat, on the kinematic minimum. The upper limit is grown exponentially until it is found to be bigger than the true value of MT2. Thereafter the intervals are bisected, updating the upper or lower endpoints as appropriate. If at any stage in this bisection process either of the conics looks singular, it will be known that this can only have happened because the trial value M has been forced down to the kinematic limit, or as close to it as machine precision will allow. This is therefore a sign that the true value of MT2 either is at the kinematic limit, or is as close to it as the machine can determine. So when such a condition is encountered, the singular nature of the conic terminates further evaluation and the kinematic minimum is returned. The advantage of this is that it allows a scale-free way of dealing with the very lowest MT2 values.
The third and final tweak in the new algorithm is one that provides a factor-of-three speed increase when evaluating MT2 on samples in which a large fraction of the events lie near the lower kinematic minimum, while leading to a negligible (3%) slowdown for other events. Since any MT2 evaluation returning the kinematic minimum will have produced only bisections which shrank the bracketing interval from the high side, one can bias bisections low (say to the bottom 10% of the current bracketed range) until such time as a trial M value is found to be less than the true value of MT2, after which bisections move to the centre of the bracketed interval. By this means, answers near the kinematic minimum can be reached roughly three times (∼ log₂(10)) more quickly than they otherwise would be, with negligible loss of performance for answers in the 'bulk'.
Unlike the two changes that came before, this one is relatively minor, as it only leads to speed increases for some classes of events.
A single C++ header file fully implementing an MT2 calculator using the above method is available as part of the first arXiv submission of this paper. Links to later versions or corrections (if any) will be signposted on [6] and/or in the arXiv comment field.

Validation
A selection of validation plots is provided here, selectively highlighting the biggest areas of difference between the new algorithm and the CH baseline. Figure 1 illustrates the CH algorithm failing to accurately describe the upper end of a t t̄ MT2 distribution at the 0.002 GeV level. Figure 2 illustrates the CH algorithm failing to cope with large numbers of near-massless particles near the lower kinematic endpoint of another MT2 distribution. Figure 3 is an example of a comparison in the bulk. The comparisons in the bulk of the distribution illustrate relative differences between the algorithms. It would be ideal if both algorithms could be compared not only to each other but also to the true value of the minimum. This is possible when the recoil ('upstream') momentum of the system of inputs to MT2 is zero: in this case there are known analytic formulae for asymmetric MT2 [3]. Figure 4 compares the new algorithm with the analytic formulae in t t̄ events for which one of the two leptons is lost, so that χ_1 = m_ν and χ_2 = m_W, and the spread of MT2 is ∼ 100 GeV. The machine-precision numerical algorithm and the analytic calculation agree to better than 10⁻¹² GeV, or one part in ∼ 10¹⁴.

Summary
Thus far the new algorithm has passed all validation tests it has been given, including many not described here. It can calculate both symmetric and asymmetric MT2 to machine precision (one part in ∼ 10¹⁴ on the computer on which it was tested). In cases where the new algorithm has been found to produce results different from those of the de facto standard calculator (CH), the new algorithm is found to be correct. Data in Table 1 indicate that, if asked to calculate to the same 0.002 GeV precision as the old CH algorithm, the new algorithm is about two to three times faster and is capable of evaluating MT2 values at ∼ MHz rates. If asked to calculate at machine precision, the new algorithm is at most 20%-50% slower than CH, and is still capable of 200 kHz evaluation rates. There appear, therefore, to be many good reasons for trialling it as a new reference implementation.

Acknowledgements
CGL gratefully acknowledges the assistance of Gary Hughes and Mohcine Chraibi, authors of [11], for having pointed him towards [10] after discussions about their paper, which would otherwise have formed the basis of this algorithm. CGL also gratefully thanks Joel Walker and Ben Brunt for useful discussions, and Alan Barr for making many comments on the final draft. Finally, CGL notes that three weeks after writing this algorithm, but about three weeks before writing up, he was contacted by colleague Colin Lally who, by chance, had independently come up with very much related ideas (sharing the matrix addition of conics discussed in [10] as a starting point, for example). CGL thanks Colin for suggesting ways in which the method described herein could potentially be improved. None of these ideas is yet implemented.

Figure 3. This shows the difference between the value of MT2 calculated using the CH algorithm and the value of MT2 calculated (for the same event) by our new algorithm, for a large number of t t̄ events. When supplying momenta to the algorithms, the correct l+b pairing for each 'side' of each event was used. There is a narrow core of events that both algorithms get right, and a tail in which (in this case) CH happens to be high at the 0.002 GeV level. Some individual events from this tail were examined in detail (by inspecting the minimisation surfaces in arbitrary-precision plotting programs and finding the location of the true minimum by hand) and the new answer was, in all cases, found to be the correct one.

Table 1. This shows the time in seconds taken to perform 400,000 MT2 evaluations using three different methods, each on two sorts of event (one for which CH tended to over-estimate MT2 values, 'OVER', and one for which CH tended to under-estimate MT2 values, 'UNDER'). All programs were compiled with -O3 optimisations turned on, and were run on a 1.4 GHz Intel Core i5 MacBook Air. The three methods compared are (i) the old CH algorithm (known to be accurate to about 0.002 GeV), (ii) the new algorithm 'LESTER' running in machine-precision mode, and (iii) the new algorithm 'LESTER' running to a lesser precision of 0.002 GeV. It may be seen that the new algorithm is no worse than 20% to 50% slower than CH when working in machine-precision mode (i.e. generating much better answers than CH). It is evident that when asked only to attempt answers to 0.002 GeV, the new algorithm was about two to three times faster than the old CH algorithm at similar precision.

Figure 1. This figure shows a zoom in on the upper kinematic endpoint, ±0.01 GeV, of a t t̄ MT2 distribution that ought to stop at 175 GeV (i.e. at zero on the x-axes above). It is noted that the new algorithm (left) stops at 0, as it should, while CH (right) is sometimes about 0.002 GeV too high.

Figure 2. This figure shows the lower end of an MT2 distribution containing a high number of close-to-massless visible and invisible objects. The kinematic minimum has been subtracted, so the distributions shown should never access negative positions on the horizontal axis. The new algorithm (left) performs correctly, but the old CH algorithm (right) shows a large delta-function spike at an unphysical (negative) position, indicating that these MT2 values are lower than the kinematic minimum. An unphysical mass-gap is also observed in the CH case.

Figure 4. Left: a comparison between the new algorithm and the analytic calculation in the balanced case. Right: the same comparison in the unbalanced case; here the analytic formulae have numerical complexity comparable to that of the numerical procedure, and so the difference is symmetric about zero.