Identifying Phase Space Boundaries with Voronoi Tessellations

Determining the masses of new physics particles appearing in decay chains is an important and longstanding problem in high energy phenomenology. Recently it has been shown that these mass measurements can be improved by utilizing the boundary of the allowed region in the fully differentiable phase space in its full dimensionality. Here we show that the practical challenge of identifying this boundary can be solved using techniques based on the geometric properties of the cells resulting from Voronoi tessellations of the relevant data. The robust detection of such phase space boundaries in the data could also be used to corroborate a new physics discovery based on a cut-and-count analysis.

Abstract: Determining the masses of new physics particles appearing in decay chains is an important and longstanding problem in high energy phenomenology. Recently it has been shown that these mass measurements can be improved by utilizing the boundary of the allowed region in the fully differentiable phase space in its full dimensionality. Here we show that the practical challenge of identifying this boundary can be solved using techniques based on the geometric properties of the cells resulting from Voronoi tessellations of the relevant data. The robust detection of such phase space boundaries in the data could also be used to corroborate a new physics discovery based on a cut-and-count analysis.

Introduction
Voronoi tessellations [1] are useful in a wide variety of fields, from biology [2] to astronomy [3,4] to condensed matter physics [5]. In high energy physics, they have been used rather sporadically, e.g., as an optional approach to QCD jet-finding and area determination in FastJet [6] and in the model-independent definition of search regions in SLEUTH [7][8][9][10]. In Ref. [11], some of us pointed out that Voronoi methods can be applied directly to the analysis of data from high energy physics experiments, e.g., when trying to detect the presence of a new physics signal in the data or to perform parameter measurements.
In most Voronoi-based approaches, the goal is to use Voronoi tessellations to identify "neighbors" of data points. The tessellation then automatically provides a number of cellbased attributes for each data point. Ref. [11] argued that using the geometric properties of Voronoi cells, and, in particular, functions of the geometric properties of Voronoi cells and their neighbors, gives valuable additional information and can allow for relatively modelindependent searches for targeted "features" in the data. As briefly discussed in Ref. [11], a particularly useful application is the study of kinematic edges when investigating cascade decays in new physics models such as supersymmetry (SUSY) [12].
To understand the importance of edge-finding in multidimensional spaces for SUSY mass measurement, we first note that many extensions of the standard model (SM) are characterized by a Z 2 symmetry under which new physics particles (NPPs) are charged but the SM particles are uncharged. Such a symmetry ensures that the lightest NPP will be stable and hence may constitute the dark matter. With the assumption of such a symmetry, a typical collider event involving NPPs proceeds as follows: 1. NPPs are pair produced.
2. Each NPP goes through a series of (generally two and three-body) decays called a "decay chain". In each decay, an NPP decays to another, lighter, NPP, and one or more SM particles. The NPPs generally have a small intrinsic width compared with their mass. Hence it is generally a good approximation to view the decay chain as consisting of a series of on-shell decays of NPPs.
3. Eventually the lightest particle charged under the Z 2 is reached. It is stable, and, if a dark matter candidate, uncharged and uncolored. Hence it will escape the detector without being detected.
Popular new physics models within this paradigm include SUSY, where the Z 2 symmetry is called "R-parity"; Universal Extra Dimensions (UED) [13], in which the Z 2 symmetry is called "KK-parity"; and Little Higgs models, in which the Z 2 symmetry is called "Tparity" [14]. As the lightest NPP escapes detection, we are not able to determine directly the masses of the initial new physics particles produced in the collision, nor the masses of any intermediate particles in the decay, as we would if we were studying a resonance decaying to visible particles. However, we can determine the masses of the NPPs by studying the distributions of (functions of) the momenta of observed particles [15].
The approach to mass measurement taken here seeks to improve on those described in the existing literature in the following ways: 1. Instead of finding edges or endpoints in the one-dimensional distribution of a single variable, we will attempt to determine the boundary of the signal region in a higherdimensional phase space. This improves on one-dimensional methods, in increasing greatly the amount of information that can be extracted from the data [93]. 3 To be specific, we shall consider the classic SUSY decay chain of three successive two-body decays as shown in Fig. 1. From the measured four-momenta, p i (i = 1, 2, 3), of the three visible particles, v i , in the decay, one can form three two-body invariant mass combinations, m ij = (p i + p j ) 2 , for i < j. Signal events will then populate the interior of a compact region, V 3 , in the three-dimensional phase space, (m 12 , m 23 , m 13 ), of invariant masses [94,95]. The size and the shape of the two-dimensional surface boundary of V 3 , which we term S 2 , contains the complete information about the underlying mass spectrum, m X i (i = 1, 2, 3, 4), of the NPPs in Fig. 1. Therefore, we shall focus on methods for detecting S 2 directly. 4 One can imagine doing this in two ways: • By defining a kinematic variable which takes the same (constant) value (e.g., 3 At the same time, this approach avoids the drawbacks of some of the more model-dependent, processspecific, and computationally intensive methods, of which the MEM is perhaps the paradigmatic example. 4 One can also project the allowed three-dimensional phase space V3 onto the subspace of two variables, say (m12, m13), and obtain a corresponding two-dimensional allowed phase space V2 whose one-dimensional boundary S1 can be similarly used for mass measurements and disambiguation [30,31,58,94]. With regards to utilizing the full amount of information contained in the data, this approach stands midway between the traditional method of using one-dimensional distributions of single variables and the three-dimensional approach considered here. zero) everywhere along the phase space boundary. 5 This approach will be discussed below in section 3, where we review the relevant variable, ∆ 4 , introduced in Ref. [93].
• By analyzing the measured density of events in phase space and locating the boundary, S 2 , directly using techniques inspired by spatial analyses performed in other fields of science. This will be the main subject of this paper, and will be discussed in sections 2.2, 3.2, 3.3, and 4.
2. We build on the idea of Ref. [11] that Voronoi tessellations provide a powerful and model-independent tool for identifying edges (for a brief introduction to Voronoi tessellations, see section 2.1 below). While the analysis of Ref. [11] was limited to data in two dimensions, here we extend the procedure to the three-dimensional case and try to delineate the region, V 3 , in the phase space of the three variables, (m 12 , m 23 , m 13 ). Before tackling a SUSY physics example in section 4, we consider several analogous toy examples of increasing complexity in sections 2 and 3. This helps develop the reader's intuition and motivates some of our analysis choices. Following Ref. [11], in order to select "edge" cells in the Voronoi tessellation of the data, we consider the relative standard deviation 6 (RSD),σ i , of the volumes of neighboring cells, which is defined as follows. Let N i be the set of neighbors of the i-th Voronoi cell, with volumes, {v j }, for j ∈ N i . The RSD,σ i , is now defined bȳ where we have normalized by the average volume of the set of neighbors, (1. 2) The variable defined in eq. (1.1) is a straightforward extension to three dimensions of the "scaled standard deviation" of neighbor areas found to be helpful in Ref. [11].
3. We make crucial use of the recent observation of Ref. [93] that for sufficiently manybody final states there is an enhancement (in fact a singularity) in the phase space density near the boundary, S 2 , of the allowed phase space, V 3 . Due to the enhancement in the density of signal events near the boundary of phase space, we can alternatively target the boundary points of V 3 as being points in a densely populated region. This motivates us to consider, in addition toσ i , a second variable related to volume. We choosev where again we normalize by the average volume (1.2) of the set of neighbors, N i . 5 Compare to the "singularity coordinate" defined in Ref. [96]. 6 In statistics, the RSD is also known as the coefficient of variation (CV).
In what follows, we will therefore focus on the two Voronoi-based dimensionless variables (1.1) and (1.3). The former is motivated by the discontinuity in the density of events at the boundary [11], while the latter is motivated by the enhancement in the density of signal events at the boundary [93]. We will find that the judicious combination of these two variables yields a significant increase in sensitivity as compared with either variable in isolation. As a result we find that we are able to identify the boundary, S 2 , of the allowed signal phase space, V 3 , with a high degree of accuracy, even when the ratio of signal to background events, S/B, is relatively small.
To support these conclusions, as well as to explain in detail to the reader the methods employed, we proceed as follows. In section 2 we will provide a brief, but sufficient, review of Voronoi tessellation methods and the use of the geometric properties of Voronoi cells for identifying features in high energy physics data. Section 3.1 will review the consideration of events in multi-dimensional phase space, and, especially the observation that, for sufficiently many dimensions, the differential phase space volume is highly peaked near the boundary. Then in sections 3.2 and 3.3 we study the efficacy of Voronoi methods for finding a densely populated spherical boundary in a generalized "phase space", while section 4 will examine the application of these methods to an actual benchmark point. We present our conclusions in section 5. Throughout our studies, we will use ROC curves to quantify the sensitivity of the variables we define; we briefly review and discuss this approach in Appendix A.

Voronoi Methods for Finding Boundaries
Voronoi tessellation [97] refers to the procedure, previously proposed by Dirichlet [98] and hinted at by Descartes [99], through which an n-volume containing a set of N d data points, {d i }, is divided into N d non-overlapping regions, {R i }, such that d i ∈ R i , ∀i. The boundaries of R i are chosen such that, for every point in some region R j , the corresponding data point, d j , is the nearest data point.
For applications in high energy physics, we consider the data points to be events in a suitably chosen phase space 7 . It is important to make a judicious choice of phase space -on the one hand, it should be of low enough dimensionality to keep the problem tractable in practice, yet the dimensional reduction should not result in the loss of any useful information, e.g., the washing out of interesting features in the underlying phase space distributions. Consider, as an example, the decay chain of Fig. 1. In general, the inclusive production of the X 1 resonance will be described by a 9-dimensional phase space, consisting, e.g., of the nine momentum components of the visible particles, v i . However, three of those degrees of freedom correspond to uninteresting Lorentz boosts of X 1 , another three degrees of freedom are angular variables which are sensitive to spins but not the X i mass spectrum, leaving only the three degrees of freedom relevant to a mass measurement. 7 Here we are following a slightly confusing (but standard) usage and using the term "phase space" to refer to the space of n-tuple values of n observables used to categorize events. This set of observables may not be sufficient to totally specify the kinematics of the event. In section 3, "phase space" will have a more precise meaning. As already mentioned in the introduction, we can take these three degrees of freedom to be the invariant mass quantities (m 12 , m 23 , m 13 ). We shall present the results from our analysis of this physics example in section 4, but we first begin with a few toy studies.

Voronoi tessellations in two dimensions
In order to make contact with Ref. [11] and to introduce our notation, we begin by studying several simplified scenarios in two dimensions. In the next section, (2.2), we will generalize the approaches taken and the lessons learned here to the case of three dimensions.

A linear boundary in two dimensions
In Fig. 2, we consider the unit square in the first quadrant (x ≥ 0, y ≥ 0) and simulate N events = 280 events (data points) according to the probability distribution where H(x) is the Heaviside step function and ρ is a constant density ratio, taken in Fig. 2 to be ρ = 6. The meaning of the distribution (2.1) is very simple: the unit square is divided into two equal halves by the vertical boundary at x = 0.5 (the yellow line in Fig. 2). Within each half, the density is constant (on average), but the left region is denser by a factor of ρ. This setup produces an edge at x = 0.5, where the density changes by a factor of ρ. Our goal will be to detect this edge by tagging the Voronoi cells that are crossed by the boundary line -such cells from now on will be referred to as "edge cells" and in Fig. 2 they are shaded in brown. The remaining Voronoi cells away from the edge will be referred to as "bulk" cells, and in Fig. 2 they are left white. The basic idea put forth in Ref. [11] was to study the resulting Voronoi tessellation and identify edge cells from their geometric properties (as well as from the geometric properties of neighboring cells within the immediate vicinity). The Voronoi cells in two dimensions are planar polygons, for which one can investigate the usual geometric properties like number of sides, area, perimeter, etc. Fig. 3 shows four possibilities, where the Voronoi cells are color-coded according to the value of the corresponding geometric quantity. Then, in Fig. 4, we plot the probability distributions for these geometric quantities separately for the edge cells (red solid lines) and the bulk cells (blue dotted lines). As can be seen in Fig. 2, the edge cells, by construction, represent a very small fraction of the total number of Voronoi cells in the tessellation. Thus, in order to increase the statistical precision of the distributions in Fig. 4, we generated N exp = 1000 pseudo-experiments analogous to the one shown in Fig. 2.
In the upper left panels of Figs. 3 and 4, we study the number of elements, |N i |, in the set of neighbors, N i . This is equivalent to the number of sides of the i-th Voronoi polygon. This variable has been studied in the literature [100]: for example, it is known that the Voronoi polygons most commonly have 5 or 6 sides, which is confirmed in Figs. 3 Blue dotted (red solid) histograms refer to bulk (edge) cells. In order to increase the statistics, we show results from N exp = 1000 pseudo-experiments with N events = 280 each. and 4. There is also a long tail of polygons with many sides, which is conjectured to behave asymptotically as |N i | −2|N i | [101]. Indeed, Fig. 2 contains an example of a polygon with as many as 12 sides! The upper left panel of Fig. 4 demonstrates that, as expected, the |N i | distributions for bulk and edge cells are rather similar, and are thus not suitable for tagging edge cells [11].
The upper right panels of Figs. 3 and 4 illustrate a different geometric quantity related to the areas, a i (i = 1, 2, . . . , N events ), of the Voronoi cells. The areas of the Voronoi polygons are meaningful because they provide an estimate of the value of the underlying distribution f (x, y) (2.1) at the corresponding data point (x i , y i ): so that f (x, y) is still unit-normalized: boundary has normalized area of approximately 1, while a typical bulk cell to the right of the boundary has a normalized area of approximately ρ = 6. Note that while we fix the total number of events, N events , the fraction which ends up on one side of the boundary varies -for example, in the single pseudo-experiment depicted in Figs. 2 and 3, there happen to be 243 events on the left side and 37 events on the right side (to be compared with the expectation of ρN events /(ρ + 1) = 240 events on the left side and N events /(ρ + 1) = 40 events on the right side). If the total area is A = i a i , the expected average size of a bulk cell in the dense region on the left is given by A(ρ + 1)/(2ρN events ), hence in the upper right panels of Figs. 3 and 4 we plot the cell areas, a i , normalized as The distribution of the Voronoi cell areas when the points have been randomly selected (in a "Poisson process") is not known analytically, and is typically approximated with a three-parameter generalized gamma function, where the parameters are fitted to results from Monte Carlo simulations [102]. For our purposes, we are not interested in the form of the actual distributions but in the question of whether the distributions for bulk and edge cells show any appreciable differences. As seen in the upper right panel of Fig. 4, the area distribution for bulk cells is nearly bimodal; with the normalization (2.4) two peaks are expected nearā i ∼ 1 andā i ∼ ρ = 6 8 . The area distribution for edge cells, on the other hand, is unimodal, peaking relatively close toā i ∼ 1. After all, we expect a larger fraction, namely ρ/(ρ + 1) of the edge cells, to have their centers on the "dense" side of the boundary and only 1/(ρ + 1) of the edge cells to have their centers on the "sparse" side of the boundary. A close inspection of Fig. 2 confirms this expectation: out of the 21 edge cells, 18 (3) have their centers to the left (to the right) of the vertical boundary, which is consistent with our expectations for ρ = 6. In conclusion, it is clear that in this case, the Voronoi cell area by itself is not a very good candidate for an edge-tagging variable [11]. We expect that in the more general situation, where the densities on each side of the boundary are not uniform, this variable will be even more unsuitable.
Having investigated a variable describing the size of the Voronoi polygon, we now examine a variable characterizing the shape of the polygon, e.g., the isoperimetric quotient where a i is the area and p i is the perimeter of the i-th Voronoi polygon. The variable (2.5) is a measure of "roundedness" -it is equal to zero for infinitely thin (pencil-like) polygons, and is equal to 1 in the limit of a perfectly symmetric polygon with infinitely many sides (i.e., a circle). The corresponding results for the isoperimetric quotient are shown in the lower left panels of Figs. 3 and 4. We observe that the edge cells tend to be slightly more "squashed", but the difference is very minor and not useful in practice.
In a similar vein, one could continue to study other geometric characteristics of a single Voronoi cell, e.g., perimeter, average side length, etc. [103], but, similarly, this is unlikely to lead to any success in identifying edge cells. The reason is that we are trying to detect a discontinuity and therefore we need to study the relative properties of cells on both sides of the boundary. One possibility is to compute a derivative quantity, e.g., the gradient at each cell location [11]. Another option is to compare the spread in the areas of the neighboring cells, e.g., by computing the RSD,σ i , of the areas of the cells in N i (the set of neighbors of the i-th Voronoi polygon) in analogy to (1.1) [11]: where the normalization now is done by the average area of the neighbors (2.7) The idea is very simple -the neighbors of edge cells are typically quite diverse -some happen to be on the dense side and are therefore relatively small, while others are on the sparse side and are relatively large. As a result, the RSD of neighbor areas for edge cells is expected to be enhanced. On the other hand, for bulk cells, all neighbors are roughly similar, and the RSD of their areas should be small. These expectations are confirmed in the lower right panels of Figs. 3 and 4. In the temperature plot of Fig. 3, the edge cells are clearly distinguished by the different color, and theσ i distributions for bulk and edge cells in Fig. 4 are visibly displaced from each other. We see that, in agreement with the conclusions from Ref. [11], the RSD,σ i , is a promising variable for edge detection 9 .

A circular boundary in two dimensions
Before concluding our discussion in two dimensions, we perform one more toy exercise.
In the example of the previous section 2.1.1, the boundary was a straight line; in a more realistic situation we will encounter a boundary which is an arbitrary curved line. In anticipation of the physics example discussed in section 4, we now consider a two-dimensional example with a curved boundary in the shape of a circle. Instead of the rectangular pattern given by (2.1), we consider the radially symmetric distribution where r = (x, y) is the position vector in 2 dimensions and r ≡ | r| is its magnitude. As in (2.1), the distribution (2.8) describes two regions, the inner region is a unit circle, while the outer region is a hollow disk extending up to r = √ 2 (the circular dashed line in Fig. 5). The regions are separated by a circular boundary at r = 1, marked with the solid yellow 9 Ref. [11] also considered a few other variables related to derivatives -the magnitude of the gradient at each data point, the correlation between the directions of the gradients computed at two neighboring cells, the scalar product of the gradients at two neighboring cells, etc. The conclusion, drawn on the basis of ROC curves (see below appendix A) was that the RSD variable,σi, was optimal among all those choices. ; they are distributed so that the ratio of the bulk events on the two sides of the boundary is equal to ρ. Thus, out of the N events = 280 events inside the dashed circle with r = √ 2, on average we will have ρN events /(ρ + 1) = 240 events in the dense interior region (the unit circle) and N events /(ρ + 1) = 40 events within the sparse exterior hollow disk. 10 In the left panel of Fig. 5, the brown-shaded polygons are by definition the edge cells (those crossed by the yellow circular boundary). The right panel in Fig. 5 demonstrates that, once again, the edge cells can be effectively selected by the RSD,σ i , of the areas of the neighboring cells.

Voronoi tessellations in three dimensions
Since the relevant physics example we treat in section 4 is in a three-dimensional phase space, (m 12 , m 23 , m 13 ), we shall now generalize our previous discussion to three dimensions. For this purpose, we consider the three-dimensional analogue of (2.8): where now R = (x, y, z) is the position vector in three dimensions and R ≡ | R|. The distribution, (2.9), again describes two regions of constant density, except now the dense region is a three-dimensional spherical core of radius 1. Again, we choose ρ = 6 and generate N events = 4200 events according to (2.9). The events populate a ball of radius R = 3 √ 2 centered at the origin, (x, y, z) = (0, 0, 0). On average, we expect to have ρN events /(ρ+1) = 10 Since our plots are rectangular, in Fig. 5 we have extended the exterior region beyond r = √ 2, populating the additional real estate with the same density as the hollow disk. This was done to avoid spurious, but visually distracting empty areas on the plots. Two-dimensional slices at z = 0 through phase space for the three-dimensional toy example studied in section 2.2. We distribute N events = 4200 points according to the threedimensional probability distribution (2.9) within a sphere of radius 3 √ 2 centered at the origin (x, y, z) = (0, 0, 0). The Voronoi tessellation is done before taking the two-dimensional slice, i.e., the cell boundaries seen on these four plots are obtained by intersecting the three-dimensional Voronoi cell boundaries with the plane at z = 0. The yellow circle marks the boundary of the dense core. The resulting cells in the two-dimensional slice are color coded by a certain attribute of the corresponding three-dimensional Voronoi cell: number of neighbors (upper left); normalized volume (upper right); isoperimetric ratio (2.10) (lower left), and RSD of the neighboring volumes (1.1) (lower right). 3600 events in the core and N events /(ρ + 1) = 600 events in the outer hollow spherical shell  corresponding three-dimensional polyhedron 12 . For example, the upper left panel in Fig. 6 shows that the Voronoi polyhedra typically have a relatively large number of faces (or equivalently, neighbors); the corresponding distribution for bulk cells, shown in the upper left panel in Fig. 7, is known to peak at 15 [104]. We also observe that the edge cells are very similar in that regard, i.e., there is no appreciable difference in the number of neighbors as we move across the boundary.
In the upper right panels of Figs. 6 and 7 we show the corresponding result for the normalized volumes,v i , of the Voronoi polyhedra, where, in analogy to (2.4), we scale each volume, v i , by the expected average volume in the dense core, 4 3 π(ρ + 1)/(ρN events ). As expected, the distribution for bulk cells is bimodal, while edge cells behave somewhat similarly to the interior bulk cells (as we already saw in the two-dimensional example of section 2.1.1).
In the lower left panels of Figs. 6 and 7 we plot the analogous "isoperimetric quotient" 12 We caution the reader that the color bars in Fig. 6 refer to the three-dimensional Voronoi polyhedra and not the polygons seen in the plots -the latter result from the intersection with the equatorial plane, and, in general, have different properties.
for the three-dimensional case, where v i (s i ) is the volume (surface area) of the Voronoi polyhedron and the normalization is chosen so that Q i = 1 for a perfect sphere. Figs. 6 and 7 show that the shapes of the Voronoi polyhedra, as measured by (2.10), are very similar in the two bulk regions and not much different near the boundary either. This leaves us with the RSD,σ i , of the volumes for the set of neighbors, N i . This quantity was already defined in (1.1) and our results are shown in the lower right panels of Figs. 6 and 7. We see thatσ i can efficiently identify edge cells; the circular boundary is clearly seen in the lower right plot of Fig. 6. Theσ i distributions for bulk and edge cells are quite distinct, as shown in Fig. 7. Thus we verify thatσ i remains a promising variable for edge detection beyond the two-dimensional examples studied in Ref. [11].

Phase Space Considerations
While two and three-body phase space is discussed at length in most standard lectures and textbooks on quantum field theory, a Lorentz-invariant formulation of the general case with an arbitrary number of final state particles is often omitted. This is in part because processes with more than three final state particles can, in almost all circumstances, be analyzed as a sequence of on-shell production and decay stages and in part because the level of formalism required to describe the general case is significantly more involved. Nevertheless, as was shown in Ref. [93], even when a cascade decay proceeds through multiple on-shell stages, treating the phase space in its fully-differential form captures important correlations that cannot be inferred from more traditional one-dimensional observables such as kinematic edges and endpoints. In this context, we briefly review the geometry of four-body phase space, 13 concentrating on the equation describing the boundary of the kinematically available region and on the differential volume element. In section 3.2, we shall apply kinematic features obtained from the phase space considerations in section 3.1 to our uniform sphere example from section 2.2 and use the resulting toy example to study our Voronoi methods for three-dimensional data.

Review of the four-body phase space of a cascade decay
Let us consider the process where a heavy resonance, X, decays into four particles. We first focus on presenting the form of four-body phase space in full generality and will further specialize to the case where the decay proceeds via three-step two-body cascade decays as in Fig. 1. Following the argument in Ref. [105], we begin by introducing a 4 × 4 matrix defined as where the {p i } denote four momenta of final state particles, including the NPP X 4 . 14 We then define the characteristic polynomial of Z as where λ represents the relevant eigenvalues and the ∆ i identify the coefficients of the above polynomial. Specifically, one finds that . It turns out that the kinematically allowed region is given by ∆ 1,2,3,4 > 0 [105]; the boundary of this region is formed by What makes four-body (and higher) phase space particularly interesting is the form of the volume element. In terms of m 2 ij = (p i + p j ) 2 = 2z ij + m 2 i + m 2 j , the four-body phase space, Π 4 , can be written as where the normalization has been chosen to reproduce the PDG convention [106] for the well-known expression with non-Lorentz invariant quantities We remark that dΠ 4 is inversely proportional to ∆ 4 , and, given the fact that ∆ 4 = 0 defines the kinematic boundary, as in (3.3), the phase space has a singular structure near ∆ 4 = 0. While being an integrable singularity, this implies that events are more likely to be populated close to the boundary rather than far away from it. This observation is ideal for mass measurements which ultimately rely on the determination of this phase space boundary. Given the generic formalism for the phase space with four particles in the final state, we now specialize to the case where the decay proceeds through the three consecutive twobody decays shown in Fig. 1. The X i 's are NPPs represented by red dashed lines, while the v i 's are SM particles represented by black solid lines. For simplicity, we assume that all SM particles are massless unless specified otherwise. X 1,2,3 are assumed to be narrow resonances, while X 4 is collider-stable and invisible.
We point out that the presence of the intermediate particles does not affect the enhancement near the boundary of phase space discussed above. Within the narrow width approximation, each internal propagator squared can be replaced by a delta function, whose argument is linear in the z ij or, equivalently, in the m 2 ij variables. Therefore, integrating over those delta functions does not introduce any non-trivial Jacobian factors which would ruin the enhancement.
To quantify the enhancement near the boundary for this event topology, we derive the analytic form of the ∆ 4 probability distribution and show that it is completely independent of m X i for the massless limit, i.e., m v i = 0. We start by writing ∆ 4 in terms of the experimental observables m 2 v i v j which are denoted by m 2 12 , m 2 13 , and m 2 23 . These dimensionful variables can be traded for dimensionless, unit-normalized variables ξ ij as where 0 ≤ ξ ij ≤ 1 and the maximal values, m 2 ij,max , are given by the well-known kinematic endpoint formulae (see, e.g., [18]): (3.9) We also trade the dimension-8 quantity ∆ 4 for a dimensionless and unit-normalized quantity q defined as Here the maximum value of ∆ 4 is given by (3.11) As shown in Ref. [93], for any given set of masses, {m X i }, in this topology, the probability of obtaining any given event near the point, m 2 ij , is expressed as 12) or equivalently, in terms of the dimensionless quantities, ξ ij and q, defined in (3.6) and (3.10), this can be rewritten as Here H(x) is the usual Heaviside step function. Obviously, the expression in eq. (3.13) diverges for q → 0, as expected from the general discussion earlier, and has a non-zero finite value at q max = 1. In order to visualize the enhancement near q ∼ 0, it is useful to partition the probability density in (3.13) into two The phase space structure implied by eq. (3.12). The data points are generated with the event topology in Fig. 1, using a constant matrix element. The mass spectrum is (m X1 , m X2 , m X3 , m X4 ) = (500, 350, 200, 100) GeV. A three-dimensional scatter plot (upper left) and three phase space slices at fixed ξ 13 : ξ 13 = 0.25 (upper right), ξ 13 = 0.5 (lower left), and ξ 13 = 0.75 (lower right). The red dot-dashed (outermost) curve is the contour for ∆ 4 = 0, while the black dashed curves correspond to ∆ 4 contours for 10%, 30%, 50%, 70%, and 90% of ∆ 4,max . The data points which would have emerged via the flat component in eq. (3.14) are represented by blue "×" symbols, whereas the data points from the remaining enhanced component ∼ 1 √ q − 1 are represented by red "+" symbols.
components: a flat piece, proportional to 1, and an enhanced piece, containing the q −1/2 singularity: where dV ξ ≡ dξ 12 dξ 23 dξ 13 is a shorthand notation. If events were uniformly distributed over the entire phase space in ξ ij , their probability density would simply be proportional to the first (constant) term in eq. (3.14). Hence, all non-trivial effects in the phase space density distribution are due to the second term (inside the parentheses) in (3.14). Fig. 8 helps us develop some useful intuition about the probability distribution (3.14). The upper left panel shows a scatter plot of physical events in the dimensionless ξ ij -space, generated according to (3.14). We used a mass spectrum of (m X 1 , m X 2 , m X 3 , m X 4 ) = (500, 350, 200, 100) GeV. The events populate a compact region whose shape has been likened to that of a "samosa" [95]. Since it is difficult to visualize the enhancement near the phase space boundary in this three-dimensional view, in the next three panels of Fig. 8 we take a few slices at fixed ξ 13 : ξ 13 = 0.25 (upper right), ξ 13 = 0.5 (lower left), and ξ 13 = 0.75 (lower right). For each slice at a fixed ξ 13 , we show all data points whose m 2 13 values fall within 0.5 GeV 2 of the nominal value for that slice, i.e., within ξ 13 m 2 13,max ±0.5 GeV 2 . Then we project those points onto the plane of ξ 12 vs. ξ 23 and divide them into two (color-coded) groups. The data points which would have emerged from the flat piece in (3.14) are denoted with blue "×" symbols, whereas the points arising from the enhanced piece in (3.14) are identified by red "+" symbols. In addition, we also show several theoretical contours of constant ∆ 4 values, starting with the outermost red dot-dashed curve at ∆ 4 = 0 (i.e., q = 0) representing the phase space boundary. The internal, black dashed curves mark the contours for ∆ 4 = 0.1 ∆ 4,max , ∆ 4 = 0.3 ∆ 4,max , ∆ 4 = 0.5 ∆ 4,max , ∆ 4 = 0.7 ∆ 4,max , and ∆ 4 = 0.9 ∆ 4,max , respectively. Note that some of these contours are absent from the bottom panels because the relevant hyper-surfaces, corresponding to large ∆ 4 values do not intersect those slices.
Comparing the densities of red and blue data points, we get an idea about the effect of the enhancement in the vicinity of the phase space boundary. The blue points are more or less uniformly distributed, which is by design. In contrast, the distribution of red points is highly irregular, and their density peaks at the phase space boundary. For a more quantitative understanding, we derive the analytic expression for the probability density function in q and obtain [107] As previously advertised, this probability density function is completely independent of all {m X i } and is enhanced near q ≈ 0. In other words, the fraction of events that lie in a fixed q-interval is universal, and it is enhanced near the boundary of the phase space region. For example, roughly 5% of events have q ≤ 10 −3 , i.e., less than 0.1% of ∆ 4,max . In Fig. 9, we plot the distribution of the q variable from (3.10), taking 20,000 events out of the same event sample as the one used for Fig. 8. If we define any phase space point whose q value is less than 5% of q max as a boundary point, we find that ∼ 33% of the events are then categorized as boundary points. The red histogram represents the q distribution with respect to the full data set; the black dashed, vertical line denotes the location of 0.05 q max . The black solid curve shows the theoretical prediction from eq. (3.15). One can easily see that the q distribution (red histogram) is fully consistent with the theory expectation. Indeed, the value of q (or equivalently, ∆ 4 ) is not an experimental observable, since it requires a model assumption (the input of a mass spectrum for X i ). What is Figure 9. Probability density distribution in the q variable (3.10) using the same event sample as in Fig. 8. The blue shaded histogram is contributed by the boundary data points which are tagged by the Voronoi tessellation. The black solid curve is the theory prediction for dP/dq in eq. (3.15). needed then is a practical way of tagging the boundary data points with such low values of q by some other means; we employ Voronoi tessellations as an available tool. We Voronoi tessellate our phase space using the full data set. If a given Voronoi cell has vertices on both sides of the "samosa" surface defined by q = 0 (or equivalently, ∆ 4 = 0), then the associated data point is tagged as a boundary point (recall the definition of "edge" cells in Fig. 2). The contribution from the boundary points extracted with the above algorithm is shown by the blue shaded histogram in Fig. 9, which we find represents ∼ 38% of the events in the sample. As Fig. 9 demonstrates, the set of boundary cells which can be tagged by placing a cut on q is essentially the same as the set of boundary cells identified with the Voronoi tessellation. In what follows, therefore, instead of using the variable q, which is experimentally inaccessible, we shall focus on the Voronoi cells belonging to the blueshaded histogram in Fig. 9 and try to develop a tagging method based on their geometric properties, since they are experimentally observable.

Density-enhanced sphere boundaries
Inspired by the behavior of the phase space density near the boundary, we deform the density of data points from the sphere example considered previously in section 2.2. Performing Voronoi tessellations and studying the properties of the resulting Voronoi cells, we can develop our insight on what is expected from physical examples. Note that ∆ 4 vanishes on the phase space boundary and takes its maximum somewhere in the bulk. In other words, the ∆ 4 value increases as the distance between the data point of interest and the boundary surface increases (see also contours in Fig. 8). Although specifying the value of distance does not determine the ∆ 4 value, it turns out that there exists a positive correlation between the two quantities [107]. To proceed, we make the simplifying Ansatz that the distribution of the data points inside a unit sphere depends only on the radius, R, with an enhancement at R = 1. Motivated by the form of the probability density in (3.13), we Figure 10. The same as Fig. 6, but for a toy example in which the dense core has a non-uniform distribution given by (3.17).
introduce the following volume density function for the data points inside the unit sphere Now in analogy to (2.9), we consider the three-dimensional distribution Following the example from section 2.2, we again take the density ratio ρ = 6 and generate N events = 4200 events according to (3.17). Our results are shown in Figs. 10 and 11, which are the analogues of Figs. 6 and 7, respectively. We see that, in principle, all four variables plotted in Fig. 10 show some potential for discriminating edge cells. For example, a careful inspection of the lower left panel of Fig. 10 reveals that the edge cells appear somewhat elongated, which results in a lower isoperimetric quotient, as confirmed by the lower left panel in Fig. 11. On the other hand, due to the density enhancement near the boundary, we would also expect the edge cells to have smaller normalized volumes. This expectation is also confirmed -in the upper right panels of Figs. 10 and 11. Finally, the lower right panels of Figs. 10 and 11 again demonstrate that the RSD of the neighboring areas is a good discriminator, in agreement

Finding density-enhanced sphere boundaries with Voronoi tessellations
We now analyze the example of a sphere with an enhanced density near the boundary considered in section 3.2, in terms of ROC curves. In the left panel of Fig. 12, we show the ROC curve for each of the four variables depicted in Figs. 10 and 11: number of neighbors (magenta dotted line), normalized volume (green dashed line), isoperimetric quotient (blue dot-dashed line) and RSD of neighbor areas (red solid line). We observe that the RSD outperforms the other three variables, in agreement with the conclusions from [11] for the two-dimensional case. However, the other three variables also have a certain degree of discriminating power, as seen in Fig. 11. The natural question then is how much additional sensitivity can be gained by considering not just one, but two variables simultaneously. We studied the correlations between the RSD,σ, and each of the other three variables, and generally find that they are not perfectly correlated. (This makes sense intuitively because the RSD is computed from the neighbor set, N i , while the other three variables are properties of the individual cell.) We concluded that, among the three options, the normalized volume,v, is the most promising, since it appears least correlated withσ.  Fig. 11. Right: Improved ROC curves with optimal two-dimensional cuts in the (v,σ) plane as illustrated in Fig. 13: with 20 × 20 binning (green dot-dashed) or 100 × 100 binning (dotted blue). The blue dashed line is the ROC curve based on theσ variable alone and is identical to the solid red line in the left panel.
Therefore, we expect that the sensitivity will improve once we incorporate the normalized volume,v, in the analysis. This expectation is confirmed in the right panel of Fig. 12, where we show "improved" ROC curves based on binning in bothv andσ. The procedure, illustrated in Fig. 13, is as follows. We consider the (v,σ) plane divided into 20 × 20 bins (left panel of Fig. 13) or 100 × 100 bins (right panel of Fig. 13). We expect the signal, i.e., the boundary Voronoi cells, to populate the bins with small volume and relatively large RSD, while the background, i.e., the bulk cells, are distributed more uniformly throughout the (v,σ) plane. In order to build the optimal ROC curve, we need to determine the signal to background ratio, S/B, in each bin, and design the cuts so that we remove successively the bins with the smallest S/B. The bins in Fig. 13 are color-coded according to the corresponding value of log(S/B) 15 . Given the finite statistics, there are bins which have no events (neither signal nor background); they are left uncolored. For definiteness, the bins which have some signal events, but no background events, are assigned the same value as the maximal log(S/B) value among the bins containing both signal and background events. Similarly, the bins which had some background events, but no signal events, were assigned the same value as the minimal log(S/B) value among the bins containing both signal and background events. Fig. 13 shows that, as expected, the bins with the largest S/B (colored in black) are located in the upper left corner of the plot, corresponding to smallv and largeσ. The spread in the cluster of black-colored bins is indicative of the gain in sensitivity due to simultaneous consideration of the two variables,v andσ. According to the right panel of Fig. 12, the bulk of the gain is already obtained with a 20 × 20 grid; increasing the number of bins 25 times to a 100 × 100 grid does not lead to substantial improvement. Therefore, in practice, one might want to consider grids of even smaller size, especially since the true Figure 13. Two-dimensional histograms of the expected signal to background ratio in the (v,σ) plane: for 20 × 20 bin (left) and 100 × 100 bins (right). The ROC curves in the right panel of Fig. 12 were built by successively cutting away the bin with the lowest signal-to-background ratio among all remaining bins. (Alternatively, one could start from zero and successively keep adding the bin with the highest signal-to-background ratio among all remaining bins.) ranking of the bins in terms of S/B depends on the parameter values, e.g., the density enhancement on the boundary and the value of ρ, which are not known a priori. This is why when we consider the physics example in the next section, we shall utilize a smaller grid of 15 × 15 bins in the (v,σ) plane (see Fig. 16).

Finding Phase Space Boundaries with Voronoi Tessellations
We now use Voronoi tessellations to find the phase space boundary for SUSY events at the 14 TeV LHC. We consider the 2 + 2 + 2 topology from Fig. 1, where, as usual, a (lefthanded) squark X 1 =q undergoes a cascade decay through a heavy neutralino, X 2 =χ 0 2 ; a slepton, X 3 =˜ ; and a light neutralino, X 4 =χ 0 1 . As in Ref. [93], we consider the production of a squark in association with a neutralino LSP (χ 0 1 ). Events were generated with MadGraph5 [108]. The mass spectrum that we used was mq = 350 GeV, mχ0 2 = 300 GeV, m˜ = 250 GeV, and mχ0 1 = 200 GeV. 16 The particles visible in the detector are: a quark jet v 1 = j, a "near" lepton v 2 = n , and a "far" lepton v 3 = f . The relevant phase space is then (m 12 , m 23 , m 13 ) ≡ (m j n , m , m j f ). For SUSY signal events, each of these three variables exhibits an upper kinematic endpoint. The three endpoint values are given by eqs. (3.7-3.9): m 2 j n,max = 9931 GeV 2 , (4.1) 16 Despite the relatively low squark mass, this study point does not seem to be ruled out by the current LHC data. Sinceχ 0 1 is mostly Bino, the cross-section for squark-neutralino associated production is suppressed by the left-handed squark hypercharge, and is only ∼ 5 fb at 8 TeV. Furthermore, there is no dedicated search for such asymmetric event topologies (squark-LSP production). If we nevertheless test against a standard SUSY search, e.g. a signature with a same-flavour opposite-sign lepton pair, jets and large missing transverse momentum [109], we find a rather low selection efficiency ( < ∼ 1%), due to the softness of the visible decay products and the tendency of the two final state neutralinos to be back to back, thus reducing the amount of missing transverse energy. From the three-dimensional point of view, the signal events populate the interior of a compact region in the (m j n , m , m j f ) space, whose boundary is given by the constraint [94,95]m which, for convenience, is written in terms of unit-normalized variables (see also (3.6)) Our main goal in this section will be to test the algorithms from the previous sections for tagging the Voronoi cells in the vicinity of the boundary surface (4.4). In addition to the signal events from squark-neutralino associated production with the squark decaying as in Fig. 1, we shall also consider a representative number of background events. In order to make contact with the results from the previous sections, in section 4.1 we first take the background events to be uniformly distributed in mass-squared phase-space, and we ignore the combinatorial background. Then in section 4.2 we study a more realistic case, where the SM background is comprised of dilepton tt events and we also account for the combinatorial problem with the two leptons.

An example with uniform background and no combinatorics
As in the other two three-dimensional examples considered in sections 2.2 and 3.3, in this section we include "SM physics background" events, which we take to be uniformly distributed everywhere throughout the mass-squared phase-space (m 2 j n , m 2 , m 2 j f ) and normalized so that the density contrast across the boundary (4.4) is equal to ρ = 4. Note that in this scenario the interior "bulk" events and the "edge" cells on the surface boundary (4.4) consist of both SUSY signal and SM background events.
As before, we visualize the resulting Voronoi tessellation by presenting two-dimensional slices of the relevant three-dimensional phase space, in this case 17 (m 2 j n , m 2 , m 2 j f ). In  17 It is known that working in terms of the squared masses is more convenient and intuitive [94,95].  Fig. 14) or the normalized volumev i defined in (1.3) (in Fig. 15) of the corresponding three-dimensional Voronoi cell. Just as in the case of the density-enhanced sphere considered in section 3.3, Figs. 14 and 15 suggest that the edge cells near the phase space boundary (4.4) are characterized both by a large value ofσ i and a small value ofv i . Therefore, in designing a selection cut to pick up edge cells, it makes sense to consider both of these two variables at the same time. This is illustrated in the left panel of Fig. 16, which is the analogue of Fig. 13 for this case. We consider a moderately large 15 × 15 grid in the (v,σ) plane and rank the resulting bins according to their signal-to-background ratio 18 as follows. The bin with the highest S/B is assigned rank 1, while the bin with the lowest S/B is assigned rank 15 × 15 = 225. In case of a tie between several bins, each bin is assigned the same average rank. Finally, bins with no events at all are ranked at the very bottom. 19 We observe that, similarly to Fig. 13, the  highest ranked bins in terms of S/B appear at large values ofσ and small values ofv. Using the obtained bin ranking, we can build the corresponding ROC curve, shown with the red solid line in the left panel of Fig. 17, which in some sense is the "ideal" ROC curve that could be achieved, ifσ andv were the only discriminating variables under consideration.
One can now repeat the same procedure for different values of ρ. We show four more not have a football program. Figure 17. The same as the ROC curves shown in the right panel of Fig. 12, but for a 15 × 15 grid. Left: the ranking of the bins in constructing the ROC curve was done with the correct value of ρ (shown on each curve) used in generating the "data". Right: the ranking of the bins was always done according to the "average" ranking shown in the right panel of Fig. 16. examples in the left panel of Fig. 17, with increasingly pessimistic values for the density contrast ρ: 3, 2, 1.5 and 1.2. As expected, the ROC curves become progressively worse, as quantified in the figure. With regards to the bin ranking in each case, we notice the following trend -the highest ranked bins remain at the lowest possible values ofv, but slide down theσ axis to slightly lower values of the RSD, near, or even belowσ ∼ 1. This is easy to understand intuitively -as ρ is decreased, the number densities on both sides of the surface boundary become more similar, and there is less variation between the sizes of bulk cells on the inside and on the outside. The fact that the bin ranking derived from our Monte Carlo simulations depends on the value of ρ poses an important conceptual problem with this procedure -when the analysis is performed on real data, we will not know the actual value of ρ, and, hence, we will not be certain which particular ordering of bins to use. Nevertheless, the fact that the highest ranked bins are clustered, more or less, in the same location, suggests a possible resolution: we can simply average our results obtained for several different values of ρ and use the resulting average rank for each bin, which is shown in the right panel of Fig. 16. The corresponding ROC curves derived with the help of this "average" bin ranking procedure are shown in the right panel of Fig. 17. Comparing the two panels of Fig. 17, we see that the ROC curves based on the average ranking are only slightly degraded compared to the "ideal" case.
We are now ready to start designing selection cuts for the edge cells. One possibility is to select the cells which fall into the a predetermined number, N top bins , of the highest ranked bins in the (v,σ) plane. If we "cheat", i.e., use the correct value of ρ for the ranking (as in the left panel of Fig. 16), we obtain the result shown in Fig. 18, where we have chosen N top bins = 5. In general, the tagged cells are distributed throughout the volume of the three dimensional phase space, so for illustration purposes we again use the same six two-dimensional slices as in Figs. 14 and 15. We observe that the procedure is pretty efficient in tagging edge cells, and occasionally we tag an isolated bulk cell. Of course, Figure 18. The Voronoi cells which pass the two-dimensional selection cut requiring the cell to belong to one of the top 5 bins in terms of signal-to-background ratio for the correct choice of ρ = 4 (see the left panel in Fig. 16).
for such a low value of N top bins , not all edge cells will pass the cut, which will cause the boundary contours (marked with black dashed lines) to appear segmented and incomplete. By increasing the value of N top bins , we can obviously tag more edge cells and eventually "close" those contours, but at the cost of more mistagged bulk cells.
Since the true value of ρ will be unknown, the plots in Fig. 18 are for academic purposes only. The more realistic situation is depicted in the analogous Fig. 19, where we again choose N top bins = 5, only this time we use the average bin ranking from the right panel of Fig. 16. The result in Fig. 19 is slightly worse than Fig. 18 -while we do find a higher rate for mistagging bulk cells (typically in the interior), the majority of the tagged cells are very close to the surface boundary, suggesting that this is a promising technique for identifying edge cells.

A realistic example with tt background and combinatorics
We now repeat the exercise from the previous section 4.1 with two improvements. First, we have to address the combinatorial problem of distinguishing the "near" and "far" lepton. The standard approach in the literature is to trade the original variables m j n and m j f for the ordered pair [18,20,31] m j (high) ≡ max m j n , m j f , (4.8a) Figure 19. The same as Fig. 18, but using the average ranking of the bins shown in the right panel of Fig. 16.
This reordering procedure is pictorially illustrated with the first two rows of plots in Fig. 20, where we show scatter plots of signal events for different ranges of the third invariant mass variable (the dilepton mass): 1, 000 GeV 2 ≤ m 2 ≤ 3, 000 GeV 2 (first column); 3, 000 GeV 2 ≤ m 2 ≤ 5, 000 GeV 2 (second column); 5, 000 GeV 2 ≤ m 2 ≤ 7, 000 GeV 2 (third column); and 7, 000 GeV 2 ≤ m 2 ≤ 9, 000 GeV 2 (fourth column). In the first row, the signal events are plotted in the original plane of (m 2 j n , m 2 j f ), and the points are color-coded as follows. The points below the diagonal 45 • line, which have m 2 j n ≥ m 2 j f , are colored in red, while the remaining points above the diagonal 45 • line with m 2 j n ≤ m 2 j f , are colored in black. The same data is then plotted in the second row of Fig. 20 in the plane of (m 2 j (low), m 2 j (high)). Notice that the effect of the reordering procedure (4.8) is to leave all the black points in place, while interchanging the x and y coordinates of the red points 20 . After the reordering (4.8), half of the plane on each plot is left blank. In order to avoid such voids, in the third row of Fig. 20 we replot the data in the plane of (m 2 j (low), m 2 j (high) − m 2 j (low)), which is fully accessible. As expected, the scatter plots in the third row of Fig. 20 exhibit boundary lines, which we can target with our edge-detecting method. In fact, each plot has two such boundaries where the signal number density sharply changes -one for the red points and another for Figure 20. Scatter plots of signal events (black and red points, top three rows) and dilepton tt events (blue points, bottom row), for different ranges of the dilepton invariant mass squared: 1, 000 GeV 2 ≤ m 2 ≤ 3, 000 GeV 2 (first column); 3, 000 GeV 2 ≤ m 2 ≤ 5, 000 GeV 2 (second column); 5, 000 GeV 2 ≤ m 2 ≤ 7, 000 GeV 2 (third column); and 7, 000 GeV 2 ≤ m 2 ≤ 9, 000 GeV 2 (fourth column). In the first row, the signal events are plotted in the plane of (m 2 j n , m 2 j f ) and colored red (black) if m 2 j n ≥ m 2 j f (m 2 j n ≤ m 2 j f ). The same points are then plotted in the planes of (m 2 j (low), m 2 j (high)) (second row) and (m 2 j (low), m 2 j (high) − m 2 j (low)) (third row). The background events in the fourth row are plotted in the plane of (m 2 j (low), m 2 j (high) − m 2 j (low)). Having thus taken care of the combinatorial problem, we now also improve our treatment of the background -instead of uniformly distributed background events as in section 4.1, we consider dilepton events from tt production. The corresponding scatter plots are shown in the fourth (last) row of Fig. 20. Since there are 2 b-jets, each background event contributes two entries to the scatter plot. We see that within the relevant range of the plotted variables m 2 j (low) and m 2 j (high) − m 2 j (low), the distribution of the background events is somewhat uniform, with some noticeable clustering near the origin.
We are now in position to apply our Voronoi boundary detection algorithm. The result is shown in Fig. 21, where we present nine slices at fixed values of m 2 as indicated at the top of each panel. The red (black) dashed line in each plot corresponds to the expected theoretical boundary implied by eq. (4.4) for the set of points with m 2 j n ≥ m 2 j f (m 2 j n ≤ m 2 j f ) (see also the third row in Fig. 20). As before, the signal and background events were normalized so that ρ = 4. Fig. 21 demonstrates that the Voronoi cells with the highest values of RSDσ i (colored in red or orange) are indeed found near the theoretical boundaries (the red or black dashed lines). As anticipated, the method performs well for intermediate values of m 2 ∼ (4, 000 − 5, 000) GeV 2 , where the two boundaries resulting from the reordering (4.8) tend to overlap. We also observe that the boundary closer to the origin also seems to be singled out, especially at high values of m 2 .

Conclusions
In this paper, we took the first steps towards developing a general method for identifying surface boundaries in 3D phase space distributions from Voronoi tessellations. In the case of a sequential cascade decay like the one exhibited in Fig. 1, the surface boundary in the relevant (m 12 , m 23 , m 13 ) space is characterized by two properties: 1) it is the location of points where the number density is enhanced, due to the ∆ −1/2 4 factor in the phase space distribution (3.12) [93]; 2) it is the location of points where the number density suddenly changes due to the lack of signal events outside the kinematically allowed boundary. These two properties motivate the use of the geometric variables,σ i andv i , derived from the Voronoi tessellation of the data. (For other options, see [11].) We showed that the edge cells tend to have large values ofσ i and small values ofv i , thus we advocated empirically selected cuts in terms ofv i andσ i for tagging edge cell candidates. We considered several examples of increasing complexity and quantified the efficiency of those selection cuts using the language of ROC curves.
There are several directions in which this line of research can proceed from here.
• Statistical significance of a set of tagged edge cells. As we have seen in Figs. 18 and 19, the method is not perfect, in the sense that it occasionally tags a few bulk cells. Therefore, we need to develop a statistical procedure for determining the statistical significance of a given observed set of tagged edge cell candidates. Such a procedure should involve not just the relative number of tagged cells, but also their correlations, e.g., proximity to each other, connectedness, etc. This is an interesting subject on its own and will be addressed in a future publication [107].
• Parameter measurements from fitting to a set of tagged edge cells. Having selected a set of edge cell candidates, one could imagine fitting to the theoretical prediction for the shape of the boundary surface (4.4), obtaining a measurement of the mass spectrum of the new particles X 1 , X 2 , X 3 , X 4 . The actual fitting can be done in several different ways, which will also be investigated in [107].
• Experimental effects. In order to keep the discussion simple and to the point, in this paper we have ignored the experimental complications arising from finite particle widths, detector resolution, ISR jet combinatorics, fakes, etc. Our goal was to present the method as a proof of principle, since the Voronoi approach to data analysis is still in its infancy. Once Voronoi-based methods have become more established and mature, it will become worthwhile to perform detailed and more realistic studies beyond parton level and with full detector simulation.
Here we have focused on cases where the number of signal events on the boundary is significant, leading to a "step" function discontinuity in the total density of events as one moves across the boundary. However, there are interesting examples of distributions where the number density is continuous, but exhibits a "kink", i.e., a discontinuity in its derivative (gradient) [110][111][112]. In such cases, our methods can still be applied -not directly on the initial data itself, but on a secondary data sample created by taking suitable derivatives. Indeed, while we have identified and studied a promising use of Voronoi tessellations in the analysis of particle physics data, there are many exciting applications yet to be developed. We look forward with anticipation to the future development and adoption of these novel and powerful methods.

A ROC Curves and Sensitivity Measures
ROC (receiver operating characteristic) curves are a useful tool for measuring the sensitivity of a variable 21 . As shown in Fig. 22, the ROC curve is found by (i) defining an event selection procedure ("cut") based on the variable in question and (ii) determining the fraction, ε S , of signal events, and the fraction, ε B , of background events that pass the given cut. The coordinates of each point along the curve are then provided by ( B , S ). It is easy to see that the ROC curve must include the point (0, 0), "A" in Figs. 22 and 23, where all events, signal and background, have been disallowed by the cut. The point (1,1), where all events pass the cut ("C" in Figs. 22 and 23), is also part of every ROC curve. ROC curves have a number of important and useful properties which we shall explore in the remainder of this section.

A.1 Comparing ROC curves
If we consider two ROC curves, R 1 (ε B ) and R 2 (ε B ), obtained for the same signal and background processes using different choices of variable and/or the cut procedure, then if the variable/ cut combination used to produce R 1 is clearly more sensitive than the variable/cut combination used to produce R 2 . This statement is uncontroversial, but the comparison is not always applicable, as a pair of ROC curves, R 1 and R 2 may have points ε B,1 = ε B,2 such that R 1 (ε 1 ) > R 2 (ε 1 ), (A.2) We will therefore need to develop other procedures to compare ROC curves; we will present several approaches in section A.4. First, however, we must investigate the connection between ROC curves and likelihood and explore some of the more important consequences of this relationship.

A.2 ROC curves and likelihood
There are many ways to perform an analysis using a given variable, and hence many ROC curves that may be constructed with no other information than the value of a given variable for signal and background events. We note that the ratio of signal and background likelihoods is an optimal test statistic, i.e., choosing to accept an event, e, when where L S is the signal likelihood and L B is the background likelihood, is the procedure which accepts the maximum fraction of signal events (ε S ) for a given choice of ε B (which in turn determines the numerical value of l(ε B ) in eq. (A.4)). This fact is known as the Neyman-Pearson lemma [114] and is equivalent to the statement that the likelihood ratio produces the optimal ROC curve. Points "A", "B", and "C" from Fig. 22 together with the ROC curve from Fig. 22 are reproduced here. We have also indicated the ROC curve of a perfectly insensitive cut, which is shown by the line from point "A" to point "C". Finally we have shown a ROC curve for a cut which gives a higher signal fraction for every choice of background fraction than the ROC curve from Fig. 22. We see that the area under the ROC curve of the perfectly insensitive variable (shown in pink) is 1/2, that there is additional area under the ROC curve shown in Fig. 22 (shown in blue), and that there is even an even greater area under the ROC curve for the more sensitive variable.
We now provide a heuristic proof of this assertion in terms of ROC curves, which hopefully provides useful insights into both ROC curves and the Neyman-Pearson lemma. We consider the situation where signal and background likelihoods are calculated numerically from samples of signal and background events, S and B, which can be assumed arbitrarily large in order to approximate analytic expressions with any desired accuracy. First, we create a set of "variable bins" in our variable of interest, i.e., such that for every event, e, in our signal and background samples, for exactly one choice of i, where v(e) refers to the value of the variable under consideration obtained from the event, e. We then divide our signal and background events into bins using the values of the variable, v, under consideration. I.e., we obtain "signal bins" and "background bins" Since every signal and every background event must be in some bin, by construction, we have Hence the estimators of the signal and background likelihood, and are automatically normalized. Further, we note that the contribution to the ROC curve from bin i is a line segment from initial point (ε B,0 , ε S,0 ) to (ε B,0 +|B i |/|B|, ε S,0 +|S i |/|S|), i.e. from (ε B,0 , ε S,0 ) to (ε B,0 + L B,i , ε S,0 + L S,i ). We therefore see from eqs. (A.11) and (A.12) that the slope of the line segment corresponding to a particular bin is given by the ratio of signal and background likelihoods calculated for that bin. An interesting corollary, since L S,i and L B,i are always non-negative, is that the ROC curve must be monotonically increasing, i.e., We now assert that any physically reasonable cut procedure 22 accepts a set of variable bins where {i 1 , i 2 , ...} are the indices of the variable bins which pass the cut. Further if we have a cut that can be parameterized and that ranges from a cut in which all events are accepted to a cut in which no events are accepted, then we can express the cut function as a permutation, p of 1, 2, ..., n, where n is the number of bins, such that the variable bin labelled by p(1) is the last to be eliminated, the variable bin eliminated by p(2) is the next-to-last to be eliminated, etc. 22 We may have to choose different, and, in particular, smaller bins, e.g., we may not be able to model the cut function v > v0 if v0 is not on a bin boundary. In principle this limitaiton can be circumvented by choosing the variable bins to be smaller than the detector resolution, which must be finite. Since physically reasonable cut functions can only use the output of such a detector, we can eliminate pathological cut functions, like only accepting events with rational v.
The optimal cut function identified by the Neyman-Pearson lemma, i.e., eq. (A.4) corresponds to the case where p(1) labels a bin with the maximum value of signal to background likelihood ratio and if i > j. We claim that this gives the ROC curve for which ε S (ε B ) is maximized for any choice of ε B . This is geometrically obvious given that the likelihood ratio gives the slope of the line segment in the ROC curve; the optimal ROC curve is one in which the steepest segments occur first. A more rigourous proof notes that the ROC curve based on the likelihood ratio has and f m+1 ∈ [0, 1] is the fraction of bin m + 1 that must be included to give the desired value of ε B . We now consider the value of ε S at the same value of ε B in a different ROC curve. Since this ROC curve differs from the optimal ROC curve described by eqs. (A.18-A.19), it must include some different bins or different fractions of bins. These "new bins" give a contribution to ε B of ∆ε B , which is also the contribution to ε B from the "replcaed original bins" they replace. The contribution to ε S from the replaced original bins was ∆ε S , and can be found by multiplying ∆ε B by the (appropriately weighted) average slope of the replaced original bins. The contribution to ε S from the new bins is also the weighed average slope of the bins. However every bin in the new bins has a slope less than or equal to the slope of any bin in the replaced original bins. Thus the value of ε S in the new ROC curve is less than or equal to its value in the ROC curve based on the likelihood ratio.

A.3 Subdividing variable bins
We now consider the effect on our ROC curve of subdividing a bin, by which we mean that we take a bin, V i , and break it into two bins, which we label V i and V n+1 as follows: (A.20) We then obtain the signal bins S i and S n+1 following eq. (A.7) and the background bins B i and B n+1 following eq. (A.8). Using these new bins will always allow for the construction of a ROC curve that is as good as or better than the curve before subdivision (in the sense of eq. (A.1)). This is clear as if the permutation of bins that led to the original ROC curve is p, then the permutation p (j) = p(j) for j < i (A. 21) p (i) = i (which now refers to a fraction of the original bin, i) (A. 22) p (i + 1) = i = n + 1 (A. 23) p (j) = p(j − 1) for j > i + 1 (A.24) yields a ROC curve which is the same everywhere except in the region corresponding to the subdivided bin. If the two "daughter" bins have the same slope (likelihood ratio), then this curve will be exactly the same as the original ROC curve. However, if this is not the case, then if we choose (without loss of generality) the daughter bin with the greater slope to come first (i.e., to be labelled as i), we have a ROC curve which has strictly greater signal values than the original ROC curve in the regime corresponding to the subdivided bin. Finally, it is possible that daughter bin i may have a greater slope than some of the other bins, that daughter bin n + 1 may have a lesser slope than some of the other bins, or both of these possibilities may be realized. In all of these cases the ROC curve may be further imporoved by sorting the bins by slope. As a consequence of this result considering smaller bins will increase sensitivity; the extreme limit is to use an unbinned likelihood rather than binned likelihood. While we have not proved it, this fact also motivates the (true) statement that considering additional relevant, but still independent, variables will improve sensitivity (as subdividing a bin using the new variable may yield daughter bins with different slopes).
However, it should be noted, when using actual samples of signal and background events to construct likelihoods, that one can reach false conclusions about sensitivity by subdividing bins too far. An extreme example of this is provided by the case where all the signal and background events we are considering have their own bin. In this case, the likelihood ratio is infinite for bins with one signal event and zero for bins with one background event and the ROC curve is vertical from (0, 0) to (0, 1), and then horizontal from (0, 1) to (1,1). Obviously this is too good to be true. What has happened is an example of posterior statistics and is clearly an invalid inference. Clearly such a situation should be avoided, e.g., by using bins large enough to be robust with respect to statistical fluctuations.

A.4 Measures of ROC curve sensitivity
In section A.1, we noted that sometimes a ROC curve will indicate that one variable/ cut procedure is absolutely better than another variable/ cut procedure. However, this may not always be the case. Thus it is important to have procedures for comparing ROC curves (and hence sensitivity) that always allow the comparison to be made. All of these procedures can be viewed as functionals, i.e., they describe a ROC curve by a number that corresponds directly to the quality of a variable.
Simple functionals that can be used are (i) the value of ε S at a fixed value of ε B and (ii) the value of ε B at a fixed value of ε S . We can view (i) as the fraction of signal events chosen with a fixed false positive rate, while in (ii) we demand that the cut accept a fixed fraction of signal events. A better variable will then reject a larger fraction of background events.
In particle physics, we are generally interested in the significance with which we can claim the discovery (or the exclusion) of some process. Therefore we can describe ROC curves by (i) the maximum value of S/ √ B attained on the ROC curve, (ii) the maximum value of S/B attained on the ROC curve, or (iii) the maximum discovery significance on the ROC curve. Choice (i) is appropriate as a measure of significance in the (Gaussian) limit of a large number of events in situations where systematic uncertainties do not have a large effect. Choice (ii) is appropriate in the situation where systematic errors dominate. There are different appropaches to implementing choice (iii) to model the statistical and systematic components of the significance. The difference between the Gaussian and Poisson distributions for small numbers of events may also be modelled. While choice (iii) represents the actual quantity to be maximized for a specifc analysis, it has the drawbacks of (i) being complicated and (ii) being dependent on specific experimental conditions, such as the luminosity gathered, systematic uncertainties, etc.
To make comparisons between ROC curves in a general way, we note that the worst possible likelihood-based ROC curve is the ROC curve that reflects throwing away signal and background events indiscriminately, i.e. a straight line from ("A" to "C" in Figs. 22 and 23). On the other hand, a perfect variable would take us from "A" to (0, 1) to "C". This ROC curve contains the entire range of ε B and ε S underneath it, suggesting the use of the integral of the ROC curve as a sensitivity measure. Further, as indicated in Fig. 23, variables which lead to "better" ROC curves, in the sense of eq. (A.1) in section A.1 have more area under their ROC curves. In order for the worst possible likelihood ROC curve to have a value of 0 and the best possible ROC curve to have a value of 1, we multiply the integral under the ROC curve by 2 and subtract 1. The resulting expression, where AUROC referes to the "area under the ROC curve", e.g., under curve ABC in Figs. 22 and 23 gives the measure of sensitivity known as the Gini coefficient; it is our main quantifier of ROC curves (and hence of variable/ cut procedure sensitivity) in the bulk of this work.