Measuring the W-Boson mass at a hadron collider: a study of phase-space singularity methods

The traditional method to measure the W-boson mass at a hadron collider (more precisely, its ratio to the Z mass) uses the distributions of three variables in events where the W decays into an electron or a muon: the charged-lepton transverse momentum, the missing transverse energy and the transverse mass of the lepton pair. We study the putative advantages of additionally measuring a fourth variable: an improved phase-space singularity mass. This variable is statistically optimal, and simultaneously exploits the longitudinal- and transverse-momentum distributions of the charged lepton. Though the process we discuss is one of the simplest realistic ones involving just one unobservable particle, it is fairly non-trivial and constitutes a good "training" example for the scrutiny of phenomena involving invisible objects. Our graphical analysis of the phase space is akin to that of a Dalitz plot, extended to such processes.

Neutrinos, and perhaps novel weakly-interacting particles, escape unobserved from the collisions in which they are produced. In the corresponding "missing energy" events, the reconstruction of the masses of the parent particles and the specification of the underlying process are challenging because there are typically fewer kinematical constraints than unknowns. At a hadron collider this situation is rendered even thornier, since particles produced at small angles also escape undetected. This precludes the determination of the longitudinal momentum of the center-of-mass system of the colliding partons.
The above limitations confer a higher standing to observables exclusively dependent on transverse momenta [1], or otherwise invariant under longitudinal boosts [2]. In principle, transverse observables are insensitive to the significant uncertainties associated with the (longitudinal) parton distribution functions (pdfs). In practice the uncertainties are to some extent reintroduced via the angular coverage limitations of an actual experiment, which are not invariant under longitudinal boosts.
The quintessential transverse observable is the transverse mass, of W-discovery fame. In an event at a hadron collider, consider the production of a single W, followed by its decay $W \to \ell\nu$, with $\ell$ an electron, a muon, or one of their antiparticles. Denote by $x \equiv (x_0, \vec x_T, x_3)$ and $l \equiv (l_0, \vec l_T, l_3)$ the neutrino and charged-lepton four-momenta, respectively. Here $\vec l_T \equiv (l_1, l_2)$ and $\vec x_T \equiv (x_1, x_2)$ are the momenta of the leptons in the plane transverse to the beam direction(s), and $\vec p_T \equiv (p_1, p_2)$ is the analogous quantity for the observed final-state hadrons, so that transverse momentum balance gives $\vec x_T = -(\vec l_T + \vec p_T)$. The traditional "transverse mass", a function of $\vec l_T$ and $\vec p_T$ whose distribution is used to infer the W boson mass, is [1]

$$M_T^2 \equiv 2\,|\vec l_T|\,|\vec x_T|\,\left[1 - \cos\Delta\Phi(\vec x_T, \vec l_T)\right], \qquad (1)$$

where $\Delta\Phi(\vec x_T, \vec l_T)$ is the angle between the transverse lepton directions.

The most precise determination of the mass of the W by a single experiment is the one by DØ [3]. In spite of the relatively unfavorable environment of a hadron collider, its large statistics results in a value with an overall error smaller than that of the LEP experiments. The DØ result is based on the decays $W \to e\nu$, and on the measurement of three highly correlated transverse observables: the traditional "transverse mass" function [1], the lepton's transverse energy and the total missing transverse energy. The result, Eq. (2), is obtained from a fit to a quadratic form, from whose minimum and width $M_W$ and its estimated error are inferred. Naturally, the whole procedure is tested and calibrated with the observed Z production and leptonic decay (into $e^+e^-$, in the DØ case).
In order of decreasing incidence on the error in Eq. (2), the limitations are the electron's energy calibration, the uncertainties on the pdfs, and the statistics. For this particular measurement, the backgrounds are well understood and quite negligible.
Given the large statistics already gathered at the Tevatron collider, and with the advent of the LHC as a high-statistics precision-physics tool, the main limitation of a hadron collider determination of the W mass from its decays into electrons and muons is likely to be the pdf uncertainty. At the LHC this problem is particularly exacerbated [5] by the fact that it is a $pp$, not a $p\bar p$, collider, and the quark pdfs in a proton (or the identical antiquark pdfs in an antiproton) are much better known than the antiquark pdfs in a proton.

II. INTRODUCTION
A great deal of attention has been paid to hypothetical processes involving neutral, long-lived, weakly-interacting final state particles that can only be indirectly detected. A prototypical example is the pair production of squarks followed by their decays into quark plus neutralino. Such processes generally involve two or more particles of unknown masses.
The first aim in the missing particle searches for physics beyond the Standard Model is the establishment or the exclusion of a signal, both tantamount to an efficient suppression of backgrounds. Some novel longitudinal boost invariant variables are a very good choice in this endeavor [2], as demonstrated by the data analysis in [6].
A longer range aim is the measurement of unknown masses, when there is more than one and a candidate process is selected. In this connection, a very general algebraic singularity method has been advocated [7], involving the use of a "singularity variable" (SV), allegedly more powerful than that of a singularity "condition" (SC), such as the one leading, as we shall see, to the $M_T^2$ result of Eq. (1). It is too late to discover the W, though not to attempt to measure its mass even better, a relevant task in checking the consistency of the Standard Model and constraining the mass of its hypothetical scalar. With this ab-initio motivation, we have exhaustively studied the phase space for W production and leptonic decay, a simple undertaking analogous to the analysis of a Dalitz plot, but with incomplete kinematical information (§IV).
We have also studied the singularities of this phase space, and their use in constraining the W mass (§IV and §V). We identify the criterion for the theoretically optimal SV and derive its explicit form (§VI, §VIII and §X). En passant, we find that other nonoptimal SVs, such as the one proposed in [7], are "dangerous", in that their distributions display fake singularities (§VII).
The singularity variables we study involve the measured longitudinal momentum of the charged lepton, $l_3$. This longitudinal information is obviously additive to the transverse information exploited in observables such as $M_T^2$, but is highly correlated with it (§IX). The $l_3$ distribution directly reflects the pdfs of merging quarks and antiquarks of different flavor. Recent progress in QCD fits and in calculations well beyond the leading order allows one to hope that, eventually, the dominant limitations concerning the problem at hand will not be the theoretical pdf uncertainties, but the limited calorimetric resolutions.
Given a trustworthy set of pdfs, one can simulate the observable distribution of events $dN/(dl_3\, d^2 l_T\, d^2 p_T)$ for a set of input trial masses and contrast it with observation. This comparison involves the five relevant variables and their correlations; it has no statistically superior competitor. Why then study any alternatives? Besides the pleasure of understanding with use of one's own neural network, there is the motivation of paving the way for searches for other processes involving unobservable particles, for which it is a priori prohibitive to simulate all possibilities.
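The template comparison described above can be illustrated with a deliberately crude toy. The sketch below is not the analysis of this note: it assumes a zero-width W at rest decaying isotropically (no pdfs, no V−A angular distribution, no detector), and the function names, binning and seeds are all hypothetical choices made for illustration.

```python
import math
import random

def simulate_lT(mass, n_events, seed):
    """Toy |l_T| sample: zero-width W at rest, isotropic decay (an assumption)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n_events):
        cos_theta = rng.uniform(-1.0, 1.0)
        sample.append(0.5 * mass * math.sqrt(1.0 - cos_theta ** 2))
    return sample

def histogram(values, lo, hi, n_bins):
    """Fixed-width histogram; values outside [lo, hi) are dropped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) / width)] += 1
    return counts

def chi2(data, template):
    """Chi-square between two histograms of independent Poisson counts."""
    total = 0.0
    for d, t in zip(data, template):
        if d + t > 0:
            total += (d - t) ** 2 / (d + t)
    return total

def best_trial_mass(data_values, trial_masses, n_events=20000):
    """Return the trial mass whose template is closest to the data, and all scores."""
    data_hist = histogram(data_values, 20.0, 45.0, 25)
    scores = {}
    for i, m in enumerate(trial_masses):
        template = histogram(simulate_lT(m, n_events, seed=1000 + i), 20.0, 45.0, 25)
        scores[m] = chi2(data_hist, template)
    return min(scores, key=scores.get), scores
```

In this toy the sensitivity comes almost entirely from the bins next to the Jacobian edge at $|l_T| = M/2$, which is the feature that the singularity methods discussed in this note are designed to isolate.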
In this note we report on a thorough theoretical study of the extraction of phase space information from single-W signal events, but we use the standard model of W production and decay only to leading order. We entirely ignore the backgrounds, which are well known to be very modest for this particular process. A reason for these choices is that only the experimentalists themselves can fully model the detector's effects and backgrounds, and that this modeling is independent from the theoretical issues on which we focus.

III. LINGUISTIC QUANDARIES
Based on equations such as $M^2 = (l + x)^2$, we shall be drawn to give a plethora of meanings to what is, for starters, simply a letter: "M". It ends up being everything else. The resemblance to M-theory is coincidental.
Naturally, M may stand for the physical or measured $M_W$, as well as for its Lorentzian distribution, when the width is not neglected. But it may also, as in the case of the transverse mass, $M_T$, be a non-Lorentzian function of other observables.
In analyzing data, one compares them with MC-generated distributions that depend on an ensemble of input "trial masses", for which we reserve the label $\tilde M$. A different type of trial mass, which we call $\mathcal{M}$, appears in "singularity variables", which are functions of observable momenta and of $\mathcal{M}$. Not to make this complex linguistic heritage hereditary, we label the singularity variables "Σ" (and not once more "M", as in the $M_T^2$ function), thereby not introducing new meanings to the symbol M or the word "mass".

IV. SINGLE-W PHASE SPACE
The full information relevant to the reconstruction of the W mass is embedded in the kinematical equations:

$$E_1 \equiv x_0^2 - \vec x_T^{\,2} - x_3^2 = 0, \qquad (3)$$
$$E_2 \equiv 2\,(l_0 x_0 - \vec l_T\cdot\vec x_T - l_3 x_3) - M^2 = 0, \qquad (4)$$
$$E_3 \equiv x_1 + l_1 + p_1 = 0, \qquad (5)$$
$$E_4 \equiv x_2 + l_2 + p_2 = 0, \qquad (6)$$

where we have made the approximation $l^2 = 0$ for the charged lepton. The equations are incomplete in that the ν longitudinal momentum, $x_3$, is unconstrained, precluding a direct determination of the W boson mass from a "mass peak". Is there a systematic way to extract the kinematically most stringent information on $M_W$?

To answer this question it is useful to study first the phase space described by Eqs. (3) to (6) in a simplified case. If the energy and transverse momentum of the observed hadrons could be measured with precision, it would be possible to boost every event to the $\vec p_T = 0$ frame. To (temporarily) simplify the algebra, let us just adopt this constraint. Solve the linear equations $E_2$, $E_3$, $E_4$ to express $x_0$, $x_1$, $x_2$ as functions of $x_3$. Substitute the result in $E_1$ to obtain the phase space

$$\Phi \equiv M^4 - 4 M^2 l_T^2 + 4\,(M^2 - 2\, l_T^2)\, l_3\, x_3 - 4\, l_T^2\,(l_3^2 + x_3^2) = 0. \qquad (7)$$

It will be useful to consider the two solutions to Eq. (7) in $x_3 = x_3(l_T, l_3, M)$:

$$x_3^\pm = \frac{(M^2 - 2\, l_T^2)\, l_3 \pm M \sqrt{l_T^2 + l_3^2}\,\sqrt{M^2 - 4\, l_T^2}}{2\, l_T^2}. \qquad (8)$$

With no loss of generality, and to be able to plot the phase space, we adopt the following conventions. Take $l_3$ to be positive if directed along the direction of a given (fixed) proton beam. Define $l_T$ to be positive if directed above the beams, negative otherwise. Since Eqs. (7) and (8) depend on $l_T$ only through $l_T^2$, they are unchanged in form when $l_T$ is a signed quantity; in that guise they are referred to below as Eqs. (9) and (10), respectively. The function $\Phi(l_T, l_3, x_3) = 0$, from diverse points of view, is plotted in Fig. 1. Along the (blue) straight lines the planes tangent to the phase space contain one "visible" direction, $l_3$, and the "invisible" direction $x_3$. The projection of phase space into the visible directions $(l_T, l_3)$ is bounded by the lines $l_T = \pm M/2$.
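The elimination just described can be checked numerically. The following is a minimal sketch for the $p_T = 0$ case: the explicit form and normalization of Φ used here are one consistent result of substituting the linear equations into the quadratic one, not necessarily the normalization of Eq. (7), and the function names are hypothetical.

```python
import math

def phase_space(l_T, l_3, x_3, M):
    """Phi(l_T, l_3, x_3) for p_T = 0 (assumed normalization); zero on the surface."""
    return (M ** 4 - 4.0 * M ** 2 * l_T ** 2
            + 4.0 * (M ** 2 - 2.0 * l_T ** 2) * l_3 * x_3
            - 4.0 * l_T ** 2 * (l_3 ** 2 + x_3 ** 2))

def x3_solutions(l_T, l_3, M):
    """Both neutrino longitudinal momenta compatible with a visible (l_T, l_3)."""
    disc = M ** 2 - 4.0 * l_T ** 2
    if disc < 0.0:
        raise ValueError("outside the projected phase space: 4 l_T^2 > M^2")
    l_0 = math.hypot(l_T, l_3)              # massless charged lepton energy
    root = M * l_0 * math.sqrt(disc)
    base = (M ** 2 - 2.0 * l_T ** 2) * l_3
    return ((base + root) / (2.0 * l_T ** 2),
            (base - root) / (2.0 * l_T ** 2))
```

At the boundary $l_T = \pm M/2$ the square root vanishes and the two solutions merge into $x_3 = l_3$, which is the statement that the projected phase space has a singularity there.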
The boundaries of the phase space projected along an invisible direction onto the space of the visible ones, $l_T^2 = M^2/4$, are an example of singularity condition(s). At their location there is a single invisible coordinate $x_3$ for fixed values $(l_T, l_3)$ of the visible ones, as opposed to the two of the general case in Eq. (10), and the projected phase space density is not smooth [7].

In practice two cuts have to be applied to the momentum of the observed lepton. We adopt $|l_3| < 5\,|l_T|$ (resulting from a pseudo-rapidity limitation $|\eta| < 2.3$) and a rather demandingly low $|l_T| > 10$ GeV. These cuts result in the unobservability of a large fraction of phase space: the (red) domain shown without a mesh in Fig. 2. The maximum $|x_3| = O(50)\, M_W$ happens to be close to the absolute kinematical limit, approximately $|x_3| < E_p$, at the current LHC energy, $E_p = 3.5$ TeV. This was probably not the main reason to choose this machine energy.

In simple cases such as the one at hand the singularity condition can be directly obtained. The $l_T$ boundary is the projection of the phase space points at which the tangent plane is vertical and contains the invisible direction $x_3$. At these points $\partial\Phi(l_T, l_3, x_3)/\partial x_3 = 0$. Eliminating $M$ from this expression and Eq. (7) one obtains $x_3 = l_3$. At these boundaries $M^2 = 4\, l_T^2$.
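The relation between the two forms of the angular cut follows from $l_3/|l_T| = \sinh\eta$ for a massless lepton. A small sketch, with hypothetical function names and the cut values quoted in the text as defaults:

```python
import math

def max_l3_over_lT(eta_max):
    """|l_3| / |l_T| reached at pseudo-rapidity eta_max, for a massless lepton."""
    return math.sinh(eta_max)

def passes_cuts(l_T, l_3, lT_min=10.0, eta_max=2.3):
    """Lepton acceptance: |l_T| > lT_min (GeV) and |eta| < eta_max."""
    if abs(l_T) <= lT_min:
        return False
    eta = math.asinh(l_3 / abs(l_T))
    return abs(eta) < eta_max
```

Here `max_l3_over_lT(2.3)` evaluates to about 4.94, which the text rounds to 5.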

A. The formal singularity condition
The procedure of the last paragraph requires some guesswork, but can be rendered entirely general and systematic. At a singularity one or more of the invisible directions are contained in the tangent plane to the full phase space. The general condition for this to happen is that, in the space $\{x\}$ of invisible directions, the row vectors of the Jacobian matrix $D_{ij} \equiv \partial E_i/\partial x_j$ (with the row index $i$ running over the equations and the column index $j$ over the invisible coordinates) be linearly dependent, so that the derivative relative to an $x$-direction normal to these vectors be zero. In other words, at a singularity, the rank of $D_{ij}$ must be smaller than its rank at nonsingular points [7].
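For the general single-W case we are discussing, with $E_1$ the massless-neutrino condition $x^2 = 0$, $E_2$ the mass-shell condition $2\, l\cdot x = M^2$, and $E_{3,4}$ the transverse momentum balance,

$$D = \begin{pmatrix} 2x_0 & -2x_1 & -2x_2 & -2x_3 \\ 2l_0 & -2l_1 & -2l_2 & -2l_3 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \qquad (11)$$

and the reduced rank condition, $\det D = 0$, is

$$l_0\, x_3 - l_3\, x_0 = 0. \qquad (12)$$

The same condition is obtained in the $\vec p_T = 0$ example. Combining it with Eq. (7) results in $x_3 = l_3$, the phase space boundaries shown as straight (blue) lines in Fig. 1.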
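The reduced rank condition can be verified numerically. The sketch below assumes a particular ordering and normalization of the constraints (the neutrino mass-shell condition $x^2 = 0$, then $2\, l\cdot x - M^2 = 0$, then the two transverse-balance equations); with that assumption the determinant is $4\,(l_0 x_3 - l_3 x_0)$. All function names are hypothetical.

```python
import math

def jacobian(x, l):
    """Rows dE_i/dx_j for the assumed constraints E_1..E_4; x and l are four-momenta."""
    x0, x1, x2, x3 = x
    l0, l1, l2, l3 = l
    return [
        [2 * x0, -2 * x1, -2 * x2, -2 * x3],   # E_1 = x.x
        [2 * l0, -2 * l1, -2 * l2, -2 * l3],   # E_2 = 2 l.x - M^2
        [0.0, 1.0, 0.0, 0.0],                  # E_3 = x_1 + l_1 + p_1
        [0.0, 0.0, 1.0, 0.0],                  # E_4 = x_2 + l_2 + p_2
    ]

def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def massless(p1, p2, p3):
    """Four-momentum (E, p1, p2, p3) of a massless particle."""
    return (math.sqrt(p1 * p1 + p2 * p2 + p3 * p3), p1, p2, p3)
```

For $p_T = 0$ the transverse momenta are back to back, so the determinant vanishes exactly when $x_3 = l_3$, that is, when neutrino and charged lepton have the same rapidity.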
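The general case with nonvanishing $\vec p_T$ is treated with equal ease. Eliminate the four variables $x$ to solve the five equations (3) to (6) and (12) in $M$. The result is $\Sigma_T = 0$, with (up to an overall normalization)

$$\Sigma_T \equiv \left[M^2 - 2\,\vec l_T\cdot(\vec l_T + \vec p_T)\right]^2 - 4\, l_T^2\,(\vec l_T + \vec p_T)^2. \qquad (13)$$

Of the four $M$-roots of $\Sigma_T = 0$, only one is physical:

$$M_T^2 = 2\left[\,|\vec l_T|\,|\vec l_T + \vec p_T| + \vec l_T\cdot(\vec l_T + \vec p_T)\right], \qquad (14)$$

which reduces to $M_T = 2\,|\vec l_T|$ for $\vec p_T = 0$. The function $M_T^2$ of Eq. (14) is the consuetudinary $M_T^2$ of Eq. (1).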
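That the singularity condition reproduces the transverse mass can be illustrated numerically. The sketch below uses the standard definition $M_T^2 = 2\,(|\vec l_T||\vec x_T| - \vec l_T\cdot\vec x_T)$ with $\vec x_T = -(\vec l_T + \vec p_T)$, and compares it with the invariant mass $(l + x)^2$ for a hypothesised neutrino $x_3$; the function names are hypothetical.

```python
import math

def neutrino_xT(l_T, p_T):
    """Neutrino transverse momentum from transverse balance."""
    return (-(l_T[0] + p_T[0]), -(l_T[1] + p_T[1]))

def transverse_mass_sq(l_T, p_T):
    """M_T^2 = 2 (|l_T||x_T| - l_T . x_T)."""
    x_T = neutrino_xT(l_T, p_T)
    dot = l_T[0] * x_T[0] + l_T[1] * x_T[1]
    return 2.0 * (math.hypot(*l_T) * math.hypot(*x_T) - dot)

def invariant_mass_sq(l_T, l_3, p_T, x_3):
    """(l + x)^2 for massless leptons and a hypothesised neutrino x_3."""
    x_T = neutrino_xT(l_T, p_T)
    l_0 = math.sqrt(l_T[0] ** 2 + l_T[1] ** 2 + l_3 ** 2)
    x_0 = math.sqrt(x_T[0] ** 2 + x_T[1] ** 2 + x_3 ** 2)
    dot = l_T[0] * x_T[0] + l_T[1] * x_T[1]
    return 2.0 * (l_0 * x_0 - dot - l_3 * x_3)
```

When $x_3$ is chosen so that the two leptons have equal rapidity, $x_3 = l_3\,|\vec x_T|/|\vec l_T|$, the invariant mass coincides with $M_T$; for any other $x_3$ it is larger, which is why $M_T$ is bounded above by the parent mass.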

V. KIM'S SINGULARITY VARIABLE
Discussing the general case with an arbitrary number of invisible final state particles, Kim has argued [7] that the use of a "singularity variable" (SV) is more powerful than that of a singularity "condition" (SC), such as the one leading to the $M_T^2$ result of Eq. (14). Kim requires a SV to have four properties [7]: (i) To vanish at the singularity. (ii) To be perpendicular, at the singularity, to the phase space surface in the observable directions. (iii) To be "normalized such that every event can give the same significance". (iv) To be computed to first nontrivial order (the second fundamental form) in the distance between a phase space point and the nearest singularity.
Our interpretation of these formal looking choices is the following. Condition (i) is the only scale invariant stipulation. At the singularity, condition (ii) entails a maximal sensitivity to the unknown masses. Condition (iii) ensures that two events with the same distance to the singularity be treated on equal footing. The requirement (iv) is one way to make the procedure general.
To fathom all this it is useful to jump momentarily to the result of Kim's prescription in our single-W case. The SV (more precisely, the singularity function) is that of Eq. (15), with $\Sigma_T$ as in Eq. (13) and the implicit trial mass of the singularity variable, denoted $\mathcal{M}$, substituted for $M$. For $\vec p_T = 0$ this SV reduces to Eq. (16). Refer for a moment to the limit $\Gamma \to 0$ for the W width and a situation with no measurement uncertainties. Consider a set of $N$ real or MC-generated events, i.e. a list of values of $(l, \vec p_T)$, and the histograms $dN(\mathcal{M})/d\sigma$ of the corresponding values of $\sigma = \Sigma(\mathcal{M}, l, \vec p_T)$, for different choices of $\mathcal{M}$. For $\mathcal{M} = M_W$, the real or "MC true" value of the W boson mass, the singularity is at $\sigma = 0$; $dN(\mathcal{M})/d\sigma$ peaks at that point and vanishes for $\sigma < 0$. For a fixed data set and varying $\mathcal{M}$, the function $dN(\mathcal{M})/d\sigma$ varies in shape, but obviously not in statistically useful content. We shall later illustrate these points in detail.
The use of an "implicit" variable $\mathcal{M}$ may seem to be an overkill. In the single-W case with $\vec p_T = 0$, it is. One could equally well erase $\mathcal{M}$ in Eq. (16) and use the SV of Eq. (17), which, in conjunction with $M^2 = 4\, l_T^2$, embodies two projections of the full distribution $dN/(dl_T\, dl_3)$.
Contrariwise, one could make the singularity condition into a singularity variable with an implicit $\mathcal{M}$, Eq. (18), and consider the distributions $dN(\mathcal{M})/d\sigma_T$. But the information that these distributions contain is precisely the same as that of the distribution $dN/dl_T^2$: the corresponding histograms are just mirror reflected and shifted relative to one another.
The above unfavorable commentaries on implicit variables are by no means general. Even in the single-W case, for $\vec p_T \neq 0$, it will not be possible to "erase" $\mathcal{M}$ from Eq. (15) in the same cavalier spirit in which we erased it from Eq. (16) to obtain Eq. (17). Singularity variables should be of particular practical relevance in problems with more than one unknown mass or unobservable particle, for which the labor of making templates for all possibilities may be out of the question. There, at least at the discovery stage, "clever" variables may be useful to zoom kinematically to the relevant mass ranges before a full analysis is contemplated, as discussed in [2].

VI. THE QUEST FOR AN OPTIMAL VARIABLE
It is instructive to consider a trivial example with one visible variable, $l$, and a single invisible one, $x$, constrained by the "Euclidean phase space" equation

$$\Phi \equiv x^2 + l^2 - M^2 = 0. \qquad (19)$$

This apparently arbitrary instance actually corresponds to an imaginable process, that of a particle decaying into an invisible one, X, and a visible one that happens to be at rest. The longitudinal momentum of X is $x$ and its transverse one, $l$, is measured via the usual transverse balance. $M$ is a combination of the masses involved [8].
The value of the unknown quantity $M$ in Eq. (19) is encoded in the $l$-distribution. The Jacobian matrix is $D = \partial\Phi/\partial x = 2x$. The constraint that its rank be reduced is $x = 0$, resulting in the SCs $l = \pm M$. For a given "observed" $l$, there are two points P in Φ. Their nearest singularity is the point S, as illustrated in Fig. 3. Following Kim's method [7], we obtain for the SV the $\Sigma_K$ of Eq. (20), proportional to the squared (angular or geodesic) P-to-S distance measured on the Φ surface. In a less trivial case, the resulting SV would have been the same distance on the quadratic approximation to Φ around S.
There is nothing sacred about the elegant result of Eq. (20). There are other SVs that (up to an overall normalization) coincide with $u$ to second order. Three examples, illustrated in Fig. 3, are:
• (1) The distance between P and the hyperplane, H, tangent to Φ at S (the dotted vertical line, in this case). This distance is the horizontal arrow.
• (2) The P to H distance along the normal direction to Φ at P : the slanted arrow.
• (3) The square of the length of the vertical arrow.
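In the notation of Eq. (20) and normalized so that they coincide with $\Sigma_K$ to $O(u^2)$, these SVs are those of Eqs. (21) to (23). Note that $\Sigma_1$ is the 2D analog of the singularity condition used as a SV, as in Eq. (18). That is to say, it is equivalent to the transverse mass distribution. Is any of these SVs in Eqs. (20) to (23) "the best" in some useful sense? To answer, consider the distributions of the numerical values $\sigma$ of the various $\Sigma_i$ functions, for fixed $M$ (a zero width resonance):

$$H_i(\sigma, M, \mathcal{M}) \propto \int dx\, dl\; \delta[\Phi(x, l, M)]\;\delta[\sigma - \Sigma_i(l, \mathcal{M})], \qquad (24)$$

with $\mathcal{M}$ the implicit trial mass on which the SV depends. Recalling Eq. (19), and in particle physics language, $dx\, dl\, \delta(\Phi)$ is the phase space and $H_i$ is the distribution of the $\Sigma_i$ values. Monte Carlo generated "diagonal" histograms, $H_i(\sigma, \tilde M, \tilde M)$, in which the mass of the SV equals the generated one, would be the templates for various trial choices of $\tilde M$.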
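The four distances of this toy example can be written down explicitly once a parametrization is chosen. The sketch below is one consistent reading, not a transcription of Eqs. (20) to (23): it assumes that $u$ is the angle between P and S seen from the center of the circle of radius $M$, and normalizes each variable so that it behaves as $M u^2/2$ for small $u$. The function names are hypothetical.

```python
import math

def angle_to_singularity(l, M):
    """Angle u between the point P (with |l| <= M) and the nearest singularity S."""
    if abs(l) > M:
        raise ValueError("|l| > M: point outside the projected phase space")
    return math.acos(abs(l) / M)

def sigma_K(l, M):
    """Half the squared arc length from P to S, divided by M (assumed normalization)."""
    u = angle_to_singularity(l, M)
    return 0.5 * M * u * u

def sigma_1(l, M):
    """Horizontal distance from P to the tangent line H at S."""
    return M - abs(l)

def sigma_2(l, M):
    """Distance from P to H along the normal to the circle at P (requires l != 0)."""
    return M * M / abs(l) - M

def sigma_3(l, M):
    """Squared vertical distance x^2 = M^2 - l^2, divided by 2M."""
    return (M * M - l * l) / (2.0 * M)
```

With these conventions the variables differ only at $O(u^4)$: for $u \ll 1$ they are indistinguishable, while far from the singularity `sigma_2` grows fastest, since the normal at P meets H farther and farther away as $|l| \to 0$.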
In the four cases of Eqs. (20) to (23), with the notation $\rho \equiv \mathcal{M}/M$ for the ratio of the implicit trial mass $\mathcal{M}$ of the SV to the mass $M$, and normalized to unit integral in the allowed range of the corresponding $\sigma$, the distributions follow in closed form. In the simple case at hand, one need not refer to "nondiagonal" histograms $H_i(\sigma, M, \mathcal{M})$, which involve the implicit variable $\mathcal{M} \neq M$. In more blind searches with several unknown masses this may no longer be the case.
Moreover, the nondiagonal histograms provide one way to ascertain the "goodness" of their SV. To quantify the amount by which the distribution of a given SV is sensitive to the difference between a "true" mass $M = \mathcal{M}$ (with $\mathcal{M}$ the implicit trial mass of the SV) and a variation thereof, $M = \mathcal{M} + \Delta M$, define the "statistical squared derivative", $\hat\chi^2$, and its integral [9]:

$$\hat\chi_i^2(\sigma) \equiv \frac{1}{H_i}\left(\frac{\partial H_i}{\partial M}\right)^2, \qquad D_i \equiv \int d\sigma\; \hat\chi_i^2(\sigma). \qquad (26)$$

The notation reflects the parentage of $\hat\chi^2$ with the usual $\chi^2$ measure; it is also the square of the geometrical mean between ordinary and logarithmic derivatives. "Statistical" reflects the fact that $\hat\chi^2(\sigma)$ is a local measure of a variation relative to the one expected from a standard deviation of 1σ size. In this hypothetical case with sharply defined cuts in $\sigma$, $\hat\chi^2$ is singular at $\sigma = 0$. Regularizing the singularity with a cut $\sigma > \sigma_0 > 0$ we obtain Eqs. (27).

The singularities of the different $H_i$ are all $\propto 1/\sqrt{\sigma}$ and have been equally normalized by construction (and for a fair comparison). The sensitivity to the value of $M$ is maximal close to the singularity. This sensitivity puts the SVs of Eqs. (20) to (23) in the "goodness" order dictated by the second term in brackets in Eqs. (27). The fully "orthogonal" SV $\Sigma_2$ is the contest's winner. The usual transverse mass distribution ($\Sigma_1$ in this simplification) does not fare well.

So far there seems to be no compelling reason not to have made the above variable-comparing analysis with $\mathcal{M} = M$ for starters. But in a more realistic case $M$ would stand for the central value of a distribution of nonzero natural width, while $\mathcal{M}$ is just an auxiliary quantity introduced for analysis purposes.
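The "statistical squared derivative" can be evaluated numerically for any distribution that depends on a mass parameter. The sketch below assumes the definition $\hat\chi^2 = (\partial H/\partial M)^2/H$, which is the product of the ordinary and logarithmic derivatives and is the integrand of the Fisher information. The unit-width Gaussian is only a smooth stand-in used to check the code against a known answer, not one of the $H_i$ of the text, and all names are hypothetical.

```python
import math

def squared_derivative(H, sigma, M, dM=1e-4):
    """(dH/dM)^2 / H at fixed sigma, with dH/dM from a central finite difference."""
    dH = (H(sigma, M + dM) - H(sigma, M - dM)) / (2.0 * dM)
    return dH * dH / H(sigma, M)

def integrated_sensitivity(H, M, lo, hi, n=20000):
    """Midpoint-rule integral of the squared derivative over lo < sigma < hi."""
    step = (hi - lo) / n
    return step * sum(
        squared_derivative(H, lo + (k + 0.5) * step, M) for k in range(n)
    )

def gaussian(sigma, M):
    """Stand-in distribution: unit-width Gaussian centred at M."""
    return math.exp(-0.5 * (sigma - M) ** 2) / math.sqrt(2.0 * math.pi)
```

For the Gaussian stand-in the integral is 1, the familiar Fisher information on the mean of a unit-variance Gaussian. For a distribution with a $1/\sqrt{\sigma}$ edge the integrand is not integrable at $\sigma = 0$, so the lower limit `lo` must play the role of the cut $\sigma_0 > 0$ mentioned in the text.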
To illustrate the above, and to convey the numerical meaning of Eqs. (27), substitute the sharp definition of $M$ in Eqs. (19,24) by the one corresponding to a resonance of mass $M$ and width $\Gamma$. This corresponds to "spreading" the circle of Fig. 3 and "scanning" it with circles of varying (but sharply defined) trial mass $\mathcal{M}$, with the help of different "Σ" scanners.
Results for the distributions for Kim's variable and the orthogonal SV are shown in the upper panel of Fig. 4. The lower panel shows their $\hat\chi_i^2(\sigma)$ around the $\sigma = 0$ singular point, the domain in which the $H_i$ distributions are most sensitive to the unknown $M$. The figures are drawn for $\mathcal{M} = M = 1$, $\Gamma = 0.3$, showing how the orthogonal $\Sigma_2$ is better than $\Sigma_K$. However, the difference is not large and, for a narrow resonance (or one whose width is masked by detector effects), it would be negligible, as the relative differences close to $\sigma = 0$ between the $\hat\chi_i^2(\sigma)$ of the various SVs diminish linearly as $\Gamma/M \to 0$. The $D_i$ integrals of Eq. (26) over their complete respective kinematical domains are numerically similar, apparently demonstrating that, in toto, all variables are statistically equivalent. In practice this is not the case. The signal-to-noise ratios of the distributions are increasingly unfavorable as one moves away from the $\sigma_i \sim 0$ neighborhood of the signal's peak.
We have proven that $\Sigma_2$ is better than the others, but not that it is the best. Its optimality, however, appears to be intuitively obvious. The phase space Φ of Eq. (19) simply scales as $M$ changes. The optimal SV ought to maximize the dependence on $M$ at every point in phase space. This dependence is maximal in the direction orthogonal to Φ. The variable $\Sigma_2$ measures a distance to the nearest singularity, in that preferred direction.
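The "spreading" of the circle can be mimicked with a short Monte Carlo. This is a sketch under stated assumptions: the resonance is taken to be a non-relativistic (Cauchy) Lorentzian of full width Γ in the mass itself, which may differ from the line shape actually used for the figures, and events are distributed uniformly in angle on each circle, as implied by the phase space $dx\, dl\, \delta(x^2 + l^2 - M^2)$. Function names and seeds are hypothetical.

```python
import math
import random

def sample_lorentzian_mass(M, Gamma, rng):
    """Draw a mass from a Cauchy distribution of median M and full width Gamma."""
    return M + 0.5 * Gamma * math.tan(math.pi * (rng.random() - 0.5))

def smeared_l_sample(M, Gamma, n, seed=0):
    """Visible coordinate l for n events on circles of Lorentzian-distributed radius."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        m = sample_lorentzian_mass(M, Gamma, rng)
        if m <= 0.0:
            continue                      # discard unphysical negative radii
        phi = rng.uniform(0.0, 2.0 * math.pi)
        sample.append(m * math.cos(phi))
    return sample
```

Feeding such a sample to any of the Σ functions with a sharply defined trial mass produces histograms in which the edge at $\sigma = 0$ is smeared, with events spilling beyond $|l| = M$.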

VII. INDUCED SINGULARITIES
Let us return to the case of single-W production and model the simplified $\vec p_T = 0$ instance as stated in the ending paragraph of §II, that is, to leading order. We use the quark and antiquark parton distribution functions of [10] at an LHC energy of $\sqrt{s} = 7$ TeV and apply the cuts $|l_T| > 10$ GeV and $|\eta| < 2.3$ to the charged lepton. We ignore the difference between $W^+$ and $W^-$ production. We choose to present results for the distribution of the values, $\sigma$, of the function of Eq. (30), which differs from Eq. (16) by a factor $4\, l_T^4$. This does not affect the arguments to follow. Moreover, in conjunction with the transverse mass ($4\, l_T^2$) distribution, the use of Eq. (16) or of Eq. (30) is equivalent.
A heedless use of Eq. (30) results in an interesting surprise, illustrated in the top panel of Fig. 5. The histogram has two peaks, one of them significantly above the expected singularity at $\sigma = 0$. The peaks fuse as one lets the W have its rather narrow width, $\Gamma/M \simeq 0.02$, as illustrated in the lower panel of Fig. 5. Still, the fused peak is not just the expected singularity at the origin of the SV and the issue calls for understanding.
Consider restricting the phase space of Eq. (7) and Fig. 1 to its slices at fixed longitudinal momentum of the W, $W_3 = x_3 + l_3$, shown in these plots as (green) ellipses (in practice this can only be done at a monochromatic $e\,\nu_e$ collider). The distribution $H(\sigma, M, \mathcal{M}, W_3)$, with $\mathcal{M}$ the implicit trial mass of the SV, is shown in the upper panel of Fig. 6, for $\mathcal{M} = M = 1$, $W_3 = 2$. It has two singularities besides the one expected at $\sigma = 0$.
The origin of the singularities is clarified in the lower panel of Fig. 6, where the curve is the phase space $\Phi(l_3, \sigma)$, again for $M = 1$, $W_3 = 2$. A uniform distribution of events along $\Phi(l_3, \sigma)$, projected on the $\sigma$ axis, has three accumulation points at the projections of the vertical tangents. The one at the edge is the expected $\sigma = 0$ singularity; the other two are induced singularities. In these $M_W = 1$ units, for $W_3 < 1$ there is no induced singularity, for $W_3 = 1$ there is one and for $W_3 > 1$ there are two. One induced singularity survives the integration over the $W_3$ distribution, as shown in Fig. 5.
The source of the induced singularities is the specific form of the SV in Eq. (30), or of the formal SV of Eq. (16), which results in a fixed-$W_3$ phase space whose surface curvature is not everywhere of the same sign. The induced singularities are not endpoints, but are event accumulation points for the same reason as the endpoints, i.e. the tangent manifold to the phase space at their locations contains invisible directions. In a process with just one mass scale to disentangle, the complications we just discussed are a lesser problem. In a process with more than one mass scale, they are a putative source of confusion. The fully orthogonal SV $\Sigma_2$ of Eq. (22) does not result in induced singularities.

VIII. RESULTS
For the single-W case at hand, consider the "fully orthogonal" variable akin to $\Sigma_2$ in Eq. (22). We call it $\Sigma_A$ and discuss it first in the $\vec p_T = 0$ instance. Its geometrical interpretation is depicted in Fig. 2; $\Sigma_A$ is a measure of the length of the arrow, which is orthogonal to the phase space at a point P with coordinates $(l_T, l_3, x_3)$, the normal at P being that of Eq. (31). The length, $\Sigma_A$, of the orthogonal segment joining P with a point in the plane tangent to the singularity satisfies Eq. (32) and is given more explicitly in Eq. (33), with $x_3$ as in Eq. (10). For each $(l_T, l_3)$ pair (an event) there are two equal-probability solutions, the two roots of the equation. In generating events we chose the ± sign in Eq. (10) at random.

We show in Fig. 7 the $\vec p_T = 0$ results for the $M_T^2$ and $\Sigma_A$ distributions. All three graphs are generated for a peak mass of the W, $M = 1$. As shown in the bottom panel, for an implicit trial mass $\mathcal{M} \neq M$ the peak of the distribution shifts away from $\sigma_A = 0$, becoming wider and, for $\mathcal{M} < M$, double peaked: for this "bad" choice there is an induced singularity, even for the optimal SV. Naturally, the histograms with $\mathcal{M} \neq M$ are not statistically independent from the $\mathcal{M} = M$ one. While they may be used to "focus" on the correct choice of $\mathcal{M}$, the extraction of information on the W boson mass would ultimately hinge on a set of templates for trial values of the mass close to its currently measured value.
The value of x 3 is not always real. When the value of l 2 T chosen by the Lorentzian distribution of physical (or MC generated) values of M W is such that 4 l 2 T > M 2 , x 3 involves the square root of a negative number. There is nothing pathological about these events. The way to "recover" them is to set: In the middle Fig. (7), for example, the recovered events are those at σ 2 < 0.

IX. CORRELATIONS
It is clear that the transverse mass (or its equivalent $\Sigma_T$ of Eq. (18)) and the SV of Eq. (33) are highly correlated. They both vanish at the singularity as $M - 2\,|l_T|$. To illustrate the point, define the variable $\Sigma_t$ of Eq. (35), which has the same mass dimensionality as $\Sigma_A$ and, close to the singularity, carries the same information as $\Sigma_T$. The double histogram $dN/d\Sigma_A\, d\Sigma_t$, shown in Fig. 8, illustrates the expected correlation. Naturally, correlations between observables constitute a weakness of their ensemble, to which we shall come back in the conclusions. Suffice it to say here that in the "signal only" case at hand, there is only one mass scale to extract from the data: the correlations are unavoidable.

X. THE GENERAL CASE
In Figs. 1 and 2 we have profited from the fact that the $\vec p_T = 0$ phase space of Eq. (9) is a function of $l_T^2$ to plot the phase space for negative and positive $l_T$. For $\vec p_T \neq 0$ this is no longer possible. Let $l_T$ and $p_T$ be the moduli of the corresponding vectors and $\theta$ be the angle between them. The general case phase space is then that of Eq. (36), for which the generalization of the $\vec p_T = 0$ result of Eq. (10) is Eq. (37). The statistically optimal $\Sigma_A$ is computed exactly as in the previous section, with the result of Eq. (38), where $n_1$ is computed as in Eq. (31) in terms of the phase space function of Eq. (36); the fully explicit form is Eq. (39). Some examples of the general phase space surface are given in Fig. 9.
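The two neutrino solutions in the general case can be obtained by the same elimination as for $p_T = 0$. The sketch below is a derivation made here under the massless-lepton kinematics of the text, not a transcription of Eqs. (36) and (37): with $\vec x_T = -(\vec l_T + \vec p_T)$ and $A \equiv M^2/2 + \vec l_T\cdot\vec x_T$, the mass-shell conditions give a quadratic equation in $x_3$. Function names are hypothetical.

```python
import math

def x3_solutions_general(l_T, l_3, p_T, theta, M):
    """Neutrino x_3 roots for moduli l_T, p_T and angle theta between the two vectors."""
    lp = l_T * p_T * math.cos(theta)             # l_T . p_T
    xT_sq = l_T ** 2 + p_T ** 2 + 2.0 * lp       # |x_T|^2 from transverse balance
    A = 0.5 * M ** 2 - l_T ** 2 - lp             # M^2/2 + l_T . x_T
    disc = A * A - l_T ** 2 * xT_sq
    if A <= 0.0 or disc < 0.0:
        raise ValueError("no real physical solution for this event and mass")
    l_0 = math.hypot(l_T, l_3)
    root = l_0 * math.sqrt(disc)
    return ((A * l_3 + root) / l_T ** 2, (A * l_3 - root) / l_T ** 2)

def invariant_mass_sq(l_T, l_3, p_T, theta, x_3):
    """(l + x)^2 for a given x_3, used to check the roots."""
    lp = l_T * p_T * math.cos(theta)
    xT_sq = l_T ** 2 + p_T ** 2 + 2.0 * lp
    l_0 = math.hypot(l_T, l_3)
    x_0 = math.sqrt(xT_sq + x_3 ** 2)
    return 2.0 * (l_0 * x_0 + l_T ** 2 + lp - l_3 * x_3)
```

The discriminant vanishes when $A = |\vec l_T||\vec x_T|$, that is when $M^2$ equals the transverse mass squared, so the edge of the projected phase space is again the transverse-mass condition.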

XI. CONCLUSIONS AND OUTLOOK
We have studied in detail the phase space of the simplest interesting hadron collider process involving an unobservable particle and only one mass to be determined. Naturally, the crucial ingredients are the phase space projections onto the observable momenta, their limits, and the distances of actual events from these limits.
The edge of the projected phase space is given by the formal singularity condition, Eq. (12), which can be re-expressed as a function of the observable momenta, Eq. (14), and coincides with the consuetudinary transverse mass function, Eq. (1).
The "singularity variables" are various measures of the distance of an actual event to the nearest edge singularity. We have determined in §VI the measure for which a SV is statistically optimal, which we called the "statistical squared derivative" and which turns out to be well known to statisticians as the "Fisher information" [9]. The actual result ought to have been obvious for starters: the optimal variable, $\Sigma_A$ in Eqs. (33,39), is orthogonal to the phase space at all points and is thereby most sensitive to the unknown mass, which determines the overall scale of momenta.
Somewhat unexpectedly, singularity variables other than the optimal one develop fake singularities away from the edge singularity at $\sigma = 0$; see Fig. 5, top. The W's natural width suffices to merge the edge and fake singularities, resulting in a peak at $\sigma > 0$; see Fig. 5, bottom. This is a potential complication in their use as tools to determine the unknown mass(es).
Contrary to the SCs, the SVs depend on longitudinal momenta. In the case of single-W production, whether or not they may add significant precision to a measurement of the W mass depends on the prior level of understanding of the relevant pdfs [5], a question that we have not tried to investigate. It may well turn out, contrariwise, that the optimal SV, with a value of M determined by the transverse observables, is a good tool to constrain the pdfs.
The SVs contain the SC as a factor. This makes them "weak", in that they are highly correlated to the information contained in the SC, as discussed in §IX. The use of an implicit trial mass (as in Fig. 7) is an efficient way to "focus" on the relevant mass scale, particularly for cases with more than one unknown mass [7]. But it does not add to the precision with which the mass(es) may be measured.
Whether or not the various and rather negative conclusions of the previous two paragraphs apply to cases wherein more than one particle decays into invisible ones is a question that we plan to discuss in subsequent work. The answer requires a detailed study of the relevant phase space, akin to the one in this note.