A critical point in the distribution of lepton energies from the decay of a spin-1 resonance

We consider a spin-$1$ resonance produced with an arbitrary spectrum of velocities and decaying into a pair of massless leptons, and we study the probability density function of the energy of the leptons in the laboratory frame. A special case is represented by the production of $W$ bosons in proton-proton collisions, for which the energy of the charged lepton from the decaying $W$ can be measured with sufficient accuracy for a high-precision measurement of $M_W$. We find that half of the resonance mass is a special value of the lepton energy, since the probability density function at this point is in general not analytic for a narrow-width resonance. In particular, the higher-order derivatives of the density function are likely to develop singularities, such as cusps or poles. A finite width of the resonance restores the regularity, for example by smearing cusps and poles into local stationary points. The quest for such points offers a handle to estimate the resonance mass with much reduced dependence on the underlying production and decay dynamics of the resonance.


Introduction
The problem of estimating the mass M of a resonance that partially decays into undetectable particles often arises in collider experiments. For example, it occurs when some of the decay products of the resonance interact too weakly with the detector to produce a signal, or when they are measured with insufficient precision. If the kinematics of the collision event can be closed by other means, for example by using energy-momentum conservation, the problem has an obvious solution, otherwise it is under-constrained.
The loss of information due to the unobserved particles, which prevents M from being unambiguously determined on an event-by-event basis, can be statistically recovered if the dymanics of both production and decay of the resonance are known. When such a prior knowledge is available, the probability density function (p.d.f.) of the visible particle momenta {p }, denoted by σ −1 dσ/d{p }, can be computed by marginalizing the unobserved degrees of freedom. This marginal p.d.f. depends on the unknown resonance mass through the kinematics of the visible decay products. In general, the multi-dimensionality of the observable space makes the analytic calculation of this function of paramount complexity. The problem is then best tackled by the use of Monte Carlo (MC) simulations of the process of interest, resulting in a discrete set of MC templates σ −1 MC dσ MC (M i )/d{p } generated at different trial values of M . With these templates at hand, a numerical evaluation of the likelihood function of the data is possible, and the standard theory of likelihood-based estimators can be then used for estimating the unknown mass [1]. By construction, such an approach is model-dependent, as it relies on theoretical assumptions (in fact, the complete S-matrix for the process of interest) for relating σ −1 dσ/d{p } to M . There are indeed examples where model uncertainties represent the limiting factor to the experimental accuracy. The determination of the W boson mass at hadron colliders represents perhaps the most remarkable of such cases [2][3][4]. An alternative approach, which allows the aforementioned limitation to be partly overcome, consists in exploiting singularities in the phase-space of the visible observables [5], i.e. special points where the tangent plane to the phase-space manifold is aligned with one of the invisible particle directions. The position of such pointed features in the spectra of kinematic variables can be related to the unknown mass, or, more generally, to combination of masses when there is more than one resonance in the decay chain [6]. Besides being ideally independent from the details of the underlying dynamics, the main advantage of the phase-space singularity method is that a multi-dimensional problem is recast into a search for striking features, like sharp edges or cusps, on univariate distributions. A study of the phase-space singularity method in the context of the W boson mass measurement at hadron colliders has been documented in Ref. [7]. Not surprisingly, the optimal of such singularity variables is highly correlated with the usual transverse mass which, being a function of the transverse hadronic recoil, is affected by other well-known sources of experimental uncertainty [4].
Motivated by the need of reducing the model-dependence in the measurement of the W boson mass without having to rely on the hadronic recoil, we will concentrate hereafter on the special case of a spin-1 resonance that decays into a pair of massless leptons, of which one is assumed to be measured with high precision, whereas the other is undetected. It has been pointed out in Ref. [8] that a two-body decay kinematics of this type features an obvious, yet subtle, invariance under boosts. Indeed, the mass of the mother particle plays a special role in the distribution of energy E of the visible daughter particle. In particular, it can be proved that M/2 is a local maximum of the energy distribution σ −1 dσ/dE, if the mother particle is produced unpolarized. In this case, one can just measure M by locating the point in the observed energy spectrum featuring the largest density. An application of this technique in the context of the top-quark mass measurement has been documented in Ref. [9].
The argument leading to the identity argmax[dσ/dE] = M/2 relies on the assumption that the resonance is unpolarized. Instead, we would like to be as agnostic as possible with respect to the mechanism of production and decay of the resonance. In this spirit, we will study the mathematical properties of the p.d.f. of the lepton energy in full generality by deriving exact results in the approximation of a narrow-width resonance. Strictly speaking, any unstable resonance has a finite width Γ > 0. In practice, the latter has to be compared with the experimental resolution σ E on the visible particle energy, which sets the minimum granularity at which differential properties of the p.d.f. σ −1 dσ/dE can be defined. The case Γ/σ E 1, is then mathematically equivalent to treating the resonance in the narrow-width approximation. We will then validate the results against selected toy examples of production and decay dynamics. The results of this study motivate the usage of stationary points in the higher-order derivatives of the energy p.d.f., in particular of the second derivative, as an estimator of the resonance mass. Finally, we will study this method in the context of the W boson mass measurement at the LHC using a MC simulation of the reaction pp → W → µν µ in proton-proton (pp) collisions at a center-of-mass energy √ s = 13 TeV. Quantitative estimates of the statistical and of the dominant theoretical uncertainty affecting the newly proposed method of measurement are also provided.

Kinematics in the laboratory frame
Let E (E * ) be the lepton energy in the laboratory (center-of-mass) frame, and c * ≡ cos θ * the cosine of the polar angle in the center-of-mass frame with respect to the mother particle velocity β in the laboratory. We also define E 0 = M/2 and introduce the dimensionless parameters x = E/E 0 , y = E * /E 0 , and z = E/E * = x/y. The lepton energy in the laboratory is related to E * and c * via a Lorentz transformation that depends only on the boost factor γ = (1 − β 2 ) − 1 2 , with β = |β|, namely: The distribution of energies in the center-of-mass frame is assumed to be described by a Breit-Wigner function: where ∆ = Γ/2M . Since we are mostly concerned with narrow-width resonances, i.e. ∆ 1, we can safely neglect the fact that the function h in Eq. (2.2) should be truncated at y = 0 to prevent the center-of-mass energy from becoming negative. In fact, Eq. (2.2) coincides with the more correct relativistic Breit-Wigner distribution [10] only when y ≈ 1 (although it is somehow simpler for the calculations to use the non-relativistic version of Eq. (2.2), the results presented here do not depend on this assumption). Finally, we remark that this p.d.f. converges to the Dirac delta function in the limit ∆ → 0.
From Eq. (2.1), the domain of z is found to be: where the relation γ 2 β 2 = γ 2 −1 has been used. If γ = 1, Eq. (2.1) can be inverted yielding: which implies a linear relation between c * and E at a fixed value of γ and E * .
In the center-of-mass frame of a spin-1 resonance decaying to a pair of spin-1/2 particles, the cosine of the polar angle of the decaying lepton with respect to a given quantization axis is described by a p.d.f. of the form [11]: where the angular coefficients A 0,4 have been introduced as arbitrary dimensionless functions of the boost factor γ. The A 0 coefficient controls the fraction of longitudinal polarization (f 0 ) and satisfies the requirement 0 ≤ A 0 ≤ 2, whereas A 4 regulates the fractions of left (f L ) and right (f R ) transverse polarization. For a pure V − A interaction, the angular coefficients are related to the polarization fractions f 0,L,R , relative to the direction of flight of the resonance, by the linear relations: where the explicit dependence of the angular coefficients on γ has been omitted for simplicity. Multiplying both sides of Eq. (2.7) by the constant E 0 , we obtain: where I(·) = 1 if the argument is true, 0 otherwise. The p.d.f. of x can be now obtained by marginalizing both γ and y. We assume γ ∼ g(γ) independently of y, which is usually appropriate for a narrow-width resonance. Under this assumption, we can write: The p.d.f. g is positive-definite and normalized to unity: We first consider the case that g is an analytic function everywhere, in particular at γ = 1 (the alternative case will be discussed later). Under this assumption, it can be replaced by its Taylor series centered at γ = 1: where κ ≡ γ − 1 ≥ 0. Likewise, we assume that A 0,4 (γ) are analytic at γ = 1 such that: 0,4 κ + . . . . (2.11) We now move to study the behavior of f when x ≈ 1. To this purpose, we expand the right-hand side of Eq. (2.9) in terms of a small parameter , such that x = 1 + . In this limit, we have: where k is an integer.

The narrow-width approximation
We first consider the case of a narrow-width resonance, i.e. we set h(y) = δ(1 − y). After integrating-out y, the right-hand side of Eq. (2.9) can be rewritten symbolically as: where Pol 2 ( ; ·) stands for a second-order polynomial in . In Eq. (2.13), the integration region has been split into two disjoint intervals: [ 2 /2, δ], where the cut-off δ is sufficiently small that the approximations in Eq. (2.10)-(2.11) are valid to first order, and the complementary interval [δ, +∞]. The first integral provides the contribution inside a neighborhood of x = 1 from the phase-space region γ ≈ 1, i.e. when the decaying particle is almost at rest; the second integral accounts for the contribution stemming from larger boost values. By virtue of the spin-1 assumption, the integrand function within both integrals is a quadratic polynomial in , hence it has vanishing derivatives beyond the second order. We can now compute explicitly the first integral at the right-hand side of Eq. (2.13). After a straightforward integration, we get: where K δ are constants that depend only on the cut-off δ. By rearranging the various terms in Eq. (2.14), we finally obtain: where the constants A, . . . , H are independent of . There are terms in this expansion which are not analytic at = 0. They are are proportional to the constants: As a consequence of Eq. (2.15), the higher-order derivates of f can develop various types of singularity at x = 1: kinks or cusps (from | | terms), discontinuities (from sign( ) terms), delta functions (from the derivatives of the latter), and poles (from the derivative of the ln | | term). This non-regular behavior should not come as a surprise: even if g, A 0 , and A 4 are smooth functions, the transformation in Eq. (2.4) becomes singular in the limit γ → 1 + . When convoluted with a continuous spectrum of boosts, this primordial singularity is weighted by an infinitesimal cross section g(1)dx, but still percolates to the final p.d.f., in a way that depends on how the phase-space (γ, c * ) gets populated. We anticipate that the appearance of a singularity in a strict mathematical sense is a consequence of treating the resonance in the narrow-width approximation. Within this approximation, however, its existence is a robust result, as discussed later.
The nature of such singularity is, to some extent, akin to the phase-space singularity discussed in Ref. [5]. Indeed, for a fixed value γ > 1, the variable x has two wall singularities associated with the decay of the visible particle collinear or anti-collinear with the velocity of the resonance. These configurations correspond to edges of the phase-space. When γ = 1, a singularity of higher degree appears because the dimensionality of the phasespace shrinks from a line to a point. The singularity studied here has, however, a richer phenomenology compared to the algebraic singularity of Ref. [5], because it depends not just on the geometry of the phase-space manifold, but also on how the dynamics of production and decay distributes events across the phase-space.
It is interesting to consider some limiting cases of Eq. (2.16). As expected, an unpolarized resonance gives rise to a p.
implying that x = 1 is a local maximum of the density (in particular, it is a cusp if g (0) > 0 and a stationary point otherwise). This is in agreement with the result obtained in Ref. [8]. Whenever the boost p.d.f. has a vanishing amplitude in the neighborhood of γ = 1, i.e. g (k) = 0 for the first k derivatives, the coefficients in Eq. (2.16) are also vanishing. A special case is when there is a minimum momentum threshold on the production of the resonance, such that g(γ) ≡ 0 for γ ≤ γ thr. (in this case, all the derivatives at x = 1 are formally zero). The expansion of Eq. (2.15) thus contains only terms of order k , with k = 0, 1, 2: in the neighborhood of x = 1, the energy p.d.f. is proportional to a parabola. Equation (2.3) can be then used to express E 0 in terms of the lower (E − ) and upper (E + ) edges of the interval in which f (x) ∝ Pol 2 (x), as: We now briefly consider the possibility that either of the functions in the integrand is not analytic at γ = 1, such that the Taylor expansion of Eq. (2.10)-(2.11) are not defined. To fix the ideas, we consider the case g(γ) ∼ (γ − 1) α , with α > 0, for which γ = 1 is a cusp point. As we will see later, this choice of p.d.f. finds at least one remarkable physical application. In this case, Eq. (2.14) gets modified by the appearance of terms like | | 2α+m , where m = 1, 2, 3, . . . is an integer, which, for arbitrary values of α, gives rise to the same phenomenology of non-regularity on f .

Finite-width effects
The regularity of laboratory energy p.d.f. is restored by integrating over a continuous spectrum of center-of-mass energies. Indeed, in the limit γ → 1 + , the laboratory frame coincides with the center-of-mass frame, i.e. f (x) = h(x), which is a smooth function. In the case of a finite width, Eq. (2.13) applies with the replacement: where is again defined as = x − 1. Equation (2.13) thus becomes: Consider for example a term like | | in the expansion of f . Upon integration over y, its first derivative calculated at x = 1 gives: hence the new minimum/maximum of f gets displaced from x = 1 by an amount of O(∆ 2 ). Notice that if the k-th order derivative has a kink such that | lim x→1 the integration over y smears this singularity into a stationary point whose position depends not just on ∆, but also on g and A 0,4 , which determine the left and right slopes of f (k) . In the latter case, nothing can be said a priori about E 0 , unless that it must be "close" to the stationary point: for the case of a symmetric kink in the second-order derivative, such a displacement would be of O(Γ 2 /M ). We remark that an ultimate calibration of the estimator would be needed if the displacement were in excess of the desired accuracy on M .

An explicit example
We now discuss the case of W boson production at the LHC which allows us to specialize some of the generic formulas derived before.
By using the fact that dγ 2 = d(|q| 2 /M 2 ), we can write with ζ = q T /|q|. Hence, g(1) = 0. Furthermore, since g ≈ γ 2 − 1, the boost spectrum is not analytic at γ = 1. The finiteness of the double-differential cross section d 2 σ/dq 2 T dy 0 , where y is the rapidity of the W boson, is a general result that arises from the small transverse momentum behavior of cross sections in hard processes [12]: Similarly, the differential cross section dσ/dy at y = 0 is finite because it is proportional to the product of the partonic densities evaluated at s is the protonproton center-of-mass energy. By using a complete Monte Carlo simulation of this reaction, discussed in Sec. 4, we also find that the ∼ (γ − 1) 1 2 rise is limited to a tiny region of phase-space, typically γ − 1 5 × 10 −4 , corresponding to the region |q| 4 GeV, i.e. where the differential cross section in q T is rapidly growing. For γ values in excess of about 10 −3 , the boost p.d.f. is well approximated by a power law of the form g(γ) ∼ (γ − p 0 ) −p 1 , with p 0 ≈ 0.9 and p 1 ≈ 0.8.
The angular coefficients are dimensionless functions of |q|/M and y encoding the average polarization of W bosons produced in hadron collisions as a function of the W boson kinematics [11]. They are simultaneously determined by the partonic density functions (PDFs) of the proton and by emission of additional QCD radiation. Furthermore, they depend on the reference frame, i.e. they are not rotation-invariant. The coefficients A 0,4 of Eq. (2.5) are calculated in a particular rest frame of the W boson defined by a boost along the velocity β, which we will refer to as the helicity frame. For fixed and small values of γ, their values are determined by the average of A 0,4 over all momenta q defining the surface of a sphere of radius |q|, such that γ ≈ 1 + |q| 2 2M 2 : Similarly to Eq. (2.21), one can easily verify that all directions q/|q| are equally likely in the limit |q| → 0: where the right-hand side does not depend on the direction Ω. When |q| is small, the direction of the quark and antiquark in any rest frame of the W boson remain almost antiparallel. They both carry spin parallel or anti-parallel to their respective momentum. While averaging over the full solid angle, their directions, which are fixed in the laboratory frame, move isotropically in the helicity frame around the quantization axis defined by q itself. The net result must be a uniform distribution in c * , i.e. A (0) 0 = 2/3 and A (0) 4 = 0, or, using Eq. (2.6), f L = f R = f 0 . This is only true when |q| → 0. For small finite momenta, a tiny polarization is produced. The amount by which a polarization is built by the misalignment of the quark directions can be estimated looking at a particular configuration where y = 0 and |q| = q T . In this case, the center-of-mass frame is related to the Collins-Soper (CS) frame [13] by a rotation of π/2. From a MC simulation, we find that the longitudinal polarization in the CS frame is built at a pace of about The values of (f L , f R , f 0 ) in the helicity frame is thus perturbed by an amount of similar size from the values ( 1 4 , 1 4 , 1 2 ) in the limit |q| → 0. When |q| grows, large values of q T in Eq. (2.23) become increasingly unlikely and the cross section favors a longitudinal motion with |q z | q T . In this latter case, a net transverse polarization is built as a consequence of the PDF ratio q/q growing fast at large rapidities. Again using a MC simulation, we find an empirical slope dA 4 /dy ≈ ±0.3 for q T = 0, where the sign depends on the charge of the W boson. Given that dy = d ln γ, we also have A We notice that the same argument applies to the case of a proton-antiproton collider, for which W bosons produced almost at rest in the laboratory frame are preferentially polarized in the direction of the antiproton. When integrating over the full solid angle, however, the average polarization in the helicity frame vanishes.

Discussion
We now summarize the results obtained so far. When considering the two-body decay of a spin-1 resonance, the probability density function f describing the laboratory energy of any of the two daughter particles, assumed to be massless spin-1 2 particles, is in general not analytic at x = 1, when the narrow-width approximation for the resonance is made. In particular, the derivatives of f are likely to be non-derivable, discontinuous, or divergent at that point, depending on the boost factor p.d.f. and on the polarization of the resonance. The appearance of a local maximum of the density at x = 1 is in general a fortuitous occurrence. A pole at x = 1 in the first derivative is associated with the presence of a non-zero transverse polarization at rest. Conversely, cusps in the second derivative appear quite naturally as a result of terms of the form |x − 1| 3 in the expansion of f around x = 1, which do not compete with (x − 1) 3 terms from higher boost factors. This is a general result that only depends on the spin-1 assumption for the resonance. The condition for which a cusp is generated amounts to |G| > |F |, as defined in Eq. (2.15). If g (0) = 0, this is satisfied if A (0) 4 = 0 (otherwise the coefficient E in Eq. (2.16) would be non-zero, giving rise to a pole in f (1) ). If instead g (0) > 0 and A (0) 4 = 0, the condition for developing a cusp in f (2) is: which is satisfied if A 4 is a slowly varying function of γ. We remark that even if Eq. (2.25) were not satisfied, kinks at x = 1 for at least one among f (0) , f (1) , and f (2) will be present, so that a divergence in the higher-order derivatives will eventually show up.
The following search algorithm is then proposed. For simplicity, we assume the laboratory energy E to be normalized to a constant E 0 , playing the role of a trial mass. We define x = E/E 0 and set f (0) ≡ f for consistency of notation. Then: 4. if no such points exist, then compute f (k) , with k ≥ 3, and continue searching for a singularityx k+1 .
The mass estimator is then defined asM = 2x k E 0 . When a broad distribution of energies in the center-of-mass frame is accounted for, the analyticity of f is restored. In particular, poles and cusps are regularized into local stationary points. These points are in general displaced from x = 1 by an amount that vanishes in the limit ∆ → 0. Furthermore, since there may be a multiplicity of such stationary points, a prior on M will be in general needed to disambiguate among them and for an ultimate calibration of the estimator. The determination of the unknown resonance mass is then recast as a univariate optimization problem, in a way that decouples from the details of the underlying production and decay dynamics to the extent that the resonance width can be neglected.

Numerical examples
The predictions of Eq. (2.15) have been verified numerically for selected choices of the functions g, A 0 , and A 4 . The three following functional forms for the boost factor p.d.f. have been studied: . This function is analytic in γ = 1, and is chosen as the prototype of a p.d.f. with g (0) = 0.
The numerical values of the coefficients are somehow tuned on the empirical boost distribution for W bosons production at the LHC when γ − 1 is in excess of about 10 −3 , see Sec. 2.3.
• g sqrt (γ) ∝ (γ − 1) 1 2 . This function is chosen as the prototype of a p.d.f which is not analytic in γ = 1. In particular, it is finite for γ → 1 + , but its first derivative is infinite.
For sake of numerical precision, the integration over the boost factors is restricted to the range γ ∈ [1,3]. Both g exp and g pow are not integrable, so strictly speaking they cannot be interpreted as probability density functions. However, they can still provide a good approximation of physical densities for small values of γ. For all three functions, the following values for (A 0 , A 4 ) have been studied: The resulting probability density functions f (0) are shown in Fig. 1-3 together with their first f (1) and second f (2) derivatives. The latter are estimated from finite differences of f (0) over an equally-spaced mesh of x i values: where d is the mesh size. Figure 1 shows the results for g exp for each choice of the angular coefficients. Since g (0) = 0, we have C = E = H = 0, as for Eq. (2.16). Apart from case 2), where x = 1 is also a local maximum of f (0) , the first derivative does not vanish in general at x = 1. However, the presence of a term like | 3 | in Eq. (2.15) induces the presence of a cusp in the second derivative.
For g pow , which has g (0) > 0, additional sources of non-analyticity are present in Eq. (2.15), clearly visible in Figure 2. In case 1), the Taylor expansion of f (0) contains a term of the form E | | with E = 0, hence the second derivative receives a contribution from a step-function centered at x = 1. In case 2), C = 0 so that x = 1 is a cusp: the first and second derivatives are thus locally proportional to a step-function and a delta function, respectively. In case 3), H < 0, so that the first order diverges to +∞ like ln | | when → 0, whereas the second order derivative goes like 1/ . Case 4) is qualitatively similar to the first.
Whenever g or any of the two angular coefficients are not analytic at γ = 1, like for g sqrt , Eq. (2.15) does not apply. The general appearance of step functions and poles in x = 1 is however unchanged, as shown by Figure 3. In particular, a term of the form 2 ln | | stems from the last but one line of Eq. (2.13). The choice of g sqrt is, however, special since, for an unpolarized resonance, it provides an analytic p.d.f.:  In this last case, the mass estimator would be provided by argmax[f ].

W bosons at the LHC
A special case of the problem studied in Sec. 2 is represented by W bosons produced in hadron-hadron collisions and decaying into a lepton-neutrino pair. For the purpose of studying this particular process, a sample of proton-proton collision events at √ s = 13 TeV simulating the pp → W → µν µ reaction has been generated with NLO QCD accuracy using the MG5_aMC@NLO [14] event generator interfaced with Pythia8 for the parton shower [15]. The NNPDF3.0 [16] set is used to simulate the proton PDFs. A total of about 84 millions of events are generated, with a fraction of negative weights such that the effective number of events is reduced by roughly a factor of two compared to the case of unweighted events. Given that the cross section for W → µν µ production is about 20.5 nb at √ s = 13 TeV [17], the simulated sample used for this study has the same statistical power of a sample of collision events corresponding to 1.9 fb −1 of integrated luminosity.
The natural width of the W boson is Γ W ≈ 2.08 GeV [10], corresponding to a value ∆ ≈ 10 −2 in Eq. (2.2). This is not negligible on the scale of a high-precision measurement of M W , which targets a relative accuracy on the mass as low as 10 −4 [2][3][4]. Hence, an ultimate calibration of the estimator is required to meet this level of accuracy. In Sec. 2.3, it was found that the boost factor p.d.f. for W bosons produced in proton-proton collisions can be roughly approximated by a power law g ∼ (γ − 1) α : for small values of γ − 1, i.e. 5 × 10 −4 , we have α ≈ 0.5 and the W boson is almost unpolarized in the helicity frame; for higher boost values, α ≈ −0.8, and a net polarization is built, ultimately dominated by a particular transverse mode. From the numerical simulations of Fig. 2-3, we could thus expect x = 1 to be a local minimum of f (2) . Indeed, the rising edge of g populates only the region |x − 1| 5 × 10 −4 , where it provides a smooth function f (2) , similarly to the rightmost panel in the second row of Fig. 3. For larger boosts, f (0) should resemble more closely the plots in Fig. 2, featuring a deep minimum of f (2) at x = 1. The whole picture is then smeared by the finite width of the W boson. Figure 4 shows the binned density f (0) obtained from the simulated sample of events. The first and second derivatives are estimated bin-wise in the same fashion as Eq. (3.1). A deep minimum in the histogram of f (2) at x values close to unity is clearly visible. Interestingly, x = 1 is also close to be a global maximum of f (0) , a result that qualitatively recalls the last toy example in Fig. 2, where the boost p.d.f. and the angular coefficients were indeed tuned on the values extracted from the Monte Carlo simulation.

Detector acceptance and final-state radiation
For the case of W boson production and decay at the LHC, two effects break the mathematical hypotheses assumed to derive Eq. (2.8): the presence of detector acceptance cuts, which are unavoidable in experiments at hadron colliders, and the emission of final-state photon radiation (FSR) from the charged lepton. Both affect the center-of-mass dynamics, albeit in different ways as discussed below. When the detector coverage is incomplete, the harmonic polynomials that depend on φ * , which are themselves proportional to sin θ * and sin 2θ * [11], don't average exactly to zero in some regions of the phase-space, thus adding spurious terms to the c * expansion of the decay angle distribution. Furthermore, the detector acceptance cuts, being based only on the kinematics of the visible decay products, affect the lepton reconstruction efficiency differently depending on the kinematics of the W boson. The overall result is to modify the angular coefficients by boost-dependent efficiency factors ρ 0,4 (x | γ) in Eq. (2.9), which are in general non-trivial functions of x. Here, we will study the effect of selection cuts realistic for general-purpose experiments like ATLAS [18] and CMS [19], namely |η| ≤ 2.5 and p T ≥ 25 GeV, where η and p T are the muon pseudorapidity and transverse momentum, respectively. In the MC simulation we find these cuts to have an efficiency of about 77% for W + and 84% for W − for events with lepton energy E ≈ M W /2.
The emission of FSR by the charged lepton perturbs the center-of-mass dynamics. The overall effect of such perturbation can be thought of as the convolution of the original harmonic polynomials with a smearing kernel, which introduces infinite harmonics in c * . Furthermore, the emission of extra particles (photons and lepton-pairs) reduces the centerof-mass energy available for the muon and thus primarily affects the visible energy spectrum by an overall downward shift. This process is well-known [20] though, so that it could be in principle unfolded at the detector level to recover a pure QCD description of the final-state kinematics. For the purpose of studying this process, we will consider both an unrealistic scenario, where the charged leptons do not undergo photon radiation (pre-FSR leptons), and a realistic scenario where a QED-shower of the muons is simulated by the Pythia8 MC (bare leptons).

The search for a stationary point
We now consider the problem of finding the stationary points of the higher-order derivatives of f . The rightmost histogram in Fig. 4 clearly shows that a local minimum of f (2) is present at x ≈ 1. The estimator of such point is, however, not uniquely defined. We won't address here the problem of finding the statistically optimal of such estimators. Instead, we decide to define the estimator implicitly as the root of a conveniently chosen function of the data. To this purpose, we first approximate the density f with a polynomial function of degree D centered at x = 1: (4.1) The coefficients c n in Eq. (4.1) are determined from a fit to the simulated data by means of an analytic χ 2 method. We then define the stationary points of the i-th order derivative implicitly as the roots of the (i + 1)-th order derivative. The latter are determined numerically by using Halley's root-search method [21], a variant of the classical Newton method. Statistical uncertainties on the coefficients of the polynomial fit are propagated to the rootŝ x i by means of pseudo-data, resulting in 68% confidence level (CL) intervals. This approach has a twofold advantage: it regularizes the statistical bin-by-bin fluctuations by the use of smooth functions and it allows for an analytic evaluation of the derivatives at any point x.
The energy spectrum is provided as a histogram with 100 MeV large bins. The central value of each bin is normalized to the constant E W = M W /2 to yield the dimensionless variable x. The fit is performed in the interval E ∈ [36.2, 44.3] GeV, corresponding to invariant masses of the W bosons in a window of about ±4Γ W around M W . Such range is large enough to provide acceptable fits with D = 4, which is the minimum degree to define a unique root of the third derivativex 3 . We notice that this way of estimating the rootŝ x i is quite sensitive to border effects related to the choice of the fit range: since Eq. (4.1) is only a local approximation of the density, discrepancies between the true spectrum and f (0) at the edges of the fit range tend to pull more strongly the coefficients associated with the large powers of n, which in turn affect more strongly the roots of the higher order derivatives. The bias associated with the choice of fit range will be eventually reabsorbed as part of the calibration procedure discussed in the next section.
For the sake of comparison, the root of the first derivativex 1 , which corresponds to a local maximum of f (0) , is also studied. Positive and negative muon events are first considered separately. Since the two samples of events provide consistent results, they are ultimately combined to maximize the statistical accuracy of the analysis. The result of the fit to the simulated data is shown in Fig. 5, together with the first, second, and third derivative of the fitted polynomial function. As expected, the second derivative features a local minimum around x = 1 identified by the rootx 3 of the third derivative.

Calibration curve
The calibration of thex i estimator is determined by reweighting the same MC sample to different values of M W . The fit is then repeated for each mass-reweighted sample and new rootsx i are computed, resulting in a calibration curveM W =M W (x i ). Figure 6 shows such curves separately for muons in the full phase-space and for muons within the detector acceptance as defined in Sec. 4.1. The pointsx i are then interpolated through a linear regression. The solid area shows the 68% CL interval as obtained from the covariance matrix of the fit. The first, second, and third derivatives of the fitted function are also shown with their uncertainty bands.
The response ofx 3 to a change of M W is found to be linear to better than 1%. This fact is reassuring and confirms thatx 3 is indeed a good estimator of M W . For comparison, the root of the first derivativex 1 , and the mean value x µ in the same range of x values considered in the fit, are also reported as a function of M W . The former is found to have a good linear response but a larger offset compared tox 3 (800 MeV against 100 MeV). Instead, the mean value x µ is very mildly related to M W , which makes it a rather poor estimator of the mass. This is however mainly an artifact of considering a narrow range of x values: as illustrated by the first panel of Fig. 5, the function f (0) in the neighborhood of x = 1 is a concave function: a tiny shift δx of the peak position does not change the mean of the distribution to first-order in δx. It is also interesting to study the response of the three estimators to a restriction of the lepton phase-space. This is shown in the right panel of Fig. 6. Both x µ andx 1 are found to be significantly affected by acceptance cuts, i.e. their values change compared to the full-acceptance case by more than their statistical uncertainty. On the contrary,x 3 is more stable, changing by less than one standard deviation.  Finally, an identical analysis is repeated considering bare leptons instead of pre-FSR leptons. The results are shown in Fig. 7. Besides an overall shift of about 200 MeV, which can be ascribed to the loss of energy drained away by FSR [20], the linear response ofx 1 andx 3 to M W is found to be preserved. pass the event selection criteria to the extent that one of the muon is emitted with either soft p T or large values of |η|. In turn, this condition preferentially selects events where the intermediate Z/γ * boson is produced with a finite boost. As for Eq. (2.15) with g (k) = 0, this implies that the E spectrum of the selected muons in the neighborhood of M Z /2 must have a flat second-order derivative. Similarly, muons in top quark events come from the decay of boosted W bosons, since, in the rest frame the decaying t quark, the W boson recoils againts a b quark with |p * W | ≈ m t /2. For multi-jet production, these arguments do not hold and a detailed data-driven estimation of the functional form should be performed. However, we remark that the analysis discussed here is robust against changes in muon acceptance, as observed in Sec. 4.1. Since the multi-jet background is reducible by either tighter identification criteria on the muon or by stricter requirements on the missing energy or transverse mass in the event, we do expect room for optimization in case the functional form of this background were found to be measurable with only limited accuracy.

Outlook
A more refined analysis of the residual theoretical uncertainties would require the simulation of a much larger data sample and a careful treatment of other model effects, like nonperturbative physics, mixed QCD-QED corrections, etc. (see e.g. Ref. [22] for a recent review). Similarly, experimental uncertainties from the backgrounds, the bias due to the choice of a fixed-order polynomial to fit the data, the impact of the lepton energy scale uncertainty, etc., should be thoroughly assessed. This is beyond the scope of this work. The study presented here confirms that a stationary point in the second derivative of the lepton energy density is a good estimator of M W and that it is robust against changes of the underlying W boson production and decay dynamics, detector acceptance cuts, and the emission of photon radiation.

Conclusions
We have considered the two-fermion decay of a spin-1 resonance of mass M , and analyzed the lepton energy distribution in the laboratory frame in full generality, i.e. regardless of the underlying production and decay dynamics of the resonance. In particular, we have studied the analyticity of the probability density in the neighborhood of M/2. We find that the density at this point is not analytic for a narrow-width resonance. In particular, we have studied the conditions for which a singular point appears in the higher-order derivatives of the density, and found that the second derivative is likely to develop a cusp or a pole. Exact formulas have been derived under the assumption that the distribution of boost factors γ and the polarization of the resonance are described by regular functions of γ. The formulas have been qualitatively validated with toy examples of production and decay of a narrow-width resonance. When a finite width of the resonance is accounted for, the regularity is restored such that cusps or poles are smoothed into local stationary points potentially displaced from M/2. The size of such displacement depends on the width of the resonance, but partially also on the production dynamics, thus requiring an ultimate calibration. The quest for stationary points in the higher-order derivatives of the energy density function is thus advocated as a way to estimate M with possibly limited knowledge of the underlying production and decay dynamics of the resonance. A special case is represented by the production of W bosons in proton-proton collisions, which has been studied on a Monte Carlo simulation of this process, assuming LHC-like conditions on the proton beams. As expected, a stationary point in the second derivative is found close to M W /2. The robustness of this point as an estimator of M W has been studied by considering the effect of detector acceptance cuts, the emission of final-state radiation, changes of the perturbative calculation of scattering amplitudes for W production, the proton PDFs, and the input W boson mass. Interestingly, such a mass estimator features a good linearity, a small bias, and is rather resilient to changes in the lepton acceptance and in the modeling of the W boson production dynamics. An ultimate assessment of the residual model-dependence is left for future work.