Reconstruction of Higgs bosons in the di-tau channel via 3-prong decay

We propose a method for reconstructing the mass of a particle, such as the Higgs boson, decaying into a pair of tau leptons, of which one subsequently undergoes a 3-prong decay. The kinematics is solved using information from the visible decay products, the missing transverse momentum, and the 3-prong tau decay vertex, with the detector resolution taken into account using a likelihood method. The method is shown to give good discrimination between a 125 GeV Higgs boson signal and the dominant backgrounds, such as Z decays to tau tau and W plus jets production. As a result, we find an improvement, compared to existing methods for this channel, in the discovery potential, as well as in measurements of the Higgs boson mass and production cross section times branching ratio.

To be explicit, imagine a hypothetical limit in which we have a signal (the Higgs) that consists of a narrow peak in invariant mass, a background (the Z) that is also a narrow peak, but centred elsewhere, and additional backgrounds (like W plus jets and QCD) that are approximately flat in invariant mass. The signals and backgrounds are otherwise roughly indistinguishable in their dynamics. Now, if the observables that we have available are uncorrelated with the invariant mass, then we are essentially reduced to counting events in order to try to discover the signal and our ability to do so is greatly limited by the total statistics available. We are, moreover, completely at the mercy of systematic uncertainties in the overall background normalization, which we have no way to measure in data. Even if we were able to make a discovery in this way, we could at most make one measurement of the signal properties (its overall size) and here too we would be exposed to the systematic uncertainty in the background normalization.
Conversely, if we find a way to reconstruct, more or less, the invariant mass, the first benefit is that we are no longer limited by the overall statistics, but rather by the number of signal and background events in a region of invariant mass of our choosing (near 125 GeV being the obvious choice for h → τ τ ). Moreover, we now have a clean separation between the signal and background and indeed between the different backgrounds themselves. This opens up the possibility of using the extra information to constrain the uncertainties on the background yields and shapes via data-driven techniques. Indeed, a simple sideband analysis would suffice, in which the Z background is measured in a 'control' region near 90 GeV and the other backgrounds are measured in a control region away from the peaks near 90 GeV and 125 GeV. Finally, independent measurements of the signal mass and cross-section times branching ratio become possible.
Needless to say, the real situation is rather more complicated for h → τ τ at the LHC, with the current performance falling somewhere between the two extremes of perfect mass reconstruction and no mass reconstruction at all. Nevertheless, the basic principle remains the same: the more we are able to separate the signal and the different background components from each other, the less we shall find ourselves at the mercy of statistical and systematic uncertainties.
So, how could we reconstruct something like the τ τ invariant mass? Several approximate methods or observables have previously been suggested in the literature (see, for example, [6][7][8][9][10][11]). Some of these suffer from being rather poorly correlated with the invariant mass (some provide, for example, only an upper or lower bound on it), while others suffer from the fact that they turn out to be ill-defined for a significant fraction of events, with a consequent loss of statistics. As examples, the collinear approximation used in [10] fails for one in three events, whereas the observable used in [6,11] does not exist for a similar fraction of events.
Here we wish to propose yet another method, which differs significantly in that we focus on the subset of events in which a τ lepton undergoes a 3-prong decay. This implies an immediate disadvantage in the form of a reduced number of signal events overall for a given integrated luminosity: a τ lepton has a branching ratio of 15% for 3-prong decays (of which 9.3% are to π^- π^+ π^- ν_τ and 4.6% are to π^- π^+ π^- π^0 ν_τ) [12], meaning that only 28% of di-τ events feature at least one 3-prong decay. However, the hope is that this disadvantage is more than compensated by the advantages.
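The 28% figure follows from simple counting: with a 15% 3-prong branching fraction per τ, the chance that at least one of the two τ leptons in a di-τ event decays to three prongs is 1 − 0.85². As a quick check (our own snippet, not part of the original analysis):

```python
br_3prong = 0.15  # 3-prong branching fraction per tau lepton [12]

# probability that at least one of the two taus in a di-tau event is 3-prong
frac_at_least_one = 1.0 - (1.0 - br_3prong) ** 2
print(round(frac_at_least_one, 4))  # 0.2775, i.e. about 28%
```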
These advantages all stem from the fact that the presence of a 3-prong decay allows us to reconstruct the τ τ invariant mass, if the invariant mass of the neutrino or neutrinos from the other τ decay is known. Thus, for hadronic decays of the other τ , we can fully reconstruct events (up to a discrete ambiguity and in the absence of detector mismeasurements, both of which we shall deal with below); for leptonic decays of the other τ , we are able to partially reconstruct events. As a result we hope to benefit from a reduced exposure to statistical and systematic uncertainties as argued above.
The extra kinematic information needed to reconstruct the event comes from the location of the secondary (3-prong τ-decay) vertex: if one can measure with reasonable accuracy the impact parameter of each of the three charged tracks, defined as the shortest distance between the track and the primary vertex, then the intersection of the three tracks gives the location of the secondary vertex. 1 Now, the location of the secondary vertex tells us the direction of the τ momentum; the mass-shell constraint for the τ then allows us to reconstruct the magnitude of the τ momentum, up to a possible two-fold ambiguity. Given the measured missing transverse momentum, we are then able to reconstruct the momentum of the other τ (up to a further possible two-fold ambiguity), provided we know the invariant mass of the neutrino(s) produced in the other τ decay.
The reconstruction process just described can only be expected to work if things are well measured. For example, if they are not, we may end up with no real solutions to the kinematic constraints. We account for this by defining an ad hoc likelihood function in which we convolve the observed quantities with a function parameterizing the detector response. The maximum of this likelihood function is an event observable (albeit one with an obscure definition) and it is this observable that we propose to use for signal discrimination. The fact that we invoke a likelihood function also allows us to deal with the unknown invariant mass of the two neutrinos produced in a leptonic τ decay: we marginalize with respect to the unknown invariant mass, including the matrix element for the τ decay.
Yet another advantage of focussing on 3-prong decays is that the fake backgrounds (coming from, e.g. W + jets and QCD) will be reduced, as jets and other leptons are presumably less likely to fake a 3-prong decay (with a reconstructed secondary vertex) than they are to fake a generic hadronic tau or leptonic tau decay. 2

The outline of the paper is as follows. In the next Section, we describe the algebraic details of the reconstruction procedure. In Section 3, we present the results of our numerical simulations and in Section 4, we draw our conclusions.

The method
As described in the introduction, events are reconstructed using the decay vertex information. If one of the τ leptons decays to a 3-prong hadronic system and a τ neutrino, then the displacement r from the primary interaction point to the τ decay vertex should be measurable with useful precision. 3

Let us begin by considering the limit of perfect detector resolution. Denoting the energy, momentum and mass of the 3-prong decay products by E_j, p_j and m_j, respectively, and the angle between r and p_j by θ, the momentum of that τ lepton can be reconstructed, with a twofold ambiguity, as p_τ = p_τ r/|r|, where (neglecting the neutrino mass)

  p_τ = \frac{(m_τ^2 + m_j^2) |p_j| \cos θ \pm E_j \sqrt{(m_τ^2 - m_j^2)^2 - 4 m_τ^2 |p_j|^2 \sin^2 θ}}{2 (E_j^2 - |p_j|^2 \cos^2 θ)} .   (2.1)

The other τ lepton may decay either hadronically or leptonically, into a visible system j′ (a hadronic jet or a charged lepton) and an invisible system i′ (a τ neutrino or a pair of neutrinos). The transverse momentum of the invisible system is found from the missing transverse momentum, /p_T, and the reconstructed momentum of the first τ via

  p_{i′T} = /p_T - (p_τ - p_j)_T .   (2.2)

Given the invariant mass of the invisible system, m_{i′} = m_ν = 0 for a hadronic and m_{i′} = m_{νν} ≥ 0 for a leptonic decay, one can then solve for the invisible longitudinal momentum, again with a twofold ambiguity:

  p_{i′z} = \frac{A p_{j′z} \pm E_{j′} \sqrt{A^2 - m_{i′T}^2 m_{j′T}^2}}{m_{j′T}^2} ,   (2.3)

where

  A = \frac{m_τ^2 - m_{j′}^2 - m_{i′}^2}{2} + p_{i′T} · p_{j′T} ,   m_{i′T}^2 = m_{i′}^2 + |p_{i′T}|^2 ,   m_{j′T}^2 = E_{j′}^2 - p_{j′z}^2 .   (2.4)

The momentum of the second τ can now be reconstructed as p_{τ′} = p_{i′} + p_{j′}, and hence the invariant mass of the τ τ system, m_{ττ}, is determined, up to a fourfold ambiguity.

Now consider a real detector and let q = (r, E_j, p_j, p_{j′}, /p_T) correspond to the measured quantities. These do not coincide with their true values in an event, which we now denote by q̄, but rather are shifted by amounts depending on the detector resolution, which we describe by a response function, f(q, q̄).
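To make the reconstruction concrete, the following sketch (our own illustration, not the authors' code; function and variable names are ours, and the detector is taken to be perfect) solves the two mass-shell constraints numerically, returning the two-fold ambiguous solutions described above:

```python
import math

M_TAU = 1.777  # GeV

def tau_momentum_solutions(E_j, p_j, cos_theta, m_j, m_tau=M_TAU):
    """Two-fold solution for |p_tau| from the tau mass-shell constraint,
    given the 3-prong system (energy E_j, momentum magnitude p_j, mass m_j)
    and the angle theta between its momentum and the tau flight direction r."""
    sin2 = 1.0 - cos_theta ** 2
    disc = (m_tau**2 - m_j**2) ** 2 - 4.0 * m_tau**2 * p_j**2 * sin2
    if disc < 0.0:
        return []  # no real solution: a mismeasured (or fake) event
    denom = 2.0 * (E_j**2 - p_j**2 * cos_theta**2)
    a = (m_tau**2 + m_j**2) * p_j * cos_theta
    sols = [(a + E_j * math.sqrt(disc)) / denom,
            (a - E_j * math.sqrt(disc)) / denom]
    return [s for s in sols if s > 0.0]  # drop unphysical negative roots

def invisible_pz_solutions(pix, piy, m_i, E_jp, pjx, pjy, pjz, m_jp,
                           m_tau=M_TAU):
    """Two-fold solution for the longitudinal momentum of the invisible
    system i' accompanying the second tau's visible system j'."""
    A = 0.5 * (m_tau**2 - m_jp**2 - m_i**2) + pix * pjx + piy * pjy
    miT2 = m_i**2 + pix**2 + piy**2
    mjT2 = E_jp**2 - pjz**2
    disc = A**2 - miT2 * mjT2
    if disc < 0.0:
        return []
    return [(A * pjz + E_jp * math.sqrt(disc)) / mjT2,
            (A * pjz - E_jp * math.sqrt(disc)) / mjT2]
```

In each case the true momentum appears among the returned solutions when the inputs are consistent with an on-shell τ; mismeasured inputs can yield an empty list, which is the failure mode the likelihood treatment is designed to handle.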
Then the likelihood, as a function of the true invariant mass m̄_{ττ}, for an event with measured quantities q, may be written as

  L(m̄_{ττ}; q) = ∫ dq̄ f(q, q̄) M(q̄) δ(m̄_{ττ} - m_{ττ}(q̄)) ,   (2.5)

where M(q̄) is the matrix-element squared for the decay and m_{ττ}(q̄) is the invariant τ τ mass reconstructed from the true quantities q̄ according to the recipe described above.
Here M(q̄) should also include the Jacobian factor relating the final-state phase space to the quantities q̄. We find, in most cases, that including these effects gives, at best, a marginal improvement in the mass resolution. Indeed, some effects (such as the exponential distribution of the τ-decay lifetimes) lead to large fluctuations in the likelihood integrand and hence to large errors in the numerical integration, worsening the mass resolution. Thus we do not include these effects, in general.
There is, however, one such effect which we do include. In the case of leptonic decay of the second τ, the matrix element also depends on the momenta of the two invisible neutrinos, and the right-hand side of eq. (2.5) should include an integration over their phase space, weighted by the expected distribution of the νν invariant mass. This is conveniently expressed as P(m^2_{νν}) dΦ_{νν}, with

  dΦ_{νν} = \frac{dm^2_{νν}}{2π} \frac{dΩ^*}{32π^2} ,

where dΩ* is the element of solid angle in the νν centre-of-mass frame, and

  P(m^2_{νν}) = \frac{2}{m_τ^8} (m_τ^2 - m^2_{νν})^2 (m_τ^2 + 2 m^2_{νν}) .

At each phase-space point, the value of m_νν is then used, with this weight, for the reconstruction of the decay.
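For illustration, this νν invariant-mass distribution is easy to sample by rejection: in the scaled variable u = m²_νν/m²_τ the density is proportional to (1 − u)²(1 + 2u), which is monotonically decreasing and bounded by 1 on [0, 1]. A sketch of ours (not the authors' implementation):

```python
import random

M_TAU = 1.777  # GeV

def sample_m2nunu(rng, m_tau=M_TAU):
    """Sample m^2(nunu) in leptonic tau decay by rejection sampling.
    The density is proportional to (m_tau^2 - m2)^2 (m_tau^2 + 2 m2) on
    [0, m_tau^2]; in u = m2/m_tau^2 this is (1-u)^2 (1+2u) <= 1."""
    while True:
        u = rng.random()                      # uniform proposal on [0, 1]
        if rng.random() < (1.0 - u) ** 2 * (1.0 + 2.0 * u):
            return u * m_tau**2               # accepted sample

rng = random.Random(42)
samples = [sample_m2nunu(rng) for _ in range(20000)]
mean_u = sum(samples) / len(samples) / M_TAU**2  # analytic mean of u is 0.3
```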
In the integral over the delta function in eq. (2.5), we include all real solutions to the equation m̄_{ττ} = m_{ττ}(q̄). In contrast to [9], complex solutions should be discarded here, since they cannot correspond to true values for genuine τ τ resonance events. Note, however, that measured values q that would correspond to complex values of m_{ττ} lying close to the real axis if reconstructed directly, which correspond to real solutions shifted slightly by detector resolution, will be included in the integration at neighbouring values of q̄. For fake backgrounds, we often find that no nearby values of q̄ lead to real solutions, allowing the event to be rejected.
We perform the integrations in eq. (2.5) by a Monte Carlo method similar to that adopted in [14], generating a large number of points q̄ distributed around each measured point q according to a smearing function f(q, q̄) deduced from detector simulations. The jet masses are generated according to certain probabilities that we describe in the Appendix. Each real solution m̄_{ττ} = m_{ττ}(q̄) is entered into a histogram with the corresponding weight. Because the Monte Carlo method generates only a finite number of points, all histogram bins are given a small positive offset, to avoid multiplications by zero.
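The structure of this Monte Carlo integration can be sketched with a one-observable toy (entirely our own construction, with unit weights and a made-up solution map; the real analysis smears the full set q̄ and weights each solution appropriately):

```python
import math
import random

def likelihood_histogram(q_meas, sigma, edges, m_of_qbar,
                         n_points=50000, seed=1, offset=1e-6):
    """Monte-Carlo estimate of the event likelihood, binned in m.
    q_meas: measured quantity; sigma: Gaussian resolution; edges: uniform
    bin edges; m_of_qbar: returns the (possibly empty) list of real
    solutions m(qbar)."""
    rng = random.Random(seed)
    hist = [offset] * (len(edges) - 1)    # small offset avoids zero bins
    width = edges[1] - edges[0]           # uniform bin width assumed
    for _ in range(n_points):
        qbar = rng.gauss(q_meas, sigma)   # smear according to resolution
        for m in m_of_qbar(qbar):         # points with no real solution
            if edges[0] <= m < edges[-1]: # simply contribute nothing
                hist[int((m - edges[0]) / width)] += 1.0
    return hist

# toy solution map: m(qbar) = sqrt(qbar) exists only for qbar >= 0
def toy_solutions(qbar):
    return [math.sqrt(qbar)] if qbar >= 0.0 else []

edges = [0.25 * i for i in range(17)]     # 16 bins on [0, 4]
hist = likelihood_histogram(4.0, 2.0, edges, toy_solutions)
```

Smeared points with qbar < 0 are rejected, mimicking the way badly measured (or fake) events populate the histogram only weakly.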
Since our likelihood function does not encode the matrix element in its entirety, we cannot expect to be able to make statistical inferences directly from it in the usual way. Doing so might lead, for example, to us wrongly rejecting the Standard Model Higgs boson hypothesis, or obtaining a biased measurement of its mass. Instead, we use our ad hoc likelihood function to define an event observable in the following way: for each event, we extract the smallest value of m̄_{ττ} that gives a local maximum of the event likelihood and define this to be the event value of the observable m SV . 4 Our simulations suggest that this observable gives distributions for Higgs and Z boson event samples whose peak locations provide a good determination of the corresponding boson mass, with small tails. In any case, the presence of such effects can be mitigated by comparing experimental distributions of observables to template Monte-Carlo samples, as we do in simulations of pseudo-experiments below.
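The extraction of m SV from the binned likelihood can then be sketched as a scan from low mass upwards, taking the centre of the first bin that beats both neighbours (our illustrative reading of the definition; names are ours):

```python
def m_sv(bin_centers, likelihood):
    """Return the centre of the lowest-mass bin that is a local maximum
    of the binned likelihood (the observable m_SV). Edge bins qualify if
    they beat their single neighbour; a flat histogram returns None."""
    n = len(likelihood)
    for i in range(n):
        left = likelihood[i - 1] if i > 0 else float("-inf")
        right = likelihood[i + 1] if i < n - 1 else float("-inf")
        if likelihood[i] > left and likelihood[i] > right:
            return bin_centers[i]
    return None

# bimodal example: local maxima at 95 and 125; m_SV picks the lower one
centers = [85, 95, 105, 115, 125, 135]
values = [1.0, 4.0, 2.0, 3.0, 6.0, 1.5]
```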

Simulations and results
Our study is based on a sample from the Herwig++ event generator [15], version 2.5.2 [16], and corresponds to an integrated luminosity of 20 fb −1 , which is very similar to what has been achieved at the LHC by the end of 2012.
As regards the signal, recent Standard Model predictions for Higgs production at the LHC may be found in Ref. [17]. For a Higgs mass of 125 GeV, at a collision energy of 8 TeV, the expected total cross section is 19.52 pb in the gluon fusion channel and 1.58 pb in the vector boson fusion channel, with a probable uncertainty of around 10%. The predicted SM branching ratio for τ τ decay, also given in Ref. [17], is 6.4%. Higgs production followed by τ τ decay thus corresponds to a cross section of 1.34 pb.
For the Z background, CMS [18] reports a flavour-averaged prediction of σ(pp → Z → ll) = 1.13 nb and a measurement of 1.12 nb for the 8 TeV LHC. We take σ(pp → Z → τ τ ) = 1.13 nb. For the W + j background, we use σ(pp → W j) = 2.15 nb, taken from Herwig++. All of our simulations are carried out at the parton level, without showering or hadronization effects, apart from hadronic tau decays which we simulate using Herwig++ [19]. The detector response was modelled as follows.
Firstly, we assume identification efficiencies of 0.4 and 0.3 for the 1- and 3-prong hadronic taus, respectively [20]. The lepton identification efficiency is assumed to be 0.9 in our analysis. To estimate the number of W + j events in which the jet mimics a 1- or 3-prong hadronic tau, we use fake rates of 0.01 and 0.002 for 1- and 3-prong taus, respectively. These numbers are obtained from the simulation of W + j events [21].
Secondly, we parameterize detector mismeasurements by smearing the energy of jets and leptons with σ(E_j)/E_j = 0.5/√(E_j/GeV). For the missing transverse momentum, we smear each component with σ_x = σ_y = 5 GeV.
For the τ decay vertex, the Monte-Carlo truth position is smeared by Gaussian distributions of widths 0.613 ± 0.008 mm and 10.5 ± 0.2 µm, in the directions parallel and perpendicular to the 3-prong tau-jet, respectively. For jets that fake taus, we take the truth vertex position to be zero and then smear as above.
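The vertex smearing can be sketched as follows: decompose the truth displacement into components parallel and perpendicular to the tau-jet axis and apply the two Gaussian widths (our own sketch; the function and variable names are ours):

```python
import math
import random

def smear_vertex(r, jet_dir, sigma_par, sigma_perp, rng):
    """Smear a 3-vector vertex displacement r with resolution sigma_par
    along the tau-jet direction and sigma_perp in the two transverse
    directions. jet_dir need not be normalized."""
    norm = math.sqrt(sum(c * c for c in jet_dir))
    u = [c / norm for c in jet_dir]  # unit vector along the jet axis
    # build two unit vectors orthogonal to u (and to each other)
    a = [1.0, 0.0, 0.0] if abs(u[0]) < 0.9 else [0.0, 1.0, 0.0]
    v = [u[1] * a[2] - u[2] * a[1],
         u[2] * a[0] - u[0] * a[2],
         u[0] * a[1] - u[1] * a[0]]
    vn = math.sqrt(sum(c * c for c in v))
    v = [c / vn for c in v]
    w = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    d_par = rng.gauss(0.0, sigma_par)    # e.g. 0.613 mm in the paper
    d1 = rng.gauss(0.0, sigma_perp)      # e.g. 10.5 micron in the paper
    d2 = rng.gauss(0.0, sigma_perp)
    return [r[i] + d_par * u[i] + d1 * v[i] + d2 * w[i] for i in range(3)]
```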
We then apply event selection cuts to purify the signal. For the hadron-lepton mode, we further impose a cut on ∆φ, the azimuthal difference between the lepton and the missing transverse momentum, to reduce the background involving W s. The cross section for each process/channel, after taking account of the efficiencies of (mis)identification and the selection cuts, is listed in Table 1. We use the 3-prong-hadron (-lepton) channel for m SV and the hadron-hadron (-lepton) channel for m vis and m eff .

In Figure 1 we compare the signal and background distributions of our variable m SV with the existing variables m eff and m vis . m vis is simply the invariant mass of the visible products of both τ decays, while m eff includes the missing transverse momentum (an explicit definition may be found in [10]). We show results for the lepton-hadron and hadron-hadron modes separately. The better separation between signal and backgrounds that we expected to obtain using m SV is clear to see in the Figure. Indeed, m vis has distributions for the Higgs signal and Z background which are strongly peaked, but the two peaks sit on top of each other. m eff incorporates extra information in the form of the missing transverse momentum, and slightly increases the separation between the maxima of the Higgs and Z boson peaks, but at the cost of introducing large tails (from the smearing of the missing transverse momentum measurement). As a result, the Higgs signal is easily hidden in the large tail of the Z background. Moreover, the shapes of all components become similar, making discrimination difficult when the overall normalizations are uncertain. In contrast, m SV provides a good separation between the narrow Higgs and Z boson peaks, which appear at the true mass values. These peaks are, furthermore, very different in shape from the continuum W + jet background.
To see how well each variable can reconstruct the resonance, we list the peak location of each variable's distribution and the input mass of the resonance in Tables 2 and 3 in the hadron-hadron and hadron-lepton modes, respectively. The pure Z → τ τ and h → τ τ samples with several Higgs masses are used. The peak location is calculated as the weighted average of the three highest bins. The error is estimated by using 10 independent samples. The tables show clearly that m SV reconstructs the masses of resonances very well compared to the other variables. Although this correlation is not the basis of our method for mass determination, it helps to separate the signal from the background.
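The peak estimator can be sketched in a few lines (our own illustration of the "weighted average of the three highest bins" recipe):

```python
def peak_location(bin_centers, counts):
    """Peak position as the count-weighted average of the three highest bins."""
    top3 = sorted(range(len(counts)), key=lambda i: counts[i])[-3:]
    total = sum(counts[i] for i in top3)
    return sum(bin_centers[i] * counts[i] for i in top3) / total
```

For example, bins centred at 80, 90, 100, 110 GeV with counts 1, 10, 8, 2 give a peak estimate of (90·10 + 100·8 + 110·2)/20 = 96 GeV.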
Another remark is that the S/B with respect to the W + jet background is better for m SV , as can be seen from Fig. 1. This is because a fraction of the W + jet background events do not produce real solutions. The fact that the tau mass is much smaller than the typical momentum scale of the reconstructed objects (jets and leptons) implies that the neutrino momenta are inferred to be very close to those objects (see e.g. Eq. (2.1)). However, the direction of this inferred momentum tends to conflict with the direction of the observed missing transverse momentum, since the neutrino from the W decay is generally not collimated with respect to those objects, leading to no real solution.
Nevertheless, it is also apparent that the statistics available using m SV are lower than for the other variables, even in the region of maximum signal. We need, therefore, to make a quantitative comparison of the three variables. To do so, we generate distributions of them (for m h = 125 GeV signal and backgrounds) in ten pseudo-experiments, each corresponding to an integrated luminosity of 20 fb −1 of 8 TeV LHC data. Each pseudo-experiment is then compared to template model distributions with different values of m h and with different normalization factors f h , f W , and f Z for the Higgs signal and W + j and Z backgrounds, respectively. (The values f h,W,Z = 1 correspond to the leading order Monte-Carlo prediction.) Allowing the model distribution normalizations to float in this way allows us not only to take into account some of the most important systematic effects 5 (arising from the uncertainties in the luminosity, the Monte-Carlo predictions, and data-driven extrapolations), but also provides a means to measure the cross-section times branching ratio for Higgs production followed by decay to τ τ .
We make a cut on the observable of interest itself, so as to maximize its discovery potential. To assess the discovery potential, we then compute the difference in log-likelihood between models with and without a Higgs signal. In the model without a signal, we maximize the log-likelihood with respect to f Z and f W , whereas in the model with a signal, we additionally maximize with respect to m h and f h . In Fig. 2, we show −2 times the difference in log-likelihood for the three variables. The centre of each bar shows the mean value over trials, while the width of each bar gives the root-mean-square deviation over trials.
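The fit underlying these pseudo-experiments can be sketched as a binned Poisson log-likelihood maximized over the floating normalizations; here is a toy with a single background template and a simple grid scan (our own construction, with made-up template numbers; a real analysis would float both backgrounds and use a proper minimizer):

```python
import math

def log_likelihood(n_obs, mu):
    """Binned Poisson log-likelihood, dropping the n! constant.
    All expected yields mu are positive on the grids used below."""
    return sum(n * math.log(m) - m for n, m in zip(n_obs, mu) if m > 0)

def fit(n_obs, signal, background, f_h_grid, f_b_grid):
    """Maximize the log-likelihood over the grid of normalizations."""
    best = (float("-inf"), None, None)
    for fh in f_h_grid:
        for fb in f_b_grid:
            mu = [fh * s + fb * b for s, b in zip(signal, background)]
            ll = log_likelihood(n_obs, mu)
            if ll > best[0]:
                best = (ll, fh, fb)
    return best

signal = [5.0, 20.0, 5.0]            # toy Higgs template (arbitrary numbers)
background = [100.0, 100.0, 100.0]   # toy flat background template
data = [s + b for s, b in zip(signal, background)]  # Asimov-like pseudo-data

fb_grid = [0.5 + 0.01 * i for i in range(101)]  # background norm 0.5 .. 1.5
fh_grid = [0.05 * i for i in range(41)]         # signal norm 0.0 .. 2.0
ll_sb, fh_hat, _ = fit(data, signal, background, fh_grid, fb_grid)
ll_b, _, _ = fit(data, signal, background, [0.0], fb_grid)
q = 2.0 * (ll_sb - ll_b)  # -2 * (ln L_bkg-only - ln L_sig+bkg) >= 0
```

With the pseudo-data built to match the signal-plus-background model exactly, the fit recovers f_h = 1 and a positive test statistic q, illustrating why floating normalizations penalize variables whose signal and background shapes are too similar.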
We observe a significant improvement using m SV in the hadron-hadron channel, along with a more modest improvement (compared to m vis ) in the leptonic channel. The different performances of m vis in the hadron-lepton and hadron-hadron modes can be understood from the distributions in Fig. 1. Unlike the hadron-lepton mode, the m vis distribution in the hadron-hadron mode shows that both the Z and W + j backgrounds, as well as the signal, have falling shapes in the signal region m vis ∈ [75, 110] GeV. This suggests that, in the hadron-hadron mode, fitting the signal-plus-background distribution with only the Z and W + j backgrounds by floating f Z and f W is easier than in the hadron-lepton mode. This feature can be seen explicitly in Fig. 3, which shows the same likelihoods as in Fig. 2 but with the f s fixed at 1. Floating the f s brings a significant degradation for m vis in the hadron-hadron mode.
The absolute values of the discovery significance are exaggerated, since we have neglected sub-dominant backgrounds and many uncertainties, but the relative performance of the different variables should be meaningful.
We have estimated the size of the W + j background using the reported tau fake rates in W + j events. The tau fake rate is generally dependent on the tau identification algorithm and the jet p T . To check the robustness of our result, in Fig. 4 we show the same discovery potential plots as Fig. 2 but containing a W + j background twice as large as the one in Fig. 2. The discovery potentials are degraded slightly but the qualitative features are unchanged.
To assess the expected resolution in the Higgs mass measurement, we first show, in Fig. 5, the log-likelihood as a function of the Higgs mass hypothesis. Since the pseudo-experiments and the templates are prepared in the same way, we cannot estimate any biases that might occur when each variable is used to determine the mass from real data. However, for each variable, we can estimate the precision of the mass measurement from a quadratic fit to the log-likelihood. The resulting fractional uncertainties are shown in Table 4 and are seen to be much smaller for m SV than for the other two observables, in both hadron-hadron and lepton-hadron modes. This is easily explained by the fact that the Higgs signal peak is sharpest for m SV . For the measurement of the product of the production cross section and the branching ratio for the decay, the procedure is exactly analogous, except that now we vary f h , after maximizing with respect to m h . The resulting fit and fractional uncertainties are shown in Fig. 6 and Table 5, respectively. Again, there is always an improvement when using m SV . For m vis , the difference in resolution between the hadron-lepton and hadron-hadron modes can be blamed on the different shapes of the W + j background in these modes, as we have discussed earlier in connection with the different discovery potentials for m vis between these modes.
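The precision estimate from the quadratic fit can be illustrated as follows: if −2 Δln L ≈ (m − m̂)²/σ² near the minimum, the curvature obtained from three equally spaced scan points gives σ directly (our own sketch, not the paper's fitting code):

```python
import math

def sigma_from_parabola(m_points, nll_points):
    """Estimate the 1-sigma uncertainty from three equally spaced points
    (m, -2 ln L) around the minimum: -2 dlnL ~ (m - mhat)^2 / sigma^2,
    so sigma = 1/sqrt(a) with a the fitted quadratic coefficient."""
    (m1, m2, m3), (y1, y2, y3) = m_points, nll_points
    h = m2 - m1
    assert abs((m3 - m2) - h) < 1e-9, "points must be equally spaced"
    a = (y1 - 2.0 * y2 + y3) / (2.0 * h * h)  # second-difference curvature
    return 1.0 / math.sqrt(a)

# toy scan: -2 ln L = ((m - 125)/2)^2, i.e. mhat = 125 GeV, sigma = 2 GeV
ms = [123.0, 125.0, 127.0]
nll = [((m - 125.0) / 2.0) ** 2 for m in ms]
```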
          m SV   m vis   m eff
had-lep   0.33   0.44    0.65
had-had   0.32   0.93    0.73

Table 5: Fractional resolution for the measurement of the production cross section times branching ratio for pp → h → τ τ , normalized to the leading order prediction, using m SV compared to m eff and m vis , for the lepton-hadron and hadron-hadron modes, for an integrated luminosity of 20 fb −1 at the 8 TeV LHC.

Conclusions
As shown in Figs. 2-6 and Tables 4-5, our simulations suggest that a significant improvement in discovery potential, Higgs boson mass resolution, and measurement of production cross section times branching ratio can be obtained by focussing on 3-prong τ decays. The performance of m SV is roughly comparable irrespective of whether the other τ lepton decays leptonically or hadronically, leading to greater gains in the hadron-hadron channel, where the other observables perform more poorly. One possible reason for this is that the detector resolution is invariably poorer in this channel, in which final-state leptons are replaced by jets, so the performance of m vis and m eff is degraded. However, these events are also fully reconstructible using the vertex information, in the absence of smearing. As a result, the likelihood that defines the observable m SV is able to correct for the extra smearing to a certain extent, by insisting that the unsmeared quantities consistently reconstruct the event.
The gains are greatest for the mass measurement, which is perhaps not surprising since our method provides a means to reconstruct the mass whilst partially correcting for the uncertainties that are introduced by the detector resolution.
The results of our simulations are encouraging, but they should be taken with a pinch of salt. The simulations themselves are rudimentary, and we have only performed a comparison with the basic variables m eff and m vis . Both collaborations now employ more sophisticated likelihood-based analyses. Unfortunately, the full details of these have not been made public, so it is difficult for us to make a fair comparison. CMS do say that their likelihood method gives a Higgs mass resolution of around 21%, compared to 24% using m vis [22].
We have also not made a full study of the backgrounds, of which many are relevant for this search. However, the two backgrounds we did consider are very different in their nature (one being a genuine, resonant background and the other being a fake, continuum background). We hope therefore, that the other backgrounds will be similar to one or other of these in their behaviour. The recent CMS results suggest, moreover, that our two backgrounds are the dominant ones in the signal region in most of the τ τ sub-channels (along with pure jets, which we expect to be similar to W +jets).
Finally, we have only considered the most obvious systematic effect, namely the uncertainty associated with the normalization of the signal and backgrounds. Nevertheless, our qualitative argument, that a better mass reconstruction gives a better separation between the signal and the different backgrounds, means that many systematic uncertainties are expected to be reduced.
We hope, at least, that our qualitative arguments and quantitative simulations are enough to convince the collaborations to explore the suitability of this method. Even if it then turns out that a significant improvement is not obtained using our method alone, we remark that an overall improvement can still be expected if it is combined with existing approaches. Our method is complementary and, as is clear from Fig. 7, the observable we extract is not strongly correlated with the existing variables m eff and m vis . It thus provides independent information and may be used to increase the significance of searches in the τ τ channel. Our simulations apply to the current 8 TeV run and our hope is that application of our method to data being produced now will allow us to clear up the mystery of the observed deficit of h → τ τ decays. Nevertheless, we expect our method to become even more relevant in the subsequent stages of the LHC programme. For one thing, the method is limited by statistics, but gains in reducing systematic uncertainties. It is the latter that will limit our ultimate ability to make precision measurements of the Higgs sector at the LHC. What is more, both ATLAS and CMS are planning upgrades or replacements of their vertex detectors, with an improvement of a factor of a few expected in the vertex resolution. The associated improvement in the τ τ mass reconstruction using our method should reduce the uncertainties even further.

A. The treatment of jet masses
As we discussed in Section 2, in evaluating the likelihood function (2.5), we generate a large number of points q̄ in which the jet energy is smeared, along with the other observables, according to the detector resolution. The magnitude of the jet momentum is then calculated as |p_j| = √(E_j^2 - m_j^2), where m_j is the jet mass. We generate the jet mass according to

  P_1pr(m_j) = R_{τ→πν} δ(m_j - m_π) + (1 - R_{τ→πν}) G(m_j; m_ρ, σ_ρ)

for the 1-prong tau-jet and

  P_3pr(m_j) = G(m_j; m_a, σ_a)

for the 3-prong tau-jet, where R_{τ→πν} is BR(τ → πν) divided by the branching ratio of inclusive 1-prong tau decays and G(x; µ, σ) is the Gaussian probability distribution with mean value µ and standard deviation σ. We took m_ρ = 775 MeV, σ_ρ = 90 MeV, m_a = 1230 MeV and σ_a = 160 MeV in our analysis. We have checked that the treatment of the jet mass does affect the frequency of finding real solutions in the likelihood evaluation, but does not much affect the overall shape of the likelihood function. Thus, m SV is rather robust against variation of the jet mass. To demonstrate this, we show two m SV distributions for the h → τ τ events in the hadron-hadron channel in Figure 8, where one uses the jet mass described above and the other uses massless jets, with P_1pr(m_j) = P_3pr(m_j) ∝ δ(m_j). As can be seen, the two distributions are very similar.
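A sketch of this jet-mass generation (our own illustration; R_{τ→πν} is left as a free parameter, and negative Gaussian samples are simply clipped to zero):

```python
import random

M_PI, M_RHO, S_RHO = 0.1396, 0.775, 0.090  # GeV (pi, rho parameters)
M_A, S_A = 1.230, 0.160                    # GeV (a1 resonance parameters)

def sample_mj_1prong(rng, r_pinu):
    """1-prong tau-jet mass: a delta function at m_pi with probability
    r_pinu = BR(tau->pi nu)/BR(1-prong), else a Gaussian around m_rho."""
    if rng.random() < r_pinu:
        return M_PI
    return max(0.0, rng.gauss(M_RHO, S_RHO))

def sample_mj_3prong(rng):
    """3-prong tau-jet mass: a Gaussian around the a1 mass."""
    return max(0.0, rng.gauss(M_A, S_A))

rng = random.Random(7)
m3 = [sample_mj_3prong(rng) for _ in range(20000)]
mean3 = sum(m3) / len(m3)  # should sit close to the a1 mass, 1.23 GeV
```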