Model-independent method for measuring the angular coefficients of $B^0 \to D^{*-} \tau^+ \nu_{\tau}$ decays

Reconstruction of the $B^0 \to D^{*-} \tau^+ \nu_{\tau}$ angular distribution is complicated by the strongly-biasing effect of losing the neutrino information from both the $B$ and $\tau$ decays. In this work, a novel method for making unbiased measurements of the angular coefficients while preserving the model independence of the angular technique is demonstrated. The twelve angular functions that describe the signal decay, in addition to background terms, are modelled in a multidimensional fit, using template probability density functions that encapsulate all resolution and acceptance effects. Sensitivities at the LHCb and Belle II experiments are estimated, and sources of systematic uncertainty are discussed, notably in the extrapolation to a measurement of $R(D^{*})$.

Complete information on the B 0 → D * − τ + ν τ decay kinematics is ultimately obtained from the full angular decay rate [30]  where the angles (θ D , θ L , χ) parameterise the spin-0 B 0 meson decay topology, and are defined in App. A. This expression involves a sum of twelve independent angular functions, each of which is multiplied by a coefficient I X (X ∈ {1c, 1s, 2c, 2s, 3, 4, 5, 6c, 6s, 7, 8, 9}) that encapsulates the dependence on the square of the dilepton invariant mass, q 2 , form factors, and the fundamental couplings. The angular distribution can reveal the influence of NP even if R(D * ) becomes fully compatible with the SM. Angular analysis is well established in the study of rare dimuon decays such as B 0 → K * 0 µ + µ − [31,32]. The principal advantage of the technique is that the coefficients contain all form factor dependence, so there is no experimental uncertainty due to a choice of form factor scheme. Combinations of the angular coefficients can also reduce dependence on the form factors in subsequent phenomenological interpretations. The difficulty that arises in applying angular analysis methods to semitauonic decays is the missing information due to the lost neutrinos in both the B and τ decays, which strongly sculpts the angular distribution and makes a parametric fit to data impossible.
In this paper, a novel approach is presented that uses a multidimensional template fit in the angular basis to measure the I X coefficients in a model-independent manner without statistical biases. The technique assumes and requires excellent agreement between data and simulated samples for the construction of the templates, which must describe all reconstruction, resolution and migration effects. In this case the fit reduces to a linear sum of twelve independent templates, preserving the inherent model independence of the angular method. Eleven I X parameters are measured (one is fixed by I X = 1), and the model dependence is confined to one overall signal yield. The expected resolutions and covariance matrix of the I X parameters are determined under realistic experimental conditions emulating both hadron collider and B-factory scenarios; assessment of the I X sensitivity to NP is beyond the scope of the work.
For this demonstration, the angular coefficients are measured integrated over all values of q 2 . This is not required and parallel angular fits in several q 2 bins could follow a similar procedure. NP may also be sought by measuring I X andĪ X (the CP conjugate) values separately with the same template fit by tagging according to the τ lepton charge.
It is also noted that the methods developed in this work are applicable to the light lepton B 0 → D * − l + ν l modes. These decays suffer from lower background levels, and superior angular resolution due to the stable charged lepton coming directly from a well-defined B decay vertex. A measurement of the I X coefficients of these modes is well motivated to validate form factor schemes and provide a null test of the SM.

Monte Carlo simulation
Monte Carlo (MC) signal samples of B 0 → D * − τ + ν τ decays are generated using the RapidSim package [33]. RapidSim is a fast MC generator for simulating heavy-quark hadron decays. It uses TGenPhaseSpace [34] to generate b-quark hadron decays and FONLL [35] to give the b quark the correct production kinematics for the Large Hadron Collider (LHC). Exclusive decays of the D * − meson decay to D 0 π − are generated, with the D 0 meson decaying to K + π − . The three-prong τ + → π + π + π −ν τ decay, rather than the more abundant muonic τ decay, is the focus of this study. This is because the presence of a τ decay vertex results in lower backgrounds, and with only two neutrinos in the final state, this mode has the best decay angle resolution.
Signal B 0 mesons are decayed using EvtGen [36]. The ISGW2 [37] model distributes B 0 decays according to Eq. (1.1), with SM values for each I X coefficient. The VSS model is used to model the vector D * − decay, and TAUOLA [38] (model number 5) produces the correct kinematic and invariant mass structure for the three-prong τ decay. To emulate the effects of detector resolution, the track momenta and decay vertex coordinates are smeared according to RapidSim LHCb resolution presets [33]. The missing momentum due to the presence of two final state neutrinos is modelled by ignoring both particles in any calculations of reconstructed quantities. Detector acceptance effects are modelled by restricting generated B 0 mesons to the momentum range [0, 100] GeV/c and the pseudorapidity range [1,6], which is similar to the geometrical acceptance of the LHCb detector.

Kinematic reconstruction
Due to the lost neutrinos in the final state and the absence of a constraint from the initial state, neither the τ nor B 0 momentum can be fully reconstructed at a hadron collider. The best calculation of the decay angles uses estimates of the τ and B 0 momentum that are  Table 1: Summary of τ and B 0 momentum bias and resolution, defined as the mean and width of (p Reco − p True )/p True , respectively.
determined from the topology of the decay. As the τ lepton mass is well known [39], its momentum can be estimated up to a two-fold ambiguity from its line of flight between the reconstructed B 0 and τ vertices. The τ momentum magnitude in the laboratory frame is where m 3π , | p 3π |, and E 3π are the reconstructed invariant mass, momentum, and energy of the three-prong system, m τ is the known τ mass, and θ τ,3π is the angle between the three-prong momentum vector and the flight vector of the τ . Eq. (2.1) has a single solution when θ τ,3π takes the maximum allowed value such that the square-root term is zero, i.e.
θ max τ,3π = arcsin Combined with the τ line of flight, this provides an estimate of the τ momentum components with minimal bias. In a similar fashion, the B 0 momentum is estimated using where Y represents the D * − τ + system as reconstructed using Eq. (2.1). The maximum opening angle between the B 0 flight vector and the momentum vector of Y is which is used in Eq. (2.3) to provide a single estimate for the B 0 momentum magnitude. The τ and B 0 momentum bias and resolution are tabulated in Tab. 1.
Since the four-vectors of the τ and B 0 are necessary inputs for the calculation of the decay angles, they too suffer substantial resolution effects. This is shown in Fig. 1, where the true and reconstructed angular distributions of B 0 → D * − τ + ν τ are compared. The angular resolutions are quantified in Tab. 2, which shows that cos θ D is the most well-reconstructed quantity. As a result of the large resolution effects on cos θ L and χ, considerable migration of events occurs within the angular phase space. This is illustrated in Fig. 2, where the two-dimensional projections of the truth-level and reconstructed angular distributions are shown. The density difference within each bin is also shown, where red (blue) regions indicate increases (decreases) in density caused by the reconstruction. The overall effect of the event migration is to reduce the density variation across the phase space, but a bias in cos θ L towards more positive values is also evident.
Due to the reconstruction-induced event migration, a parametric fit to the reconstructed decay angles using Eq. (1.1) cannot be used to measure the I X coefficients. Any attempt to correct the reconstruction biases leads to a dependence on the model used in the Monte Carlo from which the correction is derived. Instead, it is demonstrated that the I X coefficients can be measured with a binned fit using multidimensional histogram templates, where the angular degradation and other detector effects are included directly in each of the twelve templates that describe the signal probability density function (PDF).

Angle
Res. µ Res. σ cos θ D 0.00 0.23 cos θ L 0.15 0.65 χ -0.01 rad 2.24 rad Table 2: Angular variable resolution mean (µ) and width (σ) determined using generated B 0 → D * − τ + ν τ events. The resolution is defined as a Reco − a T rue , where a ∈ {cos θ D , cos θ L , χ}.  Figure 2: Two-dimensional projections of the truth-level (top) and reconstructed (middle) angular distributions; darker colours indicate regions of higher density. The density difference (bottom) in each bin indicates where the density increases (red) and decreases (blue) as a result of the reconstruction.

Building and using templates
The decay rate defined in Eq. (1.1) involves a sum of twelve independent angular functions in the (cos θ D , cos θ L , χ) True variable space. Because of the bias and resolution effects that arise in the reconstruction of semitauonic decays, a template fit in the (cos θ D , cos θ L , χ) Reco variable space must be used. The twelve angular functions become twelve multidimensional template histograms each scaled by an I X coefficient. In this way, the fit is performed with the reconstruction effects included directly within the PDF. The template histograms are created by first filling twelve density histograms D I X .  Figure 3: Projections of the D I X density histograms (red) and the h I X templates (blue) for X = 1c (top) and X = 2s (bottom). The differences between D I X and h I X are caused by the bias and resolution effects introduced by the reconstruction.
Each density histogram contains a large number of bins across (cos θ D , cos θ L , χ) True space; 30 × 30 × 30 uniform bins are used here. The D I X histograms are filled according to the angular function associated with each I X in Eq. (1.1). The twelve D I X histograms are divided by a density histogram, M , of the total signal model given by Eq. (1.1). By taking the ratio R I X = D I X /M , the model used in the simulation cancels and ensures that the R I X histograms are model independent. The R I X histograms are then used to assign weights to simulated signals events based on their (cos θ D , cos θ L , χ) True value. A simulated event falling within the true angular bin i will be assigned twelve weights, w I X = R I X (i).
Subsequently, the per-event weights w I X are applied when constructing histogram templates, h I X , in the reconstructed angular variables (cos θ D , cos θ L , χ) Reco . The result of this procedure is illustrated in Fig. 3, where the h I 1c and h I 2s templates created using a sample of one million generated events are shown. The large size of the sample ensures that the template statistical uncertainty in each bin is negligible. The corresponding density histograms D I 1c and D I 2s are also shown, to illustrate the sculpting effect of the reconstruction. Note that the h I X templates may not be positive in all bins, but the sum of all twelve templates in any given bin is always positive, and is proportional to the total decay rate in that bin.

Signal-only template fit
The twelve h I X templates are normalised to have unit integral, then multiplied by their corresponding I X coefficients to build the total signal PDF This expression is analogous to Eq. (1.1), but where the angular functions are replaced with the h I X templates. The normalisation condition from Ref. [30] is imposed, which constrains the value of I 1c from the other I X coefficients. Importantly, the form of each h I X template remains the same regardless of the underlying physics model; only the values of the I X coefficients are modified by the presence of NP. Using this PDF, a binned maximum likelihood fit to the reconstructed decay angle distribution is demonstrated using the TensorFlowAnalysis package [40], which provides an interface between TensorFlow [41] and the MINUIT [42] minimisation package. For this demonstration, N sig = 100, 000 signal events are generated across n bins bins in each of the three angular variables, where n bins = 16 is chosen to ensure that there are approximately 25 signal events in each bin. An alternative binning scheme based on placing approximately 50 events into each bin has been tested and shows consistent behaviour. The I X values found by the binned fit are summarised in Tab. 3 (a).   Table 3: Results of the (a) binned template fit to the reconstructed angles, and (b) unbinned parametric fit to the truth-level angles. In both cases only the B 0 → D * − τ + ν τ signal sample is used. The uncertainties quoted in (a) are statistical, and arise from the use of a finite number of bins. The uncertainties in (b) are due only to finite sample size, and are negligibly small compared to those of (a); they are thus omitted for this comparison.

Validating with a truth-level fit
To validate the binned template fit results, a second sample of 100, 000 signal decays is generated without acceptance cuts, resolution smearing, or the effect of the missing neutrino in the four-vector calculation. Using the true angles and Eq. (1.1) as the PDF, an unbinned parametric fit is performed with the normalisation condition of Eq. (3.2) again imposed. The total fit projections are shown in Figure 4 and the fitted I X values are recorded in Table 3 (b). The level of agreement between the truth-level parametric fit and the binned fit to the reconstructed angular variables is excellent. By correctly describing reconstruction biases and resolution effects using templates, the binned fit correctly measures the angular coefficients. It is desirable to show that the template method can correctly recover SM I X values under an alternative form factor scheme. To demonstrate this, the signal-only template fit is repeated with per-event weights applied to the generated signal sample in order to align it with the CLN form factor scheme [43]. The reweighting is performed using the HAMMER package [44,45]. Both the parametric unbinned fit and the binned template fit are rerun; the h I X templates are unaltered as they do not contain any form factor dependence. The I X coefficients resulting from the fits are displayed in Fig. 5, and are seen to agree well with the SM values calculated using the CLN scheme in Ref. [30]. Furthermore, the longitudinal D * − polarisation fraction calculated from this signal-only template fit, aligns satisfactorily with the SM expectation noted in Sec. 1.

On binning in q 2
The weighting procedure can be applied inclusively or in a number of discrete q 2 True bins. Assuming simulation accurately models all q 2 True → q 2 Reco sculpting and bin migration, the model independence of the I X measurements is preserved. Specifically, with N bins in q 2 True , the event weights in the template definition become w I X = R I X (i, j), where i again identifies the true angular bin and j tags the true q 2 True bin. Because an event from the j th q 2 True bin may migrate to any k th q 2 Reco bin, 12N 2 templates are required. These h k j,I X Binned signal only (3D reco angles) Figure 5: Comparison of the signal-only truth-level unbinned parametric fit (orange points), the binned template fit (navy points), and the q 2 -integrated SM I X values calculated in Ref. [30] (red points). The fitted MC samples are re-weighted to follow the CLN form factor scheme, which has also been used in the calculation of the cited SM values.
templates are built in the reconstructed angular variables space (cos θ D , cos θ L , χ) Reco for each q 2 Reco bin by applying the weights. 1 Although there are 12N 2 templates, there are still only 11N freely varying angular coefficients and N signal fractions, A j . This is because the apportioning across the q 2 Reco bins is fixed from simulation via the weighting procedure. In a generalisation of Eq. 3.1, the PDF is defined summed over q 2 Reco bins and q 2 True bins, (4 − 6y j,1s + y j,2c + 2y j,2s ) h k j,1 1c + y j,1s h k j,I 1s + y j,2c h k j,I 2c + y j,2s h k j,I 2s + y j,6c h k j,I 6c + y ,j6s h k j,I 6s + y j,3 h k j,I 3 + y j,9 h k j,I 9 + y j,4 h k j,I 4 + y j,8 h k j,I 8 + y j,7 h k j,I 7 + y j,5 h k j,I 5 .
(3.4) 1 The q 2 Reco bins are not required to align with the q 2 True bins. Since the reconstruction may measure q 2 Reco values outside the possible q 2 True range, different ranges may be necessary. Also, the number of q 2 True bins and q 2 Reco bins may differ, though they are both taken to be N in this explanation.
The coefficients are y k j,X = f k j,X I j,X , where the fractions f k j,X with k f k j,X = 1 quantify the bin migration probabilities are derived from simulation through the weighting procedure.
It is stressed that because the q 2 True binning is included in the event weights R I X (i, j) = D j,I X /M j , the dependence on the model used in the simulation is removed in a multi-q 2 -bin implementation in the same way that it is removed in the inclusive implementation documented in this work. However, as 12N 2 templates are required just for the signal, large simulated samples would been needed in such an analysis as well as large data sets.

Dealing with backgrounds
The template fit is studied in a more realistic manner by considering backgrounds to the B 0 → D * − τ + ν τ signal. A large number of backgrounds are generated using RapidSim, matching the list of backgrounds considered in recent experimental work [8]. There are three categories to consider: (1) prompt B → D * − π + π + π − (X) backgrounds that are reduced by requiring a displaced three-prong vertex, (2) Several high branching fraction B decays produce three charged pions at the B decay vertex, with one or more additional particles (X) missed in the reconstruction. The prompt backgrounds considered for this study are recorded in Tab. 5 (App. B). These prompt B → D * − π + π + π − (X) backgrounds are reduced by applying a flight requirement to the 3π system. In Ref. [8], τ candidate vertices are required to be displaced from the B vertex along the z-axis with a separation of 4σ z , where the vertex uncertainties of the B and τ candidates determine the standard deviation σ z . Vertex fits are not performed in RapidSim so the effect of the flight requirement must be approximated. Assuming a combined B and τ vertex uncertainty of 1 mm, the flight requirement is emulated by requiring that all 3π vertices are displaced 4 mm from their corresponding B decay vertex. This is applied to all RapidSim samples, including the signal sample; the efficiencies on signal decays and prompt B → D * − π + π + π − (X) decays are similar to those reported in Ref. [8].
The largest source of background remaining after the τ flight requirement arises from double-charm decays of the type B → D * − D + s (X). The D + s meson flies before decaying to final states that include three charged pions, mimicking the τ + → π + π + π −ν τ signature. The B → D * − D + s (X) modes generated for this study are recorded in Tab. 6 (App. B) and the D + s modes are listed in Tab. 7 (App. B). A realistic sample is created for each doublecharm B mode by weighting contributions according to measured branching fractions [39]. To create a single B → D * − D + s (X) sample, the subsamples are summed according to the proportions measured in Ref. [8]. Similarly, B → D * − D + (X) and B → D * − D 0 (X) decays are also present in the background. The D 0 and D + mesons also fly, and often produce three or more charged particles in their decay. The B → D * − D (0,+) (X) modes generated for this study are recorded in Tab. 8 (App. B), and the D (0,+) modes are listed in Tab. 9 (App. B).
The B → D * * τ ν τ decay is identical to signal aside from the fact that the charm meson is produced in a higher state of angular momentum. Given this similarity, the feed-down background must be included as a small fixed fraction relative to signal, and treated as a systematic pollution in any phenomenological interpretation. Two contributions are included, namely B + → D 1 (2420) 0 τ + ν τ and B + → D * 2 (2460) 0 τ + ν τ decays, where the excited charm mesons decay to D * − π + and the additional pion is not reconstructed. The EvtGen models used are given in Tab. 10 (App. B). No B → D * * τ ν τ modes have been observed, so the D 1 (2420) 0 and D * 2 (2460) 0 samples are summed in equal proportion.

Multivariate classifier
The reconstructed angular distributions for each background category are shown in Fig. 6 along with the total signal template. All events are required to fall within the LHCb acceptance and pass the τ flight requirement. On their own, the reconstructed decay angles do not provide enough signal and background separation to reliably measure the twelve parameters that describe the B 0 → D * − τ + ν τ signal. As such, a multivariate classifier is included as a fourth dimension to provide sufficient separation.  To minimise dependence on any underlying model, the classifier is designed to avoid any input variables that relate directly to the B decay kinematics; variables describing the 3π system are thus preferred. The most appropriate variables include the reconstructed proper lifetime and the invariant mass of the 3π system, and the invariant mass of the π + π + and π + π − combinations. These variables provide discrimination between τ candidates and charm mesons, due to their different lifetimes and decay properties.
A gradient boosted decision tree (BDT) classifier is trained using the scikit-learn package [46]. The total B → D * − D + s (X) background sample described above is used to train the BDT to favour B 0 → D * − τ + ν τ decays. The D + s sample is used as it is the largest category of background remaining after the τ flight requirement. The distributions for each input variable in signal and background are shown in Fig. 7. The area under the classifier Receiver Operating Characteristic (ROC) curve is 0.84, and the performance is illustrated in Fig. 8 (a) where the BDT distributions are shown for both in signal and background. The classifier is applied to all generated signal and background samples, and those events with classifier decision values above zero are retained for use in the fit and shown in Fig. 8.  This selection requirement is 80% efficient on signal while rejecting 70% of background. The feed-down and signal BDT distributions are confirmed to be almost identical.
τ component of the signal where the neutral pion is not reconstructed, is not a background but rather contributes to the total signal. This decay mode differs from the three-prong signal only in the τ decay and thus has the same I X coefficients. To benefit from the presence of this mode in the signal, a dedicated sample is generated and processed in a manner identical to the principal τ + → π + π + π −ν τ signal. Small differences in the reconstructed decay angles are observed, due to the additional missing momentum in the τ + → π + π + π − π 0ν τ decay. The BDT distributions are also not equivalent for the two cases, since the invariant mass variables used in the BDT differ. Following Ref. [8], a total signal sample is created from both the τ + → π + π + π −ν τ and τ + → π + π + π − π 0ν τ samples, using f 3π = 78% as a relative fraction.

Final signal plus background sample
To create a realistic total dataset that includes both signal and background events, the generated signal and background samples are summed using the relative fractions listed in Tab. 4; the values used are based upon those measured in LHCb data [8]. The signal comprises f sig = 11.8% of the total sample, while the total feed-down contribution is 11% of the signal fraction (1.3% in total). The majority of the sample is composed of Table 4: Fractions used to construct the total data sample. The fractions that are fixed in the fit are labelled so.

Four-dimensional fit
To reliably measure the signal fraction f sig and I X coefficients, a four-dimensional binned maximum-likelihood fit to the total dataset is performed using the PDF where h sig represents the signal PDF, defined in (3.1). In the fit, f sig , f D + s , and f D + freely vary, as do the eleven I X coefficients within h sig . The fractions f D 0 and f D * * are fixed, matching the procedure in Ref. [8]. The prompt fraction is constrained such that the fractions sum to unity.
The twelve signal template histograms h I X are created following a procedure that is essentially identical to that described in Sec. 3, but where the additional BDT dimension is included and a signal sample containing both the τ + → π + π + π −ν τ and τ + → π + π + π − π 0ν τ modes is used. For each background mode β, a template h β is created by filling a fourdimensional histogram with the ((cos θ D , cos θ L , χ) Reco ,BDT) distribution of the total β sample. Large generated samples exceeding one million events are used to create all of the templates, in order to avoid statistical uncertainties on the template bin contents. The number of bins n bins in each fit dimension is chosen to be equal and is determined using where N sig is the anticipated number of signal events in the sample. The bin boundaries are not uniform, but are chosen such that each bin is populated with approximately 25 signal events. An alternative binning based on 50 signal events per bin gives consistent results.

Hadron collider scenario
The four-dimensional fit is applied to datasets corresponding to three scenarios: N sig = 8000, N sig = 40, 000 and N sig = 100, 000. These are calculated by extrapolating the yield measured in Ref. [8] to 9 fb −1 (Runs 1 + 2), 23 fb −1 (Runs 1-3), and 50 fb −1 (Runs 1-4), as anticipated by LHCb [47]. The 23 and 50 fb −1 scenario yield expectations account for additional improvements in the upgrade LHCb detector performance relative to LHCb in Run 1 and 2. The signal and background fractions listed in Tab. 4 are used to create the data sample for each of these cases, maintaining the same signal purity throughout. 9 fb −1 (N sig = 8000): The 4D template fit functions stably in this lowest-statistics case, and the resulting fit projections are shown in Fig. 9. The signal fraction is measured to be f sig = 0.116 ± 0.010 (8.6% relative uncertainty), which agrees with the input value f sig = 0.118. The I X values are measured with large uncertainties but remain compatible with the true values, as shown in Fig. 10. Using the I X results for this sample, the derived value of F L (D * ) is 0.368 ± 0.047, where the uncertainty quoted is statistical only. It will thus be possible to make a competitive measurement of F L (D * ) using the 9 fb −1 data. 23 fb −1 (N sig = 40, 000): Increasing the signal yield by a factor five gives a strong improvement in the I X measurements, as shown in Fig. 10; the derived value of F L (D * ) is 0.489 ± 0.018. The signal fraction is f sig = 0.117 ± 0.003 (2.6% relative uncertainty), which has reduced more than a naive N sig scaling due to the larger number of bins in the 23 fb −1 fit relative to the 9 fb −1 fit. Both fits operate with binning schemes that require approximately 25 signal events per bin, and the increased number of bins in the 23 fb −1 fit provides greater differentiation between the I X signal components and the backgrounds.  Figure 9: One-dimensional projections of the N sig = 8, 000 binned fit, where the solid points represent the data and the filled histograms represent each fit component. The total B 0 → D * − τ + ν τ signal, given by the sum of all twelve angular terms, is shown in red.
the 23 and 50 fb −1 scenarios, it is well motivated to continue performing measurements of this type during Run 4 of the LHC. This is highlighted by the derived value of F L (D * ) value, which is found to be 0.446 ± 0.010.

Fit stability validation
To demonstrate the stability and accuracy of the three fit scenarios, many pseudoexperiments ("toys") based on the fits are run. Using the template PDFs and the yields from the 9, 23, and 50 fb −1 fits, toy datasets are generated where the number of events is independently determined in each bin according to Poisson variations of the bin content. The template fit is applied to each toy dataset, and pull distributions are created for all freely varying fit parameters. All pull distributions have mean values close to zero and widths close to unity, as expected for an unbiased fit returning the appropriate uncertainties.

B-factory scenario
Complementary to LHCb, the Belle II experiment [48,49] can use an anticipated 50 ab −1 dataset to measure the angular coefficients in B → D * τ ν τ decays. The bb production cross section is much lower in e + e − collisions compared to pp, but the well-defined initial state and To estimate the performance of the template fit at an e + e − experiment, the signal and background samples generated to emulate the LHCb resolution and acceptance are reprocessed using the true B meson four-vector in the decay angle calculations rather than relying on the estimation in Eq. (2.4). Vertex and track momentum resolutions are assumed to be similar, so an overall 10-20% advantage in angular resolution is determined.
The three-prong τ decay mode has not yet been used in a B-factory semitauonic analysis, so the signal yields at Belle II must be approximated. Belle have used the τ + → π + ν τ and τ + → ρ + ν τ modes with hadronic tagging to measure the τ polarisation [4]. Signal yields of N (B 0 → D * − τ + ν τ ) = 88 ± 11 and N (B + → D * 0 τ + ν τ ) = 210 ± 27 are reported in 0.77 ab −1 of data. Assuming near-perfect track finding efficiency at Belle II, such that the three-prong modes are reconstructed with a similar efficiency as the one-prong, a total tagged sample of N sig (B 0 + B + ) ≈ 7000 three-prong events is estimated in 50 ab −1 of Belle II data.
In Ref. [4], the Belle Collaboration reported a signal purity of 18.6%. Although the combinatorial background at Belle II and LHCb differ, the B backgrounds generated in Sec. 4 are still the most important. Thus, a data sample is created containing 7000 signal events with 18.6% purity, where the relative background fractions remain the same as those used in Sec. 4.
Results for 50 ab −1 of e + e − data (N sig = 7000): The four-dimensional template fit to the B-factory sample is performed in ((cos θ D , cos θ L , χ) Reco , BDT) variable space, where the decay angles are calculated using the true B meson four-vector to mimic the benefit of the hadronic tagging. The number of bins in each dimension is chosen in the same manner as the LHCb scenario fits. The signal fraction is measured to be f sig = 0.195 ± 0.014 (7.0% relative uncertainty) and is consistent with the input value. The uncertainties on the I X measurements are compared to the 23 fb −1 LHCb scenario in Fig. 11. Even though the B-factory signal yield is lower, the overall I X precision is competitive due to the higher purity and constraint on the initial state from the tagging of the other B decay.  Figure 11: Comparison of absolute I X coefficient statistical uncertainties in the N sig = 40, 000 hadron collider template fit (navy) and the N sig = 7, 000 B-factory fit (green). The average uncertainties over all I X coefficients are indicated by the dotted lines.

Systematic uncertainties
The dominant systematic uncertainty comes from the assumed accuracy of the templates used to model the background. Measured branching fractions are used to define the contribution from each background decay, so these are varied within their uncertainties to determine the appropriate uncertainty. Similarly, fixed fractions are used to define the feed-down contribution, which has not yet been confirmed experimentally and thus a 40% variation around f D * * = 0.11 is used. Smaller variations in the angular coefficient measurements are seen when the number of bins in the weighting procedure is varied from the default 30 3 binning. The total systematic uncertainties are found to be small relative to the statistical uncertainties, even in the highest yield case. The systematic uncertainties are shown to modestly increase the error bars in Fig. 10.
Further systematic effects not quantified here include those due to limited MC statistics and imperfect simulation of experimental data. The Beeston-Barlow [50] method can account for the first effect, while the second must be addressed through the development of control channels such as B 0 → D * − D + s decays at LHCb or an inclusive study of D (s) → 3πX decays at BESIII.

Determination of R(D * )
It is possible to convert the measured signal fraction f sig to a value of B(B 0 → D * − τ + ν τ ) and thus R(D * ) if sig , the total signal efficiency, is known. Typically, sig is calculated from simulated samples that accurately model detector and selection inefficiencies. However, the simulated sample used to estimate sig must be generated using a specific model, and so sig is unavoidably model-dependent. This leads to an additional systematic uncertainty that must be considered in the extrapolation to R(D * ), particularly if measurements of the I X coefficients diverge from their SM expectations.
It should be noted that the model-independent strategy cannot be competitive with the accuracy with which model-dependent fits can measure R(D * ). For example, the statistical uncertainty on f sig in the 9 fb −1 LHCb scenario is 8.6%. The statistical uncertainty of a model-dependent fit to the same dataset may be under 3%, based on an extrapolation of the statistical uncertainty in Ref. [8]. The inferior precision is due to the fact that (1) the model-independent fit has twelve parameters to describe the signal rather than one overall yield, and (2) the angular variables are less discriminating between signal and background than the model-dependent variables currently used such as q 2 . To confirm these assertions, a test is performed using the generated 9 fb −1 LHCb dataset, where all of the I X coefficients are fixed and the signal-background separation is artificially improved by 20%. In this test, the fit uncertainty on f sig indeed reduces from 8.6% to 3.3%.

Conclusion
A model-independent method for measuring the angular coefficients of B 0 → D * − τ + ν τ decays is demonstrated. Reconstruction bias and resolution effects caused by the missing neutrino are handled using a binned template fit to the decay angles, with a BDT classifier included to improve signal-background separation. A realistic background mixture is introduced, and the template fit is found to be statistically unbiased and model-independent even with the current LHCb statistics (Run 1 + 2). Due to the cleaner e + e − environment and a better-constrained B meson reconstruction, Belle II should perform competitively despite lower signal yields.
The template fit is directly applicable to the isospin partner decay B + → D * 0 τ + ν τ , if the neutral vector meson can be efficiently reconstructed. By extension, template angular analysis of the pseudoscalar semitauonic decays B + → D 0 τ + ν τ and B 0 → D − τ + ν τ is motivated for experimental reasons. Here, the decay rate depends only on cos θ L , where a, b, and c are the q 2 -dependent angular coefficients [30]. However, a full angular analysis would enable the measurement of I X coefficients in the D * − feed-down in addition to the (a, b, c) coefficients of the pseudoscalar modes. The template procedure should be applied as a null test to B → D ( * ) lν (l ∈ {e, µ}) decays, since they are also governed by Eq. (1.1). And in all cases, CP conservation, which could be violated if additional NP processes interfere with the single SM amplitude [51], can be verified by splitting according to the τ lepton charge. Finally, the template method is ideal for angular analysis that searches for right-handed currents in B 0 s → K * − µ + ν µ and B + → ρ 0 µ + ν µ decays, which suffer similar complications from neutrinos in the final state.

A Decay angle definitions
In this work, θ D is defined as the angle between the direction of the D 0 meson and the direction opposite that of the B 0 meson in in the D * − meson rest frame. The angle θ L is defined as the angle between the direction of the τ + lepton and the direction opposite that of the B 0 meson in in the mediator (W + ) rest frame. The angle χ is the angle between the plane containing the τ + and ν τ and the plane containing the D 0 and pion from the D * − in the B 0 rest frame. The three decay angles are displayed graphically in Fig. 12. Explicitly, the decay angles are defined following the definitions in Ref. [32] cos θ D = p X are unit vectors describing the direction of a particle X in the rest frame of the system Y . In every case the particle momenta are first boosted to the B 0 rest frame. In this basis, the angular definition for theB 0 decay is a CP transformation of that for the B 0 decay.

Angular observablesà la K ⇤ µµ
• Reconstruct these three angles using the same momentum estimates for B and ⌧ that go into q 2 e.t.c.