Angular analysis of $B^0 \to D^{*-}D_s^{*+}$ with $D_s^{*+} \to D_s^+ \gamma$ decays

The first full angular analysis of the $B^0 \to D^{*-} D_s^{*+}$ decay is performed using 6 fb$^{-1}$ of $pp$ collision data collected with the LHCb experiment at a centre-of-mass energy of 13 TeV. The $D_s^{*+} \to D_s^+ \gamma$ and $D^{*-} \to \bar{D}^0 \pi^-$ vector meson decays are used with the subsequent $D_s^+ \to K^+ K^- \pi^+$ and $\bar{D}^0 \to K^+ \pi^-$ decays. All helicity amplitudes and phases are measured, and the longitudinal polarisation fraction is determined to be $f_{\rm L} = 0.578 \pm 0.010 \pm 0.011$ with world-best precision, where the first uncertainty is statistical and the second is systematic. The pattern of helicity amplitude magnitudes is found to align with expectations from quark-helicity conservation in $B$ decays. The ratio of branching fractions $[\mathcal{B}(B^0 \to D^{*-} D_s^{*+}) \times \mathcal{B}(D_s^{*+} \to D_s^+ \gamma)]/\mathcal{B}(B^0 \to D^{*-} D_s^+)$ is measured to be $2.045 \pm 0.022 \pm 0.071$ with world-best precision. In addition, the first observation of the Cabibbo-suppressed $B_s^0 \to D^{*-} D_s^+$ decay is made with a significance of seven standard deviations. The branching fraction ratio $\mathcal{B}(B_s^0 \to D^{*-} D_s^+)/\mathcal{B}(B^0 \to D^{*-} D_s^+)$ is measured to be $0.049 \pm 0.006 \pm 0.003 \pm 0.002$, where the third uncertainty is due to limited knowledge of the ratio of fragmentation fractions.


Introduction
The B 0 → D * − D * + s decay involves the production of two vector charm mesons from a pseudoscalar B 0 parent. This process exhibits a polarisation structure, where three complex helicity amplitudes H 0 , H + , and H − contribute to the total decay rate. These amplitudes correspond to the relative orientation of the linear polarisation vectors of the two vector mesons. Parity-even ( ) and parity-odd (⊥) transversity amplitudes can also be defined in terms of H + and H − , namely A ,⊥ = (H + ± H − )/ √ 2. The helicity amplitudes can interfere, with interference governed by the strong phases of the transverse components, φ + and φ − , relative to the phase of the longitudinal component, φ 0 , which is conventionally taken to be equal to zero. Therefore, five parameters in total determine the decay rate: • |H 0 |, the magnitude of the longitudinal amplitude; • |H + | and |H − |, the magnitudes of the two transverse amplitudes; • φ + and φ − , the phases of the transverse amplitudes relative to H 0 .
In order to normalise the total decay rate, |H 0 | 2 + |H + | 2 + |H − | 2 = f L + f T = 1 is required, where f L ≡ |H 0 | 2 is the longitudinal polarisation fraction and f T ≡ |H + | 2 + |H − | 2 is the transverse polarisation fraction. The current world average for f L is 0.52 ± 0.05 [1,2], while theoretical predictions cover a similar range [3][4][5][6]; the transverse helicity amplitudes have not been measured previously. The normalisation condition reduces the total number of independent observables to four, where the additional observable is absorbed into the absolute branching fraction of the decay which is not measured. Measuring the relative magnitudes of the helicity amplitudes offers a test of quark-helicity conservation in this tree-level decay involving a b → c quark transition. In such decays, a |H 0 | > |H + | > |H − | hierarchy is expected [7], where the V − A nature of the weak interaction causes the longitudinal component to dominate.
Using data corresponding to an integrated luminosity of 6 fb −1 collected at a centre-ofmass energy of 13 TeV with the LHCb experiment between 2015 and 2018, B 0 → D * − D * + s with D * + s → D + s γ decays are reconstructed via the D * − → (D 0 → K + π − )π − and D + s → K + K − π + channels; the inclusion of charge-conjugate processes is implied throughout. Partially reconstructed decays, where the photon is not considered in the invariantmass calculation, are used in a fit to the m(D * − D + s ) distribution to measure f L . Fully reconstructed decays are then considered in a subsequent angular analysis to measure the remaining helicity observables. Measurements are performed under the assumption that both the D 0 π − and D + s γ systems are pure vector, as no evidence for a scalar contribution is found in the m(D 0 π − ) distribution in data and no scalar component is permitted in m(D + s γ) due to the photon angular momentum. The analysis includes an improved measurement of f L and first measurements of the transverse helicity amplitude magnitudes and phases. The data sample is also used to measure the ratio of branching fractions R ≡ [B(B 0 → D * − D * + s ) × B(D * + s → D + s γ)]/B(B 0 → D * − D + s ), where the current value of R = 2.07 ± 0.33 is calculated using world-average branching fractions taken from Ref. [2]. In addition, a measurement of the previously unobserved Cabibbosuppressed B 0 s → D * − D + s decay is performed and the ratio of branching fractions B(B 0 s → D * − D + s )/B(B 0 → D * − D + s ) determined. The formalism adopted is described in Sect. 2, essential details of the LHCb detector and simulation are given in Sect. 3, and the event selection is outlined in Sect. 4. The longitudinal polarisation fraction and ratios of branching fractions are measured in Sect. 5, and the remaining helicity observables are measured in Sects. [6][7][8]. Systematic uncertainties are determined in Sect. 9, and final results and conclusions are presented in Sect. 10.

Angular decay rate formalism
The B 0 → D * − D * + s decay rate is a function of three decay angles, θ D , θ X , and χ, where θ D is the angle between the D 0 meson and the direction opposite the B 0 momentum vector in the D * − rest frame, θ X is the angle between the D + s meson and the direction opposite the B 0 momentum vector in the D * + s rest frame, and χ is the angle between the two decay planes as defined in the B 0 rest frame. The angles are illustrated in Fig. 1, and are explicitly defined as follows sin X are unit vectors describing the direction of a particle X in the rest frame of the system Y . In the B 0 rest frame, the angular definition for the B 0 decay is a charge-parity (CP ) transformation of that for the B 0 decay. The sign of sin χ is negative for B 0 candidates and positive for B 0 candidates, where the B-meson flavour is tagged by the D * -meson charge. This formalism is the same as that adopted in other LHCb angular analyses such as that of B → K * µ + µ − decays [10,11].
The full three-dimensional differential decay rate expressed in terms of the helicity rest frame rest frame amplitudes is given by [3]

LHCb detector and simulation
The LHCb detector [12,13] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b-or c-quarks. The detector includes a high-precision tracking system consisting of a siliconstrip vertex detector surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. The tracking system provides a measurement of the momentum, p, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV/c. The minimum distance of a track to a primary pp collision vertex (PV), the impact parameter (IP), is measured with a resolution of (15 + 29/p T ) µm, where p T is the component of the momentum transverse to the beam, in GeV/c. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers. The online event selection is performed by a trigger, which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction. At the hardware trigger stage, events are required to have a muon with high p T or a hadron, photon or electron with high transverse energy in the calorimeters. For hadrons, the transverse energy threshold is 3.5 GeV. The software trigger requires a two-, three-or four-track secondary vertex with a significant displacement from any primary pp interaction vertex. At least one charged particle must have a transverse momentum p T > 1.6 GeV/c and be inconsistent with originating from any PV. A multivariate algorithm is used for the identification of secondary vertices consistent with the decay of a b hadron. In the offline selection, trigger information is associated with reconstructed particles. Selection requirements can therefore be made on the trigger selection itself and on whether the decision was due to the signal candidate, other particles produced in the pp collision, or an overlap of both.
Simulation is required to model the effects of the detector acceptance and the imposed selection requirements. In the simulation, pp collisions are generated using Pythia [14] with a specific LHCb configuration [15]. Decays of unstable particles are described by EvtGen [16], in which final-state radiation is generated using Photos [17]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [18] as described in Ref. [19]. The underlying pp interaction is reused multiple times, with an independently generated signal decay for each [20]. In addition, the m(D * − D + s ) distributions of pure longitudinal and transverse polarised B 0 → D * − D * + s decays are studied using fast-simulated samples generated with the RapidSim package [21], where an LHCb momentum resolution configuration is used to smear the generated four-momenta. The same tool is used to study the m(D * − D + s ) distributions of various background contributions from decays involving higher-excited charm mesons.

Event selection
s decays are reconstructed through the D * − → (D 0 → K + π − )π − and D + s → K + K − π + channels. The tracks of the final-state particles are required to have a good quality, fulfil loose particle identification (PID) criteria, and have a high χ 2 IP value with respect to any PV, where χ 2 IP is defined as the difference in the vertex-fit χ 2 of a given PV reconstructed with and without the particle being considered. The reconstructed masses of the D 0 and D + s candidates are required to lie inside mass windows of ±20 MeV/c 2 around their known values [2]. The D * − candidate mass is required to be within ±40 MeV/c 2 of the known value [2], while the difference in mass between the D * − and D 0 candidates is required to be in the range 140-150 MeV/c 2 . In combination with the track PID cuts, these narrow mass windows reduce potential backgrounds from misidentified decays such as B 0 → D * − D + to negligible levels.
The B 0 candidate is reconstructed by combining the D * − and D + s candidates to form a common vertex. If multiple PVs are reconstructed in the same event, the PV for which the B 0 candidate has the lowest χ 2 IP is assigned as the associated PV. The p T of the B 0 candidate is required to be larger than 5 GeV/c, and the χ 2 IP of the B 0 candidate for the associated PV is required to be small. To suppress combinatorial background and background from decays involving the production of a D * − and three prompt tracks, the flight distance of the D + s candidate along the beam axis is required to be different from zero by more than one standard deviation, considering both the origin and decay-vertex uncertainties of the D + s candidate. To suppress combinatorial background from combinations of tracks originating from the PV, the decay time of the B 0 candidate is required to be larger than 0.2 ps. To improve the invariant-mass resolution, a kinematic fit is performed to the decay chain [22], the B 0 candidate is constrained to originate from the PV and the D + s and D 0 masses are constrained to their known values. Candidates are retained if the resulting invariant mass of the D * − D + s combination falls within the 4900-5500 MeV/c 2 range, which includes the region occupied by partially reconstructed B 0 → D * − D * + s decays when the neutral particle produced in the D * + s decay is not reconstructed. This sample is considered in Sect. 5, where a fit to the m(D * − D + s ) distribution of candidates is used to measure f L .
A subsample of fully reconstructed B 0 → D * − D * + s candidates is selected by combining D + s candidates from the above dataset with photons. The difference between the D * + s and D + s candidate masses is required to be in the range 120-180 MeV/c 2 , and the photon is required to have a p T larger than 500 MeV/c. Each D * + s candidate is then recombined with the corresponding D * − candidate from the above dataset to form a B 0 candidate, where candidates in the invariant-mass range 5150-5500 MeV/c 2 are retained. Fully reconstructed candidates with m(D * − D + s ) values greater than 5240 MeV/c 2 are vetoed to remove B 0 → D * − D + s decays where a random photon is combined with the D + s candidate. This dataset is used in Sect. 8 to measure the remaining helicity observables in an angular analysis.

Measurement of f L and branching fraction ratios
The longitudinal polarisation fraction, f L , determines the fractional contribution of the H 0 helicity amplitude to the total B 0 → D * − D * + s decay rate. The longitudinal and transverse amplitudes contribute to the one-dimensional differential decay rate in cos θ X as follows, which is obtained from Eq.
(2) via a definite integral over cos θ D and χ. Experimentally, the integral over cos θ D and χ must also include the acceptance in these angles. However, the acceptance is predominantly linear for both angles, as shown in Figs. 4 and 5, such that no significant residual dependence remains after the integration. Due to a common dependence on photon kinematics, the angle cos θ X and the invariant mass of the D * − D +  Fig. 8. This feature enables f L to be measured using a binned maximum-likelihood fit to the m(D * − D + s ) distribution in data, where the total B 0 → D * − D * + s contribution is modelled by the sum of probability density functions (PDFs) for the longitudinal and transverse components with relative fractions f L and 1 − f L . Determining f L via an m(D * − D + s ) fit enables partially reconstructed B 0 → D * − D * + s decays to be used, which increases the sample size by avoiding efficiency losses due to the limited photon reconstruction efficiency of the LHCb detector.
Due to the presence of B 0 → D * − D + s decays in the same sample, a measurement of the branching fraction ratio can also be made. Experimentally, this quantity is defined as where N denotes the yields for each decay mode, and ξ is the ratio of their total reconstruction and selection efficiencies. In the case of B 0 → D * − D * + s decays, the yields and efficiencies correspond to those of partially reconstructed signal. The efficiency ratio is determined using simulated samples of B 0 → D * − D * + s and B 0 → D * − D + s decays, and is found to be ξ = 1.142 ± 0.034, where the uncertainty quoted accounts only for the use of finite simulated samples and potential variation in the efficiency across data-taking years. This uncertainty is considered as a source of systematic uncertainty on R.
A contribution from Cabibbo-suppressed B 0 s → D * − D + s decays is also considered in the m(D * − D + s ) fit, enabling a measurement of the branching fraction ratio to be made. Experimentally, r(B 0 s ) is defined as where N denotes the yields for each decay mode, and f s /f d = 0.2539 ± 0.0079 is the ratio of fragmentation fractions at √ s = 13 TeV as measured inside the LHCb acceptance [23].
The relative efficiency ξ(B 0 s ) is assumed to be unity, with a 5% relative systematic uncertainty assigned to account for potential variation in efficiency due to mass and lifetime differences.

Fit components
The m(D * − D + s ) distribution of selected candidates is shown in Fig. 2, and is dominated by the narrow signal due to fully reconstructed B 0 → D * − D + s decays and a broad structure due to B 0 → D * − D * + s decays with missing a photon or π 0 from the D * + s decay. The distribution is modelled as a sum of several components which are described below.
s decays are modelled using the sum of two Crystal Ball PDFs [24] with a freely varying common mean and width, and a relative yield fraction that is Gaussian-constrained according to simulation. The component PDF tails are modelled on opposite sides, and the tail parameters are Gaussian-constrained from simulation. The branching fraction ratio R is measured directly in the fit, such that the yield of γ) component via a freely varying parameter R and the fixed relative efficiency ratio ξ.  3). This approach closely follows the method used in Refs. [25] and [26] for CP violation studies of partially reconstructed B − → D * 0 h − with D * 0 → Dγ/π 0 decays, where h − is a pion or a kaon and the neutral particle produced in the D * 0 decay is not reconstructed. The total yield of the , varies freely and is used along with R and ξ to set the The contribution from this mode is small compared to the signal due to the lower branching fraction of the D * +

Background from higher-excited charm states
At low m(D * − D + s ) values, decays involving higher-excited charm states contribute when one or more particles are not reconstructed. To model the effective contribution from this feed-down background, simulated samples of

Combinatorial background
Background from random track combinations is modelled using an exponential function with a freely varying shape parameter and yield. Due to the application of mass windows for the charm-meson candidates and a D + s candidate flight requirement, the combinatorial background is found to be small across the full m(D * − D + s ) range considered.

Results
The fit to the m(D * − D + s ) distribution in data is shown in Fig. 2 where the first uncertainty is statistical, the second is systematic, and the third is due to the use of an external value of f s /f d [23]. The systematic uncertainties on R and r(B 0 s ) are due to the use of fixed PDF shape parameters and branching fractions in the fit, as well as the use of the relative efficiency corrections ξ and ξ(B 0 s ). The contributing systematic uncertainties on both branching fraction ratios are summarised in Table 1 in Sect. 9. The value of R is in agreement with the world average, R = 2.07 ± 0.33, but has a considerably smaller uncertainty. The measurement of r(B 0 s ) is a world first, and constitutes the first observation of the Cabibbo-suppressed B 0 s → D * − D + s decay with a statistical significance of seven standard deviations. The significance is calculated by determining the difference in r(B 0 s ) from zero, where both the statistical and systematic uncertainties are considered.
The longitudinal polarisation fraction in where the first uncertainty is statistical and the second is systematic. The systematic uncertainty quoted is due to the limited knowledge of the fixed terms used in the fit. This result is in agreement with, but substantially more precise than, the current world-average value. Pseudoexperiment studies indicate that the fitted central value and uncertainty of f L are unbiased. In the subsequent analysis of fully reconstructed B 0 → D * − D * + s decays presented herein, f L is fixed to the value measured in the m(D * − D + s ) fit. This enables the angular acceptance functions for cos θ D and cos θ X to be derived directly from data (see Sect. 7), rather than modelling such effects using simulation. The cos θ X distribution in particular is sensitive to mis-modelling in the simulation, due to its dependence on the soft photon kinematics which can be distorted by the hardware trigger emulation in the simulation. The fit is performed using the sum of a B 0 → D * − (D * + s → D + s γ) signal component, a B 0 s → D * − (D * + s → D + s γ) background component, and a combinatorial background component. The signal is described using the sum of two Crystal Ball PDFs which share a common freely varying mean and width. The tail parameters and relative fraction of the two Crystal Ball PDFs are constrained from fits to simulation, where the component PDFs are required to have tails on opposite sides. The yield of the signal component varies freely, and is found to be 6457 ± 116. The B 0 s → D * − (D * + s → D + s γ) component is modelled using the same PDF as the signal, but with a mean shifted upwards using the known B 0 s -B 0 meson mass difference. The rate of this contribution is fixed relative to signal using the proportions determined in the m(D * − D + s ) fit. The combinatorial background is modelled using a second-order Chebyshev polynomial, where the yield and shape parameters of this contribution vary freely. As the background distribution is not known a priori, an alternative parameterisation using a Gaussian function is also used to model the combinatorial background. The signal weights derived from this alternative model are used to determine the systematic uncertainty on the helicity observables. A Gaussian function is used as it provides a background description of equivalent quality to the second-order Chebyshev, whereas linear and exponential background models do not describe the background sufficiently well.
An underlying assumption of the sPlot method used to derive per-candidate signal weights is that the discriminating variable, in this case m(D * − D * + s ), is uncorrelated with the target distributions to be studied with weights applied, in this instance the decay angles. Due to a common underlying dependence on the decay product kinematics, the invariant mass and decay angles do exhibit some degree of correlation. To assess the potential bias from this, a combined four-dimensional simulated sample of signal and background events is generated in m(D * − D * + s ) and the decay angles. The background sample is generated according to the m(D * − D * + s ) background shape observed in data, and with a flat distribution in each of the decay angles. The total simulated sample contains the same number of signal and background events as measured in the m(D * − D * + s ) data fit. A fit to the m(D * − D * + s ) distribution of the simulated sample is performed to derive signal weights, which are then applied when creating histograms in the decay angles. Using χ 2 tests, these histograms are compared to histograms of the decay angles created using only the simulated signal sample. All of the signal-weighted distributions are found to agree with the pure signal distributions, indicating that no significant biases are incurred from the use of signal weights.

Angular acceptance functions
Due to experimental acceptance and resolution effects, the angular distributions in data are distorted relative to the true distributions. As the decay angles are measured with a relative resolution of 2-4% according to simulation, the dominant effect on the experimental angular distributions is due to the acceptance. This effect must be modelled in the angular fit in order to derive unbiased measurements of the helicity observables, which is achieved by multiplying the true differential decay rate PDF by acceptance functions defined in each of the decay angles. This approach assumes that the total angular acceptance can be factorised into a product of the individual acceptance functions for each angle, which is validated using a simulated sample of signal decays generated according to Eq. (2) with the world-average value of f L = 0.52 [1,2], |H + | = |H − | = (1 − f L )/2, and all phases equal to zero. The efficiency of a cut applied to all three decay angles together, xyz , is compared with a product of the efficiencies for cuts applied separately to each decay angle, = x × y × z ; the values of xyz and are found to agree within the uncertainties due to the use of finite simulated samples.

Acceptance functions for cos θ D and cos θ X
The acceptance functions for cos θ D and cos θ X are derived from data. Binned normalised distributions in each decay angle are produced by creating histograms of the fully reconstructed B 0 → D * − D * + s candidates in data with signal weights applied. The only physical observable that can alter the shape of the one-dimensional cos θ D and cos θ X distributions is f L , which is known from the m(D * − D + s ) fit. The acceptance is thus determined by comparing the data distributions with angular distributions generated with RapidSim using the value of f L measured in Sect. 5. In the generated sample, no detector acceptance or resolution effects are included. The signal-weighted data and generated signal distributions in cos θ D and cos θ X are compared in Fig. 4 (left column). The observed differences between data and the generated sample are attributed to the experimental acceptance and resolution, since both distributions share a common f L value. The cos θ X distribution in particular exhibits substantial acceptance effects, where candidates at low cos θ X are preferentially removed. This warping is due to the application of photon p T requirements in the selection, which bias the sample to more positive values of cos θ X .
To determine acceptance functions for cos θ D and cos θ X , the binned ratios of data to the generated sample are fitted with sixth-order polynomial functions. The fits are shown in Fig. 4 (right column), and the polynomial coefficients are employed as fixed terms in the angular fit in Sect. 8. To determine the systematic uncertainty on the helicity observables due to the finite dataset used in the acceptance fits, the acceptance function coefficients are varied within their uncertainties according to the acceptance fit covariance matrices. When determining the systematic uncertainty due to the use of a fixed f L value in the angular analysis, the acceptance fits are performed many times with f L varied randomly within its total measured uncertainty.
For values of cos θ X close to −1, which correspond to the smallest photon momentum values, the acceptance function becomes slightly negative due to limited data statistics in this region. A fiducial cut of cos θ X > −0.9 is applied to data in order to remove the region of negative modelled acceptance; this requirement is found to have a negligible impact on the measured helicity observables.

Acceptance function for χ
In Eq. (2), all of the angular terms that are sensitive to the relative magnitudes and phases of the transverse amplitudes have a dependence on the angle χ. As such, no information on the χ acceptance can be derived from data. To determine the χ acceptance, the reconstructed χ distribution in a sample of fully-simulated B 0 → D * − D * + s decays passing all selection requirements is compared to a generated χ distribution produced using RapidSim with the same model parameters but no acceptance or resolution effects. For this comparison, the simulated samples are generated with the f L value measured in Sect. 5, with |H − | = |H + | and φ + = φ − = 0. The binned χ distributions are shown in Fig. 5 (left), where good agreement between the reconstructed and generated distributions is found. This indicates that the reconstructed χ distribution is not strongly modified by acceptance effects. To model residual acceptance effects, the reconstructed to generated χ ratio is fitted with a second-order polynomial, as shown in Fig. 5 (right). This function is employed as a fixed correction PDF in the angular fit, and the polynomial coefficients are varied within their uncertainties to determine the systematic uncertainties on the helicity parameters. In this procedure, the correlations between the polynomial coefficients are accounted for using the acceptance fit covariance matrix.

Angular fit to data
To measure |H − |, φ − , and φ + , an unbinned maximum-likelihood fit to the three-dimensional angular distribution of signal-weighted data is performed using zfit [28]. For the fit, the B 0 → D * − (D * + s → D + s γ) candidates from the m(D * − D * + s ) fit in Sect. 6 are used with per-candidate signal weights assigned. The longitudinal polarisation amplitude, H 0 , is assigned a fixed magnitude |H 0 | using the value of f L measured in Sect. 5, and its phase is set to the arbitrary value φ 0 = 0. The parameter |H + | is fully determined by the normalisation of the helicity amplitudes to unity. The signal density at each point in angular phase space is described using Eq. (2) multiplied by acceptance functions in each of the decay angles. To determine the statistical uncertainties of the observables, the fit applies an asymptotic correction to the covariance matrix as detailed in Ref. [29], which correctly accounts for the use of signal-weighted data. The distributions for each decay angle are shown in Fig. 6, with the one-dimensional fit projections overlaid.
Studies with pseudoexperiments are performed to determine the level of bias present in the results, where pull distributions of mean µ x P and width σ x P are constructed for each observable x. The pull distributions for each helicity observable are found to follow Gaussian distributions closely, where σ |H − | P is consistent with unity. However, σ φ + P = 1.14 ± 0.02 and σ φ − P = 1.12 ± 0.02, indicating that the default fit uncertainties for these observables are underestimated. The mean values of the pulls for the transverse phases are consistent with zero, but µ |H − | P = −0.14 ± 0.02. These biases are traced to the finite size of the fitted dataset, and are found to resolve when pseudoexperiment datasets containing more events than are present in data are generated. The values of µ x P and σ x P are used to correct the default fit results x ± σ x as follows where x c ± σ c x are the corrected fit results. In Sect. 10, the results for |H − |, φ + , and φ − are quoted after this correction procedure.

Systematic uncertainties
The values of R, r(B 0 s ), and f L measured in Sect. 5 are subject to systematic uncertainties due to limited knowledge of the shape parameters, branching fractions, and relative efficiency corrections used in the fit. To determine these systematic uncertainties, the m(D * − D + s ) fit to data is performed many times with the parameters randomly varied within their prescribed uncertainties according to Gaussian distributions. This procedure is performed separately for the shape parameters, branching fractions, and efficiency corrections, and the total systematic uncertainties calculated as the sum in quadrature of these contributions. The systematic uncertainties are summarised in Table 1.
The observables |H − |, φ + , and φ − measured in the angular fit are subject to several systematic uncertainties. Firstly, the angular analysis is performed at a fixed value of f L , which is used as input in the cos θ D and cos θ X acceptance fits and also to set the value of |H 0 | in the angular fit. To determine the systematic uncertainty, the angular analysis is repeated many times with f L varied within its total uncertainty; the standard deviations of the helicity observable results are taken as the systematic uncertainties. In this procedure, the varied f L value used in the acceptance fits is shared with the angular fit to ensure consistency. A small systematic uncertainty is also assigned for the use of signal-weighted data, where the angular fit is run many times while varying the signal weights within the signal yield uncertainties from the m(D * − D * + s ) fit. To determine the systematic uncertainty from the use of finite samples to obtain the acceptance functions, the acceptance coefficients are varied within their uncertainties according to the acceptance fit covariance matrices. Finally, the angular analysis is repeated with an alternative background model in the m(D * − D * + s ) fit, and the differences in central value for each helicity observable are assigned as a systematic uncertainty. The contributing systematic uncertainties are summarised in Table 2.   where the first uncertainty is statistical and the second is systematic. The corresponding magnitude of the longitudinal helicity amplitude, given by

Systematic uncertainty
This information is used to measure the remaining helicity observables in an angular fit to fully reconstructed B 0 → D * − (D * + s → D + s γ) decays, obtaining where the quoted value and uncertainties for |H + | are fully determined by the normalisation of the three helicity amplitudes to unity. The measurement of f L is consistent with and more precise than the current world average, f L = 0.52 ± 0.05 [1,2]. The transverse amplitude magnitudes and phases are measured for the first time, where both phases are consistent with zero but the magnitudes differ from each other at the level of nine standard deviations. It is noted that |H 0 | > |H + | > |H − |, which is expected from quark-helicity conservation in B decays involving a b → c quark transition. In such decays, the V − A nature of the weak interaction causes the longitudinal component to dominate. The inequality is stronger for decays involving light vector mesons [7], but also appears to be satisfied in B 0 → D * − D * + s decays where two vector charm mesons are produced. This helicity hierarchy is not observed in decays dominated by penguin amplitudes such as B 0 → φK * 0 , where the longitudinal and transverse components are found to have roughly equal amplitudes [30][31][32][33].
The branching fraction ratio of where the first uncertainty is statistical and the second is systematic. This result is in agreement with, but considerably more precise than, the current world-average value R = 2.07±0. This knowledge is then used in a subsequent angular fit to fully reconstructed data in order to measure the remaining helicity observables. The measurement of f L is consistent with and more precise than the current world-average value, while the magnitudes and phases of the transverse helicity amplitudes are measured for the first time. The pattern of helicity amplitude magnitudes is found to align with expectations from quark-helicity conservation for tree-level B decays involving a b → c transition. The B 0 → D * − D * + s decay is a large background in B 0 → D * − τ + ν τ analyses, particularly when the τ + decays hadronically. Analyses aiming to measure angular observables in B 0 → D * − τ + ν τ decays must control the angular distributions of prominent hadronic backgrounds such as B 0 → D * − D * + s , and the results presented herein will help to significantly reduce background model uncertainties in future measurements.

Appendices
A Relationship between m(D * − D + s ) and cos θ X In Fig. 7, the relationship between m(D * − D + s ) and cos θ X is shown for fully reconstructed B 0 → D * − (D * + s → D + s γ) simulated decays. A strong negative correlation is evident, due to a common dependence on the kinematics of the photon produced in the D * + s decay. The one-dimensional decay rate as a function of cos θ X is given by Eq. (3), where separate transverse and longitudinal components contribute; these components are illustrated in Fig. 8. Due to the co-dependence of m(D * − D + s ) and cos θ X , the different angular forms for transverse and longitudinal signal give rise to different m(D * − D + s ) distributions. This is illustrated in Fig. 9, where RapidSim samples of transverse and longitudinal signal are shown. The fits used to derive shape parameters for the m(D * − D + s ) fit are overlaid.