Observation of the $B_s^0 \rightarrow D^{\ast \pm} D^{\mp}$ decay

A search for the $B^0_s \!\rightarrow D^{*\pm} D^\mp$ decay is performed using proton-proton collision data at centre-of-mass energies of $7$, $8$ and $13\,\text{TeV}$ collected by the LHCb experiment, corresponding to an integrated luminosity of $9\,\text{fb}^{-1}$. The decay is observed with a high significance and its branching fraction relative to the $B^0 \!\rightarrow D^{*\pm} D^\mp$ decay is measured to be \begin{align*} \frac{\mathcal{B}(B_s^0 \rightarrow D^{\ast \pm} D^{\mp}) }{\mathcal{B}(B^0 \rightarrow D^{\ast \pm} D^{\mp}) } = 0.137 \pm 0.017 \pm 0.002 \pm 0.006 \,, \end{align*} where the first uncertainty is statistical, the second systematic and the third is due to the uncertainty on the ratio of the $B_s^0$ and $B^0$ hadronisation fractions.

Both  Fig. 2. In contrast, the B 0 s → D * ± D ∓ decay is forbidden at tree level and its dominant contributions originate from W -exchange and penguin-annihilation diagrams shown in Fig. 2, or from rescattering of intermediate states [16]. Thus, the B 0 s → D * ± D ∓ decay can be used to estimate the subleading contributions of the B 0 → D * ± D ∓ decay mode.
The B 0 s → D * ± D ∓ decay has not been previously observed, but an excess of possible B 0 s → D * ± D ∓ candidates was seen in a recent measurement of CP violation in B 0 → D * ± D ∓ decays by the LHCb experiment [7]. Assuming prominent contributions from rescattering of e.g. D * ± s D ∓ s states, the branching fraction is predicted to be (6.1 ± 3.6) × 10 −5 [16]. A perturbative QCD approach predicts a much larger branching fraction of (3.6 ± 0.6) × 10 −3 [17].
This paper presents the first observation of the B 0 s → D * + D − and B 0 s → D * − D + decays, which have indistinguishable final states. Throughout this paper these decays are treated together and charge conjugation is applied. The branching fraction of the B 0 s → D * ± D ∓ decay is measured relative to the B 0 → D * ± D ∓ decay. Since both decay channels have the same final state, the experimental systematic uncertainties on the ratio of branching fractions are expected to be small. The measurement uses proton-proton (pp) collision data collected with the LHCb detector in the years 2011, 2012, and 2015-2018 at centre-of-mass energies of 7, 8, and 13 TeV, respectively, corresponding to an integrated luminosity of 9 fb −1 .

Detector and simulation
The LHCb detector [18,19] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b or c quarks. The detector includes a high-precision tracking system consisting of a siliconstrip vertex detector surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. The tracking system provides a measurement of the momentum, p, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV/c. The minimum distance of a track to a primary pp collision vertex (PV), the impact parameter (IP), is measured with a resolution of (15 + 29/p T ) µm, where p T is the component of the momentum transverse to the beam, in GeV/c. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers. Simulated data samples are used to train a multivariate algorithm, model the shapes of mass distributions and calculate efficiencies. In the simulation, pp collisions are generated using Pythia [20] with a specific LHCb configuration [21]. Decays of unstable particles are described by EvtGen [22], in which final-state radiation is generated using Photos [23]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [24] as described in Ref. [25].
The distributions of particle identification (PID) variables do not match perfectly between simulation and data. This difference is corrected using an approach where functions are constructed that transform the simulated PID response to match calibration samples of recorded data. This is based on a four-dimensional kernel density estimation for distributions in PID value, p T and η of the track and the event multiplicity [26].

Candidate selection
Due to varying data-taking conditions, the data samples for the three periods 2011-2012, 2015-2016 and 2017-2018 are treated differently. The online event selection is performed by a trigger, which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction. At the hardware trigger stage, events are required to have a muon with high p T or a hadron, photon or electron with high transverse energy in the calorimeters. The software trigger requires a two-, three-or four-track secondary vertex with a significant displacement from any PV. At least one charged particle must have a large transverse momentum and be inconsistent with originating from any PV. A multivariate algorithm [27,28] is used for the identification of secondary vertices consistent with the decay of a b hadron.
The B 0 (s) → D * ± D ∓ candidates are reconstructed through the decays D * + → D 0 π + with D 0 → K − π + and D − → K + π − π − . The tracks of the final-state particles are required to have a good quality, fulfil loose PID criteria, and have a high χ 2 IP value with respect to any PV, where χ 2 IP is defined as the difference in the vertex-fit χ 2 of a given PV reconstructed with and without the particle being considered. The probability of a candidate being a duplicate track is required to be small. Additionally, the distance of closest approach between all possible combinations of tracks is required to be small. The reconstructed masses of the D * + , D 0 and D − candidates are required to lie inside a mass window of ±50 MeV/c 2 around their known values [29], and the difference of the reconstructed masses between the D * + and D 0 candidates is required to be smaller than 150 MeV/c 2 . The ratio of the D − decay time and its uncertainty, t/σ t , is required to be larger than −1. The B 0 (s) candidate is reconstructed by combining the D * ± and D ∓ candidates to form a common vertex. In case multiple PVs are reconstructed in the same event, the PV for which the B 0 (s) candidate has the lowest χ 2 IP is assigned as the associated PV. The sum of the transverse momenta of the decay products of the B 0 (s) candidate is required to be larger than 5 GeV/c and the χ 2 IP of the B 0 (s) candidate for the associated PV is required to be small. Furthermore, the decay time of the B 0 (s) candidate is required to be larger than 0.2 ps. Candidates are retained if the mass of the D * ± D ∓ system, m D * ± D ∓ , is in the range 5000 MeV/c 2 < m D * ± D ∓ < 5600 MeV/c 2 .
Background contributions to D + candidates arise when kaons or protons stemming from hadronic decays of D + s and Λ + c hadrons are misidentified as pions. A combination of mass and PID requirements is used to suppress contributions from to a negligible level. The mass of the K − π + π + system from the D + candidate is recalculated using a kaon (proton) mass hypothesis for either of the pions. The candidate is rejected if the pion has a high probability to be identified as a kaon (proton) and the recomputed mass is compatible with the known D + s (Λ + c ) mass [29]. Background contributions that arise from φ(1020) → K − K + transitions in the D + s decay chain are further suppressed by rejecting candidates if the pion has a high probability to be identified as a kaon and the mass of the K − π + system, where the kaon mass is assigned to either of the pions from the D + decay, is compatible with the known φ(1020) meson mass [29]. Decays of B 0 (s) mesons of the form B 0 (s) → D * − h − h + h + are suppressed by ensuring that the B 0 (s) and D + decay vertices are well separated. Partially reconstructed decays, i.e. decays where one or more final-state particles are not reconstructed, contribute to the lower-mass sideband and are accounted for in the fit to the data.
To suppress combinatorial background from random combinations of final-state tracks, a boosted decision tree (BDT) classifier [30,31], implemented in the TMVA toolkit [32], utilising the AdaBoost method is used. The BDT classifier is trained using simulated B 0 → D * ± D ∓ decays as signal proxy and the upper-mass sideband (5450 MeV/c 2 < m D * ± D ∓ < 6000 MeV/c 2 ) as background proxy to avoid contributions from signal and partially reconstructed decays. For each data-taking period a k-folding technique [33] with k = 5 is adopted. The following variables are used in the training of the BDT classifier: the mass difference of the D * + and D 0 candidates; PID variables of the final-state particles of the D − candidate decay, the kaon coming from the D 0 candidate decay and the pion coming from the D * + decay; the transverse momenta of the B 0 (s) candidate and the kaon from the D − decay; t/σ t of the D − candidate; the χ 2 IP of the B 0 (s) and D − candidates; the χ 2 of the flight distance of the D − and D 0 candidates and the χ 2 of a kinematic fit to the whole decay chain. The optimal requirement on the BDT response (also referred to as working point) is determined by maximising the figure-of-merit ε/(a/2 + √ N B ) [34]. The efficiency of signal decays, ε, for a specific working point is determined by fits to the data around the known B 0 mass [29] before and after the application of the BDT requirement. The number of background candidates in the B 0 s signal region, N B , is estimated with the upper-mass sideband, and the targeting significance in numbers of the standard deviation, a, is set to three. A three-dimensional scan of the figure-of-merit in the three data-taking periods is conducted, resulting in slightly different working points.
Afterwards, a single candidate is selected randomly from each event containing multiple candidates. The total selection efficiencies of the B 0 s → D * ± D ∓ and B 0 → D * ± D ∓ decays are needed for the calculation of the branching fraction ratio and are calculated using simulated samples.

Candidate mass fit
To improve the B 0 (s) mass resolution, a kinematic fit is applied to the decay chain, where the masses of the D * + , D 0 and D − candidates are constrained to their known values [29]. An unbinned maximum-likelihood fit to the mass distribution of the D * ± D ∓ system is performed separately for each data-taking period to determine the number of signal candidates. To determine the significance of the observation of the B 0 s → D * ± D ∓ decay, the three likelihoods are added together. The fit model consists of the signal B 0 → D * ± D ∓ and B 0 s → D * ± D ∓ decays, a contribution from combinatorial background and components for partially reconstructed B 0 → D * ± D * ∓ and B 0 s → D * ± D * ∓ decays, where one of the D * mesons decays into a charged D meson and an unreconstructed π 0 meson or photon. The B 0 → D * ± D ∓ component is modelled by the sum of two Crystal Ball functions [35], with the same mean but different widths and tail parameters. The B 0 s → D * ± D ∓ component is described by the same model but with the mean shifted by the difference of the known B 0 s and B 0 masses [29]. The parameters of the Crystal Ball functions are determined using fits to simulated B 0 → D * ± D ∓ decays, apart from their mean and a single scale factor, which corrects the widths for inaccuracies in simulation. The combinatorial background component is described by an exponential function. The functional forms of the B 0 → D * ± D * ∓ and B 0 s → D * ± D * ∓ contributions depend on the polarisation of the D * ± mesons and are modelled using simulated decays with a combination of functions corresponding to pure longitudinal and transverse polarisations. For a longitudinally polarised D * ± meson the shape is a double peak, in contrast to a single broad peak for the case of a transversely polarised D * ± system. The free parameters in the fit are the mass of the B 0 → D * ± D ∓ peak, the scaling factor, the slope of the exponential function,   Two sources of systematic uncertainty on the efficiency ratio are considered. The first is caused by the finite size of the simulated data samples. The second originates in the use of PID variables, whose distributions do not match perfectly between data and simulation. This uncertainty is determined by choosing a different kernel density estimation in the transformation of the simulated PID response and calculating the difference of the resulting efficiency ratio.
The systematic uncertainties on the signal yields due to the fit models are evaluated using pseudoexperiments. A systematic uncertainty due to the choice of the signal model and the assumption that the B 0 and B 0 s distributions have identical shape in the mass fit is evaluated. Candidates are generated with a mass distribution described by a Hypatia function [38] with different parameters for the B 0 and B 0 s models. The values of the parameters of the Hypatia function are determined by a fit to simulated B 0 → D * ± D ∓ and B 0 s → D * ± D ∓ data, respectively. All yields and background parameters are set to the values found in the default fit to the data. The generated candidates are then fitted with the default model and the result for the branching fraction ratio is calculated for each fit. The mean of all experiments and its residual are calculated for the three periods separately. The residual and its uncertainty are summed in quadrature and the square root is assigned as the systematic uncertainty.
In addition, a systematic uncertainty due to the model of the combinatorial background is evaluated. Pseudoexperiments are used with parameters of the signal and partially reconstructed background models set to the values found in the default result. The slope of the exponential function is extracted by a fit to the data, where a looser BDT requirement is applied to enhance the contribution of the combinatorial background. The generated candidates are fitted with the default model and the systematic uncertainty is calculated in the same way as the systematic uncertainty for the signal model.
To determine the systematic uncertainty on the combined result, the systematic uncertainties related to the PID variables, signal model and background model are assumed to be fully correlated between the data-taking periods when calculating the weighted average. The systematic uncertainties due to the finite size of the simulated data are assumed to be uncorrelated. All systematic uncertainties are added in quadrature to obtain the total systematic uncertainty per data-taking period, and are listed together with their contributions in Table 1.

Results
The B 0 s → D * ± D ∓ decay is observed with a high significance, which is calculated using Wilk's theorem [39]. The relative branching ratio is calculated using the expression where the B 0 s and B 0 yields, N B 0 s and N B 0 , are determined from the fit to the D * ± D ∓ mass distribution. The ratios of the B 0 s → D * ± D ∓ and B 0 → D * ± D ∓ selection efficiencies, ε B 0 s /ε B 0 , calculated using simulation samples for the data-taking periods 2011-2012, where the first uncertainty is statistical, the second systematic and the third is due to the uncertainty of the fragmentation fraction ratio f s /f d . Using the quadratic sums of the uncertainties as weights and including the correlation of the systematic uncertainties, the average of these measurements is = 0.137 ± 0.017 ± 0.002 ± 0.006 .
Using the measured value of the B 0 → D * ± D ∓ branching fraction from Ref.
This result assumes an average B 0 s lifetime for the B 0 s → D * ± D ∓ decay. The heavy and light eigenstates of the B 0 s meson have significantly different lifetimes. As the selection efficiency depends on the lifetime, correction factors for the efficiency are calculated following the procedure outlined in Ref. [40] that considers either a purely heavy or a purely light B 0 s eigenstate. The correction factors are found to be compatible for all data-taking periods. The integrated correction factors are 1.042 (0.949) for a purely heavy (light) B 0 s eigenstate. The equivalent effect in the selection efficiency of the B 0 decay is negligible due to the small value of ∆Γ d [29].

Conclusion
This paper presents the first observation of the B 0 s → D * ± D ∓ decay along with the measurement of its branching fraction relative to the B 0 → D * ± D ∓ decay. The analysis is performed with data collected by the LHCb experiment in the years 2011, 2012, and 2015 to 2018, corresponding to an integrated luminosity of 9 fb −1 . The combined ratio of branching fractions for all data-taking periods is determined to be where the first uncertainty is statistical, the second systematic and the third is due to the uncertainty of the fragmentation fraction ratio f s /f d . The B 0 s → D * ± D ∓ branching fraction is determined to be B(B 0 s → D * ± D ∓ ) = (8.41 ± 1.02 ± 0.12 ± 0.39 ± 0.79) × 10 −5 , where the fourth uncertainty is due to the B 0 → D * ± D ∓ branching fraction [29]. The result is in agreement with predictions from other B-meson decays [16] and disagrees with predictions from a perturbative QCD approach [17]. It can be used to constrain subleading contributions in the measurement of the CP -violating parameter sin(2β) with B 0 → D * ± D ∓ decays.