Observation of the $$ {B}_s^0 $$ → D*+D*− decay

<jats:title>A<jats:sc>bstract</jats:sc>
                     </jats:title><jats:p>The first observation of the <jats:inline-formula><jats:alternatives><jats:tex-math>$$ {B}_s^0 $$</jats:tex-math><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
                  <mml:msubsup>
                    <mml:mi>B</mml:mi>
                    <mml:mi>s</mml:mi>
                    <mml:mn>0</mml:mn>
                  </mml:msubsup>
                </mml:math></jats:alternatives></jats:inline-formula><jats:italic>→ D</jats:italic><jats:sup>∗+</jats:sup><jats:italic>D</jats:italic><jats:sup>∗<jats:italic>−</jats:italic></jats:sup> decay and the measurement of its branching ratio relative to the <jats:italic>B</jats:italic><jats:sup>0</jats:sup><jats:italic>→ D</jats:italic><jats:sup>∗+</jats:sup><jats:italic>D</jats:italic><jats:sup>∗<jats:italic>−</jats:italic></jats:sup> decay are presented. The data sample used corresponds to an integrated luminosity of 9 fb<jats:sup><jats:italic>−</jats:italic>1</jats:sup> of proton-proton collisions recorded by the LHCb experiment at centre-of-mass energies of 7, 8 and 13 TeV between 2011 and 2018. The decay is observed with more than 10 standard deviations and the time-integrated ratio of branching fractions is determined to be<jats:disp-formula><jats:alternatives><jats:tex-math>$$ \frac{\mathcal{B}\left({B}_s^0\to {D}^{\ast +}{D}^{\ast -}\right)}{\mathcal{B}\left({B}^0\to {D}^{\ast +}{D}^{\ast -}\right)}=0.269\pm 0.032\pm 0.011\pm 0.008, $$</jats:tex-math><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
                  <mml:mfrac>
                    <mml:mrow>
                      <mml:mi>B</mml:mi>
                      <mml:mfenced>
                        <mml:mrow>
                          <mml:msubsup>
                            <mml:mi>B</mml:mi>
                            <mml:mi>s</mml:mi>
                            <mml:mn>0</mml:mn>
                          </mml:msubsup>
                          <mml:mo>→</mml:mo>
                          <mml:msup>
                            <mml:mi>D</mml:mi>
                            <mml:mrow>
                              <mml:mo>∗</mml:mo>
                              <mml:mo>+</mml:mo>
                            </mml:mrow>
                          </mml:msup>
                          <mml:msup>
                            <mml:mi>D</mml:mi>
                            <mml:mrow>
                              <mml:mo>∗</mml:mo>
                              <mml:mo>−</mml:mo>
                            </mml:mrow>
                          </mml:msup>
                        </mml:mrow>
                      </mml:mfenced>
                    </mml:mrow>
                    <mml:mrow>
                      <mml:mi>B</mml:mi>
                      <mml:mfenced>
                        <mml:mrow>
                          <mml:msup>
                            <mml:mi>B</mml:mi>
                            <mml:mn>0</mml:mn>
                          </mml:msup>
                          <mml:mo>→</mml:mo>
                          <mml:msup>
                            <mml:mi>D</mml:mi>
                            <mml:mrow>
                              <mml:mo>∗</mml:mo>
                              <mml:mo>+</mml:mo>
                            </mml:mrow>
                          </mml:msup>
                          <mml:msup>
                            <mml:mi>D</mml:mi>
                            <mml:mrow>
                              <mml:mo>∗</mml:mo>
                              <mml:mo>−</mml:mo>
                            </mml:mrow>
                          </mml:msup>
                        </mml:mrow>
                      </mml:mfenced>
                    </mml:mrow>
                  </mml:mfrac>
                  <mml:mo>=</mml:mo>
                  <mml:mn>0.269</mml:mn>
                  <mml:mo>±</mml:mo>
                  <mml:mn>0.032</mml:mn>
                  <mml:mo>±</mml:mo>
                  <mml:mn>0.011</mml:mn>
                  <mml:mo>±</mml:mo>
                  <mml:mn>0.008</mml:mn>
                  <mml:mo>,</mml:mo>
                </mml:math></jats:alternatives></jats:disp-formula></jats:p><jats:p>where the first uncertainty is statistical, the second systematic and the third due to the uncertainty of the fragmentation fraction ratio <jats:italic>f</jats:italic><jats:sub><jats:italic>s</jats:italic></jats:sub><jats:italic>/f</jats:italic><jats:sub><jats:italic>d</jats:italic></jats:sub>. The <jats:inline-formula><jats:alternatives><jats:tex-math>$$ {B}_s^0 $$</jats:tex-math><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
                  <mml:msubsup>
                    <mml:mi>B</mml:mi>
                    <mml:mi>s</mml:mi>
                    <mml:mn>0</mml:mn>
                  </mml:msubsup>
                </mml:math></jats:alternatives></jats:inline-formula><jats:italic>→ D</jats:italic><jats:sup>*+</jats:sup><jats:italic>D</jats:italic><jats:sup>*<jats:italic>−</jats:italic></jats:sup> branching fraction is calculated to be<jats:disp-formula><jats:alternatives><jats:tex-math>$$ \mathcal{B}\left({B}_s^0\to {D}^{\ast +}{D}^{\ast -}\right)=\left(2.15\pm 0.26\pm 0.09\pm 0.06\pm 0.16\right)\times {10}^{-4}, $$</jats:tex-math><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
                  <mml:mi>B</mml:mi>
                  <mml:mfenced>
                    <mml:mrow>
                      <mml:msubsup>
                        <mml:mi>B</mml:mi>
                        <mml:mi>s</mml:mi>
                        <mml:mn>0</mml:mn>
                      </mml:msubsup>
                      <mml:mo>→</mml:mo>
                      <mml:msup>
                        <mml:mi>D</mml:mi>
                        <mml:mrow>
                          <mml:mo>∗</mml:mo>
                          <mml:mo>+</mml:mo>
                        </mml:mrow>
                      </mml:msup>
                      <mml:msup>
                        <mml:mi>D</mml:mi>
                        <mml:mrow>
                          <mml:mo>∗</mml:mo>
                          <mml:mo>−</mml:mo>
                        </mml:mrow>
                      </mml:msup>
                    </mml:mrow>
                  </mml:mfenced>
                  <mml:mo>=</mml:mo>
                  <mml:mfenced>
                    <mml:mrow>
                      <mml:mn>2.15</mml:mn>
                      <mml:mo>±</mml:mo>
                      <mml:mn>0.26</mml:mn>
                      <mml:mo>±</mml:mo>
                      <mml:mn>0.09</mml:mn>
                      <mml:mo>±</mml:mo>
                      <mml:mn>0.06</mml:mn>
                      <mml:mo>±</mml:mo>
                      <mml:mn>0.16</mml:mn>
                    </mml:mrow>
                  </mml:mfenced>
                  <mml:mo>×</mml:mo>
                  <mml:msup>
                    <mml:mn>10</mml:mn>
                    <mml:mrow>
                      <mml:mo>−</mml:mo>
                      <mml:mn>4</mml:mn>
                    </mml:mrow>
                  </mml:msup>
                  <mml:mo>,</mml:mo>
                </mml:math></jats:alternatives></jats:disp-formula></jats:p><jats:p>where the fourth uncertainty is due to the <jats:italic>B</jats:italic><jats:sup>0</jats:sup><jats:italic>→ D</jats:italic><jats:sup><jats:italic>*</jats:italic>+</jats:sup><jats:italic>D</jats:italic><jats:sup><jats:italic>*−</jats:italic></jats:sup> branching fraction. These results are calculated using the average <jats:inline-formula><jats:alternatives><jats:tex-math>$$ {B}_s^0 $$</jats:tex-math><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
                  <mml:msubsup>
                    <mml:mi>B</mml:mi>
                    <mml:mi>s</mml:mi>
                    <mml:mn>0</mml:mn>
                  </mml:msubsup>
                </mml:math></jats:alternatives></jats:inline-formula> meson lifetime in simulation. Correction factors are reported for scenarios where either a purely heavy or a purely light <jats:inline-formula><jats:alternatives><jats:tex-math>$$ {B}_s^0 $$</jats:tex-math><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
                  <mml:msubsup>
                    <mml:mi>B</mml:mi>
                    <mml:mi>s</mml:mi>
                    <mml:mn>0</mml:mn>
                  </mml:msubsup>
                </mml:math></jats:alternatives></jats:inline-formula> eigenstate is considered.</jats:p>


Introduction
The family of B-meson decays into a pair of open-charm mesons are sensitive to elements of the Cabibbo-Kobayashi-Maskawa matrix [1,2]. While The B 0 s → D * ± D ∓ decay has not been previously observed, but an excess of possible B 0 s → D * ± D ∓ candidates was seen in a recent measurement of CP violation in B 0 → D * ± D ∓ decays by the LHCb experiment [8]. Assuming prominent contributions from rescattering of e.g. D * ± s D ∓ s states, the branching fraction is predicted to be (6.1 ± 3.6) × 10 −5 [17]. A perturbative QCD approach predicts a much larger branching fraction of (3.6 ± 0.6) × 10 decay is measured relative to the B 0 → D * ± D ∓ decay. Since both decay channels have the same final state, the experimental systematic uncertainties on the ratio of branching fractions are expected to be small. The measurement uses proton-proton (pp) collision data collected with the LHCb detector in the years 2011, 2012, and 2015-2018 at centre-of-mass energies of 7, 8, and 13 TeV, respectively, corresponding to an integrated luminosity of 9 fb −1 .

Detector and simulation
The LHCb detector [19, 20] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b or c quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of siliconstrip detectors and straw drift tubes placed downstream of the magnet.  [24], in which final-state radiation is generated using Photos [25]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [26,27] as described in ref. [28].
The distributions of particle identification (PID) variables do not match perfectly between simulation and data. This difference is corrected using an approach where functions are constructed that transform the simulated PID response to match calibration samples of recorded data. This is based on a four-dimensional kernel density estimation for distributions in PID value, p T and η of the track and the event multiplicity [29].

Candidate selection
Due to varying data-taking conditions, the data samples for the three periods 2011-2012, 2015-2016 and 2017-2018 are treated differently. The online event selection is performed by a trigger, which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction. At the hardware trigger stage, events are required to have a muon with high p T or a hadron, photon or electron with high transverse energy in the calorimeters. The software trigger requires a two-, three-or four-track secondary vertex with a significant displacement from any PV. At least one charged particle must have a large transverse momentum and be inconsistent with originating from any PV. A multivariate algorithm [30,31] is used for the identification of secondary vertices consistent with the decay of a b hadron.
The B 0 (s) → D * ± D ∓ candidates are reconstructed through the decays D * + → D 0 π + with D 0 → K − π + and D − → K + π − π − . The tracks of the final-state particles are required to have a good quality, fulfil loose PID criteria, and have a high χ 2 IP value with respect to any PV, where χ 2 IP is defined as the difference in the vertex-fit χ 2 of a given PV reconstructed with and without the particle being considered. The probability of a candidate being a duplicate track is required to be small. Additionally, the distance of closest approach between all possible combinations of tracks is required to be small. The reconstructed masses of the D * + , D 0 and D − candidates are required to lie inside a mass window of ±50 MeV/c 2 around their known values [32], and the difference of the reconstructed masses between the D * + and D 0 candidates is required to be smaller than 150 MeV/c 2 . The ratio of the D − decay time and its uncertainty, t/σ t , is required to be larger than −1. The B 0 (s) candidate is reconstructed by combining the D * ± and D ∓ candidates to form a common vertex. In case multiple PVs are reconstructed in the same event, the PV for which the B 0 (s) candidate has the lowest χ 2 IP is assigned as the associated PV. The sum of the transverse momenta of the decay products of the B 0 (s) candidate is required to be larger than 5 GeV/c and the χ 2 IP of the B 0 (s) candidate for the associated PV is required to be small. Furthermore, the decay time of -3 -

JHEP03(2021)099
the B 0 (s) candidate is required to be larger than 0.2 ps. Candidates are retained if the mass of the D * ± D ∓ system, m D * ± D ∓ , is in the range 5000 MeV/c 2 < m D * ± D ∓ < 5600 MeV/c 2 .
Background contributions to D + candidates arise when kaons or protons stemming from hadronic decays of D + s and Λ + c hadrons are misidentified as pions. A combination of mass and PID requirements is used to suppress contributions from to a negligible level. The mass of the K − π + π + system from the D + candidate is recalculated using a kaon (proton) mass hypothesis for either of the pions. The candidate is rejected if the pion has a high probability to be identified as a kaon (proton) and the recomputed mass is compatible with the known D + s (Λ + c ) mass [32]. Background contributions that arise from φ(1020) → K − K + transitions in the D + s decay chain are further suppressed by rejecting candidates if the pion has a high probability to be identified as a kaon and the mass of the K − π + system, where the kaon mass is assigned to either of the pions from the D + decay, is compatible with the known φ(1020) meson mass [32]. Decays of B 0 (s) mesons of the form B 0 (s) → D * − h − h + h + are suppressed by ensuring that the B 0 (s) and D + decay vertices are well separated. Partially reconstructed decays, i.e. decays where one or more final-state particles are not reconstructed, contribute to the lower-mass sideband and are accounted for in the fit to the data.
To suppress combinatorial background from random combinations of final-state tracks, a boosted decision tree (BDT) classifier [33,34], implemented in the TMVA toolkit [35,36], utilising the AdaBoost method is used. The BDT classifier is trained using simulated B 0 → D * ± D ∓ decays as signal proxy and the upper-mass sideband (5450 MeV/c 2 < m D * ± D ∓ < 6000 MeV/c 2 ) as background proxy to avoid contributions from signal and partially reconstructed decays. For each data-taking period a k-folding technique [37] with k = 5 is adopted. The following variables are used in the training of the BDT classifier: the mass difference of the D * + and D 0 candidates; PID variables of the final-state particles of the D − candidate decay, the kaon coming from the D 0 candidate decay and the pion coming from the D * + decay; the transverse momenta of the B 0 (s) candidate and the kaon from the D − decay; t/σ t of the D − candidate; the χ 2 IP of the B 0 (s) and D − candidates; the χ 2 of the flight distance of the D − and D 0 candidates and the χ 2 of a kinematic fit to the whole decay chain.
The optimal requirement on the BDT response (also referred to as working point) is determined by maximising the figure-of-merit ε/(a/2 + √ N B ) [38]. The efficiency of signal decays, ε, for a specific working point is determined by fits to the data around the known B 0 mass [32] before and after the application of the BDT requirement. The number of background candidates in the B 0 s signal region, N B , is estimated with the upper-mass sideband, and the targeting significance in numbers of the standard deviation, a, is set to three. A three-dimensional scan of the figure-of-merit in the three data-taking periods is conducted, resulting in slightly different working points.
Afterwards, a single candidate is selected randomly from each event containing multiple candidates. The total selection efficiencies of the B 0 s → D * ± D ∓ and B 0 → D * ± D ∓ decays are needed for the calculation of the branching fraction ratio and are calculated using simulated samples.

Candidate mass fit
To improve the B 0 (s) mass resolution, a kinematic fit is applied to the decay chain, where the masses of the D * + , D 0 and D − candidates are constrained to their known values [32]. An unbinned maximum-likelihood fit to the mass distribution of the D * ± D ∓ system is performed separately for each data-taking period to determine the number of signal candidates. To determine the significance of the observation of the B 0 s → D * ± D ∓ decay, the three likelihoods are added together. The fit model consists of the signal B 0 → D * ± D ∓ and B 0 s → D * ± D ∓ decays, a contribution from combinatorial background and components for partially reconstructed B 0 → D * ± D * ∓ and B 0 s → D * ± D * ∓ decays, where one of the D * mesons decays into a charged D meson and an unreconstructed π 0 meson or photon. The B 0 → D * ± D ∓ component is modelled by the sum of two Crystal Ball functions [39], with the same mean but different widths and tail parameters. The B 0 s → D * ± D ∓ component is described by the same model but with the mean shifted by the difference of the known B 0 s and B 0 masses [32]. The parameters of the Crystal Ball functions are determined using fits to simulated B 0 → D * ± D ∓ decays, apart from their mean and a single scale factor, which corrects the widths for inaccuracies in simulation. The combinatorial background component is described by an exponential function. The functional forms of the B 0 → D * ± D * ∓ and B 0 s → D * ± D * ∓ contributions depend on the polarisation of the D * ± mesons and are modelled using simulated decays with a combination of functions corresponding to pure longitudinal and transverse polarisations. For a longitudinally polarised D * ± meson the shape is a double peak, in contrast to a single broad peak for the case of a transversely polarised D * ± system. The free parameters in the fit are the mass of the B 0 → D * ± D ∓ peak, the scaling factor, the slope of the exponential function, the relative fractions between longitudinally and transversely polarised D * ± mesons in B 0 → D * ± D * ∓ and B 0 s → D * ± D * ∓ decays, and the yields of all shapes. Pseudoexperiments are used to validate that the model provides unbiased results. LHCb Data Total Two sources of systematic uncertainty on the efficiency ratio are considered. The first is caused by the finite size of the simulated data samples. The second originates in the use of PID variables, whose distributions do not match perfectly between data and simulation. This uncertainty is determined by choosing a different kernel density estimation in the transformation of the simulated PID response and calculating the difference of the resulting efficiency ratio.
The systematic uncertainties on the signal yields due to the fit models are evaluated using pseudoexperiments. A systematic uncertainty due to the choice of the signal model and the assumption that the B 0 and B 0 s distributions have identical shape in the mass fit is evaluated. Candidates are generated with a mass distribution described by a Hypatia function [43] with different parameters for the B 0 and B 0 s models. The values of the parameters of the Hypatia function are determined by a fit to simulated B 0 → D * ± D ∓ and B 0 s → D * ± D ∓ data, respectively. All yields and background parameters are set to the values found in the default fit to the data. The generated candidates are then fitted with -6 -

JHEP03(2021)099
The systematic uncertainty is given relative to the measured value. the default model and the result for the branching fraction ratio is calculated for each fit. The mean of all experiments and its residual are calculated for the three periods separately. The residual and its uncertainty are summed in quadrature and the square root is assigned as the systematic uncertainty.
In addition, a systematic uncertainty due to the model of the combinatorial background is evaluated. Pseudoexperiments are used with parameters of the signal and partially reconstructed background models set to the values found in the default result. The slope of the exponential function is extracted by a fit to the data, where a looser BDT requirement is applied to enhance the contribution of the combinatorial background. The generated candidates are fitted with the default model and the systematic uncertainty is calculated in the same way as the systematic uncertainty for the signal model.
To determine the systematic uncertainty on the combined result, the systematic uncertainties related to the PID variables, signal model and background model are assumed to be fully correlated between the data-taking periods when calculating the weighted average. The systematic uncertainties due to the finite size of the simulated data are assumed to be uncorrelated. All systematic uncertainties are added in quadrature to obtain the total systematic uncertainty per data-taking period, and are listed together with their contributions in table 1.

Results
The B 0 s → D * ± D ∓ decay is observed with a high significance, which is calculated using Wilks' theorem [44] together with the Neyman-Pearson lemma [45]. The relative branching ratio is calculated using the expression where the uncertainties are statistical. The ratios of the hadronisation fractions are taken as 0.259 ± 0.015 and 0.244 ± 0.012 for the 2011-2012 [40,41] and 2015-2018 [42] data-taking periods, respectively.
The ratios of branching fractions are found to be where the first uncertainty is statistical, the second systematic and the third is due to the uncertainty of the fragmentation fraction ratio f s /f d . Using the quadratic sums of the uncertainties as weights and including the correlation of the systematic uncertainties, the average of these measurements is Using the measured value of the B 0 → D * ± D ∓ branching fraction from ref.
[6], the B 0 s → D * ± D ∓ branching fraction is determined to be where the fourth uncertainty is due to the B 0 → D * ± D ∓ branching fraction. This result assumes an average B 0 s lifetime for the B 0 s → D * ± D ∓ decay. The heavy and light eigenstates of the B 0 s meson have significantly different lifetimes. As the selection efficiency depends on the lifetime, correction factors for the efficiency are calculated following the procedure outlined in ref. [46] that considers either a purely heavy or a purely light B 0 s eigenstate. The correction factors are found to be compatible for all data-taking periods. The integrated correction factors are 1.042 (0.949) for a purely heavy (light) B 0 s eigenstate. The equivalent effect in the selection efficiency of the B 0 decay is negligible due to the small value of ∆Γ d [32].

Conclusion
This paper presents the first observation of the B 0 s → D * ± D ∓ decay along with the measurement of its branching fraction relative to the B 0 → D * ± D ∓ decay. The analysis is performed with data collected by the LHCb experiment in the years 2011, 2012, and 2015 to 2018, corresponding to an integrated luminosity of 9 fb −1 . The combined ratio of branching fractions for all data-taking periods is determined to be = 0.137 ± 0.017 ± 0.002 ± 0.006 , where the first uncertainty is statistical, the second systematic and the third is due to the uncertainty of the fragmentation fraction ratio f s /f d . The B 0 s → D * ± D ∓ branching fraction is determined to be where the fourth uncertainty is due to the B 0 → D * ± D ∓ branching fraction [32].    -10 -