Observation of the decays Bs0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {B}_{(s)}^0 $$\end{document} → Ds1(2536)∓K±

This paper reports the observation of the decays Bs0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {B}_{(s)}^0 $$\end{document} → Ds1(2536)∓K± using proton-proton collision data collected by the LHCb experiment, corresponding to an integrated luminosity of 9 fb−1. The branching fractions of these decays are measured relative to the normalisation channel B0→D¯\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \overline{D} $$\end{document}0K+K−. The Ds1(2536)− meson is reconstructed in the D¯\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \overline{D} $$\end{document}*(2007)0K− decay channel and the products of branching fractions are measured to beBBs0→Ds12536∓K±×BDs12536−→D¯∗20070K−=2.49±0.11±0.12±0.25±0.06×10−5,BB0→Ds12536∓K±×BDs12536−→D¯∗20070K−=0.510±0.021±0.036±0.050×10−5.\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {\displaystyle \begin{array}{c}\mathcal{B}\left({B}_s^0\to {D}_{s1}{(2536)}^{\mp }{K}^{\pm}\right)\times \mathcal{B}\left({D}_{s1}{(2536)}^{-}\to {\overline{D}}^{\ast }{(2007)}^0{K}^{-}\right)\\ {}=\left(2.49\pm 0.11\pm 0.12\pm 0.25\pm 0.06\right)\times {10}^{-5},\\ {}\mathcal{B}\left({B}^0\to {D}_{s1}{(2536)}^{\mp }{K}^{\pm}\right)\times \mathcal{B}\left({D}_{s1}{(2536)}^{-}\to {\overline{D}}^{\ast }{(2007)}^0{K}^{-}\right)\\ {}=\left(0.510\pm 0.021\pm 0.036\pm 0.050\right)\times {10}^{-5}.\end{array}} $$\end{document} The first uncertainty is statistical, the second systematic, and the third arises from the uncertainty of the branching fraction of the B0→D¯\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \overline{D} $$\end{document}0K+K− normalisation channel. The last uncertainty in the Bs0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {B}_s^0 $$\end{document} result is due to the limited knowledge of the fragmentation fraction ratio, fs/fd. The significance for the Bs0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {B}_s^0 $$\end{document} and B0 signals is larger than 10 σ. The ratio of the helicity amplitudes which governs the angular distribution of the Ds1(2536)−→D¯\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \overline{D} $$\end{document}*(2007)0K− decay is determined from the data. The ratio of the S- and D-wave amplitudes is found to be 1.11 ± 0.15 ± 0.06 and the phase difference between them 0.70 ± 0.09 ± 0.04 rad, where the first uncertainty is statistical and the second systematic.


Introduction
Precise measurements of CP violation are essential tests of the Standard Model (SM) of particle physics.In the SM, CP violation originates from a single phase in the Cabibbo-Kobayashi-Maskawa (CKM) matrix [1,2].The angle γ ≡ arg(−V ud V * ub /V cd V * cb ), one of the parameters related to this CP violation phase, can be measured at the tree level using the interference between b → csū and b → usc transitions.The comparison of this measurement with the value determined using other measurements involving particle loops serves as a probe to search for physics beyond the SM.Previous LHCb analyses of B 0 s → D ∓ s K ± [3] and B 0 s → D − s K + π + π − [4] decays have reported the value of γ based on B 0 s decays to final states with a D + s meson and one or more light mesons.The unobserved B 0 s decay modes, B 0 s → D s1 (2536) ∓ K ± , have a similar topology.Their Feynman diagrams are shown in Fig. 1.The B 0 s → D s1 (2536) ∓ K ± decay adds a further channel through which γ can be determined using a time-dependent method.
Additionally, studies of the B 0 (s) → D s1 (2536) ∓ K ± decays are also helpful to understand the pattern in the decay modes B 0 → D ( * )− K + and B 0 s → D ( * )− s π + , whose measured branching fractions [5] are smaller than the predictions with QCD factorisation [6][7][8][9].In Ref. [10] the branching fractions of the B 0 s → D ∓ s K ± decays are extracted by combining the CP -violating observables with the measured average branching fraction of these two decays, taking into account the effects of B 0 s -B 0 s mixing.The result shows a similar difference with the calculations based on QCD factorisation.The B 0 (s) → D s1 (2536) ∓ K ± decays can provide additional information to validate this difference between experimental and theoretical results thanks to the similar topology as the B 0 s → D ∓ s K ± decays.This paper reports the observation of the decays B 0 (s) → D s1 (2536) ∓ K ± and a measurement of their branching fractions using proton-proton (pp) collision data collected with the LHCb detector.The data were collected at centre-of-mass energies of 7 TeV, 8 TeV and 13 TeV, corresponding to integrated luminosities of 1 fb −1 , 2 fb −1 (Run 1), and 6 fb −1 (Run 2), respectively.In B 0 (s) → D s1 (2536) ∓ K ± decays, the D s1 (2536) − meson is reconstructed in its decay to the D unless explicitly stated otherwise, and the symbol D * 0 is used to denote the D * (2007) 0 meson.
To measure the branching fractions of B 0 (s) → D s1 (2536) ∓ K ± decays, the B 0 → D 0 K + K − decay is used as a normalisation channel.The ratio of branching fractions for the decays B 0 (s) → D s1 (2536) ∓ K ± with the secondary decay D s1 (2536 where X indicates the B 0 (s) → D s1 (2536) ∓ K ± decay channels.
For the D s1 (2536) − → D * 0 K − decay, the ratio of its helicity amplitudes is measured and based on that, the ratio of the Sand D-wave amplitudes is also obtained.

Angular decay rate formalism
The decays B 0 (s) → D s1 (2536) − K + involve the production of one pseudovector, D s1 (2536) − , and one pseudoscalar meson, K + , from a pseudoscalar B 0 (s) parent.The subsequent process, D s1 (2536) − → D * 0 K − , presents a polarisation structure where three complex helicity amplitudes, H 0 , H + and H − , contribute to the total decay rate.These three amplitudes correspond to the orientation of the linear polarisation of the vector particle, D * 0 , with respect to the pseudovector meson, D s1 (2536) − .The two transverse amplitudes, H + and H − , describe the helicity value of 1 and −1 of the D * 0 meson and the longitudinal amplitude, H 0 , the null helicity value.Parity conservation implies H + equal to H − .
The decay rates of B 0 (s) → D s1 (2536) ∓ K ± are a function of three decay angles, θ D * , θ D and χ (see Fig. 2): θ D * is the angle between the D * 0 meson and the direction opposite to the B momentum vector in the D s1 (2536) − rest frame, θ D is the angle between the D 0 meson and the direction opposite to the D s1 momentum vector in the D * 0 rest frame, and χ is the angle between the two decay planes defined in the B rest frame.The full three-dimensional differential decay rate expressed in terms of the helicity amplitudes is where the factors ω long , ω tran and ω int are functions of the decay angles associated with the longitudinal component, the transverse component and the interference term between the longitudinal and transverse components, respectively.The ratio of the amplitudes of longitudinal and transverse components, H + /H 0 , is expressed as ke iϕ where k > 0 and ϕ ∈ [−π, π].The interference term, ℜ(H 0 H + ), is only sensitive to cos(ϕ), therefore ϕ has a two-fold ambiguity, and only the absolute value |ϕ| is reported.In addition, the signal channel is split into the photon chain (with the intermediate decay D * 0 → D 0 γ) and the π 0 chain (with the decay D * 0 → D 0 π 0 ).The ω factors and helicity amplitudes are different for the photon and π 0 chains, but they share the same k and ϕ.The expressions for the ω factors for the two chains are given in Table 1.This formalism is the same as that adopted in other LHCb angular analyses such as that of B → K * µ + µ − decays [11,12].For the decay chain, D s1 (2536) − → D * 0 K − , D * 0 → D 0 (π 0 /γ), the decay angle θ D * and the orientation of the decay plane of D * 0 K − are uniform in their value range.Therefore, the differential rate of the D s1 (2536) − decay is obtained by integrating Eq. 2 over θ D * and χ, resulting in no contribution from ω int .However, for B 0 (s) → D s1 (2536) ∓ K ± decays, there is a non-zero interference contribution in the decay rate.

LHCb detector
The LHCb detector [13,14] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b or c quarks.The detector includes a high-precision tracking system consisting of a siliconstrip vertex detector surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet.The tracking system provides a measurement of the momentum, p, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV/c.The minimum distance of a track to a primary pp collision vertex (PV), the impact parameter (IP), is measured with a resolution of (15 + 29/p T ) µm, where p T is the component of the momentum transverse to the beam, in GeV/c.Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors.Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter and a hadronic calorimeter.Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers.
The online event selection is performed by a trigger, which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction.At the hardware trigger stage, events are required to have a muon with high p T or a hadron, photon or electron with high transverse energy in the calorimeters.For hadrons, the transverse energy threshold is 3.5 GeV.A global hardware trigger decision is ascribed to the reconstructed candidate, the rest of the event or a combination of both; events triggered as such are defined respectively as triggered on signal (TOS), triggered independently of signal (TIS), and triggered on both.The software trigger requires a two-, three-or four-track secondary vertex with a significant displacement from any primary pp interaction vertex.At least one charged particle must have a transverse momentum p T > 1.6 GeV/c and be inconsistent with originating from a PV.A multivariate algorithm [15,16] is used for the identification of secondary vertices consistent with the decay of a b hadron.
In the simulation, pp collisions are generated using Pythia [17] with a specific LHCb configuration [18].Decays of unstable particles are described by EvtGen [19], in which final-state radiation is generated using Photos [20].The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [21] as described in Ref. [22].

Event selection
For the signal channels, the B 0 (s) candidate is reconstructed from a D s1 (2536) − candidate and an additional kaon of opposite charge.The D s1 (2536) − candidate is reconstructed through the D * 0 K − mode, and the D * 0 candidate is partially reconstructed in the D 0 γ or D 0 π 0 modes, where the photon or π 0 is not reconstructed.For the normalisation channel, the B 0 candidate is reconstructed from a D 0 candidate and two kaon candidates of opposite charge.For both signal and normalisation channels, the D 0 candidate is reconstructed in the Cabbibo favoured decay, D 0 → K + π − .The selection criteria discussed below are applied to both signal and normalisation channels, except if explicitly mentioned otherwise.
The tracks of the final-state particles are required to be of good fit quality and have a high χ 2 IP value with respect to any PV, where χ 2 IP is defined as the difference in the vertex-fit χ 2 of a given PV reconstructed with and without the particle being considered.Tracks are also required to be within the kinematic coverage of the RICH detectors, which provide particle identification (PID) information used to reduce backgrounds with misidentified particles.The kaon track from the B 0 (s) vertex, which has the same charge as the kaon from the D 0 candidate, is required to fulfil stringent PID requirements, and the other three final-state particles are selected with loose PID requirements.The reconstructed mass of the D 0 meson is required to be within 25 MeV/c 2 of its known value [5], which reduces background from decays with mis-identified hadrons such as D 0 → K + K − (π + π − ), in combination with the PID requirements on the D 0 decay particles.The D 0 decay vertex is required to be well reconstructed, with χ 2 /ndf < 6, where ndf is the number of degrees of freedom.A vertex formed by one D 0 and two kaon candidates with χ 2 /ndf < 4 is required to originate from a B 0 (s) candidate.The reconstructed D 0 and B 0 (s) vertices are required to be significantly displaced from the associated PV, defined as the PV with the smallest χ 2 IP with respect to the B 0 (s) candidate, in case of more than one PV in the pp collision.In order to improve the B 0 (s) mass resolution, a kinematic fit [23] is performed with the D 0 mass constrained to its known mass [5] and the B 0 (s) momentum constrained to point to the PV with the smallest χ 2 IP .The B 0 (s) candidates with invariant mass in the interval 5000 − 6000 MeV/c 2 are retained.
A multilayer perceptron (MLP) neural network technique, provided by the TMVA package [24], is used to further reduce the combinatorial background, consisting of random combination of tracks that mimic the signal, while preserving a high signal efficiency.The MLP classifier is trained with a simulated sample of the signal, and data from the upper m(D 0 K + K − ) sideband to represent combinatorial background.The variables used in the training procedure are the smallest χ 2 IP with respect to the PV and p T of the decay products from the B-decay vertex; the χ 2 probability of the B vertex fit; the χ 2 probability of the distance from the PV to the B-decay vertex; and the signed minimum cosine of the angle between the direction of one of the charged tracks from the B decay and the D 0 meson, as projected in the plane perpendicular to the beam axis.The MLP neural network is trained separately for Run 1 and Run 2 data samples and applied to both the B 0 (s) → D s1 (2536) ∓ K ± and B 0 → D 0 K + K − decays.The threshold for the MLP response is chosen by optimising the significance of the B 0 s meson signal yield with the help of simulated samples, which reduces the combinatorial background by an order of magnitude while retaining about 88% of the signal.
Considering that the invariant mass of the D s1 (2536) − meson is 2535 MeV/c 2 [5] and it is partially reconstructed, the invariant mass of the D 0 K − pair is far below 2500 MeV/c 2 .Thus, a requirement of m(D 0 K − ) < 2500 MeV/c 2 is applied to the signal channels, which keeps almost all signal candidates and reduces the background by about 95%.Other requirements are additionally applied to the B 0 → D 0 K + K − normalisation channel.The significance of the distance in the z direction along the beam axis, defined as , where z D(B) and σ z D(B) are the z position and its uncertainty along the z axis of the decay vertex of D 0 (B 0 (s) ), is used to remove the background from charmless B 0 (s) decays, such as B 0 (s) → K + π − h + h − decays (h stands for any light hadron).To remove the large background from B 0 → D * − π + decays, the mass difference m(D 0 π − ) − m(D 0 ) must not lie within 2.5 MeV/c 2 of the known D * − − D 0 mass difference [5].Moreover, the swapped background, where the K ± from the D 0 meson decay and the K ± of the B 0 (s) meson decay are swapped during reconstruction, is vetoed if its reconstructed mass with the two kaons exchanged lies in the range within 25 MeV/c 2 around the known D 0 mass.
If there is more than one candidate existing in one pp collision, only the candidate with the smallest value of the sum of χ 2 IP for the B 0 (s) and D 0 vertices, or the largest PID accuracy if the sum of χ 2 IP is the same, is retained; the associated systematic uncertainty for this choice is found to be negligible.

Mass fit
The m(D 0 K − ) and m(D 0 K − K + ) distributions are used to distinguish signal from background.As neutral particles, photon or π 0 , are not reconstructed, B 0 (s) → D s1 (2536) ∓ K ± decays cannot be described by Gaussian functions in these two mass distributions, and are decomposed into three components: longitudinal (H + ), transverse (H 0 ) and interference.The shapes of m(D 0 K − K + ) distribution for longitudinal, transverse and interference components in each chain are obtained by weighting the simulated sample, uniformly generated in the phase space, with the corresponding ω factors defined in Table 1.The resulting shapes, S X,long , S X,tran and S X,int , for decay channel X, can be written as where S X,PHSP is the shape generated using a phase-space distribution.The contributions from longitudinal, transverse and interference components are proportional to 1 : k 2 : k cos ϕ, based on Eq. 2. This feature allows k and |ϕ| to be measured using a binned maximum-likelihood fit to the m(D 0 K − K + ) distribution in data.The description of the m(D 0 K − ) distribution is similar to that of m(D 0 K − K + ), except that there is no interference contribution.As the correlation between the discriminating variable, m(D 0 K − ), and the control variable, m(D 0 K − K + ) is small, the sF it technique [25] is used to subtract non-D s1 (2536) − decay products before fitting to m(D 0 K − K + ).Details of the main physical channels involved in the mass fit are described below.
As the interference term is null in the m(D 0 K − ) distribution, the partially reconstructed D s1 (2536) − → D * 0 K − (photon/π 0 chain) signals are modelled using the sum of longitudinal and transverse components.The efficiency-corrected yield ratio of the photon chain to the π 0 chain is fixed to the branching fraction ratio between the D * 0 → D 0 γ and the where N γ (D s1 (2536) − ) and ϵ γ (D s1 (2536) − ) are the signal yield and efficiency of the photon chain, respectively, while the corresponding parameters for the π 0 chain are N π 0 (D s1 (2536) − ) and ϵ π 0 (D s1 (2536) − ).The sum of yields of these two chains is a free parameter in the fit.The probability density function of the photon/π 0 chain can be expressed as where correspond to the longitudinal and transverse components, respectively.
The partially reconstructed B 0 (s) → D s1 (2536) ∓ K ± (photon/π 0 chain) signals are modelled using the sum of longitudinal, transverse and interference components, where the two parameters k and |ϕ|, determining the relative proportion of these components, are free parameters in the fit.The ratio between the photon and π 0 chain yields is constrained similarly as when fitting the m(D 0 K − ) distribution using Eq. 4. The sum of yields of these two chains for each decay channel is a free parameter in the fit.The probability density function of the photon/π 0 chain can be expressed as where )) correspond to the longitudinal, transverse and interference components obtained from simulation, respectively.
Although stringent PID criteria are applied to the kaon that does not decay from the D s1 (2536) − meson, there is still a non-negligible contribution from the B 0 s → D s1 (2536) − π + decay in the selected dataset.This component is modelled in the same way as the B 0 (s) → D s1 (2536) ∓ K ± signal channels and is obtained from simulated samples, taking into account the dependence on the k and ϕ parameters.
At low m(D 0 K − K + ) values, there are contributions from decays which have more than one particle not reconstructed.For example, in B 0 s → D s1 (2536) ∓ K * ± decays, neutral particles from D * 0 and K * + decays are not considered in the reconstruction.To model this contribution, simulated samples are studied.The dominant contribution of this background lies in the region with m(D 0 K − K + ) lower than 5000 MeV/c 2 and only the right tail is present in the fitting region.The decays B 0 → D s1 (2536) − ρ + , B − → D s1 (2536) − K * 0 and its isospin partner B 0 → D s1 (2536) ∓ K * ± also belong to this type of background.It is hard to distinguish these contributions within the current fitting range in m(D 0 K − K + ).Therefore in the default fit only the B 0 s → D s1 (2536) ∓ K * ± decay is taken into consideration.The B 0 → D s1 (2536) − ρ + and B 0 → D s1 (2536) ∓ K * ± decay channels are considered when determining the systematic uncertainty on k, |ϕ| and R.
The m(D 0 K − ) mass distribution with fit projections overlaid are shown in Fig. 3.The sF it technique [25] is applied to subtract non-D s1 (2536) − decay products, when fitting the m(D 0 K − K + ) distribution.Figure 4 shows the m(D 0 K − K + ) mass distribution with the fit results overlaid.A simultaneous fit to the two data-taking periods, Run 1 and Run 2, sharing the same k and |ϕ| parameters, is performed.The distribution is dominated by two broad structures due to B 0 (s) → D s1 (2536) ∓ K ± decays.The combinatorial background in the m(D 0 K − K + ) is negligible and not considered.To estimate the possible bias on the fitted parameters due to the sF it technique [25], a set of pseudoexperiment samples are generated where the correlation between these two mass variables, m(D 0 K − ) and m(D 0 K − K + ), is taken into account, obtained from full simulation samples which preserve correlations.Small biases and underestimation of uncertainties on the fitted parameters are found and corrected for.The yields of photon/π 0 chains for signal channels B 0 (s) → D s1 (2536) ∓ K ± and helicity-related parameters, k and |ϕ|, are summarised in Table 2.The correlation coefficients between the fitted parameters are given in the Table 3.The significance is evaluated with a likelihood-based test, in which the likelihood distribution of the background-only hypothesis is obtained using pseudoexperiments [26] taking into consideration both statistical and systematic uncertainties.Significances of 17.9 σ and 12.2 σ are obtained for the modes B 0 s → D s1 (2536) ∓ K ± and B 0 → D s1 (2536) ∓ K ± , respectively.
For the normalisation channel, the selected B 0 → D 0 K + K − candidates consist of signal and various background contributions: combinatorial, misidentified, and partially reconstructed b-hadron decays.Contributions from partially reconstructed decays are reduced by the lower bounds on the mass region used in the fit.Sources of misidentified backgrounds are investigated using simulated samples.Most of the potential sources of  background are found to have broad mass distributions, e.g.Λ 0 b → D 0 pK + , Λ 0 b → D 0 pπ + and Ξ b → D 0 pK + decays.But they have obvious peaks in the D 0 pK + or D 0 pπ + mass distribution and hence their yields are determined precisely from data.The B 0 → D 0 K + K − and B 0 s → D 0 K + K − components are described using the sum of two Crystal Ball distributions [27].The Crystal Ball distributions for both decays share common resolution and tail parameters.The tail parameters are constrained from fits to simulation.The values of the mass and its resolution are free parameters, but the mass difference between B 0 and B 0 s is fixed to the value, ∆m B = 87.45 ± 0.45 MeV/c 2 , measured by LHCb [28].The combinatorial background is modelled using a first-order Chebyshev polynomial, where the yield and the shape parameters of this contribution vary freely.The signal yields are obtained from unbinned extended maximum-likelihood fits to data.The invariant mass distributions of the normalisation channel are shown in Fig. 5.The yields are 2530 ± 80 and 7860 ± 130 for Run 1 and Run 2 data, respectively.

Efficiency
The total efficiency is factorised into the product of the detector acceptance, reconstruction and selection efficiency, particle identification efficiency and trigger efficiency.The detector acceptance is the fraction of simulated decays reconstructed within the LHCb detector.Selection efficiency includes the effect of the software selection in the trigger system, the initial selection, the MLP selection, and the reconstruction of the charged tracks.The efficiency in Run 2 is higher than that in Run 1 due to improvements in the reconstruction software.The PID and hardware trigger efficiencies are determined from calibration data samples where the abundant D 0 → K − π + sample is used [29].The simulated PID response is corrected in order to match the data.The simulated samples of the normalisation mode are generated uniformly over the phase space of the decay and are different from the distributions in the data.This effect is taken into account by weighting the simulation samples to match the distributions found in the data and the average efficiency is calculated using these weighted samples.The total efficiencies of Run 1 and Run 2 for the signal and normalisation channels are shown in Table 4.
Table 4: Summary of total efficiencies for the signal B 0 (s) → D s1 (2536) ∓ K ± and the normalisation mode B 0 → D 0 K + K − , where the uncertainty is statistical.

Systematic uncertainties
Systematic uncertainties on the branching fraction ratios, R(B 0 (s) → D s1 (2536) ∓ K ± ), and the helicity-related parameters, k and |ϕ|, arise from the mass fits for the signal and normalisation channels, the limited sizes of the simulated samples, the uncertainties on the efficiency ratio corrections, and the branching fraction ratios B(D * 0 → D 0 π 0 )/B(D * 0 → D 0 γ).The uncertainties are given in % relative to the measured value and are listed in Table 5.

Signal channel fit
The choice of PDFs used to model the signal and background components in the m(D 0 K − ) or m(D 0 K − K + ) fit is the main source of systematic uncertainty.Taking for example the modelling of the m(D 0 K − ) distribution of the signal channels, the non-parametric PDFs from the simulation sample are used in the default fit, then the RooHILLdini and RooHORNdini functions are used as alternative models, which are usually able to describe the shape of the mass distribution of vector decays and are defined in Ref. [30].To determine this uncertainty, the alternative fit model is used to generate a set of pseudoexperiments and the default model is used to fit them.The bias obtained from these fits is taken as the corresponding systematic uncertainty.
The most significant contribution to this systematic uncertainty arises from the physical background description in the fit of the m(D 0 K − K + ) distribution.In the lower side of the fit region, the primary contribution comes from B 0 → D s1 (2536) ∓ K ± decays.In addition, there is a need for contributions from other components, such as decays, which are discussed in Sec. 5.According to simulation, all of them are likely to enter the fit region, albeit with different slopes.Attempts to include each component independently into the fit model have been made, and the case resulting in the largest change in yield is used to calculate the systematic uncertainty.Since the lower side of the fit region is primarily used to determine the yields of B 0 → D s1 (2536) ∓ K ± decays, the uncertainties for results from B 0 decays are larger than those from B 0 s decays.In addition, the sF it method is used to subtract background contributions when fitting the m(D 0 K − K + ) distribution.It assumes no correlation between the discriminating variable m(D 0 K − ) and the control variable m(D 0 K − K + ).To estimate the corresponding systematic uncertainty, the fit procedure is performed with 1,500 pseudoexperiments samples obtained from full simulation which preserves correlations.The bias obtained from these fits is assigned as the systematic uncertainty.The systematic uncertainty from the fit model for the signal channel is considered fully correlated between Run 1 and Run 2 data samples.

Normalisation channel fit
The signal model for the normalisation channel is the sum of two Crystal Ball distributions with tail parameters determined from simulation.To determine the related systematic uncertainties, the fit to the m(D 0 K + K − ) distribution is performed many times with the parameters randomly varied within their uncertainties according to Gaussian distributions.The width of the distribution of the normalised residuals of the yields is assigned as a systematic uncertainty.The background model is changed to an exponential function to estimate the systematic uncertainty and the difference with respect to the default model is assigned as a systematic uncertainty.This source of systematic uncertainty only affects the B 0 → D 0 K + K − yield and does not contribute to the helicity-related parameters, k and |ϕ|.This source of systematic uncertainty is considered as fully correlated between the Run 1 and Run 2 data samples.

Simulated sample size
In the m(D 0 K − K + ) fit, the efficiency-corrected ratios of yields between the photon chain and the π 0 chain are fixed.Statistical uncertainties of efficiencies due to limited simulated sample size, therefore, affect the ratios and are taken as a source of systematic uncertainty.
To determine this systematic uncertainty, the m(D 0 K + K − ) fit to data is performed 1,000 times with efficiencies Gaussian constrained.The width of the distribution of the fitted yields is taken as a systematic uncertainty.This source of systematic uncertainty is considered as uncorrelated between the Run 1 and Run 2 samples due to its statistical nature.

Efficiency correction
Differences between data and simulation are corrected using data and calibration samples.The systematic uncertainty due to the limited size of the calibration samples is considered.It is composed of contributions from the PID efficiency and the hardware trigger efficiency.The systematic uncertainty is evaluated using the same method used for the simulated sample size.Almost all of this source of systematic uncertainty is considered to be uncorrelated between the Run 1 and Run 2 samples, except the one related to the detector acceptance that is considered as fully correlated.

Branching fraction ratio
In the m(D 0 K − K + ) fit, the efficiency-corrected ratios of yields between the photon and π 0 chains is related to the branching fraction ratio, B(D * 0 → D 0 π 0 )/B(D * 0 → D 0 γ) = 1.83 ± 0.07.In the default fit, this branching fraction ratio is fixed to its central value.To evaluate the related systematic uncertainty, the branching fraction ratio is varied by one standard deviation.The fit is repeated with the new values of the branching fraction ratio, and the relative difference is assigned as a systematic uncertainty which is considered as fully correlated between Run 1 and Run 2 samples.
Fragmentation fraction ratio: f s /f d The fragmentation fraction ratio, f s /f d , of a b quark into a B 0 s and a B 0 meson in LHCb proton-proton collisions [31] is an important source of systematic uncertainty for the B 0 s → D s1 (2536) ∓ K ± decay and depends on the centre-of-mass energy for the collected data.The values of this quantity are 0.2387 ± 0.0075 and 0.2539 ± 0.0079 for the Runs 1 and 2, respectively.The uncertainty from f s /f d is propagated as a systematic uncertainty, which is considered as uncorrelated between the Run 1 and Run 2 samples.
For each data taking period, the total systematic uncertainty is obtained as the quadratic sum of all contributions, except the systematic uncertainty from f s /f d , which is listed separately, and shown in Table 5.

Results
The B 0 (s) → D s1 (2536) ∓ K ± branching fractions relative to that of the B 0 → D 0 K + K − decay channel are defined as where f s (f d ) is the fragmentation fraction of a b quark into a B 0 s (B 0 ) meson, and N cor (X) is the efficiency-corrected yield, N (X)/ϵ(X), of the corresponding channel X.Using the corrected yields determined in the previous sections and the ratio f s /f d , the branching fraction ratios, R (B 0 (s) → D s1 (2536) ∓ K ± ), are determined to be where the first uncertainty is statistical, the second systematic, and the third due to the uncertainty of f s /f d .Using B(B 0 → D 0 K + K − ) = (6.1 ± 0.4 ± 0.3 ± 0.3) × 10 −5 [32], the absolute branching fractions are calculated to be where the third uncertainties are due to the uncertainty of B(B 0 → D 0 K + K − ), and the fourth from the f s /f d ratio.
The helicity-related parameters k and |ϕ|, used to describe the ratio and phase difference between the helicity amplitudes H + and H 0 , are determined to be k = 1.89 ± 0.24 ± 0.06, |ϕ| = 1.81 ± 0.20 ± 0.11 rad, where the first uncertainty is statistical and the second systematic.
The helicity amplitudes are related to the LS couplings (B L,S ) using the Clebsch-Gordan coefficients, where L is the orbital angular momentum in a two body decay A → B + C, and S is the total spin of the secondary particles, ⃗ S = ⃗ J B + ⃗ J C (|J B − J C | ≤ S ≤ J B + J C ), where J B and J C are the spins of B and C particles.This leads to the following relations: where B 0,1 corresponds to the S-wave and B 2,1 to the D-wave.The amplitude ratio between Sand D-waves, B 0,1 /B 2,1 ≡ Ae iB , are determined to be A = 1.11 ± 0.15 ± 0.06, |B| = 0.70 ± 0.09 ± 0.04 rad, where the first uncertainty is statistical and the second systematic.The fraction of S-wave component in D s1 (2536) + → D * 0 K + is calculated to be (55 ± 7 ± 3)%, consistent with the results from its isospin partner D s1 (2536) + → D * + K 0 , in which the S-wave fraction is (72 ± 5 ± 1)% [33].

Conclusion
In summary, the decays B 0 (s) → D s1 (2536) ∓ K ± are observed for the first time and their branching fractions are measured using a data sample corresponding to an integrated luminosity of 9 fb −1 of pp collisions collected by the LHCb experiment with the significance for the B 0 s and B 0 signals larger than 10 σ.The fraction of the S-wave component in the D s1 (2536) + → D * 0 K + decay is determined to be (55 ± 7 ± 3)%, which is comparable to that of its isospin partner D s1 (2536) + → D * + K 0 [33].The observation of the B 0 s → D s1 (2536) ∓ K ± decay channel is the first step towards the CKM γ angle extraction using this channel.By the end of the LHCb Run 3 data taking period, the signal yields will allow us to measure the γ angle with a precision similar to that of B 0 s → D ∓ s K ± decay in Run 1 [3], which could provide useful information as a reference value to be compared with theory predictions.In addition, the results of the branching fractions of the B 0 (s) → D s1 (2536) ∓ K ± decay channels can be compared with theoretical predictions with QCD factorisation to shed light on whether or not there is physics beyond the Standard Model in the b → cūs and b → ūcs processes.

Figure 3 :
Figure 3: Mass distribution m(D 0 K − ) for selected candidates in (left) Run 1 and (right) Run 2 data, with the fit overlaid.The data is shown as black solid dots, while the blue solid line shows the results of fit.The green dashed line represents the photon chain with longitudinal polarised decay, the blue dashed line represents the photon chain with transverse polarised decay, the violet and red dashed lines describe the π 0 chain with longitudinal and transverse polarisation, respectively, and the orange filled histogram represents the background.

Figure 4 :
Figure 4: Mass distribution m(D 0 K − K + ) for selected weighted candidates in (left) Run 1 and (right) Run 2 data, with the fit overlaid.The cyan dashed line represents the B 0 s → D s1 (2536) ∓ K ± decay channel, the yellow filled histogram represents the B 0 → D s1 (2536) ∓ K ± decay channel, the green filled histogram represents the B 0 s → D s1 (2536) ∓ K * ± channel and the purple filled histogram represents the B 0 s → D s1 (2536) − π + decay channel.

Figure 5 :
Figure 5: Mass distribution m(D 0 K + K − ) for selected candidates in the normalisation channel in (left) Run 1 and (right) Run 2 data, with the fit overlaid.The red dot-dashed line represents the B 0 → D 0 K + K − channel, the red dotted line represents the B 0 s → D 0 K + K − channel, the green dashed line represents the combinatorial background, the cyan dashed line represents the mis-identified backgrounds, which includes the B 0 (s) → D 0 Kπ, Λ 0 b → D 0 pK, Λ 0 b → D 0 pπ and Λ 0 b → D 0 pπ channels, the yellow dashed line represents the partially reconstructed background, which includes the B 0 (s) → D * 0 KK and B 0 (s) → D * 0 Kπ channels.

Table 1 :
Definitions of the ω functions in the differential decay rate.

Table 2 :
The result of fitted parameters in the m(D 0 K − K + ) fit.The uncertainties are statistical.

Table 3 :
Correlation coefficients between fitted parameters from the m(D 0 K − K + ) fit.N B 0 is the sum of the yields from photon and π 0 chains for B 0 → D s1 (2536) ∓ K ± .N B 0 s is the sum of the yields from photon and π 0 chains for B 0 s → D s1 (2536) ∓ K ± .