Evidence for the decay $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$

A search for the decay $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ is presented using data sets corresponding to 1.0, 2.0 and 1.6 $\text{fb}^{-1}$ of integrated luminosity collected during $pp$ collisions with the LHCb experiment at centre-of-mass energies of 7, 8 and 13 TeV, respectively. An excess is found over the background-only hypothesis with a significance of 3.4 standard deviations. The branching fraction of the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decay is determined to be $\mathcal{B}(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-) = [2.9 \pm 1.0~(\text{stat}) \pm 0.2~(\text{syst}) \pm 0.3~(\text{norm})] \times 10^{-8}$, where the first and second uncertainties are statistical and systematic, respectively. The third uncertainty is due to limited knowledge of external parameters used to normalise the branching fraction measurement.


Introduction
The decay $B_{s}^0 \rightarrow \overline{K}{}^{*}(892)^0\mu^+\mu^-$, hereafter referred to as $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$, proceeds via a $b \rightarrow d$ flavour-changing neutral-current (FCNC) transition. The leading contributions to the decay amplitude correspond to loop Feynman diagrams and involve the off-diagonal element $V_{td}$ of the Cabibbo-Kobayashi-Maskawa (CKM) quark-mixing matrix. The process is consequently rare in the Standard Model of particle physics (SM). New particles predicted by extensions of the SM can enter in competing diagrams and can significantly enhance or suppress the rate of the decay; see, for example, Refs. [1,2]. While no prediction of the decay rate is presently available in the literature, form factors for the $B_{s}^0 \rightarrow \overline{K}{}^{*0}$ transition have been computed using light-cone sum rule [3,4] and lattice QCD [5] techniques. An order-of-magnitude estimate of the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ branching fraction can be obtained by scaling the known branching fractions of the corresponding $b \rightarrow s\ell^+\ell^-$ processes by $|V_{td}/V_{ts}|^2$, since the amplitude is proportional to the relevant CKM element. This leads to an expectation that the branching fraction should be $\mathcal{O}(10^{-8})$ in the SM.
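The order-of-magnitude argument can be made concrete with a few lines of arithmetic. The sketch below assumes the naive scaling $\mathcal{B}(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-) \approx |V_{td}/V_{ts}|^2\,\mathcal{B}(B^0 \rightarrow K^{*0}\mu^+\mu^-)$, ignoring differences in form factors and lifetimes, and uses approximate inputs ($|V_{td}/V_{ts}| \approx 0.21$, $\mathcal{B}(B^0 \rightarrow K^{*0}\mu^+\mu^-) \approx 1 \times 10^{-6}$) that are illustrative assumptions rather than values quoted in this paper.

```python
def naive_scaled_branching_fraction(vtd_over_vts: float, bf_b_to_s: float) -> float:
    """Scale a b -> s mu+ mu- branching fraction by |V_td/V_ts|^2.

    The decay amplitude is proportional to the CKM element, so the rate scales
    with its square. Form-factor and lifetime differences are ignored, so the
    result is only meaningful to an order of magnitude.
    """
    return vtd_over_vts ** 2 * bf_b_to_s


# Illustrative, approximate inputs (assumptions, not values from this paper)
VTD_OVER_VTS = 0.21        # approximate |V_td/V_ts|
BF_B0_KSTAR_MUMU = 1.0e-6  # approximate B(B0 -> K*0 mu+ mu-)

estimate = naive_scaled_branching_fraction(VTD_OVER_VTS, BF_B0_KSTAR_MUMU)
print(f"{estimate:.1e}")  # 4.4e-08, i.e. O(1e-8)
```

The estimate lands at a few times $10^{-8}$, consistent with the $\mathcal{O}(10^{-8})$ expectation stated above.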
The observation of the rare $b \rightarrow d\ell^+\ell^-$ FCNC decays $B^+ \rightarrow \pi^+\mu^+\mu^-$ and $\Lambda_b^0 \rightarrow p\pi^-\mu^+\mu^-$ has previously been reported by the LHCb collaboration in Refs. [6] and [7], respectively. Evidence for the decay $B^0 \rightarrow \pi^+\pi^-\mu^+\mu^-$ has also been established in Ref. [8]. The decay $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ has not yet been observed. The measured ratio of the $B^+ \rightarrow \pi^+\mu^+\mu^-$ and $B^+ \rightarrow K^+\mu^+\mu^-$ branching fractions has been used to determine the ratio of CKM elements $|V_{td}/V_{ts}|$ [9], exploiting correlations between the $B \rightarrow K$ and $B \rightarrow \pi$ form factors in lattice computations. A similar approach could, in the future, be applied to the ratio of the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ and $B^0 \rightarrow K^{*0}\mu^+\mu^-$ decay rates [10]. The decay $B^0 \rightarrow K^{*0}\mu^+\mu^-$, which involves a $b \rightarrow s\ell^+\ell^-$ transition, has been studied extensively by the BaBar, Belle and CDF collaborations and by the LHC experiments [11][12][13][14][15][16]. The measured rate of this decay appears to be systematically lower than current SM predictions. Global analyses of $b \rightarrow s$ processes favour a modification of the SM at the level of 4 to 5 standard deviations [17][18][19][20][21]. Similar studies of $b \rightarrow d$ processes are important to understand the flavour structure of the underlying theory.
This paper presents a search for the decay $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$, where the inclusion of charge-conjugate processes is implied throughout, using data collected with the LHCb experiment in $pp$ collisions during Runs 1 and 2 of the LHC. The data set comprises 1.0 $\text{fb}^{-1}$ of integrated luminosity collected at a centre-of-mass energy of 7 TeV during Run 1; 2.0 $\text{fb}^{-1}$ collected at 8 TeV during Run 1; and 1.6 $\text{fb}^{-1}$ collected at 13 TeV during Run 2. Section 2 describes the LHCb detector and the experimental setup used for the analysis. Section 3 outlines the selection process used to identify signal candidates. Section 4 describes the method used to estimate the number of $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decays in the data set. Section 5 describes the determination of the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ branching fraction, normalising the number of observed signal decays to the number of $B^0 \rightarrow J/\psi K^{*0}$ decays present in the data set. Section 6 discusses sources of systematic uncertainty on the branching fraction. Finally, conclusions are presented in Sec. 7.
in the range $0.1 < q^2 < 19.0~\text{GeV}^2/c^4$, excluding the region $12.5 < q^2 < 15.0~\text{GeV}^2/c^4$ dominated by the $\psi(2S)$ resonance. Candidates in the region $8.0 < q^2 < 11.0~\text{GeV}^2/c^4$, which are dominated by decays via a $J/\psi$ resonance, are treated separately in the analysis. The remaining candidates include $B_{s}^0$ meson decays that produce a dimuon pair through the decay of a light-quark resonance or of a charmonium state above the open-charm threshold; these are inseparable from the short-distance component of the decay and are considered part of the signal in the analysis.
The selection process used in this analysis is similar to that described in Ref. [15]. The four charged tracks are each required to have a significant IP with respect to all PVs in the event and to be consistent with originating from a common vertex. The $B^0_{(s)}$ meson candidate is required to be consistent with originating from one of the PVs in the event, and its decay vertex is required to be well separated from that PV. The kaon and pion candidates must also be identified as kaon-like and pion-like, respectively, by a multivariate algorithm [23] based on information from the RICH detectors, tracking system and calorimeters. The PID requirements are chosen to maximise the sensitivity to a SM-like signal. To improve the resolution on the reconstructed $K^-\pi^+\mu^+\mu^-$ invariant mass, $m(K^-\pi^+\mu^+\mu^-)$, candidates with a large uncertainty on their measured mass are rejected. The opening angle between every pair of final-state particles in the detector is also required to exceed 5 mrad. This requirement removes a possible source of background that arises when the hits associated with a given charged particle are mistakenly used in more than one reconstructed track. A kinematic fit is also performed, constraining the candidate to originate from its most likely production vertex [37]. In the kinematic fit of candidates with $q^2$ in the $J/\psi$ mass window, the dimuon pair is additionally constrained to the known $J/\psi$ mass. This mass constraint improves the $m(K^-\pi^+\mu^+\mu^-)$ resolution for candidates involving an intermediate $J/\psi$ resonance decay by a factor of two.
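A minimal sketch of the 5 mrad opening-angle requirement is given below, assuming each track is represented by its three-momentum $(p_x, p_y, p_z)$ in MeV/$c$. The function names and momenta are hypothetical; the angle is computed with `atan2` of the cross- and dot-product magnitudes, which stays accurate at the small angles relevant here, where `acos` of a cosine very close to 1 loses precision.

```python
import math

MIN_OPENING_ANGLE = 5e-3  # rad; the 5 mrad requirement quoted in the text


def opening_angle(p1, p2):
    """Angle (rad) between two three-momenta given as (px, py, pz)."""
    cx = p1[1] * p2[2] - p1[2] * p2[1]
    cy = p1[2] * p2[0] - p1[0] * p2[2]
    cz = p1[0] * p2[1] - p1[1] * p2[0]
    cross = math.sqrt(cx * cx + cy * cy + cz * cz)
    dot = p1[0] * p2[0] + p1[1] * p2[1] + p1[2] * p2[2]
    return math.atan2(cross, dot)  # well conditioned for small angles


def passes_opening_angle_cut(momenta, min_angle=MIN_OPENING_ANGLE):
    """True if every pair of final-state tracks is separated by more than min_angle."""
    return all(
        opening_angle(momenta[i], momenta[j]) > min_angle
        for i in range(len(momenta))
        for j in range(i + 1, len(momenta))
    )


# Hypothetical momenta (MeV/c): the second track is ~1 mrad from the first, as for a duplicate
clone_like = [(0.0, 0.0, 10000.0), (10.0, 0.0, 10000.0)]
print(passes_opening_angle_cut(clone_like))  # False
```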
Signal candidates are further classified using an artificial neural network [38]. The neural network is trained using a sample of simulated $B^0 \rightarrow K^{*0}\mu^+\mu^-$ decays as a proxy for the signal decay. Candidates in data with $m(K^-\pi^+\mu^+\mu^-) > 5670~\text{MeV}/c^2$ are used as a background sample. This sample consists predominantly of combinatorial background, in which uncorrelated tracks from the event are mistakenly combined. The neural network uses the following variables related to the topology of the $B^0_{(s)}$ meson decay: the angle between the reconstructed momentum vector of the $B^0_{(s)}$ candidate and the vector connecting the PV to its decay vertex; the IP, $p_{\mathrm{T}}$ and proper decay time of the $B^0_{(s)}$ candidate; the vertex-fit quality of the $B^0_{(s)}$ decay vertex and of the dimuon pair; the minimum and maximum $p_{\mathrm{T}}$ of the final-state particles; and, for the Run 1 data set, a measure of the isolation of the final-state particles in the detector. It has been verified that the distributions of the input variables, and of the classifier output, agree between simulation and data. The output of the neural network is transformed such that it is uniformly distributed between 0 and 1 for the signal proxy. Candidates with a neural network response below 0.05 are rejected in the subsequent analysis; this requirement removes a background-dominated part of the data sample. The neural network response is validated on simulated $B^0 \rightarrow K^{*0}\mu^+\mu^-$ and $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decays to ensure that it does not introduce any bias in $m(K^-\pi^+\mu^+\mu^-)$.
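The text does not specify how the neural network output is made uniform for the signal proxy. One standard way, assumed here, is the probability integral transform: each raw score is mapped to the fraction of simulated signal-proxy candidates with a score at or below it (the empirical cumulative distribution function). The helper names and all scores below are hypothetical.

```python
import bisect
from typing import Callable, Sequence


def make_flattening_transform(signal_scores: Sequence[float]) -> Callable[[float], float]:
    """Build a map from raw classifier output to the empirical CDF of the signal proxy.

    Applied to the signal-proxy sample itself, the transformed response is
    (up to sample granularity) uniformly distributed on [0, 1].
    """
    reference = sorted(signal_scores)
    n = len(reference)

    def transform(score: float) -> float:
        # Fraction of signal-proxy scores that are <= score
        return bisect.bisect_right(reference, score) / n

    return transform


MIN_RESPONSE = 0.05  # candidates below this transformed response are rejected

simulated_signal = [0.93, 0.31, 0.55, 0.72, 0.88, 0.12, 0.67, 0.99]  # hypothetical raw scores
flatten = make_flattening_transform(simulated_signal)
candidates = [0.05, 0.60, 0.95]  # hypothetical raw scores of data candidates
kept = [s for s in candidates if flatten(s) >= MIN_RESPONSE]
```

Because the transform is monotonic, it does not change the ranking of candidates; it only makes a cut value such as 0.05 correspond directly to a signal-efficiency loss of about 5% on the proxy.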
Finally, a number of vetoes are applied to reject specific sources of background. Signal candidates are rejected if the pion candidate has a non-negligible probability to be a kaon and the $K^-K^+$ mass, computed after assigning the kaon mass to the pion candidate, is consistent with that of the $\phi(1020)$ meson. This removes background from the decay $B_{s}^0 \rightarrow \phi\mu^+\mu^-$, where a kaon is misidentified as a pion. Candidates are also rejected if the kaon or pion is identifiable as a muon and the $K^-\mu^+$ or $\pi^+\mu^-$ mass, computed after assigning the muon mass to the kaon or pion candidate, is consistent with that of a $J/\psi$ or $\psi(2S)$ meson.
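The vetoes rely on recomputing an invariant mass after changing the mass hypothesis assigned to a track. A minimal sketch is shown below, assuming tracks are given as three-momenta in MeV/$c$ and using approximate PDG masses; the width of the $\phi(1020)$ window is not quoted in the text, so `half_width` is a hypothetical free parameter, and the accompanying PID condition on the pion candidate is not modelled.

```python
import math

M_KAON = 493.677  # MeV/c^2, charged kaon
M_PION = 139.570  # MeV/c^2, charged pion
M_PHI = 1019.461  # MeV/c^2, phi(1020)


def invariant_mass(tracks):
    """Invariant mass (MeV/c^2) of ((px, py, pz), mass_hypothesis) pairs, momenta in MeV/c."""
    e = px = py = pz = 0.0
    for (qx, qy, qz), mass in tracks:
        e += math.sqrt(qx * qx + qy * qy + qz * qz + mass * mass)
        px += qx
        py += qy
        pz += qz
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def kk_mass_with_pion_as_kaon(p_kaon, p_pion):
    """K-K+ mass obtained by assigning the kaon mass to the pion candidate."""
    return invariant_mass([(p_kaon, M_KAON), (p_pion, M_KAON)])


def in_phi_window(m_kk, half_width):
    """half_width (MeV/c^2) is a hypothetical tuning parameter, not a value from the text."""
    return abs(m_kk - M_PHI) < half_width
```

The same `invariant_mass` helper covers the $J/\psi$ and $\psi(2S)$ vetoes by assigning the muon mass instead of the kaon mass.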

Signal yields
To maximise the sensitivity to a $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ signal, candidates are divided into regions of neural network response. The candidates are also divided according to the two data-taking periods, Run 1 and Run 2. Four regions of neural network response are defined for each data-taking period, each containing an equal number of expected signal decays. The yield of the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decay is determined by performing a simultaneous unbinned maximum-likelihood fit to the $m(K^-\pi^+\mu^+\mu^-)$ distributions of the eight resulting subsets of the data.
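Bin boundaries with equal expected signal can be taken as quantiles of the signal response above the 0.05 requirement. The sketch below assumes a list of transformed responses for simulated signal (hypothetical input) bounded above by 1; if the response is exactly uniform for signal, the boundaries reduce to equal-width bins on [0.05, 1].

```python
from typing import List, Sequence


def equal_signal_bin_edges(signal_responses: Sequence[float],
                           lower_cut: float = 0.05,
                           n_bins: int = 4) -> List[float]:
    """Return n_bins + 1 edges such that each bin holds ~equal simulated signal.

    Edges are empirical quantiles of the signal responses above lower_cut; the
    upper edge is fixed to 1 because the transformed response lies in [0, 1].
    """
    kept = sorted(r for r in signal_responses if r >= lower_cut)
    if len(kept) < n_bins:
        raise ValueError("not enough simulated signal above the cut")
    edges = [lower_cut]
    for k in range(1, n_bins):
        edges.append(kept[(k * len(kept)) // n_bins])
    edges.append(1.0)
    return edges


# Hypothetical, exactly uniform signal responses on [0, 1)
scores = [i / 1000 for i in range(1000)]
edges = equal_signal_bin_edges(scores)  # approximately [0.05, 0.2875, 0.525, 0.7625, 1.0]
```

Applying this separately to the Run 1 and Run 2 samples gives the $2 \times 4 = 8$ subsets that enter the simultaneous fit.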
In the likelihood fit, the signal lineshapes of both the $B^0 \rightarrow K^{*0}\mu^+\mu^-$ and the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decays are described by the sum of two Crystal Ball functions [39] and a Gaussian function, which share a common peak position. The two Crystal Ball functions have tails on opposite sides of the peak. The $B_{s}^0$ peak position is displaced from that of the $B^0$ by $87.5~\text{MeV}/c^2$ [40]. The relative fractions of the Crystal Ball and Gaussian functions are fixed from fits to simulated $B^0 \rightarrow K^{*0}\mu^+\mu^-$ and $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decays. The widths of the functions and the tail parameters of the Crystal Ball functions are also fixed from the simulation, except for an overall scaling of the widths and of the tail parameters that allows for potential data-simulation differences. The peak position and these scale factors are obtained from a fit to candidates in the $J/\psi$ mass window in which the dimuon mass constraint has not been applied. The result of this fit is shown in Fig. 4 of the appendix.
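The Crystal Ball function combines a Gaussian core with a power-law tail. A minimal sketch of the three-component signal model is given below. It assumes numerical normalisation over a hypothetical fit range and applies a single scale factor to the simulated widths only; the analogous scaling of the tail parameters is omitted for brevity. All parameter values in the usage line are hypothetical. Following the text, a $B_{s}^0$ component would reuse the same shape with the peak shifted up by $87.5~\text{MeV}/c^2$.

```python
import math


def crystal_ball(x, mean, sigma, alpha, n):
    """Unnormalised Crystal Ball shape.

    alpha > 0 places the power-law tail on the low-mass side, alpha < 0 on the
    high-mass side; |alpha| is where the tail starts, in units of sigma.
    """
    t = (x - mean) / sigma
    if alpha < 0:
        t = -t  # mirror so the tail ends up on the high-mass side
    a = abs(alpha)
    if t > -a:
        return math.exp(-0.5 * t * t)
    big_a = (n / a) ** n * math.exp(-0.5 * a * a)
    big_b = n / a - a
    return big_a * (big_b - t) ** (-n)


def normalise(shape, lo, hi, steps=20000):
    """Divide shape by its integral over [lo, hi], computed with the midpoint rule."""
    h = (hi - lo) / steps
    area = h * sum(shape(lo + (i + 0.5) * h) for i in range(steps))
    return lambda x: shape(x) / area


def make_signal_pdf(lo, hi, mean, width_scale, f_cb1, f_cb2, cb1, cb2, sigma_gauss):
    """Two Crystal Ball functions plus a Gaussian sharing one peak position.

    cb1 and cb2 are (sigma, alpha, n) tuples taken from simulation; cb2 should have
    alpha < 0 so that the two tails sit on opposite sides of the peak.
    """
    s1, a1, n1 = cb1
    s2, a2, n2 = cb2
    components = [
        normalise(lambda x: crystal_ball(x, mean, width_scale * s1, a1, n1), lo, hi),
        normalise(lambda x: crystal_ball(x, mean, width_scale * s2, a2, n2), lo, hi),
        normalise(lambda x: math.exp(-0.5 * ((x - mean) / (width_scale * sigma_gauss)) ** 2), lo, hi),
    ]
    fractions = (f_cb1, f_cb2, 1.0 - f_cb1 - f_cb2)
    return lambda x: sum(f * c(x) for f, c in zip(fractions, components))


# Hypothetical parameters (MeV/c^2), for illustration only
b0_pdf = make_signal_pdf(5170.0, 5700.0, 5280.0, 1.05, 0.4, 0.3,
                         (14.0, 1.5, 3.0), (16.0, -2.0, 4.0), 20.0)
```

Normalising each component before applying the fractions is what makes the fractions correspond to the relative yields of the components.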
After the selection, the remaining background consists predominantly of combinatorial background, which is described in the fit by a separate exponential function in each subset of the data. A number of other background sources are accounted for in the fit. The decay $B^0 \rightarrow K^{*0}\mu^+\mu^-$ forms a source of background if the kaon is misidentified as a pion and the pion as a kaon. The shape of this background is taken from simulation, and its yield is constrained relative to that of the $B^0 \rightarrow K^{*0}\mu^+\mu^-$ decay using the kaon-to-pion and pion-to-kaon misidentification probabilities measured in the PID calibration samples. The decay $\Lambda_b^0 \rightarrow pK^-\mu^+\mu^-$ forms a source of background if the final-state hadrons are misidentified. This background is constrained using a control region in the data, obtained by modifying the PID requirements to preferentially select $pK^-$ rather than $K^-\pi^+$ combinations. Its shape is modelled in the fit by Crystal Ball functions, and its yield in each subset of the data is constrained using the proton and kaon identification and misidentification probabilities determined from the PID calibration samples. The decay $B^- \rightarrow K^-\mu^+\mu^-$ forms a source of background if a pion from the event is mistakenly combined with the particles from the $B^-$ meson decay. This contribution is determined from a control region in the data, by selecting candidates with a $K^-\mu^+\mu^-$ invariant mass consistent with the known $B^-$ mass. This background is visible only for candidates with $q^2$ in the $J/\psi$ mass region, and its shape is modelled in the fit by Crystal Ball functions. Several other sources of background are considered but are found to make a negligible contribution to the fit.
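One way to express the constraint on the swapped-identity $B^0 \rightarrow K^{*0}\mu^+\mu^-$ background is sketched below. It assumes, as a simplification not stated in the text, that the kaon and pion PID responses are independent, so that the ratio of swapped to correctly identified candidates factorises into per-track probabilities from the calibration samples; the constraint then enters the negative log-likelihood as a Gaussian penalty. All function names and probabilities are hypothetical, and the $B^0 \rightarrow K^{*0}\mu^+\mu^-$ yield from this section is used only to set the scale.

```python
def expected_swapped_yield(n_correct, p_k_as_pi, p_pi_as_k, p_k_as_k, p_pi_as_pi):
    """Expected number of B0 -> K*0 mu+ mu- candidates with K and pi identities swapped.

    n_correct is the fitted yield of correctly identified candidates. The
    probabilities are per-track (mis)identification efficiencies, assumed independent.
    """
    return n_correct * (p_k_as_pi * p_pi_as_k) / (p_k_as_k * p_pi_as_pi)


def gaussian_constraint_nll(value, expected, sigma):
    """Term added to the negative log-likelihood for a Gaussian-constrained parameter."""
    return 0.5 * ((value - expected) / sigma) ** 2


# Hypothetical misidentification probabilities, for illustration only
mu = expected_swapped_yield(4157.0, 0.02, 0.03, 0.90, 0.95)
penalty = gaussian_constraint_nll(3.5, mu, 0.5)
```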
The other sources of background considered include semileptonic decays of $b$ hadrons via intermediate open-charm states and fully hadronic $b$-hadron decays. The background from semileptonic decays is predominantly reconstructed at low $m(K^-\pi^+\mu^+\mu^-)$ and does not contribute to the analysis. Fully hadronic $b$-hadron decays contribute at the level of 1 to 2 candidates at masses close to the known $B_{s}^0$ mass. This background is neglected in the fit but is considered as a source of systematic uncertainty in Sec. 6. Figure 1 shows the fit to the candidates, with the results in the three most signal-like neural network response bins of each data-taking period combined. The dominant contribution in the fit is the $B^0 \rightarrow K^{*0}\mu^+\mu^-$ decay. Figure 2 shows the fit to the mass-constrained candidates in the $J/\psi$ mass region, again with the three highest neural network response bins of each data-taking period combined. In this fit, a small background component from $B^0 \rightarrow K^{*0}\mu^+\mu^-$ decays is included; these decays have the same final state but, because they are constrained to the wrong dimuon mass, form a broad component in the fit. The fit results in individual bins of neural network response are shown in Figs. 5 and 6 of the appendix. Summing over the bins of neural network response and the data-taking periods, the yields are $627\,244 \pm 837$ for the $B^0 \rightarrow J/\psi K^{*0}$ decay, $5730 \pm 94$ for the $B_{s}^0 \rightarrow J/\psi \overline{K}{}^{*0}$ decay, $4157 \pm 72$ for the $B^0 \rightarrow K^{*0}\mu^+\mu^-$ decay, and $38 \pm 12$ for the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decay. No correction has been made to these yields for cases in which the $K^-\pi^+$ system does not originate from a $K^{*}(892)$ decay; contamination from non-$K^{*0}$ decays is discussed further in Sec. 5. Using Wilks' theorem, the significance of the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ yield is determined to be 3.4 standard deviations with respect to the background-only hypothesis. This includes the systematic uncertainties on the yield discussed in Sec. 6.
Figure 3 shows the variation of the log-likelihood of the simultaneous fit as a function of the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ yield.
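For a single parameter of interest, Wilks' theorem relates the significance to the change in negative log-likelihood between the background-only fit and the best fit, $Z = \sqrt{2\,\Delta(-\ln L)}$. The sketch below assumes this one-degree-of-freedom form; the numerical input is back-calculated from the quoted 3.4 standard deviations for illustration and is not a value reported in the paper. The inclusion of systematic uncertainties in the likelihood profile is not modelled here.

```python
import math


def wilks_significance(nll_background_only: float, nll_best_fit: float) -> float:
    """Significance in standard deviations for one parameter of interest.

    Uses Z = sqrt(2 * delta_nll), where delta_nll is the increase in the negative
    log-likelihood when the signal yield is fixed to zero.
    """
    delta_nll = nll_background_only - nll_best_fit
    return math.sqrt(2.0 * max(delta_nll, 0.0))  # guard against tiny negative rounding


z = wilks_significance(nll_background_only=5.78, nll_best_fit=0.0)
print(f"{z:.1f}")  # 3.4
```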

Results
The branching fraction of the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decay is determined relative to that of the $B^0 \rightarrow J/\psi K^{*0}$ decay according to
$$\mathcal{B}(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-) = \frac{N(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-)}{N(B^0 \rightarrow J/\psi K^{*0})} \cdot \frac{\varepsilon(B^0 \rightarrow J/\psi K^{*0})}{\varepsilon(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-)} \cdot \frac{f_d}{f_s} \cdot \mathcal{B}(B^0 \rightarrow J/\psi K^{*0}) \cdot \mathcal{B}(J/\psi \rightarrow \mu^+\mu^-).$$
Here, $N$ is the yield of a given decay mode determined from the fit to $m(K^-\pi^+\mu^+\mu^-)$ or $m(J/\psi K^-\pi^+)$, and $\varepsilon$ is the efficiency to reconstruct and select that decay mode. The ratio $f_s/f_d$ is the relative production fraction of $B_{s}^0$ and $B^0$ mesons in $pp$ collisions. The efficiency to trigger, reconstruct and select each decay mode is determined from simulation after applying the data-driven corrections. The efficiency for the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decay is corrected to account for events in the vetoed $q^2$ regions, following the same prescription as Ref. [16]. The efficiency-corrected yields are further corrected for contamination from decays with the $K^-\pi^+$ system in an S-wave configuration. For the normalisation decay $B^0 \rightarrow J/\psi K^{*0}$, the S-wave fraction $F_S(B^0 \rightarrow J/\psi K^{*0}) = (6.4 \pm 0.3 \pm 1.0)\%$ determined in Ref. [41] is used. The S-wave contamination of the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decay is unknown, but it is assumed to be at a similar level to that of the $B^0 \rightarrow K^{*0}\mu^+\mu^-$ decay, and the full size of the S-wave correction is taken as a systematic uncertainty. The S-wave contamination of the $B^0 \rightarrow K^{*0}\mu^+\mu^-$ decay is determined using the model from Ref. [16], which predicts $F_S(B^0 \rightarrow K^{*0}\mu^+\mu^-) = (3.4 \pm 0.8)\%$ in the $K^-\pi^+$ mass window used in this analysis.
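The normalisation can be written as a short function. The sketch below assumes the usual form in which the signal yield is divided by the $B^0 \rightarrow J/\psi K^{*0}$ yield, corrected by the efficiency ratio and $f_d/f_s$, and multiplied by $\mathcal{B}(B^0 \rightarrow J/\psi K^{*0})\,\mathcal{B}(J/\psi \rightarrow \mu^+\mu^-)$. It further assumes that the S-wave corrections enter as multiplicative $(1 - F_S)$ factors on the efficiency-corrected yields. The efficiencies are not quoted in this section, so the call below uses hypothetical equal efficiencies and does not reproduce the published branching fraction. The function and argument names are illustrative.

```python
def branching_fraction(n_sig, n_norm, eff_sig, eff_norm, fs_over_fd,
                       bf_norm, bf_jpsi_mumu, f_swave_sig=0.0, f_swave_norm=0.0):
    """B(Bs0 -> anti-K*0 mu+ mu-) normalised to B0 -> J/psi K*0.

    Each yield is efficiency corrected, the S-wave K- pi+ fraction is removed
    from it, and f_d/f_s converts from B0 to Bs0 production.
    """
    corrected_sig = n_sig * (1.0 - f_swave_sig) / eff_sig
    corrected_norm = n_norm * (1.0 - f_swave_norm) / eff_norm
    return corrected_sig / corrected_norm / fs_over_fd * bf_norm * bf_jpsi_mumu


# Yields and external inputs from the text; the efficiencies are hypothetical placeholders
bf = branching_fraction(n_sig=38, n_norm=627244, eff_sig=1.0, eff_norm=1.0,
                        fs_over_fd=0.259, bf_norm=1.19e-3, bf_jpsi_mumu=0.0596,
                        f_swave_sig=0.034, f_swave_norm=0.064)
```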
The ratio of production fractions has been measured at 7 and 8 TeV to be $f_s/f_d = 0.259 \pm 0.015$ in the LHCb detector acceptance [42]. The ratio at 13 TeV has been shown to be consistent with that at 7 and 8 TeV in Ref. [43], and it has also been validated in this analysis by comparing the efficiency-corrected yields of the $B^0 \rightarrow J/\psi K^{*0}$ and $B_{s}^0 \rightarrow J/\psi \overline{K}{}^{*0}$ decays in bins of the $B^0_{(s)}$ meson $p_{\mathrm{T}}$. Taking the branching fractions of the decays $B^0 \rightarrow J/\psi K^{*0}$ and $J/\psi \rightarrow \mu^+\mu^-$ to be $(1.19 \pm 0.01 \pm 0.08) \times 10^{-3}$ [44] and $(5.96 \pm 0.03)\%$ [36], respectively, gives a branching fraction for the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decay of
$$\mathcal{B}(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-) = [2.9 \pm 1.0~(\text{stat}) \pm 0.2~(\text{syst}) \pm 0.3~(\text{norm})] \times 10^{-8}.$$
The first and second uncertainties are statistical and systematic, respectively. The third uncertainty is due to limited knowledge of the external parameters used to normalise the observed yield, namely the external branching fractions, $f_s/f_d$, $F_S(B^0 \rightarrow J/\psi K^{*0})$ and $F_S(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-)$. The branching fraction of the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decay is also measured relative to that of the $B_{s}^0 \rightarrow J/\psi \overline{K}{}^{*0}$ decay. The S-wave contamination of the $B_{s}^0 \rightarrow J/\psi \overline{K}{}^{*0}$ decay is corrected for using the measurements of $F_S$ in bins of $m(K^-\pi^+)$ from Ref. [45], scaled according to the model in Ref. [16], giving $F_S(B_{s}^0 \rightarrow J/\psi \overline{K}{}^{*0}) = (16.0 \pm 3.0)\%$. For the resulting ratio of branching fractions, $\mathcal{B}(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-)/\mathcal{B}(B_{s}^0 \rightarrow J/\psi \overline{K}{}^{*0})$, the third uncertainty is due to $F_S(B_{s}^0 \rightarrow J/\psi \overline{K}{}^{*0})$ and $F_S(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-)$.
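To determine the ratio $|V_{td}/V_{ts}|$, it is also useful to extract the ratio $\mathcal{B}(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-)/\mathcal{B}(B^0 \rightarrow K^{*0}\mu^+\mu^-)$, for which the third uncertainty corresponds to the uncertainties on $f_s/f_d$, $F_S(B^0 \rightarrow K^{*0}\mu^+\mu^-)$ and $F_S(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-)$.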

Systematic uncertainties
The measurements presented in Sec. 5 are performed relative to decays with the same final-state particles as the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decay. Consequently, many potential sources of systematic uncertainty largely cancel in the ratios. The remaining sources of systematic uncertainty are discussed below and are summarised in Table 1. Only systematic uncertainties that affect the measured yield are considered when evaluating the significance of the observed signal: those related to the signal resolution, the neural network binning scheme and the residual backgrounds at $m(K^-\pi^+\mu^+\mu^-)$ close to the known $B_{s}^0$ meson mass. The $m(K^-\pi^+\mu^+\mu^-)$ model used to describe the $B^0 \rightarrow K^{*0}\mu^+\mu^-$ and $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decays is taken from simulation, with a simple scaling of the width and tail parameters based on the fit to the data in the $J/\psi$ mass region. Any difference between the $q^2$ spectra of the simulation and the data could result in a small mismodelling of the lineshape. To account for this possibility, the width of the $m(K^-\pi^+\mu^+\mu^-)$ resolution model is allowed to vary within $0.5~\text{MeV}/c^2$ in the fit. This covers the full variation of the width in simulation across the allowed $q^2$ range and contributes 0.1% to the systematic uncertainty. A further uncertainty on the signal lineshape is evaluated from the difference between fits to the candidates in the $J/\psi$ mass region with and without the dimuon mass constraint. A systematic uncertainty of 0.5% is assigned, taken as the difference in the efficiency-corrected $B^0 \rightarrow J/\psi K^{*0}$ yields between these two fits. In addition, an alternative parameterisation of the lineshape describing the $\Lambda_b^0$ background, with an exponential rather than a power-law tail, is tested. The difference in yields between the two models results in a systematic uncertainty of 0.1% on the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ yield.
The total uncertainty related to the mass lineshapes is taken as the sum in quadrature of these individual contributions.
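Combining the three lineshape-related contributions quoted above (0.1%, 0.5% and 0.1%) in quadrature is a one-liner. This arithmetic is illustrative; the corresponding Table 1 entry is not reproduced in this text.

```python
import math


def add_in_quadrature(*relative_uncertainties: float) -> float:
    """Total of independent uncertainties: square root of the sum of squares."""
    return math.sqrt(sum(u * u for u in relative_uncertainties))


# Resolution width, J/psi mass constraint, Lambda_b0 tail model (in %)
lineshape_total = add_in_quadrature(0.1, 0.5, 0.1)
print(f"{lineshape_total:.2f}%")  # 0.52%
```

The 0.5% term dominates, which is typical of quadrature sums: contributions several times smaller than the largest one barely change the total.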
The systematic uncertainty related to the relative efficiencies in each neural network response bin is evaluated in two parts: an uncertainty due to the limited size of the simulated sample used to determine the relative fractions, and an uncertainty due to differences between simulation and data. The latter is evaluated by correcting the fraction of $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decays in each neural network response bin by the difference between simulation and data measured for $B^0 \rightarrow J/\psi K^{*0}$ decays. The combination of these uncertainties is 0.5%.
Backgrounds from hadronic $b$-hadron decays in which two of the final-state hadrons are misidentified as muons are neglected in the final fit to the $K^-\pi^+\mu^+\mu^-$ candidates. These backgrounds are estimated to contribute 1 to 2 candidates at $m(K^-\pi^+\mu^+\mu^-)$ close to the known $B_{s}^0$ mass. The resulting systematic uncertainty on the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ yield is estimated to be 2%; this background is negligible compared with the $B^0 \rightarrow K^{*0}\mu^+\mu^-$ yield. The $\Lambda_b^0$ background yield is constrained using PID efficiencies from control samples, and these efficiencies have an associated systematic uncertainty. This uncertainty is accounted for in the statistical uncertainty of the fit and is negligible.
Table 1: Main sources of systematic uncertainty considered for the branching fraction measurements. The first uncertainty applies to the measurement of $\mathcal{B}(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-)$, the second to $\mathcal{B}(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-)/\mathcal{B}(B^0 \rightarrow K^{*0}\mu^+\mu^-)$ and the third to $\mathcal{B}(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-)/\mathcal{B}(B_{s}^0 \rightarrow J/\psi \overline{K}{}^{*0})$. A description of the different contributions can be found in the text. The first three sources of uncertainty affect the measured yield of the signal decay. The total uncertainty is the sum in quadrature of the individual sources. The final row indicates the additional uncertainty arising from the uncertainties on external parameters used in the measurements.

Other sources of systematic uncertainty are associated with the normalisation of the observed yield in the measurements of the branching fraction and the branching-fraction ratios. The largest source of uncertainty on both $\mathcal{B}(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-)$ and the branching-fraction ratios is the limited knowledge of external parameters: there is a 5.8% uncertainty on the ratio of the $B_{s}^0$ and $B^0$ fragmentation fractions, a 1.1% uncertainty due to $F_S(B^0 \rightarrow J/\psi K^{*0})$, a 0.8% uncertainty due to $F_S(B^0 \rightarrow K^{*0}\mu^+\mu^-)$, a 4.0% uncertainty due to $F_S(B_{s}^0 \rightarrow J/\psi \overline{K}{}^{*0})$ and a 6.8% uncertainty on $\mathcal{B}(B^0 \rightarrow J/\psi K^{*0})$. These external uncertainties are assumed to be uncorrelated.
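The percentages quoted for the $B^0$ S-wave fractions can be understood as the relative uncertainty they induce on the $(1 - F_S)$ correction factor. The sketch below assumes that interpretation and that the two uncertainties on $F_S(B^0 \rightarrow J/\psi K^{*0})$ are combined in quadrature. With the inputs from Sec. 5, it reproduces the quoted 1.1% and 0.8%.

```python
import math


def relative_uncertainty_on_pwave_factor(f_s: float, sigma_f_s: float) -> float:
    """Relative uncertainty on (1 - F_S) propagated from an uncertainty on F_S."""
    return sigma_f_s / (1.0 - f_s)


# F_S(B0 -> J/psi K*0) = (6.4 +- 0.3 +- 1.0)%, uncertainties combined in quadrature
jpsi = relative_uncertainty_on_pwave_factor(0.064, math.hypot(0.003, 0.010))
# F_S(B0 -> K*0 mu+ mu-) = (3.4 +- 0.8)%
mumu = relative_uncertainty_on_pwave_factor(0.034, 0.008)
print(f"{100 * jpsi:.1f}% {100 * mumu:.1f}%")  # 1.1% 0.8%
```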
The second-largest source of uncertainty is the limited knowledge of the decay amplitudes of the $B^0 \rightarrow J/\psi K^{*0}$, $B_{s}^0 \rightarrow J/\psi \overline{K}{}^{*0}$, $B^0 \rightarrow K^{*0}\mu^+\mu^-$ and $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decays. The uncertainty on the decay structure leads to an uncertainty on the efficiencies used to correct the observed yields. The amplitude structure of the $B^0 \rightarrow J/\psi K^-\pi^+$ decay has been studied in Refs. [41,44], and that of the $B_{s}^0 \rightarrow J/\psi K^-\pi^+$ decay in Ref. [45]. These measurements are used to weight the simulated events used to determine $\varepsilon$, and a systematic uncertainty is assigned as the difference in $\varepsilon$ with and without the weighting. The full angular distribution of the $B^0 \rightarrow K^{*0}\mu^+\mu^-$ decay has been studied by the LHCb collaboration in Ref. [16], whereas the decay structure of the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decay is unknown. To determine a systematic uncertainty associated with the knowledge of these decay models, the simulated samples are weighted such that the coupling strengths used in the model are consistent with the results of global fits to $b \rightarrow s$ data [17][18][19][20][21]. Again, the systematic uncertainty is assigned as the difference in $\varepsilon$ with and without the weighting. The total systematic uncertainty due to the knowledge of the decay models is 4% for all measurements. Finally, the contribution from non-$K^{*0}$ states in the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decay is also considered. This contribution is unknown and is assumed to be at a similar level to that seen in the decay $B^0 \rightarrow K^-\pi^+\mu^+\mu^-$ [16]. Assigning the full size of the effect as a systematic uncertainty gives a 3.4% uncertainty.
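The decay-model systematic uncertainty is assigned as the change in efficiency when the simulated events are reweighted. A minimal sketch is shown below, assuming per-event pass/fail flags and hypothetical per-event weights; in practice the weights would be derived from the amplitude measurements or global-fit couplings cited above.

```python
from typing import Optional, Sequence


def efficiency(passed: Sequence[bool], weights: Optional[Sequence[float]] = None) -> float:
    """(Weighted) fraction of simulated events that pass the selection."""
    if weights is None:
        weights = [1.0] * len(passed)
    total = sum(weights)
    return sum(w for w, ok in zip(weights, passed) if ok) / total


def reweighting_systematic(passed: Sequence[bool], weights: Sequence[float]) -> float:
    """Relative change in efficiency between the nominal and reweighted simulation."""
    nominal = efficiency(passed)
    reweighted = efficiency(passed, weights)
    return abs(reweighted - nominal) / nominal


# Hypothetical toy sample: four simulated events and alternative-model weights
flags = [True, False, True, True]
syst = reweighting_systematic(flags, [1.0, 3.0, 1.0, 1.0])
```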
The efficiency ratios used to determine the different branching fraction measurements have an uncertainty of around 1.5%. This comprises a statistical component due to the limited size of the simulated samples and a systematic component associated with the choice of binning in the kinematic variables used to evaluate the PID and track reconstruction efficiencies. A separate systematic uncertainty on the efficiency ratios due to data-simulation differences is evaluated as the deviation between the efficiency ratios obtained with and without the corrections described in Sec. 2, which include corrections to the $B^0_{(s)}$ meson kinematics, PID performance and track reconstruction efficiency. This results in an additional uncertainty of 1 to 2%, depending on the measurement considered.

Summary
A search for the decay $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ is performed using data sets corresponding to 1.0, 2.0 and 1.6 $\text{fb}^{-1}$ of integrated luminosity collected with the LHCb experiment at centre-of-mass energies of 7, 8 and 13 TeV, respectively. A yield of $38 \pm 12$ $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decays is obtained, providing the first evidence for this decay with a significance of 3.4 standard deviations above the background-only hypothesis. The resulting branching fraction is determined to be
$$\mathcal{B}(B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-) = [2.9 \pm 1.0~(\text{stat}) \pm 0.2~(\text{syst}) \pm 0.3~(\text{norm})] \times 10^{-8}.$$
While no SM prediction of the branching fraction of this decay exists in the literature, the measurement is consistent with a naïve scaling of $\mathcal{B}(B^0 \rightarrow K^{*0}\mu^+\mu^-)$ by $|V_{td}/V_{ts}|^2$ with a SM-like value of $|V_{td}/V_{ts}|$. A detailed analysis of the $q^2$ spectrum of the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decay requires a larger data set, which should be available with the upgraded LHCb experiment [46].

Appendix
In this appendix, the fits to the $J/\psi K^-\pi^+$ and $K^-\pi^+\mu^+\mu^-$ invariant masses of the selected candidates in bins of neural network response are shown for both the Run 1 and Run 2 data sets. The fit to the $K^-\pi^+\mu^+\mu^-$ invariant mass of the candidates in the $J/\psi$ mass window is shown in Fig. 4; this fit is used to determine the resolution and tail parameters for the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ decay. The fit to the $K^-\pi^+\mu^+\mu^-$ invariant mass of the $B_{s}^0 \rightarrow \overline{K}{}^{*0}\mu^+\mu^-$ candidates is shown in Fig. 5, and the fit to the $J/\psi K^-\pi^+$ invariant mass after application of the $J/\psi$ mass constraint is shown in Fig. 6.