Improved limit on the branching fraction of the rare decay $K^0_{\scriptscriptstyle S}\to\mu^+\mu^-$

A search for the decay $K^0_{\scriptscriptstyle S}\to\mu^+\mu^-$ is performed, based on a data sample of proton-proton collisions corresponding to an integrated luminosity of 3 fb$^{-1}$, collected by the LHCb experiment at centre-of-mass energies of 7 and 8 TeV. The observed yield is consistent with the background-only hypothesis, yielding a limit on the branching fraction of ${\cal B}(K^0_{\scriptscriptstyle S}\to\mu^+\mu^-)<0.8~(1.0) \times 10^{-9}$ at $90\%~(95\%)$ confidence level. This result improves the previous upper limit on the branching fraction by an order of magnitude.


Introduction
In the Standard Model (SM), the unobserved K 0 S → µ + µ − decay proceeds only through a Flavour-Changing Neutral Current (FCNC) transition, which cannot occur at tree level. It is further suppressed by the small amount of CP violation in kaon decays, since the S-wave component of the decay is forbidden when CP is conserved. In the SM, the decay amplitude is expected to be dominated by long distance contributions, which can be constrained using the observed decays K 0 S → γγ and K 0 L → π 0 γγ, leading to the prediction for the branching fraction B(K 0 S → µ + µ − ) = (5.0 ± 1.5) × 10 −12 [1,2]. The predicted branching fraction for the K 0 L decay is (6.85 ± 0.32) × 10 −9 [3], in excellent agreement with the experimental world average B(K 0 L → µ + µ − ) = (6.84 ± 0.11) × 10 −9 [4]. The prediction for K 0 S → µ + µ − is currently being updated with a dispersive treatment, which leads to sizeable corrections in other K 0 S leptonic decays [5]. Due to its suppression in the SM, the K 0 S → µ + µ − decay is sensitive to possible contributions from dynamics beyond the SM, notably from light scalars with CP -violating Yukawa couplings [1]. Contributions up to one order of magnitude above the SM branching fraction expectation naturally arise in many models and are compatible with the present bounds from other FCNC processes [2]. An upper limit on B(K 0 S → µ + µ − ) close to 10 −11 could be translated into model-independent bounds on the CP -violating phase of the s → d + − amplitude. This would be very useful to discriminate between scenarios beyond the SM if other modes, such as K + → π + νν, indicate a non-SM enhancement.
The current experimental limit, B(K 0 S → µ + µ − ) < 9 × 10 −9 at 90% confidence level (CL), was obtained using pp collision data corresponding to 1.0 fb −1 of integrated luminosity at a centre-of-mass energy √ s = 7 TeV, collected with the LHCb detector in 2011 [6]. This result improved the previous upper limit [7] but is still three orders of magnitude above the predicted SM level.
In this paper, an update of the search for the K 0 S → µ + µ − decay is reported. Its branching fraction is measured using the known K 0 S → π + π − decay as normalisation. The analysis is performed on a data sample corresponding to 2 fb −1 of integrated luminosity at √ s = 8 TeV, collected in 2012, and the result is combined with that from the previous LHCb analysis [6]. Besides the gain in statistical precision due to the larger data sample, the sensitivity is noticeably increased with respect to the previous result due to a higher trigger efficiency, as well as other improvements to the analysis that are discussed in the following sections.
An overview on how K 0 S → µ + µ − decays are detected and triggered in LHCb is given in Sect. 2, while the strategy for this measurement is outlined in Sect. 3. Details of background suppression and the resulting sensitivity are given in Sects. 4 and 5, respectively. The final result, taking into account the systematic uncertainties discussed in Sect. 6, is given in Sect. 7.

K 0 S decays in LHCb
The LHCb detector [8,9] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b or c quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex locator (VELO) surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. The tracking system provides a measurement of momentum, p, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV/c. The minimum distance of a track to a primary vertex (PV), the impact parameter (IP), is measured with a resolution of (15 + 29/p T ) µm, where p T is the component of the momentum transverse to the beam, in GeV/c. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors (RICH). Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter and a hadronic calorimeter. Muons are identified by five stations which alternate layers of iron and multiwire proportional chambers The online event selection is performed by a trigger [10], which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a two-step software stage, which applies a full event reconstruction. Candidates are subsequently classified as TOS, if the event is triggered on the signal candidate, or TIS, if triggered by other activities in the detector, independently of signal. Only candidates that are classified as TOS at each trigger stage are used to search for K 0 S → µ + µ − decays. The trigger selection constitutes the main limitation to the efficiency for detecting K 0 S decays. A muon is only selected at the hardware stage when it is detected in all muon stations, implying a momentum larger than about 5 GeV/c, and a p T above 1.76 GeV/c. These thresholds have an efficiency of order 1% for K 0 S → µ + µ − decays. In the first step of the software trigger, all charged particles with p T > 500 MeV/c are reconstructed. At this stage most signal decays are triggered either by requiring a reconstructed track loosely identified as a muon [10,11], with IP > 0.1 mm and p T > 1.0 GeV/c, or by finding two oppositely charged muon candidates forming a detached secondary vertex (SV). Since these two categories, hereafter referred to as TOS µ and TOS µµ , induce different kinematic biases on the signal and background candidates, the analysis steps described below are performed independently on each category. The two categories are made mutually exclusive by applying the TOS µµ selection only to candidates not already selected by TOS µ .
In the second software trigger stage, an offline-quality event reconstruction is performed. Signal candidates are selected requiring a dimuon with p T > 600 MeV/c detached from the primary vertex, with both tracks having p T > 300 MeV/c. In the 2011 data taking, the dimuon mass was required to be larger than 1 GeV/c 2 in the second software trigger stage. This excluded the K 0 S region, making the use of TIS candidates necessary. Due to the trigger reoptimisation, no mass requirements were applied during 2012 and a lower p T threshold for reconstructed tracks was used. According to simulation, these changes improve the trigger efficiency over the previous analysis [6] by about a factor 2.5.
Due to its large and well-known branching fraction and its similar topology, the K 0 S → π + π − decay is taken as the normalisation mode. A large sample of candidates is obtained from an unbiased trigger, which does not apply any selection requirement.
Despite the low trigger efficiency, the study detailed in this paper profits from the unprecedented number of K 0 S produced at the LHC, O(10 13 ) per fb −1 of integrated luminosity within the LHCb acceptance, and from the fact that about 40% of these K 0 S decays occur inside the VELO region. For such decays, the K 0 S invariant mass is reconstructed with a resolution of about 4 MeV/c 2 .
The analysis makes use of large samples of simulated collisions containing a signal decay, or background decays which can be reconstructed as the signal, and contaminate the µµ invariant mass distribution, such as K 0 S → π + π − or K 0 S → π + µ −ν µ . 1 In the simulation, pp collisions are generated using Pythia [12] with a specific LHCb configuration [13]. Decays of hadronic particles are described by EvtGen [14], in which final-state radiation is generated using Photos [15]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [16] as described in Ref. [17].

Selection and search strategy
Common offline preselection criteria are applied to K 0 S → µ + µ − and K 0 S → π + π − candidates to cancel many systematic effects in the ratio. Candidates are required to decay in the VELO region, where the best K 0 S mass resolution is achieved. The two reconstructed tracks must have momentum smaller than 100 GeV/c and quality requirements are set on the track and secondary vertex fits. The SV must be well detached from the PV by requiring the K 0 S decay time to be larger than 8.95 ps, 10% of the K 0 S mean lifetime. The K 0 S IP must be less than 0.4 mm, while the two charged tracks are required to be incompatible with originating from any PV.
Decays of Λ baryons to pπ − are suppressed by removing candidates close to the expected ellipses in the Armenteros-Podolanski (AP) plane [18]. In this plane the p T of the final-state particles under the pion mass hypothesis is plotted versus the longitudinal momentum asymmetry, defined as , where p ± L is the longitudinal momentum of the charged tracks. Both p T and p L are considered with respect to the direction of the mother particle. The K 0 S decays are symmetrically distributed on the AP plane while Λ decays produce two ellipses at low p T and |α| ∼ 0.7. A kaon veto, based on the response of the RICH detector, is used to suppress K * 0 → K + π − decays and other possible final states including a charged kaon.
The preselection reduces the combinatorial background, arising from candidates formed from secondary hadronic collisions in the detector material or from spurious reconstructed SV. The purity of the K 0 S → π + π − sample used for normalisation, whose mass distribution is shown in Fig. 1, is estimated from a fit to the mass spectrum to be 99.8%. The fraction of events with more than one candidate is less than 0.1% for signal and 4% for the normalisation channel, and all candidates are retained. Additional discrimination against backgrounds for the signal mode is achieved through the use of two multivariate discriminants. The first is designed to further suppress combinatorial background, and the second to reduce the number of K 0 S → π + π − decays in which both pions are misidentified as muons.
After requirements on the output of these discriminants have been applied, the number of signal candidates is obtained by fitting the K 0 S → µ + µ − mass spectrum. The mass sidebands provide a data-driven estimation of the residual background by extrapolation into the signal region. The number of candidates is converted into a branching fraction using the yield of the K 0 S → π + π − normalisation mode, and the estimated relative efficiency. Events in the K 0 S mass region are scrutinised only after fixing the analysis strategy.

Backgrounds
The K 0 S → µ + µ − sample contains two main sources of background. Combinatorial background candidates are expected to exhibit a smooth mass distribution, and can therefore be estimated from the sidebands. The other relevant source of background is due to K 0 S → π + π − decays where both pions pass the loose muon identification requirements after the trigger stage. This can be due either to π + → µ + ν µ decays or to random association of muon detector hits with the pion trajectory. In such cases the K 0 S mass, reconstructed with a wrong mass hypothesis for the final-state particles, is underestimated by 39 MeV/c 2 on average, as shown in Fig. 1. Despite the excellent mass resolution, the right-hand tail of the reconstructed mass distribution under the dimuon hypothesis extends into the K 0 S signal mass range and, given the large branching fraction of the K 0 S → π + π − mode, constitutes a nonnegligible background. Two multivariate discriminants, based on a Boosted Decision Tree (BDT) algorithm [19,20], are applied on the preselected candidates to improve the signal discrimination with respect to these backgrounds.
The first discriminant, named hereafter BDT cb , aims to reduce the combinatorial background, exploiting the different decay topologies, kinematic spectra and reconstruction qualities of signal and combinatorial candidates. It is optimised separately for each trigger category. A set of ten input variables is used in BDT cb : the K 0 S p T and IP, the minimum IP of the two charged tracks, the angle between the positively charged final-state particle and the K 0 S flight direction in the K 0 S rest frame, the χ 2 of the SV fit, the distance of closest approach between the two tracks, an SV isolation variable, defined as the difference in vertex-fit χ 2 when the next nearest track is included in the vertex fit, and the SV absolute position coordinates. The SV position is particularly important, since a large fraction of the background is found to originate from interactions in the detector material. This set of variables does not distinguish between K 0 S → µ + µ − and K 0 S → π + π − decays as it does not contain quantities related to muon identification and ignores the K 0 S candidate invariant mass distribution.
The signal training sample for BDT cb is composed of K 0 S → µ + µ − simulated candidates passing the trigger and preselection criteria. A signal training sample consisting of K 0 S → π + π − decays in data is also used as a cross-check, as explained in Sect. 6. The background training sample is made from K 0 S → µ + µ − data candidates surviving the trigger and preselection requirements with reconstructed mass in the range [520, 600] MeV/c 2 . Since candidates in the same mass region are also used to estimate the residual background, the training is performed using a k-fold cross-validation technique [21] to avoid any possible effect of overtraining.
A loose requirement on the BDT cb output is applied to suppress the combinatorial background. The cut is chosen to remove 99% of the background training candidates. The corresponding signal efficiency is about 56% and 66% for the TOS µ and TOS µµ trigger categories, respectively. To exploit further the information provided by the discriminant, the candidates surviving this requirement are allocated to ten bins according to their BDT cb value, with bounds defined in order to have approximately equal population of signal training candidates in each bin.
The background from misidentified K 0 S → π + π − decays is further reduced with the second multivariate discriminant, called BDT µ . Its input includes the position, time and number of detector hits around the extrapolated track position to each muon detector station, a global match χ 2 between the muon hit positions and the track extrapolation, and other variables related to the tracking and the response of the RICH and calorimeter detectors.
To train the BDT µ discriminant, highly pure samples of 1.2 million pions and 0.68 million muons are obtained from TIS-triggered K 0 S → π + π − and B + → J/ψK + decays, respectively. In the latter case, a probe muon from the J/ψ is required to be TIS at all trigger stages, while stringent muon identification requirements are set on the other muon, reaching an estimated purity for muons above 99.9%. Before using it in the BDT µ training, the muon sample is weighted to have the same two-dimensional distribution in p and p T as the pion sample, as well as the same distribution of number of tracks in the event. This is to prevent the BDT µ from discriminating pions and muons using these variables, which are included in the input because of their strong correlation with the identification variables. Weighting also allows optimisation of the discrimination power for the kinematic spectrum relevant to this search.
The level of misidentification of the discriminant for a pion from K 0 S → π + π − decay is found to be 0.4% for 90% muon efficiency. This reduces the level of double misidentification background, for a given efficiency, by about a factor of four with respect to the discriminant used in the previous publication [6], which was not tuned specifically for K 0 S → µ + µ − searches.
The BDT µ discriminant is trained using half of the B + → J/ψK + sample, while the other half is used to evaluate the muon identification efficiency as a function of (p, p T ). These values are used to compute the efficiency of a BDT µ requirement on the candidate K 0 S → µ + µ − decays after selection and trigger requirements, in each bin of the BDT cb discriminant. The muon spectra assumed in this calculation are obtained from simulated decays, weighted to better reproduce the K 0 S p T spectrum observed in The BDT µ requirement on the signal candidates is optimised by maximising the figure of merit [22] µID /( N bg + a/2), with a = 3, where µID is the signal efficiency and N bg the expected background yield. The latter is estimated from a fit to the mass distribution, after removing candidates in the range [492, 504] MeV/c 2 around the K 0 S mass, and extrapolating the result into the signal region. This optimisation is performed independently for the two trigger categories, with no significant difference found as a function of the BDT cb bin. The optimal threshold corresponds to a signal efficiency of µID ∼ 98% in both cases.
Other possible sources of background have been explored and found to give negligible contribution to this search. The irreducible background due to K 0 L → µ + µ − decays and from K 0 S -K 0 L interference is evaluated from the known K 0 L → µ + µ − branching fraction and lifetime, and by studying the decay-time dependence of the selection efficiency for K 0 S → π + π − decays in data. The yield from this background becomes comparable to the signal for a branching fraction lower than 2 × 10 −11 , which is well below the sensitivity of this search.
Semileptonic K 0 → π + µ − ν µ decays with pion misidentification provide another possible source of background. Simulated events, where the pion is forced to decay to µν within the detector, are used to determine the efficiency of the offline selection requirements. No event survives the trigger selection. Under the very conservative hypothesis that the trigger efficiency is the same as in K 0 S → µ + µ − decays, the expected yields from both K 0 L and K 0 S semileptonic decays are negligible. Decays including a dimuon from resonances, like ω → π 0 µ + µ − and η → µ + µ − γ, do not produce peaking structures in the mass distribution, and are accounted for in the combinatorial background.

Search sensitivity
The observed number of K 0 S → µ + µ − candidates is converted into a branching fraction using the normalisation mode and its precisely known branching fraction B(K 0 S → π + π − ) = 0.6920 ± 0.0005 [4]. The computation is made in every BDT cb bin i and trigger category j as follows where N µµ ij and N ππ denote the background-subtracted yields for the signal and normalisation modes, respectively. The total selection efficiencies can be factorised as The first factor refers to the offline selection requirements, which are applied identically to both modes and cancel to first order in the ratio; the residual difference is mainly due to the different interaction cross-sections for pions and muons with the detector material, and is estimated from simulation. The second factor is the ratio of trigger efficiencies; the efficiency for the signal is determined from simulation, with its systematic uncertainty estimated from data-driven checks, while that for the normalisation mode is the efficiency of the random trigger used to select K 0 S → π + π − , (9.38 ± 1.01) × 10 −8 . The third factor reflects the fraction of candidates in each BDT cb bin, and is also determined from simulation. Finally, the efficiency of the BDT µ requirement is obtained from the B + → J/ψK + calibration sample described in Sect. 4, for each BDT cb bin and trigger category.
To account for the difference between the kaon p T spectra observed in the K 0 S → π + π − decays in data and simulation, all efficiencies obtained from simulation are computed in six roughly equally populated p T bins. A weighted average of the efficiencies is then performed, where the weights are determined from the yields in each bin observed in data for K 0 S → π + π − candidates. The resulting values for the single candidate sensitivity α ij are reported in Table 1. The quoted uncertainties are statistical only. They are separated between the uncertainty on µµ BDT;ij , due to the limited statistics of simulated data and uncorrelated among BDT cb bins, and all the other statistical uncertainties, which are conservatively considered as fully correlated among bins within the same trigger category. Table 1 also presents the number of candidates after the inspection of the signal region. The separation between signal and background is presented in Sect. 7.

Systematic uncertainties
Several systematic effects, summarised in Table 2, contribute to the uncertainty on the normalisation factors. Tracking efficiencies are not perfectly reproduced in simulated events. Corrections based on a J/ψ → µ + µ − data control sample are determined as a function of the muon p and η. The average effect of these corrections on the ratio ππ sel / µµ sel and its standard deviation, added in quadrature, leads to a systematic uncertainty of 0.4%. The distributions of all variables relevant to the selection are compared in data and simulation for K 0 S → π + π − decays. The largest differences are found in the kaon p T and its decay vertex radial position. The effect on ππ sel / µµ sel of applying a two-dimensional weight to account for these discrepancies is taken as a systematic uncertainty, and amounts to a relative 1.9% and 1.8% for the TOS µ and TOS µµ trigger categories, respectively.
The difference between data and simulation in the kaon p T spectrum could also affect the other factors in the computation of α ij . An additional uncertainty is assigned by repeating the whole calculation with a finer binning in p T . Due to the limited size of the data samples, this is possible only in the TOS µ category. The average relative change in α ij , 4.3%, is assigned as an uncertainty for the TOS µµ category.
A specific cross-check is performed to validate the efficiencies predicted by the simulation for the BDT cb requirements. An alternative discriminant is made using a signal training sample consisting of trigger-unbiased K 0 S → π + π − decays, selected with additional kinematic criteria which mimic the effect of the muon trigger selections. The distributions of this alternative discriminant in data and simulation are found to agree within the statistical uncertainty, and no systematic uncertainty is assigned.
The uncertainty due to the simulation of TOS selections in the first two trigger stages is assessed by comparing the trigger efficiency in simulation and data, using a control sample of B + → J/ψ K + decays. The resulting relative differences, 8.1% for TOS µ and 11.5% for TOS µµ , are assigned as systematic uncertainties. No uncertainty is considered for the selection in the last trigger stage, which is based on the same offline kinematic variables used in the selection, for which a systematic uncertainty is already assigned.
The uncertainty on µID;ij is estimated from half the difference between the values obtained with and without the weighting of the B + → J/ψ K + sample used in the determination of the muon identification efficiency. This results in an uncertainty of 0.2% and 0.3% for the TOS µ and TOS µµ categories, respectively, which is comparable to the statistical uncertainties on these efficiencies due to the limited size of the B + → J/ψ K + samples.
Systematic uncertainties on the signal yields N µµ ij are related to the assumed models for the reconstructed K 0 S mass distribution, determined from simulation. Possible discrepancies from the shape in data are estimated by comparing the shape of the invariant mass distribution in data and simulation for K 0 S → π + π − decays, leading to a relative 0.8% systematic uncertainty on the signal yield. The final fit for the determination of the branching fraction is performed with two different background models, as discussed in Sect. 7. This leads to a relative variation on the branching fraction of 0.9%, which is assigned as a systematic uncertainty.

Results
The µ + µ − mass distribution of the signal candidates is fitted in the range [470, 600] MeV/c 2 to determine the signal and background yield in each trigger category and BDT cb bin. The model chosen for the signal is a Hypatia distribution [23], the parameters of which are determined from simulation and fixed in the fit to data. In the background model, a power law function describes the tail of the double-misidentification background from K 0 S → π + π − decays, affecting the mass region below the K 0 S mass, while the combinatorial background mass distribution is described by an exponential function. The background model is validated on simulation, and its parameters are left free in the fit to data to account for possible discrepancies. An alternative combinatorial background shape, based on a linear function, is used instead of the exponential function to determine a systematic uncertainty due to the choice of the background shape. The normalisation channel candidates within the mass region [460, 530] MeV/c 2 are counted, leading to N (K 0 S → π + π − ) = 70 318 ± 265. The µ + µ − invariant mass distributions for the two highest BDT cb bins, which exhibit the best signal-to-background ratio and therefore the best sensitivity for a discovery, are shown in Fig. 2.
A simultaneous maximum likelihood fit to the dimuon mass in all BDT cb bins is performed, using the values of α ij given in Table 1, to determine the branching fraction. The quoted systematic uncertainties are included in the likelihood computation as nuisance parameters with Gaussian uncertainties. A posterior probability is obtained by multiplying the likelihood by a prior density computed from the result based on the 2011 data sample. Limits are obtained by integrating 90% (95%) of the area of the posterior probability distribution provided by the fit, as shown in Fig. 3. Due to the much larger sensitivity achieved with the 2012 data, the inclusion of the 2011 data result does not have a significant effect on the final limit, and a uniform prior would have provided very similar results.
In conclusion, a search for the K 0 S → µ + µ − decay based on a data sample corresponding to an integrated luminosity of 3 fb −1 of proton-proton collisions, collected by the LHCb experiment at centre-of-mass energies √ s = 7 and 8 TeV, allows upper limits to be set on the branching fraction B(K 0 S → µ + µ − ) < 0.8 (1.0) × 10 −9 at 90% (95%) CL.
This result supersedes the previous upper limit published by LHCb [6], and represents a factor 11 improvement.  Figure 2: Fits to the reconstructed kaon mass distributions, for the two most sensitive BDT cb bins in the two trigger categories, TOS µ and TOS µµ . The fitted model is shown as the solid blue line, while the combinatorial background and K 0 S → π + π − double misidentification are overlaid with dotted red and dashed green lines, respectively. For each fit, the pulls are shown on the lower smaller plots.