Search for the rare decay KS ->mu+ mu-

A search for the decay KS ->mu+ mu- is performed, based on a data sample of 1.0 fb^-1 of pp collisions at \sqrt{s}=7 TeV collected by the LHCb experiment at the Large Hadron Collider. The observed number of candidates is consistent with the background-only hypothesis, yielding an upper limit of BR(KS ->mu+ mu-)<11 (9) x 10^-9 at 95 (90)% confidence level. This limit is a factor of thirty below the previous measurement.

Although the dimuon decay of the $K^0_L$ meson is known to be $\mathcal{B}(K^0_L \to \mu^+\mu^-) = (6.84 \pm 0.11) \times 10^{-9}$ [4], in agreement with the SM, effects of new particles can still be observed in $K^0_S \to \mu^+\mu^-$ decays. In the most general case, the decay width of $K^0_{L,S} \to \mu^+\mu^-$ can be written as [5]

$$\Gamma(K^0_{L,S} \to \mu^+\mu^-) = \frac{m_K}{8\pi} \sqrt{1 - \left(\frac{2 m_\mu}{m_K}\right)^2} \left[ |A|^2 + \left(1 - \left(\frac{2 m_\mu}{m_K}\right)^2\right) |B|^2 \right],$$

where $A$ is an S-wave amplitude and $B$ a P-wave amplitude. These two amplitudes have opposite CP eigenvalues, and in the absence of CP violation ($K^0_S = K^0_1$, $K^0_L = K^0_2$), $K^0_L$ decays would be generated only by $A$ while $K^0_S$ decays would be generated only by $B$. The decay width $\Gamma(K^0_L \to \mu^+\mu^-)$ receives long-distance contributions to $A$ from intermediate two-photon states, as well as short-distance contributions to the real part of $A$. In any model with the same basis of effective FCNC operators as the SM, the contributions from $B$ can be neglected for $\mathcal{B}(K^0_L \to \mu^+\mu^-)$. The decay width of $K^0_S \to \mu^+\mu^-$ depends on the imaginary part of the short-distance contributions to $A$ and on the long-distance contributions to $B$ generated by intermediate two-photon states. Therefore, the measurement of $\mathcal{B}(K^0_L \to \mu^+\mu^-)$ in agreement with the SM does not necessarily imply that $\mathcal{B}(K^0_S \to \mu^+\mu^-)$ has to agree with the SM. Contributions up to one order of magnitude above the SM expectation are allowed [2]; enhancements of the branching fraction above $10^{-10}$ are less likely. The study of $K^0_S \to \mu^+\mu^-$ has been suggested as a possible way to look for new light scalars [1].
In addition, upper bounds on $\mathcal{B}(K^0_S \to \mu^+\mu^-)$ close to $10^{-11}$ could be very useful to discriminate among scenarios beyond the SM if other modes, such as $K^+ \to \pi^+ \nu\bar{\nu}$ (charge conjugation is implied throughout this paper), were to indicate a non-standard enhancement of the $s \to d$ transition [2]. The KLOE collaboration has searched for the related decay $K^0_S \to e^+e^-$, which is affected by a larger helicity suppression than the muonic mode, and set an upper limit on the branching fraction $\mathcal{B}(K^0_S \to e^+e^-) < 9 \times 10^{-9}$ at 90% confidence level [6].
The LHC produces $\sim 10^{13}$ $K^0_S$ mesons per fb$^{-1}$ inside the LHCb acceptance. In this paper, a search for $K^0_S \to \mu^+\mu^-$ is presented using 1.0 fb$^{-1}$ of pp collisions at $\sqrt{s} = 7$ TeV collected by LHCb in 2011. Dimuon candidates are classified in bins of a multivariate discriminant, and compared to background and signal expectations. The background present in the signal region is a combination of combinatorial background and $K^0_S \to \pi^+\pi^-$ decays in which both pions are misidentified as muons. The number of expected signal candidates for a given branching fraction hypothesis is obtained by normalising to the measured $K^0_S \to \pi^+\pi^-$ rate. The results obtained in the different bins are combined, and a limit is set using the $\mathrm{CL}_s$ method [7,8]. The data in the signal region were only analysed once the full analysis strategy was defined, including the selection, the binning and the evaluation of systematic uncertainties.
The LHCb apparatus and the aspects of the trigger relevant for this analysis are presented in Sect. 2. Section 3 is devoted to the full signal selection and to the definition of the multivariate method used as the main discriminant. In Sect. 4 the different backgrounds for the $K^0_S \to \mu^+\mu^-$ decay are described, as well as the expected background in the signal region. The normalisation, required to convert the number of $K^0_S \to \mu^+\mu^-$ candidates to the branching fraction, is detailed in Sect. 5. The systematic uncertainties are described in Sect. 6. The limit setting procedure, together with the corresponding expected and observed limits, is presented in Sect. 7, and conclusions are drawn in Sect. 8.

Experimental setup
The LHCb detector [9] is a single-arm forward spectrometer covering the pseudorapidity range $2 < \eta < 5$, designed for the study of particles containing b or c quarks. The detector includes a high precision tracking system consisting of a silicon-strip vertex detector (VELO) surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream. The combined tracking system has a momentum resolution $\Delta p/p$ that varies from 0.4% at 5 GeV/$c$ to 0.6% at 100 GeV/$c$, and an impact parameter (IP) resolution of 20 µm for tracks with high transverse momentum ($p_T$) with respect to the beam direction. Charged hadrons are identified using two ring-imaging Cherenkov detectors. Photon, electron and hadron candidates are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers.
The trigger consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage which applies a full event reconstruction. For this analysis, the events are first required to pass a hardware trigger which selects at least one muon with $p_T > 1.5$ GeV/$c$. In the subsequent software trigger [10], at least one of the final state tracks is required to be of good quality and to have $p_T > 1.3$ GeV/$c$, an IP > 0.5 mm and a $\chi^2$ of the impact parameter (IP $\chi^2$) above 200. The IP $\chi^2$ is defined as the difference between the $\chi^2$ of the pp interaction vertex (primary vertex, PV) fitted with and without the considered track. A prescale factor of two is applied to the lines triggered by the $K^0_S \to \mu^+\mu^-$ candidates. The $K^0_S \to \mu^+\mu^-$ candidates that are themselves responsible for firing both the hardware and the software trigger levels are called TOS (trigger on signal). Events with a reconstructed $K^0_S \to \mu^+\mu^-$ candidate can also be triggered independently of the signal candidate if some other combination of particles in the underlying event passes the trigger. Such candidates are called TIS (trigger independently of signal). The TIS and TOS categories are not exclusive, as muons from both the $K^0_S \to \mu^+\mu^-$ candidate and the underlying event can pass the trigger. The overlap between the two categories allows the determination of trigger efficiencies from the data [11]. Finally, minimum bias candidates triggered by a dedicated random trigger (MB) provide a negligible number of $K^0_S \to \mu^+\mu^-$ candidates. Instead, they allow the selection of a sample of $K^0_S \to \pi^+\pi^-$ decays, useful to understand the distributions that the signal would have in the absence of trigger bias.
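The use of the TIS/TOS overlap to measure a trigger efficiency can be illustrated with a minimal Python sketch. It assumes the usual approximation that the TIS and TOS decisions are uncorrelated, so that the TOS efficiency is estimated as the fraction of TIS candidates that are also TOS. The function name and the counts are hypothetical and are not taken from this analysis or from Ref. [11].

```python
import math

def tos_efficiency(n_tis: int, n_tis_and_tos: int) -> tuple[float, float]:
    """Estimate eff(TOS) = N(TIS and TOS) / N(TIS) with a binomial uncertainty.

    Assumption: TIS and TOS decisions are independent, so the TIS sample
    is an unbiased sample on which to count TOS decisions.
    """
    if n_tis <= 0 or not 0 <= n_tis_and_tos <= n_tis:
        raise ValueError("need n_tis > 0 and 0 <= n_tis_and_tos <= n_tis")
    eff = n_tis_and_tos / n_tis
    err = math.sqrt(eff * (1.0 - eff) / n_tis)
    return eff, err

# Hypothetical counts, for illustration only.
eff, err = tos_efficiency(n_tis=5000, n_tis_and_tos=1000)
print(f"TOS efficiency = {eff:.3f} +/- {err:.3f}")
```

In practice the independence assumption only holds approximately, which is why such estimates are usually made in bins of the candidate kinematics.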
For the simulation, pp collisions are generated using Pythia 6.4 [12] with a specific LHCb configuration [13]. Decays of hadronic particles are described by EvtGen [14] in which final state radiation is generated using Photos [15]. The interaction of the generated particles with the detector and its response are implemented using the Geant4 toolkit [16] as described in Ref. [17].

Selection and multivariate classifier
The $K^0_S \to \mu^+\mu^-$ candidates are reconstructed requiring two tracks with opposite curvature with hits in the VELO and in the tracking stations. About 40% of the $K^0_S$ mesons with the two daughter tracks inside the LHCb acceptance decay in the VELO detector. The tracks are required to be of high quality ($\chi^2 < 5$ per degree of freedom), to have an IP $\chi^2$ greater than 100 and a distance of closest approach of less than 0.3 mm. The two tracks are required to be identified as muons [18]. The reconstructed $K^0_S \to \mu^+\mu^-$ candidates are required to have a proper decay time greater than 8.9 ps and to point to the PV (IP($K^0_S$) < 400 µm). The secondary vertex, SV, of the $K^0_S \to \mu^+\mu^-$ candidate is required to be downstream of the PV. If more than one PV is reconstructed, the PV associated to the $K^0_S$ is the one that minimises its IP $\chi^2$. Furthermore, $\Lambda \to p\pi^-$ decays are vetoed via a requirement in the Armenteros-Podolanski plane [19], by including cuts on the transverse momentum of the daughter tracks with respect to the $K^0_S$ flight direction and on their longitudinal momentum asymmetry. The reconstructed $K^0_S \to \mu^+\mu^-$ mass is required to be in the range [450, 1500] MeV/$c^2$.
The $K^0_S \to \pi^+\pi^-$ decay is used as a control channel and is reconstructed and selected in the same way as the signal candidates, with the exception of the particle identification requirements on the daughter tracks and the mass range, which is required to be between 400 and 600 MeV/$c^2$. Figure 1 shows the mass spectrum for selected $K^0_S \to \pi^+\pi^-$ candidates in the MB sample after applying the set of cuts described above, in the $\pi\pi$ and $\mu\mu$ mass hypotheses: the two mass peaks are separated by 40 MeV/$c^2$. This separation, combined with the LHCb mass resolution of about 4 MeV/$c^2$ for such combinations of tracks, is used to discriminate the signal from $K^0_S \to \pi^+\pi^-$ decays in which both pions are misidentified as muons.

In order to further increase the background rejection, a boosted decision tree (BDT) [20] with the AdaBoost algorithm [21] is used. The variables entering the BDT discriminant are:
• the decay time of the $K^0_S$ candidate, computed using the distance between the SV and the PV, and the reconstructed momentum of the $K^0_S$ candidate;
• the smallest muon IP $\chi^2$ of the two daughter tracks with respect to any of the PVs reconstructed in the event;
• the $K^0_S$ IP $\chi^2$ with respect to the PV;
• the distance of closest approach between the two daughter tracks;
• the secondary vertex $\chi^2$, which adds complementary information with respect to the distance of closest approach of the tracks, as it uses information on the uncertainty of the vertex fit;
• the angle of the decay plane in the $K^0_S$ rest frame with respect to the $K^0_S$ flight direction, which is isotropic for signal decays, but not necessarily for background candidates;
• variables used to discriminate against material interactions, as further detailed below.

Part of the background originates from interactions in the material of the VELO. The position of the SV of the background candidates from the $K^0_S$ mass sidebands in the $x$-$z$ plane is shown in Fig. 2. The structures observed correspond to the position of the material inside the VELO detector.
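The two Armenteros-Podolanski variables used for the $\Lambda$ veto can be computed as in the following Python sketch. It assumes that the flight direction of the candidate is approximated by the direction of the summed daughter momenta; the function name and the example momenta are hypothetical, and the actual cut values of the analysis are not reproduced here.

```python
import math

def armenteros(p_plus, p_minus):
    """Return (alpha, pt) for two daughter 3-momenta given as (px, py, pz).

    alpha: longitudinal momentum asymmetry (pL+ - pL-) / (pL+ + pL-)
    pt:    momentum of the positive daughter transverse to the mother direction
    Assumption: the mother direction is taken along p_plus + p_minus.
    """
    total = [a + b for a, b in zip(p_plus, p_minus)]
    norm = math.sqrt(sum(c * c for c in total))
    if norm == 0.0:
        raise ValueError("mother momentum is zero; direction undefined")
    axis = [c / norm for c in total]
    pl_plus = sum(a * u for a, u in zip(p_plus, axis))
    pl_minus = sum(a * u for a, u in zip(p_minus, axis))
    alpha = (pl_plus - pl_minus) / (pl_plus + pl_minus)
    p2_plus = sum(a * a for a in p_plus)
    pt = math.sqrt(max(p2_plus - pl_plus ** 2, 0.0))
    return alpha, pt

# Hypothetical symmetric two-body configuration (momenta in GeV/c).
alpha, pt = armenteros((0.2, 0.0, 10.0), (-0.2, 0.0, 10.0))
print(alpha, pt)
```

Decays into two particles of equal mass, such as those of the $K^0_S$, populate a band symmetric around zero asymmetry, whereas $\Lambda \to p\pi^-$ decays are shifted to large asymmetry because the proton carries most of the momentum.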
To discriminate against this background, two different approaches are used for the TIS and TOS trigger categories, consisting of two different choices of variables for the BDT.
For the TOS category, two additional variables are included in the BDT: the $p_T$ of the $K^0_S$ and a boolean matter veto that uses the VELO geometry to assess whether a given decay vertex coincides with a point in the detector material or not. Muons from material interactions have a harder $p_T$ spectrum than muons from other background sources and hence are more likely to be selected by the trigger. The use of this variable in the BDT provides 50% less background yield for the same signal efficiency than simply applying the veto as a selection cut.
For the TIS category, the coordinates of the position of the SV in the laboratory frame are used to deal with this background. As the simultaneous use of the lifetime, the $p_T$ of the $K^0_S$ meson, and the SV position allows the BDT to effectively compute the mass of the candidate, a fake signal peak could be artificially created out of the combinatorial background. Hence the $p_T$ of the $K^0_S$ meson is not used in the TIS analysis. This second approach provides a factor of two less background yield for the same signal efficiency than the matter veto (and $K^0_S$ $p_T$) for the TIS analysis, while, on the contrary, the matter veto boolean variable gives a factor of four less background yield for the same signal efficiency than the SV coordinates for the TOS analysis.
Because of these different approaches and to take into account the biases on the variable distributions introduced by the trigger, the data sample is split in two subsamples according to the TIS and TOS categories, for which BDT discriminants are optimised separately. In the TOS analysis, the $K^0_S \to \pi^+\pi^-$ decays are required to have at least one of the daughters with a $p_T$ above 1.3 GeV/$c$ in order to minimise the difference in the momentum distributions with respect to the triggered $K^0_S \to \mu^+\mu^-$ candidates. The candidates that are simultaneously TIS and TOS are analysed only as TIS candidates to avoid counting them twice. Only one per mille of the TOS candidates overlap with TIS candidates.
In addition, the BDT discriminants for both trigger categories are defined and trained on data using $K^0_S \to \pi^+\pi^-$ candidates as signal sample and $K^0_S \to \mu^+\mu^-$ candidates in the upper mass sideband as background sample. For the background sample, the region above 1100 MeV/$c^2$ (above the $\phi$ resonance) is used to define the BDT settings and the region between 504 and 1000 MeV/$c^2$ to train the BDT algorithm chosen. For the signal sample, the $K^0_S \to \pi^+\pi^-$ TIS events are used to train the BDT for the TIS category, while $K^0_S \to \pi^+\pi^-$ decays with both pions misidentified as muons and passing the same trigger requirements as the $K^0_S \to \mu^+\mu^-$ signal are used for the TOS category. In order to minimise the differences between misidentified $K^0_S \to \pi^+\pi^-$ events and $K^0_S \to \mu^+\mu^-$ decays, tight muon identification requirements (including cuts on the quality of the tracks or on the number of muon hits shared by different tracks) are applied to the $K^0_S \to \pi^+\pi^-$ sample. These tight requirements are chosen such that the efficiency of the trigger in the $K^0_S \to \pi^+\pi^-$ simulated decays is the same as in the $K^0_S \to \mu^+\mu^-$ simulated decays. In addition, the TOS and TIS categories are further split in two equal-sized subsamples, corresponding to the first and second halves of the data taking period. This procedure prevents possible biases related to the use of the same events in the mass sidebands both to train the BDT discriminant and to evaluate the background in the signal region, while making maximal use of the available data both for BDT training and background evaluation. Thus, in total, four different samples are defined (two subsamples for the TIS trigger category and two subsamples for the TOS trigger category) and combined as described in Sect. 7.
Candidates with low values of the BDT response are not considered because of the large amount of background in that region. This requirement provides about 50% signal efficiency and 99% background rejection, depending on the sample. The rest of the candidates are classified in ten bins of equal signal efficiency, i.e. a total of forty bins are combined to get the $\mathrm{CL}_s$ limit.
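The classification into bins of equal signal efficiency can be sketched in Python as follows. The sketch assumes that the bin edges are taken as quantiles of the BDT response of a signal-like sample after the minimum-response requirement; the function name, the threshold and the scores are hypothetical.

```python
def equal_efficiency_edges(signal_scores, n_bins=10, min_score=None):
    """Bin edges such that each bin holds the same fraction of the signal sample.

    Candidates below min_score are discarded first, mimicking the removal
    of the low-response region.
    """
    kept = sorted(s for s in signal_scores if min_score is None or s >= min_score)
    if len(kept) < n_bins:
        raise ValueError("not enough signal candidates for the requested bins")
    edges = [kept[0]]
    for i in range(1, n_bins):
        edges.append(kept[(i * len(kept)) // n_bins])
    edges.append(kept[-1])
    return edges

# Hypothetical, evenly spread scores in [0, 1), for illustration only.
scores = [i / 1000 for i in range(1000)]
edges = equal_efficiency_edges(scores, n_bins=10, min_score=0.5)
print(edges)
```

With ten bins in each of the four samples, this kind of binning yields the forty bins that enter the combination.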

Background
The search region is defined as the mass range [492, 504] MeV/$c^2$. The background level is calibrated by interpolating the observed yield from the mass sidebands ([470, 492] and [504, 600] MeV/$c^2$) to the signal region. This is done by means of an unbinned maximum likelihood fit in the sidebands, using a model with two components. The first component is a power law that describes the tail of $K^0_S \to \pi^+\pi^-$ decays where both pions are misidentified as muons; this model has been checked to be appropriate using MC simulation. The second component is an exponential function describing the combinatorial background. As an illustration, Fig. 3 shows the distribution of candidates for all BDT bins and for TIS and TOS samples, respectively. The expected total background yield in the most sensitive BDT bins of both samples ranges from 0 to 1 candidates. Other sources of background, including $K^0_L \to \mu^+\mu^-$, $K^0_L \to \mu^+\mu^-\gamma$ and other decay modes, are negligible for the current analysis. In the case of $K^0_L \to \mu^+\mu^-$ and $K^0_L \to \mu^+\mu^-\gamma$, the contributions have been evaluated using the ratio of the $K^0_S$ and $K^0_L$ lifetimes and the proper time acceptance measured in data with the $K^0_S \to \pi^+\pi^-$ decays. The contributions of the other decay modes have been determined using MC simulated events.
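The interpolation from the sidebands to the search window can be illustrated with a deliberately simplified Python sketch. It keeps only a single exponential component with a given slope, omitting both the power-law component and the unbinned fit itself, and scales the sideband yield by the ratio of the shape integrals. The slope and yield values are hypothetical; only the mass windows are taken from the text.

```python
import math

SIGNAL = (492.0, 504.0)                       # MeV/c^2, search window from the text
SIDEBANDS = [(470.0, 492.0), (504.0, 600.0)]  # MeV/c^2, sidebands from the text

def exp_integral(lo, hi, slope):
    """Integral of exp(-slope * m) over [lo, hi]; flat shape if slope == 0."""
    if slope == 0.0:
        return hi - lo
    return (math.exp(-slope * lo) - math.exp(-slope * hi)) / slope

def expected_in_signal(n_sideband, slope):
    """Scale the sideband yield by the ratio of shape integrals."""
    sb = sum(exp_integral(lo, hi, slope) for lo, hi in SIDEBANDS)
    return n_sideband * exp_integral(SIGNAL[0], SIGNAL[1], slope) / sb

# Hypothetical sideband yield with a flat shape: 59 * 12 / 118 = 6 candidates.
print(expected_in_signal(n_sideband=59, slope=0.0))
```

In the analysis described above the shape parameters are fitted per BDT bin, so the slope would be an output of the fit rather than an input.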

Normalisation
A normalisation is required to translate the number of $K^0_S \to \mu^+\mu^-$ signal decays into a branching fraction measurement. Two normalisations are determined independently for TIS and TOS candidates. The branching fraction $\mathcal{B}(K^0_S \to \mu^+\mu^-)$ is computed using

$$\frac{\mathcal{B}(K^0_S \to \mu^+\mu^-)}{\mathcal{B}(K^0_S \to \pi^+\pi^-)} = \frac{\epsilon_{\pi\pi}}{\epsilon_{\mu\mu}}\,\frac{N_{K^0_S \to \mu^+\mu^-}}{N_{K^0_S \to \pi^+\pi^-}},$$

where, in a given BDT bin, $N_{K^0_S \to \mu^+\mu^-}$ is the observed number of signal decays, $N_{K^0_S \to \pi^+\pi^-}$ the number of $K^0_S \to \pi^+\pi^-$ decays, and $\epsilon_{\pi\pi}/\epsilon_{\mu\mu}$ the ratio of the corresponding efficiencies. The efficiencies are factorised as $\epsilon = \epsilon^{\rm SEL}\,\epsilon^{\rm PID}\,\epsilon^{\rm TRIG/SEL}$, where:
• $\epsilon^{\rm SEL}$ is the offline selection efficiency. It includes the geometrical acceptance, reconstruction and selection, i.e. it is the probability for a $K^0_S \to \pi^+\pi^-$ ($K^0_S \to \mu^+\mu^-$) decay generated in a pp collision to have been reconstructed and selected;
• $\epsilon^{\rm PID}$ is the efficiency of the muon identification for reconstructed and selected $K^0_S \to \mu^+\mu^-$ signal decays;
• $\epsilon^{\rm TRIG/SEL} = N^{\rm SEL\&PID\&TRIG}/N^{\rm SEL\&PID}$, where TRIG denotes either the TIS or the TOS category, is the trigger efficiency for decays that would be offline selected. Under this definition, trigger efficiencies can be determined from data using the procedure described in Ref. [11].
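The normalisation to $K^0_S \to \pi^+\pi^-$ and the factorisation of the efficiencies can be written as a short Python sketch. It assumes that the signal branching fraction is the $\pi\pi$ branching fraction multiplied by the efficiency ratio and by the yield ratio, as the definitions above imply. All numerical inputs are hypothetical; in particular the value 0.692 used for $\mathcal{B}(K^0_S \to \pi^+\pi^-)$ is an assumed input standing in for the value of Ref. [4].

```python
def efficiency(sel, pid=1.0, trig=1.0):
    """Total efficiency as the product eps_SEL * eps_PID * eps_TRIG/SEL."""
    return sel * pid * trig

def branching_fraction(n_mumu, n_pipi, eff_pipi, eff_mumu, bf_pipi):
    """B(mumu) = B(pipi) * (eff_pipi / eff_mumu) * (N_mumu / N_pipi)."""
    return bf_pipi * (eff_pipi / eff_mumu) * (n_mumu / n_pipi)

# Hypothetical inputs: only the efficiency ratio matters, so the pipi
# selection efficiency is expressed relative to the mumu one.
eff_pipi = efficiency(sel=0.70)            # no muon identification for pipi
eff_mumu = efficiency(sel=1.00, pid=0.75)  # trigger factors assumed to cancel
bf = branching_fraction(n_mumu=1, n_pipi=1.0e7,
                        eff_pipi=eff_pipi, eff_mumu=eff_mumu, bf_pipi=0.692)
print(f"B(mumu) for one signal decay: {bf:.3e}")
```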
The ratio of reconstruction and selection efficiencies between $K^0_S \to \mu^+\mu^-$ and $K^0_S \to \pi^+\pi^-$ decays is evaluated in bins of $p_T$ and rapidity of the $K^0_S$ meson using simulated events reweighted in order to reproduce the $K^0_S$ $p_T$ and rapidity spectra measured in data [22]. The reconstruction and selection efficiency for $K^0_S \to \pi^+\pi^-$ decays is between 60% and 85% of that of the $K^0_S \to \mu^+\mu^-$ decays, depending on the region of phase space, owing to the different material interactions of pions and muons.
The factor $\epsilon^{\rm PID}$ is evaluated in bins of the BDT (both for the TOS and TIS categories) by measuring the muon identification efficiency as a function of $p$ and $p_T$ using calibration muons. The sample of calibration muons is obtained from a $J/\psi \to \mu^+\mu^-$ sample in which positive muon identification is required for only one of the tracks. The $p$ and $p_T$ spectra of the pions from $K^0_S \to \pi^+\pi^-$ decays in a MB sample are then used to obtain the efficiency for $K^0_S \to \mu^+\mu^-$ decays. The PID efficiency is between 68% and 82% (depending on the BDT bin and the sample). It is measured with a precision between 1% and 10%. For the ratio of trigger efficiencies, different strategies are considered for the TIS and TOS samples.
For the TIS samples, the $K^0_S \to \mu^+\mu^-$ yield is normalised to the $K^0_S \to \pi^+\pi^-$ TIS yield. In this case, the trigger efficiencies cancel in the ratio, because the probability to trigger on the underlying event is independent of the decay mode of the $K^0_S$ meson. This cancellation is verified in simulation. The normalisation expression for TIS decays reads

$$\frac{\mathcal{B}(K^0_S \to \mu^+\mu^-)}{\mathcal{B}(K^0_S \to \pi^+\pi^-)} = \frac{\epsilon^{\rm SEL}_{\pi\pi}}{\epsilon^{\rm SEL}_{\mu\mu}}\,\frac{1}{\epsilon^{\rm PID}_{\mu\mu}}\,\frac{N^{\rm TIS}_{K^0_S \to \mu^+\mu^-}}{N^{\rm TIS}_{K^0_S \to \pi^+\pi^-}},$$

where $N^{\rm TIS}_{K^0_S \to \mu^+\mu^-}$ and $N^{\rm TIS}_{K^0_S \to \pi^+\pi^-}$ are the numbers of TIS decays in a given BDT bin for the signal and $K^0_S \to \pi^+\pi^-$ modes, respectively. $N^{\rm TIS}_{K^0_S \to \pi^+\pi^-}$ is found to be around 9000 for every BDT bin.
For the TOS sample, the $K^0_S \to \mu^+\mu^-$ yield is normalised to the $K^0_S \to \pi^+\pi^-$ yield from MB triggers. The normalisation requires in this case an absolute determination of the TOS trigger efficiency for $K^0_S \to \mu^+\mu^-$, $\epsilon^{\rm TOS/SEL}_{\mu\mu}$, as well as the knowledge of the average prescale factor of the MB trigger, $s_{\rm MB}$. The absolute TOS trigger efficiency for the signal is computed using muons from $B^+ \to J/\psi(\to \mu^+\mu^-)K^+$ decays. The $p$ and $p_T$ spectra of the $B^+ \to J/\psi(\to \mu^+\mu^-)K^+$ muons are reweighted in order to match those of pions from the $K^0_S \to \pi^+\pi^-$ decays. Trigger-unbiased $p$ and $p_T$ spectra of the $K^0_S \to \pi^+\pi^-$ decays can be obtained from the MB sample. The TOS efficiency is found to be at the level of 20% for all BDT bins. The normalisation expression for TOS decays reads

$$\frac{\mathcal{B}(K^0_S \to \mu^+\mu^-)}{\mathcal{B}(K^0_S \to \pi^+\pi^-)} = \frac{\epsilon^{\rm SEL}_{\pi\pi}}{\epsilon^{\rm SEL}_{\mu\mu}}\,\frac{1}{\epsilon^{\rm PID}_{\mu\mu}\,\epsilon^{\rm TOS/SEL}_{\mu\mu}}\,\frac{s_{\rm MB}\,N^{\rm TOS}_{K^0_S \to \mu^+\mu^-}}{N^{\rm MB}_{K^0_S \to \pi^+\pi^-}},$$

$N^{\rm MB}_{K^0_S \to \pi^+\pi^-}$ being the number of $K^0_S \to \pi^+\pi^-$ decays from the MB trigger and $N^{\rm TOS}_{K^0_S \to \mu^+\mu^-}$ denoting the number of signal decays from the TOS category. $N^{\rm MB}_{K^0_S \to \pi^+\pi^-}$ is found to be around 1000 for every BDT bin.
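The role of the MB prescale in the TOS normalisation can be made explicit with a small Python sketch. It assumes that the prescaled MB yield is converted to an unprescaled $K^0_S \to \pi^+\pi^-$ yield by dividing by $s_{\rm MB}$, so that the factor multiplying the TOS signal yield is $\mathcal{B}(\pi\pi)$ times the efficiency ratio times $s_{\rm MB}/N^{\rm MB}$. The efficiency values are representative numbers chosen within the ranges quoted in the text, the function name is hypothetical, and 0.692 is an assumed value for $\mathcal{B}(K^0_S \to \pi^+\pi^-)$.

```python
def alpha_tos(bf_pipi, sel_ratio, pid_mumu, tos_mumu, s_mb, n_mb_pipi):
    """Factor converting a TOS signal yield into a branching fraction.

    sel_ratio is eps_SEL(pipi) / eps_SEL(mumu); the MB yield is corrected
    for the prescale by the factor s_mb (assumption of this sketch).
    """
    return bf_pipi * sel_ratio / (pid_mumu * tos_mumu) * s_mb / n_mb_pipi

# Representative inputs, chosen within the ranges quoted in the text.
alpha = alpha_tos(bf_pipi=0.692, sel_ratio=0.70, pid_mumu=0.75,
                  tos_mumu=0.20, s_mb=2.70e-6, n_mb_pipi=1000)
print(f"alpha_TOS ~ {alpha:.2e}")
```

With these inputs the factor comes out at the $10^{-8}$ level, the same order of magnitude as the TOS normalisation factors quoted in this section.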
The factors that multiply the signal yields in the TIS and TOS normalisation expressions, including $\mathcal{B}(K^0_S \to \pi^+\pi^-)$, are denoted $\alpha^{\rm TIS}$ and $\alpha^{\rm TOS}$. They are called normalisation factors and are defined for each of the BDT bins. For a given number $N$ of $K^0_S \to \mu^+\mu^-$ signal decays, the corresponding value of $\mathcal{B}(K^0_S \to \mu^+\mu^-)$ is then $\alpha \times N$. Using the value of $\mathcal{B}(K^0_S \to \pi^+\pi^-)$ from Ref. [4], the normalisation factors are in the range $[6.6, 16.2] \times 10^{-8}$ for the TIS category, and $[0.9, 7.8] \times 10^{-8}$ for the TOS category, depending on the BDT bin. From the normalisation factors, around $2 \times 10^{-4}$ ($6 \times 10^{-5}$) SM candidates are expected per BDT bin for the TOS (TIS) analysis.
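The relation between a normalisation factor, a signal yield and a branching fraction can be inverted to obtain the expected number of signal candidates for a given hypothesis. The following Python sketch does this for the two ends of the TOS range quoted above, taking as hypothesis a branching fraction of $11 \times 10^{-9}$, the value of the 95% confidence level limit; the function names are hypothetical.

```python
def branching_fraction_from_yield(alpha, n_signal):
    """B = alpha * N for a given normalisation factor and signal yield."""
    return alpha * n_signal

def expected_signal(alpha, branching_fraction):
    """Expected signal yield in a bin for a branching fraction hypothesis."""
    return branching_fraction / alpha

# Ends of the TOS normalisation-factor range quoted in the text.
for alpha in (0.9e-8, 7.8e-8):
    n = expected_signal(alpha, 11e-9)
    print(f"alpha = {alpha:.1e}: {n:.2f} expected candidates")
```

At that branching fraction, each TOS bin would therefore contain between roughly 0.1 and 1.2 signal candidates.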

Systematic uncertainties
The quantities considered in the determination of the branching fraction that are affected by systematic uncertainties are listed below.
• The background expectations per bin, obtained by comparing the results with the model described in Sect. 4 to those computed: a) if the combinatorial background is modelled by a linear function; b) if the mass range over which the fit is performed is modified; c) repeating the fit excluding (together with the signal region) the 12 MeV/$c^2$ left and right windows neighbouring the search window and comparing the fit prediction to the yields in those regions; no correlation is considered among the different bins for this systematic uncertainty.
• The ratios of reconstruction and selection efficiencies and absolute muon identification efficiencies, for which systematic uncertainties are obtained from the difference between different methods in the data reweighting of the MC computed ratios and from the comparison to simulation respectively (around 20% for the ratios and 5% for muon identification efficiencies); no correlation is considered among the different bins.
• The absolute TOS efficiency, for which the systematic uncertainty is obtained from the comparison to simulation (around 15%, depending on the BDT bin); no correlation is considered among the different bins.
• The effective prescale factor of the MB sample, $s_{\rm MB} = (2.70 \pm 0.76) \times 10^{-6}$. The uncertainty is evaluated from the difference between the prescale factor as measured in data and the value of the prescale as set in the trigger system. This systematic uncertainty affects coherently the signal expectations of the twenty bins of the TOS analysis.
The leading systematic uncertainties are those coming from the absolute TOS efficiency and the $s_{\rm MB}$ factor for the TOS analysis and from the ratio of reconstruction and selection efficiencies for the TIS analysis.

Results
The modified frequentist approach (or $\mathrm{CL}_s$ method) [7,8] is used to assess the compatibility of the observation with expectations as a function of $\mathcal{B}(K^0_S \to \mu^+\mu^-)$. Test statistics are built from pseudo-experiments for the signal plus background and background-only hypotheses. For each pseudo-experiment a product of likelihood ratios is computed depending on the expected number of signal events for a given branching fraction, $s_i$, the expected number of background events, $b_i$, and the observed number of events, $d_i$, for bin $i$. The $\mathrm{CL}_{s+b}$ ($\mathrm{CL}_b$) is defined as the probability for signal plus background (background only) generated pseudo-experiments to have a test-statistic value larger than or equal to that observed in the data. The $\mathrm{CL}_s$ is defined as the ratio of confidence levels $\mathrm{CL}_{s+b}/\mathrm{CL}_b$. This ratio is used to set the exclusion (upper) limit on the branching fraction, whereas $1 - \mathrm{CL}_b$ is used as a p-value to claim evidence or observation. A 95 (90)% confidence level exclusion corresponds to $\mathrm{CL}_s = 0.05$ (0.1).
The values of $b_i$ are obtained from the fit of the mass sidebands, as detailed in Sect. 4. The values of $s_i$ depend on the assumed branching fraction, as well as on the normalisation factors computed in Sect. 5. The uncertainties on the input parameters are taken into account by fluctuating the signal and background expectations when generating the $b$ and $s + b$ ensembles. These fluctuations are performed via asymmetric Gaussian priors [23], in which each parameter is shifted from its central value $x_i$ by an amount determined by $r$, a random number generated from a normal distribution, and by $s^+$ and $s^-$, the relative (signed) errors of $x_i$. Correlations are implemented by using the same value of $r$ for the parameters that should fluctuate coherently. The observed distribution of events is compatible with background expectations, giving a p-value of 27%. In particular, in the last 4 bins of the BDT output, corresponding to the most significant region of the analysis, just one candidate is observed in each of the trigger categories, in agreement with the background expectations. Figure 4 shows the expected and observed $\mathrm{CL}_s$ curves for the TIS category and for the TOS category as well as for the combined measurement. The upper limit found is $11\,(9) \times 10^{-9}$ at 95 (90)% confidence level and is a factor of thirty below the previous world best limit. Table 1 summarises the limits in the TIS and TOS categories, and the combined result.
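A fluctuation with asymmetric Gaussian priors can be sketched in Python as follows. The exact prescription of Ref. [23] is not reproduced here; the sketch assumes one common convention, in which the central value is scaled by $1 + s^+ r$ for $r \ge 0$ and by $1 + |s^-|\, r$ for $r < 0$, with $s^+ \ge 0$ and $s^- \le 0$. The function name and the numerical values are hypothetical.

```python
import random

def fluctuate(x, s_plus, s_minus, r):
    """Shift x by r standard deviations with asymmetric relative errors.

    Assumed convention: s_plus >= 0 applies to upward shifts (r >= 0),
    s_minus <= 0 applies to downward shifts (r < 0).
    """
    rel = s_plus if r >= 0.0 else -s_minus
    return x * (1.0 + rel * r)

rng = random.Random(7)
r = rng.gauss(0.0, 1.0)
# Using the same r for two parameters makes them fluctuate coherently.
signal_bin_1 = fluctuate(0.40, 0.10, -0.20, r)
signal_bin_2 = fluctuate(0.25, 0.10, -0.20, r)
print(r, signal_bin_1, signal_bin_2)
```

Parameters that should fluctuate independently would each receive their own draw of $r$.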

Conclusions
A search for $K^0_S \to \mu^+\mu^-$ has been performed using 1.0 fb$^{-1}$ of data collected at the LHCb experiment in 2011. This search profits from the $10^{13}$ $K^0_S$ mesons produced inside the LHCb acceptance and from the powerful discrimination against the $K^0_S \to \pi^+\pi^-$ decay in which both pions are misidentified as muons, achieved thanks to the LHCb mass resolution for two-body decays of the $K^0_S$ meson. The candidates observed are consistent with the expected background, with the p-value for the background-only hypothesis being 27%. The measured upper limit $\mathcal{B}(K^0_S \to \mu^+\mu^-) < 11\,(9) \times 10^{-9}$ at 95 (90)% confidence level is a factor of thirty below the previous world best limit [3].