Measurement of the isospin asymmetry in $B \to K^{(*)}\mu^+\mu^-$ decays

The isospin asymmetries of $B \to K^{(*)}\mu^+\mu^-$ decays and the partial branching fractions of $B^0 \to K^0\mu^+\mu^-$ and $B^+ \to K^{*+}\mu^+\mu^-$ are measured as a function of the di-muon mass squared $q^2$ using an integrated luminosity of 1.0 fb$^{-1}$ collected with the LHCb detector. The $B \to K\mu^+\mu^-$ isospin asymmetry integrated over $q^2$ is negative, deviating from zero with over 4 $\sigma$ significance. The $B \to K^{*}\mu^+\mu^-$ decay measurements are consistent with the Standard Model prediction of negligible isospin asymmetry. The observation of the decay $B^0 \to K^0_{\rm\scriptscriptstyle S}\mu^+\mu^-$ is reported with 5.7 $\sigma$ significance. Assuming that the branching fraction of $B^0 \to K^0\mu^+\mu^-$ is twice that of $B^0 \to K^0_{\rm\scriptscriptstyle S}\mu^+\mu^-$, the branching fractions of $B^0 \to K^0\mu^+\mu^-$ and $B^+ \to K^{*+}\mu^+\mu^-$ are found to be $(0.31^{+0.07}_{-0.06}) \times 10^{-6}$ and $(1.16\pm0.19) \times 10^{-6}$, respectively.


Introduction
The flavour-changing neutral current decays $B \to K^{(*)}\mu^+\mu^-$ are forbidden at tree level in the Standard Model (SM). Such transitions must proceed via loop or box diagrams and are powerful probes of physics beyond the SM. Predictions for the branching fractions of these decays suffer from relatively large uncertainties due to form factor estimates. Theoretically clean observables can be constructed from ratios or asymmetries where the leading form factor uncertainties cancel. The CP-averaged isospin asymmetry ($A_I$) is such an observable. It is defined as
$$A_I = \frac{\Gamma(B^0 \to K^{(*)0}\mu^+\mu^-) - \Gamma(B^+ \to K^{(*)+}\mu^+\mu^-)}{\Gamma(B^0 \to K^{(*)0}\mu^+\mu^-) + \Gamma(B^+ \to K^{(*)+}\mu^+\mu^-)} = \frac{\mathcal{B}(B^0 \to K^{(*)0}\mu^+\mu^-) - \frac{\tau_0}{\tau_+}\mathcal{B}(B^+ \to K^{(*)+}\mu^+\mu^-)}{\mathcal{B}(B^0 \to K^{(*)0}\mu^+\mu^-) + \frac{\tau_0}{\tau_+}\mathcal{B}(B^+ \to K^{(*)+}\mu^+\mu^-)},$$
where $\Gamma(B \to f)$ and $\mathcal{B}(B \to f)$ are the partial width and branching fraction of the $B \to f$ decay and $\tau_0/\tau_+$ is the ratio of the lifetimes of the $B^0$ and $B^+$ mesons. For $B \to K^{*}\mu^+\mu^-$, the SM prediction for $A_I$ is around $-1\%$ in the di-muon mass squared ($q^2$) region below the $J/\psi$ resonance, apart from the very low $q^2$ region where it rises to $O(10\%)$ as $q^2$ approaches zero [1]. There is no precise prediction for $A_I$ in the $B \to K\mu^+\mu^-$ case, but it is also expected to be close to zero. The small isospin asymmetry predicted in the SM is due to initial state radiation of the spectator quark, which is different between the neutral and charged decays. Previously, $A_I$ has been measured to be significantly below zero in the $q^2$ region below the $J/\psi$ resonance [2,3]. In particular, the combined $B \to K\mu^+\mu^-$ and $B \to K^{*}\mu^+\mu^-$ isospin asymmetries measured by the BaBar experiment were 3.9 $\sigma$ below zero. For $B \to K^{*}\mu^+\mu^-$, $A_I$ is expected to be consistent with the $B \to K^{*0}\gamma$ measurement of $(5 \pm 3)\%$ [4] as $q^2$ approaches zero. No such constraint is present for $B \to K\mu^+\mu^-$. The isospin asymmetries are determined by measuring the differential branching fractions of $B^+ \to K^+\mu^+\mu^-$, $B^0 \to K^0_{\rm S}\mu^+\mu^-$, $B^0 \to (K^{*0} \to K^+\pi^-)\mu^+\mu^-$ and $B^+ \to (K^{*+} \to K^0_{\rm S}\pi^+)\mu^+\mu^-$; the decays involving a $K^0_{\rm L}$ or $\pi^0$ are not considered. The $K^0_{\rm S}$ meson is reconstructed via the $K^0_{\rm S} \to \pi^+\pi^-$ decay mode.
The signal selections (section 3) are optimised to provide the lowest overall uncertainty on the isospin asymmetries; this leads to a very tight selection for the $B^+ \to K^+\mu^+\mu^-$ and $B^0 \to (K^{*0} \to K^+\pi^-)\mu^+\mu^-$ channels, where signal yield is sacrificed to achieve overall uniformity with the $B^0 \to K^0_{\rm S}\mu^+\mu^-$ and $B^+ \to (K^{*+} \to K^0_{\rm S}\pi^+)\mu^+\mu^-$ channels, respectively. In order to convert a signal yield into a branching fraction, the four signal channels are normalised to the corresponding $B \to J/\psi K^{(*)}$ channels (section 5). The relative normalisation in each $q^2$ bin is performed by calculating the relative efficiency between the signal and normalisation channels using simulated events. The normalisation of Finally, $A_I$ is determined by simultaneously fitting the $K^{(*)}\mu^+\mu^-$ mass distributions for all signal channels. Confidence intervals are estimated for $A_I$ using a profile likelihood method (section 7). Systematic uncertainties are included in the fit using Gaussian constraints.
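As a minimal sketch of how the isospin asymmetry follows from two branching fractions and the lifetime ratio, the definition in this section can be evaluated directly. The function name and the numerical inputs below are illustrative assumptions, not values or code from the analysis.

```python
def isospin_asymmetry(bf_neutral, bf_charged, lifetime_ratio):
    """Isospin asymmetry A_I from branching fractions.

    bf_neutral     -- branching fraction of the B0 mode
    bf_charged     -- branching fraction of the B+ mode
    lifetime_ratio -- tau_0 / tau_+ (B0 lifetime over B+ lifetime)
    """
    # Scaling the charged branching fraction by tau_0/tau_+ converts the
    # comparison of branching fractions into a comparison of partial widths.
    scaled_charged = lifetime_ratio * bf_charged
    return (bf_neutral - scaled_charged) / (bf_neutral + scaled_charged)


# Hypothetical inputs, chosen only to show the sign convention:
# a deficit in the neutral mode gives a negative asymmetry.
example_a_i = isospin_asymmetry(0.31e-6, 0.45e-6, 0.93)
```

With this sign convention, equal partial widths give $A_I = 0$ and a vanishing neutral mode gives $A_I = -1$.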

Experimental setup
The measurements described in this paper are performed with 1.0 fb$^{-1}$ of $pp$ collision data collected with the LHCb detector at the CERN LHC during 2011. The LHCb detector [5] is a single-arm forward spectrometer covering the pseudorapidity range $2 < \eta < 5$, designed for the study of particles containing $b$ or $c$ quarks. The detector includes a high precision tracking system consisting of a silicon-strip vertex detector (VELO) surrounding the $pp$ interaction region, a large-area silicon-strip detector (TT) located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift-tubes placed downstream. The combined tracking system has a momentum resolution $\Delta p/p$ that varies from 0.4% at 5 GeV/$c$ to 0.6% at 100 GeV/$c$, and an impact parameter (IP) resolution of 20 µm for tracks with high transverse momentum. Charged hadrons are identified using two ring-imaging Cherenkov (RICH) detectors. Photon, electron and hadron candidates are identified by a calorimeter system consisting of scintillating-pad and pre-shower detectors, an electromagnetic calorimeter and a hadronic calorimeter. Muons are identified by a muon system composed of alternating layers of iron and multiwire proportional chambers.
The trigger consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage which applies a full event reconstruction. For this analysis, candidate events are first required to pass a hardware trigger which selects muons with a transverse momentum $p_{\rm T} > 1.48$ GeV/$c$ for one muon, and 0.56 and 0.48 GeV/$c$ for two muons. In the subsequent software trigger [6], at least one of the final state particles is required to have both $p_{\rm T} > 0.8$ GeV/$c$ and IP $> 100$ µm with respect to all of the primary proton-proton interaction vertices in the event. Finally, the tracks of two or more of the final state particles are required to form a vertex which is significantly displaced from the primary vertices in the event.
For the simulation, pp collisions are generated using Pythia 6.4 [7] with a specific LHCb configuration [8]. Decays of hadronic particles are described by EvtGen [9] in which final state radiation is generated using Photos [10]. The EvtGen physics model used is based on ref. [11]. The interaction of the generated particles with the detector and its response are implemented using the Geant4 toolkit [12,13] as described in ref. [14].

Event selection
Candidates are reconstructed with an initial cut-based selection, which is designed to reduce the dataset to a manageable level. Channels involving a $K^0_{\rm S}$ meson are referred to as $K^0_{\rm S}$ channels whereas those with a $K^+$ meson are referred to as $K^+$ channels. Only events which are triggered independently of the $K^+$ candidate are accepted. Therefore, apart from a small contribution from candidates which are triggered by the $K^0_{\rm S}$ meson, the $K^0_{\rm S}$ and the $K^+$ channels are triggered in a similar way.
The initial selection places requirements on the geometry, kinematics and particle identification (PID) information of the signal candidates. Kaons are identified using information from the RICH detectors, such as the difference in log-likelihood (DLL) between the kaon and pion hypotheses, DLL$_{K\pi}$. Kaon candidates are required to have DLL$_{K\pi} > 1$, which has a kaon efficiency of $\sim 85\%$ and a pion efficiency of $\sim 10\%$. Muons are identified using the number of hits in the muon stations combined with information from the calorimeter and RICH systems. The muon PID efficiency is around 90%. $K^0_{\rm S}$ candidates are required to have a di-pion mass within 30 MeV/$c^2$ of the nominal $K^0_{\rm S}$ mass and $K^{*}$ candidates are required to have a mass within 100 MeV/$c^2$ of the nominal $K^{*}$ mass.
At this stage, the $K^0_{\rm S}$ channels are split into two categories depending on how the pions from the $K^0_{\rm S}$ decay are reconstructed. For decays where both pions have hits inside the VELO and the downstream tracking detectors, the $K^0_{\rm S}$ candidates are classified as long (L). If the daughter pions are reconstructed without VELO hits (but still with TT hits upstream of the magnet) they are classified as downstream (D) $K^0_{\rm S}$ candidates. Separate selections are applied to the L and D categories in order to maximise the sensitivity. The selection criteria described in the next paragraph refer to the $K^0_{\rm S}$ channels. After the initial selection, the L category has a much lower level of background than the D category.
For this reason simple cut-based selections are applied to the former, whereas multivariate selections are employed for the latter. Both $B^0$ and $B^+$ L selections require the $K^0_{\rm S}$ decay time to be greater than 3 ps, and for the IP $\chi^2$ to be greater than 10 when the IP of the $K^0_{\rm S}$, with respect to the primary vertex (PV), is forced to be zero. The $B^0 \to K^0\mu^+\mu^-$ L selection requires that $K^0_{\rm S}$ $p_{\rm T} > 1$ GeV/$c$ and $B$ $p_{\rm T} > 2$ GeV/$c$. The $K^0_{\rm S}$ mass window is also tightened to $\pm 20$ MeV/$c^2$. The $B^+ \to K^{*+}\mu^+\mu^-$ L selection requires that the pion from the $K^{*+}$ has an IP $\chi^2 > 30$. Multivariate selections are applied to the D categories using a boosted decision tree (BDT) [15] which uses geometrical and kinematic information of the $B$ candidate and of its daughters. The most discriminating variables according to the $B^0$ and $B^+$ BDTs are the $K^0_{\rm S}$ $p_{\rm T}$ and the angle between the $B$ momentum and its line of flight (from the primary vertex to the decay vertex). The BDTs are trained and tested on simulated events for the signal and data for the background. The simulated events have been corrected to match the data as described in section 5. All the variables used in the BDTs are well described in the simulation after correction. The background sample used is 25% of $B$ candidates which have |m where $m_B$ is obtained from fits to the appropriate $B \to J/\psi K^{(*)}$ normalisation channel. These data are excluded from the analysis. The selection based on the BDT output maximises the metric $S/\sqrt{S+B}$, where $S$ and $B$ are the expected signal and background yields, respectively. The $K^+$ channels have, as far as possible, the same selection criteria as used to select the $K^0_{\rm S}$ channels. The cut-based selections applied to the L categories have the $K^0_{\rm S}$ specific variables (e.g. $K^0_{\rm S}$ decay time) removed and the remaining requirements are applied to the $K^+$ channels.
The BDTs trained on the D categories contain variables which can be applied to both $K^0_{\rm S}$ and $K^+$ candidates and the BDTs trained on the $K^0_{\rm S}$ channels are simply applied to the corresponding $K^+$ channels. The $K^+$ channels are therefore also split into two different categories, one of which has the L selection applied, while the other one has the D selection applied. The overlap of events between these categories induces a correlation between the L and D categories for the $K^+$ channels. This correlation is accounted for in the fit to $A_I$.
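The choice of a working point that maximises $S/\sqrt{S+B}$ can be illustrated with a simple scan over candidate cut values. This is only a sketch under stated assumptions: the function names, the toy classifier scores and the expected yields are hypothetical, and the actual optimisation procedure used in the analysis is not described beyond the metric itself.

```python
import math


def significance_metric(signal, background):
    """Figure of merit S / sqrt(S + B) for expected yields S and B."""
    total = signal + background
    return signal / math.sqrt(total) if total > 0 else 0.0


def best_cut(scores_signal, scores_background,
             expected_signal, expected_background, cuts):
    """Return (cut, metric) for the cut value with the largest S/sqrt(S+B).

    scores_*   -- classifier outputs for signal and background samples
    expected_* -- expected yields before the classifier requirement
    cuts       -- candidate thresholds; candidates with score > cut are kept
    """
    best = None
    for cut in cuts:
        # Fraction of each sample that survives the requirement.
        eff_s = sum(s > cut for s in scores_signal) / len(scores_signal)
        eff_b = sum(b > cut for b in scores_background) / len(scores_background)
        metric = significance_metric(expected_signal * eff_s,
                                     expected_background * eff_b)
        if best is None or metric > best[1]:
            best = (cut, metric)
    return best


# Toy scores and yields (hypothetical) to exercise the scan.
toy_choice = best_cut([0.9, 0.8, 0.7, 0.2], [0.1, 0.2, 0.3, 0.8],
                      expected_signal=10.0, expected_background=100.0,
                      cuts=[0.0, 0.5])
```

In this toy case the tighter requirement is preferred because the loss of a quarter of the signal is outweighed by the removal of three quarters of the background.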
The final selection reduces the combinatorial background remaining after the initial selection by a factor of 5-20, while retaining 60-90% of the signal, depending on the category and decay mode. It is ineffective at reducing background from fully reconstructed $B$ decays, where one or more final state particles have been misidentified. Additional selection criteria are therefore applied. For the $K^0_{\rm S}$ channels, the $\Lambda \to p\pi^-$ decay can be mistaken for a $K^0_{\rm S} \to \pi^+\pi^-$ decay if the proton is misidentified as a pion. If one of the pion daughters from the $K^0_{\rm S}$ candidate has DLL$_{p\pi} > 10$, the proton mass hypothesis is assigned to it. For the L (D) categories, if the $p\pi^-$ mass is within 10 (15) MeV/$c^2$ of the nominal $\Lambda$ mass, the candidate is rejected. This selection eliminates background from $\Lambda^0_b \to (\Lambda \to p\pi^-)\mu^+\mu^-$, which peaks above the $B$ mass. For the $B^0 \to K^{*0}\mu^+\mu^-$ decay, the same peaking background vetoes are used as in ref. [16], which remove contamination from $B^0_s \to \phi\mu^+\mu^-$, $B^0 \to J/\psi K^{*0}$ and $B^0 \to K^{*0}\mu^+\mu^-$ decays where the kaon and pion are swapped. Finally, for the $B^+ \to K^+\mu^+\mu^-$ decay, backgrounds from $B^+ \to J/\psi K^+$ and $B^+ \to \psi(2S)K^+$ are present, where the $K^+$ and $\mu^+$ candidates are swapped. If a candidate has a $K^+\mu^-$ track combination consistent with originating from a $J/\psi$ or $\psi(2S)$ resonance, the kaon is required to be inside the acceptance of the muon system but to have insufficient hits in the muon stations to be classified as a muon. These vetoes remove less than 1% of the signal and reduce peaking backgrounds to a negligible level.
The mass distribution of $B$ candidates is shown versus the di-muon mass for $B^+ \to K^+\mu^+\mu^-$ data in figure 1. The other signal channels have similar distributions, but with a smaller number of events. The excesses of candidates seen as horizontal bands around 3090 MeV/$c^2$ and 3690 MeV/$c^2$ are due to $J/\psi$ and $\psi(2S)$ decays, respectively. These events are removed from the signal channels by excluding the di-muon regions in the In the subsequent analysis only candidates with masses above 5170 MeV/$c^2$ are included to avoid dependence on the shape of this background.

Signal yield determination
The yields for the signal channels are determined using extended unbinned maximum likelihood fits to the $K^{(*)}\mu^+\mu^-$ mass in the range 5170-5700 MeV/$c^2$. These fits are performed in seven $q^2$ bins and over the full range as shown in table 1. The results of the fits integrated over the full $q^2$ range are shown in figure 2. After selection, the mass of $K^0_{\rm S}$ candidates is constrained to the nominal $K^0_{\rm S}$ mass. The signal component is described by the sum of two Crystal Ball functions [17] with common peak and tail parameters, but different widths. The shape is taken to be the same as the $B \to J/\psi K^{(*)}$ normalisation channels. The combinatorial background is fitted with a single exponential function. As stated in section 3, part of the combinatorial background is removed by the charmonium vetoes. This is accounted for by scaling the remaining background. For the $B \to K\mu^+\mu^-$ decays, a component arising mainly from partially reconstructed $B \to K^{*}\mu^+\mu^-$ decays is present at masses below the $B$ mass. This partially reconstructed background is characterised using a threshold model detailed in ref. [18]. The shape of the partial reconstruction component is again assumed to be the same as for the normalisation channels. For the $B^+ \to K^+\mu^+\mu^-$ channel, the impact of this component is negligible due to the relatively high signal and low background yields. For the $B^0 \to K^0_{\rm S}\mu^+\mu^-$ channel, the amount of partially reconstructed decays is found to be less than 25% of the total combinatorial background in the fit range.
Table 1. Signal yields of the $B \to K^{(*)}\mu^+\mu^-$ decays. The upper bound of the highest $q^2$ bin, $q^2_{\rm max}$, is 19.3 GeV$^2/c^4$ and 23.0 GeV$^2/c^4$ for $B \to K^{*}\mu^+\mu^-$ and $B \to K\mu^+\mu^-$, respectively.
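The signal shape described above, two Crystal Ball functions sharing peak and tail parameters but with different widths, can be sketched as follows. The parameterisation is the standard unnormalised Crystal Ball form with a low-mass power-law tail; the function names, the mixing fraction `frac` and the example parameter values are assumptions for illustration, not the fitted values of the analysis.

```python
import math


def crystal_ball(x, mean, sigma, alpha, n):
    """Unnormalised Crystal Ball shape with a low-side tail (alpha > 0).

    Gaussian core for (x - mean) / sigma > -alpha, power law below it.
    The two pieces are continuous at the transition point.
    """
    t = (x - mean) / sigma
    if t > -alpha:
        return math.exp(-0.5 * t * t)
    a = (n / alpha) ** n * math.exp(-0.5 * alpha * alpha)
    b = n / alpha - alpha
    return a * (b - t) ** (-n)


def double_crystal_ball(x, mean, sigma_narrow, sigma_wide, frac, alpha, n):
    """Sum of two Crystal Ball shapes with common mean, alpha and n.

    frac is the (hypothetical) weight of the narrow component.
    """
    return (frac * crystal_ball(x, mean, sigma_narrow, alpha, n)
            + (1.0 - frac) * crystal_ball(x, mean, sigma_wide, alpha, n))


# Illustrative evaluation at a mass of 5250 MeV/c^2 with made-up parameters.
example_value = double_crystal_ball(5250.0, 5280.0, 15.0, 25.0, 0.7, 1.5, 3.0)
```

In a real fit the shape would be normalised over the fit range and combined with the exponential background in an extended likelihood; that step is omitted here.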
The signal-shape parameters are allowed to vary in the $B^0 \to J/\psi K^0_{\rm S}$ mass fits and are subsequently fixed for the $B^0 \to K^0_{\rm S}\mu^+\mu^-$ mass fits when calculating the significance. The significance $\sigma$ of a signal $S$ for $B^0 \to K^0_{\rm S}\mu^+\mu^-$ is defined as $\sigma^2 = 2\ln L_{\rm L}(S) + 2\ln L_{\rm D}(S) - 2\ln L_{\rm L}(0) - 2\ln L_{\rm D}(0)$, where $L_{\rm L,D}(S)$ and $L_{\rm L,D}(0)$ are the likelihoods of the fit with and without the signal component, respectively. The $B^0 \to K^0_{\rm S}\mu^+\mu^-$ channel is observed with a significance of 5.7 $\sigma$.
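The significance definition above combines the log-likelihoods of the L and D categories. A minimal sketch of that arithmetic is given below; the function name and the numerical log-likelihood values are hypothetical.

```python
import math


def combined_significance(lnl_long_sig, lnl_down_sig,
                          lnl_long_null, lnl_down_null):
    """Significance from log-likelihoods of fits with and without signal.

    Implements sigma^2 = 2 lnL_L(S) + 2 lnL_D(S) - 2 lnL_L(0) - 2 lnL_D(0).
    """
    sigma_sq = (2.0 * lnl_long_sig + 2.0 * lnl_down_sig
                - 2.0 * lnl_long_null - 2.0 * lnl_down_null)
    # Guard against small negative values from numerical noise.
    return math.sqrt(max(sigma_sq, 0.0))


# Made-up log-likelihood values, for illustration only.
example_sigma = combined_significance(-100.0, -200.0, -104.5, -203.5)
```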

Normalisation
In order to simplify the calculation of systematic uncertainties, each signal mode is normalised to the $B \to J/\psi K^{(*)}$ channel, where the $J/\psi$ decays into two muons. These decays have well measured branching fractions which are approximately two orders of magnitude higher than those of the signal decays. Each normalisation channel has similar kinematics and the same final state particles as the signal modes.
The relative efficiency between signal and normalisation channels is estimated using simulated events. After smearing the IP resolution of all tracks by 20%, the IP distributions of candidates in the simulation and data agree well. The performance of the PID is studied using the decay $D^{*+} \to (D^0 \to \pi^+ K^-)\pi^+$, which provides a clean source of kaons to study the kaon PID efficiency, and a tag-and-probe sample of $B^+ \to J/\psi K^+$ to study the muon PID efficiency. The simulation is reweighted to match the PID performance of the data.

JHEP07(2012)133
Integrating over $q^2$, the relative efficiency between the signal and normalisation channels is between 70 and 80% depending on the decay mode and category. The relative efficiency includes differences in the geometrical acceptance, as well as the reconstruction, selection and trigger efficiencies. Most of these effects cancel in the efficiency ratio between $K^0_{\rm S}$ and $K^+$ channels, as shown in figure 3. The dominant effect remaining is due to the $K^0_{\rm S}$ reconstruction efficiency, which depends on the $K^0_{\rm S}$ momentum. At low $q^2$, the efficiency for $B^0 \to K^0_{\rm S}\mu^+\mu^-$ (D) decreases with respect to that for $B^+ \to K^+\mu^+\mu^-$ due to the high $K^0_{\rm S}$ momentum in this region. This results in the $K^0_{\rm S}$ meson more often decaying beyond the TT and consequently it has a lower reconstruction efficiency. This effect is not seen in the $B^+ \to K^{*+}\mu^+\mu^-$ D category as the $K^0_{\rm S}$ typically has lower momentum in this decay and so the $K^0_{\rm S}$ reconstruction efficiency is approximately constant across $q^2$. This $K^0_{\rm S}$ reconstruction effect is also seen in the L category for both modes but is partially compensated by the fact that the $K^0_{\rm S}$ daughters can cause the event to be triggered, which increases the trigger efficiency with respect to the $K^+$ channels at low $q^2$. Summed over both the L and D categories, the efficiency of the decays involving a $K^0$ meson is approximately 10% with respect to those involving a charged kaon. This is partly due to the visible branching fraction of $K^0 \to K^0_{\rm S} \to \pi^+\pi^-$ ($\sim 30\%$) and partly due to the lower reconstruction efficiency of the $K^0_{\rm S}$ due to the long lifetime and the need to reconstruct an additional track ($\sim 30\%$). The relative efficiency between the L and D signal categories is cross-checked by comparing the ratio for the $B \to \psi(2S)K^{(*)}$ decay to the corresponding ratio for the $B \to J/\psi K^{(*)}$ decays seen in data. The results agree within the statistical accuracy of 5%.
Figure 3. Efficiency of the $K^0_{\rm S}$ channels with respect to the $K^+$ channels for (left) $B \to K\mu^+\mu^-$ and (right) $B \to K^{*}\mu^+\mu^-$, calculated using the simulation. The efficiencies are shown for both L and D $K^0_{\rm S}$ reconstruction categories and include the visible branching fraction of $K^0 \to K^0_{\rm S} \to \pi^+\pi^-$. The error bars are not visible as they are smaller than the marker size.
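The approximate 10% relative efficiency quoted above is the product of the two rough 30% factors given in the text. The short sketch below only makes that arithmetic explicit; the function name is an assumption and the inputs are the rounded values from the paragraph.

```python
def relative_k0_efficiency(visible_bf, reco_efficiency):
    """Efficiency of K0 modes relative to charged-kaon modes.

    visible_bf      -- visible branching fraction of K0 -> K0S -> pi+ pi-
    reco_efficiency -- K0S reconstruction efficiency relative to a K+
    """
    return visible_bf * reco_efficiency


# Rounded inputs from the text: about 30% each, giving roughly 10%.
approx_relative_efficiency = relative_k0_efficiency(0.3, 0.3)
```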

Systematic uncertainties
Gaussian constraints are used to include all systematic uncertainties in the fits for $A_I$ and the branching fractions. In most cases the dominant systematic uncertainty is that from the branching fraction measurements of the normalisation channels, ranging from 3 to 6%. There is also a statistical uncertainty on the yield of the normalisation channels, which is in the range 0.5-2.0%, depending on the channel.
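A Gaussian constraint is commonly implemented by adding a penalty term to the negative log-likelihood for each constrained nuisance parameter. The sketch below shows that generic construction; it is an assumption about the usual technique, not the fit code of the analysis, and all names and numbers are hypothetical.

```python
def constrained_nll(nll_stat, nuisance, nominal, sigma):
    """Negative log-likelihood with one Gaussian-constrained parameter.

    nll_stat -- statistical part of the negative log-likelihood
    nuisance -- current value of the nuisance parameter in the fit
    nominal  -- externally measured central value
    sigma    -- uncertainty on the external measurement
    """
    # The penalty grows quadratically as the parameter is pulled away
    # from its nominal value, in units of its uncertainty.
    pull = (nuisance - nominal) / sigma
    return nll_stat + 0.5 * pull * pull


# Example: a parameter pulled by one standard deviation adds 0.5 units.
example_nll = constrained_nll(10.0, 1.02, 1.00, 0.02)
```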
The finite size of the simulation samples introduces a statistical uncertainty on the relative efficiency and leads to a systematic uncertainty in the range 0.8-2.5% depending on $q^2$ and decay mode.

The relative tracking efficiency between the signal and normalisation channels is corrected using data. The statistical precision of these corrections leads to a systematic uncertainty of $\sim 0.2\%$ per long track. The differences in the downstream tracking efficiency between the simulation and data are expected to mostly cancel in the normalisation procedure. A conservative systematic uncertainty of 1% per downstream track is assigned for the variation across $q^2$.
The PID efficiency is derived from data, and its corresponding systematic uncertainty arises from the statistical error associated with the PID efficiency measurements. The uncertainty on the relative efficiency is determined by randomly varying PID efficiencies within their uncertainties, and recomputing the relative efficiency. The resulting uncertainty is found to be negligible.
The trigger efficiency is calculated using the simulation. Its uncertainty consists of two components, one associated with the trigger efficiency of the $K^0_{\rm S}$ meson, and one associated with the trigger efficiency of the muons (and pion from the $K^{*}$). For the muons and pion the uncertainty is obtained using $B^+ \to J/\psi K^+$ and $B^0 \to J/\psi K^{*0}$ events in data that are triggered independently of the signal. These candidates are used to calculate the trigger efficiency and are compared to the efficiency calculated using the same method in simulation. The difference is found to be $\sim 2\%$ for both $B^+ \to J/\psi K^+$ and $B^0 \to J/\psi K^{*0}$ decays and is assigned as a systematic uncertainty. This uncertainty is assumed to cancel for the isospin asymmetry as the presence of muons is common between the $K^0_{\rm S}$ channels and the $K^+$ channels. The uncertainty associated with the $K^0_{\rm S}$ trigger efficiency is calculated by comparing the fraction of candidates triggered by $K^0_{\rm S}$ daughters in the simulation and the data. The difference is used as an estimate of the capability of simulation to reproduce these trigger decisions. The simulation is found to underestimate the $K^0_{\rm S}$ trigger decisions by 10-20% depending on the decay mode. This percentage is multiplied by the fraction of trigger decisions where the $K^0_{\rm S}$ participates in a given bin of $q^2$, leading to an uncertainty of 0.2-4.1% depending on $q^2$ and decay mode.
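The $K^0_{\rm S}$ trigger uncertainty described above is a product of two fractions, evaluated per $q^2$ bin. The sketch below illustrates that multiplication; the function name and the example fractions are hypothetical, since the per-bin trigger fractions are not listed in the text.

```python
def k0s_trigger_systematic(mismodelling, k0s_trigger_fraction):
    """Systematic uncertainty from K0S trigger mismodelling in one q^2 bin.

    mismodelling         -- relative data/simulation disagreement (0.10-0.20)
    k0s_trigger_fraction -- fraction of trigger decisions involving the K0S
    """
    return mismodelling * k0s_trigger_fraction


# Hypothetical bin: 20% mismodelling, K0S involved in 20% of decisions.
example_systematic = k0s_trigger_systematic(0.20, 0.20)
```

For these illustrative inputs the result is about 4%, comparable to the upper end of the quoted 0.2-4.1% range.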
The effect of the unknown angular distribution of $B^+ \to K^{*+}\mu^+\mu^-$ decays on the relative efficiency is estimated by altering the Wilson coefficients appearing in the operator product expansion method [19,20]. The Wilson coefficients, $C_7$ and $C_{10}$, have their real part inverted and the relative efficiency is recalculated. This can be seen as an extreme variation which is used to obtain a conservative estimate of the associated uncertainty. The calculation was performed using an EvtGen physics model which uses the transition form factors detailed in ref. [21]. The difference in the relative efficiency varies from 0 to 6%, depending on $q^2$, and it is assigned as a systematic uncertainty.
The shape parameters for the signal modes are assumed to be the same as the normalisation channels. This assumption is validated using the simulation and no systematic uncertainty is assigned. The statistical uncertainties of these shape parameters are propagated through the fit using Gaussian constraints, accounting for correlations between the parameters. The uncertainty on the amount of partially reconstructed background is also added to the fit using Gaussian constraints, therefore no further uncertainty is added. The parametrisation of the fit model is cross-checked by varying the fit range and background model. Consistent yields are observed and no systematic uncertainty is assigned.

Overall the systematic error on the branching fraction is 4-8% depending on $q^2$ and the decay mode. This is small compared to the typical statistical error of $\sim 40\%$.

Results and conclusions
The differential branching fraction in the $i$th $q^2$ bin can be written as
$$\frac{d\mathcal{B}^i}{dq^2} = \frac{N^i(B \to K^{(*)}\mu^+\mu^-)}{N(B \to J/\psi K^{(*)})} \times \frac{\mathcal{B}(B \to J/\psi K^{(*)})\,\mathcal{B}(J/\psi \to \mu^+\mu^-)}{\epsilon^i_{\rm rel}\,\Delta^i}, \qquad (7.1)$$
where $N^i(B \to K^{(*)}\mu^+\mu^-)$ is the number of signal candidates in bin $i$, $N(B \to J/\psi K^{(*)})$ is the number of normalisation candidates, the product of $\mathcal{B}(B \to J/\psi K^{(*)})$ and $\mathcal{B}(J/\psi \to \mu^+\mu^-)$ is the visible branching fraction of the normalisation channel [22], $\epsilon^i_{\rm rel}$ is the relative efficiency between the signal and normalisation channels in bin $i$ and finally $\Delta^i$ is the width of bin $i$. The differential branching fraction is determined by simultaneously fitting the L and D categories of the signal channels. The branching fraction of the signal channel is introduced as a fit parameter by re-arranging eq. (7.1) in terms of $N(B \to K^{(*)}\mu^+\mu^-)$. Confidence intervals are evaluated by scanning the profile likelihood. The results of these fits for $B^0 \to K^0\mu^+\mu^-$ and $B^+ \to K^{*+}\mu^+\mu^-$ decays are shown in figure 4 and given in tables 2 and 3. Theoretical predictions [23-25] are superimposed on figures 4 and 5. In the low $q^2$ region, these predictions rely on the QCD factorisation approaches from refs. [26,27] for $B \to K^{*}\mu^+\mu^-$ and ref. [28] for $B \to K\mu^+\mu^-$, which lose accuracy when approaching the $J/\psi$ resonance. In the high $q^2$ region, an operator product expansion in the inverse $b$-quark mass, $1/m_b$, and in $1/\sqrt{q^2}$ is used based on ref. [29]. This expansion is only valid above the open charm threshold. In both $q^2$ regions the form factor calculations for $B \to K^{*}\mu^+\mu^-$ and $B \to K\mu^+\mu^-$ are taken from refs. [30] and [31] respectively. These form factors lead to a high correlation in the uncertainty of the predictions across $q^2$. A dimensional estimate is made of the uncertainty from expansion corrections [32]. The non-zero isospin asymmetry arises in the low $q^2$ region due to spectator-quark differences in the so-called hard-scattering part. There are also sub-leading corrections included from refs. [1] and [27] which only affect the charged modes and further contribute to the isospin asymmetry.
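The normalisation procedure described in this section can be sketched numerically: a signal yield is converted into a differential branching fraction using the normalisation yield, the visible branching fraction of the normalisation channel, the relative efficiency and the bin width, and the relation can be inverted to express the expected yield in terms of the branching fraction. The function names and all input numbers below are hypothetical.

```python
def differential_branching_fraction(n_signal, n_norm, bf_norm_visible,
                                    rel_efficiency, bin_width):
    """dB/dq^2 in one q^2 bin from signal and normalisation yields."""
    return (n_signal / n_norm) * bf_norm_visible / (rel_efficiency * bin_width)


def expected_signal_yield(dbf, n_norm, bf_norm_visible,
                          rel_efficiency, bin_width):
    """Inverse relation: signal yield for a given dB/dq^2.

    This is the form needed when the branching fraction itself
    is the fit parameter.
    """
    return dbf * rel_efficiency * bin_width * n_norm / bf_norm_visible


# Made-up inputs for a single bin (bin width in GeV^2/c^4).
example_dbf = differential_branching_fraction(
    n_signal=20.0, n_norm=10000.0, bf_norm_visible=5.0e-5,
    rel_efficiency=0.75, bin_width=2.0)
example_yield = expected_signal_yield(example_dbf, 10000.0, 5.0e-5, 0.75, 2.0)
```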
The total branching fractions are also measured by extrapolating underneath the charmonium resonances assuming the same $q^2$ distribution as in the simulation. The branching fractions of $B^0 \to K^0\mu^+\mu^-$ and $B^+ \to K^{*+}\mu^+\mu^-$ are found to be $(0.31^{+0.07}_{-0.06}) \times 10^{-6}$ and $(1.16\pm0.19) \times 10^{-6}$, respectively, where the errors include statistical and systematic uncertainties. These results are in agreement with previous measurements and have better precision [22]. The isospin asymmetries as a function of $q^2$ for $B \to K\mu^+\mu^-$ and $B \to K^{*}\mu^+\mu^-$ are shown in figure 5 and given in tables 2 and 3. As for the branching fractions, the fit is done simultaneously for both the L and D categories where $A_I$ is a common parameter for the two cases. The confidence intervals are also determined by scanning the profile likelihood.
Figure 5. Isospin asymmetry of (left) $B \to K\mu^+\mu^-$ and (right) $B \to K^{*}\mu^+\mu^-$. For $B \to K^{*}\mu^+\mu^-$ the theoretical SM prediction, which is very close to zero, is shown for $q^2$ below 8.68 GeV$^2/c^4$, from ref. [25].
The significance of the deviation from the null hypothesis is obtained by fixing $A_I$ to be zero and computing the difference in the negative log-likelihood from the nominal fit.
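A difference in negative log-likelihood is often converted into a number of standard deviations with the relation $\sqrt{2\,\Delta{\rm NLL}}$, which holds in the Gaussian (Wilks) approximation for one constrained parameter. Whether exactly this conversion was used here is an assumption; the sketch below, with hypothetical names and values, only illustrates the standard relation.

```python
import math


def significance_from_delta_nll(nll_null, nll_nominal):
    """Significance in standard deviations from two fit results.

    nll_null    -- negative log-likelihood with A_I fixed to zero
    nll_nominal -- negative log-likelihood of the nominal fit
    """
    delta = nll_null - nll_nominal
    # The nominal fit should never be worse than the constrained fit;
    # clip at zero to absorb numerical noise.
    return math.sqrt(2.0 * max(delta, 0.0))


# Made-up likelihood values giving a difference of 8 units.
example_significance = significance_from_delta_nll(108.0, 100.0)
```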
In summary, the isospin asymmetries of $B \to K^{(*)}\mu^+\mu^-$ decays and the branching fractions of $B^0 \to K^0\mu^+\mu^-$ and $B^+ \to K^{*+}\mu^+\mu^-$ are measured, using 1.0 fb$^{-1}$ of data taken with the LHCb detector. The two $q^2$ bins below 4.3 GeV$^2/c^4$ and the highest bin above 16 GeV$^2/c^4$ have the most negative isospin asymmetry in the $B \to K\mu^+\mu^-$ channel. These $q^2$ regions are furthest from the charmonium regions and are therefore cleanly predicted theoretically. This asymmetry is dominated by a deficit in the observed $B^0 \to K^0\mu^+\mu^-$ signal. Ignoring the small correlation of errors between each $q^2$ bin, the significance of the deviation from zero integrated across $q^2$ is calculated to be 4.4 $\sigma$. The $B \to K^{*}\mu^+\mu^-$ case agrees with the SM prediction of almost zero isospin asymmetry [1]. All results agree with previous measurements [3,33,34].
Table 3. Partial branching fractions of $B^+ \to K^{*+}\mu^+\mu^-$ and isospin asymmetries of $B \to K^{*}\mu^+\mu^-$ decays. The significance of the deviation of $A_I$ from zero is shown in the last column. The errors include the statistical and systematic uncertainties.
We express our gratitude to our colleagues in the CERN accelerator departments for the excellent performance of the LHC. We thank the technical and administrative staff at CERN and at the LHCb institutes, and acknowledge support from the National Agen