Search for the lepton-flavour violating decays B 0 ( s ) → e ± μ ∓ The LHCb collaboration

A search for the lepton-flavour violating decays B0 s → e±μ∓ and B0 → e±μ∓ is performed based on a sample of proton-proton collision data corresponding to an integrated luminosity of 3 fb−1, collected with the LHCb experiment at centre-of-mass energies of 7 and 8 TeV. The observed yields are consistent with the background-only hypothesis. Upper limits on the branching fraction of the B0 s → e±μ∓ decays are evaluated both in the hypotheses of an amplitude completely dominated by the heavy eigenstate and by the light eigenstate. The results are B(B0 s → e±μ∓) < 6.3 (5.4) × 10−9 and B(B0 s → e±μ∓) < 7.2 (6.0) × 10−9 at 95% (90%) confidence level, respectively. The upper limit on the branching fraction of the B0 → e±μ∓ decay is also evaluated, obtaining B(B0 → e±μ∓) < 1.3 (1.0) × 10−9 at 95% (90%) confidence level. These are the strongest limits on these decays to date.


Introduction
Processes that are suppressed or forbidden in the Standard Model (SM) are sensitive to potential contributions from new mediators, even if their masses are inaccessible to direct searches. Despite the fact that lepton-flavour violating (LFV) decays are forbidden within the SM, neutrino oscillation phenomena are proof that lepton flavour is not conserved in the neutral sector. However, LFV decays have not yet been observed, and their observation would be clear evidence of physics beyond the SM.
The study of LFV decays is particularly interesting in light of hints of lepton nonuniversality (LNU) effects in semileptonic decays [1] and b → s transitions [2,3], which could be associated with LFV processes [4]. Possible explanations of these hints can be found in various scenarios beyond the SM, e.g. models with a new gauge Z boson [5] or leptoquarks [6,7]. In these models, the branching fractions of the B 0 s → e ± µ ∓ and B 0 → e ± µ ∓ decays 1 can be enhanced up to 10 −11 . Other models also predict possible enhancement for B 0 s → e ± µ ∓ and B 0 → e ± µ ∓ decays, e.g. heavy singlet Dirac neutrinos [8], supersymmetric models [9] and the Pati-Salam model [10]. The most stringent published limits on the branching fractions of these decays are currently B(B 0 s → e ± µ ∓ ) < 1.4 × 10 −8 and B(B 0 → e ± µ ∓ ) < 3.7×10 −9 at 95% confidence level (CL) from the LHCb collaboration using data corresponding to 1 fb −1 of integrated luminosity [11].

JHEP03(2018)078
This article presents an analysis performed on a larger data sample, corresponding to an integrated luminosity of 3 fb −1 of pp collisions collected at centre-of-mass energies of 7 and 8 TeV by the LHCb experiment in 2011 and 2012. In addition to a larger data sample, this analysis benefits from an improved selection and in particular a better performing multivariate classifier for signal and background separation. It supersedes the previous LHCb search for B 0 s → e ± µ ∓ and B 0 → e ± µ ∓ decays [11]. Two normalisation channels are used: the B 0 → K + π − decay which has a similar topology to that of the signal, and the B + → J/ψ K + decay, with J/ψ → µ + µ − , which has an abundant yield and a similar purity and trigger selection. To avoid potential biases, B 0 (s) → e ± µ ∓ candidates in the signal region, m e ± µ ∓ ∈ [5100, 5500] MeV/c 2 , where m e ± µ ∓ is the invariant mass of the e ± µ ∓ pair, were not examined until the selection and fitting procedure were finalised.

Detector and simulation
The LHCb detector [12,13] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b or c quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. The tracking system provides a measurement of momentum, p, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV/c. The minimum distance of a track to a primary vertex (PV), the impact parameter (IP), is measured with a resolution of (15 + 29/p T ) µm, where p T is the component of the momentum transverse to the beam, in GeV/c. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers.
The online event selection is performed by a trigger, which consists of a hardware stage, based on information from the muon and calorimeter systems, followed by a software stage that applies a full reconstruction of the event. The B 0 (s) → e ± µ ∓ candidates must fulfill the requirements of the electron or muon triggers. At the hardware stage, the electron trigger requires the presence of a cluster in the electromagnetic calorimeter with a transverse energy deposit, E T , of at least 2.5 (3.0) GeV for 2011 (2012) data. The muon trigger selects muon candidates with p T higher than 1.5 (1.8) GeV/c for 2011 (2012) data. The software stage requires a two-track secondary vertex identified by a multivariate algorithm [14] to be consistent with the decay of a b hadron with at least two charged tracks, and at least one track with high p T and large IP with respect to any PV.
Simulated samples are used to evaluate geometrical, reconstruction and selection efficiencies for both signal and backgrounds, to train multivariate classifiers and to determine the shapes of invariant mass distributions of both signal and backgrounds. In the simula- tion, pp collisions are generated using Pythia [15] with a specific LHCb configuration [16]. Decays of hadronic particles are described by EvtGen [17], in which final-state radiation is generated using Photos [18]. The interaction of the generated particles with the detector, and its response, are simulated using the Geant4 toolkit [19,20] as described in ref. [21].

Selection
The B 0 (s) → e ± µ ∓ candidates in the events passing the trigger selection are constructed by combining pairs of tracks producing good quality secondary vertices that are separated from any PV in the downstream direction by a flight distance greater than 15 times its uncertainty. Only B 0 (s) candidates with p T > 0.5 GeV/c and a small impact parameter χ 2 , χ 2 IP , are considered, where the χ 2 IP of a B 0 (s) candidate is defined as the difference between the χ 2 of the PV reconstructed with and without the considered candidate. The PV with the smallest χ 2 IP is associated to the B 0 (s) candidate. The measured momentum of electron candidates is corrected for the loss of momentum due to bremsstrahlung. This correction is made by adding to the electron the momentum of photons consistent with being emitted from the electron before the magnet [22]. Since bremsstrahlung can affect the kinematic distribution of B 0 (s) → e ± µ ∓ candidates, the sample is split into two categories: candidates in which no photon is associated with the electron and candidates for which one or more photons are recovered. The fraction of electrons with recovered bremsstrahlung photons is about 60% for B 0 (s) → e ± µ ∓ decays. Only B 0 (s) → e ± µ ∓ candidates with m e ± µ ∓ ∈ [4900, 5850] MeV/c 2 are retained to be further analysed.
Particles forming the B 0 (s) → e ± µ ∓ candidates are required to be well identified as an electron and a muon [23], using information from the Cherenkov detectors, the calorimeters and the muon stations. These identification criteria are optimised to keep high signal efficiency while maximising the rejection power for the two-body hadronic B decays, B → h + h − , which are the major peaking backgrounds.
In order to reduce combinatorial background -combinations of two random tracks that can be associated to a common vertex -a loose requirement on the response of a multivariate classifier trained on simulated events is applied to the signal candidates. This classifier takes the following geometrical variables as input: the direction of the B 0 (s) meson candidate; its impact parameter with respect to the assigned PV, defined as the PV with which it forms the smallest χ 2 IP ; the separation between the two outgoing leptonic tracks at their point of closest approach; and the minimum IP of each lepton particle with respect to any PV. In total 22 020 B 0 (s) → e ± µ ∓ candidates are selected, which are mainly comprised of combinatorial background that is made up of true electrons and muons.
The normalisation channels are selected with requirements as similar as possible to those used for the signal. The selection for B 0 → K + π − candidates is the same as for the B 0 (s) → e ± µ ∓ channel, except for the particle identification criteria which are changed into hadronic particle identification requirements. Similarly, the B + → J/ψ K + candidate selection is also kept as similar as possible, applying the same selection used for the signal to the dimuon pair from the J/ψ , except for the particle identification requirements. Additionally, loose quality requirements are applied on the B + vertex and particle identification is required on both muons. Finally, a 60 MeV/c 2 mass window around the nominal J/ψ mass and the requirement 1.4 < 1 + p J/ψ /p K < 20.0 is used. The latter removes backgrounds that have a least one track that is misidentified and another that is not reconstructed, mainly B → J/ψ π + X, where X can be one or more particles.

BDT training and calibration
A Boosted Decision Tree (BDT) classifier is used to separate the B 0 (s) → e ± µ ∓ signal from the combinatorial background. The BDT is trained using a simulated sample of B 0 s → e ± µ ∓ events to describe the signal and a data sample of same-sign e ± µ ± candidates to describe the combinatorial background. The following input variables are used: the proper decay time of the B 0 (s) candidate; the minimum χ 2 IP of the two leptons with respect to the assigned PV; the IP of the B 0 (s) candidate with respect to its PV; the distance of closest approach between the two lepton tracks; the degree of isolation of the two tracks with respect to the other tracks in the same event [24]; the transverse momentum of the B 0 (s) candidate; the cosine of the angle between the muon momentum in the B 0 (s) candidate rest frame and the vector perpendicular to the B 0 (s) candidate momentum and the beam axis; the flight distance of the B 0 (s) candidate with respect to its PV; the χ 2 of the B 0 (s) candidate decay vertex; the maximum transverse momentum of the two decay products and their difference in pseudorapidity.
The BDT response is transformed such that it is uniformly distributed in the range [0,1] for the signal, while peaking at zero for the background. The linear correlation between the BDT response and the dilepton invariant mass is found to be around 4%.
Since the BDT is trained using only kinematic information of a two-body B 0 (s) decay, its response is calibrated using B 0 → K + π − decays as a proxy. To avoid biases, B 0 → K + π − candidates are selected from candidates where the trigger decision did not depend on the presence of the B 0 decay products. Furthermore, the candidates are weighted to emulate the effect of the lepton triggers and the particle identification requirements. The number of B 0 → K + π − candidates in bins of BDT response is determined by fitting the K + π − invariant mass distribution. As expected, the BDT response is found to be consistent with a uniform distribution across the range [0,1]. The distribution of the BDT response is also checked on a B 0 → K + π − simulated sample and a uniform distribution is obtained. Candidates with a value smaller than 0.25 are then excluded, as this region is highly contaminated by background, leaving a total of 476 signal candidates. The signal candidates are classified in a binned two-dimensional space formed by the BDT response and the two bremsstrahlung categories. The expected probability density function (PDF) of the BDT response for B 0 (s) → e ± µ ∓ decays with recovered bremsstrahlung photons is shown in figure 1. Unrecovered bremsstrahlung photons emitted by signal electrons can affect the BDT response and are not accounted for in the calibration procedure since hadrons do not emit significant bremsstrahlung. The impact of bremsstrahlung on the BDT response distribution is evaluated using simulation and a correction is applied where no bremsstrahlung is recovered. . Expected distribution of the BDT response for B 0 (s) → e ± µ ∓ decays with recovered bremsstrahlung photons obtained from the B 0 → K + π − control channel. The total uncertainty is shown as a light grey band. Each bin is normalised to its width.

Normalisation
The B 0 (s) → e ± µ ∓ yields are obtained from a fit to the lepton-pair invariant mass distribution and translated into branching fractions according to where the index i identifies the normalisation channel and N i norm and B i norm are its number of candidates and its branching fraction. The signal yields are denoted by N B 0 (s) →e ± µ ∓ and the factors f q indicate the probabilities that a b quark fragments into a B 0 or B 0 s meson. Assuming f d = f u , the fragmentation probability for the B 0 and B + channels is set to f d . The value of f s /f d used is measured in pp collision data at √ s = 7 TeV by the LHCb collaboration and is evaluated to be 0.259 ± 0.015 [25]. The two normalisation channels are averaged with weights w i proportional to the square of the inverse of the uncertainty related to their branching fractions and yields. A correction has also been applied for the marginal difference in luminosity, L, between the channels. The branching fractions of the signal decays include both charge configurations of the final-state particles, e + µ − and The results of the two fits are shown in figure 2 and the measured yields are reported in table 1.  Table 1. Yields of normalisation channels obtained from fits to data. The efficiency ε sig(norm) for the signal (normalisation) channels depends on several factors: the geometric acceptance of the detector, the probability for particles to produce hits in the detector which can be reconstructed as tracks, and the efficiency of the selection requirements that are applied both in the trigger and selection stages, which includes the particle identification requirements. The ratios of acceptance, reconstruction and selection efficiencies are evaluated using simulation with the exception of the trigger and particle identification efficiencies, which are not well reproduced by simulation, and are calibrated using data [26,27]. Calibration samples where the trigger decision was independent of the candidate decay products are used to study the trigger efficiency. From these samples, B + → J/ψ K + candidates, with J/ψ → e + e − and J/ψ → µ + µ − , are used to study the requirements for the electrons and muons, respectively. The efficiencies are determined as a function of the p T and IP for the muon and E T and IP for the electron. The single-track efficiencies are then combined with a weighted average over the properties of the electron and muon tracks of a B 0 s → e ± µ ∓ simulated sample. Particle identification efficiencies are evaluated using calibration samples where the identity of one of the particles can be inferred by means uncorrelated to particle identification requirements. A tag-and-probe method is applied on J/ψ → µ + µ − and J/ψ → e + e − decay samples, where only one lepton, the tag, is required to be well identified and the identity of the other lepton is deduced. The single-track efficiencies, calculated as a function of kinematic variables, are then combined and averaged using the momentum distributions of the leptons in a B 0 s → e ± µ ∓ simulated sample. The two normalisation factors α B 0 s and α B 0 are determined to be (2.48 ± 0.17) × 10 −10 and (6.16 ± 0.23) × 10 −11 . The total efficiencies for the B 0 → e ± µ ∓ , B 0 s → e ± µ ∓ , -6 -
To validate the normalisation procedure, the ratio between the measured branching fractions of B 0 → K + π − and B + → J/ψ K + is determined as where ε B + →J/ψ K + and ε B 0 →K + π − are the selection efficiencies for the B 0 → K + π − and B + → J/ψ K + decays respectively. A correction of about 1% is applied in order to take into account the difference in luminosity between the two channels. The value obtained for R norm is in excellent agreement with the measured value of 0.321 ± 0.013 [28].

Backgrounds
In addition to the combinatorial background, the signal region is also potentially polluted by backgrounds from exclusive decays where one or more of the final-state particles are misidentified or not reconstructed. The potentially most dangerous of these backgrounds are hadronic B → h + h − decays where both hadrons are misidentified as an electron-muon pair, resulting in peaking structures near the B 0 s → e ± µ ∓ signal mass. Other decays which could contribute, especially at low invariant masses, are where / ± = e ± or µ ± . These decays do not peak under the signal but are potentially abundant. The expected number of candidates from each possible background decay that pass the signal selection is evaluated using simulation. The candidates are normalised to the number of B + → J/ψ K + decays found in data as where N X is the expected number of candidates from the X decay that fall into the B 0 s → e ± µ ∓ signal mass window; f q is the fragmentation fraction; B(X), B(B + → J/ψ K + ) and B(J/ψ → µ + µ − ) are respectively the branching fractions of the decay under study, B + → J/ψ K + and J/ψ → µ + µ − [28]; ε(X) is the efficiency for each considered decay to pass the B 0 s → e ± µ ∓ selection; and ε(B + → J/ψ K + ) is the efficiency for B + → J/ψ K + candidates to pass the respective selection.
The mass and BDT distributions of these background modes are evaluated using simulated samples, while the probabilities of misidentifying kaons, pions and protons as muons or electrons are determined from D * + → D 0 π + with D 0 → K − π + and Λ → pπ − decays selected from data. The expected total number of B → h + h − candidates is 0.11 ± 0.02 in the full BDT range, which is negligible. This yield estimation is cross-checked using data. A sample of B → h + h − decays is selected by applying only a partial B 0 (s) → e ± µ ∓ selection: only the signal electron PID requirements are applied while the second particle is required to be identified as a pion. The application of these criteria still leaves a sizeable peak to be fit in data. The yield of decays identified as B 0 (s) → e ± π ∓ is then modified to take into account the probability of a pion to be misidentified as a muon. After this correction the expected yield is compatible with the yield obtained using the simulation.
The expected yields of most of the other backgrounds are also found to be negligible. The only backgrounds which are relevant are B 0 → π − µ + ν and Λ 0 b → p − ν for which 55 ± 3 and 82 ± 39 candidates, respectively, are expected in the full BDT range. The contributions from these two decays are included in the fit model.

Mass calibration
The invariant-mass distribution of B 0 (s) → e ± µ ∓ candidates is modelled by a modified Crystal Ball function [29] with two tails on opposite sides defined by two parameters each. The signal shape parameters are obtained from simulation, with data-driven scale factors applied to the core resolution to correct for possible data-simulation discrepancies. For this purpose, since there is no appropriate control channel with an electron and a muon in the final state, J/ψ → e + e − and J/ψ → µ + µ − decays are analysed comparing the mass resolution in data and simulation. The results are then combined to reproduce the effect on an e ± µ ∓ final state. Corrections to the widths of the mass are of the order of 10%. Since bremsstrahlung can significantly alter the mass shape by enhancing the tails, the fit model for B 0 (s) → e ± µ ∓ candidates is obtained separately for the two bremsstrahlung categories (see figure 3). The mass shape parameters are found to be independent of the particular BDT bin chosen and a single model for each bremsstrahlung category is therefore used.

Results
The data sample is split into two bremsstrahlung categories, which are further divided into seven subsets each depending on the BDT response covering the range from 0.  Table 2. Expected (assuming no signal) and observed upper limits for B(B 0 s → e ± µ ∓ ) and B(B 0 → e ± µ ∓ ) at 95% (90%) CL. The upper limit on the B(B 0 s → e ± µ ∓ ) is evaluated under the assumption of pure heavy eigenstate contribution on the decay amplitude. lower than 0.25, which is mostly populated by combinatorial background, is excluded from the fit. The B 0 → e ± µ ∓ and B 0 s → e ± µ ∓ yields are obtained from a single unbinned extended maximum likelihood fit performed simultaneously to the m e ± µ ∓ distributions in each subset. The B 0 (s) → e ± µ ∓ fractional yields and the mass shape parameters in each category are Gaussian-constrained according to their expected values and uncertainties. The combinatorial background is modelled with an exponential function with independent yield and shape parameters in each subset. The exclusive backgrounds are included as separate components in the fit. Their mass shapes are modelled using nonparametric functions determined from simulation for each bremsstrahlung category. The overall yields and fractions of these backgrounds are Gaussian-constrained to their expected values. The result of this fit is shown in figure 4.
s → e ± µ ∓ decays is observed and upper limits on the branching fractions are set using the CL s method [30]. The ratio between the likelihoods in two hypotheses, signal plus background and background only, is used as the test statistic. The likelihoods are computed with nuisance parameters fixed to their nominal values. Pseudoexperiments, in which the nuisance parameters are varied according to their statistical and systematic uncertainties, are used for the evaluation of the test statistic. The resulting CL s scans are shown in figure 5 and upper limits at 95% and 90% confidence level are reported in table 2.
Several systematic uncertainties can affect the evaluation of the limit on the B 0 s → e ± µ ∓ and B 0 → e ± µ ∓ branching fractions through the normalisation formula in eq. (5.1) and the fit model used to evaluate the signal yields. The systematic uncertainties are taken into account for the limit computation by constraining the respective nuisance parameters in the likelihood fit with a Gaussian distribution having the central value of the parameter as the mean and its uncertainty as the width. The nuisance parameters for the B 0 (s) → e ± µ ∓ yields are related to the calibration of the BDT response, the parameters of the signal shape, the estimated yields of the B 0 → π − µ + ν and Λ 0 b → p − ν backgrounds and the fractional yield per bremsstrahlung category. For the limit on the B 0 (s) → e ± µ ∓ branching fractions, the nuisance parameters are in addition related to the signal efficiency, whose uncertainty is dominated by the systematic uncertainty on the trigger efficiencies, and the uncertainties on the efficiencies, branching fractions and yields of the normalisation channels. For the B 0 s → e ± µ ∓ branching fraction estimation, eq. (5.1) also includes the hadronisation fraction f s /f d , which dominates the systematic uncertainty for the normalisation. The overall impact on the limits is evaluated to be below 5%.  Figure 5. Results of the CL s scan used to obtain the limit on (left) B(B 0 → e ± µ ∓ ) and (right) B(B 0 s → e ± µ ∓ ). The background-only expectation is shown by the dashed line and the 1σ and 2σ bands are shown as dark (green) and light (yellow) bands respectively. The observed limit is shown as the solid black line.
The two B 0 s mass eigenstates are characterised by a large lifetime difference. Depending on their contribution to the decay amplitude, the selection efficiency and the BDT shape can be affected. Given the negligible difference in lifetime for the B 0 system, this effect is not taken into account for the B 0 → e ± µ ∓ limit evaluation. Two extreme cases can be distinguished: when only the heavy or the light eigenstate contributes to the total decay amplitude. For example, if the only contribution to the LFV B 0 s → e ± µ ∓ decay is due to neutrino oscillations, it is expected that the amplitude is dominated by the heavy eigenstate as for the B 0 s → µ + µ − decay [24]. As the contribution to the total amplitude from the heavy and light eigenstate can have an effect on the acceptance, the limit on B(B 0 s → e ± µ ∓ ) is evaluated in the two extreme cases. The one reported in table 2 and obtained from the CLs scan in figure 5, is evaluated assuming only a contribution from the heavy eigenstate. For the light eigenstate case the limit is found to be B(B 0 s → e ± µ ∓ ) < 7.2 (6.0) × 10 −9 at 95% (90%) CL.

Summary
In summary, a search for the LFV decays B 0 s → e ± µ ∓ and B 0 → e ± µ ∓ is performed using pp collision data collected at centre-of-mass energies of 7 and 8 TeV, corresponding to a total integrated luminosity of 3 fb −1 . No excesses are observed for these two modes and upper limits on the branching fractions are set to B(B 0 s → e ± µ ∓ ) < 6.3 (5.4) × 10 −9 and B(B 0 → e ± µ ∓ ) < 1.3 (1.0) × 10 −9 at 95% (90%) CL, where only a contribution from the heavy eigenstate is assumed for the B 0 s meson. If the B 0 s amplitude is completely dominated by the light eighenstate, the upper limit on the branching fraction becomes B(B 0 s → e ± µ ∓ ) < 7.2 (6.0) × 10 −9 at 95% (90%) CL. These results represent the best upper limits to date and are a factor 2 to 3 better than the previous results from LHCb [11].