Search for the lepton-flavour violating decays $B^0 \to K^{*0} \mu^\pm e^\mp$ and $B_s^0 \to \phi \mu^\pm e^\mp$

A search for the lepton-flavour violating decays $B^0 \to K^{*0} \mu^\pm e^\mp$ and $B_s^0 \to \phi \mu^\pm e^\mp$ is presented, using proton-proton collision data collected by the LHCb detector at the LHC, corresponding to an integrated luminosity of $9\,\text{fb}^{-1}$. No significant signals are observed and upper limits of \begin{align} {\cal B}( B^0 \to K^{*0} \mu^+ e^- )&<\phantom{1}5.7\times 10^{-9}~(6.9\times 10^{-9}),\newline {\cal B}( B^0 \to K^{*0} \mu^- e^+ )&<\phantom{1}6.8\times 10^{-9}~(7.9\times 10^{-9}),\newline {\cal B}( B^0 \to K^{*0} \mu^\pm e^\mp )&<10.1\times 10^{-9}~(11.7\times 10^{-9}),\newline {\cal B}( B_s^0 \to \phi \mu^\pm e^\mp )&<16.0\times 10^{-9}~(19.8\times 10^{-9}) \end{align} are set at $90\%~(95\%)$ confidence level. These results constitute the world's most stringent limits to date, with the limit on the decay $B_s^0 \to \phi \mu^\pm e^\mp$ the first being set. In addition, limits are reported for scalar and left-handed lepton-flavour violating New Physics scenarios.

1 Introduction three stations of silicon-strip detectors and straw drift tubes [34,35] placed downstream of the magnet. The tracking system provides a measurement of the momentum, p, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV/c. The minimum distance of a track to a primary pp collision vertex (PV), the impact parameter (IP), is measured with a resolution of (15 + 29/p T ) µm, where p T is the component of the momentum transverse to the beam, in GeV/c. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors [36]. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers [37,38].
The online event selection is performed by a trigger system [39]. Signal candidates first need to pass the hardware trigger (L0), which requires the final-state muon to have sizeable p T . In the subsequent software trigger, a full event reconstruction is performed. The software trigger requires a two-, three-or four-track secondary vertex with a significant displacement from any primary pp interaction vertex. At least one charged particle must have a transverse momentum p T > 1.6 GeV/c (p T > 1.0 GeV/c if the particle is identified as muon) and be inconsistent with originating from a PV. A multivariate algorithm [40,41] is used for the identification of secondary vertices consistent with the decay of a b hadron.
Simulated samples are required to determine the reconstruction and selection efficiencies, and to estimate contributions from residual backgrounds. In the simulation, pp collisions are generated using Pythia [42] with a specific LHCb configuration [43]. Decays of unstable particles are described by EvtGen [44], in which final-state radiation is generated using Photos [45]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [46] as described in Ref. [47]. The underlying pp interaction is reused multiple times, with an independently generated signal decay for each [48]. Residual mismodelling of the particle identification and tracking performance, the p T spectra of B 0 and B 0 s mesons, track multiplicity, and the efficiency of the L0 trigger are calibrated using high-yield control samples from data.

Selection of signal candidates
Candidates for B 0 → K * 0 µ ± e ∓ and B 0 s → ϕµ ± e ∓ signal decays are reconstructed in the K + π − µ ± e ∓ and K + K − µ ± e ∓ final states, respectively. Stringent particle identification criteria are applied to the final-state hadrons and leptons, using information from the Cherenkov detectors, the muon chambers, and the calorimeter system. The final-state tracks are required to have significant χ 2 IP with respect to any PV in the event, where χ 2 IP is defined as the difference in the vertex-fit χ 2 of a given PV reconstructed with and without the track being considered. The four final-state tracks are fit to a secondary vertex (SV), which needs to have good fit quality and be significantly displaced from any PV in the event. The invariant mass of the K + π − (K + K − ) system is required to be within 100 MeV/c 2 (12 MeV/c 2 ) of the known K * 0 (ϕ) mass [49]. Furthermore, the reconstructed B 0 (s) mass of signal candidates is required to be in the range [4300, 6700] MeV/c 2 . Dedicated vetoes are applied to reject backgrounds originating from misidentified b-hadron decays, referred to as peaking backgrounds. The decays B 0 → J/ψ(→ ℓ + ℓ − )K * 0 and B 0 → ψ(2S)(→ ℓ + ℓ − )K * 0 (B 0 s → J/ψ(→ ℓ + ℓ − )ϕ and B 0 s → ψ(2S)(→ ℓ + ℓ − )ϕ) can be a source of background if one of the leptons is misidentified as the different lepton species. Candidates for which the reconstructed B 0 (B 0 s ) mass is in the range of [5200, 5350] MeV/c 2 ([5300, 5450] MeV/c 2 ) are rejected, where the reconstructed B 0 (B 0 s ) mass is determined in a fit forcing the dilepton system to have the known mass of the J/ψ or ψ(2S) meson [49]. The same tree-level decays to charmonia can also mimic the signal decay if a lepton is misidentified as a hadron and vice-versa. To suppress these contributions, criteria on the invariant mass of the system comprised of a hadron and the lepton of opposite charge are applied, where the lepton is assigned the hadron mass hypothesis. Candidates are rejected if the resulting invariant mass is in the range [800, 1050] MeV/c 2 ([1000, 1040] MeV/c 2 ) and therefore consistent with the decay of a K * 0 (ϕ) meson. Semileptonic b → c(→ sℓ ′+ ν ℓ ′ )ℓ −ν ℓ cascade decays can result in the same charged final-state tracks in the detector as signal decays. These decays are rejected by a stringent requirement on m(K + π − ℓ ± ) > 2 GeV/c 2 (m(K + K − ℓ ± ) > 2 GeV/c 2 ), where the invariant mass is calculated using the lepton momentum without the addition of potential bremsstrahlung photons. This removes (s) resonances can escape this veto due to their higher masses; they are therefore modelled in the fit as discussed in Sec. 5. Background from several further sources are studied and found to be suppressed to negligible levels by the stringent particle identification criteria; these include rare b → sℓ + ℓ − decays like B 0 → K * 0 ℓ + ℓ − , and B 0 s → ϕℓ + ℓ − (with lepton and potentially also hadron misidentification), as well as fully hadronic B 0 → K * 0 π + π − and B 0 s → ϕπ + π − decays (with misidentification of the pions as leptons).
Background from combinations of random tracks (combinatorial background) is reduced using a boosted decision tree (BDT) [50] classifier trained with the AdaBoost algorithm [51] as implemented in the TMVA software package [52]. The BDT classifier is trained separately for the two signal decays using calibrated simulation as signal. Data from the B 0 (s) upper mass sideband region [5600, 6700] MeV/c 2 are used as a proxy for the background. The classifier is trained using a k-folding approach and its performance is verified using standard cross-validation techniques [53]. The classifier uses the (transverse) momentum of the B 0 (s) candidate, its vertex fit quality and flight distance significance, the angle between the B 0 (s) momentum and the vector connecting the associated PV and the B 0 (s) decay vertex, and the χ 2 IP of the B 0 (s) candidate and the final-state particles. The selection criterion on the classifier output is chosen according to the Punzi figure of merit ε sig /(3/2 + √ N comb ) [54]. Here, ε sig denotes the signal efficiency and N comb the expected combinatorial background yield, which is extrapolated from the upper mass sideband using the reconstructed same-sign lepton samples K + π − µ ± e ± and K + K − µ ± e ± . Relative to the previously described selection criteria, the BDT requirement results in a signal efficiency of 55-80%, depending on the signal mode, and a rejection for combinatorial background of larger than 99%.
The normalisation modes, B 0 → J/ψ(→ µ + µ − )K * 0 and B 0 s → J/ψ(→ µ + µ − )ϕ, are reconstructed and selected in a way that is as similar as possible to the corresponding signal decays. In contrast to the signal modes, the muon identification criteria are applied to both final-state leptons. In addition, the invariant mass of the dimuon system is required to be within ±60 MeV/c 2 of the known J/ψ mass [49], and the dedicated vetoes against b-hadron decays to charmonia are removed. For the B 0 → J/ψK * 0 normalisation mode, an additional veto on the invariant mass of the K + µ + µ − system rejects candidates with m(K + µ + µ − ) ∈ [5200, 5400] MeV/c 2 . The decays B 0 → J/ψK * 0 and B 0 s → J/ψϕ, with both J/ψ → e + e − and J/ψ → µ + µ − , are used as control modes to validate simulation and to model the signal mass resolution as discussed in Sec. 5. The control decays B 0 → J/ψ(→ e + e − )K * 0 and B 0 s → J/ψ(e + e − )ϕ are selected in the m(e + e − ) mass region [2400, 3300] MeV/c 2 .
The efficiency ratio between normalisation and signal decays, ε norm /ε sig , is determined using calibrated simulation and found to be around 2.8 (2.6) for the decays B 0 → J/ψK * 0 and B 0 → K * 0 µ ± e ∓ (B 0 s → J/ψϕ and B 0 s → ϕµ ± e ∓ ). The efficiencies for the normalisation modes are significantly higher than for the signal decays as they exhibit higher trigger efficiencies, more efficient particle identification, and background vetoes with higher efficiencies.
The yields for the normalisation modes are determined using an unbinned extended maximum-likelihood fit to the reconstructed B 0 (s) mass distribution, in which the invariant mass of the dimuon system is constrained to the known J/ψ mass. The fit range is limited to [5150, 5900] MeV/c 2 in the B 0 mode ([5250, 5900] MeV/c 2 in the B 0 s mode) to avoid partially reconstructed b-hadron decays at low invariant masses. The signal distribution is modelled using a double-sided Hypatia function [56]. The shape parameters of the Hypatia function are determined using simulation, except for a resolution and mass shift parameter which are left to float in the fit to data to allow for mismodelling. In the fit of the decay B 0 → J/ψK * 0 , a component for the CKM-suppressed mode B 0 s → J/ψK * 0 is also included, which is modelled identically to the normalisation mode, but shifted by the known B 0 s − B 0 mass difference [49]. The B 0 s → J/ψK * 0 component is Gaussian constrained to its expectation using simulation. Similarly, in the B 0 s → J/ψϕ normalisation mode fit, a small component of the decay B 0 → J/ψK + K − must be accounted for, where the invariant mass of the non-resonant K + K − system overlaps with the ϕ selection. Again, the normalisation channel model is used to describe this component, including a mass shift by the known B 0 − B 0 s mass difference. The background yield is floated in the fit. Residual backgrounds from misidentified b-hadron decays to J/ψ final states are modelled using kernel density estimates obtained from calibrated simulation. For the decay B 0 → J/ψK * 0 , misidentified backgrounds from Λ 0 b → J/ψpK − , B 0 s → J/ψϕ, and B 0 → J/ψK * 0 (with K ↔ π misidentification), are included in the fit. For the decay B 0 s → J/ψϕ, misidentified backgrounds from Λ 0 b → J/ψpK − and B 0 → J/ψK * 0 decays are included. The yields of all backgrounds from misidentification are constrained to their expectations using 5200 5300 5400 5500 5600 5700 5800 5900 5300 5400 5500 5600 5700 5800 5900 Gaussian functions. The last component of the fit for the normalisation yields is the combinatorial background, which is modelled using an exponential function, with its slope and normalisation allowed to vary freely. The m(J/ψK + π − ) and m(J/ψK + K − ) distributions, overlaid with the fit results combined over the different data taking periods, are shown in Fig. 1. The obtained normalisation channel yields are given in Tab. 1. The resulting normalisation constants α for the different data taking periods are given in Tab. 2. Table 2: Normalisation constant α [10 −9 ] with associated statistical and systematic uncertainties, added in quadrature, for different periods of data taking. The total uncertainty is dominated by systematic effects, which are discussed in Sec. 6. The year-to-year B 0 /B 0 s ratio variation is due to different BDT criteria against combinatorial background, tuned individually for each data taking period and mode.

Signal fit
The signal decays B 0 → K * 0 µ ± e ∓ and B 0 s → ϕµ ± e ∓ are modelled using the sum of two Crystal Ball functions [57], with power-law tails on either sides. The shape parameters are determined on simulation and fixed in the fit to data. Corrections to the mass resolution are determined using the control decays B 0 → J/ψK * 0 and B 0 s → J/ψϕ. Information from both J/ψ → µ + µ − and J/ψ → e + e − final states is combined to determine these correction factors, as no appropriate control mode with a µ ± e ∓ final state exists. Depending on the data taking period, the correction factors scale the core Gaussian widths by 1.04-1.08 with uncertainties at the percent level. In the nominal fit, the uncertainties on the scale factors are included as Gaussian constraints.
Semileptonic cascade decays involving higher excited D − (s) resonances are modelled in the fit as they can pass the selection requirement m(K + π − ℓ ± ) > 2 GeV/c 2 (m(K + K − ℓ ± ) > 2 GeV/c 2 ) due to their high masses. Their shapes are modelled using kernel density estimates of fully simulated as these decays are one of the dominant contributions of the remaining background from semileptonic cascades. Alternative models that include potential contributions from D 1 (2420) and D * 0 (2300) states have also been considered. For the decay B 0 → K * 0 µ ± e ∓ , an additional background contribution arises from B + → D 0 (→ K + ℓ −ν ℓ )ℓ ′+ ν ℓ ′ decays, which are combined with a random π − from the event and thus can pass the veto against semileptonic cascade decays. This background is modelled using a kernel density estimate from simulated samples. Its yield is Gaussian constrained to the expectation from simulation. The combinatorial background is modelled with a single exponential function. Due to the low residual combinatorial background yield in the signal data samples, the exponential slope is constrained from a fit to same-sign lepton data samples with the reconstructed final states K + π − µ ± e ± and K + K − µ ± e ± , using a relaxed BDT cut. A systematic uncertainty is assigned to this choice, as discussed in Sec. 6. The combinatorial background yields are allowed to float independently in each data taking periods.
The signal branching fractions are determined in a simultaneous fit of the data taking periods 2011-2012, 2015-2016, and 2017-2018 using Eq. 1. In the fit, the signal branching fraction and the branching fraction of the semileptonic cascade decays involving higher excited D − (s) resonances, are shared between data taking periods. Figure 2 shows the reconstructed B 0 (s) mass distributions for B 0 → K * 0 µ ± e ∓ and B 0 s → ϕµ ± e ∓ candidates, combined over data taking periods, and overlaid with the fit results for the full and the background-only model. For illustration, the signal shape, scaled to a branching fraction of 5 × 10 −8 for the B 0 → K * 0 µ ± e ∓ decays and 1 × 10 −7 for B 0 s → ϕµ ± e ∓ , is drawn in addition. The fits for the same-sign (B 0 → K * 0 (→ K + π − )µ + e − ) and opposite-sign  s → ϕµ ± e ∓ candidates. The data are overlaid with the fit results. For illustration, the signal shape, scaled to a branching fraction of 5 × 10 −8 for the B 0 → K * 0 µ ± e ∓ decays and 1 × 10 −7 for B 0 s → ϕµ ± e ∓ , is drawn as red dashed line.
background. They are included in the fit through Gaussian constraints. For the signal mass resolutions, the correction factors for the core Gaussian resolution are allowed to vary within their uncertainties. For the exponential slope, the difference between the fit to same-sign lepton data with a reduced BDT cut and with the nominal BDT requirement is used as an estimate for the uncertainty on this parameter. This is added in quadrature to the uncertainty of the parameter from the fit with the reduced BDT requirement, and included as Gaussian constraint in the fit. A summary of the systematic uncertainties affecting the normalisation constant α is given in Tab. 3. The dominant source of systematic uncertainty originates from the uncertainty on the branching fraction of the normalisation channel. For the signal decay B 0 s → ϕµ ± e ∓ , a systematic uncertainty of similar size arises from the significant lifetime difference of the B 0 s mass eigenstates. The effective lifetime of the final state is a priori unknown and can affect the signal efficiency which depends on the B 0 s decay time. The difference between the maximum and minimum lifetimes, given by those of the light and the heavy mass eigenstate, is used to determine a conservative systematic uncertainty. Further systematic uncertainties arise from the limited size of the simulation samples and the limited precision of the calibration procedures applied to simulation. These include the weighting of the B production kinematics and event multiplicity, calibration of the particle

Results
No significant excess of B 0 → K * 0 µ ± e ∓ or B 0 s → ϕµ ± e ∓ decays is observed and limits are set using the CL s method [60]. A one-sided test statistic is used [61], as implemented in the GammaCombo framework [62,63]. The test statistic is evaluated using pseudoexperiments, which are generated using the best fit values for the nuisance parameters, and where the central values of the Gaussian constraints are varied according to their uncertainties.
The resulting CL s scans are shown in Fig. 3, and upper limits at 90% and 95% CL are reported in Tab. 4; the limits given for B(B 0 → K * 0 µ ± e ∓ ) are determined using the combined B 0 → K * 0 µ + e − and B 0 → K * 0 µ − e + data samples.
The default limits assume a uniform phase space model for the signal decays. However, NP models can result in very different decay kinematics and differential decay rates [64][65][66]. This is illustrated in Fig. 4 of App. A for two distinct NP scenarios: a scalar model with  Table 5: Exclusion limits [10 −9 ] on the B 0 (s) branching fractions for a scalar (C µe s ̸ = 0) and left-handed (C µe 9 = −C µe 10 ̸ = 0) NP model at 90% (95%) CL.

Appendices A New Physics models
New Physics scenarios can result in very different distributions in the three decay angles cos θ ℓ , cos θ K and ϕ, defined as in Ref. [68], and the four-momentum transfer q 2 = m 2 (µ ± e ∓ ) [64][65][66]. This is illustrated in Fig. 4.    s → ϕµ ± e ∓ signal decays depending on q 2 and the three decay angles cos θ ℓ , cos θ K , and ϕ. The selection efficiency drops in the q 2 region around the J/ψ and ψ(2S) masses as a result of the veto against misidentified backgrounds.