Search for the lepton-flavour violating decays $B^0 \to K^{*0} \tau^\pm \mu^\mp$

A first search for the lepton-flavour violating decays $B^0 \to K^{*0} \tau^\pm \mu^\mp$ is presented. The analysis is performed using a sample of proton-proton collision data, collected with the LHCb detector at centre-of-mass energies of 7, 8 and 13 TeV between 2011 and 2018, corresponding to an integrated luminosity of 9 fb$^{-1}$. No significant signal is observed, and upper limits on the branching fractions are determined to be ${\cal{B}}(B^0 \to K^{*0} \tau^+ \mu^-)<1.0$ $(1.2)\times 10^{-5} $ and ${\cal{B}}(B^0 \to K^{*0} \tau^- \mu^+)<8.2$ $(9.8) \times 10^{-6}$ at the 90% (95%) confidence level.

In this article, the first ever search for the charged lepton-flavour violating decay B 0 → K * 0 τ ± µ ∓ , not investigated by any prior experiment, is presented.The K * 0 meson is reconstructed through its decay into a K + and a π − .The final states B 0 → K * 0 τ − µ + with the charged kaon and tau having opposite charges and B 0 → K * 0 τ + µ − with the charged kaon and tau having the same charge (charged conjugate processes are implied throughout) are treated independently.From a theoretical point of view, these channels could be affected differently by model extensions beyond the SM [29], and from the experimental point of view they are affected by different background contributions.The tau lepton is reconstructed through the decays τ − → π − π + π − ν τ or τ − → π − π + π − π 0 ν τ , representing approximately 14% of all tau decays.
The analysis is performed on a data set of proton-proton (pp) collisions collected with the LHCb detector at centre-of-mass energies of 7 and 8 TeV in 2011-2012 (Run 1) and 13 TeV in 2015-2018 (Run 2), corresponding to an integrated luminosity of 9 fb −1 .The decay B 0 → D − D + s followed by D − → K + π − π − and D + s → K + K − π + , with a topology similar to the signal decay and a precisely known branching fraction, is used both as a normalisation channel and a control channel to test the reliability of the simulation and evaluate some systematic uncertainties.

LHCb detector, trigger and simulation
The LHCb detector [30,31] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b or c quarks.The detector includes a high-precision charged-particle reconstruction (tracking) system consisting of a silicon-strip vertex detector surrounding the pp interaction region [32], a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes [33,34] placed downstream of the magnet.The tracking system provides a measurement of the momentum, p, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV/c.The minimum distance of a track to a primary vertex (PV), the impact parameter (IP), is measured with a resolution of (15 + 29/p T ) µm, where p T is the component of the momentum transverse to the beam, in GeV/c.Different types of charged hadrons are distinguished from one another using information from two ring-imaging Cherenkov (RICH) detectors [35].Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic and a hadronic calorimeter.Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers [36].
The online event selection is performed by a trigger [37], consisting of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction.In the hardware stage, signal candidates are required to have at least one high-p T muon.For the normalisation channel, at least one hadron with high transverse energy is required.The software trigger requires a two-, three-or four-track secondary vertex with a significant displacement from any PV.At least one charged particle must have significant transverse momentum and be inconsistent with originating from a PV.A multivariate algorithm [38,39] based on kinematic, geometric and lepton identification criteria is used for the identification of secondary vertices consistent with the decay of a b hadron.
Simulation is used to optimise the selection, determine the signal model for the fit and obtain the selection efficiencies.In the simulation, pp collisions are generated using Pythia 8 [40] with a specific LHCb configuration [41].The B 0 decay is assumed to proceed according to a uniform phase-space model.The tau decay is simulated using the Tauola decay library tuned with BaBar data [42].The decays of all other unstable particles are described by EvtGen [43], in which final-state radiation is generated using Photos [44].The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [45] as described in Ref. [46].

Event selection
The B 0 → K * 0 τ ± µ ∓ candidates are reconstructed by combining six good quality charged tracks, with p smaller than 110 GeV/c, p T larger than 250 MeV/c, η between 2 and 4.9.Two of the tracks of opposite charge, one compatible with a kaon hypothesis and the other with a pion hypothesis, are required to form a K * 0 candidate, with mass within the range 700 to 1100 MeV/c 2 and with a good quality vertex.A third track is identified as a muon.The other three remaining tracks, identified as pions, with momentum higher than 2 GeV/c each, should come from another vertex.These pions form the tau candidate, with charge opposite to that of the muon and with a reconstructed mass within 0.5 to 2.0 GeV/c 2 .The tau vertex must have a radial distance between 0.1 and 7 mm and a distance along the z-axis larger than 5 mm with respect to the best PV.
The K * 0 , the muon and the tau candidates should have a transverse momentum greater than 1 GeV/c each.They form a B 0 candidate, with a vertex of good quality and sufficiently displaced from any PV.Finally, the K * 0 τ mass should be lower than 5 GeV/c 2 and the reconstructed B 0 candidate mass be within 2 to 10 GeV/c 2 .A sample of data, with tau and muon of the same charge (called same-sign data in the following), is also selected as a proxy for backgrounds.
Neutral particles in the tau decays, namely the neutrino and possibly π 0 s, are not explicitly reconstructed.For this reason, the invariant mass of the six tracks, m K * τ µ , does not peak at the B 0 meson mass.The corrected mass m corr = p 2 ⊥ + m 2 K * τ µ + p ⊥ is used to recover part of the missing energy, where p ⊥ is the component of the missing momentum perpendicular to the direction of flight of the B meson [47].
Backgrounds can be divided into two categories: combinatorial background, arising from random combinations of tracks; and physics background due to b-hadron decays, which are partially reconstructed and/or reconstructed with particle mis-identification, called physics background.In particular, a large component of background involves D mesons with a decay time compatible with the tau and decaying into multiple charged tracks.In order to optimise the rejection of these backgrounds of a different nature, a multi-stage selection procedure is used.The procedure was developed without looking at candidates in the region where the signal is expected.This region, called the signal region, is defined to be 4.6 < m corr < 6.4 GeV/c 2 .Consequently, the region 6.4 < m corr < 18.0 GeV/c 2 is called the upper mass sideband, while the region 3.0 < m corr < 4.6 GeV/c 2 is called the lower mass sideband.The former is dominated by combinatorial background while the latter contains both combinatorial and physics background.
The selection is optimised independently for Run 1 and Run 2 data-taking periods, to account for possible different background levels due to different data-taking conditions.The Punzi figure of merit [48], defined as ε/(3/2 + √ Y b ), is used for identifying the optimal selection requirement, apart from the vetoes.Here ε is the efficiency of the signal selection, measured with simulation, and Y b is the background yield in the signal region, estimated differently depending on the selection step.
The first stage of the selection is based on a multivariate discriminant exploiting the differences between the topologies of the signal decays and the combinatorial background.A boosted decision tree (BDT) [49], using the AdaBoost algorithm from the TMVA package [50], combines information from: the χ 2 of the vertices of the B 0 , K * 0 and tau candidates; the χ 2 of the flight-distances of the B 0 and the tau candidates.The BDT is trained using simulated signal samples and the upper sideband of the same-sign data sample as a background proxy.A k-folding approach [51] is used to exploit the training samples without biasing the output of the classifiers.For the threshold optimisation, the background yield in the signal region is extrapolated from a fit with a decreasing exponential function to the candidate m corr distribution in the region 11 to 18 GeV/c 2 , dominated by combinatorial background.
The second stage is a multivariate selection dedicated to the rejection of charmed mesons mis-identified as tau leptons.The decay of a tau into three charged pions and a neutrino occurs mostly through the a ± 1 resonance, which in turn decays into a ρ 0 and a π ± meson.In order to exploit the kinematic properties of this decay chain, the minimum and maximum of the momenta of the pions from the tau candidate and the masses of pion pairs with zero combined charge are combined into a BDT.The BDT is trained using a k-folding approach, with B 0 → K * 0 τ ± µ ∓ simulation and inclusive bb simulated events reconstructed as B 0 → K * 0 τ ± µ ∓ as a proxy for the background.Particle identification requirements and K * 0 kinematic constraints are removed, so that particles from charmed mesons in the bb sample can form tau and K * 0 candidates.For the threshold optimisation, the background yield is extrapolated from a fit to the B 0 candidate mass distribution in the 3 to 18 GeV/c 2 mass range, excluding the signal region.The fit model used is the sum of a Crystal-Ball function [52], modelling the lower mass sideband, dominated by partially reconstructed events, and an exponential function, with a slope primarily determined by events in the upper mass sideband.The reliability of this background yield estimation is verified on the same-sign data, where the prediction can be compared to the observed number of candidates in the signal region.
The third selection step consists of requirements on the muon, kaon and pion particleidentification variables.A three-dimensional scan over the possible thresholds of the three selection criteria is performed, and a configuration maximising the Punzi figure of merit is chosen.For this optimisation, the background yields evaluated in the previous step are scaled by the rejection of the particle-identification requirements, as estimated from the same-sign sample within the signal region.
The search sensitivity is further increased by exploiting the discriminating power of the mass of K * 0 candidates, required to be within 856 to 946 MeV/c 2 .In addition, the mass of the tau candidate is calculated using the known mass of the pion [53] and considering only the pions and neutrino momentum components orthogonal to the B 0 direction of flight.The neutrino component is measured as the difference between the momentum orthogonal components of the K * 0 µ system and the three pions system.This mass is required to be within 789 to 1900 MeV/c 2 .Both mass regions have been optimised using the Punzi figure of merit, as in the previous step.A particle selected from a partially reconstructed decay is oftentimes surrounded by other particles which are not used to reconstruct the candidate.To exploit this feature, a Fisher discriminant [54] is built, using the k-folding procedure and based on the following variables evaluating the particle isolation: the logarithm of the smallest variation of the B 0 , the K * 0 or the tau vertex χ 2 when adding one or two tracks to them, independently; and the invariant mass of all the particles forming the vertex under the hypotheses of such additions; the number of tracks compatible with the aforementioned vertices (i.e. whose addition keeps the overall χ 2 below 9).The Fisher discriminant is chosen for its stability in the training procedure, despite the small size of the training samples available at this point of the selection.Signal simulation and, for the background, same-sign data that fall into the signal mass region, are used for the training and for the optimisation of the threshold.
A powerful variable to reject the physics backgrounds is the tau flight-distance χ 2 ; it is already used to reject the combinatorial background, which tends to have larger values than the signal.It also offers residual discriminating power against physics backgrounds from charmed mesons, for example, B 0 → D * − µ + ν decays for the B 0 → K * 0 τ − µ + decay channel.Given the different nature of the physics backgrounds, the thresholds applied to the B 0 → K * 0 τ − µ + and B 0 → K * 0 τ + µ − channels are optimised separately.
The remaining physics backgrounds are studied with events rejected by the first BDT and within the region 4.7 < m corr < 5.7 GeV/c 2 .Mass distributions from possible final state particle combinations are used to define vetoes that remove the observed resonant structures.
For the B 0 → K * 0 τ − µ + case, an important physics background comes from the decay B 0 → D * − µ + ν, with D * − → D 0 π − and D 0 → K + π − π + π − .Three of the four pions can be mis-reconstructed as a tau candidate, while the remaining pion and kaon are used to mis-reconstruct a K * 0 candidate.To reduce the backgrounds from D 0 mesons, events with m(K + τ − ) or m(K * 0 π − π + ) within 60 MeV/c 2 from the known D 0 mass are removed.This requirement additionally rejects most of the D * 0 contributions.However, considering the minimal impact on the signal efficiency, events with m(K * 0 τ − ) − m(K * 0 π − π + ) or m(K * 0 τ − ) − m(K + τ − ) between 135.5 and 155.5 MeV/c 2 are also rejected.Masses accounting for a possible mis-identification of the muon as a pion have been checked, but no significant contributions from charmed mesons have been observed.
After the selection procedure described above, there is never more than one B 0 candidate selected per event.

Normalisation channel
The normalisation channel B 0 → D − D + s , with D − → K + π − π − and D + s → K + K − π + , is reconstructed using six good-quality tracks, with 2 < p < 110 GeV/c, p T > 250 MeV/c, 2 < η < 4.9 and having associated deposits in the RICH detectors.Three of these tracks, one identified as kaon and the other two as pions, are used to form a D − candidate with a displaced vertex of good quality and an invariant mass lying within the range 1750 to 2080 MeV/c 2 .Analogously, one track identified as pion and other two tracks identified as kaons form the D + s candidate, with a displaced vertex of good quality and an invariant mass within the range 1938 to 1998 MeV/c 2 .Both the D − and D + s candidates are required to have p T > 1 GeV/c.
Combinatorial background is rejected by a BDT exploiting the same topological variables used in the selection of the signal channel, but with two flight-distance significances, one for each D meson.The BDT is trained using simulated B 0 → D − D + s decays and s candidates with a mass exceeding 5400 MeV/c 2 as a background proxy.A k-folding procedure is applied.Considering that a significant signal is expected for the normalisation mode, the selection optimisation is based on the figure of merit Y s / √ Y s + Y b , where Y s and Y b are the yields of the signal and background, respectively.These yields are determined from fits in the B 0 mass range 5150 to 5400 MeV/c 2 , using a Gaussian model to parameterise the signal component and a decreasing exponential function to describe the background.
The same particle identification requirements used for selecting the signal are applied on kaons and pions.For each charmed meson, the difference between its reconstructed mass and known mass [55] must be less than 20 MeV/c 2 .
Finally, the normalisation channel yields are measured from fits of the B 0 mass distribution for each year of data taking.The global fits for Run 1 and Run 2 data, leading to yields of 1155 ± 35 and 6516 ± 84, respectively, are shown in Fig. 1.

Determination of efficiencies
The efficiencies for the different steps of the selection chain are evaluated separately for each year of data taking.With the exception of the particle identification efficiency, they are estimated with simulation.The reliability of the simulation is checked using B 0 , D − and D + s mesons from the normalisation channel ) as proxies for two-and three-prong particle vertices, selected with the additional requirement that the B 0 candidate mass lies within ±50 MeV/c 2 from its nominal value.All the variables used in the multivariate selections are found to be well described by simulation.Residual discrepancies are assessed as a systematic uncertainty.
The particle identification variables are not perfectly described in simulation.Therefore, the particle identification efficiency is evaluated using data calibration samples.A pure sample of pions and kaons is obtained from the decay D * + → (D 0 → Kπ)π + while muons are selected from the decay J/ψ → µ + µ − , without relying on PID selection criteria [56].The efficiencies of the PID requirements can then be computed in intervals of the kinematic variables of the final-state-particles momentum and pseudorapidity [57].Intervals are chosen making a compromise between the stability of the efficiency within the interval and statistical uncertainty.The event occupancy, parameterised by the number of tracks per event, is also taken into account, relying on same-sign events in the case of signal and on the observed distribution for the normalisation channel.
The distribution of the selection efficiency as a function of the kinematic variables m 2 K * 0 µ ± and m 2 τ ∓ µ ± , normalised to unity, is shown in Fig. 2.These distributions can be used to re-cast the results for models having B 0 → K * 0 τ − µ + and B 0 → K * 0 τ + µ − decay kinematics different from the uniform distribution in the phase space used for the signal simulation in this analysis.

Strategy to fit the corrected mass distributions
An extended maximum likelihood fit to the m corr distributions of the selected events is performed independently for the B 0 → K * 0 τ − µ + and B 0 → K * 0 τ + µ − decay channels.For each channel the distribution is described by a function P tot which is the sum of three components, P tot = Y τ 3π P τ 3π + Y τ 3ππ 0 P τ 3ππ 0 + Y bkg P bkg .The function P τ 3π models the dominant signal contribution with τ − → π − π + π − ν τ and Y τ 3π indicates the corresponding yields.Analogously, P τ 3ππ 0 models the subdominant signal contribution with τ − → π − π + π − π 0 ν τ and Y τ 3ππ 0 represents the corresponding yields.Finally, P bkg models the background, with yields Y bkg .
The signal components P τ 3π and P τ 3ππ 0 are parameterised independently, using doublesided Crystal-Ball functions (DSCB), i.e. a crystal ball with an additional tail on the right side.The parameters are determined and fixed from fits to the simulated signal samples.The yields for the dominant signal contribution are expressed, for the B 0 → K * 0 τ − µ + channel and separately for Run 1 and Run 2, as Analogous expressions hold for the B 0 → K * 0 τ + µ − channel, and for the subdominant signal components.Here, ε represents the signal efficiency, while Y N and ε N indicate the yields and the efficiency for the normalisation channel.The ratio εY N /ε N is calculated for each year in the run, as indicated by the y index.The parameter of interest B(B 0 → K * 0 τ − µ + ) is in common among the yields of the dominant and subdominant signal contributions, as well as among runs of data taking, and is therefore fitted simultaneously.A fit bias on the measured signal branching ratio, evaluated on background-only pseudoexperiments, is at the level of 10 −7 and is subtracted.The other branching fractions are taken from Ref. [53].Apart from the signal branching fraction, all the parameters are constrained with Gaussian functions to account for the systematic uncertainties described in the next section.
The functional form used to parameterise the background shape is also a DSCB function.The background yields are free to vary in the fit, while the shape parameters are constrained to the value determined on a control sample obtained by loosening the combinatorial multivariate selection.Signal contamination in the control regions is estimated to be less than 5% of the total signal, assuming a branching fraction of 10 −5 .

Systematic uncertainties
For the determination of the limit, some branching fractions are used as external inputs.A systematic uncertainty is assigned imposing Gaussian constraints on their values, taken from the PDG [53].
The normalisation procedure involves ratios of signal and normalisation efficiencies, for which systematic uncertainties cancel.However, some efficiencies do not cancel in the ratio and are assessed as follows.
The uncertainty related to the limited size of the simulation samples used to determine the efficiencies is included as part of the statistical uncertainty.
The ratio between the signal and the normalisation tracking efficiencies is determined from simulation.Possible differences in data with respect to simulation will be of the same order in the numerator and denominator for five out of the six particles in the final state, and will cancel.Because the sixth hadron in the normalisation channel interacts differently with the material with respect to the muon in the signal channel, a 1.4% uncertainty is assigned to the tracking efficiency ratios [58].
The systematic uncertainty on the determination of the PID selection efficiency accounts for: the limited size of simulation and calibration samples; the choice of the p, η and number of tracks interval sizes; and the use of the sP lot technique [59] to extract the signal yield in the control samples.
Although the simulation of the variables used in the multivariate classifiers is validated with the normalisation channel, small residual discrepancies could affect the evaluation of the classifier efficiencies.To account for this, the B 0 → D − D + s candidates with a mass within 50 MeV/c 2 of the known mass [53] of the B 0 meson are selected from both data and simulation, providing a high purity sample on which the classifiers used to select the B 0 → K * 0 τ µ signal are applied.The variables of the classifiers relative to the tau are applied to the D − meson and analogously those of the K * 0 are applied to the D + s meson.For each classifier, the requirement giving an efficiency on the B 0 → D − D + s simulation sample equivalent to that obtained by the requirement on the simulated B 0 → K * 0 τ µ sample is determined.The relative systematic uncertainty is the absolute value of the difference in efficiencies of this requirement between B 0 → D − D + s simulation and data, divided by the B 0 → D − D + s simulation efficiency.Over the entire data taking period, changes were applied to the hardware muon trigger, in particular regarding the requirement on the transverse momentum of the muon.To account for these changes, the hardware muon trigger efficiency evaluated from simulation is multiplied by a correction factor derived as the ratio between the efficiency from data and simulation for the B ± → J/ψK ± control sample.These corrections are functions of the muon transverse momentum.The difference between the efficiency resulting from this weighting procedure and the efficiency from simulation is taken as a systematic uncertainty.The overall effect is less than 1%.
The hardware hadron trigger, used in the selection of the normalisation channel, is less well reproduced in simulation.A data-driven efficiency estimation is provided by Table 1: Relative increase (in %) of the observed upper limit when applying each of the systematic uncertainties.

Systematic effect
Limit increase [%] Normalisation yields 1 1 Background control region choice 18 26 Background analytical shape 1 1 the number of events triggered simultaneously by any muon in the event and one of the hadrons in the B 0 → D − D + s decay, divided by the number of events triggered by any muon in the event.The absolute difference with the efficiency estimated on the B 0 → D − D + s simulation is assumed as a systematic uncertainty.
The systematic uncertainty associated to the fit model used for measuring the normalisation channel yields (Y n ) is assessed via pseudoexperiments, generated using a kernel estimation of the B 0 → D − D + s mass distribution and subsequently fit using the nominal normalisation channel fit model.The relative systematic uncertainty is the ratio between the fitted mean of the yields with respect to the mean of the generated values, and is 1.8% for Run 1 and 1.7% for Run 2.
The parameters of the DSCB function describing the background are Gaussian constrained to the values determined by the fit of the background control region, obtained by loosening the combinatorial multivariate selection.Alternative control regions are defined by choosing different requirements on the combinatorial multivariate discriminant.The mean value and width of the DSCB are Gaussian constrained to half the maximum spread of the results obtained from fits on the alternative control regions, if this is larger than the parameter uncertainty on the default control region.The systematic uncertainty for the choice of the DSCB functional form is assessed by generating samples using a kernel estimation of background control samples, and fitting with the default DSCB function.A bias term is added to each of the signal branching ratios, and its value is constrained to a Gaussian function with a mean of zero and a width set to the average value of the bias measured on pseudoexperiments, which is at the level of 6 × 10 −7 .
The effect of the systematic uncertainties is summarised in Tab. 1, where the increase of the observed upper limit when applying each of the systematic uncertainties in addition to the others is shown.The dominant systematic effect comes from the uncertainty on the arbitrary choice of the control region.

Conclusion
The first search for the lepton-flavour violating decays B 0 → K * 0 τ ± µ ∓ has been performed on a sample of proton-proton collision data, collected with the LHCb detector at centre-of-mass energies of 7, 8 and 13 TeV between 2011 and 2018, corresponding to an integrated luminosity of 9 fb −1 .No significant signal is observed, and upper limits on the branching fractions are set: B(B 0 → K * 0 τ + µ − ) < 1.0 (1.2) × 10 −5 and B(B 0 → K * 0 τ − µ + ) < 8.2 (9.8) × 10 −6 at the 90% (95%) confidence level.These results  assume a uniform distribution of the signal events within the phase space accessible to the K * 0 , tau and muon.They are currently the most stringent upper limits on b → sτ µ transitions.

Figure 1 :
Figure 1: Mass distributions of selected D − D + s candidates in (left) Run 1 and Run 2 data, with the fit overlaid (blue solid line).The red dashed line is signal, while the green dotted line is background.

Figure 3 :
Figure3: Distribution of the corrected mass m corr of selected B 0 → K * 0 τ + µ − candidates in (left) Run 1 and (right) Run 2 data, with the simultaneous fit overlaid (blue solid line).The red dashed line is the dominant signal component with τ + → π + π − π + ντ ; the violet dotted line is the subdominant signal component with τ + → π + π − π + π 0 ντ , extremely small and consequently barely visible; and the green dash-dotted line is the background.

Figure 4 :
Figure4: Distribution of the corrected mass m corr of selected B 0 → K * 0 τ − µ + candidates in (left) Run 1 and (right) Run 2 data, with the simultaneous fit overlaid (blue solid line).The red dashed line is the dominant signal component with τ − → π − π + π − ν τ ; the violet dotted line is the subdominant signal component with τ − → π − π + π − π 0 ν τ , extremely small and consequently barely visible; and the green dash-dotted line is the background.

Figure 5 :
Figure 5: The expected and observed p-values derived with the CL s method as a function of the (left) B 0 → K * 0 τ + µ − and (right) B 0 → K * 0 τ − µ + branching fraction.The red line corresponds to the 95% CL.