Search for lepton-flavor-violating τ decays into a lepton and a vector meson using the full Belle data sample

Charged-lepton-flavor violation is predicted in several new physics scenarios. We update the analysis of τ lepton decays into a light charged lepton (ℓ = e± or μ±) and a vector meson (V⁰ = ρ⁰, φ, ω, K*⁰, or K̄*⁰) using 980 fb⁻¹ of data collected with the Belle detector at the KEKB collider. No significant excess of such signal events is observed, and thus 90% credibility level upper limits are set on the τ → ℓV⁰ branching fractions in the range of (1.7–4.2) × 10⁻⁸. These limits are improved by 30% on average from the previous results.

counters, time-of-flight scintillation counters, and an electromagnetic calorimeter composed of 8736 CsI(Tl) crystals (ECL). These devices are located inside a superconducting solenoid coil that provides a 1.5 T magnetic field. An iron flux return located outside of the coil is instrumented to detect K 0 L mesons and identify muons. The Belle detector is described in detail elsewhere [32,33].
The e⁺e⁻ collision events in the Belle detector are simulated by the Monte Carlo (MC) method. Signal MC events of τ → ℓV⁰ are generated by a dedicated MC with KKMC and TAUOLA [34], where τ⁺τ⁻ pairs are initially produced and one of the τ's decays into ℓV⁰ while the other decays generically. The numbers of generated signal MC events are 1.1 × 10⁶ at the Υ(4S) resonance, 0.4 × 10⁶ at the Υ(5S), 0.1 × 10⁶ at each of the Υ(1–3S), and 0.1 × 10⁶ at an energy 60 MeV below the Υ(4S). We assume a uniform CLFV decay angle in the τ rest frame. No specific NP model is assumed in the CLFV decay process, and the spin direction of V⁰ is set randomly and independently of the spin of the mother τ. For background MC simulations, e⁺e⁻ → qq̄ (q = u, d, s, c), e⁺e⁻ → τ⁺τ⁻, Bhabha, and two-photon processes are generated by EvtGen [35], KKMC [34], BHLUMI [36], and AAFH [37], respectively. The detector response is simulated by GEANT3 [38].

Reconstruction and event selection
A signal τ is reconstructed from a lepton and a neutral vector meson. We separate the event into two hemispheres in the center-of-mass (c.m.) frame by a plane perpendicular to the thrust vector (n_T) [39,40]. The thrust vector is obtained by maximizing the thrust T = Σ_i |p_i^c.m. · n_T| / Σ_i |p_i^c.m.|, where i runs over all tracks and photons, and p_i^c.m. is the momentum in the c.m. frame. In the hemisphere that contains a τ CLFV decay (called the signal side, with the τ denoted τ_sig), V⁰ is reconstructed as follows: ρ⁰ from π⁺π⁻ within the reconstructed mass window of 0.445–1.08 GeV/c², φ from K⁺K⁻ within 1.00–1.04 GeV/c², ω from π⁺π⁻π⁰ within 0.7–0.9 GeV/c², K*⁰ from K⁺π⁻ within 0.7–1.1 GeV/c², and K̄*⁰ from K⁻π⁺ within 0.7–1.1 GeV/c². In the other hemisphere (called the tag side), the other τ (τ_tag) is reconstructed from ℓ±νν, π±ν, π±π⁰ν, π±π⁰π⁰ν, or π±π∓π±ν. This τ_tag information enables the suppression of background events that have no neutrinos in the tag side.
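The thrust definition and the hemisphere split can be illustrated with a minimal sketch. It assumes momenta are given as (px, py, pz) tuples in the c.m. frame and finds the axis by a coarse grid scan, which is a stand-in for whatever maximization the analysis software actually uses; the function names are hypothetical.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def thrust_value(momenta, axis):
    """T = sum_i |p_i . n| / sum_i |p_i| for a given unit axis n."""
    numerator = sum(abs(dot(p, axis)) for p in momenta)
    denominator = sum(math.sqrt(dot(p, p)) for p in momenta)
    return numerator / denominator

def scan_thrust_axis(momenta, n_theta=90, n_phi=180):
    """Coarse grid scan for the axis maximizing T (illustrative, not exact)."""
    best_t, best_axis = -1.0, (0.0, 0.0, 1.0)
    for i in range(n_theta + 1):
        # The upper hemisphere is enough because T(n) = T(-n).
        theta = 0.5 * math.pi * i / n_theta
        for j in range(n_phi):
            phi = 2.0 * math.pi * j / n_phi
            axis = (math.sin(theta) * math.cos(phi),
                    math.sin(theta) * math.sin(phi),
                    math.cos(theta))
            t = thrust_value(momenta, axis)
            if t > best_t:
                best_t, best_axis = t, axis
    return best_t, best_axis

def split_hemispheres(momenta, axis):
    """Split particles by the plane perpendicular to the thrust axis."""
    same_side = [p for p in momenta if dot(p, axis) > 0.0]
    opposite_side = [p for p in momenta if dot(p, axis) <= 0.0]
    return same_side, opposite_side
```

Which of the two hemispheres is the signal side is decided afterwards by where the ℓV⁰ candidate is found; the sketch only performs the geometric split.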
The signal τ → ℓV⁰ events have a unique kinematical feature: the ℓV⁰ invariant mass (M_ℓV⁰) is close to the τ mass, and the difference of the ℓV⁰ energy from the beam energy in the c.m. frame (ΔE) is close to zero. The signal events within 1.65 GeV/c² < M_ℓV⁰ < 1.90 GeV/c² and |ΔE| < 0.5 GeV are reconstructed in this paper. We follow a blind analysis approach in this search by not looking at the signal candidates in the data set until the event selection and background estimation are finalized. The blind region is 1.75 GeV/c² ≤ M_ℓV⁰ < 1.81 GeV/c² and |ΔE| < 0.08 GeV for the μρ⁰, μφ, and μK*⁰ (μK̄*⁰) modes, and 1.74 GeV/c² ≤ M_ℓV⁰ < 1.82 GeV/c² and |ΔE| < 0.1 GeV for the other modes.
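The analysis window and the mode-dependent blind region quoted above can be written as simple predicates. This is a sketch only: the mode labels and function names are hypothetical, while the numerical boundaries are taken from the text.

```python
# Modes with the narrower blind region (labels are illustrative).
NARROW_BLIND_MODES = {"mu rho0", "mu phi", "mu K*0", "mu anti-K*0"}

def delta_e(candidate_energy_cm, beam_energy_cm):
    """Delta E: candidate energy minus beam energy, both in the c.m. frame (GeV)."""
    return candidate_energy_cm - beam_energy_cm

def in_analysis_window(mass, de):
    """1.65 < M < 1.90 GeV/c^2 and |Delta E| < 0.5 GeV."""
    return 1.65 < mass < 1.90 and abs(de) < 0.5

def in_blind_region(mode, mass, de):
    """Mode-dependent blind region in the (M, Delta E) plane."""
    if mode in NARROW_BLIND_MODES:
        return 1.75 <= mass < 1.81 and abs(de) < 0.08
    return 1.74 <= mass < 1.82 and abs(de) < 0.1
```

During selection development, a candidate for which `in_blind_region` is true would simply be skipped when looking at data.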
Charged tracks, photons, and π 0 s should satisfy the following selection criteria. Each charged track or photon is within the fiducial volume defined by −0.866 < cos θ < 0.956, where θ is the polar angle with respect to the direction opposite to the e + beam in the laboratory frame. Charged tracks come from the interaction point; the distance of the closest point from the interaction point is less than 0.5 cm in the transverse direction and less than 3.0 cm in the longitudinal direction. Each π 0 is reconstructed from two photons inside the same hemisphere and the photon energy (E γ ) should be larger than 0.05 GeV. The π 0 mass window is 0.12 GeV/c 2 < M γγ < 0.15 GeV/c 2 , corresponding to ±3σ in the π 0 mass resolution. A π 0 mass-constrained fit is performed to improve the energy resolution.
After reconstructing the signal and tag τ 's, no extra charged tracks are allowed. We count the number of photons (n γ ) with E γ larger than 0.1 GeV in the signal side, and require n γ ≤ 3 for the ω mode, which includes a π 0 → γγ, and n γ ≤ 1 for the other modes.
Particle identification is effective in suppressing the main background to the τ → ℓV⁰ signal, which consists of events with three hadron tracks. We use likelihood ratios for electron identification (P(e)) [41] and muon identification (P(μ)) [42]. The lepton identification criteria are P(e) > 0.9 for electrons, and P(μ) > 0.95 with a momentum larger than 0.6 GeV/c for muons. The electron (muon) identification efficiency is 90% (75%), whereas the probability of misidentifying a pion as an electron (muon) is 0.1% (2%). The energy loss of an electron by bremsstrahlung is recovered by adding the energy of every photon within 0.05 radians of the electron track direction to the electron momentum. To suppress low-multiplicity background events such as Bhabha, ee → eeee, or ee → eeμμ, an electron veto (P(e) < 0.9) is applied to all hadron candidate tracks.
For hadron identification, we use a binary likelihood ratio P(i|j) = L_i/(L_i + L_j), where L_i(j) is the likelihood of particle i (j) [43] and i (j) is π, K, or p. The kaon identification criteria are P(K|π) > 0.6 for both charged kaons from φ decay and P(K|π) > 0.8 for the charged kaon from K*⁰ and K̄*⁰ decays. The kaon identification efficiency is 86% (77%), whereas the probability of misidentifying a charged pion as a kaon is 4% (2%) for the kaons from φ (K*⁰, K̄*⁰). A kaon veto (P(K|π) < 0.6) is applied to both charged pions from ρ⁰ in the signal side; 96% of pions are retained, whereas 14% of kaons are not vetoed. To suppress muons from kaons decaying inside the CDC (K± → μ±ν), the kaon veto is also applied to the signal-side muon track for the hadronic tags (τ_tag → πν, ππ⁰ν, πππν, or ππ⁰π⁰ν). For the μV⁰ modes with the hadronic tags, a proton veto (P(p|K) < 0.6 and P(p|π) < 0.6) is applied to the tag-side tracks.
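As a small illustration of the binary likelihood ratio and the kaon criteria, the sketch below takes per-track likelihood values as plain numbers. How those likelihoods are built from detector information is not covered here, and the function names are hypothetical.

```python
def binary_likelihood_ratio(l_i, l_j):
    """P(i|j) = L_i / (L_i + L_j)."""
    total = l_i + l_j
    if total <= 0.0:
        raise ValueError("likelihoods must have a positive sum")
    return l_i / total

def passes_kaon_id(l_kaon, l_pion, parent):
    """P(K|pi) > 0.6 for kaons from phi, > 0.8 for kaons from K*0 or anti-K*0."""
    threshold = 0.6 if parent == "phi" else 0.8
    return binary_likelihood_ratio(l_kaon, l_pion) > threshold

def passes_kaon_veto(l_kaon, l_pion):
    """Kaon veto used for pions from rho0: P(K|pi) < 0.6."""
    return binary_likelihood_ratio(l_kaon, l_pion) < 0.6
```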
The signal events have one or two neutrinos from the τ_tag decay. We introduce event selection criteria requiring one or more neutrinos in the tag side. The missing momentum due to the neutrino(s) is calculated by subtracting the vector sum of the momenta of all tracks and photons from the sum of the beam momenta. The missing energy is calculated likewise by subtracting the sum of the energies of all tracks and photons from the sum of the beam energies. Here, extra photons that are not used for the τ reconstruction are included. The transverse missing momentum is required to be larger than 0.5 GeV/c, and the missing energy in the c.m. frame (E^c.m._miss) is required to be larger than 0 GeV. Events with missing particles other than neutrinos should be rejected as background events. These non-neutrino missing particles can arise in two ways: neutral particles pass through the gaps between the barrel and end-cap ECLs, or particles go outside the CDC volume. Thus, the direction of the missing momentum is required not to point to such regions. The missing particles should be in the tag side, and hence a requirement is placed on cos θ^c.m._miss−tag, where θ^c.m._miss−tag is the angle between the missing momentum and the vector sum of the momenta of the tag-side tracks and photons in the c.m. frame. The neutrino angle with respect to the τ_tag momentum direction is restricted in a τ_tag two-body decay; thus cos θ^c.m._miss−tag < 0.85 is also applied for the ρ⁰ modes with τ_tag → πν.
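The missing-momentum bookkeeping described above amounts to four-vector subtraction. The sketch below assumes four-momenta as (E, px, py, pz) tuples in GeV; the helper names and the example values used with it are illustrative, not Belle conventions.

```python
import math

def sum_four_momenta(four_momenta):
    """Component-wise sum of (E, px, py, pz) tuples."""
    return tuple(sum(component) for component in zip(*four_momenta))

def missing_four_momentum(beams, visible):
    """Beam total minus the sum over all reconstructed tracks and photons."""
    beam_total = sum_four_momenta(beams)
    visible_total = sum_four_momenta(visible)
    return tuple(b - v for b, v in zip(beam_total, visible_total))

def transverse_momentum(p4):
    """Magnitude of the momentum component transverse to the z axis."""
    return math.hypot(p4[1], p4[2])

def cos_angle(p4_a, p4_b):
    """Cosine of the opening angle between the three-momenta of two four-vectors."""
    dot = sum(a * b for a, b in zip(p4_a[1:], p4_b[1:]))
    norm_a = math.sqrt(sum(a * a for a in p4_a[1:]))
    norm_b = math.sqrt(sum(b * b for b in p4_b[1:]))
    return dot / (norm_a * norm_b)
```

For the c.m.-frame quantities (E^c.m._miss and cos θ^c.m._miss−tag), the same functions would be applied after boosting all four-vectors to the c.m. frame, a step that is omitted from this sketch.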
We require features of a generic τ decay in the tag side. The invariant mass of the particles, including all photons, in the tag hemisphere should be less than the τ mass (1.777 GeV/c²). For τ_tag decays into ππ⁰ν (3πν), the reconstructed mass of those pions is required to be 0.4 GeV/c² < M_ππ⁰ < 1.3 GeV/c² (0.7 GeV/c² < M_3π < 1.7 GeV/c²), which corresponds to the mass of the ρ± (a₁±). After the above event reconstruction, the background sources are the qq̄ continuum (q = u, d, s, c), generic τ⁺τ⁻, and low-multiplicity events. The low-multiplicity events contribute especially to the background for the eV⁰ modes, which have electron tracks. We suppress the low-multiplicity events first, and then use a multivariate analysis tool to suppress the qq̄ continuum and generic τ⁺τ⁻ events.
The Bhabha events have tracks from photon conversion. To suppress these background events for the eV 0 modes, the invariant mass of the electron and one of the tracks from the V 0 , assigned the electron-mass hypothesis, should be larger than 0.2 GeV/c 2 . In addition, for the eK * 0 and eK * 0 modes, the invariant mass of the two tracks from the V 0 , each assigned the electron-mass hypothesis, is required to be larger than 0.1 GeV/c 2 . This event selection also suppresses some of the generic τ + τ − events, which have tracks from photon conversion.
The low-multiplicity background events are still not negligible for the events with electrons: τ → eV 0 or τ tag → eνν. Because the missing particles of the low-multiplicity background events are the bremsstrahlung photons from the electron in the tag side, cos θ c.m. miss−tag is close to one ( Figure 1). In addition, the missing energy is small for some low-multiplicity background events. For the µρ 0 mode with τ tag → eνν, cos θ c.m. miss−tag < 0.99 and E c.m. miss > 0.4 GeV selection criteria are applied. For the eV 0 modes with τ tag → eνν or πν, cos θ c.m. miss−tag < 0.97 is applied. For the eV 0 modes with τ tag → eνν, E c.m. miss should be larger than 0.4, 2.0, and 1.5 GeV for eφ, eρ 0 , and the other eV 0 modes, respectively.
The remaining background events are mainly from the qq̄ continuum (q = u, d, s, c) and generic τ⁺τ⁻ events, which have three charged pion tracks in the signal side. We use a two-class Boosted Decision Tree (BDT) to separate the signal from these backgrounds. The BDT library is LightGBM [44]. The BDT outputs a signal probability using the following input variables:

• M_V⁰, M²_ν, P^c.m._ν, T, P^sig_ℓ, E^hemi_tag, P^sig_π⁰, E^low_γ, cos θ^c.m._miss−tag
• (categorical variables) τ_tag decay mode, collision energy

where M_V⁰ is the invariant mass of the vector meson, M²_ν is the missing mass squared, P^c.m._ν is the missing momentum in the c.m. frame, T is the magnitude of the thrust vector [39,40], P^sig_ℓ is the momentum of the lepton in the signal side, E^hemi_tag is the energy sum of the tracks and photons in the tag hemisphere, P^sig_π⁰ is the momentum of the π⁰ from ω, and E^low_γ is the lower energy of the two photons from the π⁰. The variables of neutrino kinematics (M²_ν and P^c.m._ν) were not used for the event selection in the previous paper [29]. They are calculated from the momenta of the reconstructed τ_sig and τ_tag, where the energy of τ_sig is fixed to half of the beam energy in the c.m. frame. The qq̄ continuum background events can be effectively suppressed by an M²_ν selection in the hadronic tags, which involve only one neutrino (Figure 2).

Figure 1: The cos θ^c.m._miss−tag distribution of the τ → eρ⁰ mode with an electron tag track after the reconstruction, particle identification, and photon conversion event suppression. Black points with error bars are the data outside the blind region. Red solid histogram is the signal MC. The signal MC is scaled to the number of events corresponding to a branching fraction 100 times as large as the current upper limit. The red dashed line is the upper limit to remove the low-multiplicity events. The low-multiplicity events cluster around cos θ^c.m._miss−tag = 1, whereas the other background events are linearly distributed in the region of cos θ^c.m._miss−tag > 0.8.
The training, validation and evaluation of the BDT are done with 40%, 10%, and 50% of the signal MC, respectively. Regarding the training and validation samples for the background events, we utilize hadron background enhanced data that are obtained by removing the lepton identification for the signal-side leptons but with a lepton identification veto (P(e) ≤ 0.9 and P(µ) ≤ 0.95) for all the signal-side tracks in the data. The hadron background enhanced data have a much larger number of events than the background data with the nominal selection criteria, whereas both data sets are composed mainly of three charged pions from τ decays or from continuum events. The training is done with 80% of the hadron background enhanced data and the validation is done with 20%. During BDT training, a weight is applied to each of the signal MC events such that the sum of the weights is equal to the number of background events. We monitor the area under curve (AUC) of the Receiver Operating Characteristic curve [45] for the validation samples during the training and choose the BDT with the best AUC score.
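The class balancing and the AUC-based model choice can be sketched without any BDT library. The code below is not the LightGBM training itself; it only shows the per-event signal weight that makes the weighted signal sum equal the background count, a pairwise AUC computation, and the selection of the training iteration with the best validation AUC. All names are hypothetical.

```python
def signal_event_weight(n_signal_mc, n_background):
    """Weight per signal MC event so that the weights sum to the background count."""
    return n_background / n_signal_mc

def auc(signal_scores, background_scores):
    """Probability that a random signal event outscores a random background event.

    Ties count one half. A uniform per-class weight cancels in this ratio,
    so the balancing weight above does not change the AUC.
    """
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))

def best_iteration(validation_aucs):
    """Index of the training iteration with the highest validation AUC."""
    return max(range(len(validation_aucs)), key=lambda i: validation_aucs[i])
```

The pairwise loop is quadratic in the sample size; a rank-based computation would be preferable for samples of realistic size.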
The event selection with the BDT output (BDT selection) is determined only by a target signal efficiency. The target signal efficiency is determined based on the signal efficiency with a cut-based event selection. In the cut-based event selection, the M V 0 windows correspond to ±2σ of reconstructed mass distribution, and the M 2 ν windows are set for each V 0 mode and each τ tag decay mode so that the expected number of background events inside the signal region (N BG , see the next section) is approximately one or less. The target signal efficiency with the BDT selection is set as relatively 5% larger than that with the cut-based event selection, because we expect improvement in separating the signal events from the background events.
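Choosing a BDT cut from a target signal efficiency can be sketched as a quantile lookup on the signal MC scores. The factor 1.05 follows the statement that the target is relatively 5% larger than the cut-based efficiency; the function names are hypothetical and ties between scores are ignored for simplicity.

```python
import math

def target_efficiency(cut_based_efficiency, relative_gain=0.05):
    """Target set relatively 5% above the cut-based signal efficiency."""
    return cut_based_efficiency * (1.0 + relative_gain)

def threshold_for_efficiency(signal_scores, efficiency):
    """Smallest score kept when retaining the requested fraction of signal MC.

    `efficiency` is the fraction of the events passed in, i.e. it is relative
    to the preselected sample rather than to all generated events.
    """
    ordered = sorted(signal_scores, reverse=True)
    n_keep = max(1, math.ceil(efficiency * len(ordered)))
    return ordered[n_keep - 1]

def selection_efficiency(signal_scores, threshold):
    """Fraction of events with score at or above the threshold."""
    return sum(1 for s in signal_scores if s >= threshold) / len(signal_scores)
```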
The finalized BDT selection shows similar N_BG to that of the cut-based event selection. The BDT selection is not applied to the φ modes because N_BG in each of the two modes is small enough.

Figure 2: The M²_ν distribution of the τ → μρ⁰ mode with the hadronic tags after the event selection except for the requirement on the BDT output. Black points with error bars are the data outside the blind region. Red solid histogram is the signal MC. The signal MC is scaled to the number of events corresponding to a branching fraction 100 times as large as the current upper limit. The events constituting the upper tail of the signal distribution originate from a wrong or missing π⁰ in the tag side.

Signal efficiency and background estimation
We define the signal region with an ellipse in the M_ℓV⁰–ΔE plane. Most of the signal events cluster around M_ℓV⁰ = 1.777 GeV/c² and ΔE = 0 GeV with some correlation. The ellipse oblateness and the rotation angle are calculated from the covariance matrix of the signal MC distribution after the event selection. The center of the ellipse is the mean of the distribution. The ellipse size is determined to maximize the figure-of-merit (FOM) of ref. [46], in which ε is the signal efficiency inside the ellipse and α is the confidence coefficient (α = 1.64 at 90% C.L.).

We estimate N_BG through interpolation from the sideband data. Here the sideband data is the set of data passing the event selection and lying inside the sideband region: 1.65 GeV/c² < M_ℓV⁰ < 1.9 GeV/c² and |ΔE| < 0.1 GeV, outside of the blind region. The interpolation is based on a function in the M_ℓV⁰–ΔE plane. This function is obtained by fitting the distribution of the hadron background enhanced data within |ΔE| < 0.1 GeV, and then it is scaled to the sideband data. Figure 3 shows the distributions of the hadron background enhanced data and MC for the μρ⁰ mode. In this function, eq. (4.2), f(x) represents the background distribution as a function of M_ℓV⁰; c₁, c₀, x₀, and k are parameters that define the shape of the function; a_y represents the sharpness of the sigmoid function along the ΔE axis; y₀ is the center of the sigmoid function; and c₀^flat is a term for background events that are flat in the M_ℓV⁰–ΔE plane. We define f(x) for each V⁰ in eq. (4.3), and the functions for the ρ⁰ (ω) modes are smeared by a Gaussian with a standard deviation (σ) of 6.6 (9.6) MeV/c². This σ corresponds to the mass resolution that affects the edge of the M_ℓV⁰ distribution close to the τ mass for the τ⁺τ⁻ background. The edge is broad for the other modes owing to the wrong mass assignment of fake kaons. The τ⁺τ⁻ background events for the φ modes are included in c₀ because they are flat along the M_ℓV⁰ axis in 1.65 GeV/c² < M_ℓV⁰ < 1.9 GeV/c².
The parameters a_y, y₀, k, and x₀ are fixed at the fit results of the hadron background enhanced data within |ΔE| < 0.1 GeV. The fit uncertainties of these fixed parameters are included in the systematic uncertainty of N_BG. The other fit parameters correspond to the scale factors of each background component: generic τ⁺τ⁻ (c₁), and continuum and low-multiplicity background events (c₀ and c₀^flat). We fit the function to the sideband data with these scale factors (c₁, c₀, and c₀^flat) floating. The same region around the D meson mass as for the fit to the hadron background enhanced data is excluded from the fitting for the φ and K*⁰ modes. The functions are integrated in the elliptical signal regions to deduce N_BG, which are in the range of 0.25–0.95.
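The elliptical signal region and its size optimization can be sketched as follows. The ellipse is expressed through the Mahalanobis distance computed from the signal MC mean and covariance, which fixes the center, oblateness, and rotation, leaving one scale parameter for the size. The explicit FOM expression is not shown in the text above; the sketch assumes the commonly used Punzi form ε / (α/2 + √N_BG), which should be checked against ref. [46]. All names are hypothetical.

```python
import math

def mean_and_covariance(points):
    """Mean and covariance (sxx, sxy, syy) of (M, Delta E) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), (sxx, sxy, syy)

def inside_ellipse(point, center, cov, scale):
    """True if the Mahalanobis distance of `point` is at most `scale`."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - center[0], point[1] - center[1]
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 <= scale * scale

def punzi_fom(efficiency, n_bg, alpha=1.64):
    """ASSUMED figure of merit: eff / (alpha/2 + sqrt(N_BG))."""
    return efficiency / (alpha / 2.0 + math.sqrt(n_bg))
```

Scanning `scale`, computing the efficiency and N_BG inside the ellipse for each value, and keeping the scale with the largest FOM would reproduce the size optimization in outline.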
Another systematic uncertainty on N_BG comes from the difference of the M_ℓV⁰–ΔE distributions between the sideband data and the hadron background enhanced data within |ΔE| < 0.1 GeV. The difference mainly arises from the electron (muon) identification fake rate, R^fake_e(μ)(P, θ), which depends on the momentum P and polar angle θ of the track. The sideband data have a pion misidentified as a lepton, which tends to have a lower momentum than the pions in the hadron background enhanced data. We evaluate the change of N_BG when the parameters (a_y, y₀, k, and x₀) are redetermined with weighted hadron background enhanced data, where each event is weighted by the ratio of R^fake_e(μ)(P, θ) to 1 − R^fake_e(P, θ) − R^fake_μ(P, θ) for the track in order to conform the M_ℓV⁰–ΔE distribution to that of the sideband data. The amount of change of N_BG is taken as the systematic uncertainty of N_BG.
The statistical uncertainty of N_BG is calculated as follows. We generate 100 sets of pseudo-data for each mode in the M_ℓV⁰–ΔE histogram. The content of each bin in the histogram is set randomly following a Poisson distribution, with the mean taken from the function fitted to the sideband data. We fit the function to each set of the pseudo-data to deduce N_BG, and the standard deviation of these N_BG values is taken as the statistical uncertainty. The major contribution to N_BG comes from the terms in eq. (4.2) that are flat in M_ℓV⁰ (c₀ and c₀^flat), which correspond to the continuum or low-multiplicity background events. The contribution of the generic τ⁺τ⁻ background events, which depends on M_ℓV⁰, is about one-third as large as the other background contributions. We cannot distinguish the background components of the φ modes through the fit to the data, because the generic τ⁺τ⁻ background events are distributed evenly along the M_ℓV⁰ axis.

The systematic uncertainties of the expected number of signal events are listed in Table 1. The dominant uncertainties are from the particle identification.
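The pseudo-data procedure for the statistical uncertainty of N_BG can be sketched as below. As a deliberate simplification, the refit of each pseudo-data set is reduced to a single overall normalization, whose maximum-likelihood value for Poisson bins is the total observed count divided by the total expected count; the analysis itself floats several scale factors. The bin means, seed, and function names are hypothetical.

```python
import math
import random
import statistics

def poisson_sample(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def toy_nbg_spread(bin_means, nominal_nbg, n_toys=100, seed=1):
    """Standard deviation of N_BG over pseudo-data sets (one-parameter refit)."""
    rng = random.Random(seed)
    total_mean = sum(bin_means)
    toy_values = []
    for _ in range(n_toys):
        counts = [poisson_sample(rng, m) for m in bin_means]
        scale = sum(counts) / total_mean  # ML estimate of the normalization
        toy_values.append(scale * nominal_nbg)
    return statistics.stdev(toy_values)
```

With a single normalization the spread is expected to be close to `nominal_nbg / sqrt(sum(bin_means))`, which gives a quick cross-check of the toy result.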
The track and photon energy resolutions in the MC are corrected such that the mass resolution of the D ( * )+ meson matches between the data and MC, where D ( * )+ → K − π + π + (π 0 ) is reconstructed with similar event selection criteria to the signal ones (e.g. |∆E| < 0.5 GeV). The uncertainty of the data mass resolution propagates to the uncertainties of the corrected energy resolutions. We generate two additional signal MC sets in which the track (photon) energy resolution is different by plus and minus its uncertainty, and take the half of the difference in the expected number of the signal events as the systematic uncertainty.
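The half-difference prescription, and the quadrature sum that the text goes on to apply to all sources, reduce to two short formulas. The sketch below uses hypothetical function names and made-up inputs in the usage lines.

```python
import math

def half_difference_percent(n_plus, n_minus, n_nominal):
    """Half of the difference between the +1 sigma and -1 sigma samples, in percent."""
    return 100.0 * abs(n_plus - n_minus) / (2.0 * n_nominal)

def quadrature_sum(uncertainties):
    """Combine independent uncertainties in quadrature."""
    return math.sqrt(sum(u * u for u in uncertainties))

# Usage with invented numbers: expected signal yields from the two varied MC sets.
resolution_syst = half_difference_percent(10.2, 9.8, 10.0)
total_syst = quadrature_sum([1.4, 0.3, resolution_syst])
```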
All the uncertainties in Table 1 are summed in quadrature to yield the total systematic uncertainties shown in Table 2.

Table 1: List of the systematic uncertainties of the expected number of signal events. The average number of tracks (particles) in the reconstructed τ⁺τ⁻ events for each signal mode is represented as N_track(particle). When the uncertainty differs from mode to mode, we show the range of the uncertainty.

Source                                   σ_syst (%)
Integrated luminosity                    1.4
ee → ττ(γ) cross section [48]            0.3
B(φ → KK) and B(ω → πππ⁰)                1.2 and 0.7
Trigger efficiency                       0.2–0.9
Tracking efficiency                      0.35 × N_track
Electron identification efficiency       1.7 × N_electron
Muon identification efficiency           1.8 × N_muon
K and π identification efficiency        1.6 (ρ⁰), 1.8 (φ), and 1.1 (K*⁰ and K̄*⁰)
π⁰ efficiency                            2.2 × N_π⁰
Electron veto for hadrons                0.4-1.

We set 90% C.L. upper limits on the branching fractions based on a Bayesian method with the use of Markov Chain Monte Carlo [49]. The probability density function of the branching fraction (B(τ → ℓV⁰)) is calculated assuming that N_obs follows a Poisson distribution whose mean value is the expected number of events (N_exp). In the expression for N_exp, L is the integrated luminosity (980.4 ± 13.7 fb⁻¹), σ_ττ is the cross section of τ-pair production calculated with KKMC [48] (the weighted average of σ_ττ over all the beam energies is 0.916 ± 0.003 nb), and ε is the signal efficiency including the branching fraction of the V⁰. We assume that these values (L, σ_ττ, ε, and N_BG) follow Gaussian distributions with widths equal to the uncertainty of each value.
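A stripped-down version of the Bayesian limit can be sketched with direct numerical integration. Several assumptions are made here and are not taken from the text: a flat prior on the signal yield, no Gaussian smearing of the nuisance parameters (the analysis uses Markov Chain Monte Carlo for this), and the relation N_exp = 2 L σ_ττ ε B + N_BG, with the factor 2 counting the two τ leptons per event. The function names are hypothetical.

```python
import math

def poisson_log_likelihood(n_obs, mean):
    """log of the Poisson probability of n_obs for the given mean."""
    if mean <= 0.0:
        return 0.0 if n_obs == 0 else float("-inf")
    return n_obs * math.log(mean) - mean - math.lgamma(n_obs + 1)

def upper_limit_events(n_obs, n_bg, cl=0.90, s_max=50.0, steps=20000):
    """Upper limit on the signal yield s with a flat prior on s >= 0."""
    ds = s_max / steps
    posterior = [math.exp(poisson_log_likelihood(n_obs, i * ds + n_bg))
                 for i in range(steps + 1)]
    total = sum(0.5 * (posterior[i] + posterior[i + 1]) * ds for i in range(steps))
    running = 0.0
    for i in range(steps):
        running += 0.5 * (posterior[i] + posterior[i + 1]) * ds
        if running >= cl * total:
            return (i + 1) * ds
    return s_max

def branching_fraction_limit(s_up, lumi_fb, xsec_nb, efficiency):
    """Convert a yield limit to a branching fraction: B = s / (2 L sigma eps)."""
    n_tau_pairs = lumi_fb * 1.0e6 * xsec_nb  # 1 fb^-1 = 1e6 nb^-1
    return s_up / (2.0 * n_tau_pairs * efficiency)
```

For zero observed events the yield limit is ln 10 ≈ 2.30 regardless of N_BG, which provides a simple check of the integration. Including the Gaussian uncertainties on L, σ_ττ, ε, and N_BG would broaden the posterior and is omitted from this sketch.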
The upper limits on B(τ → ℓV⁰) are listed in Table 2. The average of the limits is better than that of the previous results using 854 fb⁻¹ [29] by 30%. This is due to the additional 15% of integrated luminosity; the addition of the π±π∓π±ν and π±π⁰π⁰ν modes in the τ_tag reconstruction, which increases the signal efficiency by 9.6%; and the event selection by multivariate analysis (BDT). The upper limit on B(τ → μρ⁰) is worse than that of the previous result, though the expected upper limit before unblinding is better. This is because we use Bayesian limits instead of Frequentist limits; the latter decrease as N_BG increases when N_obs is fixed.

Table 2: The signal efficiency (ε), the expected number of background events (N_BG), the total systematic uncertainty of the expected number of signal events (σ_syst), the number of observed events in the signal region (N_obs), and the observed 90% C.L. upper limits on the branching fraction (B_obs (10⁻⁸)).

Conclusion
To conclude, we searched for lepton-flavor-violating τ decays into one lepton and one vector meson using the full 980 fb⁻¹ of Belle data. No statistically significant signal is observed, and the 90% C.L. upper limits on the branching fractions are in the range of (1.7–4.2) × 10⁻⁸ for τ → μV⁰ and (1.7–2.4) × 10⁻⁸ for τ → eV⁰. The upper limits are improved by 30% on average from the previous results. We achieve these improvements both by reconsidering the event selection criteria and by adding 126 fb⁻¹ of data.
These acknowledgements are not to be interpreted as an endorsement of any statement made by any of our institutes, funding agencies, governments, or their representatives. We thank the KEKB group for the excellent operation of the accelerator; the KEK cryogenics group for the efficient operation of the solenoid; and the KEK computer group and the Pacific Northwest National Laboratory (PNNL) Environmental Molecular Sciences Laboratory (EMSL) computing group for strong computing support; and the National Institute of Informatics, and Science Information NETwork 6 (SINET6) for valuable network support.