Probing charged lepton flavor violation with axion-like particles at Belle II

We study charged lepton flavor violation associated with a light leptophilic axion-like particle (ALP), $X$, at the $B$-factory experiment Belle II. We focus on production of the ALP in the tau decays $\tau \to X l$ with $l=e,\mu$, followed by its decay via $X\to l^- l^+$. The ALP can be either promptly decaying or long-lived. We perform Monte-Carlo simulations, recasting a prompt search at Belle for lepton-flavor-violating $\tau$ decays, and propose a displaced-vertex (DV) search. For both types of searches, we derive the Belle~II sensitivity reaches in both the product of branching fractions and the ALP coupling constants, as functions of the ALP mass and lifetime. The results show that the DV search exceeds the sensitivity reach of the prompt search to the relevant branching fractions by up to about a factor of 40 in the long decay length regime.


Introduction
The latest experimental bounds on the neutron electric dipole moment have constrained the strength of Quantum Chromodynamics (QCD) θ-term in the Standard Model (SM) of particle physics to be less than ∼ 10 −10 [1,2], pointing to a fine-tuning issue, known as the strong CP problem in QCD [3][4][5]. One of the most compelling solutions to this problem is to introduce a new global symmetry U (1) PQ , which is spontaneously broken by a dynamical CP-conserving axion field [3,4]. Doing so predicts a new pseudoscalar boson, the pseudo-Nambu-Goldstone boson of the broken U (1) PQ symmetry, called the QCD axion, a. The breaking scale of the new symmetry has to be much higher than the electroweak scale: f a 10 9 GeV [6]. Since the mass of the QCD axion, m a , as well as its couplings to the SM particles, are both inversely proportional to f a , the QCD axion should be very light (10 −6 eV < m a < 10 −2 eV) and feebly interacting, with a very long lifetime [5]. Similar to the QCD axion, an axion-like particle (ALP), denoted X throughout this work, is a pseudoscalar boson that interacts feebly with SM particles. Unlike the QCD axion, an ALP does not have a linear proportionality relation between its mass m X and its couplings to the SM particles, and hence does not necessarily solve the strong CP problem. ALPs can be as light as 10 −22 eV [7,8] or as heavy as 1 TeV (heavy GUT axions) [9]. Axions and ALPs are considered some of the most plausible candidates for dark matter [10][11][12][13][14][15][16].
ALPs can be potentially produced at various experiments, including the Large Hadron Collider and B-factories (see for instance Refs. [17][18][19][20][21][22][23][24][25] for related studies). Depending on their masses and strengths of the couplings to the SM particles, ALPs can be short-lived and decay promptly, or be long-lived and decay after traveling a macroscopic distance, leading to signatures of displaced objects or missing energy. ALPs can couple to gauge bosons as well as quarks and leptons, giving rise to a large class of possible experimental signatures. For instance, using its initial dataset of about 445 pb −1 , Belle II has reported the results of a search for a light pseudoscalar boson that couples to photons [26].
In this paper, we restrict ourselves to the case in which the ALPs are leptophilic, i.e., couple at tree level only to the charged leptons, and neglect couplings to neutrinos. In general, the ALP couplings to the charged leptons are not necessarily diagonal, and may have charged-lepton-flavor-violating (LFV) interactions [27]. Since the SM predicts no flavor-changing neutral current (FCNC) at tree level, experimental observations of LFV can be a clear signature of new physics. For muon LFV processes such as µ → eγ and µ → 3e, the SM predictions for the branching fractions are about 10 −55 [28,29], much lower than the present experimental limits BR(µ → eγ) < 4.2 × 10 −13 [30] and BR(µ → 3e) < 1.0 × 10 −12 [31]. The future experiments Mu3e and MEG II will have an improved sensitivity to these processes by as much as four orders of magnitude [32,33]. Compared to the muon-related LFV, LFV effects in the τ sector are much more loosely constrained, as we briefly discuss below. Studies of FCNCs with the ALPs can be found for the lepton sector in Refs. [34][35][36][37][38][39] and for the quark sector in Refs. [25,[40][41][42].
In order to study the ALP couplings to the τ lepton, we consider the ongoing Bfactory Belle II experiment [43,44] at the SuperKEKB collider, where energy-asymmetric collisions of electron and positron beams at the center-of-mass energy √ s = 10.58 GeV take place. Belle II is planned to collect about 4.6 × 10 10 e + e − → τ + τ − events with the design integrated luminosity of 50 ab −1 [45]. The large dataset will enable probing rare τ decays, including LFV decays, with unprecedented precision.
We propose to search for the decays τ → Xl with X → l − l + , where l can be either an electron or a muon. This restricts the ALP mass to the range 2 m l < m X < m τ − m l . In this mass range, the strongest published limits on lepton-flavor-conserving ALP couplings stem from the beam-dump experiments conducted at SLAC [46,47] and a dark-photon search performed at BaBar [21,48], both of which assumed no tree-level LFV couplings. Bounds on the LFV ALP couplings can also be derived from τ → Xe, where X is invisible (see studies at ARGUS [49], Belle [50], and Belle II [38,[51][52][53][54]). This signature, which is not covered in this work, suffers from large background from τ → lνν, requiring more sophisticated methods to suppress. In Section 3.4 we extract stronger constraints from this signature based on test of lepton-flavor universality in leptonic τ decays [55], relevant for the case of interest of non-universal couplings. For past studies on probing LFV associated with an ALP at Belle II, see e.g. Refs. [39,56].
The previous Belle experiment [57] has searched for lepton-flavor-violating τ decays into three charged leptons and obtained limits on the decay branching ratios in the order of magnitude of 10 −8 [58] (see also Ref. [59] for a similar search at the BaBar experiment). In this work we recast that search into the studied theoretical scenarios of a leptophilic ALP, obtaining tighter constraints than previously published ones. Assuming equal selection efficiencies at Belle and Belle II, we calculate the expected Belle II limits on the modelindependent as well as model-dependent parameters.
The organization of this paper is as follows. In Sec. 2 we introduce the effective model under consideration, give the relevant analytic formulas for τ and X decay rates, and list the two benchmark scenarios we focus on for numerical studies. We summarize the current constraints on the ALP couplings to the charged leptons in Sec. 3. In Sec. 4 we briefly describe the Belle II experiment and detail the simulation procedures that we use to study the sensitivity of prompt and displaced τ → Xl searches on the ALP couplings. The numerical results are shown and discussed in Sec. 5, and we conclude in Sec. 6 with a summary and an outlook.

Model
We consider a phenomenological model in the framework of effective field theory of dimensionfive for a pseudo-Goldstone boson X, corresponding to the ALP considered in this work. The X couples at tree level (only) to the SM charged leptons e, µ, or τ , with the following Lagrangian [6] where Λ labels the effective scale below which the model is applicable, G denotes a symmetric matrix of real couplings (G αβ = G βα ), and l α,β = e, µ, τ with mass m α,β . To derive the second line of Eq. (2.1), we apply integration by parts and the equations of motion.
For simplicity we have assumed that the scalar and pseudoscalar couplings are of equal strength for the same lepton flavor combinations. In order to simplify the notation, we define the following coupling matrix, We can thus express the tree-level decay width of one charged lepton l α into a lighter lepton l β plus X with the formula where the two equations are for decays into same-flavor and different-flavor charged leptons, and the second equation is obtained in the limit m β m α . We note that the flavor-diagonal couplings g αα can induce X decays into a pair of photons, as well as those into a pair of neutrinos, both at one-loop order. Compared to the di-photon channel, the di-neutrino decay is highly suppressed (see the relevant discussions in Ref. [21]) and is hence ignored in this work. The corresponding decay width can be calculated with: [21] Γ(X → γγ) = 4πα 2 m 3 X |g eff γγ | 2 , (2.6) where g eff γγ is an effective coupling that can be derived from the ALP-lepton couplings: where the loop function B 1 is defined as follows: Note that both the tree-level couplings g αα and the effective coupling g eff γγ are of mass dimension −1, cf. Eq. (2.2). With Eqs. (2.4), (2.5), and (2.7), we can compute the total decay width and, hence, the proper lifetime of the ALP as a function of its mass and the couplings to the charged leptons.
In this paper, we concentrate on the scenarios where we switch on two couplings: one dominating the production and another governing the decay of the ALP. Specifically, the ALP is produced in the decay τ → l α X via the production couplings g τ α (α = e, µ), and undergoes the decay X → e − e + or µ + µ − via the decay couplings g αα . For simplicity, we do not consider X decays via the off-diagonal coupling g eµ , 1 but will offer brief comments when we discuss the numerical results in Sec. 5. Moreover, the g τ τ coupling is also switched off, so we do not consider ALP production by final-state radiation.
We should point out that in principle, the production coupling g τ α alone can also induce (lepton/quark-level) four-body X decays via an off-shell τ , as well as, sub-dominantly, the two-body decays X → ν τνα (or their CP conjugate) via a triangular loop with a Wboson. However, these contributions are negligible compared to those from the tree-level decays considered here, unless g τ α g αα . In the case that the diagonal couplings are indeed negligible, the ALP becomes very long-lived, requiring a missing-energy search strategy [49,54] that is not the topic of this paper. This particular caveat will not affect our numerical results, and so the highly suppressed X decays via g τ α couplings are not included in the numerical estimates. For m X > m τ , limits on g τ α can be derived from e − e + → l ± l ± τ ∓ τ ∓ and e − e + → l ± τ ∓ + missing [56].

Benchmark Scenarios
In our numerical studies, we focus on two representative benchmark scenarios. In Scenario 1, g τ e and g ee are switched on, and the other couplings are set to zero. As a result, g τ e induces the decay τ → Xe and g ee leads to X → e − e + and γγ. In Scenario 2, only g τ µ and g µµ are non-zero, so that the ALP is produced via τ → Xµ and decays to µ − µ + and γγ. We focus on the di-lepton final state and do not consider the di-photon decay mode as a signature of interest in this work. We summarize these two scenarios in Table 1.

Scenario 1
Scenario 2 g τ α g τ e g τ µ tau decays τ → Xe τ → Xµ g ββ g ee g µµ X decays X → e − e + (sig.)/γγ X → µ − µ + (sig.)/γγ Table 1. Summary of the basic features of the two benchmark scenarios studied in this work, including the couplings switched on and the corresponding induced decays of τ and X. "Sig." indicates the signature studied.
In Fig. 1 and Fig. 2 we present plots of the branching fraction BR(τ → Xl α ), the ALP proper flight distance cτ X , and its decay branching fractions as functions of m X for the two benchmark scenarios. The upper plots of Fig. 1 and Fig. 2 show the variation of BR(τ → Xl α ) with respect to m X , for several benchmark values of g τ α . Intriguingly, the X branching fraction plots show that BR(X → γγ) exceeds BR(X → e − e + ) for m X 0.6 Eqs. (2.5) and (2.7). Therefore, given the tiny electron mass, Γ(X → e − e + ) can become subdominant for sufficiently large m X .
Besides the two benchmark scenarios introduced above, there are, of course, additional scenarios with other possible lepton flavor combinations. In Sec. 5.3, we will qualitatively comment on their numerical results instead of performing detailed studies.

Constraints
In this section, we discuss various experimental and astrophysical results from which we derive constraints on the model parameters related to Scenarios 1 and 2 introduced in Sec. 2. The experimental results come from LEP, B-factories, and muon fixed-target experiments, while the astrophysical bounds stem mainly from the supernova 1987A (SN1987A). We also refer the reader to Ref. [37] for a comprehensive study on probes of ALPs with LFV.

LEP
The LEP-2 collider at CERN provided samples of e − e + → τ − τ + events at center-of-mass energies √ s ranging from 189 to 209 GeV. The inclusive cross section of e − e + → τ − τ + at √ s = 207 GeV was measured to be σ LEP τ τ = 6.05 ± 0.39 (stat) ± 0.09 (syst) pb with 402 observed events at the ALEPH experiment [77], in good agreement with the SM predictions. If the g τ e coupling is non-zero, it can induce the same process via a t-channel ALP exchange, modifying the theoretical computation of the inclusive cross section. Thus, we use the experimental measurement to place a bound on the coupling g τ e .
We implement the ALP model (see Lagrangian in Eq. (2.1)) using FeynRules [78,79] to generate an UFO (Universal FeynRules Output) [80] model file, with which we simulate e − e + → τ − τ + events and compute the corresponding cross sections for given m X and g τ e with MadGraph5 2.7.3 [81]. By scanning over a 2D grid of (m X , g τ e ), we exclude the points where the computed cross section lies outside the two-sided range around the central value of the ALEPH measurement: The theoretical uncertainty is negligible and hence neglected. For m X 2 GeV the limit is roughly a constant at g τ e < 0.078 GeV −1 , because of the relatively large center-of-mass energy. This is a very weak constraint relative to others studied here.
We also note that if one switches on the coupling g τ τ in Scenario 1 together with g ee , it can induce τ -pair production via an s-channel X in e − e + collisions, thus allowing to probe the product of these two couplings. Such a scenario is beyond the scope of this work. Moreover, the g ee coupling itself affects Møller and Bhabha scattering. A precise measurement of the parity violation asymmetry in Møller scattering has been performed in a fixed target experiment [82]. Nevertheless, the axion model under consideration only has the pseudoscalar coupling to electron after taking the derivative of the field X (see the second line of Eq. (2.1) ), so it does not contribute to parity violation.
A limit on the ALP interactions using Bhabha scattering excludes values of Γ(X)[BR(X → e − e + )] 2 above 1.3 · 10 −3 eV for m X between 1.75 and 1.88 MeV [83]. This can be translated to a weak bound of g ee 0.15 GeV −1 for m X = 1.8 MeV in our model, cf. Eqs. (2.4) and (2.6).

Supernova
If the ALP couples to electrons or muons, it can be produced in the supernova environment and contribute to the cooling rate of the supernova core, thus allowing for deriving bounds on these couplings. For instance, Ref. [88] obtained an upper bound on g µµ at 10 −8.1 GeV −1 for m X < 1 MeV. Also, Ref. [89] derived the limits on g ee with SN1987A ranging from ∼ 2.5 × 10 −7 GeV −1 to ∼ 8.8 × 10 −5 GeV −1 for m X 160 MeV. In our relevant mass range, we reproduce in Fig. 3 the SN1987A bounds on g ee here 2 . The gray region is disfavored by SN1987A cooling bounds.

τ → 3l
The tightest bounds on the branching fractions for τ LFV decays into three charged leptons were obtained at the Belle experiment [58]. These limits, which are all of order 10 −8 , are reproduced in Table 2. We discuss the resulting bounds on g τ α · g αα (α = e, µ) in Sections 4 and 5.

Lepton universality in τ → lνν
When the ALP lifetime is long enough that the ALP rarely decays before traversing the whole detector, the decay τ → l α X yields a signature similar to that of τ → lνν, namely τ → l + missing. It is thus possible to place bounds on the off-diagonal coupling g τ α for Decay modes Upper bounds on BR 1.5 Table 2. Upper bounds on the branching fractions of LFV decays of the τ into three charged leptons, reproduced from Ref. [58].
long-lived ALPs, in the scenario g τ α g τ β . For the coupling g τ µ , inspired by Refs. [90,91], we focus on the ratio of rates which has been computed accurately in the SM [92]: The BaBar collaboration measured this ratio to be [55] R BaBar We define the discrepancy between R SM µe and R BaBar We also define the theoretical value of this ratio in the presence of the ALP, where we use the experimental result Γ exp τ →eνν = 4.04 × 10 −13 GeV [5] for Γ SM τ →eνν , and Γ(τ → Xµ) is computed with Eq. (2.3). We determine the 95% confidence-level (C.L.) bounds on g τ µ vs. m X from the region that satisfies the following condition: where 0.0072 is the central value of ∆R µe given in Eq. (3.4) and 0.0040 is its uncertainty.
To derive bounds on g τ e we follow a similar procedure with the ratios R eµ = 1/R µe . In this case, since R SM eµ is now larger than R BaBar eµ , we have The negative value of ∆R eµ results in a particularly stringent bound on BR(τ → Xe), after requiring R SM+X eµ /R SM eµ − 1 < −0.0072 + 2 × 0.0040. The resulting constraints are shown in Fig. 4. For the kinematically allowed mass range, the constraints on BR(τ → Xl α ) are . Constraints on BR(τ → Xl α ) (left) and on g τ α (right) as a function of m X , derived from the upper bounds on lepton-universality violation in τ → l + missing, for Scenarios 1 (g τ e = 0) and 2 (g τ µ = 0). These limits are valid for long decay lengths of the X (small diagonal couplings g ee or g µµ ), so that it rarely decays inside the detector. about 1.4 × 10 −4 for Scenario 1 and 2.6 × 10 −3 for the second scenario. The corresponding upper limits on g τ α are as strong as about 4 × 10 −8 GeV −1 and 2 × 10 −7 GeV −1 for the two scenarios, respectively, at the low mass regime. After the limits described in Sec. 3.3, these are the tightest existing limits. We note that they are valid for small diagonal couplings, so that the ALP decays inside the detector only rarely.
It should be noted that the lepton-universality constraint will improve after the analysis of Ref. [55] is performed with the large Belle II data set. However, the results of Ref. [55] are already limited by the systematics uncertainties, the improvement is expected to be modest.
We notice that Ref. [93] has also recast the BaBar search results to the upper bounds on BR(τ → Xl α ). In contrast to our treatment, which takes into account the experimental measurements of the ratios R eµ and R µe of Γ τ →µνν and Γ τ →eνν , the authors of Ref. [93] worked on the partial decay widths of the two decay channels separately. As a result, our bound on BR(τ → Xe) is stronger than theirs by about one order of magnitude, while both works obtained similar constraints on BR(τ → Xµ).

τ → lγ
The ALP couplings to the charged leptons can also lead to the decays τ → l α γ via either one or two LFV couplings. In the case of two LFV couplings, the processes must involve the coupling g µe , which we set to zero throughout the paper. Therefore, we focus on the combinations of one LFV coupling and one diagonal coupling. The relevant coupling combinations are (g τ τ , g τ e ) and (g τ e , g ee ) for Scenario 1, and (g τ τ , g τ µ ) and (g τ µ , g µµ ) for Scenario 2. Since we take g τ τ = 0 in this paper, we derive only bounds on the coupling products g τ e ·g ee and g τ µ ·g µµ from the experimental upper bounds BR(τ → eγ) < 3.3×10 −8 and BR(τ → µγ) < 4.4 × 10 −8 [94], respectively.
Under these conditions, the ALP contributions to these decays consist of two categories: two-loop diagrams with an effective g eff γγ coupling, and one-loop diagrams without this coupling. The former turns out to be dominant because of the logarithmic dependence on the ultraviolet cutoff. The decay width of τ → l α γ is computed with the following analytic expression [36,37]: where α = e, µ, and F α 2 (0) and G α 2 (0) are dimensionless form factors consisting of two parts that are naively linear and quadratic in the ALP-lepton couplings: (3.14) We have assumed g eµ = g µe = g τ τ = 0, and confined ourselves only to the couplings that are turned on in Scenarios 1 and 2 for simplicity. These couplings, if non-zero, could also lead to τ → l α γ decays with two ALP-lepton couplings. For full and complete formulas, see Refs. [36,37]. The resulting bounds on g τ e g ee and g τ µ g µµ are roughly 10 −6 GeV −2 for m X between 0 and 2 GeV. Note that we have fixed Λ = 1 TeV for the logarithmic dependency (see Eq. (3.13)).

Leptonic decays of the muon
Multiple processes of muon decays into electrons or photons can be induced by the ALP couplings. For instance, the decay µ → eγ [28,29], which is extremely rare in the SM, can be mediated either by g µe together with g ee or g µµ , or by the two LFV couplings g τ µ and g τ e . The experimental limits on BR(µ → eγ) [30] place very strong bounds on these coupling products. Since in this work we take g µe = 0 and do not switch on g τ µ and g τ e simultaneously, both of these two contributions vanish (see Eqs. (11)(12)(13)(14) of Ref. [37]).
Similarly, the processes µ → 3e, µ → eγγ, and µ → e + missing can all be used to derive bounds on the coupling g µe (and g ee ). However, since we have restricted ourselves to the case of vanishing g µe , we do not study these processes in detail.

Leptonic g − 2 anomalies
One of the most sensitive avenues for testing the SM is the leptonic anomalous magnetic moments. The long-standing discrepancy in the muon g − 2 between the SM predictions and experimental measurements has been an important drive for searching for new physics. The latest determination of the fine-structure constant α EM makes the electron anomalous magnetic moment consistent with the SM prediction to within 1.6σ [95]: On the other hand, the recently published result of the Fermilab-based Muon g − 2 experiment [96], when combined with that of the E821 collaboration at BNL [97], has an excess of about 4.2σ over the latest SM theoretical computation [98]: The discrepancy may be explained by an ALP [37,99,100]. In principle, both diagonal and off-diagonal couplings could lead to new contributions to a α = (g − 2) α /2 (α = e, µ). However, since in the present work we keep g µe switched off and also take equal scalar and pseudoscalar couplings, the LFV couplings contributions vanish [37]. The diagonal coupling contributions to a l are where x α = m 2 X /m 2 α , g eff γγ can be computed by Eq. (2.7), and In Eq. (3.17), the second term, − m 2 α 4π 2 |g αα | 2 h 1 (x α ), is strictly negative. Therefore, in order to explain the observed discrepancy, the first term in Eq. (3.17) must be positive and large enough to overcome the negative second term. We find that for the electron g − 2 case, the leptophilic axions considered in Scenario 1 can bring the theoretical prediction to reach the 95% lower limit of ∆a e but not up to the central value. On the other hand, for the muon g −2 discrepancy the ALPs in Scenario 2 cannot even bring the theoretical prediction to the edge of the 95% lower limit of the experimental value (cf. Eq. (3.16)). For these reasons, we do not show their corresponding limits in the sensitivity plots in Sec. 5. Lastly, we note that a very recent lattice calculation of hadronc contributions to the muon g − 2 pushed the SM prediction to within 1σ of the experimental result [101].

Beam-dump experiments and dark-photon searches
In our m X range of interest, results from the E137 and E141 beam-dump experiments at SLAC [102], and similar ones at CHARM [103], Orsay [104], and Fermilab [105], typically constrain the coupling g ee to be outside the range 10 −5 GeV −1 to 1 GeV −1 when taking the axion to radiate off the electron beam [46,47]. Moreover, assuming universal diagonal couplings g ee = g µµ = g τ τ , a dark-photon search performed at BaBar [21,48] can be recast into bounds on these couplings [21]. The limits obtained exclude g ee above ∼ 6 × 10 −3 GeV −1 for m X between about 210 MeV and 8 GeV. 3 We extract these results from Ref. [21] and show them in Fig. 5. Figure 5. Constraints on g ee as a function of m X , reproduced from Ref. [21]. Note that the BaBar dark photon bound is valid for universal diagonal couplings g ee = g µµ = g τ τ .

Muonium-antimuonium oscillations and µ − → e − conversion
The bound state of e − and µ + is known as muonium M . In the presence of lepton flavor violation, the muoniums may oscillate into antimuoniumsM (e + µ − ). The strongest bound on the probability of spontaneous muonium to antimuonium conversion, P MM < 8.3×10 −11 at 90% C.L., was obtained by the MACS experiment in 1999 [106]. For the leptophilic ALPs, only the coupling g µe could result in such an oscillation. Since we keep this coupling off in this work, we do not go into the details of the derivation of the bound.
Similarly, µ − → e − conversion in nuclei is also a very sensitive probe of LFV [107]. In the context of the leptophilic ALPs, this process also only concerns the coupling g µe , and is not studied here. Bounds derived from past experiments [108] and those projected for future experiments [109,110] can be found in Ref. [37].

Prompt and displaced searches at Belle II
In this section we discuss our recast of the Belle prompt LFV search, projected to the Belle II sensitivity, as well as the displaced search that we propose to conduct. Our results are then shown in Section 5.
Belle II is an ongoing B-factory experiment hosted at the KEK laboratory in Tsukuba, Japan. It collides an electron beam of energy E e − = 7 GeV with a positron beam of E e + = 4 GeV, reaching a center-of-mass energy of 10.58 GeV. The experiment has so far collected about 200 fb −1 , less than the samples collected by the BaBar and Belle experiments, approximately 500 fb −1 and 1 ab −1 , respectively. However, Belle II is planned to eventually collect a sample of 50 ab −1 . With a cross section σ(e + e − → τ + τ − ) ≈ 0.92 nb, this implies 4.6 × 10 10 τ -pair events produced.

Recast of the prompt search
We first recast the Belle prompt search for lepton-flavor-violating τ decays into three charged leptons, which used a sample of 7.19 × 10 8 τ -pair events [58].
The background level reported in Ref. [58] ranges from 0.01 ± 0.01 events in the τ − → µ + e − e − channel to 0.20 ± 0.15 events in the τ − → e + e − e − channel. We assume that the number of background events will remain small even with the much larger sample of Belle II, sufficiently so that it can be ignored in our Belle II projection.
The event-selection efficiency reported in Ref. [58] for each final state is reproduced in Table 3. We refer to this as the "baseline efficiency". In addition, since the ALP in the region of sensitivity can be long-lived, its daughter leptons may escape detection in the analysis optimized for prompt decays, reducing the efficiency. We use events simulated with Pythia8.245 [111] and without detector simulation to estimate the efficiencies as functions of the ALP lifetime and mass. While the beams at Belle II collide with a small crossing angle, we simplify this and simulate head-on collisions, with the electron beam momentum in the +z direction.
As required in Ref. [58], we require the transverse and longitudinal impact parameters of both charged leptons produced in the ALP decay to satisfy d 0 < 5 mm and z 0 < 30 mm, respectively. The computation of z 0 is performed with where x, y, and z are the production position coordinates of the charged lepton with respect to the IP, p x , p y , and p z are the components of the lepton momentum in the three spatial directions, and p 2 T = p 2 x + p 2 y is its transverse momentum squared. The calculation of the transverse impact parameter is more involved. For straight tracks, it can be approximated as However, given the relatively low momenta and possibly large flight distances, the track curvature resulting from the local magnetic field can be non-negligible. We estimate the radius of the track R as where T stands for Tesla and B = 1.5 T is the strength of the magnetic field. The true transverse impact parameter d 0 is then computed with where r labels the transverse decay position of the ALP from the interaction point (IP). Furthermore, we require r < 10 cm. This requirement does not appear explicitly in Ref. [58], but we apply it in order to obtain a more realistic simulation of the minimal number of detector hits needed for track reconstruction. This requirement is important only for very light ALPs, which are highly boosted and can pass the d 0 and z 0 requirements even when decaying far from the IP. It is redundant for heavy ALPs, whose daughters are separated by a large angle and would anyway fail the d 0 and z 0 cuts.

Proposal of a displaced-vertex search
Since the prompt search is not optimized for detection of a long-lived ALP, we propose to complement it with a displaced search, in which the maximal d 0 and z 0 requirements of the prompt analysis are replaced with a minimal requirement on the ALP flight distance, r > 1 cm. Since this displaced-vertex requirement leads to a large reduction in background (see, e.g., Ref. [61]), the background in this search is even smaller than the negligible background in the prompt search.
Given the strong background suppression provided by the displaced vertex and the sub-event background level of the prompt Belle analysis [58], we expect that many of the cuts applied in the prompt analysis can be avoided in the displaced one. For example, based on the efficiencies reported in Ref. [58], removing the missing-momentum cut and applying lepton identification criteria on only one of the leptons increases the baseline efficiency by a factor of 1.5 in scenario 1 and 2.3 in scenario 2. These efficiency enhancements relative to the prompt analysis are included into the final sensitivity estimates for the displaced search reported in Section 5.
In addition to the baseline efficiency, we determine the detection efficiency variation as a function of the ALP decay position following the procedure used in Refs. [63,72]. We require the ALP to decay inside the fiducial volume of the tracker, defined by the transverse and longitudinal distances from the IP: 1 cm < r < 80 cm and −40 cm < z < 120 cm, respectively. In addition, we account for the tracking efficiency decrease with the transverse distance from the IP. This so-called "displaced-tracking efficiency" track is parameterized as a linear function of r, so that track = 1 for r = 1 cm and track = 0 at r = 80 cm. In particular, in the large decay length limit, track is effectively reduced to an overall factor of 50% on average, for decays inside the fiducial volume.

Computation and Simulation
The number of signal events at Belle II is computed with: where l α = e or µ, N τ − τ + = 4.6 × 10 10 is the total number of τ pair events produced at Belle II, denotes the final efficiency of event selections as detailed above, and BR(τ → 1 prong) ≈ 85% is the SM decay branching ratio of the τ lepton into a single charged particle and neutrinos. Application of this factor assumes that the events are identified as e + e − → τ + τ − by requiring the topology of one track recoiling against the three leptons from the signal decay chain. The factor 2 is due to the fact that in each signal event there are two τ leptons. For vanishing background, N Belle II S = 3 corresponds to the 95% C.L. exclusion limits.
The simulations are performed on a grid in the model parameters. For each grid point, we simulate the process e − e + → τ − τ + . The τ leptons decay to X + e or X + µ, and the ALP decays to e − e + or µ − µ + , respectively. Pythia automatically processes the decay positions of the simulated ALPs according to the exponential decay distributions using the ALP lifetime and boost factor. Then the prompt and displaced sets of event selection criteria are imposed at the parton level to obtain the efficiency separately for each of the two search strategies. Thus, we obtain the efficiencies for the two strategies at each grid point.

Numerical Results
We now proceed to present numerical results in this section. We concentrate on the two representative benchmark scenarios introduced in Sec. 2 for detailed numerical studies in Secs. 5.1 and 5.2 , while for other further possibilities we will provide a qualitative discussion in Sec. 5.3. Since, as explained in Sec. 4, for both prompt and displaced searches we expect zero background events, we use 3-signal-event isocurves to show exclusion limits at 95% C.L.

Scenario 1: g τ e and g ee
In Scenario 1, the couplings g τ e and g ee are turned on, mediating the production and decay of the ALP, respectively, while the other couplings are all considered as vanishing. Thus, the signal events are τ → Xe with X → e − e + .
We show in Fig. 6 the cut-specific event selection efficiencies for the signal events as functions of cτ X for the prompt and displaced analysis and for three benchmark masses m X = 0.005 (blue), 0.5 (red), and 1.5 (black) GeV. Fig. 7 presents the estimated 95% C.L. sensitivities at Belle II for the prompt and displaced searches. The upper plot shows the projected limits on the branching fraction product BR(τ → Xe)·BR(X → e + e − ) as a function of cτ X for the three benchmark ALP masses. Since both searches have no background, their maximal sensitivities are approximately equal, with the prompt (displaced) search being more sensitive at shorter (longer) ALP lifetimes. Focusing our attention on the displaced search, which is proposed here for the first time, we observe that it is up to about 40 times more sensitive to the branching-fraction product than the prompt search at large values of cτ X . This translates to about a factor of 6 in the sensitivity to g τ e for a given value of g ee , as seen in the lowerleft plot. We note that the next-tightest constraint, which comes from lepton universality Figure 6. Efficiencies vs. cτ X for the signal events of Scenario 1, with three benchmark values of m X : 0.005, 0.5, 1.5 GeV. Left: prompt cut-specific efficiencies. Right: displaced cut-specific efficiencies. "Cut 1" denotes the baseline efficiency. For the prompt search, "cut 2" is the efficiency of requiring d 0 < 5 mm, z 0 < 30 mm, and r < 10 cm. For the displaced search, "cut 2" gives the efficiency of applying the fiducial-volume and the displaced-tracking requirements.
in τ → lνν (see Section 3.4), yields g τ e 10 −7 GeV −1 (which is valid only for long-lived ALPs and hence small g ee coupling). Since this is much weaker than the sensitivity reach of the prompt and displaced searches, it is not shown on the same plot.
In the lower-right plot we study the particular case g τ e = g ee , and present the limits on the coupling value vs. m X for both Belle II and Belle luminosities, together with bounds from existing measurements. Since the τ → lνν lepton-universality bound is only valid for long-lived ALPs, we show the corresponding excluded parameter space only for cτ X 1 m. In this case, the tightest bound in the case of small couplings is obtained from the leptonuniversality constraint. However, the prompt τ → 3 and the displaced search proposed here are uniquely sensitive to a mass-dependent range of coupling values between about 2 × 10 −5 and 2 × 10 −3 .
We note that our "model-independent" limits on the branching-fraction product can be used to obtain constraints on other models with similar scattering and decay topologies. Examples include a light new gauge boson Z [90,112,113] or a light CP -even scalar [114][115][116].

Scenario 2: g τ µ and g µµ
In Scenario 2, the non-zero couplings g τ µ as well as g µµ are responsible for the production and decay of the ALPs, respectively. The sensitive mass range is now between 2 m µ and m τ − m µ .
In Fig. 8 we show the signal event selection efficiencies as functions of cτ X , for m X = 0.5, 1.0, and 1.5 GeV. These plots are similar to those of Fig. 6, except for some differences that mainly stem from the different mass choices, as well as different baseline efficiencies. Sensitivity results for Scenario 1 at 95% C.L. Top: model-independent limits on the branching-fraction product BR(τ → Xe)·BR(X → e − e + ) vs. cτ X for various ALP masses. Bottom left: model-dependent bounds from Belle and Belle II on g τ e vs. g ee for the prompt and displaced searches at Belle II for various ALP masses. Bottom right: bounds on the couplings vs. ALP mass in the specific case g τ e = g ee , compared with bounds from lepton universality in τ → lνν [55,92], τ → eγ [94], SN1987A [89], beam-dump experiments [46,47], and a dark-photon search [21,48]. Note that the lepton-universality constraint is only valid for long-lived ALPs, resulting in the upper cutoff of the corresponding region.
We present the final sensitivity estimates for Scenario 2 in Fig. 9 with the same format as in Fig. 7. Due to the similar efficiencies of the two scenarios, the branching-fraction bounds are similar. However, the g µµ sensitivity of scenario 2 is significantly enhanced, mainly because the width Γ(X → ll) is proportional to the lepton mass squared. As a result, in the case g = g τ µ = g µµ shown in the bottom-right plot, the displaced search can probe g values that are smaller by more than a factor of 10 than those of the leptonuniversality constraint. The measurement of Γ(τ → µνν)/Γ(τ → eνν) is larger than the SM expectation. Therefore, the presence of g τ µ and g µµ couplings will bring the prediction closer to the central value of the measurement, so that the valid parameter space satisfies R SM +X µe /R SM µe − 1 < 0.0072 + 2 × 0.004. In contrast to the case of Scenario 2, the presence Figure 8. Efficiencies for Scenario 2 (g τ µ and g µµ are non-zero). The format is the same as in Fig. 6, except for the different set of mass choices: m X = 0.5, 1.0, and 1.5 GeV. Figure 9. Sensitivity results for Scenario 2 (g τ µ and g µµ are non-zero). The format is the same as in Fig. 7, except for the different mass choices.
off g τ e , g ee in Scenario 1 will bring the prediction further away from the central value of the measurement, thus resulting in a more stringent constraint. The valid parameter space has to satisfy R SM +X eµ /R SM eµ − 1 < −0.0072 + 2 × 0.004. It implies that the region excluded in Scenario 1 is much larger than that of Scenario 2.
We note that, while we focused on X → µ + µ − decays that take place inside the tracker fiducial volume, the muon tracks can also be found when the decay takes place in the muon detector, which extends out to a distance of about 3.5 m from the IP. Naively, the somewhat larger decay distance allows for a somewhat stronger reach. However, since the muon detector has no magnetic field, it is impossible to determine the momenta of the muons in this case. This would likely result in high background levels that would make this approach less sensitive.

Other Scenarios
We discuss briefly some other coupling scenarios.
One possibility is to take the combinations (g τ e , g µµ ) or (g τ µ , g ee ) to be non-zero. However, these results would be rather similar to those of Scenario 2 and Scenario 1, respectively, except at the edges of phase space. This is because the production of the ALP is dominantly proportional to m 3 τ , and is not affected by the light lepton or X masses except at the kinematic threshold (cf. Eq. (2.3)). Another possibility is to take both g τ e = g τ µ and g ee = g µµ to be non-vanishing. In this case the ALP decay is dominated by the di-muon channel, because m µ m e , and the production rate is approximately doubled compared to that in Scenarios 1 and 2 with the same LFV coupling values. Therefore, the end results are similar to those of Scenario 2 presented here, except for a factor of 2 in the production rate. It is nevertheless worth mentioning that LFV muon decays such as µ → eγ can place strong constraints on this scenario.
We also consider the possibility of a non-zero g eµ . According to Eqs. (2.5) and (2.4), equal values of g eµ and g µµ lead to equal widths for X → e ± µ ∓ and X → µ + µ − except near the kinematic thresholds. Therefore, we expect the sensitivity for g eµ , together with, e.g., non-vanishing g τ µ , to be similar to that obtained for Scenario 2. We note that LFV muon-decay processes such as µ → e+ missing can strongly constrain g eµ for m X < m µ .
Finally, if the coupling g τ τ is non-zero, it would lead to ALP-strahlung production of the ALP by radiation off the τ (see Ref. [117] for an experimental search at BaBar with this production mechanism) and would loop-induce the decay X → γγ for the m X range considered in this work.

Conclusions
In this paper we investigated the sensitivities of the B-factory experiment Belle II to a leptophilic axion-like particle X with lepton-flavor-violating (LFV) couplings. The decay channels of interest are τ → Xl, where l = e, µ, with the ALP undergoing the decay X → l + l − . The ALP can also decay to a pair of photons via a lepton loop. We do not consider this decay as a search signature, but we do account for it when calculating the signal branching fractions and ALP lifetime. Depending on the ALP lifetime, its decay may be effectively prompt or may take place visibly away from the interaction point. Correspondingly, we studied the sensitivity of both a prompt search and a displaced-vertex search.
In the case of the prompt search, we recast the Belle search for τ decays into three charged leptons [58] to derive constraints on the couplings in the scenarios of interest. The displaced search involves identifying the displaced vertex originating from the long-lived ALP decay. In both cases we used simulated events to determine the efficiency, accounting for the size of the Belle II tracker and for the dependence of the efficiency on the ALP decay position.
We focused on two benchmark scenarios for detailed numerical results: Scenario 1 (g τ e and g ee are non-zero) and Scenario 2 (g τ µ and g µµ are non-zero), taking all other couplings to be vanishing. The signatures for the two scenarios are, respectively, τ → Xe, X → e − e + and τ → Xµ, X → µ − µ + . Since the prompt Belle search was background-free [58], we expect this to also be the case at Belle II, particularly with the displaced search. Therefore, we present 3-signal-event isocurves to indicate the Belle II sensitivity at 95% C.L. exclusion limits. As a function of the ALP mass, we show both model-independent limits on the branching-fraction product BR(τ → Xe)·BR(X → e − e + ) vs. cτ X and model-dependent limits on the ALP couplings. Generally, the displaced-vertex (prompt) search shows better sensitivity for small (large) values of g αα , corresponding to long (short) decay lengths of the ALP. For long decay lengths, the displaced-vertex search can extend the prompt search's sensitivity to the branching-fraction product by a factor of about 40, corresponding to about a factor of 6 in g τ l .
The model-dependent results are also presented, for the case of equal production and decay couplings, in the plane g = g τ α = g αα vs. m X . In this case, in Scenario 1, the proposed displaced search is about as sensitive as constraints obtained from lepton-flavor universality tests in τ → lνν. However, in Scenario 2, the displaced search is sensitive to coupling values that are more than an order of magnitude smaller than those of the lepton-universality constraints.
Generally, we find that Belle II can probe the LFV coupling g τ α down to below 10 −10 GeV −1 , with the best sensitivity reached for a different value of g αα in each scenario.