Looking forward to Lepton-flavor-violating ALPs

We assess the status of past and future experiments on lepton flavor violating (LFV) muon and tau decays into a light, invisible, axion-like particle (ALP), $a$. We propose a new experimental setup for MEG II, the MEGII-fwd, with a forward calorimeter placed downstream from the muon stopping target. Searching for $\mu \to e a$ decays MEGII-fwd is maximally sensitive to LFV ALPs, if these have nonzero couplings to right-handed leptons. The experimental set-up suppresses the (left-handed) Standard Model background in the forward direction by controlling the polarization purity of the muon beam. The reach of MEGII-fwd is compared with the present constraints, the reach of Mu3e and the Belle-II reach from $\tau \to \ell a$ decays. We show that a dedicated experimental campaign for LFV muon decays into ALPs at MEG II and Mu3e will be able to probe the ALP parameter space in an unexplored region well beyond the existing astrophysical constraints. We study the implications of these searches for representative LFV ALP models, where the presence of a light ALP is motivated by neutrino masses, the strong CP problem and/or the SM flavor puzzle. To this extent we discuss the majoron in low-scale seesaw setups and introduce the LFV QCD axion, the LFV axiflavon and the leptonic familon, paying particular attention to the cases where the LFV ALPs constitute cold dark matter.


Introduction
Probes of the Standard Model (SM) based on rare processes with charged leptons are set to improve substantially in the next decade. The muon beam experiments MEG II [1], Mu3e [2,3], COMET [4] and Mu2e [5] will collect unprecedented datasets using O(10 15 − 10 17 ) muons each. Similarly, Belle-II is expected to collect roughly 5 × 10 10 τ + τ − pairs [6], exceeding by more than an order of magnitude the datasets at Belle and BaBar. The standard New Physics (NP) targets for these experiments are rare lepton flavor violating (LFV) transitions 1 induced by dimension-6 NP operators with SM particles on the external legs. The NP operators are suppressed by the heavy NP scale Λ so that the corresponding LFV branching ratios scale as BR ∝ 1/Λ 4 . Assuming O(1) Wilson coefficients for the dimension-6 operators the reach on the scale Λ is expected to exceed 10 8 GeV during the ongoing experimental campaign [7]. This can be contrasted with LFV decays into a light axion-like particle (ALP), a, in which case the LFV experiments probe much higher NP scales. The µ → ea, τ → µa or τ → ea decays are induced by dimension-5 operators so that the LFV branching ratios scale as BR ∝ 1/f 2 a , where f a is the ALP decay constant. The projected bounds on LFV decays then translate to a reach on f a that, as we will show below, could exceed 10 10 GeV, assuming O(1) flavor violating couplings. These scales are among the highest we could probe with ground-based experiments and are well above the present astrophysical constraints induced by the coupling of the light ALP to electrons. This conclusion is the main result of our paper and is presented in Fig. 1.
More broadly, in this paper we summarize the status of LFV ALP searches. The experimental difficulty is that the → a decays look very similar to the SM decays, resulting in a single visible object plus missing energy (since the ALP is long-lived on detector scales in a large region of the allowed parameter space). As a consequence, the → a decays are not covered by the standard LFV searches and require dedicated experimental strategies/setups. We show that the experimental strategies to improve the coverage of the LFV ALP parameter space depends crucially i) on the ALP mass, and ii) on the chiral structure of the ALP couplings to the SM. For this reason we explore the full range of ALP masses from an effectively massless ALP with m a m e to a massive one up to m a m τ , as well as all the possible chiral structures of ALP couplings to the SM leptons.
At present, the best bounds on f a from the µ → ea decays give f a 10 9 GeV [8]. The reach can be substantially improved with the next generation of experiments, mainly due to the increased integrated luminosities. For instance, a combination of the experiment by Jodidio et al. from 1986 [9] and the TWIST experiment from 2015 [10] gives the best present bounds on BR(µ → ea) based on merely 10 7 − 10 8 stopped muons. This luminosity is at least seven orders of magnitude less than those expected at MEG II and Mu3e.
Taking full advantage of the available datasets will require adjusted experimental approaches. In Section 3.2 we put forward a new experimental strategy to improve the bound on BR(µ + → e + a) using MEG II. The main idea is to mimic the 1986 experiment by Jodidio et al., utilizing that µ + is polarized antiparallel to the beam direction, up to depolarization effects. We study the feasibility of a forward detector configuration in MEG II which we call MEGII-fwd ("fwd" for forward), where a calorimeter is placed in the forward direction relative to the muon beam. In this configuration the µ + → e + a decay can be detected by searching for a positron of maximal energy emitted in the direction opposite to the polarization of µ + . For a highly polarized muon beam the SM background from µ + → e + νν is strongly suppressed in this part of the phase space, while the µ + → e + a decay is allowed for an LFV ALP with nonzero right-handed couplings to the SM leptons. We estimate the reach of this setup for two weeks of dedicated run at MEG II and for different configurations of the magnetic field, which will be crucial to control the polarization of the muon beam and the positron yield in the forward direction. We also compare the sensitivity of our proposal to the one that could be obtained at Mu3e by performing an online analysis of the positron spectrum obtained from triggerless data acquisition [11]. Our conclusion is that the two experiments are complementary and, given their timelines, MEGII-fwd has a chance to explore new ALP parameter space several years before Mu3e.
Accessing this new portion of ALP parameter space could unveil connections between ALP Dark Matter (DM) and lepton flavor violation. Indeed, for ALP masses below 1-10 eV, the ALP is a viable DM candidate, if it is produced non-thermally through the misalignment mechanism. The reach on LFV coupling from MEGII-fwd and Mu3e is then complemented by on-going and future experiments sensitive to the couplings of ALP DM to photons and electrons.
While the bulk of our analysis is based on a model-independent parametrization of ALP couplings at low energies, we do explore the implications of the projected experimental sensitivities for a number of different NP scenarios, where the presence of a light ALP can be motivated by the strong CP problem, Dark Matter (DM), the SM flavor puzzle or neutrino masses.
First, we show how LFV decays arise naturally in QCD axion models of the DFSZ-type [12,13]. In these models the axion solves the strong CP problem which then connects the axion mass, m a , and decay constant, f a , giving m a ∝ 1/f a . Furthermore, such a QCD axion can make up for the total amount of DM in the Universe [14][15][16][17]. The future reach on µ + → e + a decays could explore new axion parameter space in the mass range 30 meV m a 0.4 meV, where the upper bound is given by astrophyical constraints. Interestingly, this high mass range requires either a non-standard cosmology [18] or special axion initial conditions [19] in order for the axion to be the DM. The same mass range presents severe experimental challenges for axion DM detection through flavor diagonal couplings. While these could be possibly overcome by ongoing axion-mediated force experiments [20] or new experimental ideas [21,22], the QCD axion could also well be first observed through its flavor violating couplings. For related studies where flavor violating couplings are (only or also) in the quark sector, see [23][24][25][26][27][28][29][30][31][32][33].
Second, we discuss the reach of LFV decays in the parameter space of the familon. This is the pseudo-Goldstone boson associated with the spontaneous breaking of the lepton flavor symmetry which could explain the charged lepton hierarchies via the Froggatt-Nielsen mechanism [34]. In this setup we show how the strength of LFV decays is correlated with the texture of neutrino masses. A similar construction could also simultaneously address the strong CP problem and the flavor puzzle in the quark sector as proposed in Refs. [24,33] (and was therefore dubbed as "axiflavon"). In the axiflavon setup we show how the flavor violation in the quark sector could be naturally suppressed in a U (2) flavor model similar to the one presented in Ref. [35]. This would leave LFV decays as the main experimental signature to hunt for. Similarly, our updated LFV sensitivities will probe the parameter space of axion or relaxion models which try to address the flavor puzzle [36][37][38][39][40].
As a final example of a well-motivated LFV ALP setup we discuss a class of majoron models where the breaking of lepton family number is decoupled from the spontaneous breaking of the lepton number [41]. In this context the future reach of Mu3e and Belle II will explore new parameter space beyond the present astrophysical bounds.
The paper is organized as follows. We start by setting up the notation in Sec. 2, followed by the discussion of µ → ea searches in Sec 3, with section Sec. 3.1 devoted to the review of past searches, while the future proposals, both our proposal for MEGII-fwd as well as the prospects at Mu3e, are presented in Sec. 3.2. Comments on µ → e conversion are given in Sec. 3.3. Sec. 4 contains a short summary of searches for µ → eγa decays, while in Sec. 5 we summarize the bounds and prospects for tau decays. In Sec. 6.1 we discuss the astrophysical constraints coming from star cooling and SN1987a, comparing these with the reach of LFV decays. In Sec. 7 we discuss several models where LFV violation arises naturally: in Sec. 7.1 the LFV DFSZ axion, in Sec. 7.2 the LFV axiflavon, in Sec. 7.3 the leptonic familon, and in Sec. 7.4 the majoron. Finally, our conclusions are presented in Sec. 8.

Notation and Summary
The ALP is a (pseudo-) Nambu-Goldstone boson (PNGB) and couples derivatively to the SM fermions. The interaction Lagrangian is thus given by where C A i i is a real diagonal matrix 2 and C V,A i j are hermitian matrices in flavor space, while the summation is over i, j = 1, 2, 3, and we ignore possible axion couplings to quarks. For ALP mass we take m a < m τ , where m a could be well below the electron mass. The decay constant f a is related to the spontaneous breaking scale of the symmetry the ALP is associated with. We do not assume any relations between the couplings in Eq. (2.1), and discuss the experimental bounds and prospects separately. For these 6+3 couplings we also introduce the short-hand notation (2.2)  Figure 1. Summary of the present bounds and future projections for an ALP with generic couplings to leptons, i.e., we set C V = C A = 1 for all the couplings in Eq. (2.1). For the isotropic case we set C V µe = 0 and C A µe = 1 (the opposite choice leads to the same results). In the V ± A case we set C V µe = ±C A µe = 1. The gray shaded regions are excluded by the astrophysical bounds from star cooling due to C A ee and by SN1987A due to C A ee and C A µµ , see Sec. 6.1. We present these bounds for the isotropic case. The blue shaded region corresponds to a prompt/displaced ALP. The green solid line is the exclusion due to the bound on µ + → e + a by Jodidio et al., assuming an isotropic ALP [9]. The green dotted (dashed) line is our recast of this bound for the V − A (V + A) case. The sensitivity in the V − A case is worse since then the signal is suppressed in the forward direction as much as the background. The blue solid (dotted, dashed) lines are the bounds from the TWIST experiment on isotropic (V − A, V + A) ALP [10]. The dark orange thin solid line is the MEGII-fwd projection for an isotropic ALP with no magnetic focusing while for the orange thin solid line we assumed that focusing increases the luminosity in the forward direction by a factor of 100, cf. Sec. 3.2 for details. The dark red thin solid line is the Mu3e projection from [42], for the isotropic ALP. The sensitivity for the other chiral structures is expected to be similar since there is no background suppression in this setup. The purple solid line is the bound from the τ → µa search by the ARGUS collaboration [43], and does not dependent on the chirality of the ALP couplings. The purple thin line is the projected reach at Belle-II, see Sec. 5 for details. The bound on µ + → e + aγ from Crystal Box is subdominant, see Sec. 4, and is not displayed.
When kinematically allowed, the couplings in Eq. (2.1) give rise to LFV decays with the (invisible) ALP in the final state. 3 The corresponding total decay width is given by 3) For comparison we also show the bound on µ → ea from SN1987A derived in Sec. 6.1 even though it is subdominant relative to the existing bounds from ground based experiments. Right: The only nonzero coupling is C τ µ = 1. The plot for C τ e = 1 is similar, and is not displayed for brevity.
where for simplicity we neglected the mass of the final-state lepton. The differential decay rate reads (in the same m j = 0 limit) where θ is the angle between the polarization vector,η, of the decaying lepton i and the momentum of the final state lepton j , while P i is the polarization of the decaying leptons. The convention used for P i is such that for the phenomenologically most important case of µ + → e + a decays we have P i =η ·ẑ, whereẑ is the beam axis. The µ + are predominantly polarized antiparallel to the beam direction, thus P i < 0 and θ is the angle between −ẑ and the momentum of the positron, cf. Fig. 4 (left). The total width of the ALP can be computed as a function of its mass by summing the different partial decay widths (2.5) Since we restricted the ALP mass to m a < m τ , only the decays to photons, neutrinos, electrons, and possibly muons are kinematically open. The corresponding partial decay widths are that will be testable in future large scale structure surveys [50,51]. µ → e a 2.6 × 10 −6 * F µe (V or A) 4.8 × 10 9 Jodidio at al. [9] µ → e a 2.5 × 10 −6 * F µe (V + A) 4.9 × 10 9 Jodidio et al. [9] µ → e a 5.8 µ → e a γ 1.  Table, with future projections listed in the lower part. The bounds assume m a below the mass resolution of the experiments considered here (see Fig. 1 for modifications when m a is sizable). These follow from 90% C.L. ( * ) and 95% C.L. ( * * ) bounds on branching ratios in the 2nd column, rescaled using Z 95 /Z 90 = 1.3 when necessary. The MEGII-fwd projections are obtained for two different sets of assumptions: MEGII-fwd assumes δx e = 10 −2 and P µ − 1 = 10 −2 with no focusing, while MEGII-fwd in contrast sets the focusing to F = 100, roughly what was achieved in the 1986 experiment by Jodidio et al [9], cf. Sec. 3.2 for details. The Belle II projection for τ → µa is rescaled from the Belle MC simulation in Ref. [48], while the one for τ → ea is rescaled directly from the ARGUS result [43]. ( # ) The Crystal Box bound on F µe can vary between (5.1 − 8.3) × 10 8 GeV depending on the assumed positron energy loss, cf. Eq. (4.9).
where z ± = 1 − (m i ± m j ) 2 /m 2 a , so that for m a m i , j we have z ± → 1. In the limit m i = m j the result reduces to (2.8) The ALP decays to neutrinos are often suppressed, so that in the bulk of the paper we set Γ(a → ν i ν j ) = 0 (the majoron is an important exception, see Sec. 7.4). The coupling to photons, E eff , depends on the UV physics as well as on the IR derivative couplings of ALP to the SM leptons running in the loop, Here, τ f = 4m 2 f /m 2 a − i , and the summation is over the SM leptons, f = e, µ, τ . Note that the loop function in (2.9) tends to B(∞) = 0 for heavy fermions, and thus the contributions due to the derivative ALP couplings decouple in the heavy fermion limit. The anomaly contribution is encoded in the Wilson coefficient, E UV , and depends on the structure of the UV model. We use the the following normalization for the effective ALP-photon Lagrangian 4 such that for the QCD axion E UV = E/2N , where E and N are the electromagnetic and color anomaly coefficients of the Peccei-Quinn symmetry. For example, in the DFSZ-II model for the QCD axion [12,13], in which the charged leptons couple to the same Higgs as the up-quarks, one has E UV = 1/3. In Sec. 7 below, we give four explicit examples of LFV ALP models. For these we have E UV = {2/3, 10/9, (10 ÷ 24), 0} for the LFV QCD axion (Sec. 7.1), the LFV axiflavon (Sec. 7.2), the anarchic LFV familon (Sec. 7.3) and the majoron, (Sec. 7.4), respectively. If the ALP also couples to quarks and gluons, there are additional contributions to the effective photon coupling in Eq. (2.9), both from heavy quarks as well as from pions running in the loop (see Ref. [52] for complete expressions). From now on we fix E UV = 1, unless specified otherwise, since its precise value does not affect most of the physics discussed in this paper. As shown in Fig. 1, we focus in this paper on the region of parameter space where the ALP is long-lived on detector scales. As we will motivate extensively in Sec. 7, we believe that this region is the most appealing from a theoretical perspective. For the discussion of phenomenologically interesting decay channels in the displaced and prompt regions, we refer the reader to Refs. [53][54][55] and to the recent MEG limit on LFV light particles decaying to two photons, µ → eX, X → γγ [56]. ALP masses in the range 2m e ≤ m a ≤ m µ − m e are constrained by the search for the µ → 3e decays at the SINDRUM experiment [57]. Indeed, a tiny fraction of ALPs from prompt µ → ea decays leads to a → e + e − decays within the detector, even for large ALP decay lengths. Assuming 1 cm for the effective decay length measurable in the instrumented volume, the SINDRUM upper bound of BR(µ → 3e) ≤ 10 −12 is expected to constrain f a to be above 10 7 GeV for C µe = C ee = 1, which is much weaker than the other bounds discussed here. The bound on f a could be substantially improved by the Mu3e experiment [58] given the larger luminosity. However, both the precise SINDRUM exclusion and the Mu3e projection would require to take into account the effect of further experimental cuts on the vertex quality and it is beyond the scope of this paper. A complementary probe of LFV couplings in the region of heavier ALP masses is the muonium-antimuonium oscillation, which would be induced by s-and t-channel exchanges of ALPs with couplings to µe. While the resulting bounds 10 -4 10 -3 10 -2 10 -1 10 0 10 1 10 2 10 5 10 6 10 7 10 8 10 9 10 10   Table 1. On the lower axis we indicate the corresponding values for the effective axion mass defined by m i,eff = 4.7 eV × 10 6 GeV/F i . on f a are several order of magnitude below the ranges shown in Figs. 1 and 2, 5 they are stringent enough to effectively rule out the LFV ALP explanations of possible deviations in (g − 2) e and (g − 2) µ [54,55,60].
In the numerical analyses throughout the paper all the axion couplings are assumed to be real to simplify the discussion. The interpretations of the present LFV experimental results and future projections in terms of bounds on F i j are summarized in Fig. 1, assuming all the lepton couplings in Eq. (2.1) to be O(1). Fig. 2 shows instead the same constraints for the case when only a single LFV coupling is taken to be nonzero. In Figs. 1 and 2 we also show the typical reach of astrophysical bounds on the ALP decay constant coming from star cooling and SN1987A observations (see Sec. 6.1 for details). In Table 1 and Fig. 3 we summarize the current best bounds and future projections for an effectively massless m a , i.e. lighter than the typical mass resolution of the experiments considered here. This is the ALP mass range that applies to most of the concrete models discussed in Sec. 7. In the subsequent sections we discuss in detail the observables and the experiments from which these constraints were derived.
The constraints for light ALP, ma mµ, are obtained by taking mµ/ma → 1 in the two expressions above (see also similar results for heavy meson mixings in the limit of light ALP in Ref. [23]). In the future these bounds could be improved for ma few GeV at Belle II by searching for e + e − → eeµµ events [60].
3 ALPs in µ + → e + + invis. decays We first summarize the status and prospects to search for the two body µ + → e + a rare decays, where a is invisible, i.e., it decays outside the detector. The challenge of this measurement is to distinguish µ + → e + a from the background distribution of the SM µ + → e + νν decay. The µ + → e + a decay produces a monochromatic positron line in the muon rest frame at the positron momentum 1) or in terms of the positron energy, E line e m µ /2 for m a m µ . The angular distribution of the positrons depends on the initial muon polarization and the chiral structure of the ALP interactions. We discuss three representative cases that lead to distinct angular distributions: • the isotropic ALP has either C V µe = 0 or C A µe = 0. The angular distribution of the final state positrons is isotropic in the muon rest frame and independent of the muon polarization, cf. Eq. (2.4).
• the left/right-handed ALP couples only to the left/right-handed SM fermions, i.e., C V µe = −C A µe for the left-handed and C V µe = +C V µe for the right-handed ALP. The positron angular distribution is ∝ (1 ∓ P µ cos θ) for the left(right)-handed ALP.
The SM µ + → e + ν eνµ three-body decay proceeds through an off-shell W + and produces the so-called Michel spectrum with Γ µ m 5 µ G 2 F /(192π 3 ) = 3 × 10 −10 eV the total muon decay width, and θ the angle between muon polarization vector and the positron momentum in the muon rest frame, see Fig. 4 (left). The positron energy fraction x e = 2E e /m µ takes the values 0 ≤ x e ≤ 1 (when neglecting the positron mass). In writing Eq. (3.2) the NP scale was taken to be well above the weak scale, as is the case for NP models we are interested in, so that the three-body muon decay is the SM one. For further details on the SM muon properties we refer the reader to the two excellent reviews, Refs. [61,62].
For an unpolarized muon beam, P µ = 0, the SM background in Eq. (3.2) peaks at E line e = m µ /2 which corresponds to x e = 1. That is, the peak of the SM background for unpolarized muons coincides with the µ + → e + a positron line for a massless ALP. Luckily, this is not the situation encountered at the intense muon facilities. The muon flux in low energy muon beamlines such as those at TRIUMF or PSI is dominated by the muons produced from pions decaying at rest, at the surface of the production target. These muons are 100% polarized in the direction opposite to the muon momentum, i.e., P µ = −1 in the notation of Eq. (3.2). As a consequence, muon facilities produce very intense µ + fluxes of  almost 100% polarized muons. The final polarization at the stopping point varies between 80% and 100% depending on the size of the depolarization effects at the production point, during the propagation and at the stopping target. Polarization of the muon can significantly reduce the SM background. This is illustrated in Fig. 4 (right), which shows the Michel spectra as functions of x e for fixed P µ = −1 and three representative values of cosθ e = 1, 0.8, 0.6, where θ e ≡ π − θ is the angle between the positron momentum and the muon beamline, see Fig. 4 (left). 6 For 0 < cosθ e ≤ 1 the non-zero polarization moves the position of the maximum of the Michel spectrum to i.e., away from the massless ALP positron line, while for cosθ e ≤ 0 it remains at x max = 1. The SM decay rate at the position of the massless ALP positron line, x e = 1, is 6 Technically, the Michel spectra were integrated over small angular bins, cosθe ± δcos θe, where δθe = 5 × 10 −3 , which is the typical angular resolution of these experiments.
d dx e BR(µ → e ν eνµ )| xe=1,θe=θe = 2δcos θ e (1 + P µ cosθ e ) (3.4) The SM background for fully polarized muons is exactly zero for cosθ e = 1, i.e., for positrons emitted in the forward direction relative to the muon beam, up to terms quadratic in the angular resolution. However, in order to have appreciable signal rates, one cannot work in the the exact forward limit, cosθ e = 1, but need to accept events in some range aroundθ e = 0, which also makes the SM background nonzero. Furthermore, any suppression in the average muon polarization increases the SM background linearly with δP µ = P µ + 1. For the ALP positron line the background suppression is linear in the momentum uncertainty, δx e , at least within the naive assumption of Gaussian smearing, making this uncertainty dominant compared to the one on the angle, δθ e .
With judicious choice of cuts one can optimize the signal to background ratios. The only two discriminants between the SM background and the ALP signal are the momentum of the final state positron, p e , and the angle θ e . The reach of different µ + → e + a searches can be understood schematically in the cut-and-count scheme, giving where Z = 1.28(1.64) for 90% (95%) C.L. intervals, taking the limit of Gaussian statistics and using limits for one-sided intervals. Here, N µ + is the number of µ + at a given experiment, a bkd (a sig ) is the background acceptance (signal efficiency), while sys encodes the systematic uncertainties. The a bkd depends linearly on the momentum resolution, assuming the background is roughly constant in a given momentum bin. Whenever sys 1/ N µ + the reach in the branching ratio saturates independently on the muon luminosity. Fig. 4 (right) summarizes the experimentally accessible regions that can be used for µ + → e + a searches. Due to the momentum threshold for soft positrons the ALPs with masses above m a 95 MeV are unaccessible. For massive ALPs lighter than 95 MeV a standard bump hunt over the Michel spectrum can be performed, after the instrument is calibrated with the Michel spectrum endpoint. For an almost massless ALP, i.e., for masses below the momentum resolution of the experiment the positron line gets close to the endpoint of the Michel spectrum. As a consequence, a shift in the endpoint of the SM spectrum due to the momentum resolution or other uncertainties on the experimental setup would result in a spike which would be impossible to distinguish from the signal. Reducing the large systematics on the endpoint of the Michel spectrum then requires alternative ways to calibrate the instrument in this mass region. In Fig. 4 we show the two mass regions where this is required for the momentum resolutions of the two TRIUMF experiments we describe here.
In Sec. 3.1 we first review the two experimental searches which were performed at TRI-UMF: i) the 1986 experiment by Jodidio et al. [9] and ii) the 2015 TWIST experiment [10]. The two searches had very different philosophies. The setup by Jodidio et al. tried to suppress the SM background as much as possible, so that a bkd a sig in Eq. (3.5). This was achieved by using a highly polarized muon beam and measuring in the forward region with a tight angular cut and excellent momentum resolution. Crucially, magnetic focusing of the positrons improves the signal acceptance above the naive geometric one, and the sensitivity in Eq. (3.5) is maximized. The TWIST experiment relies instead on larger integrated luminosity and uses wide angular acceptance. As we show below, combining the results of the two experiments gives complete coverage of different ALP masses and chiral structures of couplings for leptons.
In Sec. 3.2 we discuss the future prospects for µ + → e + a at PSI: at MEG II and Mu3e [58]. MEG II is set to soon start its physics run, with the primary goal to improve the sensitivity to µ + → e + γ decays. The primary goal of Mu3e is to improve on µ + → e + e − e + , with data taking scheduled to start in the relatively near future. Both experiments use the πE5 beamline which provides roughly 10 8 µ/sec with muon momenta of 28 MeV, and can also be used for µ + → e + a searches. The two experimental proposals to hunt for µ + → e + a, MEGII-fwd that is part of this paper, and Mu3e-online, are very different in spirit and could be complementary given that Mu3e will presumably start the physics run after MEG II will have collected the planned muon luminosity.

Past Searches at TRIUMF: Jodidio et al. and TWIST
The 1986 experiment by Jodidio et al. [9] used two datasets, the "spin held" sample of 1.8 × 10 7 µ + , and "spin precessed" sample of 1.4 × 10 7 µ + . The two datasets both passed the trigger requirements, but differed in the magnetic field configurations that were used in the experiment. The spin held sample had much higher purity of polarized muons and was used to perform the µ + → e + a search. The spin precessed sample was used to calibrate the end point of the Michel spectrum in order to reduce the systematic uncertainty. The extremely high purity of polarized muons in the spin held sample was achieved through the use of high purity metal foils as targets, which highly suppressed the muonium production, while the strong magnetic field of 1.1 T parallel to the muon beam line suppressed the muon spin precession. The measured averaged muon polarization at the stopping point was where we assumed the SM values for the muon decaying parameters and combined in quadrature the statistical and systematic uncertainties. The µ + → e + a analysis searched for a positron line in the "spin held" data. The cuts cos θ e > 0.975 , x e > 0.97 , cos θ e 1, were measured downstream in the spectrometer after they bent by more than 90 • by a magnetic field. First, we check how well the simple cut-and-count scheme reproduces the bound derived by the experimental collaboration for the massless isotropic ALP, C V µe = 0 or C A µe = 0 : BR(µ → e a) < 2.6 × 10 −6 (90% CL) [9]. (3.9) This will prove useful when deducing the projected sensitivities in the next section. Applying the cuts (3.8) on the Michel spectrum, Eq. (3.2), gives a geo = 5.3 × 10 −5 for the geometric acceptance of the background. Moreover, for a massless ALP, the number of background events in a ±2σ band around x e = 1 can be estimated by integrating the distribution provided in Ref. [9], accounting for the quoted momentum resolution δp e p e = 0.13%. (3.10) This leads to a background efficiency of bkd = 1.7 × 10 −2 , i.e., the fraction of background events satisfying (3.8) that are in the signal region x e ∈ [1−2δx e , 1+2δx e ]. The background acceptance in the signal region of the initial spin held sample is thus a bkd = a geo · bkd = 8.7× 10 −7 . Applying instead the 2σ band cut around x e 1 directly on the Michel spectrum, Eq. (3.2), gives a similar estimate for the background acceptance, a bkd 10 −6 . This shows that using the simple analytical Michel spectrum is good enough for the purposes of our estimates in the next section. In order to obtain the correct limits we still need to take into account the effect of magnetic focusing. After magnetic focusing and the cuts in Eq. (3.8) the total number of observed events in the spin held sample is 7.4 × 10 4 (obtained by integrating the distribution published in Ref. [9]). Assuming that these events are entirely due to the SM, and neglecting the focusing, would require N µ + 1.4 × 10 9 in the initial spin held sample, while in reality N µ + = 1.8 × 10 7 . The magnetic focusing therefore leads to effectively larger geometric acceptance, 7 F ≡ a geo+focus a geo = 77.8. (3.11) The efficiency for the ALP signal in this experimental setup depends very much on the helicity structure of the ALP couplings to the SM current and on the ALP mass. For a massless isotropic ALP, i.e., m a m e and C V = 0 or C A = 0, we find a ISO geo = 1.25×10 −2 after the angular cut in Eq. (3.8). Assuming that the focusing lens act similarly on background and signal we find a ISO geo+focus = 0.97. The experimental analysis also assigns sys = 0.9 × 10 −6 systematic uncertainty on the branching ratio for an isotropic ALP. Given the numbers above, Eq. (3.5) then gives our recast bound of 2.8 × 10 −6 , which agrees quite well with the result of the fit in Ref. [9]. The agreement makes us confident that we understand the main features of the experimental setup. The above exercise also highlights that the systematic uncertainty in the experiment by Jodidio et al. was smaller than the statistical one, and thus the bound on BR(µ → ea) would have benefitted from a factor of 10 bigger muon luminosity before hitting the bottleneck of systematics.
Next, we recast the result by Jodidio et al. for different chiral structures of ALP couplings and for higher ALP masses. We assume that there are no changes in systematic uncertainties and the solenoid focusing effect. The ratios of signal acceptances are then obtained by simply applying the angular cut in Eq. (3.8) to the angular distribution predicted by the modified ALP coupling and mass, and compare it to the baseline massless isotropic ALP. We obtain a RH geo /a ISO geo 2 and a LH geo /a ISO geo 0.012 for the signal acceptances of righthanded (RH) ALP and left-handed (LH) ALP relative to the isotropic case, respectively. Rescaling the bound in Eq. (3.9) gives for the massless right-handed ALP BR(µ → e a) < 2.5 × 10 −6 (90% CL) . For pure left-handed ALP the systematic uncertainties related to the determination of the background endpoint are expected to grow significantly. The reason is that in this case the signal is suppressed more in the spin held dataset than in the spin precessed one. However, in the analysis [9] the latter was assumed to not be affected by the NP signal, in order to calibrate the instrument. Assuming that this is indeed still the case gives the green dashed line in Fig. 1, which should be viewed as only roughly indicative of what the correct bound for the left-handed ALP is.
The 2015 TWIST experiment [10] collected 5.8×10 8 muons after the selection cuts were applied. The experimental collaboration studied the µ + → e + a decay, varying both the mass of the ALP and the chirality of its couplings to the SM. Their results are summarized with the blue lines in Fig. 1.
The experimental concept of TWIST is fundamentally different from the previously discussed experiment by Jodidio et al. TWIST detected positrons using a spectrometer with an approximate cylindrical symmetry surrounding the muon beam line, and momen-tum resolution at x e 1 given by, which correspond to δp e = 572 keV/ sin θ e at p e = 52 MeV. Since the momentum resolution deteriorates in the forward direction (where sin θ e → 0), the experimental strategy of Jodidio et al. cannot be implemented, even though the TWIST muon beam is highly polarized. The momentum resolution in Eq. (3.10) translates into an upper bound below which the ALP can be considered effectively massless m a 2m µ · 1MeV 14 MeV . (3.14) The 1 MeV was obtained by setting cos θ e = 0.8 in Eq. (3.10), which is the positron angle for which TWIST had the widest momentum acceptance. As shown in Fig. 1, for a massive ALP the TWIST search covers a region that was left unexplored by the 1986 Jodidio et al. result, and extends the coverage up to an ALP mass of 86.6 MeV. For higher masses the positron becomes too soft to be efficiently triggered on at TWIST. In this region older searches by Derenzo et al. [65] and Bilger et al. [66] and the recent PIENU result [67] complement the high mass coverage. We refer to the PIENU paper [67] for a summary plot focused on the high mass region.
TWIST becomes less sensitive for a massless ALP due to the systematic uncertainties related to the calibration of the x e = 1 endpoint, which limited the sensitivity. Unlike Jodidio et al., TWIST collaboration had to rely on Monte Carlo modeling to calibrate the endpoint. The quoted systematic on the momentum edge represented an irreducible bottleneck to improving sensitivity of the massless ALP search. The 90% C.L. upper bounds for massless ALP by TWIST, extracted using the Feldman-Cousins method [68], are Only the last bound, on the left-handed ALP, is stronger than our recast of the previous bound from Jodidio et al. experiment, which, as discussed before, was less sensitive to left-handed ALPs. The gain of the TWIST experiment is due to the much larger muon luminosity compared to the one available to Jodidio et al., which compensates for the worse momentum resolution at TWIST.

New searches at PSI: MEGII-fwd and Mu3e-online
A proposal for MEGII-fwd. The MEG II experiment is expected to start its physics runs soon after completion of the engineering run that started in 2019. The goal of our proposal is to achieve at MEG II a configuration similar to the one used in the experiment by Jodidio et al. [9], so that the µ + → e + a decays of polarized muons are detected in the forward region where the SM background is suppressed, see Fig. 4 and discussion in Sec. 3.1. Such a set-up requires:

cm
< l a t e x i t s h a 1 _ b a s e 6 4 = " s U / p W a 1 p 7 s k 5

l a t e x i t s h a 1 _ b a s e 6 4 = " u I 3 2 V C u H p / I I 4 w H 5 R G l n U a 2 b S O Q = " > A A A C C H i c d V D L S s N A F J 3 U V 6 2 v q k s 3 g 0 V w o S U J s e 1 G K L p x W c E + o A 1 l M p m 0 Q 2 e S M D M R S u g P u H S r H + F O 3 P o X f o M / 4 a S N Y E U P X D i c c y / 3 3 u P F j E p l m h 9 G Y W V 1 b X 2 j u F n a 2 t 7 Z 3 S v v H 3 R k l A h M 2 j h i k e h 5 S B J G Q 9 J W V D H S i w V B 3 G O k 6 0 2 u M 7 9 7 T 4 S k U X i n p j F x O R q F N K A Y K S 3 1 W s M B T y 7 P r W G 5 Y l Z r j n 1 h O 9 C s m n N k x L J r 9 Q a 0 c q U C c r S G 5 c + B H + G E k 1 B h h q T s W 2 a s 3 B Q J R T E j s 9 I g k S R G e I J G p K 9 p i D i R b j q / d w Z P t O L D I B K 6 Q g X n 6 s + J F H E p p 9 z T n R y p s f z t Z e J f X j 9 R Q c N N a R g n i o R 4 s S h I G F Q R z J 6 H P h U E K z b V B G F B 9 a 0 Q j 5 F A W O m I l r Z 4 H j + T + r E x 8 W c l n c 5 3 B P B / 0 r G r l l N 1 b p 1 K 8 y r P q Q i O w D E 4 B R a o g y a 4 A S 3 Q B h g w 8 A i e w L P x Y L w Y r 8 b b o r V g 5 D O H Y A n G + x c s x Z n W < / l a t e x i t >
µ + < l a t e x i t s h a 1 _ b a s e 6 4 = " q 9 l L v L P Figure 5. The proposed MEGII-fwd set-up. A Lyso-ECAL detector of 10 cm in diameter is placed along the muon beam line 1.5 m downstream from the stopping point. The muon polarization P µ is in the opposite direction than the detected positron.

l a t e x i t s h a 1 _ b a s e 6 4 = " d + H l L r s U M / U 3 W + 6 P a + A 3 j 5 S 6 n w s = " > A A A C C H i c b V D L S s N A F J 3 4 r P V V d e k m W A Q X U h I p 6 L L o x m U F + 4 A 2 l M n k p h k 6 m Y S Z G 6 G E / o B L t / o R 7 s S t f + E 3 + B N O 2 y x s 6 4 E L h 3 P u 5 d 5 7 / F R w j Y 7 z b a 2 t b 2 x u b Z d 2 y r t 7 + w e H l a P j t k 4 y x a D F E p G o r k 8 1 C C 6 h h R w F d F M F N P Y F d P z R 3 d T v P I H S P J G P O
• A forward detector to collect energetic forward positrons. The final sensitivity to µ + → e + a decays will depend on the energy resolution of this detector. We assume that a Lyso calorimeter (Lyso-ECAL) with a 10 cm diameter and high enough threshold for positron energies can be installed in the forward direction roughly 1.5 meters downstream from the muon stopping target, see At a similar position the MEG II collaboration planned to put a Radiative Decay Counter (RDC), with the aim to reduce the accidental background for the search of µ + → e + γ. However, RDC is designed to detect low-momenta positrons and would not be useful for the µ + → e + a search.
• A new configuration of the MEG II magnetic field in order to suppress depolarization effects and keep the µ + antiparallel to the outgoing positron. The two main sources of depolarization of the muon beam are the so called "halo muons", emitted from pions decay in flight, and the angular divergence of the beam. How close the polarization can be kept to the maximal one, P µ = −1 is crucial here, as this controls the suppression of the SM background, see (3.4), which directly affects the sensitivity to In what follows, we vary the depolarization in the range, In the predecessor experiment, the MEG experiment, the muon polarization at the stopping target was P µ = −0.86 ± 0.06 [70], which corresponds to the upper limit of the above range. The lower limit of the range assumes that the depolarization strategy similar to the one used in the 1998 experiment by Jodidio et al., Ref. [9], can be put in place. Collecting in addition a less pure sample of polarized muons would help in calibrating the endpoint of the Michel spectrum, x e = 1, exactly as it was done in Ref. [9]. 8 We are aware that the use of such a calorimeter for calibration purposes was discussed inside the MEG II collaboration. A momentum resolution δp e + /p e + = 10 −2 at p e + = mµ/2 is realistic for this ECAL [69]. We thank Angela Papa for private communications regarding this.
• A focusing lens to increase the positron luminosity in the forward direction. In the experiment by Jodidio et al., a solenoid lens was used to maximize the signal acceptance at the price of a higher SM background, cf. Sec. 3.1. Focusing enlarges the effective size of the forward detector. The angular coverage of the Lyso-ECAL is very small so that without focusing the geometric acceptance of the signal is only, for isotropic and V + A ALP, respectively. The reach on the branching ratio scales with the focusing factor F as BR(µ + → e + a) ∼ 1/ √ F as long as F ×a geo < 1 and the systematics uncertainties are subdominant. In the projections we leave F as a free parameter, noting that F ∼ O(10 2 ) was reached in the 1986 experiment by Jodidio et al., cf. Eq. (3.11).
• Sufficient time devoted to the physics run in the µ + → e + a search configuration. As a reference point we will use which corresponds to a 2 week run at an instantaneous luminosity of 10 8 µ + /sec. 9 This is seven orders of magnitude greater than the dataset collected by Jodidio et al. Crucially, the µ → ea run could be performed at the very end of the µ → eγ data taking at MEGII. This would allow to work on the necessary modifications and extend the physics purpose of MEG II before Mu3e starts.
In the left panel of Fig. 6 we show how the reach on the branching ratio BR(µ + → e + a) for a massless ALP depends on the average polarization and the angular resolution. Interestingly, from the shape of the contours one sees that augmenting the polarization purity of the muon beam should go together with an increase of the momentum resolution in order to lead to a better experimental reach. The orange star in the plot is one of our benchmark configurations, where we assume no focusing in the forward direction and set δx e = P µ + 1 = 10 −2 . Already in these suboptimal conditions, MEGII-fwd could improve on the present bound from Jodidio et al.
In the right panel we set δx e = P µ +1 and investigate the importance of the focusing. To be conservative we take as the benchmark F = 100, which corresponds to an effective diameter of the forward calorimeter roughly 10 times bigger than its actual size. This is of the same order of magnitude as the focusing achieved in Ref. [9]. However, a larger focusing would always be beneficial until F × a geo would be of order unity.
In Tab. 1 and Fig. 1 we show the estimated reach of MEGII-fwd for the two benchmark configurations described above. The reach for higher m a , shown in Fig. 1, is computed accounting for the amount of signal which will overlap with the massless hypothesis given the experimental resolution. Due to worse momentum resolution the drop in the f a reach at high masses is slower at MEGII-fwd compared to the Jodidio et al. experiment. 9 MEG II plans to run with the reduced instantaneous luminosity of 3 × 10 7 µ + /sec in order to decrease accidental coincidences, which is the dominant background in the µ + → e + γ measurement. For the µ + → e + a search one should attempt to use the full instantaneous luminosity of 10 8 µ + /sec available at the πE5 beamline at PSI.
at MEGII-fwd for a massless isotropic ALP (either C V µe = 0 or C A µe = 0) as a function of the momentum resolution δx e , and the deviation of average polarization P µ from −1, assuming there is no magnetic focusing. Right: The expected sensitivity at MEGII-fwd (orange contours) as a function of the momentum resolution δx e and focusing F , setting muon polarization to P µ = −1 + δx e . The angular resolution is assumed to be subdominant and systematic uncertainties below the statistical one. The darker green region are excluded by the 1986 Jodidio et al. experiment [9], cf. Table 1 Mu3e-online. The primary goal of the planned Mu3e experiment at PSI is to search for µ + → e + e − e + with the unprecedented sensitivity of 10 −16 . The key feature of Mu3e is that it will operate without a hardware trigger. The full detector read-out will be streamed to the filter farm at 80 GB/sec, where the µ + → e + e − e + events will be identified and eventually stored on disk.
Recently, Ref. [11] performed a preliminary study of the Mu3e sensitivity to BR(µ + → e + a), based on a more detailed simulation of the µ + → e + a channel at the phase I of the Mu3e experiment [42]. The dark thin red line in Fig. 1 shows the 95% C.L. limit on f a for isotropic LFV ALP that Mu3e is projected to achieve with a physics run of 300 days. The ∼ 10 4 improvement in f a reach over the TWIST experiment is mostly driven by the seven orders of magnitude larger dataset, which, however, does not come without challenges.
For µ + → e + a search the Mu3e will be able to reconstruct online all the single positron events corresponding to "short tracks", i.e., events with only four hits on the four detector layers. The online reconstruction of "short tracks" in the filter farm has been shown to reduce by a factor of a 100 the data rate [71], making it possible to process all the short tracks events and store positron three-momenta, p e . A search for BR(µ + → e + a) positron line can be done as a bump hunt on the smooth SM | p e | distribution, assuming that every positron event corresponds to a single track.
The tracks in Mu3e will gyrate around the magnetic field of roughly 1 T. The typical e + gyroradius will be much larger than the radius of the Mu3e instrumented region -a cylinder with radius of around 6 cm. Encountering the detector material the positron will loose momentum and eventually stop. The positron will typically stop after half a turn, i.e., after having encountered at least four detector layers: 2 central layers + 2 external layers. This justifies the assumption of one positron per one track. Positrons emitted perpendicularly to the muon beam will instead perform many turns in the central layers without being stopped in the detector material. Enforcing an angular cut will minimize the impact of these re-curling tracks. As a result, the angular acceptance of the Mu3e analysis is expected to be reduced to the region θ e < π/2 − 0.1.
The momentum resolution for short tracks will be in the δp short = 0.5 − 3 MeV range, roughly comparable to TWIST. In principle, a momentum resolution down to δp long = 0.1 − 0.45 MeV could be achieved, if an experimental upgrade allowed to also process online the "long tracks", characterized by 6 or 8 hits in the detector layers. The Mu3e sensitivity on BR(µ + → e + a) extracted from the long track analysis would then improve by a factor of δp long /δp short 0.4 [42].
Finally, we comment on the challenges of calibrating the instrument. As already mentioned, the current concept foresees to use the endpoint of the positron momentum spectrum to calibrate the online reconstructed tracks. This method will not allow to efficiently search for µ + → e + a decays when m a 10 − 25 MeV, depending on the precise positron momentum resolution. Given the large amount of single positron data collected at Mu3e, Ref. [42] showed that, within a given mass assumption for the signal, one could in principle use the same momentum spectrum to simultaneously calibrate the apparatus and to perform the peak search. In the calibration fit one would remove from the calibration dataset the signal region, which is defined as a 2δp short band below and above the expected momentum of the positron corresponding to the ALP mass. Given that for m a 25 MeV the signal region includes the Michel edge the determination of the scaling parameter x e = 1 deteriorates, resulting in a limited sensitivity at low masses. Including this effect the expected sensitivites for a massless isotropic ALP are Notice that the deteriotation computed in Ref. [42] includes only the broadening of the x e statistical distribution estimated from a toy Monte Carlo sample of 10 9 µ + (the root mean square error grows from 7 × 10 −6 for m a = 60 MeV to 3.8 × 10 −5 for m a = 10 MeV). A similar effect is expected for both the left-handed and the right-handed ALPs so that the reach shown in Fig. 1 is a realistic estimate of the Mu3e reach for all chiral structures. It is possible, however, that alternative calibration strategies such as the one proposed in Ref. [72] could improve the Mu3e reach at low masses.

The potential at µ − → e − conversion experiments
The µ → e conversion in nuclei experiments, COMET at J-PARC [4] and Mu2e at FNAL [5], are designed around very large muon fluxes, with over 10 9 µ − /s and 10 10 µ − /s at the two respective experiments. One may hope that these could also be used for µ → ea searches. However, in order to be able to deal with the vast data-streams the two experiments will not measure the full Michel spectrum but rather focus on the endpoint of the e − energy, which for the , peaks instead at the electron energy of around E e m µ /2, but with the tails of the electron energy distribution that go all the way up to E end e . Nominally, searches for the µ → ea process at COMET and Mu2e would also rely on these tails of the distribution, which poses a challenge for obtaining a competitive reach. For instance, for E e > 100 MeV only a small fraction of µ → ea decays, about 2 × 10 −10 , would be inside the signal region for µ → e conversion on Al [73] (see also [74]). Even with 10 18 muons a competitive search would thus likely require relaxing the lower bound on E e and developing techniques to distinguish between the smooth shapes of the signal, µ → ea, and SM background, µ → eνν, decays in orbit.

ALPs in µ → e + γ + invis. decays
The µ + → e + γa decay offers a complementary probe of the LFV ALP which is less dependent on the chiral structure of the ALP couplings than the experimental searches for µ → ea. In Sec. 4.1 we first discuss the searches performed at the Crystal Box experiment with a total number of 8.15 × 10 11 stopped muons [47,75]. In Sec. 4.2 we discuss possible improvements in the reach were a similar search to be implemented at MEG II. A more detailed analysis of a dedicated trigger at MEG II for this channel is left for future work.

Past Searches at Crystal Box
In order to lower the trigger rate the Crystal Box required at the trigger level a hard positron and a photon of similar energy [47,75]. The search for the three-body µ + → e + γa decay is then a search for a bump in the missing mass distribution in the collected data. The signal would be centered at m miss = m 2 a and spread by the photon and positron energy resolutions and the resolution on the angle between the two. The SM background has two main components: the four-body µ + → e + γνν decays, and the combinatorics background due to coincident µ + → e + νν and µ + → e + γνν events. For the latter, a sufficiently hard positron from the µ + → e + νν decay is detected within a 1.5 ns time window together with a hard photon from the µ + → e + γνν decay, while the soft positron is left undetected. The background and signal shapes at Crystal Box are shown in Fig. 7 (left).
The rate of the three-body decay Γ(µ → e a γ) for an ALP of mass m a is given by (in the limit m e m µ ) with the phase space integral given by These kinematic variables are related to the angle θ eγ between the electron and photon momenta, The branching ratio for the three-body decay, BR(µ → e a γ), is related to the branching ratio for the 2-body decay, BR(µ → e a),  In the special case of vanishing axion mass, η a → 0, the above expressions, Eq. (4.1)-(4.5), reduce to the ones obtained in Ref. [76]. Notice that the expression used by the experimental collaboration in Ref. [75] appears to have a typographical error (the roles of x and y in the phase space integral in Eq. (4.2) were exchanged). However, this does not affect the analysis in Ref. [47] because of symmetric cuts on photon and positron energies (up to the positron energy losses), cf. Eq. (4.6).
For completeness, we discuss a simple procedure that we use to roughly reproduce the Crystal Box result and extend it to m a > 0. The results of this recasting procedure are shown in the right panel of Fig. 7. We first extract the experimental efficiency for the µ + → e + γa signal from the bound on the branching ratio, BR excl. , in Eq. (4.8), which corresponds to N excl. = 165 signal events [47]. Given that the total number of stopped muons is N µ + = 8.15 × 10 11 we get The signal shape can be extracted from the binned MC sample given in Ref. [75], and is well described by a Gaussian centered at m 2 miss = 0 with a variance of σ S = 100 MeV 2 . For a given mass m a one can then estimate the sensitivity of Crystal Box on BR(µ → e a γ) by considering the expected background and signal in a missing mass window, (m 2 a − σ S , m 2 a +σ S ) and using the asymptotic formula at 90% C.L. For m a = 0, this procedure gives BR(µ → e a γ) < 1.6 × 10 −9 which is weaker than the experimental bound in Eq. (4.9) only by about 30%. This is not surprising since the full likelihood analysis has better discriminating power than the simple cut and count analysis we are performing. The bound on the branching ratio gets stronger for m 2 miss > 100 MeV 2 because the background is suppressed. However, this effect is compensated by the phase space suppression of the signal, Eq. (4.1). All in all, the bound on F µe is constant up to m a 10 MeV after which the signal phase space starts to shrink significantly with increasing ALP mass, given the strong energy cuts on the photon and positron energies.

Possibilities for future search at MEG II
We now briefly comment on the possibility for MEG II to improve on the Crystal Box result discussed above. Naively, the MEG II luminosity will exceed the one of Crystal Box by at least 3 orders of magnitude. In optimal conditions, this could lead to an increase of sensitivity on F µe of more than a factor of 5 with respect to the current bound in Table. 1. With such an optimistic gain MEG II could start probing new parameter space beyond the current TWIST bound for the V − A ALPs. An improvement on µ → eaγ would complement the MEGII-fwd proposal, Sec. 3.2, since this is only sensitive to ALPs with some amount of right handed couplings to the SM leptons. The combination of the two searches, µ → eaγ and µ → ea at MEGII-fwd would then fully cover the possible chiral structures of the ALP couplings.
Clearly, the above naive estimate for the improvement on the reach is far from guaranteed. A more realistic estimate of the reach would require a dedicated trigger study. The current energy and angular cuts of the MEG II trigger dedicated to the µ → eγ search are designed to select a very energetic positron and a photon exactly back to back [77]. Keeping these cuts will greatly suppress the signal rate of µ → eγa, and make it impossible to perform the search. An interesting possibility would be to relax the energy and angular cuts of the trigger down to similar values than the ones in Crystal Box. The final expected sensitivity at MEG II will strongly depend on the signal vs. background efficiencies whose detailed determination goes beyond the scope of this paper and is left for future investigations.

ALPs in τ decays
The search strategies for τ → µa and τ → ea decays are qualitatively different from the µ → ea searches. The main differences can be traced to the fact that τ has a much shorter life-time (3 × 10 −13 s vs. 2 × 10 −6 s for the muon), that it has many more decay channels, and that from the τ production it is not possible to unambiguously reconstruct the τ rest frame.

Past Searches at ARGUS
The ARGUS experiment in 1995 derived bounds on tauonic LFV ALP couplings [43]. The tau data sample was produced from e + e − collisions in the DORIS II storage ring at DESY at a center of mass energy varying between 9.4 and 10.6 GeV with a total integrated luminosity of 472 pb −1 .
The challenge in τ → i a search is to disentangle the signal decay from the SM τ → i νν decays. The search would be easier in the tau rest frame, since then the lepton from τ → i a is monochromatic and one can do a line search on top of the smooth τ → i νν background. Unfortunately, the tau is pair-produced in e + e − → τ + τ − collisions. Each of the tau's decays into a final state with at least one invisible particle, making exact reconstructions of the tau rest frames impossible. Instead, the ARGUS analysis used the "pseudo-rest frame" technique. The idea is to require one side of the τ + τ − pair to decay into a three prong hadronic mode. The direction of the τ momentum is then approximated by the direction of the combined momentum of the three prong decay products. In the center of mass of e + e − collision the tau energy equals the beam energy, while the two taus are back to back. This gives enough constraints so that one can boost the leptonically decaying tau to its, approximate "pseudo-rest frame". The crucial property of this frame is that the sensitivity on LFV two body tau decays into ALPs does not depend much on the ALP mass (see [43] for further details). For a massless ALP ARGUS obtained [43] BR(τ → e a) < 2.7 × 10 −3 (95% C.L.) ⇒ F τ e 4.3 × 10 6 GeV , The bound on BR(τ → µ a) is less stringent than BR(τ → e a) at low masses while they become comparable for higher masses. The final bound on the ALP decay constant from BR(τ → ea) is shown in Fig. 1. The mass dependence of the bound is predominantly due to the phase space suppression of the two-body decay for heavier ALPs.

Future Searches at Belle-II
Belle and Belle-II. While Belle and Babar collected ≈ 2000 times larger datasets of τ 's than ARGUS, no experimental searches for τ → i a were performed yet. However, a recent simulation of the expected limit at the Belle experiment with integrated luminosity of 1020 fb −1 was performed in Ref. [48], and obtained for a massless ALP at 90% CL) Notice that this bound is almost exactly a factor of √ 2000 more stringent than the present one from ARGUS, Eq. (5.2). Using the same simple rescaling with the luminosity, we obtain for the expected 95% CL limit for a massless ALP for Belle II with 50 ab −1 , Belle-II (50/ab) prospect: BR(τ → µ a) < 2.0 × 10 −5 ⇒ F τ µ 4.9 × 10 7 GeV. (5.4) The limit as a function of ALP mass is shown with the purple line in Fig. 1. In the absence of MC analysis of Belle or Belle II reach for τ → ea, we estimate the Belle II sensitivity by performing the naive rescaling of the ARGUS result with luminosity, which gives Belle-II (50/ab) prospect: BR(τ → e a) < 8.3 × 10 −6 ⇒ F τ e 7.7 × 10 7 GeV . (5.5) Belle II may improve the ARGUS searches for τ → ea and τ → µa transitions beyond mere increase in statistics. First of all, it could be interesting to explore the reach on BR(τ → µ a γ) and BR(τ → e a γ), especially since for muons µ → eaγ gives constraints that are not that far from the two-body µ → ea decay (see [78] for similar comments in the context of a light Z ). Secondly, further improvements of τ → i a searches may be possible. A possibly interesting direction, while still using the tau "pseudo-rest frame", is to employ in addition variables that tag the tau polarization, such as the directions of pions in two prong tau decays. If successful, this could allow to further suppress the SM τ → i νν background, similarly to what was done for µ → ea decays in the experiment by Jodidio et al. [9].

Bounds from Astrophysics and Cosmology
The bounds on ALP couplings from the astrophysical observations and from cosmology fall into two categories, depending on whether the ALP is assumed to constitute the observed DM relic abundance or not. In Section 6.1 we first discuss the constraints from stellar cooling, which do not depend on whether or not ALP is the DM. In Sec. 6.2 we then explore in which parts of the parameter space the LFV ALP could explain the observed DM abundance.

Bounds from stellar cooling
The emission of light particles inside stars can alter stellar evolution to an extent that is in conflict with observations. This leads to powerful constraints on the interactions of such light particles with matter and radiation [79]. Our primary interest here are the ALP couplings to leptons. In this context, ALP couplings to electrons can lead to efficient energy loss mechanisms in stars. For massless ALP (such as the QCD axion) the studies of red giants (RG) [80,81] and white dwarfs (WD) [44], give roughly comparable bounds, which at 95% C.L. are 10 In both cases the dominant cooling mechanism is ALP bremsstrahlung in electron-nucleus scattering, e − +N → e − +N +a. This dominates over the Compton process, γ+e − → e − +a, and electron-electron bremsstrahlung, e − + e − → e − + e − + a, which are relevant only when electrons are non-degenerate [79]. For non-negligible ALP masses, the cooling rates are expected to be Boltzmannsuppressed. Following Ref. [83], we estimate the resulting constraints on massive ALPs by rescaling the energy loss rates with the ratio R(m a , T ) of ALP energy densities E a for the massive and massless case R(m a , T ) ≡ E a (m a , T )/E a (0, T ) .
(6.2) 10 Interestingly, several stellar systems exhibit hints of non-standard energy losses. The global fit performed in Ref. [82] finds that an axion/ALP solution to these anomalies with a coupling 5.4 × 10 9 GeV F A ee 8.1 × 10 9 GeV (1σ range) is preferred at the 3σ level over the case of only the standard energy loss through neutrinos.
The energy densities are given by Since the cooling rates scale with (F A ee ) −2 , the constraints on massive ALPs in Fig. 1 are obtained from the bounds on F A ee for the massless ALP by rescaling them with the factor R(m a , T ). Because of the Boltzmann suppression the star cooling bounds rapidly shut off for ALP masses above m a ≈ 2T , where T RG ≈ 10 8 K ≈ 8.6 keV and T WD ≈ 10 7 K ≈ 0.8 keV.
For heavier ALPs the relevant astrophysics constraints are due to neutron star cooling in supernova (SN) explosions, since the nascent proto-neutron star (PNS) reaches a temperature of order 30 MeV a few seconds after the start of the supernova explosion [84,85]. In order to estimate the SN bound on F A aa we use the expression for the energy loss rate per unit mass, , through electron-nucleon bremsstrahlung under highly degenerate conditions [79,86] Here, Y p = n p /n B is the number density of protons relative to the baryon number density, while I is the angular integral that includes plasma screening effects, It depends on the electron velocity at the Fermi surface, β F = p F /E F = p F / p 2 F + m 2 e ≈ 1, while c 12 , c 1a , c 2a are the cosines of the angles between the 3-momenta p 1 (p 2 ) of the incoming (outgoing) electron and the ALP 3-momentum p a , respectively. The screening effects enter through the parameter κ 2 DH = k 2 DH /2p 2 F , where k DH is the Debye screening scale. Note that the PNS with temperature T NS ≈ 30 MeV and mass density ρ ≈ ρ nuclear can be treated as composed of weakly coupled degenerate plasmas. The electron screening can then be neglected with respect to the proton screening, giving k 2 DH = 4πα em n p /T NS . Since the main contribution to the angular integral (6.5) comes from the forward direction, c 12 = 1 and c 1a = c 2a , the integral is well approximated by the simplified form obtained by setting c 2a = c 1a in the denominator. In the ultra-relativistic case then [83]  Imposing the crude bound on the energy loss of 10 19 erg g −1 s −1 [83], leads in the case of a massless ALP to the bound  Fig. 1.
A qualitatively different regime is obtained for small values of F A ee . For small enough F A ee the ALP interactions become so strong that the ALP remains trapped within the stellar material, in which case there are no bounds on F A ee from the stellar cooling constraints. Following Ref. [83], we estimate this limiting values of F A ee for RG, WD and NS systems by requiring that the mean free path λ of the ALP is smaller than the corresponding stellar effective radius R 0 , cf. Table 2. The mean free path is calculated from ALP decay and absorption rates, The absorption rate is approximately given by the total energy loss rate per volume, ρ , divided by the ALP energy density, E a (even if ALPs are not in thermal equilibrium) Γ abs = ρ /E a . (6.10) As this estimate just follows from the principle of detailed balance, the absorption rate is independent of the ALP mass. We can therefore use for the energy loss per unit mass, , the result in Eq. (6.4), while E a = E a (0, T ) in Eq. (6.2). The dependence of λ on the ALP mass dominantly enters through the kinematical factors β and γ which are given by the integrals where . . . T denotes a thermal average using the Bose-Einstein thermal distribution of ALPs. In the limit of large masses, m a T , one finds β ≈ γ ≈ 8/π T /m a . For the decay rates, Γ decay , we only consider decays to electrons and photons induced by F A ee . The resulting decay widths are given in Sec. 2 (we set E UV = 1 in these expressions, for definiteness). The contributions from γγ decay channel are always subdominant -for low ALP masses, where the decays to photons dominate Γ decay , the absorption rate is more important in determining λ, while for higher ALP masses the decays to electrons or even to muons dominate over decays into photons.
We have used the inputs in Table 2 (taken from Ref. [79]) to derive the bounds in the trapping regime in Fig. 1. Note that for WDs the plasma is typically strongly coupled, so that the Debye screening given in Eq. (6.6) is not an appropriate description. In these case we simply use I ≈ 1, which provides a good approximation [79,87,88].  Table 2. Numerical inputs used in the evaluation of the stellar cooling bounds.
We have also included the recent bounds on ALP couplings to muons obtained in Ref. [45] from SN1987A (see also Ref. [46]). In contrast to our rough description of the PNS, these authors have used dedicated simulations which lead to the robust (and conservative) bound of F A µµ ≥ 1.3 × 10 8 GeV for a massless axion. As for the case of the bounds on electron couplings from WDs and RGs, we simply rescale this bound by R(m a , T ) defined in Eq. (6.2) (with T ≈ 30 MeV) in order to account for non-zero axion masses. Note that the bound on muons is stronger than the SN bound on electron couplings in Eq. (6.8), because energy loss through the Compton process for non-relativstic and non-degenerate muons is much more efficient than the same process for highly relativstic and degenerate electrons, see Refs. [79,89].
Finally, we comment on the bound on BR(µ → ea) from SN1987A. This decay contributes to the cooling of the PNS, with a cooling rate that is given by (see Ref. [ where p max e (p min e ) is the maximal (minimal) electron momentum in the µ → ea decay, if µ has momentum p µ = |p p p µ |, and E 2 i = p 2 i + m 2 i are the energies in the PNS's rest frame. Moreover, f i (E i ) = 1/ 1 + exp E i −µ i T are the Fermi distribution functions with µ i the chemical potentials, which is the only non-trivial input in Eq. (6.12). Using the numerical values µ µ ≈ 41 MeV and µ e ≈ 190 MeV which we obtained as described below, the cooling rate becomes The resulting bound on BR(µ → ea) is therefore about three orders of magnitude weaker than the constraints from laboratory experiments. It is instructive to compare this result with the case where Pauli blocking in Eq. (6.12) is neglected (i.e. f e (E e ) → 1), in which case the energy loss rate would be approximated by Using Y µ ≈ 2.9 × 10 −3 Y e exp [−(m µ − m e )/T ] (see below), the resulting bound on BR(µ → ea) is about a factor 50 larger than the exact result, indicating that Pauli blocking is an important effect. This can be understood by the large Fermi energy of the electrons, E F ≈ p F ≈ 200 MeV, which implies that muons at rest can hardly decay because almost all the relevant electron levels are filled.
The above values of µ µ , µ e and Y µ were obtained by treating the PNS as a noninteracting Fermi gas, where all particles are in thermal and chemical equilibrium (see also Ref. [23]). At given temperature the number density of a given particle then depends only on its chemical potential, which in turn can be written as a linear combination of the chemical potentials associated with the conserved quantum numbers, i.e., charge Q, baryon number B and the individual lepton numbers L i . Furthermore, we assumed that neutrinos are trapped inside the PNS, which implies that the muon number density vanishes, Y Lµ = Y µ + Y νµ = 0 [90]. For the electron number density (relative to the baryon number density n B ) we used Y Le = 0.3 [79]. With these three input values and the constraint of charge neutrality, n Q = 0, one can numerically solve for the 4 unknown chemical potentials µ Q , µ B , µ Le , µ Lµ and thus obtain number densities n i and chemical potentials µ i for all the involved particles. The tau leptons could be trivially included, taking Y Lτ = 0. However, unlike Y µ , the tau number density is negligible, Y τ ≈ 0. The electron's Fermi energy of about 200 MeV is just large enough to excite a relatively substantial muon population via electron scattering, while taus are simply too heavy. Note that this rather crude treatment of the PNS's thermodynamics appears to be consistent with the results from the dedicated simulations in Ref. [45], since we obtain similar bounds on F A µµ using the expressions for the energy loss rate through Compton (as appropriate for muons) with the above input.
Finally, the derived SN bounds depend crucially on the SN explosion mechanism. If the SN1987A was not triggered by the canonical delayed neutrino mechanism but is rather due to the collapse-induced thermonuclear explosion, there would be no resulting bounds on the ALP couplings [91]. However, this interpretation would be disfavored, if the possible ALMA detection of a compact object in the remnant of SN 1987A [92], consistent with the neutron star [93], is confirmed.

ALP Dark Matter
Next, we explore under what circumstances the LFV ALP is a viable DM candidate, with the correct DM relic abundance. This will lead us to two important conclusions. Firstly, since the expected sensitivity of future LFV experiments is f a 10 10 GeV, well above the present astrophysical bounds, the LFV experiments will explore a part of the parameter space of an ultralight ALP DM. Secondly, the LFV experiments cannot resolve ALP masses below 1 MeV (i.e. the typical experimental mass resolution). Below this mass range pinpointing the ALP mass will require experiments that search for ALP DM using other means, through its couplings to photons and/or electrons.
The minimal requirement for ALP to be the DM is that it is stable on timescales longer than the lifetime of the Universe. Assuming the a → γγ decay dominates, this translates into the following constraint,    [9], the (dark) orange thin line gives our MEGII-fwd projection assuming F = 100 focusing enhancement (no focusing). The dark red line (overlapping with the orange thin line) shows the sensitivity of Mu3e-online analysis [42]. In the blue region enclosed by the solid blue line the ALP decays within the present Hubble time, while the region to the right of the dashed blue line is excluded by the extragalactic diffuse background light measurements for E U V = 0, 1. We also show the X-rays constraints for E UV = 0 [94,95]. The red blob indicates where ALP DM could explain the XENON1T anomaly [96]. The dashed gray lines denote two scenarios where the observed DM relic abundance is due to ALPs produced trough the misalignment mechanism. On the upper line the ALP mass is temperature independent, cf. Eq. (6.17), while on the lower line the temperature dependence is parametrically similar to the one for the QCD axion, cf. Eq. (6.18). The gray shaded regions are excluded by the star cooling bounds, and the ADMX data [97][98][99]. The light green region is excluded by the S2 only analysis of XENON1T [100] and Panda-X [101]. The purple shaded region shows the future reach of axionmagnon conversion experiments such as QUAX [102][103][104]. Regarding the coupling to photons, the cyan band shows the future sensitivity of SPHEREx estimated in Ref. [105], assuming ALP decay exclusively to two photons, while the yellow bands show the future sensitivities of resonant microwave cavities such as ADMX [106], CAPP [107], KLASH [108], and ORGAN [109], dielectric haloscopes such as MADMAX [110] and the reach of the dielectric stack proposal [111] is shown with light blue.
Taking f a = 10 10 GeV as a reference value, this means that the ALP DM probed in LFV experiments must have a mass below 10 keV. If other decay channels, such as a → ν iνj , are appreciable, then the above bound on the ALP mass is correspondingly lowered (this is for instance the case for the majoron, see Sec. 7.4). For the rest of this section we will assume that the ALP decay channels apart from a → γγ can be neglected.
The a → γγ decays contribute to the extragalactic background light (EBL) and may be bounded by EBL measurements. The ALP decay results in a line at frequency ν a = 1.2 × 10 14 m a /eV Hz with intensity A conservative bound on the decay width Γ(a → γγ) is obtained by requiring that the line intensity in Eq. (6.16) is less than what is observed at that frequency [105]. An updated map of the EBL observations at different frequencies can be found in Ref. [112]. For instance, the observed EBL intensity in the optical band is 10 −8 Wm −2 sr −1 , constraining the axion width well below H 0 . Converting the EBL constraints to a bound on f a gives the light blue exclusion region in Fig. 8 for for E UV = 1 and E UV = 0. Stronger bounds can be obtained from the measurements of the X-ray microcalorimeters in the XQC rocket [94,95]. These constraints will be further improved by future X-ray missions such as Athena [113] or by future line-intensity mapping campaigns [105]. Particularly relevant for the ALP parameter space is the SPHEREx project [114], a funded two-year mission by NASA with a planned launch in 2023, that will probe optical and near infrared frequencies corresponding to m a ∼ eV. The SPHEREx reach is denoted as green region in Fig. 8. The ALP that satisfies the stability and decaying DM bounds could be a viable DM candidate. The main production mechanism in the allowed region of parameter space in Fig. 8 is the misalignment mechanism, first discussed in the context of the QCD axion in Refs. [14][15][16] and then generalized to a generic ALP in Ref. [115]. 11 If inflation occurred below the scale of ALP global symmetry breaking, the initial misallignment of the ALP, a 0 , is frozen by the inflationary dynamics and acts as the initial condition. Conventionally, it is parametrized in terms of an angular variable, a 0 = f a θ 0 , where θ 0 ∈ [0, π). As long as the Hubble expansion rate is large, H > m a , the field is frozen at its initial value a 0 . At temperature T osc , when H(T osc ) m a , the field starts to oscillate and produces the ALP number density n a (T osc ) = 1 2 m 2 a a 2 0 , which then expands adiabatically until the present time. For the case when ALP oscillations occur during the radiation dominated epoch the resulting ALP relic abundance is (see also [117]) Since the ALP mass is bounded from above by EBL constraints, the future reach of LFV searches (i.e. f a ≈ 10 10 GeV) will probe a region of parameter space where the production from misalignment does not suffice to obtain the total observed DM abundance (Ω DM h 2 0.12 [118]) with an ALP mass independent on temperature. The relation between m a and f a that leads to the observed DM abundance for θ 0 ∼ 1 for a temperature independent ALP mass is corresponds to the upper gray dashed line in Fig. 8. Below this line the ALP DM produced through the misalignment mechanism is under-abundant, while above this line a smaller value of θ 0 is needed in order to obtain Ω DM h 2 = 0.12. An interesting alternative possibility that leads to enhanced misalignment production is if the ALP mass comes from a dynamical mechanism like the one of the QCD axion. At zero temperature the ALP mass is given by m a = Λ 2 /f a , while at finite temperature the mass is suppressed, and is given by m a (T ) = m a (Λ/T ) b , where b = 4 in QCD (an expression for b in a general gauge theory can be found in Ref. [119]). The relic density from misalignment for this case has been studied in Ref. [115], and more recently in Ref. [117], and is given by . (6.18) The two additional factors on the r.h.s. enhance the relic abundance with respect to the result in Eq. (6.17). In Fig 8 we show that for b = 4 this enhancement is large enough that the correct DM relic abundance is obtained in a large region of parameter space that will be tested by future LFV searches. The ALP abundance can be enhanced even further at small decay constants by tuning the ALP initial condition very close to θ 0 = π, such that non-linearities dominate the production [19], or by mixing of the ALP with a dark photon in a presence of a magnetic field in the early Universe [18]. In summary, these different production mechanisms can make the ALP abundance match the current DM abundance in the whole parameter space shown in Fig. 8 which is not excluded by present constraints. Parts of the LFV ALP parameter space will be probed by other means. In Fig. 8 we show two types of such probes, based either on axion couplings to electrons or to photon. All of these probes require the ALP to be the DM, and assume that the ALP is responsible for the full DM relic abundance. Note that in this case, for the range of masses shown in Fig. 8, the description of ALP in terms of classical background field is justified in the early Universe, since there are many ALPs inside a single de Broglie volume.
The ALP couplings to electrons can then lead to interesting constraints from electron recoil experiments where the ALP energy gets absorbed in the detector. The energy threshold of the DM experiment translate directly to the lower ALP mass that these experiment can probe. We show the current constraints from Panda-X [101] and from the S2-only analysis of XENON1T [? ]. The fit of the XENON1T anomaly derived in [96] is also shown. Further improvements at lower ALP mass are expected from low threshold experiments like SENSEI [120]. The efficient axion-magnon conversion in an experiment such as QUAX [102,103] could probe a portion of the ALP parameter space in the m a ∼ 10 − 50 µeV window. The light purple region in Fig. 8 shows the future reach of QUAX derived in Ref. [104] (the present sensitivity in Ref. [121] is still outside the plotted range in f a ). At higher masses the low threshold DM absorption experiments based on existing technology are generically weaker than the stellar cooling bounds, even if improvements can be foreseen with future technology under optimistic conditions [21,22].
If the ALP coupling to photons is not suppressed, the standard searches for the QCD axion will cover significant parts of the parameter space, as seen in Fig. 8 for the case of E UV = 1. The gray region around 0.2 µeV is excluded by current ADMX data [97][98][99]. The future axion haloscope campaigns will explore the ALP mass region between 0.2 − 20 µeV. In Fig. 8 we show in yellow the estimated sensitivities of CAPP [107], KLASH [108], OR-GAN [109], MADMAX [110] and the ADMX upgrade [106]. We also include the "dieletric stack" proposal which could have sensivity beyond the current stellar cooling bounds between 0.1 and 1 eV, depending on the value of E UV [111]. The sensitivities of large-scale helioscopes such as IAXO [122,123], and light-shining-through-wall experiments such as ALPS-II [124] lie below the current stellar cooling bounds for our choice of parameters and is not shown in Fig. 8.

LFV ALP models
So far we were concerned with the model independent bounds on LFV ALP couplings, Eq. (2.1). In the remainder of this paper we focus instead on several representative models of ALPs with LFV couplings: the LFV QCD axion, the LFV axiflavon, the leptonic familon and the majoron. These examples are representative for broad classes of models, and illustrate how flavor-violating couplings to leptons can naturally arise for PNGBs of global symmetries addressing the strong CP problem, the SM flavor puzzle, or neutrino masses. In the first three models the LFV couplings are generated at tree level via non-universal charges of the global symmetry under which the ALP transform non-linearly while in the last one the LFV comes from loop-induced couplings. For each model above we also present the parameter space where the ALP can be a viable DM candidate as discussed in Sec. 6.2.
The LFV QCD axion (Sec. 7.1) and the LFV axiflavon (Sec. 7.2) are two explicit realizations of the QCD axion, which elegantly solves the strong CP problem in the SM via the spontaneous breaking of a U (1) Peccei-Quinn (PQ) symmetry that is anomalous under QCD. In Sec. 7.1 we first show that in DFSZ-like models [12,13] the QCD axion can naturally have LFV couplings while keeping the couplings to quarks flavor diagonal. In Sec. 7.2 we go one step further and identify the PQ symmetry with a subgroup of the flavor symmetry that gives the hierarchical masses and mixings of the SM fermions. The LFV axiflavon is obtained for the U (2) flavor group, since in this case the flavor violating couplings are parametrically larger in the leptonic than the quark sector.
The Leptonic familon (Sec. 7.3) is the PNGB of a U (1) flavor symmetry in the leptonic sector. The spontaneous breaking of this symmetry could explain the hierarchies among the charged leptons via the Froggatt-Nielsen mechanism [34,125,126] (for recent variations see [26,127]). In the LFV familon setup the strengths of the LFV couplings depend on the texture of the PMNS matrix, as we will see in detail below.

The LFV QCD Axion
The mass of the QCD axion is entirely due to the QCD anomaly, and is given by [139] m a = 5.691(51) µeV 10 12 GeV f a . (7.1) The value of the axion decay constraint f a therefore completely determines the mass of the QCD axion, which for all the processes we consider is effectively massless. Astrophysical constraints require the axion to be very weakly coupled, with a lifetime larger than the age of the universe and a mass below 3 × 10 −2 eV. In this range the QCD axion is a perfectly viable cold DM candidate in large parts of the parameter space. One of the simplest scenarios for axion production is the misalignment mechanism described in Sec. 6.2. In the QCD axion case the observed DM abundance is obtained for misalignment angles of order unity θ 0 ∼ 1 with an axion decay constants f a ∼ 10 (11÷13) GeV. For smaller decay constants, within the reach of LFV experiments, the axion relic from the standard misalignment contribution is under-abundant unless non-trivial dynamics or tuning are invoked (see discussion in Sec. 6.2).
The axion couplings to fermions in Eq. (2.1) arise from rotating the PQ current to the fermion mass basis, with unitary rotations Denoting the PQ charge matrices by X f , one has where 2N is the domain wall number. This implies that off-diagonal couplings arise when the PQ charges are not diagonal in the same basis as the Yukawa couplings, y f . Their sizes depend on the misalignment between the two bases, which is parametrized by the unitary rotations V f R,L . We focus on the situation where the PQ charges in the quark sector are universal, so that the QCD axion only has flavor violating couplings in the lepton sector. (This is of course not the most general case. If PQ charges in the quark sector are not universal, the results from Ref. [23] apply, with the bound from K + → π + a leading to tight constraints on f a .) In the following, we specify a DFSZ-like model of the QCD axion with LFV couplings. The field content of the theory consists of the SM fermions, two Higgs doublets, H 1,2 , and a complex scalar S that is a gauge singlet. The model contains an anomalous global U (1) PQ symmetry under which the scalar fields carry charges X S = 1, X H 2 = 2 + X H 1 . As a consequence, the scalar potential contains the couplings H † 2 H 1 S 2 and (S † S) 2 , but not, for instance, H † 1 H 2 S 2 or S 4 . The fermionic U (1) PQ charges are flavor universal in the quark sector, X u Ri = −X H 1 , X d Ri = X H 2 , X q Li = 0, i = 1, 2, 3, while they are generation dependent in the lepton sector, such that the Yukawa interaction Lagrangian takes the form (hereH i = iσ 2 H * i and a, b = 2, 3) L = y e 1a L1 e RaH1 + y e a1 La e R1H1 + y e ab La e RbH2 − y u q L u R H 1 + y d q L d RH2 + h.c. . (7. 3) The first generation leptons carry charges X e R1 = X H 1 , X L1 = 2 under U (1) PQ , while the 2nd and 3rd generation leptons have X e Ra = X H 2 , X La = 0, where a = 2, 3. In Eq. (7.3) y e 1a and y e a1 are complex 2-vectors, y e ab is a complex 2 × 2 matrix, while for simplicity we do not display the flavor indices on 3 × 3 complex Yukawa matrices in the quark sector, y u , y d . The forms of y e 1a , y e a1 , y e ab and y u,d are not fixed by the U (1) PQ symmetry and are thus external to the discussion (presumably there is additional flavor dynamics that gives their form and thus the required hierarchy of SM fermion masses).
The form of the scalar potential is assumed to be suitable to induce vacuum expectation values for all scalars, with the ratio of Higgs vevs given by GeV the electroweak vev. The axion a is then mainly contained in S, i.e., S = S exp ia/v PQ + · · · , and partially in the two Higgs doublets, H i = H i exp iX H i a/v PQ + · · · (here we only show the leading dependence on a). Requiring that the axion is orthogonal to the Goldstone boson eaten by the Z completely fixes the embedding of U (1) PQ , i.e., it fixes the PQ charge of the two Higgs doublets to be X H 1 = −2s 2 β , X H 2 = 2c 2 β , where we used the shortened notation s β ≡ sin β, c β ≡ cos β.
It is conventional to remove the axion from the Yukawa interactions (7.3) through phase redefinitions of the SM fermion fields. Working in this basis, the axion couples derivatively to the fermions as in Eq. (2.1), and in addition has couplings to gluons and photons induced by the color and EM anomalies The axion decay constant is related to the PQ breaking vev, S = √ 2N f a . The anomaly coefficients are given by 2N = −6 and E/N = 4/3.
The couplings to fermions are given by Eq. (7.2). In the quark sector the PQ charges are flavor universal, and so are the axion couplings, In contrast, in the charged lepton sector the PQ charges are not universal and therefore the axion couplings to charged leptons depend on the unitary rotations that diagonalize the Yukawas as in Eq. (7.2). It is useful to introduce the hermitian matrices which satisfy for P = L, R. In terms of these parameters one has  Table 3. Bounds on the axion decay constant f a (in GeV) for the three bechmarks of the LFV axion model choosing β = 1, cf. Eqs. (7.11)-(7.13).
Note that the flavor diagonal parts of the vectorial couplings, C V f i f j , Eqs. (7.5), (7.8), can be set to zero through fermion field redefinitions (these would introduce couplings to EW boson field strengths as in Eq. (7.4) that are, however, not relevant for our analyses).
To show the impact of the experimental searches for LFV processes with muons, we construct three benchmarks for off-diagonal matrices e L ij and e R ij . To do so we first choose a particular form of the leptonic Yukawa couplings, where we assume that in the basis in which y e ab is diagonal the Yukawa 2-vectors y e 1a , y e 1b have zero couplings between the 1st and 3rd generation. Within these assumptions the charged lepton mass matrix is completely fixed, apart from a single continuous parameter, η, The parameter η controls the size of left-and right-handed rotations. We restrict its values to the range m e /m µ ≤ η ≤ 1 such that there are no unnaturally large cancellations when diagonalizing the mass matrix. Choosing three representative values of η gives which we take as the three representative benchmarks: the "V − A", "V " and "V + A" scenarios, respectively. As per our assumptions, the only flavor violating couplings are between the 1st and the 2nd generation leptons. More explicitly, the axion couplings in the three scenarios are, • Benchmark "V " (η = m e /m µ ) WDs WDs WDs Figure 9. Present and expected future bounds on f a and m a for the LFV QCD axion in the three scenarios described in detail in the text, see also Table 3 and Eqs. (7.11)-(7.13). On the lower axis we indicate the corresponding values for the effective axion mass defined by m i,eff = 4.7 eV × 10 6 GeV/F i .
We can now reinterpret the model independent bounds on LFV ALPs, derived in Sections 3-6, for the three LFV QCD axion benchmarks (choosing β = 1 as a representative value).
The resulting bounds on f a from µ → ea and from WD cooling, obtained by rescaling respectively the bounds on F A,V µe , F A ee in Table 1 by the appropriate values of C V,A µe , C A ee in the three benchmarks, are collected in Table 3 and presented graphically in Fig. 9.
The SN1987 bound is modified with respect to the one discussed in Sec. 6.1 due to the axion couplings to quarks and gluons that then result in the axion couplings to nucleons (due to smaller scattering cross sections the processes involving electrons lead only to subleading corrections). Adopting the treatment of Ref. [140], the relevant bound is on the effective coupling to nucleons, where C p,n are the axion couplings to protons and neutrons, respectively. Using the expressions in Ref. [17] along with the values of couplings to quarks and gluons in Eqs. (7.4), (7.5), we get for the LFV axion model, for all three benchmarks, For given c β the bound on f a follows from the bound on the effective decay constant F N ≡ 2f a /C N ≥ 10 9 GeV [140]. The resulting bound on f a is of the order f a 10 8 GeV, with mild dependce on c β . In Fig. 10 we show the constraints on the axion-photon couplings as a function of the axion mass, m a . The orange (green) solid line in the left (right) panel shows g aγγ as a function of m a in the V + A (V − A) benchmark up to the exclusion by the WD cooling constraints (the dotted continuation of the line is excluded by WD cooling but is not excluded by the direct µ → ea searches). Note that for the LFV QCD axion, E/N = 4/3, leading to smaller coupling to photons, |g aγγ | 0.6 × α em /(2πf a ), than for the flavor universal original KSVZ model, for which E/N = 0 and thus |g aγγ | 1.9 × α em /(2πf a ) [141,142], and DFSZ-II model [12,13], for which E/N = 2/3 and |g aγγ | 1.3×α em /(2πf a ) (though large variations in this coupling are possible depending on the precise choices of the heavy fields and their charges [143][144][145]). The dotted orange (green) vertical lines show projected and present bounds from µ → ea searches, as denoted.
The gray regions are excluded by other axion experiments: CAST [146], cooling of horizontal branch stars (HB) and ADMX [97][98][99]. The gray dashed lines denote future projections from different axion searches already discussed in Sec. 6.2. We show the reach of the future ADMX upgrade [106], of CAPP [107], KLASH [108], and ORGAN [109], MADMAX [110] and the "dieletric stack" proposal [111]. We also include the reach of large-scale helioscopes such as IAXO [122,123], and light-shining-through-wall experiments such as ALPS-II [124]. Fig. 10 demonstrates the complementarity between this diverse experimental program based on axion couplings to photons and electrons, and the reach of the LFV experiments MEG-fwd and Mu3e, in order to search for LFV axions. In particular, a signal in a future LFV search would be incompatible with an axion lighter than a few meV. Such an LFV axion line will be challenging to test in axion haloscopes because of the infamous DFSZ accidental suppression of the photon coupling, see Eq. (7.16). In contrast, the future ADMX experimental campaign and CAPP will probe the LFV axion for masses between few µeV and few tens of µeV, which are inaccessible through LFV experiments.

The LFV Axiflavon
We discuss next the possibility that the PQ symmetry that solves the strong CP problem is also responsible for explaining the smallness of the SM Yukawas, i.e., that the QCD axion is the axiflavon. This framework naturally results in a QCD axion with flavor-violating couplings.
The simplest scenarios of this kind arise when the U (1) PQ is responsible for explaining all the SM fermion mass hierarchies and mixings, along the lines of the Froggatt-Nielsen models [34,125,126], as in Refs. [24,33]. In these constructions the strongest bound on the axion decay constant always arises from K + → π + a constraints [24], since the axion coupling to sd is suppressed only by roughly the size of the Cabibbo angle, V us = λ 0.2. Indeed, for the U (1) axiflavon the LH quark charges, X q L i , are non-universal so that from Eq. (7.2), The sd axiflavon couplings, on the other hand, can be strongly suppressed in U (2) F flavor models [35,147,148]. In these classes of models the light generations form doublets of U (2) F . The U (1) F factor acts as the PQ symmetry that gives rise to the QCD axion after spontaneous symmetry breaking. This scenario successfully explains the fermion mass hierarchies and mixings in terms of just two small parameters. In the model of Ref. [35] both quark and lepton flavor violating couplings between the first two generations are equally suppressed because of the assumed structure that is compatible with SU (5) unification. Here we present a variant of this model that leads instead to parametrically large µe couplings (their enhancement can be traced to large PMNS mixing angles). This model is a successful model of flavor and at the same time an example of a large class of flavored axion models where the LFV couplings are sizable while the FV couplings to quarks are suppressed.
In the LFV axiflavon model the U (2) F = SU (2) F × U (1) F quantum numbers of almost all the SM fermion are the same as in Ref. [35]. In particular, all the fermions transform as 2 + 1 under SU (2) F . The only difference with respect to Ref. [35] is that the U (1) F charge of the SU (2) F singlet left-handed lepton, L 3 , is −1 instead of 1. Table 4 summarizes the  Table 4, a = 1, 2), the model contains the SM Higgs doublet as well as two scalar spurions, φ a and χ.
As in Ref. [35], the breaking of the flavor symmetry is parametrized by two scalar spurions φ and χ, which transform under U (2) F as φ = 2 −1 and χ= 1 −1 . These fields acquire the following flavor symmetry breaking vevs where Λ is, up to O(1) factors, the typical mass of the heavy states present in the full UV model (their exact structure is not important for our effective low energy discussion as they are integrated out). The values of the two small parameters in the two spurions are fixed by the fit to quark masses and mixings to be about φ ∼ λ 2 and χ ∼ λ 3 .
Since the SM fermions are charged under U (2) F , the Yukawa interactions between the SM fermions and the Higgs require insertions of the spurion fields in order to form invariants under U (2) F . This leads to non-renormalizable interactions suppressed by appropriate powers of Λ. After replacing spurions with their vevs, Eq. (7.18), the dependence on the heavy scale Λ drops out. The hierarchies in Yukawa matrices then arise from powers of the small parameters φ,χ , giving for the mass matrices rotations that diagonalize the quark masses. Their parametric structure is given by, where λ = V us ∼ 0.2. The charged lepton mass matrix is parametrically the same as in Ref. [35], and so are the parametric sizes of the unitary rotations that diagonalize it, It is easy to check that the neutrino sector can reproduce the PMNS matrix and neutrino mass differences for λ ν ij that are O(1) for normal neutrino mass ordering. The PNGB corresponding to the U (1) F factor is the LFV axiflavon, i.e., it acts as the QCD axion that solves the strong CP problem (see Ref. [35] for details). The anomaly coefficients controlling the couplings of the LFV axiflavon to gluons and to photons in Eq. The couplings of the LFV axiflavon to the SM fermions depend on the unitary rotations in Eqs. (7.21), (7.22), as in Eq. (7.2). Because of the SU (2) F structure of the LFV axiflavon model it is useful to introduce the matrices (f = u, d, e; P = L, R) 12 which satisfy They have the parametric structures The axiflavon couplings to fermions are given in terms of charges and these parameters as where f = u, d, e denotes the fermion sector and X f c a , X fa , X f c 3 , X f 3 are the U (1) F charges in Table 4.
The sd couplings are strongly suppressed, C V,A sd ∼ λ 5 /(2N ), as a result of small rotations in the LH sector, d L,12 ∼ λ 5 , and the fact that RH rotation do not lead to off-diagonal terms because of universal charges in the RH sector, X D c a = X D c 3 . The RH contributions to the µe couplings are CKM-like suppressed, while the contribution from the LH rotations are large, since the corresponding charges are non-universal (in contrast to Ref. [35]), X La = X L 3 , giving C V µe = −C A µe ∼ m e /m µ /N . The axiflavon couplings to nucleons and electrons are identical to Ref. [35] and are to good approximation given by The O(1) coefficients in the rotation matrices can be fixed by performing an explicit fit to all the observables -the masses and mixings, including the neutrino sector. Using the same procedure and the SM inputs as in Ref. [35], we find that a good fit is ob- where We use the above benchmark values for axiflavon coupling to derive the sensitivities of different observables to axiflavon in Fig. 11. Other phenomenologically viable choices of parameters that differ by "O(1)" factors can also give a good fit to the SM masses and mixings, so the constraints obtained in our benchmark should be viewed only as indicative, with O(1) variations, when this larger class of axiflavon parameters is considered.
The red line in Fig. 11 shows the predicted coupling to photons, g aγγ , which for the LFV axiflavon is given by Eq. (7.16) with E/N = 20/9, as a function of axiflavon mass, Eq. (7.1). The bound on f a from WD cooling is denoted with a vertical solid red line. The next less stringent bound comes from the µ → ea search by TWIST (dotted vertical red line). Fig. 11 shows that the Mu3e future reach (dotted vertical red line) will exceed the WD cooling constraints. In order not to clutter Fig. 11, we do not show the less constraining present bound (future senstivity) on LFV axiflavon from K + → π + a which is f 5.4×10 6 GeV (1.6×10 7 GeV) [23]. The expected reach from Belle II is f a 3(1)×10 6 GeV from τ → µa(τ → ea) searches, which is outside the range plotted in Fig. 11. The other constraints, shown in grey with future sensitivities denoted with grey dashed lines, are as in Fig. 10. Clearly, there is significant parameter space where the LFV axiflavon can be discovered in LFV experiments, especially considering the potential astrophysical uncertainties in the WD cooling bounds.

The Leptonic Familon
Our next example of an LFV ALP is the familon, i.e., the PNGB arising from the spontaneous breaking of a global horizontal symmetry, which we take to be the Froggatt-Nielsen (FN) flavor symmetry, U (1) FN [34]. We consider the case where the U (1) FN only acts on the leptonic sector so that the LFV ALP is the leptonic familon. Unlike the previous two examples, the leptonic familon does not solve the strong CP problem and, as a consequence, its mass is not determined by the QCD anomaly. The mass of the leptonic familon is therefore taken to be a free parameter, yet still small enough that it can be produced in tau or muon decays. The predictive power of the model is limited to the parametric prediction of the LFV coupling, which are related to the neutrinos mass texture. The where a is the familon, while φ is the radial mode with the mass O(f a ). Integrating out all the heavy fields the charged lepton Yukawa couplings and the Majorana neutrino mass matrix are given by (as determined through spurion analysis) where a e and κ ν are assumed to be flavour-anarchical matrices of O(1) coefficients, M is a cut-off scale with controlling the hierarchies among charged lepton masses, while Λ N is the lepton-number breaking scale suppressing the dimension 5 Weinberg operator. 13 An anarchical PMNS matrix featuring O(1) mixing angles [149] can be achieved by taking equal charges for the lepton doublets, The couplings of the familon to the leptons are given by, where cf. Eq. (7.33), so that the off-diagonal couplings in Eq. (7.40) are entirely due to the RH charges. The RH rotations are small, of the order of the ratios of lepton masses, (V e R ) ij ≈ (V e R ) ji ≈ m i /m j (i < j), and thus the LFV couplings are suppressed, The purely anarchical leptonic familon couples to the V + A current. It is thus subject to constraints from the old Jodidio et al. experiment and can be an important target for the proposed MEGII-fwd setup, despite the severe suppression of the LFV couplings, see Fig. 12 (left). For the hierarchical model the [L] i are non-universal, Eq. (7.35), and the LH rotations give the dominant contributions to the off-diagonal familon couplings in Eq. (7.40). Even so, the V +A couplings induced via RH rotations remain non-negligible, and are even enhanced compared to the anarchical case, . The LFV couplings of the hierarchical leptonic familon are thus, from Eq. (7.40), Because the hierarchical familon mostly couples to the V − A current the dominant constraint comes from the bound on µ → e a due to TWIST, Eq. (3.17), even though this is otherwise the weakest bound on BR(µ → e a). In the future, MEGII-fwd can improve the reach on the hierarchical familon beyond the present bounds, because of the non-vanishing V + A contributions, cf. Fig. 12 (right).
The µτ Anarchy scenario is in between the above two cases. There is no LH mixing in the 2 − 3 sector, while there is one in the case of 1 − 2 and 1 − 3 transitions. The RH mixings are suppressed, but with 1 − 2 and 1 − 3 mixing enhanced by 1/ over the pure anarchical case.
The familon coupling to electrons, relevant for assessing the star cooling bounds, is given by Finally, the coupling to photons is controlled by the anomaly contribution E UV , Eq. (2.9), which can be calculated in terms of FN charges in Eqs. (7.33)-(7.38) as   Fig. 12 indicates where the familon proper decay length is shorter than 1 m (due to a sizeable width to leptons, a → i j ), so that searches for i → j + invisible lose sensitivity. Fig. 12 shows that in both representative models all future searches for LFV processes performed by MEG II, Mu3e and Belle II can probe well into the yet unexplored parameter space.
As far as DM is concerned, the leptonic familon resembles very much the generic ALP DM discussion in Sec. 8. We refer to that section for an extensive discussion of the implications of LFV searches on the ALP DM parameter space.

The Majoron
The majoron [128,129] is the PNGB due to spontaneous breaking of the lepton number. A natural context where this kind of ALP arises are the seesaw models, where the breaking scale of lepton number is associated with the mass scale of heavy new fields. In type-I seesaw models at least two singlet fermions, the right-handed neutrinos N i , are added to the SM (for concreteness we will assume below two RH neutrinos). The RH neutrinos couple to LH leptons via the Yukawa coupling matrix, y N , while their masses are described by the Majorana mass matrix M N , In the seesaw limit, M N m D ≡ y N v, the RH neutrinos are heavy and can be integrated out, while the light neutrinos are predominantly part of the SM doublets, L i , with the Majorana mass matrix given by The majoron J arises when the mass matrix M N , which breaks the lepton number by two units, is generated dynamically by the vev of a new SM singlet scalar field σ. In this case the RH neutrino mass matrix in Eq. (7.45) is replaced by the Yukawa couplings of RH neutrinos to the scalar σ, i.e., M N → λ N σ in Eq. (7.45), where so that the RH neutrino mass matrix is given in terms of the matrix λ N as The radial modeσ is heavy and can be integrated out, while the majoron J is a PNGB and is light (we take its mass to be a free parameter). The majoron couples at tree level to neutrinos through theN c λ N σN Yukawa terms. These Yukawa interactions then induce couplings of J to charged leptons and quarks at loop-level [128], see Ref. [152] for complete expressions. Here we are interested only in the seesaw limit of these general expressions, which we match onto the effective Lagrangian (2.1) upon identifying a → J, f a → f N . Using the results of Ref. [153,154], we find for the majoron couplings to quarks and leptons, respectively, where T u,d 3 = ±1/2. Note that F V µe = −F A µe , i.e., the LFV couplings of type-I seesaw majoron have the V − A form (the couplings to the V + A leptonic current are flavor conserving). The TWIST experiment is sensitive to such a majoron, while the more stringent bounds from the 1986 experiment by Jodidio et al. do not apply, see Section 3.1 for details.
In contrast to the other ALP scenarios discussed above, in the case of a majoron, couplings to neutrinos are particularly relevant to assess the stability of the particle, hence whether it is a viable DM candidate. This is a consequence of the suppressed coupling to photons of the majoron, which in fact decays preferably to neutrinos when its mass is below the 2m e threshold. In the seesaw limit, the coupling to light neutrinos is diagonal and the decay width reads [153,154] where m i are the light neutrino mass eigenvalues. Another crucial issue is whether the experiments are able to probe scales that are interesting for the neutrino mass generation. We can distinguish two limits: • In the standard seesaw setup, sizeable entries of the Yukawa matrix y N are only compatible with the observed neutrino masses for an ultra-high seesaw scale. For instance, let us consider the case where elements of the Yukawa matrix are all of similar size, without any special structure, |(y N ) ij | ∼ y (a hierarchical structure would not qualitatively change the argument). The light neutrino masses are thus of the size m ν ∼ y 2 v 2 /M N , where v = 246 GeV is the electroweak symmetry breaking scale. The effective scale suppressing the LFV i → j J decays is then given by GeV. The standard set-up therefore cannot be probed by the present nor any of the planned LFV experiments.
• In the low-scale seesaw setup, the neutrino masses are additionally suppressed, such that large couplings in y N and a lower seesaw scale are compatible with the observed light neutrino masses. Indeed, since the neutrino masses m ν ∝ y N M −1 N y T N transform non-trivially under the lepton number (in contrast to y N y † N ), it is possible that light neutrino masses are parametrically suppressed in the presence of an approximate generalized lepton number. Such scenarios, usually referred to as the "TeV scale seesaw mechanism", have been extensively studied in the literature [41,[130][131][132][133][134][135][136][137][138]. In the following we use the results of Ref. [41] to construct a concrete example of a majoron model with parametrically suppressed neutrino masses (and thus with enhanced majoron couplings to the SM leptons).
In the simplest low-energy seesaw model one considers two right-handed neutrinos N 1,2 with a 2 × 2 Dirac mass matrix, M N , and 3 × 2 Dirac Yukawa couplings, y N , where λ ≡ √ 2M/f N is a real free parameter. In the y 1 → 0 limit the model has a global U (1) symmetry, M N → P M N P , y N → e iα y N P with P = diag{e iα , e −iα }. The majoron couplings, which are proportional to y N y † N , are invariant under this symmetry, while the neutrino masses are not, m ν → e 2iα m ν . This means that the neutrino masses are proportional to symmetry breaking parameters, y 1 , which, if small, additionally suppress the neutrino masses compared to the majoron couplings.
Working in a basis where the charged lepton matrix is diagonal, we can adjust the input parameters, M and y i , such that all neutrino observables (2 mass differences + 3 mixing angles) are at the central experimental values. This leaves two free parameters, which we choose to be M , the mass scale of RH neutrinos, and the largest eigenvalue of the Dirac Yukawa matrix, y = max eig(y N y † N ) . Using the results of Ref. [41], we obtain for the Normal Ordering (NO) in the seesaw limit, For the numerical analysis we use the latest global neutrino oscillation fit results [155,156], and set in Eq. (7.53) the mass differences, the mixing angles and the Dirac CP phase in PMNS matrix to their central experimental values. This gives the 3×3 Hermitian matrix y N y † N that still depends on y and one Majorana phases, α m . For simplicity, we set the latter to zero, α m = 0. The effective suppression scales for the majoron couplings are in the NO case given by (similar results are obtained for IO) The coupling to muons is comparatively suppressed due to an accidental cancellation between the two contributions to C A µµ in Eq. (7.50). The majoron also couples to nucleons via its couplings to quarks (7.49). These couplings do not depend on PMNS elements and are given by In Fig. 13, we summarize the current status and future prospect of the constraints on the parameter space of the low-energy seesaw majoron model described above. Besides the present and future bounds from LFV experiments and the astrophysical limits discussed in Sec. 6.1 involving the coupling of the majoron to electrons, we display as a yellow-shaded area the region excluded by SN1987A, according to the study of Ref. [157], due to the nucleon-majoron coupling of Eq. (7.57).
We also show as a dashed blue line where the majoron lifetime equals the lifetime of the universe. Anywhere to the left of this line, the majoron is a viable DM candidate. In fact, there is no strong constraint from observations of the extragalactic background light, since coupling of the majoron to photons are suppressed. The two representative lines explained in Sec. 6.2 show where the misalignment mechanism accounts for the whole DM abundance today with θ 0 ∼ 1. Below these lines the majoron is a sub-component of the total DM abundance unless new dynamical mechanism or a tuning of the initial condition is put into place. Notice that here we ignore further mechanisms of production that could arise from Higgs portal-type of couplings (see Ref. [158] for a discussion).
Finally, the light purple line denotes the expected reach of the QUAX experiment [102,103,121], which is directly sensitive to the coupling of the majoron to electrons. The shown sensitivity is as assessed in Ref. [104], under the assumption that the majoron is responsible for 100% of the observed DM abundance. Fig. 13 shows the remarkable reach in terms of the effective scale F A ee and the RH mass neutrino mass scale M of the Mu3e experiment, which, by searching for the µ → eJ decay, will be able to probe for the first time the uncharted territory beyond the star cooling and SN1987A bounds. In particular, Mu3e will be able to probe that part of the parameter space where the majoron can be a substantial component of DM.
Stringent constraints on low-energy seesaw models arise also from LFV decays, µ → eγ, eee, and µ → e conversion, mediated by the heavy neutrinos at the scale M , if this is not much above the TeV scale. Following again Ref. [41], one for instance finds for the branching ratio of µ → eγ This should be compared to the current 90% CL limit set by the MEG experiment, BR(µ → eγ) < 4.2 × 10 −13 [77]. The LFV decays to majorons therefore typically provide stronger bounds on the seesaw scale M than the LFV processes, µ → eγ, µ → 3e, and µN → eN , unless either M ≈ f N ≈ 1 TeV, or λ 1, such that the mass scale of the RH neutrinos is much lower than the L-breaking scale M f N . Apart from these limits, µ → eJ tends to provide the dominant constraint, as a consequence of different scalings with the heavy scale (the rate of µ → eJ is ∝ M −2 , while the rates of µ → eγ, µ → 3e, and µN → eN , are ∝ M −4 ). See also the detailed discussion in Ref. [154].
Finally, we comment on the chiral structure of the LFV majoron couplings which in our minimal implementation are entirely of V − A, given that LFV is mediated by the W -loops. This will not be the case in a two Higgs doublet model, where the majoron LFV couplings can be induced by loops of the charged Higgs. The V +A LFV couplings are then going to be suppressed by ∼ y 2 H v 2 /m 2 H + , where m H + is the mass of the charged Higgs and y H its coupling to leptons. Given the strong bounds on H + coming from indirect searches in b → sγ [159] and direct collider searches [160] we expect the right-handed LFV couplings of the majoron to be generically suppressed with respect to the left-handed ones. It would be interesting to explore further the robustness of this statement in models where a light charged Higgs is still unconstrained [161].

Conclusions
Generically, axion like particles (ALPs) can have flavor violating couplings to the SM fermions. In this manuscript we explored the phenomenological consequences of such couplings; in the first part we took a model-independent perspective and summarized present constraints and future sensitivities on generic lepton flavor violating couplings of light ALPs, see Fig. 1. Here an important feature is our proposal for a new experimental set-up at MEG II, MEGII-fwd, which consists of a forward calorimeter to be installed in front of the MEG II beamline, see Sec. 3.2 for details. The major benefit of such an experimental set-up is that the irreducible SM background from µ + → e + νν is reduced at the highest positron momentum in the forward region. Since the SM decay amplitude is controlled by left-handed couplings, it vanishes for an exactly forward positron of p e + = m µ /2 if produced from a muon that is completely negatively polarized. MEGII-fwd can be used to search for an effectively massles ALP produced in µ + → e + a with some amount of righthanded LFV coupling to the SM leptons. The signal will appear as a sharp line in positron energy distribution, in contrast to the smoothly falling SM background. The final reach of MEGII-fwd depends on how well depolarization effects can be controlled, on the positron momentum resolution, and on whether or not magnetic focusing is applied in order to increase the positron luminosity in the forward direction. Assuming realistic estimates for these parameters, we expect that a two week run at MEG II in MEGII-fwd configuration will allow to explore new ALP parameter space well before Mu3e and beyond the current astrophysical limits from star cooling.
Exploring this new region of parameter space will provide new insights on the couplings of an ALP DM produced non-thermally in the early Universe as discussed in Sec. 6.2. A possible signal in an LFV experiment could be cross-correlated with the future experimental campaigns of axion haloscopes and future intensity mapping searches. This would give us more information about the light ALP mass beyond the expected resolution of MEGII-fwd.
In the second part of this paper we discussed several UV models where ALPs with flavor-violating couplings to leptons arise when addressing various shortcomings of the SM, such of the Strong CP Problem or the Flavor Puzzle, see Sec. 7. These models explicitly demonstrate that the LFV couplings, which will be tested by future laboratory experiments, are correlated to the flavor-diagonal lepton couplings, which are strongly constrained by star cooling. Despite these constraints we find encouraging prospects for the expected sensitivity of MEGII-fwd to test these theoretically well-motivated models. We believe that this warrants a more systematic exploration of the feasibility of MEGII-fwd, including proper detector simulations. In particular, it would be interesting to investigate in more detail the required rearrangement of the MEGII magnetic field and its interplay with the achievable momentum resolution. We hope that this proposal will be explored further by the experimental collaboration. The MEGII-fwd proposal should also be carefully compared with other possibilities of improving the MEGII reach on LFV decays, for example the possibility of a dedicated trigger for µ + → e + aγ decays as discussed in Sec. 4.2. We hope to return to these issues in the future.
Beyond the new proposed experimental ideas, the paper includes several novel theoretical results. First of all, in Sec. 6.1 we derived the astrophysical bounds from stellar cooling on leptonic couplings of an ALP with arbitrary mass. While the bounds on couplings of massless ALPs to electrons were readily available in the literature, this was not the case, to the best of our knowledge, for massive ALPs coupling to ee, as well as for the couplings to µe and µµ (recently, Ref. [45] treated the case of massless ALP coupling to muons). Second, we showed how the chiral structure of the LFV couplings emerges in various explicit (and mostly novel) models, in which the presence of a light ALP is motivated by addressing the strong CP Problem (the LFV QCD Axion and the LFV Axiflavon), the hierarchical structure of SM fermion masses (the LFV Axiflavon and the Leptonic Familon) or the origin of neutrino masses (the Majoron). For the LFV Axiflavon and the majoron the ALP couples mainly to left-handed leptons, and thus improvements in the reach on µ + → e + γa at MEGII (or ultimately the inclusive on-line bump hunt proposed at Mu3e) will be needed in order to test these scenarios. On the other hand, for the case of LFV QCD axion and the leptonic familon the right-handed couplings are sufficiently large to allow MEGII-fwd to be the first experiment to discover the flavor violating ALP.
To conclude, we hope that this work may boost a renovated interest in the experimental possibilities at near future LFV experiments. The unprecedented luminosity of these facilities has been conceived to probe extremely rare lepton decays mediated by heavy new physics. The same data could also be used to probe decays with an (invisible) light new particle in the final state. We believe that we merely started to scratch the surface of the possible experimental improvements and theoretical motivations to search for light new physics in rare lepton flavor violating decays.