Lepton-flavor violating axions at MEG II

We study the sensitivity of the existing MEG data to lepton flavor violating axion-like particles produced through $\mu^+ \to e^+ a \gamma$ and estimate the discovery potential of the upcoming MEG II experiment in this channel. The MEG II signal efficiency can be improved significantly if a new trigger is implemented in a dedicated run with reduced beam intensity. Such a search would establish the world-leading measurement in this channel with only one month of data taking.


I. INTRODUCTION
Despite the advances in precision flavor measurements, the flavor puzzle remains one of the greatest mysteries of the Standard Model (SM). The SM is equipped with three generations of fermions, which come with an elaborate set of flavor symmetries. These flavor symmetries are (weakly) broken by the SM Yukawa couplings and by the mechanism generating the neutrino masses. If this breaking occurs spontaneously, one expects a set of pseudo-Goldstone bosons with flavor-violating couplings [1-4]. Focusing on the leptonic sector, lepton flavor violating (LFV) axion-like particles (ALPs) can also arise in QCD axion models where the Peccei-Quinn symmetry is embedded non-trivially in the SM flavor group [5-8], in familon models explaining the leptonic mass hierarchies à la Froggatt-Nielsen [8,9], and in majoron models generating neutrino masses [8,10-12]. In these constructions the ALP can be very light and its decay constant is typically very large, resulting in ALP lifetimes longer than the age of the Universe. This allows for the intriguing possibility that an LFV ALP is the Dark Matter (DM).
The large decay constant of the ALP suppresses its interactions with the SM, which makes it challenging for any laboratory experiment to test. However, the presence of LFV couplings provides a unique opportunity to probe new physics at high scales through exotic decays of SM particles to the light ALP. Here we study LFV ALPs in rare muon decays such as $\mu^+ \to e^+ a$ and $\mu^+ \to e^+ a\gamma$, where the stable or long-lived ALP $a$ escapes the detectors unobserved. Such rare muon decays can be tested with exquisite precision by the next generation of muon experiments at the Paul Scherrer Institute (PSI), provided dedicated data-taking strategies are implemented.
The main objective of our study is to identify a new data-taking strategy for MEG II [13] that maximizes its sensitivity to $\mu^+ \to e^+ a\gamma$. Along the way, we show that the existing MEG data [14] should already yield a competitive limit, although we lack some of the information needed to perform a faithful recast of the data. As shown in Fig. 1, this expected limit competes with the current best bound, set by the Crystal Box experiment [15], for ALP masses larger than 8 MeV.
Experimentally, the missing-mass variable in the $\mu^+ \to e^+ a\gamma$ channel allows for a more robust background discrimination than the $\mu^+ \to e^+ a$ channel. This is especially true for left-handed ALPs, for which $\mu^+ \to e^+ a$ produces a monochromatic line at the kinematic endpoint of the $\mu^+ \to e^+ \nu_e \bar\nu_\mu$ background. This region, however, is typically assumed to be signal-free and is used for calibration purposes [8]. Accounting for the corresponding systematic uncertainties, the TWIST collaboration sets the current best bound on left-handed LFV ALP couplings from $\mu^+ \to e^+ a$ [16].
As shown in Fig. 1, a search for $\mu^+ \to e^+ a\gamma$ at MEG II (in blue) can approach the current TWIST limit (dark red), but an experimental challenge remains: the existing triggers are optimized for MEG II's flagship analysis in the $\mu^+ \to e^+\gamma$ channel and have a suboptimal acceptance for $\mu^+ \to e^+ a\gamma$. We explore an alternative data-taking strategy that greatly increases the signal acceptance by adjusting the trigger selection while reducing the beam intensity. This approach can improve on the TWIST limit with only one month of data taking, as shown by the solid purple line in Fig. 1.
FIG. 1. 95% C.L. limits on $F^{V-A}_{\mu e}$. The green line is the expected bound from the parasitic analysis of MEG RMD data [14] (Sec. II B). The blue band is the MEG II projection of the same parasitic analysis (Sec. II C); the upper boundary of the band corresponds to a 50% reduction of the RC background with respect to the MEG search [14]. The purple bands show the reach of the new hypothetical MEG II run with lower beam intensity and a dedicated trigger stream, with 1 month and 1 year of data taking (Sec. III). The upper (lower) limit of the reach corresponds to the lower (upper) limit in the determination of the trigger rate, as detailed in Fig. 4. The orange shaded region is the most conservative Crystal Box bound derived in [8]. The dark red shaded region is the bound from the TWIST experiment on $\mu^+ \to e^+ a$ [16]. The magenta shaded region is the supernova bound on the LFV coupling derived in [8].

The LFV ALP is defined by the low-energy effective action
$$\mathcal{L} = \frac{\partial_\mu a}{2 f_a}\,\bar e\,\gamma^\mu\left(C^V_{\mu e} + C^A_{\mu e}\gamma_5\right)\mu + {\rm h.c.},$$
where $C^V_{\mu e}$ ($C^A_{\mu e}$) controls the vector (axial) LFV coupling. For concreteness, in the main text we focus on left-handed ALP couplings, setting $C^A_{\mu e} = -C^V_{\mu e} = C^{V-A}_{\mu e}$, and define the shorthand notation $F^{V-A}_{\mu e}$, in terms of which the limits in Fig. 1 are expressed. The cases of right-handed ALP couplings, with $C^A_{\mu e} = C^V_{\mu e}$, or of purely axial (vectorial) couplings, with $C^V_{\mu e} = 0$ ($C^A_{\mu e} = 0$), are discussed in Appendix C for completeness. The kinematical distributions and branching ratio for $\mu^+ \to e^+ a\gamma$ were computed for a massless and a massive ALP in Refs. [8,17], assuming an unpolarized muon. Here we further extend these results by accounting for the muon polarization, which is relevant for MEG [18]. The fully differential decay width for $\mu^+ \to e^+ a\gamma$ is given in Appendix A.
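As a short consistency check of the chirality assignments, assuming the couplings enter through the Dirac structure $C^V_{\mu e} + C^A_{\mu e}\gamma_5$ sandwiched between $\bar e\,\gamma^\mu$ and $\mu$, the left-handed choice reduces to the projector $P_L = (1-\gamma_5)/2$ and the right-handed choice of Appendix C to $P_R = (1+\gamma_5)/2$:

```latex
\begin{align*}
% Left-handed choice: C^A_{\mu e} = -C^V_{\mu e} = C^{V-A}_{\mu e}
C^V_{\mu e} + C^A_{\mu e}\gamma_5
  &= -C^{V-A}_{\mu e} + C^{V-A}_{\mu e}\gamma_5
   = -C^{V-A}_{\mu e}\,(1-\gamma_5)
   = -2\,C^{V-A}_{\mu e}\,P_L ,
  \qquad \bar e\,\gamma^\mu P_L\,\mu = \bar e_L\,\gamma^\mu \mu_L , \\
% Right-handed choice: C^A_{\mu e} = C^V_{\mu e}
C^V_{\mu e} + C^A_{\mu e}\gamma_5
  &= C^V_{\mu e}\,(1+\gamma_5) = 2\,C^V_{\mu e}\,P_R .
\end{align*}
```

The overall sign in the left-handed case is irrelevant for rates, which depend only on $|C^{V-A}_{\mu e}|^2$.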
Our paper is organized as follows: in Sec. II A we review the standard MEG trigger selection, and we derive the expected MEG limit on $\mu^+ \to e^+ a\gamma$ in Sec. II B and a projection for MEG II in Sec. II C. In Sec. III we explore an alternative data-taking strategy optimized for the ALP signal. We conclude in Sec. IV with a discussion of the physics potential for light new physics at muon facilities, as well as the theory motivation for searches of this class. In Appendix A we detail our new signal computation. Appendix B contains a validation of our simulation framework and more details on our analysis. In Appendix C we present the expected reach for different chiral structures of the ALP couplings.

II. EXISTING AND PLANNED DATASETS
In the MEG and MEG II experimental setup, a high-intensity $\mu^+$ beam is stopped in a thin target located at the center of a magnetic spectrometer. The main detectors making up the experiment are a high-resolution liquid xenon scintillation detector and a drift chamber, optimized to measure the outgoing photon and positron, respectively. The experiment is further equipped with a timing counter, made of scintillator bars at MEG and scintillator tiles at MEG II, which provides a precise timing measurement for the $e^+$ and aids the trigger selection [19].

A. The MEG trigger
We first describe the standard MEG trigger [20], which is now being upgraded for MEG II with increased bandwidth but similar logic [21,22]. The trigger is optimized to look for the $\mu^+ \to e^+\gamma$ decay, which amounts to requiring the positron and photon to be back-to-back with energies $E_{e,\gamma} \simeq m_\mu/2$ [23]. As a consequence, the trigger is suboptimal for probing $\mu^+ \to e^+ a\gamma$, where the signal rate is maximized for a soft photon collinear with the positron.
At trigger level, the only available information is the photon energy, time and conversion point measured by the liquid xenon scintillation detector, and the hit position and time measured by the timing counter [19]. Because of the design of the positron spectrometer, requiring a hit in the timing counter corresponds to selecting positrons with energies higher than roughly 45 MeV. In addition, an extra trigger selection on the photon energy, $E_\gamma \gtrsim 40$ MeV, is imposed to keep the trigger rate below 10 Hz, as required by the experimental design. The positron (photon) energy trigger efficiency $\epsilon_{E_e}$ ($\epsilon_{E_\gamma}$) is a function of the positron (photon) energy $E_e$ ($E_\gamma$) only, as long as the particles are within the detector acceptance. $\epsilon_{E_e}$ ($\epsilon_{E_\gamma}$) is plotted in the left (central) panel of Fig. 2, as taken from Ref. [14].
The full positron momentum measured by the drift chamber cannot be accessed at trigger level [20]. In the standard MEG trigger algorithm, the coordinates of the positron hit in the timing counter are matched to the muon stopping point by assuming that the positron momentum and direction are consistent with those of a $\mu^+ \to e^+\gamma$ decay. The trigger therefore selects predominantly back-to-back positron-photon pairs. The dependence of the trigger efficiency on the polar angle between the positron and the photon ($\theta_{e\gamma}$) varies with the positron energy, while the dependence on the azimuthal angle ($\phi_{e\gamma}$) is a subdominant effect once the trigger energy cuts on the positron and photon are imposed.¹ In the right panel of Fig. 2 we show the trigger efficiency $\epsilon_{\theta_{e\gamma}}$ as a function of $\theta_{e\gamma}$ for different values of the photon energy $E_\gamma$. As expected, the closer the photon energy is to $m_\mu/2$, the more efficient the trigger is in the region $\theta_{e\gamma} \simeq 0$.

FIG. 2. Left and center: energy-dependent parts $\epsilon_{E_e}$ and $\epsilon_{E_\gamma}$ of the MEG trigger efficiency (horizontal axis of the left panel: $E_{e^+}$ [MeV]). Right: polar-angle-dependent part $\epsilon_{\theta_{e\gamma}}$ of the MEG trigger efficiency. The total trigger efficiency is given by Eq. 2 up to a normalization factor $c_{\rm RMD} = 0.35$, which is defined in Eq. 5 to reproduce the number of observed RMD events, $N_{\rm RMD}|_{\rm obs.} = 12900$, in Ref. [14].

…tive information about the MEG trigger. The search is based on $N^{\rm MEG}_{\mu^+,{\rm tot}} = 1.8 \times 10^{14}$ muons collected in the years 2009-2010 with a beam intensity of $R^{\rm MEG}_{\mu^+} = 3 \times 10^7\ \mu^+/{\rm s}$. The MEG collaboration measures the turn-on of the trigger efficiency relative to a prescaled trigger with a lower threshold, and obtains the overall normalization from its (internal) Monte Carlo simulation. The full differential trigger efficiency as a function of $E_e$, $E_\gamma$, $\phi_{e\gamma}$ and $\theta_{e\gamma}$ was not made public, and we must therefore construct an approximate model from the published turn-on curves in Fig. 2. We do so by assuming that the full efficiency function factorizes as
$$\epsilon(E_e, E_\gamma, \theta_{e\gamma}) = \epsilon_{E_e}(E_e)\,\epsilon_{E_\gamma}(E_\gamma)\,\epsilon_{\theta_{e\gamma}}(E_e, \theta_{e\gamma}), \qquad (2)$$
and by extrapolating the functional dependence of $\epsilon_{\theta_{e\gamma}}$ as
$$\epsilon_{\theta_{e\gamma}}(E_e, \theta_{e\gamma}) = \left(-2.6 + 0.07\,\frac{E_e}{\rm MeV}\right)\epsilon_{\theta_{e\gamma}}(49\ {\rm MeV}, \theta_{e\gamma}). \qquad (3)$$
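A minimal Python sketch of the factorized trigger model of Eqs. 2 and 3 may make the construction concrete. Only the factorized structure and the coefficients $-2.6$ and $0.07$ come from the text; the turn-on shapes (`turn_on`, with thresholds near 45 MeV for the positron and 40 MeV for the photon) and the reference angular curve `eps_theta_ref` are hypothetical placeholders standing in for the digitized curves of Fig. 2, and clipping the linear rescaling to $[0, 1]$ is an assumption of this sketch.

```python
import numpy as np


def theta_rescale(e_e_mev):
    """Linear E_e rescaling of Eq. (3): (-2.6 + 0.07 E_e/MeV).
    Clipping to [0, 1] is an assumption of this sketch, not stated in the text."""
    return np.clip(-2.6 + 0.07 * np.asarray(e_e_mev, dtype=float), 0.0, 1.0)


def turn_on(x, threshold, width):
    """Hypothetical smooth turn-on curve; the real curves are those of Fig. 2."""
    return 0.5 * (1.0 + np.tanh((np.asarray(x, dtype=float) - threshold) / width))


def eps_theta_ref(theta_rad):
    """Hypothetical placeholder for the angular efficiency at E_e = 49 MeV."""
    return np.exp(-0.5 * (np.asarray(theta_rad, dtype=float) / 0.3) ** 2)


def trigger_efficiency(e_e, e_gamma, theta):
    """Factorized efficiency of Eq. (2): eps_Ee(E_e) * eps_Eg(E_g) * eps_theta(E_e, theta)."""
    eps_e = turn_on(e_e, threshold=45.0, width=1.5)      # timing-counter hit needs E_e >~ 45 MeV
    eps_g = turn_on(e_gamma, threshold=40.0, width=2.0)  # photon trigger cut E_g >~ 40 MeV
    eps_t = theta_rescale(e_e) * eps_theta_ref(theta)    # Eq. (3) applied to the reference curve
    return eps_e * eps_g * eps_t


# Vectorized evaluation on a few toy events (E_e, E_gamma in MeV, theta in rad).
print(trigger_efficiency(np.array([50.0, 52.0]), np.array([48.0, 50.0]), np.array([0.0, 0.1])))
```

Because every factor is vectorized, the same function can be applied directly to arrays of Monte Carlo events; replacing the placeholders with interpolated versions of the published curves (for instance via `numpy.interp`) would leave the factorized structure unchanged.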
For the geometric acceptance of the photon detector we take […]. The positron timing detector is not hermetic, but was designed to detect $E_e = m_\mu/2$ positrons that are back-to-back with photons within the acceptance of the calorimeter. We therefore estimate its acceptance to be $\phi_e \in [120\ldots]$. With this procedure, we reproduce all kinematical distributions in Ref. [14] up to an overall normalization factor, as we show in Appendix B. This offset in the overall rate between the data and our simulations could be due to the simplifying assumptions above or to other, more subtle experimental effects, either in the trigger or in the offline selection. In addition to the acceptance cuts described above, we further assume that the offline positron acceptance in $\theta_{e\gamma}$ is the same as the trigger acceptance shown in the right-hand panel of Fig. 2, which is likely an overestimate. We therefore introduce an overall normalization factor, $c_{\rm RMD}$, to rescale our simulations such that they match the number of observed RMD events after the offline kinematic selection:
[…] (4)
With the available information we cannot unambiguously attribute $c_{\rm RMD}$ to our modeling of either the trigger or the offline selection, which will be a source of uncertainty when we estimate the trigger rate later in this section. Concretely, $c_{\rm RMD}$ is defined as
$$c_{\rm RMD} = \frac{N_{\rm RMD}|_{\rm obs.}}{N^{\rm MEG}_{\mu^+,{\rm tot}}\;{\rm BR}^{\rm base}_{\rm RMD}\;\epsilon^{\rm trig.}_{\rm RMD}\;\left(\epsilon^{\rm off.}_{\rm RMD}/\epsilon^{\rm trig.}_{\rm RMD}\right)}, \qquad (5)$$
and we find $c_{\rm RMD} \simeq 0.35$. The inputs to Eq. 5 were found as follows. $N_{\rm RMD}|_{\rm obs.} = 12900$ is the observed number of RMD events in Ref. [14]. To ensure that it is finite, the RMD branching ratio was defined subject to an arbitrary, minimal set of baseline cuts,² […] which give ${\rm BR}^{\rm base}_{\rm RMD} = 1.44 \times 10^{-5}$ with the formula in Refs. [24,25]. The analysis is not sensitive to these baseline cuts, as long as they are looser than the trigger cuts. Starting from this baseline branching ratio, we use our Monte Carlo to compute the online efficiency $\epsilon^{\rm trig.}_{\rm RMD}$ by applying Eq. 2, and the offline efficiency $\epsilon^{\rm off.}_{\rm RMD}/\epsilon^{\rm trig.}_{\rm RMD} = 0.36$, which serve as inputs for Eq. 5.

² The offline angular acceptance of the positron was taken to be the same as the trigger acceptance, shown in the right-hand panel of Fig. 2. The muon polarization was taken to be $P_\mu = -0.85$, as measured in MEG [18].

The MEG trigger selects RMD events together with random coincidences (RC), which are generated when a photon from an RMD $\mu^+ \to (e^+)\nu\bar\nu\gamma$ (with a missed soft positron) and a positron from an unrelated Michel decay $\mu^+ \to e^+\nu\bar\nu$ are detected as coming from the same event. These pileup events are due to the enormous intensity of the muon beam, which is only partially offset by the strict cuts on the time separation between the positron and the photon. The RC background also receives a contribution from positrons annihilating in flight into a pair of photons, when one of the two photons is lost and the other is paired up with a hard positron from a Michel decay. This annihilation contribution is not explicitly included in our simulation, but we can roughly account for it by normalizing the total RC measured offline to $N_{\rm RC}|_{\rm obs.} = 83850$, the number of RC events observed by MEG after their offline selection cuts [14]. Analogously to the RMD discussion, we write
$$c_{\rm RC} = \frac{N_{\rm RC}|_{\rm obs.}}{N^{\rm MEG}_{\mu^+,{\rm tot}}\;{\rm BR}^{\rm base}_{\rm RC}\;\epsilon^{\rm trig.}_{\rm RC}\;\left(\epsilon^{\rm off.}_{\rm RC}/\epsilon^{\rm trig.}_{\rm RC}\right)}, \qquad (6)$$
where ${\rm BR}^{\rm base}_{\rm RC}$ is the probability for a muon to be involved in an RC event. In this sense it can be thought of as the baseline "branching ratio" of the random coincidences, and it is defined as
$${\rm BR}^{\rm base}_{\rm RC} = {\rm BR}^{\rm base}_{\rm RMD}\;{\rm BR}^{\rm base}_{\rm Mich.}\;R^{\rm MEG}_{\mu^+}\,\Delta t^{\rm trig.}_{e\gamma}, \qquad (7)$$
where $\Delta t^{\rm trig.}_{e\gamma} \simeq 24$ ns is the trigger resolution on the relative arrival time of the measured photon and positron [20]. The $c_{\rm RC}$ parameter is the overall normalization constant we use to normalize our Monte Carlo to the MEG data, and is the RC analogue of the $c_{\rm RMD}$ parameter in Eq. 5. It is fixed from Eq. 6 and Eq. 7.
In Eq. 7, ${\rm BR}^{\rm base}_{\rm RMD}$ is obtained with our Monte Carlo and is defined by requiring the positron to be outside the detector acceptance or softer than 40 MeV, and the photon to have $E_\gamma > 5$ MeV and be within the geometrical acceptance of the detector. The resulting value is ${\rm BR}^{\rm base}_{\rm RMD} = 2.50 \times 10^{-3}$, while ${\rm BR}^{\rm base}_{\rm Mich.} = 0.28$ is the branching ratio of the Michel decay $\mu^+ \to e^+\nu\bar\nu$ after the minimal energy cut $E_e > 40$ MeV and the geometrical acceptance are applied. The baseline RC differential distributions are then obtained by assuming the RMD photons and Michel positrons to be time-coincident. This simplification should capture the kinematic properties of the main component of the RC background. Analogously to the RMD background, $\epsilon^{\rm trig.}_{\rm RC}$ is found with our Monte Carlo by applying Eq. 2, and likewise the offline efficiency $\epsilon^{\rm off.}_{\rm RC}$.
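The normalization bookkeeping can be sketched numerically. The text does not quote $\epsilon^{\rm trig.}_{\rm RMD}$ itself, so the snippet below inverts the normalization relation, assumed here to read $c_{\rm RMD} = N_{\rm RMD}|_{\rm obs.}/[N_{\mu^+,{\rm tot}}\,{\rm BR}^{\rm base}_{\rm RMD}\,\epsilon^{\rm trig.}_{\rm RMD}\,(\epsilon^{\rm off.}_{\rm RMD}/\epsilon^{\rm trig.}_{\rm RMD})]$, to show which online efficiency the quoted inputs imply. It also evaluates the RC baseline probability under the assumption that Eq. 7 is the product ${\rm BR}^{\rm base}_{\rm RMD}\,{\rm BR}^{\rm base}_{\rm Mich.}\,R_{\mu^+}\,\Delta t^{\rm trig.}_{e\gamma}$. All constant and function names are hypothetical; the numerical inputs are those quoted in this section.

```python
# Numbers quoted in Sec. II A for the MEG 2009-2010 RMD analysis [14].
N_MU_TOT = 1.8e14          # stopped muons
R_MU = 3e7                 # beam intensity [mu+/s]
N_RMD_OBS = 12900          # RMD events after the offline selection
BR_BASE_RMD = 1.44e-5      # RMD baseline branching ratio (input to the c_RMD relation)
OFF_OVER_TRIG_RMD = 0.36   # eps_off / eps_trig for RMD
C_RMD = 0.35               # fitted normalization factor

# Inputs to the random-coincidence baseline "branching ratio" (Eq. 7).
BR_BASE_RMD_PHOTON = 2.50e-3  # RMD photon in acceptance, positron lost
BR_BASE_MICHEL = 0.28         # Michel positron with E_e > 40 MeV, in acceptance
DT_TRIG = 24e-9               # trigger coincidence window [s]


def implied_trigger_efficiency(c_norm, n_obs, n_mu, br_base, off_over_trig):
    """Solve c = n_obs / (n_mu * br_base * eps_trig * off_over_trig) for eps_trig."""
    return n_obs / (c_norm * n_mu * br_base * off_over_trig)


def br_base_rc(br_photon, br_michel, rate, dt):
    """Accidental-pairing probability per muon, assuming the product form
    BR_RC = BR_RMD * BR_Mich * R * dt (rate in 1/s, dt in s)."""
    return br_photon * br_michel * rate * dt


eps_trig = implied_trigger_efficiency(C_RMD, N_RMD_OBS, N_MU_TOT, BR_BASE_RMD, OFF_OVER_TRIG_RMD)
br_rc = br_base_rc(BR_BASE_RMD_PHOTON, BR_BASE_MICHEL, R_MU, DT_TRIG)
print(f"implied eps_trig(RMD) = {eps_trig:.3e}, BR_base(RC) = {br_rc:.3e}")
```

The linear dependence of `br_base_rc` on the beam intensity is what makes the number of random coincidences per unit time grow as $R^2_{\mu^+}$, while the RMD rate grows only as $R_{\mu^+}$.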
For purposes that will become clear in Sec. III, we here estimate the trigger rate of both the RMD and the RC events at MEG by computing the total number of simulated events passing the trigger selection and dividing by the effective run time, which we take to be $t^{\rm MEG}_{\rm run} = N^{\rm MEG}_{\mu^+,{\rm tot}}/R^{\rm MEG}_{\mu^+} = 6 \times 10^6$ s. When doing so, we must account for the fact that the online timing window, $\Delta t^{\rm trig.}_{\gamma e} \simeq 24$ ns [20], is roughly 6 times larger than the offline window, $\Delta t^{\rm off.}_{\gamma e} = 4$ ns. This increases the RC trigger rate by a factor $\Delta t^{\rm trig.}_{\gamma e}/\Delta t^{\rm off.}_{\gamma e} \simeq 6$. A large uncertainty on our estimate comes from the overall normalization of our efficiencies, $c_{\rm RMD}$ and $c_{\rm RC}$ (see Eq. 5 and Eq. 7), as we cannot unambiguously determine whether our modeling of the online or of the offline selection is responsible for these correction factors. In practice, our estimate of the RMD trigger rate can therefore vary within a factor of $1/c_{\rm RMD}$, and the RC trigger rate within a factor of $1/c_{\rm RC}$: […] Our estimated trigger rate is thus in the 1-10 Hz range and completely dominated by the RC, whose rate at trigger level is roughly a factor of 200 larger than the RMD rate. In Tab. I and Fig. 4 we account for this uncertainty when optimizing the selection for the dedicated $\mu^+ \to e^+\gamma a$ analysis. The corresponding uncertainty on the reach is indicated by the purple bands in Fig. 1. We emphasize that this uncertainty in our projection is due to our modeling of the MEG experimental setup; a full analysis by the MEG collaboration would not be subject to it.
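A small sketch of the trigger-rate bracket described above, assuming that a normalization factor $c < 1$ is attributed either entirely to the online modeling (lower end of the range) or entirely to the offline modeling (upper end), so that the two ends differ by $1/c$. The simulated count `N_HYPOTHETICAL` is a placeholder, not a number from the analysis; $1/c_{\rm RC} \approx 14$ is the value quoted in the caption of Table I, and the function name is hypothetical.

```python
N_MU_TOT = 1.8e14
R_MU = 3e7
T_RUN = N_MU_TOT / R_MU            # effective MEG run time [s]
DT_TRIG, DT_OFF = 24e-9, 4e-9      # online and offline e-gamma timing windows [s]
WINDOW_FACTOR = DT_TRIG / DT_OFF   # ~6: RC rate enhancement at trigger level


def rate_bracket(n_pass_sim, c_norm, t_run=T_RUN, window_factor=1.0):
    """(low, high) trigger-rate estimate in Hz for a simulated count passing the trigger.
    high: normalization attributed to the offline selection (trigger count uncorrected);
    low:  normalization attributed to the trigger (count scaled by c_norm)."""
    nominal = window_factor * n_pass_sim / t_run
    return c_norm * nominal, nominal


N_HYPOTHETICAL = 1.0e6  # placeholder simulated RC count, for illustration only
rc_low, rc_high = rate_bracket(N_HYPOTHETICAL, c_norm=1 / 14, window_factor=WINDOW_FACTOR)
print(f"RC trigger rate between {rc_low:.2f} Hz and {rc_high:.2f} Hz")
```

The RMD rate would be bracketed in the same way with `c_norm` set to $c_{\rm RMD}$ and `window_factor=1.0`, since the timing-window enhancement only affects accidental coincidences.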

B. Parasitic analysis: expected MEG bound
We now show how the RMD measurement [14] can be repurposed as a search for $\mu^+ \to e^+ a\gamma$. Concretely, we take the offline kinematic selection to be that in Eq. 4, which should be applied together with the trigger efficiency and the angular acceptances of the MEG detector: […] In the previous section, we explained how the factor $c_{\rm RMD}$ is used to correct for our imperfect modeling of the detector efficiency for the RMD process; we assume that the same correction factor holds for the ALP signal. The missing mass $m_{\not E}$ is defined as
$$m^2_{\not E} = \left(p_\mu - p_e - p_\gamma\right)^2,$$
with $p_\mu = (m_\mu, \vec 0)$ for a muon decaying at rest. The signal is a peak in the $m_{\not E}$ distribution, located at the ALP mass. The differential distributions of the signal and of the RMD and RC backgrounds are shown in the left-hand panel of Fig. 3.
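The missing-mass variable can be computed from the measured positron and photon energies and their relative direction. The sketch below assumes a muon decaying at rest and uses the cosine of the full three-dimensional opening angle between the two momenta, rather than the separate MEG angles $\theta_{e\gamma}$ and $\phi_{e\gamma}$; the function name is hypothetical. Two kinematic checks are built in: a two-body $\mu^+ \to e^+\gamma$ configuration gives $m^2_{\not E} = 0$, and a configuration with a 30 MeV ALP produced at rest gives $m^2_{\not E} = (30\ {\rm MeV})^2$.

```python
import math

M_MU = 105.6583755  # muon mass [MeV]
M_E = 0.51099895    # electron mass [MeV]


def missing_mass_sq(e_e, e_gamma, cos_open):
    """m_miss^2 = (p_mu - p_e - p_gamma)^2 [MeV^2] for a muon at rest.
    cos_open is the cosine of the 3D opening angle between the positron and
    photon momenta (an assumption of this sketch)."""
    p_e = math.sqrt(max(e_e**2 - M_E**2, 0.0))   # positron momentum magnitude
    e_miss = M_MU - e_e - e_gamma                # missing energy
    # |p_e + p_gamma|^2 with |p_gamma| = e_gamma
    p_vis_sq = p_e**2 + e_gamma**2 + 2.0 * p_e * e_gamma * cos_open
    return e_miss**2 - p_vis_sq


# Two-body check: mu -> e gamma back-to-back gives zero missing mass.
e_e0 = (M_MU**2 + M_E**2) / (2 * M_MU)
e_g0 = (M_MU**2 - M_E**2) / (2 * M_MU)
print(missing_mass_sq(e_e0, e_g0, -1.0))

# ALP at rest with m_a = 30 MeV: e and gamma share M_MU - m_a back-to-back.
m_a = 30.0
m_vis = M_MU - m_a
e_e1 = (m_vis**2 + M_E**2) / (2 * m_vis)
e_g1 = (m_vis**2 - M_E**2) / (2 * m_vis)
print(math.sqrt(max(missing_mass_sq(e_e1, e_g1, -1.0), 0.0)))
```

After detector smearing, $m^2_{\not E}$ can fluctuate to negative values, which is one reason to work with windows in $m^2_{\not E}$ rather than in $m_{\not E}$ itself.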
The final sensitivity depends on the energy and angular resolutions. For positron and photon energies between 40 and 53 MeV, the MEG detector resolutions are extracted from Ref. [26], then fitted and extrapolated to the energy range of interest (see Appendix B). From this procedure we derive a minimal resolution on the missing mass of 4.5 MeV. Any ALP with a mass below this resolution will be seen as effectively massless by MEG.
TABLE I. […] The expected limits are given for an effectively massless axion (i.e., with mass below the experimental resolution). The MEG II-RMD limits vary depending on the normalization of the RC background, which can be reduced by 50% w.r.t. the value measured at MEG [14]. The trigger rate is estimated in more detail at the end of Sec. II A, with a factor of $1/c_{\rm RC} \approx 14$ uncertainty. The latter affects the reach of the dedicated MEG II-ALP data-taking run, where we fix the trigger rate to be 10 Hz and derive two optimal benchmark choices for the beam intensity. These different beam intensities each result in a slightly different projected bound, shown by the width of the purple bands in Fig. 1.

Assuming no bump in the missing-mass spectrum has been detected in the existing MEG data, we can estimate the expected limit with the following scheme. We take the signal ($S$) and the background ($B$) in a narrow $m^2_{\not E}$ window, where the window size, $\Delta m^2_{\not E} = 27\ {\rm MeV}^2$, is chosen to optimize the sensitivity under the assumption of negligible systematics. To further improve the sensitivity, we use a double-sided binned log-likelihood ratio on the $(\theta_{e\gamma}, \phi_{e\gamma})$ distribution of the events passing the $m^2_{\not E}$ window cut,
$$\Lambda(S) = -2\sum_i \ln\frac{\mathcal{L}(B_i\,|\,S_i + B_i)}{\mathcal{L}(B_i\,|\,B_i)},$$
where $S_i$ ($B_i$) is the number of signal (background) events in the $i$th bin of a grid with bin size 20 mrad × 20 mrad. The likelihood is defined as the Poisson distribution
$$\mathcal{L}(n\,|\,\mu) = \frac{\mu^n e^{-\mu}}{n!},$$
where we estimate the number of observed events in each bin by the expectation value of the background, $B_i$. Demanding $\Lambda(S) < 4$, we obtain the 95% confidence level projected limit on $F^{V-A}_{\mu e}$ shown in Fig. 1. The projected bound from MEG data is slightly weaker than the most conservative bound from Crystal Box derived in Ref. [8] for an effectively massless ALP.³ This is due to the larger angular acceptance of Crystal Box, which compensates for its smaller luminosity ($N^{\rm Crystal\ Box}_{\mu^+} = 8 \times 10^{11}$) and its worse detector resolution.
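A minimal NumPy sketch of this binned likelihood-ratio test, assuming Poisson bins with the observed counts set to the background expectation $B_i$. With that choice the factorials cancel and $\Lambda(S) = 2\sum_i [S_i - B_i\ln(1 + S_i/B_i)]$; the 95% C.L. limit then corresponds to the signal normalization at which $\Lambda = 4$, found here by bisection. Function names and the toy inputs are hypothetical, and bins with $B_i = 0$ must be merged or dropped beforehand.

```python
import numpy as np


def llr(s, b):
    """Lambda(S) = -2 sum_i ln[L(B_i | S_i + B_i) / L(B_i | B_i)] for Poisson L with
    observed counts set to B_i; equals 2 sum_i [S_i - B_i ln(1 + S_i / B_i)]."""
    s = np.asarray(s, dtype=float)
    b = np.asarray(b, dtype=float)
    if np.any(b <= 0.0):
        raise ValueError("every bin needs a positive background expectation")
    return float(2.0 * np.sum(s - b * np.log1p(s / b)))


def signal_scale_at_limit(s_shape, b, target=4.0, hi=1e6, iters=200):
    """Scale k such that llr(k * s_shape, b) = target. Lambda grows monotonically
    with k for non-negative s_shape, so bisection on [0, hi] is safe."""
    shape = np.asarray(s_shape, dtype=float)
    lo_k, hi_k = 0.0, hi
    for _ in range(iters):
        mid = 0.5 * (lo_k + hi_k)
        if llr(mid * shape, b) < target:
            lo_k = mid
        else:
            hi_k = mid
    return 0.5 * (lo_k + hi_k)


# Toy comparison: the same background, signal concentrated in one bin or spread out.
k_conc = signal_scale_at_limit([1.0, 0.0], [100.0, 100.0])
k_flat = signal_scale_at_limit([0.5, 0.5], [100.0, 100.0])
print(k_conc, k_flat)
```

For a single bin with $S \ll B$ the statistic reduces to $S^2/B$, so $\Lambda = 4$ reproduces the familiar $S/\sqrt{B} \simeq 2$ criterion; concentrating the signal in fewer angular bins at fixed background lowers the excluded normalization, which is why the $(\theta_{e\gamma}, \phi_{e\gamma})$ shape information improves on a plain cut-and-count.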

C. Parasitic analysis: MEG II projection
We now look into the future and assess the MEG II projected sensitivity to $\mu^+ \to e^+ a\gamma$. We consider the MEG kinematical selection in Eq. 4 and derive the expected reach at MEG II accounting for i) the larger luminosity, which we take to be $N^{\rm MEG\,II}_{\mu^+} = 1.8 \times 10^{15}$, and ii) the improved offline energy and angular resolution. As detailed in Appendix B, we rescale the MEG resolutions using the resolution information at $E_{e,\gamma} \simeq m_\mu/2$ [13], assuming that the energy dependence is the same as at MEG. We also account for the expected suppression of the RC background due to the installation of the radiative decay counter, which rejects the soft positron in the forward direction at MEG II [13]. The projected limit is shown by the blue band in Fig. 1, where the upper edge corresponds to a 50% suppression of the RC. Despite the expected MEG II improvements, the kinematical selection of Eq. 4 is unlikely to push the reach beyond the present TWIST bound, motivating the exploration of a new, optimized data-taking strategy.

³ […] with measured energies $E_{\gamma,e^+} > 38$ MeV and $\theta_{e\gamma} < 0.7$. Translating this to the theory prediction is subject to a large uncertainty from the energy loss of the positron before reaching the detector, which was estimated to be at most 5 MeV by the collaboration. The most conservative theory bound is then obtained assuming a truth-level positron energy cut of 43 MeV.
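A back-of-the-envelope scaling helps explain why the tenfold luminosity increase alone does not carry the parasitic analysis much further. Assuming the signal rate scales as $1/F^2$ (as for a coupling suppressed by the decay constant) and fixed efficiencies, a background-dominated limit ($S/\sqrt{B}$ fixed, with $S \propto N_{\mu^+}/F^2$ and $B \propto N_{\mu^+}$) improves only as $N_{\mu^+}^{1/4}$, while a background-free limit improves as $N_{\mu^+}^{1/2}$. The helper below is hypothetical and ignores the resolution improvements and the RC suppression discussed in this subsection.

```python
def limit_improvement(n_new, n_old, background_dominated=True):
    """Ratio F_new / F_old of the lower limits on the LFV scale F when the number of
    stopped muons grows from n_old to n_new, assuming BR proportional to 1/F^2.
    Background-dominated: S/sqrt(B) fixed  -> F scales as N^(1/4).
    Background-free:      S fixed          -> F scales as N^(1/2)."""
    ratio = n_new / n_old
    return ratio ** 0.25 if background_dominated else ratio ** 0.5


# MEG (1.8e14 muons) -> MEG II parasitic analysis (1.8e15 muons).
print(limit_improvement(1.8e15, 1.8e14))
```

For the luminosities quoted above, the background-dominated gain in $F$ is $10^{1/4} \approx 1.78$.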
III. A DEDICATED RUN

To enhance the reach for $\mu^+ \to e^+ a\gamma$, one would ideally want to relax the energy and angular cuts on the photons while keeping the trigger rate below 10 Hz. This can be achieved by reducing the muon beam intensity $R_{\mu^+}$, which has the double benefit of i) allowing a looser photon trigger cut, enhancing the signal acceptance, and ii) suppressing the RC background (whose rate scales as $R^2_{\mu^+}$) relative to the RMD background (whose rate scales as $R_{\mu^+}$).⁴ In the remainder of this section we estimate the sensitivity of such a hypothetical "MEG II-ALP" dedicated run.
We define the experimental efficiency and acceptance by taking into account only the turn-on of the positron trigger and the detector geometry, which are defined as before. The detection efficiency as a function of $E_\gamma$, $\theta_{e\gamma}$ and $\phi_{e\gamma}$ is otherwise assumed to be one. This might be an optimistic assumption, which can only be assessed by the MEG II collaboration.
In Fig. 4 we study the signal and background acceptance as a function of the beam intensity $R_{\mu^+}$ and the lower bound on the photon energy, $E^{\rm cut}_\gamma$. For concreteness, we benchmark a trigger selection with […] (11) where the uncertainty on the optimal $R_{\mu^+}$ stems from our approximate estimate of the trigger rate in Sec. II A (symbols in Fig. 4). The proposed data-taking strategy requires the beam intensity to be reduced by roughly an order of magnitude compared to the MEG run in order to keep the trigger rate below 10 Hz (see Table I). As can be seen from the purple line in Fig. 4, lowering the photon energy cut together with the beam intensity makes the RMD background almost of the same order as the RC background at trigger level. Loosening the photon energy cut as much as possible moreover maximizes the reach for the ALP signal. We expect the bottleneck of this strategy to be the energy threshold of the liquid xenon detector, but at this time there is no public information about its response to low-energy photons. For the purpose of our study we therefore select photon energies larger than 10 MeV, where the detector efficiency should be excellent. The possibility of including softer photons can be considered by the MEG II collaboration.
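The interplay between the linear (RMD) and quadratic (RC) scaling with beam intensity can be illustrated with a short sketch. Assuming the trigger rate at intensity $R$ is $aR + bR^2$, calibrated from rates at a reference intensity, the largest intensity compatible with a 10 Hz budget is the positive root of a quadratic. The reference rates used in the example are hypothetical placeholders, not the values behind Table I or Fig. 4, and the function names are our own.

```python
import math


def max_beam_intensity(rate_rmd_ref, rate_rc_ref, r_ref, rate_budget=10.0):
    """Largest intensity R with a*R + b*R**2 <= rate_budget, where
    a = rate_rmd_ref / r_ref (RMD, linear) and b = rate_rc_ref / r_ref**2 (RC, quadratic)."""
    a = rate_rmd_ref / r_ref
    b = rate_rc_ref / r_ref**2
    if b == 0.0:
        return rate_budget / a
    # positive root of b R^2 + a R - rate_budget = 0
    return (-a + math.sqrt(a * a + 4.0 * b * rate_budget)) / (2.0 * b)


def rc_over_rmd(rate_rmd_ref, rate_rc_ref, r_ref, r_new):
    """RC/RMD trigger-rate ratio at intensity r_new; it falls linearly with R."""
    return (rate_rc_ref / rate_rmd_ref) * (r_new / r_ref)


# Hypothetical example: a loosened photon cut giving 1000 Hz of RC at 3e7 mu+/s.
r_max = max_beam_intensity(rate_rmd_ref=0.0, rate_rc_ref=1000.0, r_ref=3e7)
print(f"max intensity ~ {r_max:.2e} mu+/s")
```

In the RC-dominated limit, a rate excess by a factor $X$ at the reference intensity is absorbed by reducing $R_{\mu^+}$ by only $\sqrt{X}$, while the RC/RMD ratio drops linearly with $R_{\mu^+}$: this is the double benefit of a lower beam intensity described at the start of this section.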
Offline, analogously to the previous section, we optimize the missing-mass window to separate the signal from the background. The differential distributions are shown in the right-hand panel of Fig. 3. We also perform the log-likelihood ratio test on the $(\theta_{e\gamma}, \phi_{e\gamma})$ distribution to maximize the sensitivity; the detailed distributions of signal and background in the angular variables are given in Appendix B. The optimal width of the missing-mass window is $\Delta m^2_{\not E} = 35\ {\rm MeV}^2$, in the limit of negligible systematic uncertainty on the background. The broadening of the signal distribution can be traced back to the expected deterioration of the photon energy resolution at low energies, which is accounted for by our fit of the resolution in Appendix B.
The expected reach of this dedicated run is shown in Fig. 1 for the same total luminosity as the MEG run, $N_{\mu^+} = 1.8 \times 10^{14}$, which can be collected in a dedicated 1-year run (∼50 weeks of data taking) at the end of the commissioned run of MEG II. Interestingly, Fig. 1 shows that with only 1 month of data taking our proposal can already reach the best sensitivity on left-handed LFV axions. Our projections neglect systematic uncertainties, which can be parametrized in the cut-and-count scheme as $S/\sqrt{B + \eta^2_{\rm sys}(B+S)^2}$. The contours of $S/B$ in Fig. 4 indicate that the parameter $\eta_{\rm sys}$ should be kept below 0.1% for systematic uncertainties to be negligible. This assumption can again only be validated by the MEG collaboration.
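The role of the systematic parameter can be seen by evaluating the cut-and-count significance $S/\sqrt{B + \eta^2_{\rm sys}(B+S)^2}$ directly. Systematics are negligible when $\eta_{\rm sys}(B+S) \ll \sqrt{B}$; near the exclusion threshold, where $S/\sqrt{B} \simeq 2$, this amounts to $\eta_{\rm sys}$ being small compared with $S/B$. The toy numbers below ($S = 200$, $B = 10^4$, so $S/B = 2\%$) are hypothetical.

```python
import math


def significance(s, b, eta_sys=0.0):
    """Cut-and-count significance S / sqrt(B + eta_sys^2 (B + S)^2)."""
    return s / math.sqrt(b + (eta_sys * (b + s)) ** 2)


# Toy point at the exclusion threshold without systematics (S/sqrt(B) = 2, S/B = 2%).
for eta in (0.0, 1e-3, 1e-2):
    print(f"eta_sys = {eta:g}: significance = {significance(200.0, 1e4, eta):.3f}")
```

With $\eta_{\rm sys} = 0.1\%$ the significance barely moves, while $\eta_{\rm sys} = 1\%$, comparable to $S/B$, already degrades it by about 30%.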
FIG. 5. ALP parameter space as a function of the decay constant $f_a$ and the mass $m_a$, assuming $C_e = C^{V-A}_{e\mu} = 1$ and $E_{UV} = 0$. The dark red line is the present TWIST bound [16], while the purple bands correspond to the projections for the MEG II-ALP dedicated run shown in Fig. 1; the solid (dashed) line corresponds to 1 month (1 year) of data taking. The dashed orange line shows the (speculative) projection for a Mu3e online analysis of $\mu^+ \to e^+ a$ data [27]. The shaded grey regions show existing bounds from white dwarf (WD) and red giant (RG) cooling [28-30], X-ray searches for $\gamma$ lines from decaying DM [31,32], absorption in direct detection experiments [33,34], and existing resonant cavities [35-37] for $E_{UV} = 1$. The dashed grey line shows the bound on decaying DM from diffuse extragalactic light observations [38] if $E_{UV} = 1$ (the arrow points towards the excluded region). In the dark orange blob, ALP DM can explain the XENON1T excess in electron recoils [39-41], while in the dark green region the solar basin can fit the same excess [42].
The experimental program for rare muon decays has primarily focused on well-motivated but very specific LFV final states such as $\mu^+ \to e^+\gamma$ and $\mu^+ \to e^+ e^- e^+$, with no (or very little) missing energy. These final states are very interesting tests of heavy new physics generating LFV operators of dimension six in the SM, and can explore the flavor structure at the multi-TeV scale, for instance in supersymmetric or composite Higgs models (see for example Ref. [43]). They are, however, by design insensitive to signatures of low-energy remnants of high-scale LFV, such as light LFV axions.
The implementation of new trigger strategies can address this blind spot by directly targeting events containing missing energy. These searches would enlarge the physics case of the muon experimental program in a completely orthogonal direction, by testing dimension-five operators involving new, light, long-lived particles that are very weakly coupled to the SM. In this context, rare muon decays can test scales as high as $10^{10}$ GeV and probe non-trivial embeddings of the Peccei-Quinn symmetry inside the SM flavor group, as well as spontaneously broken lepton flavor symmetries more generally.
Examples in this direction are the online trigger strategy for $\mu^+ \to e^+ a$ at the Mu3e experiment proposed in Ref. [27] and the MEG II-fwd proposal put forward in Ref. [8]. Both proposals are complementary to the one explored here, because they are expected to have limited sensitivity to a left-handed massless ALP. In particular, the MEG II-fwd proposal ceases to be advantageous because the signal acceptance for left-handed ALPs is tiny in the forward region. The proposed search at Mu3e (orange dashed line in Fig. 5), on the other hand, faces severe challenges related to systematic uncertainties in hunting for a bump on top of the Michel endpoint (this region is typically assumed to be signal-free and used for experimental calibration). In addition, the MEG II experiment is already commissioned and should be able to perform the measurement on a shorter time scale than Mu3e. In the same spirit, we show in Appendix C the reach of our proposal for right-handed and vectorial/axial ALP couplings. With 1 year of data taking, MEG II can do noticeably better than the current best bound from the experiment performed by Jodidio et al. in 1986 [44], and set a bound that is only slightly weaker than the projections of Mu3e and MEG II-fwd.
In Fig. 5 we show the impact of our projections on the ALP parameter space, assuming that the flavor-diagonal (FD) couplings to electrons are of the same order as the LFV coupling.⁵ The coupling to photons is controlled by […], where $E_{UV}$ is the electromagnetic anomaly coefficient in the ultraviolet theory and $B(\tau_e)$ is the IR contribution from the electron threshold. We see that a MEG II-ALP dedicated run can probe new parameter space beyond the stellar cooling constraints already with 1 month of running.
A particularly interesting model is the photophobic ALP with $E_{UV} = 0$, which can be the DM with a mass $m_a \simeq 2-3$ keV and explain the recent XENON1T excess in electron recoils [39-41] without being in tension with astrophysical bounds on decaying DM [31,32]. The same model could explain the XENON1T excess if an ALP solar basin is formed around the Sun [42], in a region of parameter space that is compatible with stellar energy losses [45]. Intriguingly, $E_{UV} = 0$ is naturally realized in Majoron models, where $C_e \sim C^{V-A}_{\mu e} \sim 1/16\pi^2$ are also generated after the right-handed neutrinos are integrated out [8,10-12]. From Fig. 5 we see that 1 year of running of MEG II-ALP will be sufficient to probe the solar basin explanation if $C_e \sim C^{V-A}_{\mu e}$.
In conclusion, we hope that this study can pave the way for a more systematic assessment of the capabilities of MEG II in exploring light new physics with flavor-violating couplings to the SM. As a first step, the existing and future data sets used for the RMD analysis can be (re)analyzed to obtain competitive limits on the $\mu^+ \to e^+ a\gamma$ process. Second, a dedicated run of the MEG II experiment at lower beam intensity should yield a sensitivity surpassing the existing bounds by one order of magnitude. This program has the potential to shed light on open questions in axion phenomenology and even to establish a new connection between precision measurements of muon branching ratios and ultralight DM candidates.
Using the approximate trigger efficiency discussed in Sec. II and the full differential decay width of the RMD process [24,25], we validate our simulation by reproducing the distributions of the RMD events as a function of $E_e$, $E_\gamma$ and $\theta_{e\gamma}$ separately. This is shown in Fig. 6.
Except for the two lowest photon-energy bins, the distributions generated with our Monte Carlo reproduce the MEG distributions of Ref. [14] quite well, within their systematic uncertainties. The largest deviations are at low $E_\gamma$, where our extrapolation of the trigger efficiency is expected to fail.
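As an illustration of the kind of shape comparison described above, the sketch below unit-normalizes a Monte Carlo histogram and a reference histogram and computes per-bin pulls against a relative systematic uncertainty. The function names, the toy bin contents and the flat 10% systematic are hypothetical; this is not the validation code behind Fig. 6.

```python
import numpy as np

def shape_pulls(mc_counts, ref_counts, rel_syst):
    """Per-bin pulls between two unit-normalized histograms.

    rel_syst is the relative systematic uncertainty of the reference
    (scalar or per-bin array). Reference bins must be non-empty.
    """
    mc = np.asarray(mc_counts, dtype=float)
    ref = np.asarray(ref_counts, dtype=float)
    mc_shape = mc / mc.sum()
    ref_shape = ref / ref.sum()
    sigma = np.asarray(rel_syst, dtype=float) * ref_shape  # absolute uncertainty per bin
    return (mc_shape - ref_shape) / sigma

def deviating_bins(pulls, n_sigma=2.0):
    """Indices of bins where |pull| exceeds n_sigma."""
    return np.flatnonzero(np.abs(pulls) > n_sigma)

# Toy E_gamma histograms (hypothetical), lowest-energy bin first.
mc = [5, 25, 35, 35]
ref = [10, 25, 35, 30]
pulls = shape_pulls(mc, ref, rel_syst=0.10)
print(deviating_bins(pulls))  # [0]
```

In this toy example only the lowest-energy bin falls outside the assumed uncertainty, mirroring the qualitative pattern described in the text.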
For the MEG II-RMD analysis we use the same set of events as for the MEG-RMD case. For the MEG II-ALP analysis, the signal and background event sets are obtained with the same procedure, but with the different kinematic selection explained in Sec. III. As this is a projection for a future search, we have no way of validating it with existing data.

Detector resolution
The MEG detector resolutions for positron and photon energies between 40 and 53 MeV are extracted from Ref. [26], fitted, and extrapolated to a wider energy range with the following functional dependencies: […]. The functional form of the photon energy resolution is the standard one for a calorimeter [47], where the stochastic term scales as $1/\sqrt{E}$ and the constant term accounts for effects that are independent of the particle energy. We use these fits in our Monte Carlo to compute the smearing of the energy, angle and missing invariant mass distributions at MEG.
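A minimal sketch of a calorimeter-type parameterization and the corresponding Gaussian smearing, assuming $\sigma_E/E = a/\sqrt{E} \oplus b$ with $E$ in MeV and the two terms added in quadrature. The coefficients `a` and `b` below are placeholders, not the fitted MEG values, and the function names are hypothetical.

```python
import numpy as np

def relative_resolution(energy_mev, a, b):
    """sigma_E / E = a / sqrt(E) (+) b, with the terms added in quadrature.

    a: stochastic coefficient in sqrt(MeV); b: constant term.
    Both are placeholder values, not the fitted MEG parameters.
    """
    e = np.asarray(energy_mev, dtype=float)
    return np.sqrt((a / np.sqrt(e)) ** 2 + b ** 2)

def smear_energy(true_energy_mev, a, b, rng):
    """Draw reconstructed energies from a Gaussian around the true energy."""
    e = np.asarray(true_energy_mev, dtype=float)
    return rng.normal(loc=e, scale=relative_resolution(e, a, b) * e)

rng = np.random.default_rng(seed=1)
true_e = np.full(100_000, 50.0)  # 50 MeV photons
reco_e = smear_energy(true_e, a=0.3, b=0.04, rng=rng)
print(relative_resolution([10.0, 50.0, 100.0], a=0.3, b=0.04))
```

The relative resolution decreases with energy until the constant term dominates, which is why extrapolating the fit to low energies mostly changes the stochastic contribution.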
For MEG II we take into account the improved resolutions of the detector with respect to MEG. In practice, we replace the MEG resolutions at $E_{e^+/\gamma} \simeq m_\mu/2$ with those provided in Table 8 of Ref. [13]:

$\delta E$ […],
$$\delta\theta^{\rm MEG\,II}_{e^+/\gamma} = 0.56\,\delta\theta^{\rm MEG}_{e^+/\gamma}, \qquad \text{(B7)}$$
$$\delta\phi^{\rm MEG\,II}_{e^+/\gamma} = 0.43\,\delta\phi^{\rm MEG}_{e^+/\gamma}. \qquad \text{(B8)}$$

We then extrapolate the MEG II resolutions to lower energies using the same functional dependence as derived for MEG. In this way the resolution improvement of MEG II with respect to MEG is essentially an overall rescaling, independent of energy. This assumption should be revisited once information on the performance of the MEG II detector at lower energies is available.
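The statement that the MEG II improvement amounts to an energy-independent rescaling can be made concrete with the sketch below. It wraps a MEG resolution function and multiplies it by the factors 0.56 ($\theta$) and 0.43 ($\phi$) quoted in Eqs. (B7) and (B8); the toy MEG angular resolution `toy_meg_sigma_theta` and its shape are hypothetical placeholders.

```python
from typing import Callable

# Energy-independent MEG -> MEG II scale factors for the angular
# resolutions, Eqs. (B7) and (B8).
ANGLE_SCALE = {"theta": 0.56, "phi": 0.43}

def rescale_resolution(meg_sigma: Callable[[float], float],
                       factor: float) -> Callable[[float], float]:
    """Return a MEG II resolution with the same energy dependence as MEG."""
    def meg2_sigma(energy_mev: float) -> float:
        return factor * meg_sigma(energy_mev)
    return meg2_sigma

def toy_meg_sigma_theta(energy_mev: float) -> float:
    """Hypothetical MEG polar-angle resolution in mrad (placeholder shape)."""
    return 10.0 * (52.8 / energy_mev) ** 0.5

meg2_sigma_theta = rescale_resolution(toy_meg_sigma_theta, ANGLE_SCALE["theta"])
for e in (10.0, 30.0, 52.8):
    print(e, meg2_sigma_theta(e) / toy_meg_sigma_theta(e))
```

Whatever energy dependence is assumed for MEG, the MEG II/MEG ratio stays fixed, which is exactly the assumption flagged in the text as needing revision once low-energy MEG II performance data exist.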

Online and offline efficiencies
In this section we summarize the cut flow for the data-taking strategies discussed in this paper: i) the parasitic analysis of the MEG RMD data presented in Sec. II B and its projection to MEG II shown in Sec. II C, and ii) the MEG II-ALP dedicated run discussed in Sec. III. In Table II we give the integrated efficiencies of both the trigger and the offline selection for the MEG RMD data taking, for the ALP signal and for the RMD and RC backgrounds. The efficiencies are normalized with respect to the baseline branching ratios defined in Sec. II A. The baseline cuts are $E_e > 40$ MeV, $E_\gamma > 5$ MeV, $\theta_\gamma \in [70^\circ, 110^\circ]$ and $\phi_\gamma \in [-60^\circ, +60^\circ]$, which account for the geometric acceptance of the MEG photon detector. We moreover impose $\phi_e \in [120^\circ, 240^\circ]$ to ensure that the positron is approximately within the acceptance of the timing detector. Every efficiency is normalized with respect to the number of events passing the previous cut, from left to right, such that the product of all the trigger requirements (columns 3, 4 and 5) reproduces the total trigger efficiency discussed in Sec. II. The numbers in the off.$_i$/trig.$_i$ columns indicate the sequential loss in efficiency once the offline selections are imposed, relative to the trigger selection. In other words, the total offline efficiency is obtained by multiplying the numbers in columns 3 to 8.
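Because each entry is normalized to the events surviving the previous cut, the cumulative efficiencies follow from a running product. The sketch below uses hypothetical column contents (made-up numbers, not the entries of Table II) to show how the trigger and offline totals are obtained.

```python
from itertools import accumulate
from math import prod
import operator

# Hypothetical sequential efficiencies, each relative to the previous cut.
# Columns 3-5: trigger requirements; columns 6-8: off_i / trig_i ratios.
trigger_steps = [0.50, 0.20, 0.80]
offline_over_trigger = [0.90, 0.95, 0.70]

# Total trigger efficiency: product of columns 3-5.
trigger_eff = prod(trigger_steps)

# Total offline efficiency: product of columns 3-8.
offline_eff = prod(trigger_steps + offline_over_trigger)

# Efficiency remaining after each successive cut.
cumulative = list(accumulate(trigger_steps + offline_over_trigger, operator.mul))

print(f"trigger efficiency: {trigger_eff:.4f}")   # 0.0800
print(f"offline efficiency: {offline_eff:.5f}")   # 0.04788
```

Reading the table left to right in this way also shows which single cut costs the most, which is how the photon-energy trigger requirement is identified as the main limitation.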
The table shows that the trigger requirement on the photon energy in Fig. 2, together with the cut on the positron energy, is the main limitation on the MEG sensitivity to ALPs, while the angular cut is an $\mathcal{O}(1)$ effect in this ordering.$^{6}$ This trigger selection has essentially two main drawbacks: i) the small signal efficiency, and ii) the shape of the RC background, which becomes very similar to the signal after the trigger requirements are imposed. This second issue makes the offline variables rather inefficient at separating the signal from the background, as can be seen directly from the signal and background distributions in the left-hand panel of Fig. 3 as well as in Fig. 7.
In Table III we show the integrated efficiencies of the MEG II-ALP data-taking strategy. Reducing the beam intensity makes it possible, on the one hand, to increase the signal efficiency at trigger level and, on the other hand, to keep the shape of the RC background flat enough to be more easily distinguishable from the signal shape in the offline analysis. This can be seen in the missing mass distribution in the right-hand panel of Fig. 3, where the RC background appears as a featureless flat distribution, and in the angular distributions of Fig. 8.
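For reference, the missing invariant mass used in the offline selection can be computed in the muon rest frame as $m^2_{\not E} = (p_\mu - p_e - p_\gamma)^2$ with $p_\mu = (m_\mu, \vec 0)$. The sketch below expands this in terms of $E_e$, $E_\gamma$ and the cosine of the 3D opening angle between the positron and photon momenta (not the $(\theta_{e\gamma}, \phi_{e\gamma})$ variables of Figs. 7 and 8), and applies a window $|m^2_{\not E} - m_a^2| < \Delta m^2_{\not E}$. The function names and the window width are hypothetical, not the values used in the analysis.

```python
import math

M_MU = 105.6583755  # muon mass [MeV]
M_E = 0.51099895    # electron mass [MeV]

def missing_mass_squared(e_e, e_gamma, cos_open, m_mu=M_MU, m_e=M_E):
    """m^2 = (p_mu - p_e - p_gamma)^2 for a muon at rest, in MeV^2.

    cos_open is the cosine of the 3D opening angle between the positron
    and photon momenta. The result can be negative after detector smearing.
    """
    p_e = math.sqrt(max(e_e**2 - m_e**2, 0.0))
    e_miss = m_mu - e_e - e_gamma
    p_miss_sq = p_e**2 + e_gamma**2 + 2.0 * p_e * e_gamma * cos_open
    return e_miss**2 - p_miss_sq

def in_mass_window(m2_miss, m_a, delta_m2):
    """Kinematic selection |m^2_miss - m_a^2| < delta_m2 (delta_m2 is hypothetical)."""
    return abs(m2_miss - m_a**2) < delta_m2

# Check: for mu+ -> e+ gamma at rest the missing mass vanishes.
e_gamma = (M_MU**2 - M_E**2) / (2 * M_MU)
e_e = M_MU - e_gamma
m2 = missing_mass_squared(e_e, e_gamma, cos_open=-1.0)
print(in_mass_window(m2, m_a=0.0, delta_m2=1.0))  # True
```

For a nearly massless ALP the signal peaks at $m^2_{\not E} \approx 0$, so the selection window is centred there and its width is set by the $m_{\not E}$ resolution.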

Angular differential distributions
For completeness we show the angular distributions of the ALP signal and the RMD and RC backgrounds for the parasitic analysis of the MEG RMD data in Fig. 7 and for the MEG II-ALP dedicated run in Fig. 8. These are events passing both the trigger and the offline selection, after the cut on the missing mass window. Comparing the two figures, it is clear that the standard MEG RMD trigger selection produces a very different shape for the RC background than the MEG II-ALP selection. While difficult to see by eye, a likelihood […] the final states, leading to a similar suppression of the signal compared to the background.

We show here the reach of our dedicated data-taking proposal for different chiral structures of the axion couplings to leptons. These are shown in Fig. 9. Interestingly, even for the most conservative estimate of our expected trigger rate, the expected sensitivity of MEG II with our data-taking proposal and 1 year of data taking can surpass the current best limit, from the experiment of Jodidio et al. [44], for right-handed (V+A) ALP couplings or for purely axial (purely vectorial) couplings. The reach in these scenarios is significantly improved compared to the V−A case discussed in the main text, due to the more distinctive angular distribution of the signal events with respect to the background events.
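To give a feeling for how the reach scales with running time, here is a simplified cut-and-count estimate, not the likelihood analysis used in the text: for a background-dominated search the expected one-sided 95% C.L. upper limit on the number of signal events is roughly $1.645\sqrt{B}$, which translates into a branching-ratio reach $\mathrm{BR}_{95} \approx 1.645\sqrt{B}/(N_\mu \epsilon_S)$ with $N_\mu = R_{\mu^+} T$. All inputs (beam rate, signal efficiency, background per stopped muon) and the function name are hypothetical placeholders.

```python
import math

Z_95_ONE_SIDED = 1.645           # one-sided 95% C.L. Gaussian quantile
SECONDS_PER_MONTH = 30 * 24 * 3600

def expected_br_reach(rate_mu_per_s, live_time_s, eff_signal, bkg_per_muon):
    """Approximate 95% C.L. branching-ratio reach for a counting experiment.

    Valid only when the expected background B is large (B >> 1), so that
    Gaussian statistics apply. Inputs are hypothetical placeholders.
    """
    n_mu = rate_mu_per_s * live_time_s      # stopped muons
    b = bkg_per_muon * n_mu                 # expected background events
    s_up = Z_95_ONE_SIDED * math.sqrt(b)    # signal-event upper limit
    return s_up / (n_mu * eff_signal)

# Hypothetical dedicated-run inputs, for illustration only.
one_month = expected_br_reach(1e7, SECONDS_PER_MONTH, 0.1, 1e-10)
one_year = expected_br_reach(1e7, 12 * SECONDS_PER_MONTH, 0.1, 1e-10)
print(one_month / one_year)  # sqrt(12), about 3.46
```

In this background-dominated approximation the reach improves only as $1/\sqrt{T}$, so going from 1 month to 1 year gains roughly a factor of 3.5; improving $S/B$ through the trigger and offline selection matters as much as the running time.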

FIG. 2. Left: the positron-energy ($E_e$) dependent part of the MEG trigger efficiency. Middle: the photon-energy dependent part

FIG. 4. MEG II-ALP performance as a function of the photon cut $E^{\rm cut}_\gamma$ and of the beam intensity $R_{\mu^+}$. The red contours show the 95% C.L. reach after enforcing the kinematic selection $|m^2_{\not E} - m_a^2| < \Delta m^2_{\not E}$ and assuming 1 year of running time. The ALP is assumed to be massless within the experimental resolution. The blue lines indicate trigger rates of 0.1 Hz and 1 Hz; the trigger rate exceeds 10 Hz in the shaded regions. The dashed green lines show contours of S/B. To the right of the solid purple line the RMD background dominates over RC at trigger level; the dashed purple line shows where RMD dominates after the offline selection. The stars indicate the benchmarks chosen for the MEG II-ALP dedicated run, see Eq. 11.
FIG. 7. The unit-normalized distributions of RMD (left) and RC (middle) background events and of signal (right) events in the $(\theta_{e\gamma}, \phi_{e\gamma})$ plane for the parasitic analysis (• in Fig. 1), with a bin size of 20 mrad ($\theta_{e\gamma}$) × 20 mrad ($\phi_{e\gamma}$). For the signal we took $m_a = 10^{-4}$ MeV.
FIG. 8. The unit-normalized distributions of RMD (left) and RC (middle) background events and of signal (right) events in the $(\theta_{e\gamma}, \phi_{e\gamma})$ plane for the benchmark point of the dedicated run (marked in Fig. 1), with a bin size of 20 mrad ($\theta_{e\gamma}$) × 20 mrad ($\phi_{e\gamma}$) and $\phi_{e\gamma}$ taken modulo $2\pi$. For the signal we took $m_a = 10^{-4}$ MeV.
The positron acceptance is restricted to $\phi_e \in [120^\circ, 240^\circ]$. Due to the non-homogeneous magnetic field, the $\phi_e$ acceptance interval should shift for lower values of $E_e$, but we cannot reliably model this effect without the full MEG simulation framework.
FIG. 3. Missing invariant mass distribution of the signal and backgrounds after the kinematic selection, detector efficiency, acceptance and resolution effects are taken into account. The ALP mass is fixed to $m_a = 10^{-4}$ MeV, which is effectively massless within the $m_{\not E}$ resolution. Left: missing invariant mass distribution at MEG. Right: missing invariant mass distribution with the MEG II-ALP data-taking strategy proposed in Sec. III.

TABLE II. Efficiencies in each kinematic selection for the parasitic analysis (Sec. II B). For the signal, $F^{V-A}_{\mu e}$ is fixed to $10^9$ GeV. The "timing" selection refers to the tighter coincidence requirement in the offline cuts, as compared to the trigger selection. See Sec. II A for details.

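TABLE III. Efficiencies in each kinematic selection for the dedicated run (Sec. III). For the signal, $F^{V-A}_{\mu e}$ is fixed to $10^9$ GeV. The "timing" selection refers to the tighter coincidence requirement in the offline cuts, as compared to the trigger selection. See Sec. II A for details.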