Search for lepton flavour violating muon decay mediated by a new light particle in the MEG experiment

We present the first direct search for lepton flavour violating muon decay mediated by a new light particle X, $\mu^+ \to \mathrm{e}^+\mathrm{X}, \mathrm{X} \to \gamma\gamma$. This search uses a dataset resulting from $7.5\times 10^{14}$ stopped muons collected by the MEG experiment at the Paul Scherrer Institut in the period 2009--2013. No significant excess is found in the mass region 20--45 MeV/c$^2$ for lifetimes below 40 ps, and we set the most stringent branching ratio upper limits in the mass region of 20--40 MeV/c$^2$, down to $\mathcal{O}(10^{-11})$ at 90\% confidence level.

Standard Model (SM) of elementary particles and interactions. The observation of neutrino oscillations [1][2][3] showed that lepton flavour is not conserved in nature. As a consequence, charged lepton flavour isA violated, even though the rate is unobservably small (< 10 −50 ) in an extension of the SM accounting for measured neutrino mass differences and mixing angles [4,5]. In the context of new physics, in the framework of grand unified theories for example, CLFV processes can occur at an experimentally observable rate [6]. Therefore, such processes are free from SM physics backgrounds and a positive signal would constitute unambiguous evidence for physics beyond the SM. This motivates the effort to search for evidence of new physics through CLFV processes [7,8].
The MEG experiment at the Paul Scherrer Institut (PSI) in Switzerland searched for one such CLFV process, µ + → e + γ decay, with the highest sensitivity in the world. No evidence of the decay was found yet, leading to an upper limit on the branching ratio B(µ + → e + γ) < 4.2 × 10 −13 at 90% confidence level (C.L.) [9]. Models that allow µ + → e + γ decay at an observable rate usually assume that CLFV couplings are introduced through an exchange of new particles much heavier than the muon. Negative results by CLFV searches suggest another possibility: new physics exists at a lighter scale but with very weak coupling to SM particles.
A dedicated search for the MEx2G decay has never been done, although some constraints on the X particle parameter space can be deduced by experimental results from both related muon decay modes and non-muon experiments; these are discussed below.
Current upper limits on the inclusive decay µ + → e + X are given at O(10 −5 ) for m X in the range 13-80 MeV/c 2 [26]. 1 However, the current limits do not impose any constraints on the MEx2G decay in the target region of this search. They are complementary, relevant for cases where X is either stable or decays invisibly. For X resulting from muon decays, the only kinematically allowed visible decay channels are X → e + e − and X → γγ. The former can occur at tree level while the latter can occur via a fermion loop. The current upper limit on µ + → e + X, X → e + e − at a level of O(10 −12 ) [27] give stringent constraints on the 1 In these searches, only e + is looked at. MEx2G decay if we assume that X is more likely to decay into an e + e − pair. However, there is a possibility for X to be electrophobic, as pointed out in [28,29], and searches for both decay modes can hint at the model behind these decay modes.
The current upper limit on the decay µ + → e + γγ, B(µ + → e + γγ) < 7.2 × 10 −11 (90% C.L.) from the Crystal Box experiment [30] can be converted into an equivalent MEx2G upper limit by taking into account the difference in detector efficiencies [31]; the converted limits are shown in Fig. 1.
Axion-like particle searches from collider and beam dump experiments and from supernova observations also constrain the branching ratio X → γγ if the axion-like particles are generated from coupling to photons [32]. Figure 2 summarises the parameter regions excluded by these experiments. A region with decay length cτ X γ < 1 cm and m X > 20 MeV/c 2 still has room for the MEx2G decay.
Based on limits discussed above, we define the target parameter space of this search in the τ X -m X plane as shown in Fig. 3.

Detector
The MEG detector is briefly presented in the following, emphasising aspects relevant to this search; a detailed description is available in [36].
In this paper we adopt a Cartesian coordinate system (x, y, z) shown in Fig. 4 with the origin at the centre of the magnet. When necessary, we also refer to the cylindrical coordinate system (r, φ, z) as well as the polar angle θ. Figure 2 Excluded parameter regions for a scalar X with mass m X and coupling g γγ to 2γs from collider, beam dumps, and supernova [33][34][35] (from [32]). In black we show contours of the boosted decay length γcτ X of X → γγ, assuming X to be produced from an at-rest muon decay µ + → e + X. The solid black line corresponds to γcτ X = 0.01 cm, the dotted one to 0.1 cm, the dashed one to 1 cm and the dot-dashed line to 10 cm. Allowed X particle parameter space (white). The blue region has already been excluded [34] and the red shaded region on the right (m X 45 MeV/c 2 ) is inaccessible to MEG.
Multiple intense µ + beams are available at the πE5 channel in the 2.2-mA PSI proton accelerator complex. We use a beam of surface muons, produced by π + decaying near the surface of a production target. The beam intensity is tuned to a µ + stopping rate of 3 × 10 7 , limited by the rate capabilities of the tracking system and the rate of accidental backgrounds in the µ + → e + γ search. The muons at the production target are fully polarised (P µ + = −1), and they reach a stopping target with a residual polarisation P µ + = −0.86 ± 0.02 (stat) +0.05 −0.06 (syst) [37]. The positive muons are stopped and decay in a thin target placed at the centre of the spectrometer at a slant angle of ≈ 20 • from the µ + beam direction. The target is composed of a 205 µm thick layer of polyethylene and polyester (density 0.895 g/cm 3 ).
Positrons from the muon decays are detected with a magnetic spectrometer, called the COBRA (standing for COnstant Bending RAdius) spectrometer, consisting of a thin-walled superconducting magnet, a drift chamber array (DCH), and two scintillating timing counter (TC) arrays.
The magnet [38] is made of a superconducting coil with three different radii. It generates a gradient magnetic field of 1.27 T at the centre and 0.49 T at each end. The diameter of an emitted e + trajectory depends on the absolute momentum, independent of the polar angle due to the gradient field. This allows us to select e + s within a specific momentum range by placing the TC detectors in a specific radial range; e + s whose momenta are larger than ∼45 MeV/c fall into the acceptance of the TC. Furthermore, the gradient field prevents e + s emitted nearly perpendicular to the µ + beam direction from looping many times in the spectrometer. This results in a suppression of hit rates in the DCH. The thickness of the central part of the magnet is 0.2 radiation length to maximise transparency to γ; 85% of the signal γs penetrate the magnet without interactin and reach the photon detector.
Positrons are tracked in the DCH [39]. It is composed of 16 independent modules. Each module has a trapezoidal shape with base lengths of 104 cm (at smaller radius, close to the stopping target) and 40 cm (at larger radius). These modules are installed in the bottom hemisphere in the magnet at 10.5 • intervals. The DCH covers the azimuthal region between 191.25 • and 348.75 • and the radial region between 19.3 cm and 27.9 cm. It is composed of low mass materials and helium-based gas (He : C 2 H 6 = 1 : 1) to suppress Coulomb multiple scattering; 2.0 × 10 −3 radiation length path is achieved for the e + from µ + → e + γ decay at energy of E e + = 52.83 MeV (= m µ c 2 /2, where m µ is the mass of muon).
The TC [40,41] is designed to measure precisely the e + hit time. Fifteen scintillator bars are placed at each end of the COBRA. They are made of 4 × 4 × 80 cm 3 plastic scintillators with fine-mesh PMTs attached to both ends of the bars.
The efficiency of the spectrometer significantly depends on E e + as shown in Fig. 5. The e + energy from the MEx2G decay is lower than that from µ + → e + γ depending on m X , and the efficiency is correspondingly lower. The large m X search range is limited by this effect as shown in Fig. 3.
The photon detector is a homogeneous liquid-xenon (LXe) detector relying on scintillation light 2 for energy, position, and timing measurement [42,43]. As shown in Fig. 4, it has a C-shaped structure fitting the outer radius of the magnet. The fiducial volume is ≈ 800 , covering 11% of the solid angle viewed from the centre of the stopping target in the radial range of 67.85 < r < 105.9 cm, corresponding to ≈ 14 radiation length. It is able to detect a 52.83-MeV γ with  high efficiency and to contain the electromagnetic shower induced by it. The scintillation light is detected by 846 2inch PMTs submerged directly in the liquid xenon. They are placed on all six faces of the detector, with different PMT coverage on different faces. On the inner face, which is the densest part, the PMTs align at intervals of 6.2 cm. One of the distinctive features of the MEG experiment is that it digitises and records all waveforms from the detectors using the Domino Ring Sampler v4 (DRS4) chip [44]. The sampling speeds are set to 1.6 GSPS for TC and LXe photon detector and 0.8 GSPS for DCH. This lower value for DCH is selected to match the drift velocity and the required precision.
The DAQ event rate was kept below 10 Hz in order to acquire the full waveform data (≈ 1 MB/event). It was accomplished using a highly efficient online trigger system [45,46].
Several types of trigger logic were implemented and activated during the physics data-taking each with its own prescaling factor. However, a dedicated trigger for the Conditional efficiency (%) Figure 6 Trigger direction match efficiency for the MEx2G decay conditional to e + and 2γ detection as a function of m X evaluated with a Monte Carlo simulation (Sect. 4).
MEx2G events was neither foreseen nor implemented. Thus, we rely on the µ + → e + γ triggered data in this search.
The main µ + → e + γ trigger, with a prescaling of 1, used the following observables: γ energy, time difference between e + and γ, and relative direction of e + and γ. The DC was not used in the trigger due to the slow drift velocity. The condition on the relative direction is designed to select backto-back events. To calculate the relative direction, the PMT that detects the largest amount of scintillation light is used for the γ, while the hit position at the TC is used for the e + . This direction match requirement results in inefficient selection of the MEx2G signal because, unlike the µ + → e + γ decay, the MEx2G decay has 2γs with a finite opening angle, resulting in events often failing to satisfy the direction trigger. The selection inefficiency for MEx2G events is 10-50% depending on m X as shown in Fig. 6.
Finally, the detector has been calibrated and monitored over all data-taking period with various methods [47,48], ensuring that the detector performances have been under control over the duration of the experiment.

Search strategy
The MEx2G signal results from the sequential decays of µ + → e + X followed by X → γγ. The first part is a twobody decay of a muon at rest, signalled by a mono-energetic e + . The energy E e + is determined by m X : E e + (m X = 0) = 52.83 MeV and is a decreasing function of m X . The sum of energies of the two γs is also mono-energetic and an increasing function of m X . The momenta of the two γs are Lorentzboosted along the direction of X, which increases the acceptance in the LXe photon detector compared to the three-body decay µ + → e + γγ. The final-state three particles is expected to have an invariant mass of 105.7 MeV/c 2 (= m µ ) and the total momentum vector equal to 0.
A physics background that generates time-coincident e + γγ in the final state is µ + → e + ννγγ. This mode has not yet been measured but exists in the SM. The branching ratio is calculated to be O(10 −15 ) for the MEG detector configuration [49][50][51]. Therefore, it is negligible in this search.
The dominant background is the accidental pileup of multiple µ + s decays. There are three types of accidental background events: Type 1: The e + and one of the γs originate from one µ + , and the other γ from a different one. Type 2: The two γs share the same origin, and the e + is accidental. Type 3: All the particles are accidental.
The main source of a time-coincident e + γ pair in type 1 is the radiative muon decay µ + → e + ννγ [52]. The sources of time-coincident γγ pairs in type 2 are e + e − → γγ (e + from µ + decay and e − from material along the e + trajectory), µ + → e + ννγ with an additional γ, e.g. by a bremsstrahlung from the e +3 , or a cosmic-ray induced shower. Figure 7 shows the decay kinematics and the kinematic variables. The muon decay vertex and the momentum of the e + are obtained by reconstructing the e + trajectory using the hits in DCH and TC and the intersection of the trajectory with the plane of the muon beam stopping target (Sect. 5.1). The interaction positions and times of the two γs within the LXe photon detector and their energies are individually reconstructed using the PMT charge and time information of the LXe photon detector (Sect. 5.2).
Given the muon decay vertex, the two γs' energies and positions, and m X , the X decay vertex x vtx can be computed. Therefore, we reconstruct x vtx by scanning the assumed 3 In the case of type 2, the e + can have low energy and be undetected. value of m X (Sect. 5.3.1). If the final-state three particles do not originate at a single muon decay vertex, these variables will be inconsistent with originating from a single point.
After reconstructing x vtx , the relative time and angles (momenta) between X and e + are tested for consistency with a muon decay (Sect. 5.3.2 and 5.3.3).
The MEx2G decay search analysis is performed within the mass range 20 MeV/c 2 < m X < 45 MeV/c 2 at 1 MeV/c 2 step. This step is chosen small enough not to miss signals in the gaps. Therefore, adjacent mass bins are not statistically independent. The analysis was performed assuming lifetimes τ X = 5, 20, and 40 ps; the value affects only the signal efficiency.
We estimate the accidental background by using the data in which the particles are not time coincident. To reduce the possibility of experimental bias, a blind analysis is adopted; the blind region is defined in the plane of the relative times of the three particles (Sect. 6).
The signal efficiency is evaluated on the basis of a Monte Carlo simulation (Sect. 4). Its tuning and validation are performed using pseudo-2γ data as described in Sect. 4.1.

Simulation
The technical details of the program of Monte Carlo (MC) simulation are presented in [53] and an overview of the physics and detector simulation is available in [36]. In the following we report a brief summary.
The first step of the simulation is the generation of the physics events. That is realised with custom written code for a large number of relevant physics channels. The MEx2G decay is simulated starting from a muon at rest in the target; the decay products are generated in accordance with the decay kinematics for the given m X and τ X .
The muon beam transport, interaction in the target, and propagation of the decay products in the detector are simulated with a MC program based on GEANT3.21 [54] that describes the detector response. Between the detector simulation and the reconstruction program, an intermediate pro-gram processes the MC information, adding readout simulation and allowing event mixing to study the detector performance under combinatorial background events. Particularly, the µ + beam, randomly distributed in time at a decay rate of 3 × 10 7 µ + s −1 , is mixed with the MEx2G decay to study the e + spectrometer performance. The detectors' operating condition, such as the active layers of DCH and the applied high-voltages, are implemented with the known time dependence.
In order to simulate the accidental activity in the LXe photon detector, data collected with a random-time trigger are used. A MC event and a random-trigger event are overlaid by summing the numbers of photo-electrons detected by each PMT.

Pseudo two γ data
To study the performance of the 2γ reconstruction, we built pseudo 2γ events using calibration data. The following γ-ray lines are obtained in calibration runs: -54.9 MeV and 82.9 MeV from π − p → π 0 n → γγn reaction, -17.6 MeV and 14.6 MeV from 7 Li(p, γ) 8 Be reaction, -11.7 MeV from 11 B(p, 2γ) 12 C reaction.
We take two events from the above calibration data and overlay them, summing the number of photo-electrons PMT by PMT. These pseudo 2γ events are generated using both data and MC events.

Event reconstruction
We describe here the reconstruction methods and their performance, focusing on high-level objects; descriptions of the manipulation of low-level objects, including waveform analysis and calibration procedures, are available in [9,36]. The e + reconstruction (Sect. 5.1) is identical to that used in the µ + → e + γ decay analysis in [9]. The 2γ reconstruction was developed originally for this analysis (Sect. 5.2). After reconstructing the e + and two γs, the reconstructed variables are combined to reconstruct the X decay vertex (Sect. 5.3).

Positron reconstruction
Positron trajectories in the DCH are reconstructed using the Kalman filter technique [55,56] based on the GEANE software [57]. This technique takes the effect of materials into account. After the first track fitting in DCH, the track is propagated to the TC region to test matching with TC hits. The matched TC hits are connected to the track and then the track is refined using the TC hit time. Finally, the fitted track is propagated back to the stopping target, and the point of intersection with the target defines the muon decay vertex position (x e + ) and momentum vector that defines the e + emission angles (θ e + , φ e + ). The e + emission time (t e + ) is reconstructed from the TC hit time minus the e + flight time.
Positron tracks satisfying the following criteria are selected: the number of hits in DCH is more than six, the reduced chi-square of the track fitting is less than 12, the track is matched with a TC hit, and the track is successfully propagated back to the fiducial volume of the target. If multiple tracks in an event pass the criteria, only one track is selected and passed to the following analysis, based on the covariance matrix of the track fitting as well as the number of hits and the reduced chi-square.
The resolutions are evaluated based on the MC, tuned to data using double-turn events; tracks traversing DCH twice (two turns) are selected and reconstructed independently by using hits belonging to each turn. The difference in the reconstruction results by the two turns indicates the resolution. The MC results are smeared so that the doubleturn results become the same as those with the data. Figure 8 shows the E e + resolution as a function of E e + . The angular resolutions also show a similar E e + dependence. The φ e + -and θ e + -resolutions for m X = 20 (45) MeV/c 2 are σ φ e + ∼ 12 (15) mrad and σ θ e + ∼ 10 (11) mrad, respectively. The time resolution is σ t e + ∼ 100 (130) ps.

Photon reconstruction
Coordinates (u, v, w) are used in the LXe photon detector local coordinate system rather than the global coordinates (x, y, z): u coincides with z, v = r in (π − φ) where r in = 67.85 cm is the radius of the inner face, and w = r − r in is the depth measured from the inner face.

Multiple photon search
A peak search is performed based on the light distributions on the LXe photon detector inner and outer faces by using TSpectrum2 [58,59]. The threshold of the peak light yield is set to 200 photons. Events that have more than one peak are identified as multiple-γ events.

Position and energy
Hereafter, only the multiple-γ events are analysed. When more than two γs are found, we select the two with the largest energy by performing the position-energy fitting described in this subsection on different combinations of two γs. Figure 9 shows a typical event display of a 2γ event. Each PMT detects photons from the two γs. The key point of the 2γ reconstruction is how to divide the number of photons detected in each PMT into a contribution from each γ.
Calculation of initial values. First, the positions of the detected peaks in (u, v) are used as the initial estimate with w = 1.5 cm. Given the interaction point of each γ within the LXe photon detector, the contribution from each γ to each PMT can be calculated as follows. Assuming the ratio of the energy of γ 1 to that of γ 2 to be E γ 1 : E γ 2 = R 1 : (1 − R 1 ) (0 < R 1 < 1, at first R 1 is set to 0.5), the fractions of the number of photons from γ 1 is calculated as where Ω 1,i is the solid angle subtended by the i-th PMT from the γ 1 interaction point. The total number of photons generated by γ 1 (2) , M pho,1 (2) , is calculated from the ratio R 1,i and the number of photons at each PMT N pho,i as Then, R 1 is updated to R 1 = M pho,1 /(M pho,1 + M pho,2 ) and calculations (1) and (2) are repeated with the updated R 1 . This procedure is iterated four times.
Position pre-fitting. Inner PMTs that detect more than 10 photons are selected to perform a position pre-fitting. The following quantity is minimised during the fitting: where σ 2 pho,i (N pho,i ) = N pho,i / PMT,i with PMT,i being the product of quantum and collection efficiencies of the PMT. This fitting is performed 4 separately for each γ: first, the light distribution is fitted with {x γ 1 , M pho,1 } as free parameters, while the other parameters are fixed; next, the light distribution is fitted with {x γ 2 , M pho,2 } as free parameters, while the other parameters are fixed.
Energy pre-fitting. To improve the energy estimation, M pho,1 (2) are fitted while the other parameters are fixed. The same χ 2 2γ (Eq. (3)) is used but only with PMTs that detect more than 200 photo-electrons.
The γ with the larger M pho is defined as γ 1 and the second largest one is defined as γ 2 in the later analysis.
Position and energy fitting. At the final step, all the parameters are fitted simultaneously to eliminate the dependence of the fitted positions on the value of M pho,1 (2) initially assumed. The best-fit value of M pho,1(2) is used to update R 1 and calculations (1) and (2) are repeated again to obtain the final value of M pho,1 (2) . Finally, it is converted into E γ 1(2) : where U(x γ 1 (2) ) is a uniformity correction factor, H(T ) is a time variation correction factor with T being the calendar time when the event was collected, and S is a factor to convert the number of photons to energy. The functions U(x γ 1 (2) ) and H(T ) are mainly derived from the 17.6-MeV line from 7 Li(p, γ) 8 Be reaction, which was measured twice per week. The factor S is calibrated using the 54.9-MeV line from π 0 decay, taken once per year.
Energy-ratio correction. Both the MC data and the pseudo-2γ data show an anti-correlation between the errors in E γ 1 and E γ 2 , while their sum is not biased. The reconstruction error in the pseudo-2γ data is quantified by the ratio of the reconstructed energy for gamma in the pseudo-2γ data to that in the original data without the overlay; it was shown to have a linear dependence on the energy ratio R 1 . Therefore, we apply an energy-ratio correction to the fitted energies; the correction coefficients are evaluated from the pseudo-2γ data with different combinations of calibration data.

Selection criteria.
To guarantee the quality of the reconstruction, the following criteria are imposed on the reconstruction results: the fits for both γs converge; the two γ positions are both within the detector fiducial volume defined as |u| < 25 cm ∧ |v| < 71 cm; the distance between the two γs on the inner face is d uv > 20 cm; E γ 1(2) > 10 MeV; and Probability density function for E γ . The probability density function (PDF) for E γ 1(2) is evaluated by means of the MC simulation. To tune the MC, the pseudo-2γ data of MC and data are used. It is asymmetric with a lower tail and modelled as follows: where E γ is a reconstructed γ energy, E true γ is the true value, f is the fraction of the narrow component, A is a normalisation parameter, E t is the transition parameter between the Gaussian and exponential components, and σ E γ is the standard deviation of the Gaussian component describing the width on the high-energy side. The parameters E t and σ E γ are correlated with each other, different for the narrow and wide components, and are dependent on E true γ . Figure 10 shows an example of the PDFs for 2γ events with E true γ 1 = 55 MeV and E true γ 2 = 12 MeV.

Time
The interaction time of γ 1 (γ 2 ) can be reconstructed using the pulse time measured by each PMT (t PMT,i ) by correcting for a delay time (t delay,γ 1(2) ,i ) including the propagation time of the light between the interaction point and the PMT and the time-walk effect, and a time offset due to the readout electronics (t offset,i ): The single PMT time resolution σ t,i is approximately proportional to 1/ N pe,γ 1(2) ,i with σ t,i (N pe,γ 1(2) ,i = 500) ≈ 500 ps, where N pe,γ 1(2) ,i is the number of photo-electrons from γ 1 (γ 2 ). These individual PMT measurements are combined to obtain the best estimate of the interaction time of γ 1 (γ 2 ) (t γ 1 (2) ). The following χ 2 is minimised: We use PMTs whose light yield from γ 1 (γ 2 ) is 5 times higher than that from γ 2 (γ 1 ) excluding PMTs whose light yield is less than 100 photons or which give large χ 2 contribution in the fitting. The E γ -dependent time resolution for single γ event is evaluated with the calibration runs and corrected for 2γ events using the MC:

Combined reconstruction
In this section, we present the reconstruction method for the X → γγ vertex assuming a value for m X in the reconstruction. We scan m X in 20-45 MeV/c 2 at 1 MeV/c 2 intervals; each assumed mass results in a different reconstructed X → γγ vertex position.

X decay vertex
A maximum likelihood fit is used in the reconstruction, with the following observables: The fit parameters are the following: where θ rest is the γ emission angle in the X rest frame, φ rest is the angle of the photons in the X rest frame with respect to the X momentum direction in the MEG coordinate system, and x vtx is the X decay vertex position. The likelihood function L(Θ) is defined as follows: where l X = |x vtx −x e + | is the X decay length. The term P(x e + | x true e + ) is omitted by approximating x true e + by x e + to reduce the fitting parameters. The PDFs of the positron angles are approximated as single Gaussians, and those of the γ position are approximated as double Gaussians to fit better tails in the PDF.
The energy dependence of the E γ 1(2) PDF Eq. (5) is modelled with a morphing technique [61] using two quasimonoenergetic calibration lines: the 11.7 MeV line from the nuclear reaction of 11 B(p, 2γ) 12 C and the 54.9 MeV line from π 0 decay. The PDF of l X is added to the likelihood function to constrain the radial position of the X decay vertex, which is defined and normalised for l X ≥ 0. We fix τ X = 20 ps since the vertex reconstruction performance is almost independent of τ X in the assumed range. The x vtx resolution of the maximum-likelihood fit is evaluated via the MC to be σ x vtx = (8, 12) mm in the transverse and longitudinal directions.
We define a χ 2 to quantify the goodness of the vertex fit as The variables with the superscript "best" indicate the bestfitted parameters in the maximum likelihood fit and the variables with no superscript indicate the measured ones. Here (θ X , φ X ) = (π − θ e + , π + φ e + ) is the direction opposite to (θ e + , φ e + ). The σ of each variable is the corresponding resolution when the distribution is approximated as a single Gaussian. Figure 11 shows the dependence of χ 2 vtx 5 on the assumed value of m X for the MC signal events. When the assumed value is the same as the true value (m X = 30 MeV/c 2 in this case), the resultant χ 2 vtx becomes minimum on average. The effective m X resolution is ∼ 2.5 MeV/c 2 .

Momentum
Given the vertex position, the momentum of each γ can be calculated. The sum of the final-state three particles momenta, should be 0 for the MEx2G events.

Relative time
The time difference between the 2γs at the X vertex is calculated as where l γ 1(2) is the distance between the γ 1(2) interaction point in the LXe photon detector and the X vertex position, l γ 1(2) = |x γ 1(2) − x vtx |. The time difference between γ 1 and e + at the muon vertex is calculated as These two time differences should be 0 for the MEx2G events.

Dataset and event selection
We use the full MEG dataset, collected in 2009-2013, as was used in the µ + → e + γ search reported in [9]. As described in Sect. 2, the µ + → e + γ trigger data are used in this analysis. In total, 7.5 × 10 14 µ + s were stopped on the target. A pre-selection was applied at the first stage of the µ + → e + γ decay analysis, requiring that at least one positron track is reconstructed and the time difference between signals in the LXe photon detector and TC is in the range −6.9 < t LXe−TC < 4.4 ns. At this stage, aiming to select the µ + → e + γ decays, the time of the LXe photon detector is reconstructed with PMTs around the largest peak found in the peak search (Sect. 5.2.1). This retained ∼16% of the dataset, on which the full event reconstruction for the µ + → e + γ decay analysis was performed. Before processing the MEx2G dedicated reconstruction, we applied an additional event selection using the µ + → e + γ reconstruction results. It was based on the existence of multiple (≥ 2) γs and the total energy of the γs 6 and e + (E total ) being |E total − m µ | < 0.2m µ . This selection reduces the dataset by an additional factor 6 At this stage, the sum of the multiple γs' energy is reconstructed without being separated into each γ. of ∼ 300. We applied the MEx2G dedicated reconstruction (Sect. 5.2.2, 5.2.3, and 5.3) to this selected dataset. A blind region was defined containing the events satisfying the cuts |t γ 1 e + | < 1 ns ∧ |t γγ | < 1 ns. This blind region is large enough to hide the signal. Those events were sent in a separated data-stream and were not used in the definition of the analysis strategy including cuts; background events in the signal region were estimated without using events in the blind region. After the analysis strategy was defined, the blind region was opened and events in this region were added to perform the last step of analysis.
The accidental background can be estimated from the off-time sideband regions defined in Fig. 12. There are three such regions: A, B, and C; each containing a different combination of the types of background as defined in Sect. 3. The outer boundary of the time sidebands, |t γ 1 e + | < 3.5 ns ∧ |t γγ | < 3.5 ns, are determined so that the background distribution is not deformed by the time-coincidence trigger condition. The widths (x A,C and y B,C in Fig. 12) are the same as the outer boundary of the signal region defined depending on m X by the signal selection criteria described below.
The following seven variables are used for the signal selection: 1. E e + : the e + energy. 2. E sum : the total energy of the three particles. 3. |P sum |: the magnitude of the sum of the three particles' momenta. 4. d uv : the distance between the 2γ positions on the LXe photon detector inner face. 5. t γ 1 e + : the time difference between γ 1 and e + calculated in Eq. (17). 6. t γγ : the time difference between 2γs calculated in Eq. (16). 7. χ 2 vtx : the goodness of vertex fitting calculated in Eq. (14). First, we fix the E e + selection to require |E e + − E m X e + | < 1 MeV, where E m X e + is the e + energy for the MEx2G decay with m X . This selection is also used in the Michel normalisation described in Sect. 8.
Next, we optimise the cut thresholds for the other variables to maximise the experimental outcomes. Distributions of these variables for the signal and background at a parameter set (20 MeV/c 2 , 20 ps) are shown in Fig. 13. The time sideband events are used for the background distribution, while MC samples are used for the signal distribution. All other selection criteria, such as trigger and reconstruction conditions, are applied in advance.
Punzi's sensitivity [62] is used as a figure of merit. The sensitivity σ Punzi is defined as where a and b are the significance and the power of a test, respectively, selection is the selection efficiency for the signal, and N BG is the expected number of background events. The values of a and b should be defined before the analysis, and we set a = 3, b = 1.28 (= 90%), where b is set to the value appropriate to the confidence level being used to set the upper limit when a non-significant result is obtained. The optimisation process is divided into two steps. In the first step, we optimise the cut thresholds of variables 2-6, independently for each variable in order to maintain high statistics in the sidebands. As an approximation, before applying each cut we normalise N BG = 1 and then maximise σ −1 Punzi ; this leads to suboptimal criteria. In the second step, after all other selection criteria are applied, the threshold for χ 2 vtx is optimised to give the highest σ −1 Punzi . In this step, to estimate N BG from the low statistics in the sideband regions, we use a kernel-density-estimation method [63] to model the continuous event distribution.
The cut thresholds are optimised at 5-MeV intervals in m X , while the same thresholds are used for different τ X for each m X . The optimised thresholds for m X = 20 MeV/c 2 are shown as black lines in Fig. 13. These cuts result in selection = 67% (m X = 45 MeV/c 2 ) -51% (m X = 20 MeV/c 2 ).

Statistical treatment of background and signal
In the following, we describe how we estimate the expected number of background events in the signal region (N BG ) from the numbers of events observed in sidebands A, B, and C (N obs A , N obs B , and N obs C ). There are three types of accidental background events defined in Sect. 3. The expected number of background events in the signal region is given by where N 1 , N 2 , N 3 are the expected numbers of background events in the signal region from the types 1, 2, and 3, respectively. Sideband A has the contributions from types 2 and 3, B has the contributions from types 1 and 3, and C has the contribution from type 3. Figure 14 shows the time distributions in the sideband regions. A peak of type 2 on a flat component of type 3 is observed in the t γγ distribution, while a peak of type 1 is not clearly visible in the t γ 1 e + distribution. The uniformity of the accidental backgrounds is examined using these distributions; the number of events in (|t γ 1 e + | < 1 ns ∧ 1 < |t γγ | < 3.5 ns) is compared to the the number of events interpolated from the region (1 < |t γ 1 e + | < 3.5 ns ∧ 1 < |t γγ | < 3.5 ns) scaled by the ratio of the widths of the time ranges (2 ns/5 ns). They agree within 1.7% (the central part, including type 1, is 1.7% larger than the interpolation). In Fig. 14(b), t γγ the distribution for 1 < |t γ 1 e + | < 3.5 ns is superimposed on that for |t γ 1 e + | < 1 ns after scaling by the time range ratio. The tail component of type 2 is consistent in these regions. The errors on the background estimations by the interpolation are thus negligibly small compared with the statistical uncertainties in N obs A , N obs B , N obs C .  Using N 1 , N 2 , N 3 , the expected numbers of events in sidebands A, B, C can be calculated as follows: where x A(C) and y B(C) are the sizes of the signal regions (sideband regions) in t γγ and t γ 1 e + , respectively, as defined in Fig. 12, and f escape = 0.171 ± 0.003 is the fraction of type 2 events in |t γγ | > 1 ns. The likelihood function for N BG is given from the Poisson statistics as,  Table 2). However, we do not use this estimated N BG in the inference of the signal but use (N obs A , N obs B , N obs C ) as discussed in the following.
Our goal is to estimate the branching ratio of the MEx2G decay (B MEx2G ). The likelihood function Eq. (23) is extended to include B MEx2G as a parameter and the number of events in the signal region (N obs S ) as an observable. In addition, to incorporate the uncertainty in the single event sensitivity (SES) into the B MEx2G estimation, an estimated SES (s 0 ) and the true value (s) are included into the likelihood function: Using N 1 , N 2 , N 3 and a Gaussian PDF for the inverse of SES, it can be written as, The best estimated values of the parameter set {B MEx2G , N 1 , N 2 , N 3 , s} are obtained by maximising Eq. (25). Among them, only B MEx2G is the interesting parameter, while the others are regarded as nuisance parameters ν = (N 1 , N 2 , N 3 , s). Figure 14 Time distributions in the sideband regions for (20 MeV/c 2 , 20 ps). (a) t γ1e + distributions for |t γγ | < 1 ns (red open circles) and for 1 < |t γγ | < 3.5 ns (black closed circles). (b) t γγ distributions for |t γ1e + | < 1 ns (red open circles) and for 1 < |t γ1e + | < 3.5 ns scaled by the ratio of the time ranges (black closed circles). A loose cut is applied:

(b)
A frequentist test of the null (background-only) hypothesis is performed with the following profile likelihood ratio λ p as the test statistic [3]: whereB MEx2G andν are the best-estimated values, andν is the value of ν that maximises the likelihood at the fixed B MEx2G . The systematic uncertainties of the background estimation and the SES are incorporated into the test by profiling the likelihood about ν. The local 7 significance is quantified by the p-value p local , defined as the probability to find λ p that is equally or less compatible with the null hypothesis than that observed with the data when the signal does not exist.
Since m X is unknown, we need to take the lookelsewhere effect [3] into account to calculate the global significance. We estimate this effect following the approaches in [64,65], in which the trial factor of the search is estimated using an asymptotic property of λ p , obeying the chi-square distribution. The smallest p local in the m X scan is converted 7 Assuming that the signal is at the assumed m X .
into the global p-value p global assuming that the signal can appear only at one m X .
The range of B MEx2G at 90% C.L. is constructed based on the Feldman-Cousins unified approach [66] with the profile-likelihood ratio ordering.

Single event sensitivity
The SES of the MEx2G decay s is defined as follows: where N MEx2G is the expected number of signal events in the signal region. We calculate it using Michel decay (µ + → e + νν) events taken at the same time with the µ + → e + γ trigger. This Michel normalisation is beneficial for the following reasons. First, systematic uncertainties coming from the muon beam are cancelled because beam instability is included in both Michel triggered and the µ + → e + γ triggered events. Moreover, we do not need to know the µ + stopping rate nor the live DAQ time. Second, most of the systematic uncertainties coming from e + detection are also cancelled. The absolute value of e + efficiency is not needed. The number of Michel events is given by where N µ + : the number of stopped µ + s; B Michel : branching ratio of the Michel decay (≈ 1); f Michel : branching fraction of the selected energy region (7%-10% depending on m X ); p Michel : prescaling factor of the Michel trigger (= 10 7 ); p correction : correction factor of p Michel depending on the muon beam intensity; A Michel : geometrical acceptance of the spectrometer for Michel e + s; Michel : e + efficiency for Michel events within the geometrical acceptance of the spectrometer.
The number of MEx2G events is given by where p MEG : prescaling factor of the µ + → e + γ trigger (=1); A e + : geometrical acceptance of the spectrometer for MEx2G e + s; e + : e + efficiency for MEx2G events conditional to e + s in the geometrical acceptance of the spectrometer; 2γ : the product of 2γ geometrical acceptance and 2γ trigger, detection, and reconstruction efficiency, conditional to the e + detection; DM : the trigger direction match efficiency conditional to the e + and 2γ detection (Fig. 6); Using Eqs. (27)- (29), an estimate of the SES is given by The geometrical acceptance of the spectrometer is common, hence A e + /A Michel = 1; the estimate of the relative e + efficiency is e + / Michel = 89% (m X = 45 MeV/c 2 ) − 97% (m X = 20 MeV/c 2 ) increasing monotonically with m X . The estimate of 2γ is shown in Fig. 15; 2γ = 0.6% (m X = 45 MeV/c 2 ) -2.9% (m X = 20 MeV/c 2 ), decreasing monotonically with m X . This dependence comes mainly from the 2γ acceptance: for increasing m X , the opening angle between the 2γs becomes larger, resulting in a decreasing efficiency. The systematic uncertainties are summarised in Table 1. The uncertainty in the 2γ detection efficiency and that in the MC smearing parameters are the dominant components.
The estimated value of SES is s 0 = (2.9 ± 0.3) × 10 −12 (20 MeV/c 2 ) -(6.3±1.1)×10 −10 (45 MeV/c 2 ) for τ X = 20 ps increasing monotonically with m X . The e + efficiency is e + = 1% (45 MeV/c 2 ) -36% (20 MeV/c 2 ) decreasing monotonically with m X , estimated with the MC, although this quantity is not necessary for the normalisation. The overall efficiency for the MEx2G events conditional to the e + in the geometrical acceptance of the spectrometer is therefore 9 Results and discussion Table 2 summarises the numbers of events in the signal region and the sidebands as well as the expected number of background events in the signal region. We observe non-zero events in the signal region for some masses. Note that the adjacent m X bins are not statistically independent. Summing up the observed events gives nine events but five of them are unique events. One event appears in four bins (m X = 34, 35, 36, 37 MeV/c 2 ) and another event appears in two bins (m X = 35, 36 MeV/c 2 ).  We discuss the results for τ X = 20 ps below. The results for other τ X are similar, with small changes in the efficiency. The results are presented in detail in Appendix A. Figure 16 shows 90% confidence intervals on B MEx2G obtained from this analysis together with the sensitivities and the previous upper limits due to Crystal Box. The sensitivities are evaluated by the mean of the branching ratio limits at 90% C.L. under the null hypothesis. Note that since we adopt the Feldman-Cousins unified approach, a one-sided or two-sided interval is automatically determined according to the data. Therefore, lower limits can be set in m X regions where non-zero events are observed with small N BG .
The statistical significance of the excesses is tested against the null hypothesis. Figure 17 shows p local versus m X . We observe the lowest p local = 0.012 at m X = 35 MeV/c 2 , which corresponds to 2.2σ significance. The global p-value is calculated to be p global ≈ 0.10 by taking the look-elsewhere effect into account. This corresponds to 1.3σ, that is not statistically significant.  Table 2 The number of observed events in the sideband regions and the signal region and the expected number of background events in the signal region. This publication reports results from the full MEG dataset. Hence, new experiments will be needed for further ex-ploration of this decay, e.g. to test whether the small excess observed in this search grows. An upgraded experiment, MEG II, is currently being prepared [67]. A brief prospect for improved sensitivity to MEx2G in MEG II is discussed below. In this analysis the sensitivity worsens with increasing m X , mainly due to the 2γ acceptance and direction match efficiencies. The acceptance is determined by the geometry of the LXe photon detector and is not changed by the upgrade. The direction match efficiency can even worsen if we only consider the µ + → e + γ search; the γ position resolution is expected to improve by a factor two, which enables tightening the direction match trigger condition. However, the MEG II trigger development is underway and the trigger efficiency for high mass can be improved up to a factor ∼ 2 if a dedicated trigger is prepared. Basically, MEG II will collect ten times more µ + decays and the resolutions of each kinematic variable will improve by roughly a factor two, leading to higher efficiency while maintaining low background. It is therefore possible to improve the sensitivity by one order of magnitude.

Conclusions
We have searched for a lepton-flavour-violating muon decay mediated by a new light particle, µ + → e + X, X → γγ decay, for the first time using the full dataset (2009-2013) of the MEG experiment. No significant excess was found in the mass range m X = 20-45 MeV/c 2 and τ X < 40 ps, and we set new branching ratio upper limits in the mass range m X = 20-40 MeV/c 2 . In particular, the upper limits are lowered to the level of O(10 −11 ) for m X = 20-30 MeV/c 2 . The result is up to 60 times more stringent than the bound converted from the previous experiment, Crystal Box.

Appendix A Detailed results for different lifetimes
The detailed results for different lifetimes τ X are summarised in Tables 3-5.