1 Introduction

The search for charged lepton flavour violating (CLFV) processes is one of the key tools to probe for physics beyond the Standard Model (SM) of elementary particles and interactions. The observation of neutrino oscillations [1,2,3] showed that lepton flavour is not conserved in nature. As a consequence, charged lepton flavour is violated, even though the rate is unobservably small \(\left( <\!10^{-50}\right) \) in an extension of the SM accounting for measured neutrino mass differences and mixing angles [4, 5]. In the context of new physics, in the framework of grand unified theories for example, CLFV processes can occur at an experimentally observable rate [6]. Therefore, such processes are free from SM physics backgrounds and a positive signal would constitute unambiguous evidence for physics beyond the SM. This motivates the effort to search for evidence of new physics through CLFV processes [7, 8].

The MEG experiment at the Paul Scherrer Institut (PSI) in Switzerland searched for one such CLFV process, \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)decay, with the highest sensitivity in the world. No evidence of the decay was found, leading to an upper limit on the branching ratio \(\mathcal {B}(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ ) < 4.2\times 10^{-13}\) at 90% confidence level (C.L.) [9]. Models that allow \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)decay at an observable rate usually assume that CLFV couplings are introduced through an exchange of new particles much heavier than the muon. Negative results from CLFV searches leave open another possibility: new physics exists at a lighter scale but with very weak coupling to SM particles.

If a new particle X (with mass \(m_\mathrm {X}\) and lifetime \(\tau _\mathrm {X}\)) lighter than the muon exists, the CLFV two-body decay \(\upmu \rightarrow \mathrm {eX}\) may be a good probe for such new physics. The experimental signature depends on how the new particle X decays. In this paper, we report a search for \(\upmu ^+ \rightarrow \mathrm {e}^+\mathrm {X}, \mathrm {X} \rightarrow \upgamma \upgamma \ \)(MEx2G) decay using the full dataset collected in the MEG experiment. Here, we assume that X is an on-shell scalar or pseudo-scalar particle. Axion-like particles [10,11,12,13], Majoron [14, 15], familon [16,17,18,19], flavon [20, 21], flaxion [22, 23], hierarchion [24], and strongly interacting massive particles [25, 26] are candidates for X.

A dedicated search for the MEx2G decay has never been done, although some constraints on the X particle parameter space can be deduced by experimental results from both related muon decay modes and non-muon experiments; these are discussed below.

Current upper limits on the inclusive decay \(\upmu ^+\rightarrow \mathrm {e^+ X}\) are given at \(\mathcal {O}(10^{-5})\) for \(m_\mathrm {X}\) in the range 13–80 MeV/c\(^2\) [27]. However, the current limits do not impose any constraints on the MEx2G decay in the target region of this search. They are complementary, relevant for cases where X is either stable or decays invisibly. For X resulting from muon decays, the only kinematically allowed visible decay channels are \(\mathrm {X \rightarrow e^+e^-}\) and \(\mathrm {X} \rightarrow \upgamma \upgamma \). The former can occur at tree level while the latter can occur via a fermion loop. The current upper limit on \(\upmu ^+\rightarrow \mathrm {e^+ X}, \mathrm {X}\rightarrow \mathrm {e^+e^-}\) at a level of \(\mathcal {O}\left( 10^{-12}\right) \) [28] gives stringent constraints on the MEx2G decay if we assume that X is more likely to decay into an e\(^+\)e\(^-\) pair. However, there is a possibility for X to be electrophobic, as pointed out in [29, 30], and searches for both decay modes can hint at the model behind these decay modes.

The current upper limit on the decay \(\upmu ^+\rightarrow \mathrm {e^+}\upgamma \upgamma \), \(\mathcal {B}(\upmu ^+\rightarrow \mathrm {e^+}\upgamma \upgamma )< 7.2\times 10^{-11}\) (90% C.L.) from the Crystal Box experiment [31] can be converted into an equivalent MEx2G upper limit by taking into account the difference in detector efficiencies [32]; the converted limits are shown in Fig. 1.

Fig. 1

Upper limits on MEx2G decay estimated by converting the upper limits on \(\upmu ^+\rightarrow \mathrm {e^+}\upgamma \upgamma \) from the Crystal Box experiment as a function of \(m_\mathrm {X}\). Lines with different markers and colours correspond to different \(\tau _\mathrm {X}\)

Axion-like particle searches from collider and beam dump experiments and from supernova observations also constrain the branching ratio \(\mathrm {X}\rightarrow \upgamma \upgamma \) if the axion-like particles are generated from coupling to photons [33]. Figure 2 summarises the parameter regions excluded by these experiments. A region with boosted decay length \(\gamma \mathrm {c}\tau _\mathrm {X}<1\) cm and \(m_\mathrm{{X}}>20\) MeV/c\(^2\) still has room for the MEx2G decay.

Fig. 2

Excluded parameter regions for a scalar X with mass \(m_\mathrm{{X}}\) and coupling \(g_{\gamma \gamma }\) to 2\(\upgamma \)s from collider, beam dumps, and supernova [34,35,36] (from [33]). In black we show contours of the boosted decay length \(\gamma \mathrm {c}\tau _\mathrm {X}\) of \(\mathrm {X}\rightarrow \upgamma \upgamma \), assuming X to be produced from an at-rest muon decay \(\upmu ^+\rightarrow \mathrm {e^+ X}\). The solid black line corresponds to \(\gamma \mathrm {c}\tau _\mathrm {X}=0.01\) cm, the dotted one to 0.1 cm, the dashed one to 1 cm and the dot-dashed line to 10 cm

Based on the limits discussed above, we define the target parameter space of this search in the \(\tau _\mathrm {X}\)–\(m_{\mathrm {X}}\) plane as shown in Fig. 3.

Fig. 3

Allowed X particle parameter space (white). The blue region has already been excluded [35] and the red shaded region on the right (\(m_\mathrm {X}\gtrsim 45\) MeV/c\(^2\)) is inaccessible to MEG

2 Detector

The MEG detector is briefly presented in the following, emphasising aspects relevant to this search; a detailed description is available in [37].

In this paper we adopt a Cartesian coordinate system (xyz) shown in Fig. 4 with the origin at the centre of the magnet. When necessary, we also refer to the cylindrical coordinate system \({ (r,\phi ,z)}\) as well as the polar angle \(\theta \).

Multiple intense \(\upmu ^+\) beams are available at the \(\uppi \)E5 channel in the 2.2-mA PSI proton accelerator complex. We use a beam of surface muons, produced by \(\uppi ^+\) decaying near the surface of a production target. The beam intensity is tuned to a \(\upmu ^+\) stopping rate of \(3\times 10^7~\mathrm {s}^{-1}\), limited by the rate capabilities of the tracking system and the rate of accidental backgrounds in the \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)search. The muons at the production target are fully polarised (\(P_{\upmu ^+}=-1\)), and they reach a stopping target with a residual polarisation \(P_{\upmu ^+} = -\,\,0.86 \pm 0.02 ~ \mathrm{(stat)} ~ { }^{+ 0.05}_{-0.06} ~ \mathrm{(syst)}\) [38].

The positive muons are stopped and decay in a thin target placed at the centre of the spectrometer at a slant angle of \(\approx \) \(20^\circ \) from the \(\upmu ^+\) beam direction. The target is composed of a 205 \(\upmu \)m thick layer of polyethylene and polyester (density 0.895 g/cm\(^3\)).

Fig. 4

The figure shows a schematic view of the MEG detector with a simulated MEx2G event emitted from the target. The top view is shown on the left, the view from downstream on the right

Positrons from the muon decays are detected with a magnetic spectrometer, called the COBRA (standing for COnstant Bending RAdius) spectrometer, consisting of a thin-walled superconducting magnet, a drift chamber array (DCH), and two scintillating timing counter (TC) arrays.

The magnet [39] is made of a superconducting coil with three different radii. It generates a gradient magnetic field of 1.27 T at the centre and 0.49 T at each end. The diameter of an emitted e\(^+\) trajectory depends on the absolute momentum, independent of the polar angle due to the gradient field. This allows us to select e\(^+\)s within a specific momentum range by placing the TC detectors in a specific radial range; e\(^+\)s whose momenta are larger than \(\sim \) 45 MeV/c fall into the acceptance of the TC. Furthermore, the gradient field prevents e\(^+\)s emitted nearly perpendicular to the \(\upmu ^+\) beam direction from looping many times in the spectrometer. This results in a suppression of hit rates in the DCH. The thickness of the central part of the magnet is 0.2 radiation length to maximise transparency to \(\upgamma \); 85% of the signal \(\upgamma \)s penetrate the magnet without interaction and reach the photon detector.

Positrons are tracked in the DCH [40]. It is composed of 16 independent modules. Each module has a trapezoidal shape with base lengths of 104 cm (at smaller radius, close to the stopping target) and 40 cm (at larger radius). These modules are installed in the bottom hemisphere in the magnet at 10.5\(^{\circ }\) intervals. The DCH covers the azimuthal region between 191.25\(^{\circ }\) and 348.75\(^{\circ }\) and the radial region between 19.3 cm and 27.9 cm. It is composed of low mass materials and helium-based gas (\(\mathrm {He}:\mathrm {C_2H_6} = 1:1\)) to suppress Coulomb multiple scattering; \(2.0\times 10^{-3}\) radiation length path is achieved for the e\(^+\) from \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)decay at energy of \(E_\mathrm {e^+}=52.83\) MeV (\(= m_\upmu \mathrm {c}^2/2\), where \(m_\upmu \) is the mass of muon).

The TC [41, 42] is designed to measure precisely the e\(^+\) hit time. Fifteen scintillator bars are placed at each end of the COBRA. They are made of \(4\times 4\times 80\) cm\(^3\) plastic scintillators with fine-mesh PMTs attached to both ends of the bars.

The efficiency of the spectrometer depends significantly on \(E_\mathrm {e^+}\) as shown in Fig. 5. The e\(^+\) energy from the MEx2G decay is lower than that from \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)depending on \(m_{\mathrm {X}}\), and the efficiency is correspondingly lower. The upper end of the \(m_{\mathrm {X}}\) search range is limited by this effect as shown in Fig. 3.

Fig. 5

COBRA spectrometer relative efficiency as a function of \(E_\mathrm {e^+}\) normalised to \(\epsilon _\mathrm {e^+}(52.83\) MeV) = 1

The photon detector is a homogeneous liquid-xenon (LXe) detector relying on scintillation light for energy, position, and timing measurement [43, 44]. As shown in Fig. 4, it has a C-shaped structure fitting the outer radius of the magnet. The fiducial volume is \(\approx \) 800 \(\ell \), covering 11% of the solid angle viewed from the centre of the stopping target in the radial range of \(67.85<r<105.9\) cm, corresponding to \(\approx 14\) radiation lengths. It is able to detect a 52.83-MeV \(\upgamma \) with high efficiency and to contain the electromagnetic shower induced by it. The scintillation light is detected by 846 2-inch PMTs submerged directly in the liquid xenon. They are placed on all six faces of the detector, with different PMT coverage on different faces. On the inner face, which is the densest part, the PMTs are aligned at intervals of 6.2 cm.

One of the distinctive features of the MEG experiment is that it digitises and records all waveforms from the detectors using the Domino Ring Sampler v4 (DRS4) chip [45]. The sampling speeds are set to 1.6 GSPS for TC and LXe photon detector and 0.8 GSPS for DCH. This lower value for DCH is selected to match the drift velocity and the required precision.

The DAQ event rate was kept below 10 Hz in order to acquire the full waveform data (\(\approx \) 1 MB/event). It was accomplished using a highly efficient online trigger system [46, 47].

Several types of trigger logic were implemented and activated during the physics data-taking each with its own prescaling factor. However, a dedicated trigger for the MEx2G events was neither foreseen nor implemented. Thus, we rely on the \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)triggered data in this search.

The main \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)trigger, with a prescaling of 1, used the following observables: \(\upgamma \) energy, time difference between e\(^+\) and \(\upgamma \), and relative direction of e\(^+\) and \(\upgamma \). The DCH was not used in the trigger due to the slow drift velocity. The condition on the relative direction is designed to select back-to-back events. To calculate the relative direction, the PMT that detects the largest amount of scintillation light is used for the \(\upgamma \), while the hit position at the TC is used for the e\(^+\). This direction match requirement results in inefficient selection of the MEx2G signal because, unlike the \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)decay, the MEx2G decay has 2\(\upgamma \)s with a finite opening angle, so that events often fail to satisfy the direction trigger. The selection inefficiency for MEx2G events is 10–50% depending on \(m_{\mathrm {X}}\) as shown in Fig. 6.

Finally, the detector was calibrated and monitored over the whole data-taking period with various methods [48, 49], ensuring that the detector performance was under control for the duration of the experiment.

Fig. 6

Trigger direction match efficiency for the MEx2G decay conditional to \(\mathrm {e^+}\) and \(2 \upgamma \) detection as a function of \(m_\mathrm {X}\) evaluated with a Monte Carlo simulation (Sect. 4)

3 Search strategy

The MEx2G signal results from the sequential decays of \(\upmu ^+\rightarrow \mathrm {e^+}\mathrm {X}\) followed by \(\mathrm {X}\rightarrow \upgamma \upgamma \). The first part is a two-body decay of a muon at rest, signalled by a mono-energetic e\(^+\). The energy \(E_\mathrm {e^+}\) is determined by \(m_\mathrm {X}\): \(E_\mathrm {e^+}(m_\mathrm {X}=0) = 52.83\) MeV and is a decreasing function of \(m_\mathrm {X}\). The sum of energies of the two \(\upgamma \)s is also mono-energetic and an increasing function of \(m_\mathrm {X}\). The momenta of the two \(\upgamma \)s are Lorentz-boosted along the direction of X, which increases the acceptance in the LXe photon detector compared to the three-body decay \(\upmu ^+\rightarrow \mathrm {e^+}\upgamma \upgamma \). The three final-state particles are expected to have an invariant mass of 105.7 MeV\(/\mathrm {c}^2 (=m_\upmu )\) and a total momentum vector equal to 0.
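The dependence of the two mono-energetic quantities on \(m_\mathrm {X}\) follows from standard two-body kinematics for a muon decaying at rest. The sketch below is only an illustration: the function names and the rounded mass values are our own choices, not part of the analysis code.

```python
M_MU = 105.658  # muon mass in MeV/c^2 (rounded value, assumed here)
M_E = 0.511     # electron mass in MeV/c^2 (rounded value, assumed here)


def positron_energy(m_x):
    """Total e+ energy (MeV) in mu+ -> e+ X for a muon at rest."""
    return (M_MU**2 + M_E**2 - m_x**2) / (2.0 * M_MU)


def x_energy(m_x):
    """Total X energy (MeV); equals the summed energy of the two photons."""
    return (M_MU**2 + m_x**2 - M_E**2) / (2.0 * M_MU)


if __name__ == "__main__":
    for m_x in (0.0, 20.0, 45.0):
        print(f"m_X = {m_x:4.1f} MeV/c^2: "
              f"E_e+ = {positron_energy(m_x):6.2f} MeV, "
              f"E_X = {x_energy(m_x):6.2f} MeV")
```

By construction the two energies add up to the muon rest energy for any \(m_\mathrm {X}\), which is the invariant-mass constraint quoted above.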

A physics background that generates time-coincident \(\mathrm {e^+}\upgamma \upgamma \) in the final state is \(\upmu ^+ \rightarrow \mathrm {e}^+ \upnu \bar{\upnu }\upgamma \upgamma \). This mode has not yet been measured but exists in the SM. The branching ratio is calculated to be \(\sim \mathcal {O}(10^{-14})\) for the MEG detector configuration without any cut on \(E_\mathrm {e^+}\)  [50, 51]. Therefore, its contribution is certainly negligible in this search where we apply cuts on \(E_\mathrm {e^+}\).

The dominant background is the accidental pileup of multiple \(\upmu ^+\) decays. There are three types of accidental background events:

Type 1:

The e\(^+\) and one of the \(\upgamma \)s originate from one \(\upmu ^+\), and the other \(\upgamma \) from a different one.

Type 2:

The two \(\upgamma \)s share the same origin, and the e\(^+\) is accidental.

Type 3:

All the particles are accidental.

The main source of a time-coincident \(\mathrm {e^+}\upgamma \) pair in type 1 is the radiative muon decay \(\upmu ^+ \rightarrow \mathrm {e}^+ \upnu \bar{\upnu }\upgamma \) [52]. The sources of time-coincident \(\upgamma \upgamma \) pairs in type 2 are \(\mathrm {e}^+ \mathrm {e}^- \rightarrow \upgamma \upgamma \) (e\(^+\) from \(\upmu ^+\) decay and e\(^-\) from material along the e\(^+\) trajectory), \(\upmu ^+ \rightarrow \mathrm {e}^+ \upnu \bar{\upnu }\upgamma \) with an additional \(\upgamma \), e.g. from bremsstrahlung of the e\(^+\), or a cosmic-ray induced shower.

Figure 7 shows the decay kinematics and the kinematic variables. The muon decay vertex and the momentum of the e\(^+\) are obtained by reconstructing the e\(^+\) trajectory using the hits in DCH and TC and the intersection of the trajectory with the plane of the muon beam stopping target (Sect. 5.1). The interaction positions and times of the two \(\upgamma \)s within the LXe photon detector and their energies are individually reconstructed using the PMT charge and time information of the LXe photon detector (Sect. 5.2).

Given the muon decay vertex, the two \(\upgamma \)s’ energies and positions, and \(m_\mathrm {X}\), the X decay vertex \({\varvec{x}}_\mathrm {vtx}\) can be computed. Therefore, we reconstruct \({\varvec{x}}_\mathrm {vtx}\) by scanning the assumed value of \(m_\mathrm {X}\) (Sect. 5.3.1). If the final-state three particles do not originate at a single muon decay vertex, these variables will be inconsistent with originating from a single point. After reconstructing \({\varvec{x}}_\mathrm {vtx}\), the relative time and angles (momenta) between X and e\(^+\) are tested for consistency with a muon decay (Sects. 5.3.2 and 5.3.3).
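The link between the assumed \(m_\mathrm {X}\) and the two-photon geometry is the invariant-mass relation \(m_\mathrm {X}^2 \mathrm {c}^4 = 2E_{\upgamma _1}E_{\upgamma _2}(1-\cos \theta _{\upgamma \upgamma })\). The following sketch only evaluates this opening angle; it is not the vertex reconstruction of Sect. 5.3.1, and the function names are hypothetical.

```python
import math


def opening_angle(m_x, e1, e2):
    """Opening angle (rad) between two photons with energies e1, e2 (MeV)
    from the decay of a particle with mass m_x (MeV/c^2)."""
    cos_theta = 1.0 - m_x**2 / (2.0 * e1 * e2)
    if not -1.0 <= cos_theta <= 1.0:
        raise ValueError("photon energies are incompatible with the assumed mass")
    return math.acos(cos_theta)


def minimum_opening_angle(m_x, e_x):
    """Smallest opening angle, reached for the symmetric split e1 = e2 = e_x / 2."""
    return opening_angle(m_x, e_x / 2.0, e_x / 2.0)
```

For a fixed total energy the symmetric split gives the smallest angle, so a heavier X implies a larger minimum separation of the two \(\upgamma \)s in the detector.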

The MEx2G decay search analysis is performed within the mass range 20 MeV/c\(^2<m_\mathrm {X}<45~\)MeV/c\(^2\) in steps of 1 MeV/c\(^2\). This step is chosen small enough not to miss signals in the gaps; as a consequence, adjacent mass bins are not statistically independent. The analysis was performed assuming lifetimes \(\tau _\mathrm {X}= 5, 20\), and 40 ps; the value affects only the signal efficiency.
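To give a feeling for the scanned grid, the sketch below lists the mass and lifetime hypotheses and evaluates the mean laboratory decay length of X for each of them. Several points are assumptions of this illustration: the end points 20 and 45 MeV/c\(^2\) are included in the grid, the mean decay length is taken as \(\beta \gamma \mathrm {c}\tau _\mathrm {X}\) (which differs from \(\gamma \mathrm {c}\tau _\mathrm {X}\) by the factor \(\beta \)), and all names are hypothetical.

```python
import math

M_MU = 105.658               # muon mass, MeV/c^2 (rounded, assumed)
M_E = 0.511                  # electron mass, MeV/c^2 (rounded, assumed)
C_CM_PER_PS = 0.0299792458   # speed of light in cm/ps

MASS_POINTS = list(range(20, 46))   # 1 MeV/c^2 steps; end points assumed included
LIFETIMES_PS = (5, 20, 40)          # lifetime hypotheses quoted in the text


def boosted_decay_length(m_x, tau_ps):
    """Mean lab-frame decay length (cm) of X from mu+ -> e+ X at rest."""
    e_x = (M_MU**2 + m_x**2 - M_E**2) / (2.0 * M_MU)
    p_x = math.sqrt(e_x**2 - m_x**2)
    beta_gamma = p_x / m_x
    return beta_gamma * C_CM_PER_PS * tau_ps


if __name__ == "__main__":
    for tau in LIFETIMES_PS:
        lengths = [boosted_decay_length(m, tau) for m in MASS_POINTS]
        print(f"tau_X = {tau:2d} ps: decay length between "
              f"{min(lengths):.2f} and {max(lengths):.2f} cm")
```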

We estimate the accidental background by using the data in which the particles are not time coincident. To reduce the possibility of experimental bias, a blind analysis is adopted; the blind region is defined in the plane of the relative times of the three particles (Sect. 6).

The signal efficiency is evaluated on the basis of a Monte Carlo simulation (Sect. 4). Its tuning and validation are performed using pseudo-2\(\upgamma \) data as described in Sect. 4.1.

Fig. 7

Decay kinematics and kinematic variables

4 Simulation

The technical details of the Monte Carlo (MC) simulation program are presented in [53] and an overview of the physics and detector simulation is available in [37]. In the following we give a brief summary.

The first step of the simulation is the generation of the physics events. This is done with custom-written code for a large number of relevant physics channels. The MEx2G decay is simulated starting from a muon at rest in the target; the decay products are generated in accordance with the decay kinematics for the given \(m_\mathrm {X}\) and \(\tau _\mathrm {X}\).

The muon beam transport, interaction in the target, and propagation of the decay products in the detector are simulated with a MC program based on GEANT3.21 [54] that describes the detector response. Between the detector simulation and the reconstruction program, an intermediate program processes the MC information, adding readout simulation and allowing event mixing to study the detector performance under combinatorial background events. In particular, the \(\upmu ^+\) beam, randomly distributed in time at a decay rate of \(3\times 10^{7}~\upmu ^+\mathrm {s^{-1}}\), is mixed with the MEx2G decay to study the e\(^+\) spectrometer performance. The detectors’ operating conditions, such as the active layers of DCH and the applied high voltages, are implemented with their known time dependence.

In order to simulate the accidental activity in the LXe photon detector, data collected with a random-time trigger are used. A MC event and a random-trigger event are overlaid by summing the numbers of photo-electrons detected by each PMT.

4.1 Pseudo two \(\upgamma \) data

To study the performance of the 2\(\upgamma \) reconstruction, we built pseudo 2\(\upgamma \) events using calibration data. The following \(\upgamma \)-ray lines are obtained in calibration runs:

  • 54.9 MeV and 82.9 MeV from \(\uppi ^-\mathrm {p}\rightarrow \uppi ^0\mathrm {n}\rightarrow \upgamma \upgamma \mathrm {n}\) reaction,

  • 17.6 MeV and 14.6 MeV from \(^7\mathrm {Li}(\mathrm {p},\upgamma )^8\mathrm {Be}\) reaction,

  • 11.7 MeV from \(^{11}\mathrm {B}(\mathrm {p},2\upgamma )^{12}\mathrm {C}\) reaction.

The selection criteria for those calibration events are detailed in [48] and [55]. We take two events from the above calibration data and overlay them, summing the number of photo-electrons PMT by PMT. These pseudo 2\(\upgamma \) events are generated using both data and MC events.

5 Event reconstruction

We describe here the reconstruction methods and their performance, focusing on high-level objects; descriptions of the manipulation of low-level objects, including waveform analysis and calibration procedures, are available in [9, 37]. The e\(^+\) reconstruction (Sect. 5.1) is identical to that used in the \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)decay analysis in [9]. The 2\(\upgamma \) reconstruction was developed originally for this analysis (Sect. 5.2). After reconstructing the e\(^+\) and two \(\upgamma \)s, the reconstructed variables are combined to reconstruct the X decay vertex (Sect. 5.3).

5.1 Positron reconstruction

Positron trajectories in the DCH are reconstructed using the Kalman filter technique [56, 57] based on the GEANE software [58]. This technique takes the effect of materials into account. After the first track fitting in DCH, the track is propagated to the TC region to test matching with TC hits. The matched TC hits are connected to the track and then the track is refined using the TC hit time. Finally, the fitted track is propagated back to the stopping target, and the point of intersection with the target defines the muon decay vertex position (\({\varvec{x}}_\mathrm {e^+}\)) and momentum vector that defines the e\(^+\) emission angles (\(\theta _\mathrm {e^+}, \phi _\mathrm {e^+}\)). The e\(^+\) emission time (\(t_\mathrm {e^+}\)) is reconstructed from the TC hit time minus the e\(^+\) flight time.

Positron tracks satisfying the following criteria are selected: the number of hits in DCH is more than six, the reduced chi-square of the track fitting is less than 12, the track is matched with a TC hit, and the track is successfully propagated back to the fiducial volume of the target. If multiple tracks in an event pass the criteria, only one track is selected and passed to the following analysis, based on the covariance matrix of the track fitting as well as the number of hits and the reduced chi-square.

Fig. 8

\(E_{\mathrm {e^+}}\) resolution as a function of \(E_{\mathrm {e^+}}\)

Fig. 9

Event display of the LXe photon detector for a 2\(\upgamma \) event (in a development view). The red points show the interaction positions of the two \(\upgamma \)s projected to each face. Each circular marker denotes a PMT. The colour indicates the measured light yield, which is the sum of photons from the two showers induced by the two \(\upgamma \)s as depicted in the right figure

The resolutions are evaluated based on the MC, tuned to data using double-turn events; tracks traversing DCH twice (two turns) are selected and reconstructed independently by using hits belonging to each turn. The difference in the reconstruction results by the two turns indicates the resolution. The MC results are smeared so that the double-turn results become the same as those with the data. Figure 8 shows the \(E_\mathrm {e^+}\) resolution as a function of \(E_\mathrm {e^+}\). The angular resolutions also show a similar \(E_\mathrm {e^+}\) dependence. The \(\phi _\mathrm {e^+}\)- and \(\theta _\mathrm {e^+}\)-resolutions for \(m_\mathrm {X}=20\, (45)\) MeV/c\(^2\) are \(\sigma _{\phi _\mathrm {e^+}}\sim 12\,(15)\) mrad and \(\sigma _{\theta _{\mathrm {e^+}}}\sim 10\,(11)\) mrad, respectively. The time resolution is \(\sigma _{t_\mathrm {e^+}}\sim 100\,(130)\) ps.

5.2 Photon reconstruction

Coordinates (uvw) are used in the LXe photon detector local coordinate system rather than the global coordinates (xyz): u coincides with z, \(v = r_\mathrm {in}(\pi - \phi )\) where \(r_{\mathrm {in}}=67.85\) cm is the radius of the inner face, and \(w=r-r_{\mathrm {in}}\) is the depth measured from the inner face.
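The coordinate definitions above translate directly into a conversion from global to local coordinates. In this sketch the function name is hypothetical, and we assume that the azimuthal angle \(\phi \) is taken in the range \([0, 2\pi )\), so that \(v=0\) corresponds to \(\phi =\pi \).

```python
import math

R_IN = 67.85  # radius of the LXe inner face in cm (value quoted in the text)


def global_to_local(x, y, z):
    """Convert global (x, y, z) in cm to LXe local (u, v, w) in cm."""
    r = math.hypot(x, y)
    phi = math.atan2(y, x) % (2.0 * math.pi)  # azimuth mapped to [0, 2*pi), an assumption
    u = z
    v = R_IN * (math.pi - phi)
    w = r - R_IN
    return u, v, w
```

A point on the inner face at \(\phi =\pi \), for example \((x, y, z)=(-67.85, 0, 10)\) cm, maps to \((u, v, w)=(10, 0, 0)\) cm.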

5.2.1 Multiple photon search

A peak search is performed based on the light distributions on the LXe photon detector inner and outer faces by using TSpectrum2 [59, 60]. The threshold of the peak light yield is set to 200 photons. Events that have more than one peak are identified as multiple-\(\upgamma \) events.

5.2.2 Position and energy

Hereafter, only the multiple-\(\upgamma \) events are analysed. When more than two \(\upgamma \)s are found, we select the two with the largest energy by performing the position-energy fitting described in this subsection on different combinations of two \(\upgamma \)s.

Figure 9 shows a typical event display of a 2\(\upgamma \) event. Each PMT detects photons from the two \(\upgamma \)s. The key point of the 2\(\upgamma \) reconstruction is how to divide the number of photons detected in each PMT into a contribution from each \(\upgamma \).

Calculation of initial values   First, the positions of the detected peaks in (uv) are used as the initial estimate with \(w=1.5\) cm. Given the interaction point of each \(\upgamma \) within the LXe photon detector, the contribution from each \(\upgamma \) to each PMT can be calculated as follows. Assuming the ratio of the energy of \(\upgamma _1\) to that of \(\upgamma _2\) to be \(E_{\upgamma _1}:E_{\upgamma _2}=R_1:(1-R_1)\) (\(0<R_1<1\), at first \(R_1\) is set to 0.5), the fraction of the number of photons at the i-th PMT that comes from \(\upgamma _1\) is calculated as

$$\begin{aligned} R_{1,i}= \frac{R_1\varOmega _{1,i}}{R_1\varOmega _{1,i} + (1 - R_1)\varOmega _{2,i}}, \end{aligned}$$
(1)

where \(\varOmega _{1(2),i}\) is the solid angle subtended by the i-th PMT from the \(\upgamma _{1(2)}\) interaction point and \(R_{2,i} = 1 - R_{1,i}\). The total number of photons generated by \(\upgamma _{1(2)}\), \(M_{\mathrm {pho}, 1(2)}\), is calculated from the ratio \(R_{1(2),i}\) and the number of photons at each PMT \(N_{\mathrm {pho},i}\) as

$$\begin{aligned} M_{\mathrm {pho}, 1(2)} = \sum _i^{n_\mathrm {{PMT}}^\mathrm {all}}\left( R_{1(2),i} \times N_{\mathrm {pho},i} \right) . \end{aligned}$$
(2)

Then, \(R_1\) is updated to \(R_1 = M_{\mathrm {pho}, 1} / (M_{\mathrm {pho}, 1} + M_{\mathrm {pho}, 2})\) and calculations (1) and (2) are repeated with the updated \(R_1\). This procedure is iterated four times.
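The iteration of Eqs. (1) and (2) can be sketched as follows. This is a minimal illustration, not the analysis code: the function name and the toy inputs are hypothetical, and we assume that the share of \(\upgamma _2\) at each PMT is the complementary fraction \(1-R_{1,i}\).

```python
def split_photons(n_pho, omega1, omega2, n_iter=4, r1=0.5):
    """Share the photons of each PMT between two gammas by iterating Eqs. (1), (2).

    n_pho  : number of photons detected by each PMT
    omega1 : solid angle of each PMT seen from the gamma_1 interaction point
    omega2 : solid angle of each PMT seen from the gamma_2 interaction point
    Returns (M_pho_1, M_pho_2, R_1) after n_iter iterations.
    """
    m1 = m2 = 0.0
    for _ in range(n_iter):
        m1 = m2 = 0.0
        for n, w1, w2 in zip(n_pho, omega1, omega2):
            denom = r1 * w1 + (1.0 - r1) * w2
            if denom <= 0.0:
                continue  # this PMT sees neither interaction point
            r1_i = r1 * w1 / denom      # Eq. (1)
            m1 += r1_i * n              # Eq. (2) for gamma_1
            m2 += (1.0 - r1_i) * n      # Eq. (2) for gamma_2 (assumed complement)
        if m1 + m2 == 0.0:
            break                       # no light at all, nothing to update
        r1 = m1 / (m1 + m2)             # updated energy ratio
    return m1, m2, r1


if __name__ == "__main__":
    # Toy example: four PMTs, true photon yields 6000 and 2000 (hypothetical numbers).
    omega1 = [0.4, 0.3, 0.2, 0.1]
    omega2 = [0.1, 0.2, 0.3, 0.4]
    n_pho = [6000 * a + 2000 * b for a, b in zip(omega1, omega2)]
    print(split_photons(n_pho, omega1, omega2))
```

Each update moves \(R_1\) from its starting value of 0.5 towards the value favoured by the light distribution while conserving the total number of photons; a few iterations are enough for initial values that are refined by the subsequent fits.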

Position pre-fitting   Inner PMTs that detect more than 10 photons are selected to perform a position pre-fitting. The following quantity is minimised during the fitting:

$$\begin{aligned} \chi ^2_{2\upgamma }=\sum _i^{n_\mathrm {{PMT}}^\mathrm {selected}}\frac{\left( N_{\mathrm {pho}, i}-M_{\mathrm {pho},1}\varOmega _i({\varvec{x}}_{\upgamma _1})-M_{\mathrm {pho}, 2}\varOmega _i({\varvec{x}}_{\upgamma _2})\right) ^2}{\sigma ^2_{\mathrm {pho}, i}(N_{\mathrm {pho}, i})}, \end{aligned}$$
(3)

where \(\sigma ^2_{\mathrm {pho}, i}(N_{\mathrm {pho}, i}) = N_{\mathrm {pho}, i}/\epsilon _{\mathrm {PMT},i}\) with \(\epsilon _{\mathrm {PMT},i}\) being the product of quantum and collection efficiencies of the PMT. This fitting is performed separately for each \(\upgamma \): first, the light distribution is fitted with \(\{{\varvec{x}}_{\upgamma _1}, M_{\mathrm {pho},1}\}\) as free parameters, while the other parameters are fixed; next, the light distribution is fitted with \(\{{\varvec{x}}_{\upgamma _2}, M_{\mathrm {pho},2}\}\) as free parameters, while the other parameters are fixed.
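For fixed interaction points, the quantity in Eq. (3) reduces to a sum over the selected PMTs. The sketch below evaluates it for solid angles that have already been computed; the evaluation of \(\varOmega _i({\varvec{x}})\) from the detector geometry and the minimiser itself are not shown, and the function name is hypothetical.

```python
def chi2_two_gamma(n_pho, omega1, omega2, m1, m2, eff):
    """Evaluate Eq. (3) for given photon yields m1, m2 and per-PMT solid angles.

    n_pho  : number of photons detected by each selected PMT
    omega1 : solid angle of each PMT seen from the gamma_1 position
    omega2 : solid angle of each PMT seen from the gamma_2 position
    eff    : product of quantum and collection efficiency of each PMT
    """
    chi2 = 0.0
    for n, w1, w2, e in zip(n_pho, omega1, omega2, eff):
        if n <= 0.0:
            continue  # variance N/eff vanishes; such PMTs are not selected anyway
        expected = m1 * w1 + m2 * w2
        variance = n / e
        chi2 += (n - expected) ** 2 / variance
    return chi2
```

In an alternating fit of the kind described above, this function would be minimised first with respect to the \(\upgamma _1\) parameters and then with respect to the \(\upgamma _2\) parameters, recomputing the corresponding solid angles at each step.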

Energy pre-fitting  To improve the energy estimation, \(M_{\mathrm {pho},1(2)}\) are fitted while the other parameters are fixed. The same \(\chi ^2_{2\upgamma }\) (Eq. (3)) is used but only with PMTs that detect more than 200 photo-electrons.

The \(\upgamma \) with the larger \(M_\mathrm {pho}\) is defined as \(\upgamma _1\) and the second largest one is defined as \(\upgamma _2\) in the later analysis.

Position and energy fitting  At the final step, all the parameters are fitted simultaneously to eliminate the dependence of the fitted positions on the value of \(M_{\mathrm {pho},1(2)}\) initially assumed. The best-fit value of \(M_{\mathrm {pho},1(2)}\) is used to update \(R_1\) and calculations (1) and (2) are repeated again to obtain the final value of \(M_{\mathrm {pho},1(2)}\). Finally, it is converted into \(E_{\upgamma _{1(2)}}\):

$$\begin{aligned} E_{\upgamma _{1(2)}}= & {} U({\varvec{x}}_{\upgamma _{1(2)}})\times H(T) \times S \times M_\mathrm {pho, 1(2)}, \end{aligned}$$
(4)

where \(U({\varvec{x}}_{\upgamma _{1(2)}})\) is a uniformity correction factor, H(T) is a time variation correction factor with T being the calendar time when the event was collected, and S is a factor to convert the number of photons to energy. The functions \(U({\varvec{x}}_{\upgamma _{1(2)}})\) and H(T) are mainly derived from the 17.6-MeV line from \(^7\mathrm {Li}(\mathrm {p},\upgamma )^8\mathrm {Be}\) reaction, which was measured twice per week. The factor S is calibrated using the 54.9-MeV line from \(\uppi ^0\) decay, taken once per year.

Energy-ratio correction  Both the MC data and the pseudo-2\(\upgamma \) data show an anti-correlation between the errors in \(E_{\upgamma _{1}}\) and \(E_{\upgamma _{2}}\) as shown in Fig. 10a, while their sum is not biased. Defining \(R_{1}^{\mathrm {true}}\) as the \(R_1\) for true energies for MC data and that for energies reconstructed without the overlay for real data, the reconstruction bias in both the MC data and the pseudo-2\(\upgamma \) data is apparent from the linear dependence of \(R_{1}/R_{1}^{\mathrm {true}}\) on \(R_{1}\) as shown in Fig. 10b. This bias is removed by applying a correction to the reconstructed energies; the correction coefficients are evaluated from the pseudo-2\(\upgamma \) data with different combinations of calibration data.

Fig. 10

(a) Scatter plot of the energy reconstruction errors (MC). \(E^\mathrm {true-deposit}_{\upgamma _{1(2)}}\) is the MC true value of the energy deposited in the LXe. (b) Bias of the reconstructed energy ratio as a function of the reconstructed energy ratio (MC)

Position correction  Oblique incidence of \(\upgamma \)s to the inner face results in a bias of the fitted positions. This bias was checked and corrected for using the MC simulation. No bias is observed in the v direction while a significant bias is observed in the u direction. This is because the \(\upgamma \)s from the MEx2G decay enter the LXe photon detector almost perpendicularly in the x-y view but enter with angles in the z-r view. Since the u bias arises from the direction and the size of the shower, it depends on the u coordinate and the energy. Therefore, the correction function is prepared as a function of \(u_{\upgamma _{1(2)}}\) and \(E_{\upgamma _{1(2)}}\).

Selection criteria  To guarantee the quality of the reconstruction, the following criteria are imposed on the reconstruction results: the fits for both \(\upgamma \)s converge; the two \(\upgamma \) positions are both within the detector fiducial volume defined as \(|u|<25\) cm \(\wedge \) \(|v|<71\) cm; the distance between the two \(\upgamma \)s on the inner face is \(d_{uv}> 20\) cm; \(E_{\upgamma _{1(2)}} > 10\) MeV; and \(E_{\upgamma _1} + E_{\upgamma _2} > 40\) MeV.
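The selection can be summarised as a single predicate. In this sketch the function name and argument layout are hypothetical, and \(d_{uv}\) is assumed to be the Euclidean distance between the two reconstructed positions in the (uv) plane.

```python
import math


def passes_selection(converged1, converged2, uv1, uv2, e1, e2):
    """Apply the 2-gamma quality cuts; positions in cm, energies in MeV."""
    in_fiducial = all(abs(u) < 25.0 and abs(v) < 71.0 for u, v in (uv1, uv2))
    d_uv = math.hypot(uv1[0] - uv2[0], uv1[1] - uv2[1])  # assumed Euclidean in (u, v)
    return (converged1 and converged2
            and in_fiducial
            and d_uv > 20.0
            and e1 > 10.0 and e2 > 10.0
            and e1 + e2 > 40.0)
```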

Fig. 11

Energy response to MC \(2\upgamma \) events with \(E^\mathrm {true}_{\upgamma _{1}}=55\) MeV and \(E^\mathrm {true}_{\upgamma _{2}}=12\) MeV. The blue curves are the PDFs fit to the distributions. See text for the formula of the PDFs

Probability density function for \(E_{\upgamma }\)  The probability density function (PDF) for \(E_{\upgamma _{1(2)}}\) is evaluated by means of the MC simulation, which is tuned using the pseudo-2\(\upgamma \) events built from both MC and real data. The PDF is asymmetric with a low-energy tail and is modelled as follows:

$$\begin{aligned} P(E_\upgamma \mid E_\upgamma ^\mathrm {true})&= f\cdot F(E_\upgamma ; E_\upgamma ^\mathrm {true}, E_{t}^\mathrm {narrow}, \sigma _{E_{\upgamma }}^\mathrm {narrow}) \nonumber \\&\qquad + (1-f)\cdot F(E_\upgamma ; E_\upgamma ^\mathrm {true}, E_{t}^\mathrm {wide}, \sigma _{E_{\upgamma }}^\mathrm {wide}), \end{aligned}$$
(5)

where

$$\begin{aligned}&F(E_\upgamma ; E_\upgamma ^\mathrm {true}, E_{t}, \sigma _{E_{\upgamma }}) \nonumber \\&= {\left\{ \begin{array}{ll} A \exp \left( -\frac{\left( E_\upgamma - E_\upgamma ^\mathrm {true}\right) ^2}{2\sigma ^2_{E_\upgamma }} \right) &{} E_\upgamma > E_\upgamma ^\mathrm {true} - E_{t} \\ A \exp \left( \frac{E_{t}}{\sigma ^2_{E_{\upgamma }}} \left( \frac{E_{t}}{2} + (E_\upgamma - E_\upgamma ^\mathrm {true})\right) \right) &{} E_\upgamma \le E_\upgamma ^\mathrm {true} - E_{t} \end{array}\right. } , \end{aligned}$$
(6)

\(E_\upgamma \) is a reconstructed \(\upgamma \) energy, \(E_\upgamma ^\mathrm {true}\) is the true value, f is the fraction of the narrow component, A is a normalisation parameter, \(E_{t}\) is the transition parameter between the Gaussian and exponential components, and \(\sigma _{E_{\upgamma }}\) is the standard deviation of the Gaussian component describing the width on the high-energy side. The parameters \(E_{t}\) and \(\sigma _{E_{\upgamma }}\) are correlated with each other, different for the narrow and wide components, and are dependent on \(E_\upgamma ^\mathrm {true}\). Figure 11 shows an example of the PDFs for \(2\upgamma \) events with \(E^\mathrm {true}_{\upgamma _{1}}=55\) MeV and \(E^\mathrm {true}_{\upgamma _{2}}=12\) MeV.
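The shape defined by Eqs. (5) and (6) can be sketched in a few lines of Python. This is an illustrative sketch, not the analysis code: the function names are hypothetical, the normalisation A is computed analytically under the assumption that F is defined on the whole real axis with \(E_{t}>0\), and any parameter values passed in are placeholders, since the fitted values are not listed here.

```python
import math

def tail_gauss_norm(e_t, sigma):
    """Normalisation A of Eq. (6), assuming support on the whole real axis and e_t > 0."""
    # Integral of the Gaussian branch from (E_true - E_t) to +infinity.
    gauss_part = sigma * math.sqrt(2.0 * math.pi) * 0.5 * (
        1.0 + math.erf(e_t / (sigma * math.sqrt(2.0))))
    # Integral of the exponential branch from -infinity to (E_true - E_t).
    exp_part = sigma * sigma / e_t * math.exp(-e_t * e_t / (2.0 * sigma * sigma))
    return 1.0 / (gauss_part + exp_part)

def tail_gauss_pdf(e, e_true, e_t, sigma):
    """F of Eq. (6): Gaussian core with an exponential low-energy tail."""
    d = e - e_true
    a = tail_gauss_norm(e_t, sigma)
    if d > -e_t:
        return a * math.exp(-d * d / (2.0 * sigma * sigma))
    return a * math.exp(e_t / (sigma * sigma) * (e_t / 2.0 + d))

def energy_pdf(e, e_true, f, e_t_narrow, sigma_narrow, e_t_wide, sigma_wide):
    """Eq. (5): mixture of a narrow and a wide component with narrow fraction f."""
    return (f * tail_gauss_pdf(e, e_true, e_t_narrow, sigma_narrow)
            + (1.0 - f) * tail_gauss_pdf(e, e_true, e_t_wide, sigma_wide))
```

Both branches of F and their first derivatives agree at \(E_\upgamma = E_\upgamma ^\mathrm {true} - E_{t}\), so the modelled response is smooth at the transition.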

Probability density functions for \(\upgamma \) position  The PDFs of \(\upgamma \) position are almost independent of \(E_{\upgamma _{1(2)}}\) and hence \((m_\mathrm {X}, \tau _\mathrm {X})\). They are represented by double Gaussians with fractions of tail components of \(\sim \!20\)%. The standard deviations of the core components are \({\varvec{\sigma }}^\mathrm {core}_{{\varvec{x}}_{\upgamma _{1(2)}}}=(5.4, 4.7, 6.5)\) mm in (u, v, w) coordinates, and those of the tail components are \({\varvec{\sigma }}^\mathrm {tail}_{{\varvec{x}}_{\upgamma _{1(2)}}}=(29, 19, 45)\) mm.

5.2.3 Time

The interaction time of \(\upgamma _1(\upgamma _2)\) can be reconstructed using the pulse time measured by each PMT (\(t_{\mathrm {PMT}, i}\)) by correcting for a delay time (\(t_{\mathrm {delay}, \upgamma _{1(2)},i}\)) including the propagation time of the light between the interaction point and the PMT and the time-walk effect, and a time offset due to the readout electronics (\(t_{\mathrm {offset}, i}\)):

$$\begin{aligned} t_{\upgamma _{1(2)},i}=t_{\mathrm {PMT}, i}-t_{\mathrm {delay}, \upgamma _{1(2)},i}-t_{\mathrm {offset}, i}. \end{aligned}$$
(7)

The single PMT time resolution \(\sigma _{t, i}\) is approximately proportional to \(1/\sqrt{N_{\mathrm {pe}, \upgamma _{1(2)},i}}\) with \(\sigma _{t, i}(N_{\mathrm {pe}, \upgamma _{1(2)},i}=500)\approx 500~\mathrm {ps}\), where \(N_{\mathrm {pe}, \upgamma _{1(2)},i}\) is the number of photo-electrons from \(\upgamma _1(\upgamma _2)\).

These individual PMT measurements are combined to obtain the best estimate of the interaction time of \(\upgamma _1(\upgamma _2)\) (\(t_{\upgamma _{1(2)}}\)). The following \(\chi ^2\) is minimised:

$$\begin{aligned} \chi ^2_{\mathrm {time}}=\sum _i^{n_\mathrm {PMT}^\mathrm {selected}}\frac{\left( t_{\upgamma _{1(2)},i}-t_{\upgamma _{1(2)}}\right) ^2}{\sigma ^2_{t, i}(N_{\mathrm {pe}, \upgamma _{1(2)},i})}. \end{aligned}$$
(8)

We use PMTs whose light yield from \(\upgamma _1(\upgamma _2)\) is 5 times higher than that from \(\upgamma _2(\upgamma _1)\) excluding PMTs whose light yield is less than 100 photons or which give large \(\chi ^2\) contribution in the fitting.
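Because Eq. (8) is quadratic in \(t_{\upgamma _{1(2)}}\), its minimum is the inverse-variance weighted mean of the single-PMT times. The sketch below illustrates this under stated assumptions: the function names are hypothetical, the single-PMT resolution is taken to scale exactly as \(1/\sqrt{N_{\mathrm {pe}}}\) from the quoted reference point (500 ps at 500 photo-electrons), and the PMT selection and outlier rejection described above are assumed to have been applied already.

```python
import math

def sigma_pmt_ps(n_pe, sigma_ref_ps=500.0, n_ref=500.0):
    """Single-PMT time resolution in ps, scaled as 1/sqrt(N_pe) (assumed exact)."""
    return sigma_ref_ps * math.sqrt(n_ref / n_pe)

def combine_pmt_times(times_ps, n_pe):
    """Return (t_hat, sigma_t_hat, chi2) that minimise Eq. (8) for the selected PMTs."""
    weights = [1.0 / sigma_pmt_ps(n) ** 2 for n in n_pe]
    w_sum = sum(weights)
    # Weighted mean: the analytic minimum of the chi-square.
    t_hat = sum(w * t for w, t in zip(weights, times_ps)) / w_sum
    chi2 = sum(w * (t - t_hat) ** 2 for w, t in zip(weights, times_ps))
    # Statistical uncertainty of the weighted mean from the weights alone.
    return t_hat, 1.0 / math.sqrt(w_sum), chi2
```

Here `times_ps` stands for the corrected times \(t_{\upgamma _{1(2)},i}\) of Eq. (7); the returned uncertainty does not include the corrections that lead to Eq. (9).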

The \(E_\upgamma \)-dependent time resolution for single-\(\upgamma \) events is evaluated with the calibration runs and corrected for 2\(\upgamma \) events using the MC:

$$\begin{aligned} \sigma _{t_{\upgamma _{1(2)}}} = \sqrt{338^2/E_{\upgamma _{1(2)}}\mathrm {(MeV)} + 45^2} \ ~\mathrm {(ps)}. \end{aligned}$$
(9)

5.3 Combined reconstruction

In this section, we present the reconstruction method for the \(\mathrm {X}\rightarrow \upgamma \upgamma \) vertex assuming a value for \(m_{\mathrm {X}}\) in the reconstruction. We scan \(m_{\mathrm {X}}\) in 20–45 MeV/c\(^2\) at 1 MeV/c\(^2\) intervals; each assumed mass results in a different reconstructed \(\mathrm {X}\rightarrow \upgamma \upgamma \) vertex position.

5.3.1 X decay vertex

A maximum likelihood fit is used in the reconstruction, with the following observables:

$$\begin{aligned} X = (E_{\upgamma _1}, E_{\upgamma _2}, {\varvec{x}}_{\upgamma _1}, {\varvec{x}}_{\upgamma _2}, {\varvec{x}}_\mathrm {e^+}, \theta _\mathrm {e^+}, \phi _\mathrm {e^+}). \end{aligned}$$
(10)

The fit parameters are the following:

$$\begin{aligned} \varTheta = (\cos \theta _{\mathrm {rest}}, \phi _{\mathrm {rest}}, {\varvec{x}}_{\mathrm {vtx}}), \end{aligned}$$
(11)

where \(\theta _\mathrm {rest}\) is the \(\upgamma \) emission angle in the X rest frame, \(\phi _\mathrm {rest}\) is the angle of the photons in the X rest frame with respect to the X momentum direction in the MEG coordinate system, and \({\varvec{x}}_{\mathrm {vtx}}\) is the X decay vertex position. The function \(L(\varTheta )\) is defined as follows:

$$\begin{aligned} L(\varTheta ) ={}&P(E_{\upgamma _1}\mid \cos \theta _{\mathrm {rest}}, m_\mathrm {X}) \nonumber \\&\times P(E_{\upgamma _2}\mid \cos \theta _{\mathrm {rest}}, m_\mathrm {X}) \nonumber \\&\times P({\varvec{x}}_{\upgamma _1}\mid \cos \theta _{\mathrm {rest}}, \phi _{\mathrm {rest}}, {\varvec{x}}_{\mathrm {vtx}},{\varvec{x}}_\mathrm {e^+}, m_\mathrm {X})\nonumber \\&\times P({\varvec{x}}_{\upgamma _2}\mid \cos \theta _{\mathrm {rest}}, \phi _{\mathrm {rest}}, {\varvec{x}}_{\mathrm {vtx}},{\varvec{x}}_\mathrm {e^+}, m_\mathrm {X})\nonumber \\&\times P(\theta _\mathrm {e^+} \mid {\varvec{x}}_\mathrm {vtx}, {\varvec{x}}_\mathrm {e^+})\nonumber \\&\times P(\phi _\mathrm {e^+} \mid {\varvec{x}}_\mathrm {vtx}, {\varvec{x}}_\mathrm {e^+})\nonumber \\&\times P(l_\mathrm {X} \mid {\varvec{x}}_\mathrm {vtx}, {\varvec{x}}_\mathrm {e^+}, \tau _\mathrm {X}, m_\mathrm {X}), \end{aligned}$$
(12)

where \(l_\mathrm {X}\) is the X decay length. The term \(P({\varvec{x}}_\mathrm {e^+}\mid {\varvec{x}}_\mathrm {e^+}^{\mathrm {true}})\) is omitted by approximating \({\varvec{x}}_\mathrm {e^+}^{\mathrm {true}}\) by \({\varvec{x}}_\mathrm {e^+}\) to reduce the number of fit parameters.

The energy dependence of the \(E_{\upgamma _{1(2)}}\) PDF in Eq. (5) is modelled with a morphing technique [62] using two quasi-monoenergetic calibration lines: the 11.7 MeV line from the nuclear reaction \(^{11}\mathrm {B}(\mathrm {p}, 2\upgamma )^{12}\mathrm {C}\) and the 54.9 MeV line from \(\uppi ^0\) decay.

The PDFs of the \(\upgamma \) position are approximated as double Gaussians to describe the tails of the distributions better.

The positron angles are compared with those of the flipped direction of the X momentum (\(-({\varvec{x}}_{\mathrm {vtx}} - {\varvec{x}}_\mathrm {e^+})\)) with PDFs approximated as single Gaussians.

The decay length is defined as \(l_\mathrm {X} = |{\varvec{x}}_{\mathrm {vtx}} - {\varvec{x}}_\mathrm {e^+}|\) (Footnote 6). Under the approximation \({\varvec{\sigma }}_{{\varvec{x}}_{\mathrm {e^+}}}\rightarrow 0\), the PDF is

$$\begin{aligned} P(l_\mathrm {X} \mid {\varvec{x}}_{\mathrm {e^+}}, {\varvec{x}}_{\mathrm {vtx}}, \tau _\mathrm {X}, m_\mathrm {X}) = \frac{1}{\gamma \beta c \tau _\mathrm {X}} \cdot \exp {\left( -\frac{l_\mathrm {X}}{\gamma \beta c \tau _\mathrm {X}}\right) },\nonumber \\ \end{aligned}$$
(13)

which is defined and normalised for \(l_\mathrm {X}\ge 0\). The approximation is justified because the transverse component of \({\varvec{\sigma }}_{{\varvec{x}}_\mathrm {e^+}}\) is \(\sim \,\)1–2 mm [55] while the longitudinal component is largely driven by the target thickness (\(\sim \,\)0.2 mm); these are to be compared with \(\gamma \beta c \tau _\mathrm {X}\), which ranges from \(\sim \,\)6 to 30 mm.

We fix \(\tau _\mathrm {X}=20\) ps since the vertex reconstruction performance is almost independent of \(\tau _\mathrm {X}\) in the assumed range. This likelihood term effectively penalises non-zero decay lengths using a scale fixed to the mean decay length \(\gamma \beta c \tau _\mathrm {X}\) expected for \(\tau _\mathrm {X}=20\) ps.
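The scale \(\gamma \beta c \tau _\mathrm {X}\) entering Eq. (13) follows from two-body kinematics for a muon decaying at rest, with \(\gamma \beta = p_\mathrm {X}/m_\mathrm {X}\). The sketch below is a minimal illustration; the function names are hypothetical and the particle masses are rounded values inserted here as assumptions.

```python
import math

M_MU = 105.658             # muon mass in MeV/c^2 (rounded, assumed)
M_E = 0.511                # positron mass in MeV/c^2 (rounded, assumed)
C_MM_PER_PS = 0.299792458  # speed of light in mm/ps

def x_momentum(m_x):
    """Momentum (MeV/c) of X in mu -> e X for a muon decaying at rest."""
    m2 = M_MU ** 2
    lam = (m2 - (m_x + M_E) ** 2) * (m2 - (m_x - M_E) ** 2)
    return math.sqrt(lam) / (2.0 * M_MU)

def mean_decay_length_mm(m_x, tau_ps):
    """gamma * beta * c * tau in mm, using gamma * beta = p / m."""
    return x_momentum(m_x) / m_x * C_MM_PER_PS * tau_ps

def decay_length_pdf(l_mm, m_x, tau_ps=20.0):
    """Eq. (13): exponential PDF of the decay length, zero for l < 0."""
    if l_mm < 0.0:
        return 0.0
    scale = mean_decay_length_mm(m_x, tau_ps)
    return math.exp(-l_mm / scale) / scale
```

With these inputs, `mean_decay_length_mm` gives roughly 15 mm for \(m_\mathrm {X}=20\) MeV/c\(^2\) and roughly 6 mm for \(m_\mathrm {X}=45\) MeV/c\(^2\) at \(\tau _\mathrm {X}=20\) ps.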

The \({\varvec{x}}_{\mathrm {vtx}}\) resolution of the maximum-likelihood fit is evaluated via the MC to be \({\varvec{\sigma }}_{{\varvec{x}}_{\mathrm {vtx}}} = (8, 12)\) mm in the transverse and longitudinal directions.

We define an expression to quantify the goodness of the vertex fit as

$$\begin{aligned} \chi ^{2}_\mathrm {vtx}= & {} \sum _{\upgamma =\upgamma _1, \upgamma _2}\left( \frac{E_{\upgamma }-E_{\upgamma }^{\mathrm {best}}}{\sigma _{E_{\upgamma }}}\right) ^{2} + \sum _{\upgamma =\upgamma _1, \upgamma _2}\left( \frac{{\varvec{x}}_{\upgamma } - {\varvec{x}}^{\mathrm {best}}_{\upgamma }}{{\varvec{\sigma }}_{{\varvec{x}}_\upgamma }}\right) ^2 \nonumber \\&+\left( \frac{\theta _\mathrm {X}-\theta _\mathrm {X}^{\mathrm {best}}}{\sigma _{\theta _\mathrm {X}}}\right) ^{2} +\left( \frac{\phi _\mathrm {X}-\phi _\mathrm {X}^{\mathrm {best}}}{\sigma _{\phi _\mathrm {X}}}\right) ^{2} +\left( \frac{l_\mathrm {X}^\mathrm {best}}{\gamma \beta c \tau _\mathrm {X}}\right) ^2. \nonumber \\ \end{aligned}$$
(14)

The variables with the superscript “best” indicate the best-fitted parameters in the maximum likelihood fit and the variables with no superscript indicate the measured ones. Here, \((\theta _\mathrm {X},\phi _\mathrm {X})=(\pi -\theta _\mathrm {e^+}, \pi +\phi _\mathrm {e^+})\) is the direction opposite to \((\theta _\mathrm {e^+}, \phi _\mathrm {e^+})\).

The \(\sigma \) of each variable is the corresponding resolution when the distribution is approximated as a single Gaussian. This expression is not expected to follow a \(\chi ^2\) distribution because the PDFs of the variables are not in general Gaussian. The last term is quadratic by analogy with the other terms and its expression has been found to be effective in separating signal from background. The rationale for using Eq. (14) is to provide a powerful discriminator between signal and background as shown later in Fig. 14f.

Fig. 12

\(\sqrt{\chi ^2_\mathrm {vtx}}\) distribution for the MC signal events at \((m_\mathrm {X},\tau _\mathrm {X}) = (30~\mathrm {MeV/c^2}, 20~\mathrm {ps})\) as a function of \(m_\mathrm {X}\) assumed in the reconstruction

Figure 12 shows the dependence of \(\chi ^2_\mathrm {vtx}\) (Footnote 7) on the assumed value of \(m_\mathrm {X}\) for the MC signal events, providing another rationale for Eq. (14). When the assumed value is the same as the true value (\(m_\mathrm {X}=30\) MeV/c\(^2\) in this case), the resultant \(\chi ^2_\mathrm {vtx}\) becomes minimum on average. The effective \(m_\mathrm {X}\) resolution is \(\sim 2.5\) MeV/c\(^2\).

5.3.2 Momentum

Given the vertex position, the momentum of each \(\upgamma \) can be calculated. The sum of the momenta of the three final-state particles,

$$\begin{aligned} {\varvec{P}}_{\mathrm {sum}}\equiv {\varvec{P}}_{\mathrm {e^+}}+{\varvec{P}}_{\upgamma _1}+{\varvec{P}}_{\upgamma _2}, \end{aligned}$$
(15)

should be 0 for the MEx2G events.
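A minimal sketch of Eq. (15) is given below. It assumes that each photon momentum is its reconstructed energy times the unit vector pointing from the X vertex to the photon interaction point; the function names and the tuple-based vector representation are hypothetical choices for illustration.

```python
import math

def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def photon_momentum(e_gamma, x_gamma, x_vtx):
    """Photon momentum (MeV/c): energy times the direction from the vertex to the hit."""
    direction = unit(tuple(g - v for g, v in zip(x_gamma, x_vtx)))
    return tuple(e_gamma * d for d in direction)

def momentum_sum(p_e, e1, x1, e2, x2, x_vtx):
    """P_sum of Eq. (15) from the positron momentum and the two photons."""
    p1 = photon_momentum(e1, x1, x_vtx)
    p2 = photon_momentum(e2, x2, x_vtx)
    return tuple(a + b + c for a, b, c in zip(p_e, p1, p2))

def magnitude(v):
    """Euclidean norm, used for |P_sum|."""
    return math.sqrt(sum(c * c for c in v))
```

For a perfectly reconstructed signal event, `magnitude(momentum_sum(...))` vanishes; detector resolution smears it to a finite value.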

5.3.3 Relative time

The time difference between the 2\(\upgamma \)s at the X vertex is calculated as

$$\begin{aligned} t_{\upgamma \upgamma } = \left( t_{\upgamma _1}-\frac{l_{\upgamma _1}}{c}\right) - \left( t_{\upgamma _2}-\frac{l_{\upgamma _2}}{c}\right) , \end{aligned}$$
(16)

where \(l_{\upgamma _{1(2)}}\) is the distance between the \(\upgamma _{1(2)}\) interaction point in the LXe photon detector and the X vertex position, \(l_{\upgamma _{1(2)}}=|{\varvec{x}}_{\upgamma _{1(2)}} - {\varvec{x}}_{\mathrm {vtx}}|\). The relative position of the vertices is such that this definition is identical to the signed distance defined according to the X direction and therefore the distribution is centred at 0 for MEx2G events.

The time difference between \(\upgamma _1\) and e\(^+\) at the muon vertex is calculated as

$$\begin{aligned} t_{\upgamma _1\mathrm {e^+}}= \left( t_{\upgamma _1}-\frac{l_{\upgamma _1}}{c}-\frac{l_\mathrm {X}}{\beta c}\right) -t_{\mathrm {e^+}}. \end{aligned}$$
(17)

With the unsigned definition of \(l_\mathrm {X}\) the distribution is slightly offset with respect to 0 for MEx2G events as visible in Fig. 14d.
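Equations (16) and (17) amount to subtracting times of flight from the measured times. The sketch below illustrates this with hypothetical function names, assuming times in ns, positions in cm, and the X velocity \(\beta \) supplied by the caller.

```python
import math

C_CM_PER_NS = 29.9792458  # speed of light in cm/ns

def t_gamma_gamma(t1, x1, t2, x2, x_vtx):
    """Eq. (16): time difference of the two photons at the X vertex."""
    l1 = math.dist(x1, x_vtx)
    l2 = math.dist(x2, x_vtx)
    return (t1 - l1 / C_CM_PER_NS) - (t2 - l2 / C_CM_PER_NS)

def t_gamma1_e(t1, x1, x_vtx, x_e, t_e, beta):
    """Eq. (17): time difference of photon 1 and the positron at the muon vertex."""
    l1 = math.dist(x1, x_vtx)
    l_x = math.dist(x_vtx, x_e)  # unsigned X decay length
    return (t1 - l1 / C_CM_PER_NS - l_x / (beta * C_CM_PER_NS)) - t_e
```

For an ideal signal event, in which X travels from the muon vertex to its decay vertex and both photons are emitted there, both quantities are zero.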

Fig. 13

Top: Signal (MC) and bottom: background (sideband data) event distributions in the \(t_{\upgamma _1\mathrm {e^+}}\)–\(t_{\upgamma \upgamma }\) plane before the signal selection criteria are applied. The (20 MeV/c\(^2\), 20 ps) case is shown as an example. The time sideband regions (A, B, C) and the signal region (the red box) are also shown

6 Dataset and event selection

We use the full MEG dataset, collected in 2009–2013, as was used in the \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)search reported in [9]. As described in Sect. 2, the \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)trigger data are used in this analysis. In total, \(7.5\times 10^{14}\,\upmu ^+\)s were stopped on the target.

A pre-selection was applied at the first stage of the \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)decay analysis, requiring that at least one positron track is reconstructed and the time difference between signals in the LXe photon detector and TC is in the range \(-6.9<t_{\mathrm {LXe}-\mathrm {TC}}<4.4~\mathrm {ns}\). At this stage, aiming to select the \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)decays, the time of the LXe photon detector is reconstructed with PMTs around the largest peak found in the peak search (Sect. 5.2.1). This retained \(\sim \)16% of the dataset, on which the full event reconstruction for the \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)decay analysis was performed. Before processing the MEx2G dedicated reconstruction, we applied an additional event selection using the \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)reconstruction results. It was based on the existence of multiple (\(\ge \!2\)) \(\upgamma \)s and the total energy of the \(\upgamma \)s (Footnote 8) and e\(^+\) (\(E_\mathrm {total}\)) being \(|E_\mathrm {total} - m_\upmu |<0.2m_\upmu \). This selection reduces the dataset by an additional factor of \(\sim \) 300. We applied the MEx2G dedicated reconstruction (Sects. 5.2.2, 5.2.3, and 5.3) to this selected dataset.

A blind region was defined containing the events satisfying the cuts \(|t_{\upgamma _1\mathrm {e^+}}|< 1\,\mathrm {ns} \wedge |t_{\upgamma \upgamma }| < 1\,\mathrm {ns}\). This blind region is large enough to hide the signal. Those events were sent to a separate data stream and were not used in the definition of the analysis strategy, including the cuts; background events in the signal region were estimated without using events in the blind region. After the analysis strategy was defined, the blind region was opened and events in this region were added to perform the last step of the analysis.

The accidental background can be estimated from the off-time sideband regions defined in Fig. 13. There are three such regions, A, B, and C, each containing a different combination of the types of background defined in Sect. 3. The outer boundary of the time sidebands, \(|t_{\upgamma _1\mathrm {e^+}}|< 3.5\,\mathrm {ns} \wedge |t_{\upgamma \upgamma }| <3.5\,\mathrm {ns}\), is determined so that the background distribution is not deformed by the time-coincidence trigger condition. The widths \(x_\mathrm {A}\) and \(y_\mathrm {B}\) in Fig. 13 are the same as the outer boundary of the signal region, which depends on \(m_\mathrm {X}\) through the signal selection criteria described below.

Fig. 14

Distributions of variables used in the event selection for \((m_\mathrm {X}, \tau _\mathrm {X})\) = (20  MeV/c\(^2\), 20  ps) case. The hatched histograms show the distribution of MC signal events while the blank histograms that of background events; each histogram is normalised to 1. The vertical lines show the optimised thresholds. a The peak value of the signal distribution is at \(m_\upmu \) with FWHM\(_{E_\mathrm {sum}}\) = 2.7 MeV. c Cut-off at 20 cm in the background distribution comes from one of the 2\(\upgamma \) reconstruction conditions. e The threshold lines are not visible because they are set to \(\pm 1\) ns. For a detailed definition of the variables see Sect. 6

The following seven variables are used for the signal selection:

  1. \(E_\mathrm {e^+}\): the e\(^+\) energy.
  2. \(E_\mathrm {sum}\): the total energy of the three particles.
  3. \(|{\varvec{P}}_\mathrm {sum}|\): the magnitude of the sum of the three particles’ momenta.
  4. \(d_{uv}\): the distance between the 2\(\upgamma \) positions on the LXe photon detector inner face.
  5. \(t_{\upgamma _1\mathrm {e^+}}\): the time difference between \(\upgamma _1\) and e\(^+\) calculated in Eq. (17).
  6. \(t_{\upgamma \upgamma }\): the time difference between 2\(\upgamma \)s calculated in Eq. (16).
  7. \(\chi ^2_\mathrm {vtx}\): the goodness of vertex fitting calculated in Eq. (14).

First, we fix the \(E_\mathrm {e^+}\) selection to require \(|E_\mathrm {e^+}-E_\mathrm {e^+}^{m_\mathrm {X}}|<1\) MeV, where \(E_\mathrm {e^+}^{m_\mathrm {X}}\) is the e\(^+\) energy for the MEx2G decay with \(m_\mathrm {X}\). This selection is also used in the Michel normalisation described in Sect. 7.
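The reference energy \(E_\mathrm {e^+}^{m_\mathrm {X}}\) follows from two-body kinematics for a muon decaying at rest. The sketch below shows the calculation and the 1 MeV window; the function names are hypothetical and the particle masses are rounded values inserted here as assumptions.

```python
def positron_energy(m_x, m_mu=105.658, m_e=0.511):
    """Positron total energy (MeV) in mu -> e X for a muon at rest (masses in MeV/c^2)."""
    return (m_mu ** 2 + m_e ** 2 - m_x ** 2) / (2.0 * m_mu)

def passes_positron_cut(e_pos, m_x, half_width=1.0):
    """True if the reconstructed positron energy lies within the window around the expected value."""
    return abs(e_pos - positron_energy(m_x)) < half_width
```

With these inputs the expected positron energy is about 50.9 MeV for \(m_\mathrm {X}=20\) MeV/c\(^2\) and about 43.2 MeV for \(m_\mathrm {X}=45\) MeV/c\(^2\).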

Next, we optimise the cut thresholds for the other variables to maximise the search sensitivity. Distributions of these variables for the signal and background at a parameter set (20  MeV/c\(^2\), 20  ps) are shown in Fig. 14. All other selection criteria, such as trigger and reconstruction conditions as well as the \(E_\mathrm {e^+}\) selection, are applied. The time sideband events are used for the background distribution, while MC samples are used for the signal distribution.

Punzi’s expression [63] is used as a figure of merit

$$\begin{aligned} F_{\mathrm {Punzi}} =\frac{\epsilon _{\mathrm {selection}}}{b^{2}+2 a \sqrt{N_{\mathrm {BG}}}+b \sqrt{b^{2}+4 a \sqrt{N_{\mathrm {BG}}}+4 N_{\mathrm {BG}}}}, \nonumber \\ \end{aligned}$$
(18)

where a and b are the significance and the power of a test, respectively, \(\epsilon _{\mathrm {selection}}\) is the selection efficiency for the signal, and \(N_{\mathrm {BG}}\) is the expected number of background events. The values of a and b should be defined before the analysis, and we set \(a = 3, b = 1.28\,(=90\%)\), where b is set to the value appropriate to the confidence level being used to set the upper limit when a non-significant result is obtained.
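Equation (18) is straightforward to evaluate for a set of candidate thresholds. The sketch below is illustrative only: the function names and the candidate tuples are hypothetical, and the signal efficiency and expected background for each threshold are assumed to be supplied by the caller.

```python
import math

def punzi_fom(eff_signal, n_bg, a=3.0, b=1.28):
    """Figure of merit of Eq. (18) for a given signal efficiency and expected background."""
    root_bg = math.sqrt(n_bg)
    denom = (b * b + 2.0 * a * root_bg
             + b * math.sqrt(b * b + 4.0 * a * root_bg + 4.0 * n_bg))
    return eff_signal / denom

def best_threshold(candidates):
    """Pick the threshold with the largest figure of merit.

    candidates: iterable of (threshold, eff_signal, n_bg) tuples.
    """
    return max(candidates, key=lambda c: punzi_fom(c[1], c[2]))[0]
```

In the background-free limit the denominator reduces to \(2b^2\), so the figure of merit is then driven by the signal efficiency alone.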

The optimisation process is divided into two steps. In the first step, we optimise the cut thresholds of variables 2–6 independently for each variable in order to maintain high statistics in the sidebands. Because the absolute value of \(N_{\mathrm {BG}}\) is not meaningful in this independent optimisation process, we approximate \(N_{\mathrm {BG}}\) by \(\epsilon _{\mathrm {BG}}\), the selection efficiency for the background events calculated using the time sideband samples selected up to this point. Because of this approximation, the first step leads to suboptimal criteria.

In the second step, after all other selection criteria are applied, the threshold for \(\chi ^2_\mathrm {vtx}\) is optimised to give the highest \(F_{\mathrm {Punzi}}\). In this step, to estimate \(N_\mathrm {BG}\) from the low statistics in the sideband regions, we use a kernel-density-estimation method [64] to model the continuous event distribution.

The cut thresholds are optimised at 5 \(\mathrm {MeV/c^2}\) intervals in \(m_\mathrm {X}\), while the same thresholds are used for different \(\tau _\mathrm {X}\) for each \(m_\mathrm {X}\). The optimised thresholds for \(m_\mathrm {X}=20~\mathrm {MeV/c^2}\) are shown as black lines in Fig. 14. These cuts result in \(\epsilon _\mathrm {selection} = 67\%\) \((m_\mathrm {X} =45\) MeV/c\(^2\)) – 51% (\(m_\mathrm {X} =20\) MeV/c\(^2\)).

7 Single event sensitivity

The single event sensitivity (SES) of the MEx2G decay, s, is defined as follows:

$$\begin{aligned} \mathcal {B}_{\mathrm {MEx2G}} = s\times N_\mathrm {MEx2G}, \end{aligned}$$
(19)

where \(N_\mathrm {MEx2G}\) is the expected number of signal events in the signal region. We evaluate s using Michel decay (\(\upmu ^+ \rightarrow \mathrm {e}^+ \upnu \bar{\upnu }\)) events taken simultaneously with the \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)trigger. This Michel normalisation is beneficial for the following reasons. First, systematic uncertainties coming from the muon beam cancel because beam instability affects both the Michel triggered and the \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)triggered events. Moreover, we need to know neither the \(\upmu ^+\) stopping rate nor the live DAQ time. Second, most of the systematic uncertainties coming from e\(^+\) detection also cancel. The absolute value of the e\(^+\) efficiency is not needed.

The number of Michel events is given by

$$\begin{aligned} N_\mathrm {Michel}= N_{\upmu ^+} \cdot \frac{\mathcal {B}_{\mathrm {Michel}} \cdot f_{\mathrm {Michel}}}{p_{\mathrm {Michel}}\cdot p_{\mathrm {correction}}} \cdot A_{\mathrm {Michel}} \cdot \epsilon _{\mathrm {Michel}}, \nonumber \\ \end{aligned}$$
(20)

where

\(N_{\upmu ^+}\): the number of stopped \(\upmu ^+\)s;

\(\mathcal {B}_{\mathrm {Michel}}\): branching ratio of the Michel decay (\(\approx 1\));

\(f_{\mathrm {Michel}}\): branching fraction of the selected energy region (7%–10% depending on \(m_{\mathrm {X}}\));

\(p_{\mathrm {Michel}}\): prescaling factor of the Michel trigger (\(=10^7\));

\(p_{\mathrm {correction}}\): correction factor of \(p_{\mathrm {Michel}}\) depending on the muon beam intensity;

\(A_{\mathrm {Michel}}\): geometrical acceptance of the spectrometer for Michel e\(^+\)s;

\(\epsilon _{\mathrm {Michel}}\): e\(^+\) efficiency for Michel events within the geometrical acceptance of the spectrometer.

The number of MEx2G events is given by

$$\begin{aligned} N_\mathrm {MEx2G}= & {} N_{\upmu ^+} \cdot \frac{\mathcal {B}_{\mathrm {MEx2G}} }{p_{\mathrm {MEG}}} \cdot A_{\mathrm {e^+}} \cdot \epsilon _{\mathrm {e^+}} \cdot \epsilon _{2\upgamma } \cdot \epsilon _\mathrm {DM} \cdot \epsilon _\mathrm {selection}, \nonumber \\ \end{aligned}$$
(21)

where

\(p_{\mathrm {MEG}}\): prescaling factor of the \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)trigger (=1);

\(A_{\mathrm {e^+}}\): geometrical acceptance of the spectrometer for MEx2G e\(^+\)s;

\(\epsilon _{\mathrm {e^+}}\): e\(^+\) efficiency for MEx2G events conditional to e\(^+\)s in the geometrical acceptance of the spectrometer;

\(\epsilon _{2\upgamma }\): the product of 2\(\upgamma \) geometrical acceptance and 2\(\upgamma \) trigger, detection, and reconstruction efficiency, conditional to the e\(^+\) detection;

\(\epsilon _\mathrm {DM}\): the trigger direction match efficiency conditional to the e\(^+\) and \(2\upgamma \) detection (Fig. 6);

\(\epsilon _\mathrm {selection}\): the signal selection efficiency.

Using Eqs. (19)–(21), an estimate of the SES (\(s_0\)) is given by

$$\begin{aligned} s_0^{-1} ={}&N_\mathrm {Michel} \cdot \frac{1}{\mathcal {B}_{\mathrm {Michel}}\cdot f_{\mathrm {Michel}}} \cdot \frac{p_{\mathrm {Michel}}\cdot p_{\mathrm {correction}}}{p_{\mathrm {MEG}}}\nonumber \\&\cdot \frac{A_{\mathrm {e^+}}}{A_{\mathrm {Michel}}} \cdot \frac{\epsilon _{\mathrm {e^+}}}{\epsilon _{\mathrm {Michel}}} \cdot \epsilon _{2\upgamma } \cdot \epsilon _\mathrm {DM} \cdot \epsilon _\mathrm {selection}. \end{aligned}$$
(22)

The geometrical acceptance of the spectrometer is common, hence \(A_{\mathrm {e^+}}/A_{\mathrm {Michel}}= 1\); the estimate of the relative e\(^+\) efficiency is \(\epsilon _{\mathrm {e^+}}/\epsilon _{\mathrm {Michel}}=89\%\ (m_\mathrm {X} =45\) MeV/c\(^2)\) – \(97\%\ (m_\mathrm {X} =20\) MeV/c\(^2)\), decreasing monotonically with \(m_\mathrm {X}\). The estimate of \(\epsilon _{2\upgamma }\) is shown in Fig. 15; \(\epsilon _{2\upgamma } = 0.6\%\) (\(m_\mathrm {X} =45\) MeV/c\(^2\)) – 2.9% (\(m_\mathrm {X} =20\) MeV/c\(^2\)), decreasing monotonically with \(m_{\mathrm {X}}\). This dependence comes mainly from the 2\(\upgamma \) acceptance: for increasing \(m_{\mathrm {X}}\), the opening angle between the 2\(\upgamma \)s becomes larger, resulting in a decreasing efficiency.
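The Michel normalisation of Eq. (22) can be written as a single product of the measured Michel count and the relative factors. The sketch below is illustrative: the function and argument names are hypothetical, and all numerical inputs are to be supplied by the caller.

```python
def inverse_ses(n_michel, f_michel, p_michel, p_correction,
                eff_ratio_e, eff_2gamma, eff_dm, eff_selection,
                b_michel=1.0, p_meg=1.0, acc_ratio=1.0):
    """Return 1/s_0 following Eq. (22).

    eff_ratio_e is the relative positron efficiency (MEx2G over Michel) and
    acc_ratio the ratio of geometrical acceptances, which is 1 in the text.
    """
    return (n_michel / (b_michel * f_michel)
            * (p_michel * p_correction / p_meg)
            * acc_ratio * eff_ratio_e
            * eff_2gamma * eff_dm * eff_selection)
```

The number of stopped muons and the absolute positron efficiency do not appear as arguments, which reflects the cancellation that motivates the Michel normalisation.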

Fig. 15

\(\epsilon _{2\upgamma }\) (see the text for the definition) versus \(m_\mathrm {X}\) for \(\tau _\mathrm {X}= 20\) ps

The systematic uncertainties are summarised in Table 1. The uncertainty in the 2\(\upgamma \) detection efficiency and that in the MC smearing parameters are the dominant components.

Table 1 Systematic uncertainties in the single event sensitivity (\(\tau _\mathrm {X}=20\) ps)

The estimated value of SES is \(s_0 = (2.9 \pm 0.3)\times 10^{-12}\) (20 MeV/c\(^2\)) – \((6.3 \pm 1.1)\times 10^{-10}\) (45 MeV/c\(^2\)) for \(\tau _\mathrm {X} = 20\) ps increasing monotonically with \(m_\mathrm {X}\). The e\(^+\) efficiency is \(\epsilon _{\mathrm {e^+}} = 1\%\) (45 MeV/c\(^2\)) – 36% \(\left( 20~\mathrm{MeV/c}^2\right) \) decreasing monotonically with \(m_{\mathrm {X}}\), estimated with the MC, although this quantity is not necessary for the normalisation. The overall efficiency for the MEx2G events conditional to the e\(^+\) in the geometrical acceptance of the spectrometer is therefore \(\epsilon _{\mathrm {MEx2G}} = 2.0\times 10^{-5}\) (45 MeV/c\(^2\)) – \(4.7\times 10^{-3}\) (20 MeV/c\(^2\)) decreasing monotonically with \(m_{\mathrm {X}}\).

8 Statistical treatment of background and signal

Fig. 16

Time distributions in the sideband regions for (20 MeV/c\(^2\), 20 ps). (a) \(t_{\upgamma _1\mathrm {e^+}}\) distributions for \(|t_{\upgamma \upgamma }|<1\) ns (red open circles) and for \(1<|t_{\upgamma \upgamma }|<3.5\) ns (black closed circles). (b) \(t_{\upgamma \upgamma }\) distributions for \(|t_{\upgamma _1\mathrm {e^+}}|<1\) ns (red open circles) and for \(1<|t_{\upgamma _1\mathrm {e^+}}|<3.5\) ns scaled by the ratio of the time ranges (black closed circles). A loose cut is applied: \(|E_\mathrm {e^+}-E_\mathrm {e^+}^{m_\mathrm {X}}|<1~\mathrm {MeV} \wedge E_\mathrm {sum}<115~\mathrm {MeV} \wedge |{\varvec{P}}_\mathrm {sum}|<30~\mathrm {MeV/c} \wedge d_{uv}<90~\mathrm {cm} \wedge \chi ^2_\mathrm {vtx}<80\)

In the following, we describe how we estimate the expected number of background events in the signal region (\(N_{\mathrm {BG}}\)) from the numbers of events observed in sidebands A, B, and C (\(N_\mathrm {A}^\mathrm {obs}\), \(N_\mathrm {B}^\mathrm {obs}\), and \(N_\mathrm {C}^\mathrm {obs}\)).

There are three types of accidental background events defined in Sect. 3. The expected number of background events in the signal region is given by

$$\begin{aligned} N_{\mathrm {BG}} = N_1 + N_2 + N_3, \end{aligned}$$
(23)

where \(N_1, N_2, N_3\) are the expected numbers of background events in the signal region from the types 1, 2, and 3, respectively. Sideband A has the contributions from types 2 and 3, B has the contributions from types 1 and 3, and C has the contribution from type 3.

Figure 16 shows the time distributions in the sideband regions. A peak of type 2 on a flat component of type 3 is observed in the \(t_{\upgamma \upgamma }\) distribution, while a peak of type 1 is not clearly visible in the \(t_{\upgamma _1\mathrm {e^+}}\) distribution. The uniformity of the accidental backgrounds is examined using these distributions; the number of events in (\(|t_{\upgamma _1\mathrm {e^+}}|<1~\mathrm {ns} \wedge 1<|t_{\upgamma \upgamma }|<3.5 ~\mathrm {ns}\)) is compared to the number of events interpolated from the region (\(1<|t_{\upgamma _1\mathrm {e^+}}|<3.5~\mathrm {ns} \wedge 1<|t_{\upgamma \upgamma }|<3.5~\mathrm {ns}\)) scaled by the ratio of the widths of the time ranges (2 ns/5 ns). They agree within 1.7% (the central part, including type 1, is 1.7% larger than the interpolation). In Fig. 16b, the \(t_{\upgamma \upgamma }\) distribution for \(1<|t_{\upgamma _1\mathrm {e^+}}|<3.5\) ns is superimposed on that for \(|t_{\upgamma _1\mathrm {e^+}}|<1\) ns after scaling by the time range ratio. The tail component of type 2 is consistent in these regions. The errors on the background estimations by the interpolation are thus negligibly small compared with the statistical uncertainties in \(N_\mathrm {A}^\mathrm {obs}, N_\mathrm {B}^\mathrm {obs}, N_\mathrm {C}^\mathrm {obs}\).

Using \(N_1, N_2, N_3\), the expected numbers of events in sidebands A, B, C can be calculated as follows:

$$\begin{aligned} N_\mathrm {A}^\mathrm {exp}= & {} N_2 \times \frac{2y_{\mathrm {C}}}{y_{\mathrm {B}}} + N_3 \times \frac{2y_{\mathrm {C}}}{y_{\mathrm {B}}}, \end{aligned}$$
(24)
$$\begin{aligned} N_\mathrm {B}^\mathrm {exp}= & {} N_1 \times \frac{2x_{\mathrm {C}}}{x_{\mathrm {A}}} + N_3 \times \frac{2x_{\mathrm {C}}}{x_{\mathrm {A}}} + N_2 \times f_{\mathrm {escape}}, \end{aligned}$$
(25)
$$\begin{aligned} N_\mathrm {C}^\mathrm {exp}= & {} N_3 \times \frac{2y_{\mathrm {C}}}{y_{\mathrm {B}}} \times \frac{2x_{\mathrm {C}}}{x_{\mathrm {A}}} + N_2 \times f_{\mathrm {escape}}\times \frac{2y_{\mathrm {C}}}{y_{\mathrm {B}}}, \end{aligned}$$
(26)

where \(x_\mathrm {A(C)}\) and \(y_\mathrm {B(C)}\) are the sizes of the signal regions (sideband regions) in \(t_{\upgamma \upgamma }\) and \(t_{\upgamma _1\mathrm {e^+}}\), respectively, as defined in Fig. 13, and \(f_{\mathrm {escape}}=0.171 \pm 0.003\) is the fraction of type 2 events in \(|t_{\upgamma \upgamma }| > 1\,\mathrm {ns}\).
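Equations (24)–(26) map the three background components in the signal region onto the expected sideband counts. The sketch below transcribes them directly; the function name is hypothetical, and the region sizes are left as arguments because their values depend on \(m_\mathrm {X}\) and are defined in Fig. 13.

```python
def sideband_expectations(n1, n2, n3, x_a, x_c, y_b, y_c, f_escape=0.171):
    """Expected counts (N_A, N_B, N_C) in the sidebands from Eqs. (24)-(26)."""
    rx = 2.0 * x_c / x_a  # sideband-to-signal size ratio in t_gamma_gamma
    ry = 2.0 * y_c / y_b  # sideband-to-signal size ratio in t_gamma1_e
    n_a = n2 * ry + n3 * ry
    n_b = n1 * rx + n3 * rx + n2 * f_escape
    n_c = n3 * ry * rx + n2 * f_escape * ry
    return n_a, n_b, n_c
```

The type 2 component enters sidebands B and C only through the escape fraction, because it peaks in \(t_{\upgamma \upgamma }\).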

The likelihood function for \(N_{\mathrm {BG}}\) is given from the Poisson statistics as,

$$\begin{aligned}&\mathcal {L}\left( N_\mathrm {BG} \mid N_\mathrm {A}^\mathrm {obs}, N_\mathrm {B}^\mathrm {obs}, N_\mathrm {C}^\mathrm {obs}\right) \nonumber \\&\quad = P_\mathrm {Poi}\left( N_\mathrm {A}^\mathrm {obs} \mid N_\mathrm {A}^\mathrm {exp}\right) P_\mathrm {Poi}\left( N_\mathrm {B}^\mathrm {obs} \mid N_\mathrm {B}^\mathrm {exp}\right) P_\mathrm {Poi}\left( N_\mathrm {C}^\mathrm {obs} \mid N_\mathrm {C}^\mathrm {exp}\right) . \end{aligned}$$
(27)

The best estimate of \(N_\mathrm {BG}\) can be obtained by maximising Eq. (27) (listed in Table 2). However, we do not use this estimated \(N_\mathrm {BG}\) in the inference of the signal but use \(\left( N_\mathrm {A}^\mathrm {obs}, N_\mathrm {B}^\mathrm {obs}, N_\mathrm {C}^\mathrm {obs}\right) \) as discussed in the following.

Our goal is to estimate the branching ratio of the MEx2G decay (\(\mathcal {B}_{\mathrm {MEx2G}}\)). The likelihood function Eq. (27) is extended to include \(\mathcal {B}_{\mathrm {MEx2G}}\) as a parameter and the number of events in the signal region (\(N_\mathrm {S}^\mathrm {obs}\)) as an observable. In addition, to incorporate the uncertainty in the SES into the \(\mathcal {B}_{\mathrm {MEx2G}}\) estimation, the estimated SES (\(s_0\)) and the true value (s) are included into the likelihood function:

$$\begin{aligned} \mathcal {L}\left( \mathcal {B}_{\mathrm {MEx2G}}, N_\mathrm {BG}, s \mid N_\mathrm {S}^\mathrm {obs}, N_\mathrm {A}^\mathrm {obs}, N_\mathrm {B}^\mathrm {obs}, N_\mathrm {C}^\mathrm {obs}, s_0 \right) . \end{aligned}$$
(28)

Using \(N_1, N_2, N_3\) and a Gaussian PDF for the inverse of SES, it can be written as,

$$\begin{aligned}&\mathcal {L}\left( \mathrm {\mathcal {B}_{\mathrm {MEx2G}}}, N_1, N_2, N_3, s \mid N_\mathrm {S}^\mathrm {obs}, N_\mathrm {A}^\mathrm {obs}, N_\mathrm {B}^\mathrm {obs}, N_\mathrm {C}^\mathrm {obs}, s_0 \right) \nonumber \\&\quad = P_\mathrm {Poi}\left( N_\mathrm {S}^\mathrm {obs} \mid N^{\mathrm {exp}}_\mathrm {S}\right) P_\mathrm {Poi}\left( N_\mathrm {A}^\mathrm {obs} \mid N_\mathrm {A}^\mathrm {exp}\right) P_\mathrm {Poi}\left( N_\mathrm {B}^\mathrm {obs} \mid N_\mathrm {B}^\mathrm {exp}\right) \nonumber \\&\qquad \times P_\mathrm {Poi}\left( N_\mathrm {C}^\mathrm {obs} \mid N_\mathrm {C}^\mathrm {exp}\right) P_\mathrm {Gaus}(s_0^{-1} \mid s^{-1}), \end{aligned}$$
(29)

where \(N^{\mathrm {exp}}_\mathrm {S}=N_1+N_2+N_3+\mathcal {B}_{\mathrm {MEx2G}}/s\) is the expected number of events in the signal region.
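
For reference, the factors in Eq. (29) can be written out explicitly, assuming the standard Poisson and Gaussian forms. The symbol \(\sigma _{1/s}\), denoting the uncertainty on the inverse SES, is introduced here only for illustration:

$$\begin{aligned} P_\mathrm {Poi}\left( n \mid \nu \right)&= \frac{\nu ^{n} e^{-\nu }}{n!}, \nonumber \\ P_\mathrm {Gaus}\left( s_0^{-1} \mid s^{-1}\right)&= \frac{1}{\sqrt{2\pi }\,\sigma _{1/s}} \exp \left[ -\frac{\left( s_0^{-1}-s^{-1}\right) ^2}{2\sigma _{1/s}^{2}}\right] . \end{aligned}$$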

The best estimated values of the parameter set \(\{\mathcal {B}_{\mathrm {MEx2G}}\), \(N_1\), \(N_2\), \(N_3\), \(s\}\) are obtained by maximising Eq. (29). Among them, only \(\mathcal {B}_{\mathrm {MEx2G}}\) is the interesting parameter, while the others are regarded as nuisance parameters \({\varvec{\nu }}=(N_1, N_2, N_3, s)\).

A frequentist test of the null (background-only) hypothesis is performed with the following profile likelihood ratio \(\lambda _p\) as the test statistic [3]:

$$\begin{aligned} \lambda _p(\mathcal {B}_{\mathrm {MEx2G}})=\frac{\mathcal {L}\left( \mathcal {B}_{\mathrm {MEx2G}}, \hat{\hat{{\varvec{\nu }}}}\right) }{\mathcal {L}\left( \hat{\mathcal {B}}_{\mathrm {MEx2G}}, \hat{{\varvec{\nu }}}\right) }, \end{aligned}$$
(30)

where \(\hat{\mathcal {B}}_{\mathrm {MEx2G}}\) and \(\hat{{\varvec{\nu }}}\) are the best-estimated values, and \(\hat{\hat{{\varvec{\nu }}}}\) is the value of \({\varvec{\nu }}\) that maximises the likelihood at the fixed \(\mathcal {B}_{\mathrm {MEx2G}}\). The systematic uncertainties of the background estimation and the SES are incorporated into the test by profiling the likelihood over \({\varvec{\nu }}\). The local significance (footnote 9) is quantified by the p-value \(p_\mathrm {local}\), defined as the probability, when the signal does not exist, of finding a value of \(\lambda _p\) that is equally or less compatible with the null hypothesis than that observed in the data.
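
The profiling step can be illustrated with a deliberately simplified toy model. The sketch below is not the likelihood of Eq. (29): it assumes a single background component, one sideband and a fixed SES, so that the only nuisance parameter is the background mean `b`. The names `n_sig`, `n_sb` and `tau` (the sideband-to-signal-region scale factor), as well as the numbers in the usage lines, are hypothetical. The signal mean `mu` plays the role of \(\mathcal {B}_{\mathrm {MEx2G}}/s\) and is restricted to non-negative values.

```python
import math


def xlogy(x, y):
    """Return x*ln(y) with the convention 0*ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def log_likelihood(mu, b, n_sig, n_sb, tau):
    """Poisson log-likelihood (constant factorial terms dropped).

    Signal region: n_sig ~ Poisson(mu + b); sideband: n_sb ~ Poisson(tau * b).
    """
    return (xlogy(n_sig, mu + b) - (mu + b)
            + xlogy(n_sb, tau * b) - tau * b)


def profiled_background(mu, n_sig, n_sb, tau):
    """Background mean that maximises the likelihood at fixed mu >= 0.

    It is the non-negative root of
    (1 + tau) b^2 + [(1 + tau) mu - n_sig - n_sb] b - n_sb mu = 0.
    """
    k = 1.0 + tau
    c = n_sig + n_sb - k * mu
    return (c + math.sqrt(c * c + 4.0 * k * n_sb * mu)) / (2.0 * k)


def profile_likelihood_ratio(mu, n_sig, n_sb, tau):
    """lambda_p(mu) = L(mu, b_hat_hat) / L(mu_hat, b_hat) for the toy model."""
    mu_hat = max(0.0, n_sig - n_sb / tau)            # global best fit, mu >= 0
    b_hat = profiled_background(mu_hat, n_sig, n_sb, tau)
    b_hat_hat = profiled_background(mu, n_sig, n_sb, tau)
    log_lambda = (log_likelihood(mu, b_hat_hat, n_sig, n_sb, tau)
                  - log_likelihood(mu_hat, b_hat, n_sig, n_sb, tau))
    return math.exp(log_lambda)


# Hypothetical counts: 5 events in the signal region, 4 in a sideband
# that is twice as large as the signal region (tau = 2).
lambda_null = profile_likelihood_ratio(0.0, n_sig=5, n_sb=4, tau=2.0)
q_null = -2.0 * math.log(lambda_null)   # test statistic for the null hypothesis
```

In this toy model the conditional maximisation has a closed form; with several nuisance parameters, as in Eq. (29), a numerical minimiser would normally be needed instead.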

Since \(m_\mathrm {X}\) is unknown, we need to take the look-elsewhere effect [3] into account to calculate the global significance. We estimate this effect following the approaches in [65, 66], in which the trial factor of the search is estimated using the asymptotic property that \(-2\ln \lambda _p\) follows a chi-square distribution. The smallest \(p_\mathrm {local}\) in the \(m_\mathrm {X}\) scan is converted into the global p-value \(p_\mathrm {global}\) assuming that the signal can appear at only one \(m_\mathrm {X}\).

The range of \(\mathcal {B}_{\mathrm {MEx2G}}\) at 90% C.L. is constructed based on the Feldman–Cousins unified approach [67] extended to use the profile-likelihood ratio as the ordering statistic in order to incorporate the systematic uncertainties [68].
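
To illustrate the ordering principle, the sketch below implements the original Feldman–Cousins construction for a Poisson counting experiment with a known background mean. This is a simpler technique than the one used in this analysis, which orders by the profile-likelihood ratio to include systematic uncertainties. The function names, the grid step and the scan range are choices made for this example only.

```python
import math


def poisson_pmf(n, mean):
    """Poisson probability of observing n events for the given mean."""
    if mean == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def acceptance_region(mu, b, cl=0.90, n_max=80):
    """Counts n accepted at signal mean mu, ranked by likelihood ratio."""
    ranked = []
    for n in range(n_max + 1):
        p = poisson_pmf(n, mu + b)
        mu_best = max(0.0, n - b)          # best fit within the physical region
        p_best = poisson_pmf(n, mu_best + b)
        ranked.append((p / p_best, n, p))
    ranked.sort(reverse=True)              # highest ratio first
    accepted, total = set(), 0.0
    for _, n, p in ranked:
        accepted.add(n)
        total += p
        if total >= cl:                    # stop once the coverage is reached
            break
    return accepted


def fc_interval(n_obs, b, cl=0.90, mu_max=15.0, step=0.005):
    """Scan mu on a grid and return the lowest and highest accepted values."""
    accepted_mu = []
    for i in range(int(round(mu_max / step)) + 1):
        mu = i * step
        if n_obs in acceptance_region(mu, b, cl):
            accepted_mu.append(mu)
    return accepted_mu[0], accepted_mu[-1]


# With no background, zero observed events give a one-sided interval,
# while three observed events give a two-sided one.
interval_n0 = fc_interval(0, 0.0)
interval_n3 = fc_interval(3, 0.0)
```

The lower edge of `interval_n0` is zero, whereas that of `interval_n3` is positive: the switch between an upper limit and a two-sided interval follows from the data alone, without a separate decision by the analyst.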

Table 2 The number of observed events in the sideband regions and the signal region and the expected number of background events in the signal region

9 Results and discussion

Table 2 summarises the numbers of events in the signal region and the sidebands as well as the expected number of background events in the signal region. We observe non-zero events in the signal region for some masses. Note that adjacent \(m_\mathrm {X}\) bins are not statistically independent. Summing the observed events over all bins gives nine, but only five of them are unique: one event appears in four bins (\(m_\mathrm {X}=\) 34, 35, 36, 37 MeV/c\(^2\)) and another event appears in two bins (\(m_\mathrm {X}=\) 35, 36 MeV/c\(^2\)), so four of the nine entries are repeats.

We discuss the results for \(\tau _\mathrm {X}=20\) ps below. The results for other \(\tau _\mathrm {X}\) are similar, with small changes in the efficiency. The results are presented in detail in Appendix A.

Figure 17 shows the 90% confidence intervals on \(\mathcal {B}_\mathrm {MEx2G}\) obtained from this analysis together with the sensitivities and the previous upper limits from the Crystal Box experiment. The sensitivities are evaluated as the mean of the branching ratio limits at 90% C.L. under the null hypothesis. Note that since we adopt the Feldman–Cousins unified approach, whether the interval is one-sided or two-sided is determined automatically by the data. Therefore, lower limits can be set in \(m_\mathrm {X}\) regions where non-zero events are observed with small \(N_\mathrm {BG}\).

Fig. 17 Confidence intervals (90% C.L.) on \(\mathcal {B}_\mathrm {MEx2G}\) (blue band) for \(\tau _\mathrm {X}=20\) ps. The red broken line shows the expected upper limits under the null hypothesis and the yellow line shows the limits extracted by the Crystal Box analysis

Fig. 18 Local p-value under the null hypothesis as a function of the assumed \(m_\mathrm {X}\)

The statistical significance of the excesses is tested against the null hypothesis. Figure 18 shows \(p_\mathrm {local}\) versus \(m_\mathrm {X}\). We observe the lowest \(p_\mathrm {local} = 0.012\) at \(m_\mathrm {X}=35\) MeV/c\(^2\), which corresponds to a significance of 2.2\(\sigma \). Taking the look-elsewhere effect into account, the global p-value is calculated to be \(p_\mathrm {global} \approx 0.10\). This corresponds to 1.3\(\sigma \), which is not statistically significant.
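
The relation between the quoted p-values and significances can be checked with a short sketch. It assumes the one-sided Gaussian convention, which the text does not state explicitly, and uses the rounded p-values quoted above, so the last digit may differ from the quoted significances. The effective trial factor is taken here simply as the ratio of the global to the local p-value.

```python
from statistics import NormalDist


def p_to_sigma(p_value):
    """One-sided Gaussian significance Z such that P(x > Z) = p_value."""
    return NormalDist().inv_cdf(1.0 - p_value)


# Rounded values quoted in the text
p_local = 0.012
p_global = 0.10

z_local = p_to_sigma(p_local)        # close to the quoted 2.2 sigma
z_global = p_to_sigma(p_global)      # close to the quoted 1.3 sigma
trial_factor = p_global / p_local    # effective number of independent trials
```

Under these assumptions the effective trial factor is of order ten, well below the number of scanned \(m_\mathrm {X}\) bins would naively suggest if they were independent, as expected given that adjacent bins share events.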

Owing to the large statistics of the MEG dataset, the branching ratio upper limits have been reduced to the level of \(\mathcal {O}(10^{-11})\). Our results improve the upper limits from the Crystal Box experiment for \(m_{\mathrm {X}}< 40\,\mathrm {MeV/c^2}\) by up to a factor of 60.

This publication reports results from the full MEG dataset. Hence, new experiments will be needed to explore this decay further, e.g. to test whether the small excess observed in this search grows. An upgraded experiment, MEG II, is currently being prepared [69], and the prospects for improved sensitivity to MEx2G in MEG II are briefly discussed below. In this analysis the sensitivity worsens with increasing \(m_{\mathrm {X}}\), mainly due to the 2\(\upgamma \) acceptance and the direction match efficiency. The acceptance is determined by the geometry of the LXe photon detector and is not changed by the upgrade. The direction match efficiency can even worsen if only the \(\upmu ^+ \rightarrow \mathrm {e}^+ \upgamma \ \)search is considered: the \(\upgamma \) position resolution is expected to improve by a factor of two, which enables a tighter direction match trigger condition. However, the MEG II trigger development is under way, and the trigger efficiency at high mass can be improved by up to a factor of \(\sim 2\) if a dedicated trigger is prepared. Overall, MEG II will collect ten times more \(\upmu ^+\) decays and the resolution of each kinematic variable will improve by roughly a factor of two, leading to higher efficiency while maintaining low background. It is therefore possible to improve the sensitivity by one order of magnitude.

10 Conclusions

We have searched for a lepton-flavour-violating muon decay mediated by a new light particle, \(\upmu ^+ \rightarrow \mathrm {e}^+\mathrm {X}, \mathrm {X} \rightarrow \upgamma \upgamma \ \)decay, for the first time using the full dataset (2009–2013) of the MEG experiment. No significant excess was found in the mass range \(m_\mathrm {X} = 20\)–45 MeV/c\(^2\) and \(\tau _\mathrm {X}< 40\) ps, and we set new branching ratio upper limits in the mass range \(m_\mathrm {X} = 20\)–40 MeV/c\(^2\). In particular, the upper limits are lowered to the level of \(\mathcal {O}(10^{-11})\) for \(m_\mathrm {X} = 20\)–30 MeV/c\(^2\). The result is up to 60 times more stringent than the bound converted from the previous experiment, Crystal Box.