Reinterpreting the ATLAS bounds on heavy neutral leptons in a realistic neutrino oscillation model

Heavy neutral leptons (HNLs) are hypothetical particles, motivated in the first place by their ability to explain neutrino oscillations. Experimental searches for HNLs are typically conducted under the assumption of a single HNL mixing with a single neutrino flavor. However, the resulting exclusion limits may not directly constrain the corresponding mixing angles in realistic HNL models — those which can explain neutrino oscillations. The reinterpretation of the results of these experimental searches turns out to be a non-trivial task, that requires significant knowledge of the details of the experiment. In this work, we perform a reinterpretation of the latest ATLAS search for HNLs decaying promptly to a tri-lepton final state. We show that in a realistic model with two HNLs, the actual limits can vary by several orders of magnitude depending on the free parameters of the model. Marginalizing over the unknown model parameters leads to an exclusion limit on the total mixing angle which can be up to 3 orders of magnitude weaker than the limits reported in ref. [1]. This demonstrates that the reinterpretation of results from experimental searches is a necessary step to obtain meaningful limits on realistic models. We detail a few steps that can be taken by experimental collaborations in order to simplify the reuse of their results.


Heavy neutral leptons
The idea that new particles need not be heavier than the electroweak scale, but rather can be light and feebly interacting is drawing increasing attention from both the theoretical and experimental communities [see e.g.[2][3][4][5].In particular, the hypothesis that heavy neutral leptons are responsible for (some of the) beyond-the-Standard-Model phenomena has been actively explored in recent years, see e.g.[2,3,[6][7][8][9][10] and refs.therein.Heavy neutral leptons (HNLs) are massive particles that interact similarly to neutrinos, but with their interaction strength suppressed by flavor-dependent dimensionless numbersmixing angles -(U 2 e , U 2 µ , U 2 τ ).HNLs first appeared in the context of left-right symmetric models [11][12][13][14] which required an extension of the fermion sector with Standard Model (SM) gauge singlet particles, and then in the (type I) see-saw mechanism [15][16][17][18][19][20][21][22] in which heavy Majorana neutrinos lead to light Standard Model neutrinos.The interest for these models increased when it was recognized that the same particles could also be responsible for the generation of the matter-antimatter asymmetry of the Universe [23].This scenario (known as leptogenesis) has been actively developed since the 1980s (see reviews [24,25]).In particular, it was found that the Majorana mass scale of right-handed neutrinos could be as low as the TeV, GeV or even MeV scale [7,[26][27][28][29][30]; for a recent overview see e.g.[31,32].While two HNLs are sufficient to explain neutrino masses and oscillations as well as the origin of the matter-antimatter asymmetry, a third particle can play the role of dark matter [6-8, 33, 34] within the Neutrino Minimal Standard Model (νMSM).
Starting from the 1980s [35][36][37][38], many experiments have searched for HNLs (as summarized e.g. in refs.[3,4,10,[39][40][41]).Current generation particle physics experiments, including LHCb, CMS, ATLAS, T2K, Belle and NA62, all include HNL searches into their scientific programs [1,[42][43][44][45][46][47][48][49][50][51][52][53].However, as pointed out in ref. [54], most of the existing or proposed analyses concentrate on the case of a single HNL mixing with only one flavor.Such a model serves as a convenient benchmark, but it cannot explain any of the BSM phenomena that served as initial motivations for postulating HNLs.The same benchmarks are used when estimating the sensitivity of future experiments [see e.g.4], with the notable exception of the SHiP experiment, which provided sensitivity estimates for arbitrary sets of mixing angles [55].This raises a few questions: 1. Which HNL models explaining neutrino oscillations and/or other BSM phenomena are allowed or ruled out by previous searches?What parts of the HNL parameter space will be probed by future experiments?2. What information do experimental groups need to provide in order to facilitate the answer to such questions in the future?
A number of tools exists, see e.g.[56][57][58][59][60][61][62][63], that allow recasting LHC results for new sets of models (see also [64]).These tools have mostly been developed in the context of supersymmetry and similar searches at the LHC and are not readily applicable to HNL models, whose collider phenomenology is quite different.In this work we perform a step in the direction of recasting LHC results.Specifically, we recast the ATLAS tri-lepton search [1] in the case of the simplest realistic HNL model of neutrino oscillations.This model features two heavy neutral leptons with (almost) degenerate masses.The possible values of the HNL mixings are constrained by neutrino oscillation data. 1 In what follows we will refer to this model as a realistic HNL model.As we shall see below, even in this simple model, the interpretation of the results is a non-trivial task.

Motivation for a reinterpretation
The realistic seesaw model describing neutrino oscillations brings several changes compared to the single-HNL, single-flavor model analyzed by the ATLAS collaboration [1].The analysis from ref. [1] concentrated on the following process: where ± α are light leptons (e ± or µ ± ), α = β and (−) ν β is a neutrino or anti-neutrino with flavor β.They performed two independent analyses: one for the e ± e ± µ ∓ +MET final state ("electron channel") and one for the µ ± µ ± e ∓ +MET final state ("muon channel").In both cases, only a single process (corresponding to diagram (b) in figure 1), along with its CPconjugate, contributed to the final signal.The upper limit on an admissible signal was thus directly translated into an upper bound on the mixing angle U 2 e or U 2 µ , depending on the channel.The situation changes once we consider a realistic seesaw model with 2 HNLs: 1.In such a model, several processes contribute incoherently 2 to each final state.The upper bound on an admissible signal in any channel thus translates non-trivially into limits on all three mixings angles (U 2 e , U 2 µ , U 2 τ ). 2. Any set of mixing angles consistent with neutrino oscillation data leads to observable signals in both the e ± e ± µ ∓ and µ ± µ ± e ∓ channels, therefore the statistical procedure should take into account that the signal is non-zero in both channels.3. Different processes that contribute to the same tri-lepton final state have different kinematics (due in part to spin correlations [67]).Therefore the signal efficiencies need to be evaluated separately for every process.4. We consider 2 HNLs with nearly degenerate masses.Due to HNL oscillations (cf.[67] or [68][69][70][71][72][73][74] for earlier works) tiny mass differences (well below the mass resolution limit of ATLAS) can significantly affect the interference pattern, leading to the suppression or enhancement of some processes as compared to the single HNL case, see e.g.[70,[75][76][77].Since different processes (such as those in figure 1) have different kinematics and thus efficiencies, this implies that the overall signal efficiency depends not only on the mixing angles, but also on the level of the HNL mass degeneracy.In order to account for this, we present our analysis for two limiting cases: the "Majorana-like" and "Dirac-like" limits (which we will define in section 2).
All these points make it impossible to reinterpret the ATLAS results by just rescaling them (as done e.g. in ref. [78]).Instead one should perform a full signal and background modeling and evaluate the signal selection efficiencies.Although this can only be done properly by the collaboration itself, thanks to their access to the full detector simulation, the analysis framework and the actual counts in the signal regions, we will demonstrate that one can nonetheless reproduce the original ATLAS limits sufficiently well for the purpose of reinterpretation.Finally, we will briefly discuss what data from the collaboration could simplify our analysis and make it more precise, in the spirit of the recommendations from the LHC Reinterpretation Forum [64].
The present paper is organized as follows: In section 2 we introduce the notion of "realistic" seesaw models.To this end, we review the so-called type-I seesaw mechanism, discuss how neutrino oscillation data constrain its parameters, and examine how interference effects between multiple HNLs can completely change their phenomenology.We then describe our analysis procedure in section 3: we present the event selection, detail the calculation of the expected signal and efficiencies, and discuss our background model as well as the statistical method used to derive the exclusion limits.In section 4, we finally present our reinterpretation of the ATLAS limits on promptly-decaying HNLs within a realistic seesaw model with 2 HNLs, and we comment on these results.We conclude in section 5, and summarize what data should ideally be reported by experiments in order to allow reinterpreting their limits easily and accurately within realistic models.
The Lagrangian of the model reads where L SM is the usual SM Lagrangian and ν R I are new right-handed particles that are SM gauge singlets.In the present paper we will consider the case of two HNLs, therefore the index I runs over 1, 2. L α are the left-handed lepton doublets labeled with the flavor index α = e, µ, τ and Φ = iσ 2 Φ, where Φ is the Higgs doublet.F αI is the matrix of Yukawa couplings in the basis where the Yukawa couplings of charged leptons and the Majorana mass M I of the right-handed neutrinos are both diagonal.After electroweak symmetry breaking, the Higgs field in the Lagrangian (2.1) obtains a vacuum expectation value Φ = (0, v) T and the Yukawa interaction terms in eq.(2.1) effectively become Dirac mass terms coupling the left and right chiral components of the neutrinos.Since the right-handed neutrinos have, in addition, a Majorana mass, the spectrum of the theory is obtained by diagonalizing the full mass matrix.
For |F αI v| |M I | one finds after the diagonalization 3 light mass eigenstates ν i with masses m 1 , m 2 , m 3 and two heavy mass eigenstates N I -the HNLs -with masses M 1 and M 2 . 3As a consequence, the flavor eigenstates (SM neutrinos) ν Lα can be expressed as a linear combination of the 5 mass eigenstates as where V pmns is the Pontecorvo-Maki-Nakagawa-Sakata (PMNS) matrix (see e.g.[81]).As a result, the heavy mass eigenstates N I contain an admixture of SM neutrinos ν Lα , and therefore possess "weak-like" interactions, suppressed by the mixing angles Θ αI , approximately given by

Parametrization of the Yukawas
The Lagrangian (2.1) contains 11 new parameters, as compared to the SM one [3].These parameters are, however, constrained by neutrino oscillation data [82].Five neutrino parameters have already been measured: two mass differences (∆m 2 atm and ∆m 2 sun ) and three mixing angles (θ 12 , θ 23 , θ 13 ).The remaining unknown parameters are the mass of the lightest neutrino, two Majorana phases, and the CP -violating phase δ.Our a priori choice of two HNLs restricts the mass of the lightest neutrino to be zero and only allows a certain combination of the Majorana phases to be independent.As a result, we are left with only two unknown parameters in the active neutrino sector, in addition to the discrete choice of the mass ordering. 4he measured low-energy parameters mean that for any choice of heavy neutrino masses M I , the Yukawa couplings F αI are not completely free.To account for this, we can parametrize the neutrino Yukawa couplings using the Casas-Ibarra parametrization [86]: where the matrix M diag = diag (M 1 , M 2 ), and R is a complex 3×2 matrix satisfying R T R = 1 2×2 .For the PMNS matrix we use the standard parametrization [79].We parametrize the relevant combination of the Majorana phases in the PMNS matrix as η = In the model with two right-handed neutrinos, the matrices R depend on the neutrino mass hierarchy and are given by with a complex angle ω = Re ω + i Im ω, and a discrete parameter ξ = ±1.Changing the sign of ξ can be undone by ω → −ω along with N 2 → −N 1 [87], so we fix ξ = +1.

Heavy neutrino mixing
The weak-like interactions of HNLs are suppressed by the mixing angles Θ αI defined in eq.(2.3).These mixing angles may contain complex phases, which play no role for the processes that we consider. 5Only the cumulative effects of both N 1 and N 2 contributes to the observed signal and therefore the experimentally measurable quantities are which respectively quantify the total HNL mixing to a particular flavor and the overall mixing between HNLs and neutrinos of definite flavor.The latter quantity has a particularly simple form in terms of the neutrino masses and Casas-Ibarra parameters: ation of neutrinoless double beta decay experiments may provide information on the Majorana phases [83], while the CP -violating phase δ is already constrained by T2K [84], with further improvements expected from the DUNE experiment [85]. 5These complex phases can be important if the period of HNL oscillations is comparable with the size of the experiment, see e.g.[67] and references therein.[92,93] fit to neutrino oscillation data, at the 1, 2 and 3σ levels, for the normal and inverted hierarchies.The markers denote the selected benchmark points, which are meant to represent both typical and extreme ratios of the squared mixing angles. where we can display the combinations of U 2 α which are compatible with neutrino oscillation data using a ternary plot as in figure 2, cf.[78,[89][90][91].In our analysis, we used the most recent global fit to neutrino oscillation data, NuFIT 5.0 [92,93].The shape of the allowed regions depends on the values of the Dirac phase δ and of the active neutrino mixing angle θ 23 .We have used the three-dimensional projections of ∆χ 2 provided by NuFIT 5.0 in order to determine the 1, 2 and 3σ contours presented in figure 2. 6 In order to better visualize the correspondence between the exclusion limits and various points in the allowed regions, we have defined a number of benchmarks, which are represented in figure 2.

Quasi-Dirac HNLs, lepton number violating effects and relevant limits
As neutrino oscillations do not constrain the masses of HNLs, M 1 and M 2 can be arbitrary.In this work we choose to consider the case where (2.9) The motivation for this scenario is twofold.First, the mass degeneracy of two HNLs allows for sizable mixings between active neutrinos and HNLs in a technically natural way [75][76][77][95][96][97][98][99][100][101].Secondly, low-scale leptogenesis (see the recent work [31] and references therein) requires a mass degeneracy between two heavy neutrinos.The mass splitting between the HNLs needs to be especially tiny if one wants to create the initial conditions required for the generation of sterile neutrino dark matter in the early Universe [28,34,102].
In the limit M 1 ≈ M 2 there is an approximate global U (1) symmetry in the theory. 7In this quasi-Dirac limit of the two-HNLs model, the lepton number violating (LNV) processes (such as 1(b)) are suppressed compared to the lepton number conserving (LNC) processes.When M 1 = M 2 but ∆M M N , HNL oscillations take place, as discussed in e.g.[7,[67][68][69][70][71][72][73][74].As a result, lepton number violation may not be suppressed any more.Rather, the rates of LNC and LNV processes undergo a periodic modulation as a function of the proper time τ = (x D − x P ) 2 between the HNL production and decay vertices [67]: with the (+) sign for LNC and (−) for LNV, and where d Γlnv/lnc αβ is the differential rate for a tri-lepton process mediated by a single Majorana HNL N in the (unphysical) limit of a unit mixing angle between the HNL and the active flavor α at its production vertex, with flavor β at its decay vertex, and without the absorptive part; where Γ def = Γ 1 ∼ = Γ 2 and by assumption Θ α2 ∼ = ±iΘ α1 .Notice how in this quasi-Dirac limit, the oscillation pattern does not explicitly depend on the lepton flavors α and β, but only on whether the process is LNC or LNV.If ∆M vanishes exactly, then HNLs form a Dirac fermion and LNV effects are completely absent.Equation (2.10) demonstrates the two limiting cases of the two-HNLs seesaw model: where τ must satisfy both τ Γ 1 and γτ L det (whichever is stronger), with Γ denoting the total HNL width, γ its boost factor, and L det the typical detector size.
the Super-Kamiokande collaboration [94].Our choice of benchmark models is only slightly affected by this choice, and this does not qualitatively change our analysis or conclusions. 7The symmetry becomes exact when M1 = M2 and Θα1 = ±iΘα2.In this limit active neutrinos become massless and the two HNLs form a single Dirac particle Ψ such that In this work we will consider these two limiting cases for quasi-Dirac HNLs: • Dirac-like: the pure Dirac (∆M = 0) limit where all LNV effects are completely absent, and LNC rates are coherently enhanced by a factor of 2; • Majorana-like: the ∆M τ 2π limit where both LNV and LNC processes are present, with the same integrated rates.Comparing these two limiting cases for the same benchmark models allows to assess the level of uncertainty introduced by the unknown ∆M .

Procedure
In order to reinterpret the limits from the ATLAS prompt search [1] (with extra details in the Ph.D. thesis [103]) we have tried to reproduce the ATLAS analysis as accurately as possible.Our signal is simulated using MadGraph5_aMC@NLO [104] with the HeavyN model [105,106] (section 3.2).For the event selection (section 3.1), we have implemented the ATLAS cut flow and obtained comparable efficiencies (section 3.3).We take the total background counts from the ATLAS publication [1] (section 3.4).Finally, in order to compute the limits (section 3.5), we use the CL s test statistics, along with a very simplified treatment of uncertainties.

Event selection
The prompt ATLAS analysis [1] considers the final states consisting of three isolated charged leptons (with electron or muon flavor) with no opposite-charge same-flavor lepton pairs (in order to limit the background from Z decays), i.e.only e ± e ± µ ∓ (electron channel) and µ ± µ ± e ∓ (muon channel) are considered.It focuses on HNLs which are sufficiently short-lived that their decay vertex can be efficiently reconstructed using the standard ATLAS tracking algorithm.Since our reinterpretation will include a number of processes not included in the original ATLAS analysis8 and having different kinematics (e.g.LNC processes, which are absent in the single-flavor mixing assumption), we cannot use the published ATLAS efficiencies and we have to compute them on our own.As we will see, imposing the same cut flow allows reproducing the ATLAS efficiencies with sufficient accuracy for the purpose of this reinterpretation.The list of cuts is shown in table 1, and their order roughly follows that of ref. [103].When different cuts were applied to the 2015 and 2016 datasets, we use the 2016 cuts, since the 2015 dataset is smaller than the 2016 one by about an order of magnitude.
1. We start by applying a cut on the distance of closest approach to the origin in the r-z plane: |∆z 0 sin(θ)| < 0.5 mm for the leading lepton9 and |∆z 0 sin(θ)| < 1 mm for the remaining ones.
2. Next, we apply the default transverse momentum and pseudorapidity requirements on the three changed leptons, i.e. p T > 4.5 GeV and |η| ∈ [0, 1.37[∪]1.52,2.47[ for all electrons and p T > 4 GeV and |η| < 2.5 for all muons.3.Then, we simulate the selection performed by the trigger by applying the relevant p T requirements, as found in ref. [1], ch.4.1, §1.For the single-electron trigger used in the electron channel, we do not apply the ID requirements, since the ID efficiency is difficult to accurately estimate.4. We then apply the trigger offline requirements on the two leading leptons: p T (e lead ) > 27 GeV and p T (e sublead ) > 10 GeV for the electron channel and p T (µ lead ) > 23 GeV and p T (µ sublead ) > 14 GeV for the muon channel. 5. Next, we require the tri-lepton invariant mass M 3l to be in the interval ]40, 90[ GeV. 6.We then apply a weight to each lepton in order to simulate the efficiency of lepton isolation.We use the p T -differential isolation efficiencies reported in ref. [107] for electrons and ref. [108] for muons, using the "loose" working point in both cases.7.For the electron channel only, a further cut is applied on the invariant mass of the e ± e ± pair, M (e, e) < 78 GeV, in order to veto the background from Z → e + e − where one of the electron charges is misreconstructed.8.The missing transverse energy is then restricted to E miss T < 60 GeV.
Finally, the events passing the above cuts are binned in M (l sublead , l ), which approximates the invariant mass of the HNL for small HNL masses (for which the leading lepton is usually the prompt lepton).The bins are [0, 10[,[10,20[,[20,30[,[30,40[ and [40,50[ GeV.Our cut flow is summarized in table 1.One notable difference with the ATLAS paper is the absence of a b-jet veto in our analysis, which we omitted since b-jets appear in only O(1%) of the signal events, therefore this cut would remove almost no signal at truth level.For this reason, we have not generated b-jets in our final samples.A further difference comes from the cuts related to the displacement of the leading lepton.ATLAS additionally imposes |d 0 /σ(d 0 )| < 5 (electron) or < 3 (muon), while we only impose the |∆z 0 sin(θ)| cut and omit the d 0 cut, since we do not know σ(d 0 ) well enough. 10This most likely does not affect the signal at truth level, since the leading lepton has a very small displacement in all relevant cases: for light HNLs, the leading lepton is almost always the prompt lepton from the W decay, while heavier HNLs decay with a very short displacement due to their much shorter lifetime.We also decided to omit the lepton identification (ID) requirements, whose efficiency is harder to model for electrons due to being significantly less smooth [107] than the isolation one, in particular for the "tight" working point and at low p T .For muons the ID efficiency is close enough to 1 [108] that it can probably be safely neglected.Our attempt at implementing this cut only resulted in a significantly decreased accuracy for the efficiency estimates.A possible cause could be that the tabulated efficiencies have been computed using different sets of triggers and cuts and therefore cannot be transposed directly to the present analysis.

# Electron channel
Muon channel Table 1: Our cut flow for the electron and muon channels.The † indicates cuts which differ between 2015 and 2016 (the 2016 cuts were used in this analysis).Lepton identification and |d 0 /σ(d 0 )| cuts have been omitted due to the complexity of their implementation.

Signal
In order to reinterpret the sensitivity of the ATLAS prompt HNL search for arbitrary combinations of HNL masses M N and ratios of mixing angles, we need to be able to compute the expected signal counts in each M (l sublead , l ) bin in each signal region, for any model parameters.We do so using a simple model, described below.

MadGraph setup
The signal processes contributing to each channel are listed in tables 2 and 3.11 For Majorana-like HNL pairs, all processes contribute, while for Dirac-like HNL pairs only those which conserve the total lepton number (∆L = 0) contribute (with a factor-of-2 enhancement for the total cross section).
For each process, we generate a Monte-Carlo sample which will be used to compute both the cross section and the efficiency.Each sample consists of ∼ 40000 weighted events generated at leading order using MadGraph5_aMC@NLO v2.8.x [104] along with the HeavyN model [105,106] (specifically, we use the SM_HeavyN_CKM_AllMasses_LO model 12 ), which includes the non-diagonal CKM matrix as well as the finite fermion masses.The center of mass energy is set to √ s = 13 TeV and the integrated luminosity to L int = 36.1 fb −1 , in order to match the parameters of the 2019 prompt analysis.We Table 3: Signal processes contributing to the muon channel.Up to two additional hard jets have been included in the process string, but are omitted here for brevity.generate the processes listed in the "MadGraph process string" column in tables 2 and 3, with up to two additional hard jets (excluding b-jets).Pythia 8 is then used (through the MadGraph interface) to shower and hadronize the events.We use the event weights and the merged cross section reported by Pythia.

Signal computation for arbitrary model parameters
In order to obtain the physical cross section, a number of model parameters need to be specified: the HNL mass M N , its mixing angles 13 |Θ e |, |Θ µ | and |Θ τ | and its total decay width Γ N .Generating a new sample for every set of parameters would be computationally prohibitive.Fortunately, we can leverage the scaling properties of the cross section in order to exactly recompute it for each new set of mixing angles.This is done as follows.
As a first step, we generate Monte-Carlo samples for all the processes listed in tables 2 and 3, for each HNL mass M N ∈ {5, 10, 20, 30, 50} GeV and using the reference parameters |Θ| ref = 10 −3 and Γ ref = 10 −5 GeV as placeholders for the remaining model parameters. 14 For each process P , we only set the relevant mixing angle |Θ α(P ) | and |Θ β(P ) | to |Θ| ref , where α(P ) and β(P ) respectively correspond to the generations coupling to the HNL at production and decay, as listed in tables 2 and 3.
The key observation here is that the branching fraction of 13 Since we are dealing with 2 HNLs far from the seesaw line, Θα2 ∼ = ±iΘα1 [75,76].We generate the Monte-Carlo samples for a single HNL with parameters Θα α , see eq. (2.6). 14These parameters allow for the successful numerical integration in the narrow width approximation.
the cross section for a given process P is proportional to Starting from the reference cross section σ ref P obtained for the reference parameters, this allows to extrapolate the physical cross section to new parameters: Since the total HNL width enters this formula, we need to be able to compute it for arbitrary parameters too.To this end we follow a similar approach.We notice that the partial width into a given decay channel D is proportional to |Θ β(D) | 2 , where β(D) denotes the flavor with which the HNL mixes when decaying.Summing over all decay channels and all three flavors, we can then express the total decay width as: where Γβ (M N ) = Γ N (M N , δ βe , δ βµ , δ βτ ) is the total decay width obtained by setting Θ β = 1 and the two other mixing angles to zero.It can be easily computed with MadGraph by generating the n1 > all all all process.This extrapolation method, which makes use of the scaling properties of the relevant branching fractions, has been successfully validated by explicitly computing the cross section for a few non-trivial benchmark points and comparing the results.The contribution N P of a given process P to the total event count (before applying any selection) is then obtained by multiplying the relevant cross section by the integrated luminosity:

Signal computation for quasi-Dirac HNLs
Finally, since the signal samples have been computed for a single Majorana HNL, we need to apply a correction factor c P to each cross section when considering a quasi-Dirac HNL pair.If this HNL pair is Majorana-like (i.e. it has both LNC and LNV processes with equal rates), then all cross sections must be multiplied by 2, since there are two mass eigenstates whose event rates add incoherently.However, for a Dirac-like HNL pair (which only has LNC processes), the LNC cross sections must be multiplied by 4 due to the coherent enhancement discussed in section 2.4, while the LNV ones should all be set to zero.Unlike in the case of a single Dirac fermion, no correction to the total HNL width needs to be applied.The correction factors are summarized in table 4.

Efficiencies
In order to obtain a sensitivity estimate, we must compute the expected signal count in every M (l sublead , l) bin reported by the ATLAS collaboration. 16This is done by multiplying the true signal count by a signal efficiency.Since the relative contributions of the various diagrams -which all have different kinematics and therefore different efficiencies -depend of the mixing angles Θ quasi-Dirac αI = Θ Dirac αI / √ 2. 16 We consider both signal regions (for the e ± e ± µ ∓ and µ ± µ ± e ∓ signatures) simultaneously, so there are 10 bins in total: 5 in the electron channel and 5 in the muon channel.  1 4 0 1 Table 4: Multiplicative coefficients c P to be applied to the cross section of each process P , and c Γ to be applied to the total HNL width Γ N , depending on the HNL(s) nature and on whether the process is LNC or LNV.
on the model parameters, in general we expect the signal efficiency to depend on the mass M N , nature (Majorana-like or Dirac-like), lifetime τ N and all the mixing angles of the quasi-Dirac HNL pair.However, when considering a single process/diagram, the nature and mixing angles "factor out" such that the efficiency for this process depends only on the mass and lifetime of the HNL.We therefore need to compute one efficiency P,b (M N , τ N ) for every process P and every bin b.The total event count in bin b is then computed by summing over all the processes: where c P is the correction factor applied to the cross section for quasi-Dirac HNLs.
For a given process P and bin b, the efficiency P,b (M N , τ N ) is computed by filtering the corresponding Monte-Carlo sample through the cut flow described in section 3.1 and table 1.The binned efficiency is then: (weights of events after cuts, which end up in bin b) (weights of all events before cuts, from any bin) (3.4) where the sums run over all events generated for the process P and the events which fail to pass a given cut have their weight set to zero. 17Similarly, we can obtain the unbinned efficiency as: (all event weights after cuts) (all event weights before cuts) . (3.5) The unbinned efficiencies for the four LNV processes are plotted in figure 3 along with the efficiencies reported by ATLAS in ref. [1], while those for LNC processes are plotted in figure 4. Since the efficiency of a process depends on both the HNL mass and its lifetime, we had to choose a set of benchmark points to produce figures 3 and 4. In order to be able to compare our efficiency calculation with the ATLAS efficiencies, we have chosen the same benchmarks as reported in ref. [1] and reproduced in table 5. Our estimate is reasonably accurate for the muon channel, with a mean relative error 18 of 18% (maximum 48%), but 17 Some cuts (such as lepton ID and isolation cuts) are implemented by reweighting events using tabulated efficiencies. 18We define the relative error on the total efficiency as -14 -HNL mass M N 5 GeV 10 GeV 20 GeV 30 GeV 50 GeV HNL lifetime τ N 1 mm 1 mm 0.1 mm 0.01 mm 1 µm Table 5: Benchmark points (taken from ref. [1]) used to plot the efficiencies in figures 3 and 4. Note that our calculations are more general, and work for any combination of M N and τ N .
less so for the electron channel, with a mean relative error of 38% and a factor of ∼ 4 in the worst case (which corresponds to the lowest HNL mass hypothesis M N = 5 GeV).The main difference between the two channels comes from the larger reliance on the electron ID (which we ignore) in the electron channel.Indeed, the electron ID is used for the single-electron trigger as well as for the ID cuts on both electrons; and contrary to the "loose" muon ID [108, fig. 12] used for muons, its efficiency can be significantly smaller than 1 [107, fig. 17].This omission could contribute to the worse agreement between signal efficiencies in the electron channel.Another potential factor could be the large HNL displacement.The displacement has not been taken into account when tabulating the isolation efficiencies (computed for Z → ll in refs.[107,108]).This would explain why the discrepancy is stronger for larger cτ N γ N .Comparing figure 4 with figure 3, we also notice that the efficiencies for LNC processes can be significantly smaller than for LNV processes.This is mostly due to the different spin correlation patterns [67,110] for LNC vs. LNV leading to different lepton spectra and to different geometrical acceptances of the lepton p T and displacement cuts.
Since the original Monte-Carlo samples used for this analysis did not take spin correlations into account, and were generated under the single-flavor mixing hypothesis, the cut flow has been optimized under these assumptions.In principle, this might lead to a sub-optimal cut selection when it is applied to more realistic models (which we eventually hope to observe).For this reason, we would generally recommend performing the cut optimization using a set of signal samples which are representative of realistic models (instead of simplified benchmarks) and which have been generated using a Monte-Carlo event generator (such as MadGraph) which can model spin correlations.However, in the present case, it seems that most cuts were chosen solely based on the minimal requirements imposed by the existing triggers, and therefore would not have been very different had the cut optimization been performed with more realistic models in mind.
Even using the extrapolation method described above and eq.( 3.3), one efficiency P,b (M N , τ N ) must in principle still be computed for every process P , bin b, HNL mass M N and lifetime τ N .However, several simplifications exist.First, the efficiencies for the full set of M (l sublead , l ) bins (keeping the other parameters fixed) can be computed simultaneously, since the events only need to go through the cut flow once, before the binning is applied.More interestingly, it also turns out that the τ N dependence can be quite accurately parametrized using a simple functional form (τ N ).This functional form can be constrained by requiring the following asymptotic behavior:

W
(N e + e ) Figure 3: Cumulative unbinned signal efficiencies (for the total event count, i.e. summed over all bins) after applying each cut listed in table 1, computed for the benchmark points found in ref. [1].The black dashed line denotes the total efficiencies reported in ref. [1], table 2, and should be compared to the gray line with diamond markers (which corresponds to all cuts being applied).These efficiencies are for lepton number violating (LNV) processes only, since these were the only relevant processes in the original prompt search.
• (τ N ) ∝ 1 τ N for sufficiently large τ N .The "simplest" functional form satisfying these two conditions is: with 0 the prompt efficiency and τ 0 the typical lifetime after which the efficiency starts to drop due to the HNL displacement.After fitting it to the efficiencies which have been explicitly computed for a number of lifetime points, this model can be used to extrapolate the efficiency to arbitrary HNL lifetimes.As an example, the model, along with the lifetime points used for the fit, are presented in figure 5 for both the binned and unbinned efficiencies, for the W + → e + (N → e + µ − νµ ) process with a 30 GeV HNL.The relative

W
(N e + ) Figure 4: Cumulative unbinned signal efficiencies (for the total event count, i.e. summed over all bins) after applying each cut listed in table 1, computed for the benchmark points found in ref. [1], for lepton number conserving (LNC) processes.The gray line with diamond markers corresponds to the total efficiency.
error between the data and the model is 10% (on top of the statistical error).The efficiencies for other processes and mass points display a similar behavior.
Thanks to these simplifications, for each HNL mass M N and process P , the efficiencies need only be computed for 3 or more lifetime points (we used 13) in order to obtain the full lifetime dependence along with an error estimate.This amounts to 12 or more Monte-Carlo samples per mass point for Dirac-like HNL pairs, and 24 or more for Majorana-like HNL pairs. 19Lifetime reweighting can additionally be used to simulate intermediate lifetimes without having to generate new samples.This makes the approach computationally tractable (although expensive) for experiments who would like to report their efficiencies in a benchmark-agnostic way, while still using their full detector simulation.

Background
A number of Standard Model processes can mimic the signatures that we are looking for.This can happen if these processes have the same final state (irreducible background) or if they are misreconstructed as the same final state (reducible background) due to fake leptons (i.e.non-prompt leptons from jets or leptons from pileup).ATLAS has found the irreducible background to be subdominant [1], and the main background components to be multi-fakes (multiple fake leptons coming from W +jets or multiple jets) as well as t t with a fake lepton.Each of these background sources comes with statistical uncertainties.The kinematic distribution of the multi-fake sample is estimated from data using a number of estimation regions, then normalized by fitting a normalization factor µ mf to the three control regions.Due to the finite sizes of the data samples, both of these steps introduce statistical errors into the multi-fake estimate, with potentially non-trivial correlations between the M (l sublead , l ) bin counts, which we are ultimately interested in.Similarly, the finite size of the t t Monte-Carlo sample and the finite event counts in the control regions used to estimate its normalization factor µ t t also introduce statistical errors into the t t estimate.
The detailed uncertainties (including correlations) of the individual background components are not listed in ref. [1].Performing a detailed background analysis is out of the scope of the present paper.Instead, we have decided to use a simplified background model, which only takes into account the total background count in each bin, but is nonetheless capable of providing a good enough approximation of the sensitivity for the purpose of this reinterpretation.
To this end, the total background count in each channel and each M (l sublead , l ) bin, along with its uncertainty band, is digitized from figure 5 in ref. [1].Since the uncertainties on the individual components of the background are unfortunately not reported, implementing a statistical test necessarily requires some guessing on our side.After experimenting with several well-motivated background models and selecting the one which leads to the best approximation of the ATLAS limits, we have decided to model the uncertainty as being entirely caused by a single, Gaussian-constrained normalization factor µ tot .In other words, we assume that the background expectations in the various M (l sublead , l ) bins are maximally correlated.This is consistent with the observation that the statistical errors on the normalization factors µ mf and µ t t are among the leading uncertainties.The accuracy of this simplified model will be explicitly tested in section 3.5.

Statistical limits
Ref. [1] found a very good compatibility between the observed counts and the backgroundonly hypothesis.They then proceeded with exclusion limits by testing the compatibility of the observed counts under the signal + background hypotheses for five different benchmark points in the (mass, lifetime) space, each for two different mixing patterns: with electron or muon flavor.
In order to define the exclusion limit, ATLAS uses the CL s test [111].For completeness, a quick reminder about the CL s technique follows in section 3.5.1.Knowledgeable users are welcome to skip it and go directly to section 3.5.2.

CL s technique: a general reminder
The CL s technique is based on the likelihood-ratio test statistics, more specifically on: where L denotes the likelihood, x the data, H b the background-only hypothesis and H s+b a signal + background hypothesis.Larger values of t indicate more signal-like data.The distribution of t is estimated under each hypothesis through the use of pseudo-experiments X: p b (t) = P(t(X)) for X ∼ H b and p s+b (t) = P(t(X)) for X ∼ H s+b .Given an observation x obs and the corresponding value of the test statistics t obs = t(x obs ), the CL b and CL s+b values are then computed as: In other words, CL b and CL s+b are the probabilities of obtaining a dataset that is more background-like than the observed one, respectively under the background and signal + background hypotheses.Both increase for increasingly signal-like x obs .Finally, the value of the CL s test statistics is given by the ratio: and a given signal + background hypothesis H s+b is considered to be excluded if CL s < 0.05.For any signal stronger than the CL s = 0.05 limit, the probability of a type-I error (false exclusion) will always be less than 0.05.In order to complete the statistical analysis, the likelihood remains to be specified.We will proceed with this in the following section.

CL s technique: implementation
The observables in question are the event counts in the two signal regions (for the electron and muon channels), each channel consisting of 5 M (l sublead , l ) bins.Since we will be dealing with non-trivial combinations of mixing angles, we simultaneously include both channels in our likelihood.We thus end up with 10 bin counts {x i }, with i = 1 . . . 5 for the electron channel and i = 6 . . . 10 for the muon channel.As discussed in section 3.4, we model the background as a set of expectation values {b i } for each bin i = 1 . . . 10 (taken from the ATLAS paper) along with a Gaussian-constrained normalization factor µ tot with standard deviation , where the − and + superscripts respectively denote the lower and upper uncertainty bands from the ATLAS plot (see table 6).The signal is modeled as a set of signal expectations {s i }, i = 1 . . .10, which we compute for each set of the model parameters (M N , Θ e , Θ µ , Θ τ ) using the method described in sections 3.2 and 3.3.Contrary to ATLAS, we do not use a signal strength parameter µ, since this would amount to rescaling the mixing angles without changing the lifetime, leading to inconsistent results. 20We neglect all uncertainties on the signal counts, which we have estimated to be at the sub-percent level.The bin counts x i are assumed to be Poisson distributed, with expectation values of respectively µ tot b i for the background-only hypothesis and µ tot b i + s i for the signal + background hypothesis.The full likelihood for the signal + background hypothesis is thus: where The likelihood for the background-only hypothesis H b is obtained by setting the signal s i to zero in eq.(3.11).
In order to validate our simplified statistical analysis, we can compare the limits that it produces to the limits obtained by ATLAS, when using the exact same counts as ATLAS (extracted again from figure 5 in ref. [1]).In order to perform this comparison, a few changes need to be made.First, we need to reintroduce the signal strength parameter µ.Second, we need to consider both channels separately.After making these changes, we  obtain the limits shown in figure 6.The mean ratio between our limits and the ones from ATLAS is 0.64, and the worst-case ratio is 0.42.Although not fully satisfactory, this discrepancy should still be small enough to allow us to reliably compare limits which differ by an order of magnitude or more, as we will do in the next section.This is especially true when the reinterpreted limits are all computed using the same method.

Results
Below we present our results -the exclusion limits for the model with two HNLs.We calculate exclusions for each of the benchmark points defined in figure 2. Benchmarks are Figure 7: Original (black lines) and reinterpreted (colored lines) 95% exclusion limits on the total mixing angle U 2 tot = α=e,µ,τ I=1,2 |Θ αI | 2 for a Majorana-like HNL pair for the normal (left) and inverted (right) mass orderings.The black lines are limits obtained under the single-flavor assumption, while the solid colored lines denote those obtained for the benchmark points defined in figure 2. When scanning over all ratios of mixing angles allowed by neutrino oscillation data, the exclusion limits span the blue (green) shaded regions.Correspondingly, the gray filled area is excluded at CL > 95% for all possible ratios of mixing angles, and thus constitutes an exclusion limit independent of the specific choice of mixing angles, valid as long as we consider the two HNL model explaining neutrino oscillations.
chosen in such as way as to represent both typical and extreme ratios of the mixing angles U 2 e : U 2 µ : U 2 τ .As each benchmark fixes the mixing pattern, our results are most compactly expressed as exclusion limits for the total mixing angle U 2 tot = U 2 e + U 2 µ + U 2 τ (eq.(2.7)).Figures 7 and 8 present our results for the Majorana-and Dirac-like cases respectively.The limits for the flavor mixing angles U 2 α are presented in figures 9 to 12.All these limits are the observed exclusion limits, and all of them (including the single-flavor limits) have been derived using the same statistical method, 21 which we described in 3.5.
The legend for these plots is as follows.The thick dashed and dotted lines in each plot represent the exclusion limits obtained under the assumption of a Majorana-like HNL pair mixing with a single flavor (respectively the electron and muon flavor).Up to a factor of 2, this corresponds to the scenario considered by ATLAS in the current prompt search [1].These limits are grayed out in the plots for the Dirac-like pair in order to emphasize that the search has no sensitivity to the Dirac-like case for the single-flavor mixing.The solid colored lines denote the exclusion limits obtained for the various benchmark points defined in figure 2. The benchmarks can be identified using the numbers in the right margin.The colored, filled area represents the set of possible ("benchmark-dependent") 22 limits spanned by all the combinations of mixing angles allowed by the NuFIT 5.0 neutrino data (at 95% CL). 23In other words, it shows the dependence of the exclusion limits on the specific combination of mixing angles, within the constraints from neutrino oscillation data (which are represented by the similarly-colored area in figure 2).Finally, the gray filled area denotes the set of mixing angles which are excluded at the 95% level for all the allowed ratios of mixing angles.It thus represents the most conservative (benchmarkindependent) limit that can be obtained for a given model.No choice of mixing angles that is in agreement with neutrino oscillation data (within the 2 HNL seesaw model24 ) can produce a limit within the gray filled region.

Majorana-like HNL pair
Let us first consider the case of a Majorana-like HNL pair, which is closer to the "single Majorana HNL" model considered by ATLAS and many other experiments.The relevant limits are shown in figures 7, 9 and 10.Apart from a trivial factor of two due to the two nearly degenerate mass eigenstates, the main difference with ATLAS is that in a  for a Majorana-like HNL pair and for the normal hierarchy.The legend is the same as in figure 7 and the rightmost figure coincides with the left panel in figure 7.
realistic seesaw model the HNLs must mix with all three flavors at the same time.Looking at the total mixing angle in figure 7, we immediately notice that the limits on U 2 tot are weaker than the single-flavor mixing limits for all our benchmarks, sometimes by more than an order of magnitude.The pattern is obvious for the normal hierarchy (but also visible for the inverted one): the benchmark points which have the strongest tau fraction x τ = U 2 τ /U 2 tot also have the worst sensitivity.This was already observed in ref. [54], and it is the manifestation of a well-known phenomenon: the introduction of new decay channels (here mediated by the tau mixing) reduces the branching fraction of the HNLs into the search channels.This has an important consequence: exclusion limits derived for U 2 α under the single-flavor assumption do not translate directly into limits on U 2 α in a model where HNLs mix with multiple flavors. 25Instead, such limits must always be recast!When we look at the exclusion limits obtained for the individual mixing angles (in figure 9 for the normal hierarchy and figure 10 for the inverted hierarchy), we observe that for some benchmarks the exclusion limits on individual mixing angles can sometimes be much stronger than the single-flavor limits.This actually reflects a rather trivial fact: if β and the ratio U 2 α : U 2 β is fixed, setting a limit on U 2 β automatically sets a much stronger limit on U 2 α (e.g. the limit set on U 2 e for benchmark 10 in the IH indirectly sets a limit on U 2 µ , which is enhanced by the ratio of the two mixing angles, in this case U 2 µ /U 2 e ∼ 1/2000).In the same way we obtain an indirect limit (filled gray region) on the tau mixing angle, which was not directly probed by this search.This simply reflects the fact that no valid combination of mixing angles which passes the constraints set by ATLAS in both the electron and muon channels, can have a mixing angle U 2 τ with tau above this limit.Although the fact that introducing new constraints (such as fixing the ratio of mixing angles) can increase the sensitivity is not unexpected, it may still be useful when one considers specific sets of model parameters.This situation is not so far-fetched, since this is what happens when performing a scan over the parameter space in order to e.g.combine constraints from multiple sources, which may be complementary if they probe different combinations of mixing angles.For instance, we expect that future experimental results (such as excluding one neutrino mass hierarchy, or observing/setting limits on neutrinoless double-beta decay) will introduce additional constraints on the possible combinations of mixing angles, thus leading to a more predictive model.These potential use cases once again support the reinterpretation of exclusion limits.

Dirac-like HNL pair
Let us now turn our attention to the case of a Dirac-like HNL pair.Unlike in the Majoranalike case, there is no observable lepton number violation in this case, since the HNLs do not have enough time to oscillate among themselves.Its phenomenology thus significantly differs from the one of a single Majorana HNL, usually considered by experiments.In particular, the only lepton number conserving contributions to the experimental signatures considered in ref. [1] come from processes in which the HNL mixes with different flavors during its production and decay (due to the veto of opposite-charge same-flavor trilepton events).This search has therefore no sensitivity to Dirac-like HNLs mixing with a single flavor!By reinterpreting the limits (obtained for one Majorana HNL) within a realistic seesaw model (which requires HNLs to mix with all three flavors), we are nonetheless able to set some exclusion limits for this model.These limits are presented in figures 8, 11 and 12.The legend is the same as for the Majorana-like HNL pair, except for the single-flavor mixing limits which are grayed out in order to emphasize that they were computed for a different model (Majorana-like HNLs) and are only present here for comparison purpose.Looking at our benchmark points, we immediately notice that the limits for the total mixing angle (figure 8) are always weaker than the corresponding Majorana-like/single-flavor limits, sometimes by more than three orders of magnitude.The weakest limits are obtained when one of U 2 e or U 2 µ is suppressed compared to the other, which is unsurprising given that this approximates the single-flavor mixing case, to which the search has no sensitivity.Looking at the colored, filled area, we also observe a wider possible range of limits (with variations by more than two orders of magnitude) compared to the Majorana-like case, depending of the specific ratio of mixing angle chosen.This reflects the fact that the limits now depend mainly on two mixing angles instead of just one, which enhances the benchmark dependence.Finally, similarly to the Majorana-like case, we observe that we can obtain strong benchmark-dependent limits on the individual mixing angles (see figures 11 and 12), as well as some benchmark-independent limits (for this specific seesaw model with a Diraclike HNL pair; see the gray filled area).The latter are significantly weaker (by up to two orders of magnitude) than for a Majorana-like HNL pair, due to the larger variation among benchmarks.
We can summarize the case of Dirac-like HNLs by emphasizing how, despite the absence of sensitivity to the single-flavor mixing case, we nonetheless managed to obtain both benchmark-dependent and benchmark-independent (but still model-dependent) exclusion limits by reinterpreting the ATLAS results within a realistic seesaw model featuring a Diraclike HNL pair.Since the relevant processes now depend on the product of two different mixing angles, limits for Dirac-like HNLs show a stronger dependence on their ratio than limits for Majorana-like HNLs, resulting in weaker benchmark-independent exclusion limits (filled gray area) for this model.Yet, the reinterpretation allowed us to obtain a limit on all three mixing angles (as well as their sum), where there was previously none (from this search).

Reinterpretation
Heavy neutral leptons (HNLs) are promising candidates for explaining neutrino masses and oscillations.Within the seesaw model, their mass scale is not predicted by neutrino masses.Experiments searching for HNLs typically report null results in the form of exclusion limits on the mixing angle with one of the lepton flavors.We emphasize that these constraints are neither model nor benchmark independent.Rather they correspond to limits obtained within a specific model where one HNL mixes with a single flavor.As discussed in section 2, these simplified models are incompatible with the observed neutrino masses and mixing pattern.One may then wonder if the exclusion limits reported within these models remain valid when considering more realistic and theoretically motivated models of HNLs.In this work, we have performed a reinterpretation of the latest ATLAS prompt search for heavy neutral leptons [1] within one of the simplest realistic models: a low-scale seesaw mechanism with two quasi-degenerate HNLs.At least two HNLs are required in order to be compatible with neutrino oscillation data, and the combination of their mixing angles is constrained by the seesaw relation.In particular, for two HNLs, no mixing angle can be zero.
Our aim was to study to which extent the exclusion limits on the HNL mixing angles are model or benchmark dependent and by how much they change when considering our more realistic model.To this end, we have implemented a simplified version of the analysis employed by ATLAS in ref. [1].This reinterpretation was described in details in section 3.
Furthermore, as discussed in section 2.4, the two HNLs must form a "quasi-Dirac" pair (i.e.be nearly degenerate, with a specific mixing pattern) for sufficiently large mixing angles (which may be accessible at current experiments) to be viable.Depending on the specific value of the mass splitting as well as the length scale over which the HNLs are observed, this quasi-Dirac pair may behave either as a Majorana-like or a Dirac-like particle, due to quantum interference between the two mass eigenstates.Only Majorana-like HNL pairs feature lepton number violating decays, and the different spin correlation patterns for LNC and LNV decay chains lead to different signal efficiencies for Majorana-and Diraclike HNLs.Moreover, due to the veto applied by ATLAS on opposite-charge same-flavor lepton pairs (in their prompt HNL search), different diagrams, which depend on different combinations of mixing angles, contribute to the signal regions for Majorana-and Dirac-like HNLs.In particular, the only diagrams contributing to the signal in the case of Dirac-like HNLs involve two different mixing angles, such that there is no sensitivity at all under the single-flavor mixing assumption!In order to handle both the Majorana-and Dirac-like cases, we have performed the reinterpretation for each of them separately.The results were respectively presented in sections 4.1 and 4.2.
For Majorana-like HNL pairs, we have observed that: • The exclusion limit on the total mixing angle U 2 tot is always weaker (sometimes by more that one order of magnitude) in realistic models than for single-flavor mixing.This is essentially caused by the opening of new decay channels (hence reducing the other branching fractions) which do not contribute to the search signature.
• Fixing the ratio of the mixing angles can result in (sometimes significantly) stronger indirect constraints on some of the mixing angles.This can be useful when performing scans over the model parameters.• Assuming the two-HNLs seesaw model and marginalizing over the ratio of mixing angles while keeping the HNL mass fixed, we can obtain limits on the individual mixing angles (including the tau mixing angle, which was not probed directly by this search) which do not depend on their ratio.
For Dirac-like HNL pairs, we have observed that: • Contrary to the single-flavor mixing where the signal was identically zero, in our realistic model no single mixing angle can ever be zero, which ensures that we can always set an indirect (model-dependent) limit.• The limits on the total mixing angle are, however, always weaker (by up to three orders of magnitude) than in the Majorana-like, single-flavor case.• The weakest limits are obtained when one of U 2 e or U 2 µ is suppressed compared to the other.This is expected, since these mixing patterns approximate the single-flavor case.
• Compared to the Majorana-like case, the dependence of the limits on the specific benchmark is stronger.This is likely caused by the fact that the product of two different mixing angles enters the cross section as a factor (instead of a single mixing angle) thus enhancing the parametric dependence.• Similarly to the Majorana-like case, we can also set strong benchmark-dependent limits on the individual mixing angles by fixing their ratio.However, the corresponding marginalized/benchmark-independent limits are significantly weaker (by up to two orders of magnitude) due to the increased benchmark dependence.Our results show that the reinterpretation of the exclusion limits is a necessary step in order to test HNL models which differ from those directly probed by an experiment.In particular, if one interprets the reported limits on some parameter in a given model as exclusion limits on the same parameter in a different model, they risk wrongly excluding part of the parameter space within the latter.This of course does not affect the validity of the limits set by the experiment for the "one-HNL, single-flavor" benchmarks; it just means that one should be cautious when investigating models other than those two initial benchmarks.
When assuming specific choices of model parameters (as in parameter scans), stronger constraints can often be derived for the individual mixing angles.In the case of two HNLs, benchmark-independent constraints can also be derived by marginalizing over all the combinations of mixing angles allowed by neutrino data.For three or more HNLs, we expect most of the above results to remain valid, with the notable exception of the marginalized limits, which become much weaker or even non-existent due to the significantly weaker constraints from neutrino data [65,66].
For experimental results to be useful for constraining a wide range of model and parameters, it is therefore desirable to cast them into a form which allows them to be easily reinterpreted, bearing in mind that the main "drivers" for such interpretations -theorists -are typically unfamiliar with the inner workings of the experiment.Below we outline a concrete proposal for reporting these results in the case of heavy neutral leptons, that would allow for an easy reinterpretation of the exclusion limits.

Wish-list for a painless reinterpretation of future experimental results
The LHC collaborations typically conduct searches in terms of simplified models.Theorists, on the other hand, investigate models which address some of the shortcomings of the SM.Those are typically more complicated, and it is therefore necessary to reinterpret the search results in order to test them.In order to facilitate this reinterpretation, one would greatly benefit from the following data being reported alongside the analysis (see also the recommendations in refs.[64,[112][113][114]): • The observed bin counts.
• The various efficiencies needed to evaluate the signal using the method described in sections 3.2 and 3.3, i.e.: -The prompt efficiency 0 P,b for every process P (as defined above, see footnote 8 on page 9) and every bin b in all signal regions.In simple cases there is a one-to-one correspondence between a Feynman diagram and a process P , as in the charge-current decays considered in this paper.Ideally, all possible processes contributing to the search signature should be included.In the present case this would mean: (a) single and mixed flavor processes; (b) LNV and LNC processes; (c) processes mediated by charge currents, neutral currents and by their interference.
-If the parametrization in eq.(3.6) (or a modification thereof) allows reproducing the actual efficiency even approximately, report the relevant parameters such as the lifetime cutoff τ 0 in our case.This slightly differs from the recommendations of the LHC Reinterpretation Forum [64], which advocates for releasing the object-level efficiencies in order to enable more general reinterpretations.Since the scope of the present reinterpretation is restricted to HNL models, those are not needed, and instead the signal can be more easily and accurately estimated using the simplified signal extrapolation method presented in sections 3.2 and 3.3.26However, we agree with their recommendation to (among many other things) break down the efficiencies for each signal region (or bin b), each topology or final state, and each particle lifetime τ .This directly corresponds to our P,b (τ ) if we include neutrinos in the final state P (which we called "process" to avoid confusion with the visible final state often used by experiments).As an example of how to report these per-process, per-bin efficiencies, appendix A.3 describes the JSON files containing the efficiencies computed using our simplified cut flow.A similar layout could be used to report the actual signal efficiencies from the experiment.
• For the background it is important to release the likelihood function.This can be either: -The "full" likelihood, including every background component and nuisance parameter used in the analysis (to the extent that this is possible).This can be done using tools such as HistFactory [112,115] or pyhf [116].-A simplified likelihood, containing only the dominant background components and nuisance parameters (see e.g.ref. [117] or the simplify [118] package).-The covariance matrix of the background [117], for all the signal bins, across all signal regions (since they need to be fitted together when considering non-trivial models with e.g. both electron and muon mixing).This is in line with the recommendations from the LHC Reinterpretation Forum [64].Finally, to ensure that the reported likelihood is accurate enough for performing a reinterpretation, it is important to validate it, e.g. by comparing the resulting limits with those obtained using the full analysis.To go further and to recast the analysis to a different class of models, which include Feynman diagrams not initially considered, one needs to be able to re-implement the cut flow, rather than use the efficiencies themselves.This requires knowing the efficiency maps for non-trivial cuts such as ID and isolation (as a function of both p T and η).These maps should be conditional on the cuts which appear before them in the cut flow, i.e. they should be computed after applying the cuts appearing before them.This is in line with the recommendation of the LHC Reinterpretation Forum to report analysis-specific efficiencies [64].

Figure 2 :
Figure 2: Ternary plot showing the combinations of mixing angles U 2α /U 2 tot , α = e, µ, τ , which are consistent with the NuFIT 5.0[92,93] fit to neutrino oscillation data, at the 1, 2 and 3σ levels, for the normal and inverted hierarchies.The markers denote the selected benchmark points, which are meant to represent both typical and extreme ratios of the squared mixing angles.

Figure 5 :
Figure 5: Binned and unbinned efficiencies as a function of the HNL lifetime τ N , for the process W + → e + (N → e + µ − νµ ) with M N = 30 GeV.The dots represent the efficiencies calculated explicitly, while the lines correspond to the fitted model.Error bars denote an estimate of the statistical uncertainties from the finite size of the Monte-Carlo sample.

Figure 6 :
Figure 6:Comparison of the limits obtained using our simplified statistical model with the ones observed by ATLAS, using the exact same dataset (i.e.event counts, total background and expected signal).

Figure 8 :
Figure 8: Same as figure7, but for a Dirac-like HNL pair.The single-flavor mixing limits are grayed out because this search has no sensitivity to the Dirac-like case under this assumption; instead the limits for the Majorana-like case are given for comparison.

Figure 10 :Figure 11 :
Figure 10: Same as figure 9, for a Majorana-like HNL pair and the inverted hierarchy.

Figure 12 :
Figure 12: Same as figure 11, for a Dirac-like HNL pair and the inverted hierarchy.

Table 2 :
Signal processes contributing to the electron channel.Up to two additional hard jets have been included in the process string, but are omitted here for brevity.

Table 6 :
[1]kground in 5 invariant mass bins (rows) for the searches in e ± e ± µ ∓ and µ ± µ ± e ∓ channels correspondingly.The values have been digitized from Figure5in[1].Only the total background expectation (without the individual contributions) is shown.