Lepton Flavor Violating Dilepton Dijet Signatures from Sterile Neutrinos at Proton Colliders

In this article we investigate the prospects of searching for sterile neutrinos in lowscale seesaw scenarios via the lepton flavour violating (but lepton number conserving) dilepton dijet signature. In our study, we focus on the final state $e^\pm \mu^\mp jj$ at the HL-LHC and the FCC-hh (or the SppC). We perform a multivariate analysis at the detector level including the dominant SM backgrounds from di-top, di-boson, and tri-boson. Under the assumption of the active-sterile neutrino mixings $|V_{ l N}|^2=|\theta_e|^2=|\theta_\mu|^2$ and $|V_{ \tau N}|^2 = |\theta_\tau|^2=0$, the sensitivities on the signal production cross section times branching ratio $\sigma(p p \to l^\pm N)\times {\rm BR} (N \to l^{ \mp} jj)$ and on $|V_{ l N}|^2$ for sterile neutrino mass $M_N$ between 200 and 1000 GeV are derived. For the benchmark $M_N=500$ GeV, when ignoring systematic uncertainties at the HL-LHC (FCC-hh/SppC) with 3 (20) ${\rm ab}^{-1}$ luminosity, the resulting 2-$\sigma$ limits on $|V_{ l N}|^2$ are $4.9\times 10^{-3}$ ($7.0\times 10^{-5}$), while the 2 -$\sigma$ limit on $\sigma \times {\rm BR}$ are $4.4\times10^{-2}$ ($1.6\times10^{-2}$) fb, respectively. The effect of the systematic uncertainty is also studied and found to be important for sterile neutrinos with smaller masses. We also comment on searches with $\tau^\pm \mu^\mp jj$ and $\tau^\pm e^\mp jj$ final states.


Introduction
The observation of neutrino oscillations provides evidence that at least two of the involved neutrinos are massive. The absolute mass scale of the light neutrino masses has not been measured but it is bound to lie below about 0.2 eV from neutrinoless double beta decay experiments and cosmological constraints, see for instance Ref. [1,2] for recent reviews. The origin of the neutrino masses is a prominent puzzle of today's elementary particle physics, since it is not possible within the Standard Model (SM) to account for it in a renormalisable way. Thus neutrino oscillations are evidence from the laboratory for physics beyond the SM.
In the following, we shall focus on the class of SM extensions with neutral fermions, which are gauge singlets and therefore often referred to as "sterile" neutrinos, and can provide mass terms for the light neutrinos to explain the observed oscillations. In particular, the addition of sterile neutrinos allows for a Majorana-type mass term as well as for Diractype masses via Yukawa couplings with the SM active neutrino fields. The sterile and active neutrinos mix when the electroweak symmetry is broken, resulting in light and heavy mass eigenstates. This mass generating mechanism goes by the name of type-I seesaw and is highly searched for by the particle physics community, cf. e.g. Refs. [3][4][5]. Prominent signatures are the likes of neutrinoless double beta decay and same-sign dilepton searches at proton colliders. Furthermore, this class of models can give an explanation for the observed baryon asymmetry of our universe via leptogenesis and of dark matter, for a recent review see e.g. Ref. [6] and references therein.
In type-I seesaw, one often assumes either tiny neutrino Yukawa couplings or a very high mass scale for the heavy neutrinos in order to explain the smallness of the light neutrinos' masses. This assumption makes it, however, nearly impossible to produce these particles at collider experiments.
Alternatively, one may impose a protective ("lepton number"-like) symmetry, where a slight breaking from this symmetry is responsible for the small mass of the light neutrinos. Various types of symmetry protected seesaw models have been constructed in the literature, cf. for instance [7][8][9][10][11][12]. In this framework neither tiny neutrino Yukawa couplings nor large masses for the heavy neutrinos are required to explain the smallness of the light neutrino masses. Thus heavy neutrinos with masses around the electroweak scale with unsuppressed Yukawa couplings (and thus unsuppressed active-sterile neutrino mixings) are possible, and their effects can be studied at colliders (cf. Ref. [13] for an overview).
Regardless of the underlying model, especially at proton colliders the signatures from sterile neutrinos are often hidden behind comparably enormous rates of SM background for most processes. There are a few processes at high-energy colliders where the background does not pose an unsurmountable problem, the most prominent ones being the lepton number violating (LNV) same sign dilepton ± α ± α final states in the dilepton-dijet channel. However the signal strength of this type of signature is suppressed together with the LNV by the smallness of the neutrino masses, as discussed for instance in Refs. [10,[14][15][16].
On the other hand, as was suggested in Ref. [16], the lepton flavour violating (LFV) (but lepton number conserving (LNC)) dilepton signature, with the final state ± α ∓ β (α = β) (and to some extent also the LFV trilepton signature) has reducible background only while its signal strength is unsuppressed by the light neutrino masses.
Previous collider studies have focused mostly on same-sign dileptons for the LHC, e.g. [17][18][19][20][21][22][23][24][25]. Some studies of this channel can also be found for future accelerators such as the Future Circular Collider (FCC) [26]. Also the trilepton channel has gotten attention recently and triggered some studies of LHC discovery prospects [27][28][29]. Very little attention has been given to the LFV (but LNC) dilepton-dijet channel so far, despite the promising sensitivity obtained from a "first look" at the parton level in Refs. [16,30].
The goal of this article is therefore to present a thorough investigation of the LFV (but LNC) dilepton-dijet channel as a signature from sterile neutrino extensions of the Standard Model, especially the e ± µ ∓ jj final state. Our study goes beyond previous works by discussing relevant backgrounds, performing a fast simulation of the detector response for the signal and background, applying multivariate analysis techniques to separate the signal from the background, as well as including a discussion for the statistical and systematic errors. We provide sensitivities not only for the high-luminosity Large Hadron Collider (HL-LHC) but also for the FCC in the hadron colliding mode (FCC-hh). Our results are also applicable to the Super proton-proton Collider (SppC) [31] depending of course on the final design and the corresponding detector performance.
The article is organized as follows. In section 2, we briefly describe the theory model we used. In section 3, we present the search strategy for LFV dilepton-dijet signals from heavy sterile neutrinos. The results at HL-LHC and FCC-hh are shown in section 4. We conclude in section 5.

The Theory Model
We use a specific realisation which captures the relevant features of the symmetry protected seesaw models for the collider phenomenology as our benchmark model. This realisation involves two heavy neutrinos that supposes a "lepton number"-like symmetry (an extended version of the usual lepton number), which can be found in e.g. [32]. For this collider study it is sufficient to focus on the limit of intact protective symmetry, i.e. symmetry limit, since the signal is lepton number conserving and the light neutrino masses are for collider purposes effectively zero, see below. 1 The benchmark model includes one pair of sterile neutrinos N 1 R and N 2 R which are relevant for the collider phenomenology. The resulting Lagrangian density is given by: where L SM contains the usual SM field content and with L α , (α = e, µ, τ ), and φ being the lepton and Higgs doublets, respectively, y να are the complex-valued neutrino Yukawa couplings, and M N the sterile neutrino mass. The ellipses indicate terms for additional sterile neutrinos which are decoupled from collider phenomenology. The symmetric mass matrix M of the active and sterile neutrinos is obtained from Eq. (2.1) after electroweak symmetry breaking L contains It can be diagonalized by the unitary 5 × 5 leptonic mixing matrix U : The mass eigenstatesñ j = (ν 1 , ν 2 , ν 3 , N 4 , N 5 ) T j = U † jα n α are the three light neutrinos, which are massless in the symmetry limit, and two heavy neutrinos with degenerate mass eigenvalues M N in the symmetric limit. The leptonic mixing matrix U in Eq. (2.2) can be expressed explicitly, cf. [32]. Its entries are governed by the active-sterile neutrino mixing angles which are quantified via with v EW = 246.22 GeV the vacuum expectation value of the Higgs field. Since the light and heavy neutrino mass eigenstates are admixtures of the active and sterile neutrinos, the weak currents, cast into the mass basis, are given by with U the leptonic mixing matrix, g being the weak coupling constant, c W the cosine of the Weinberg angle and P L = 1 2 (1 − γ 5 ) the left-chiral projection operator. The resulting heavy neutrino interactions can be summarized as with h = √ 2 Re(φ 0 ) being the real scalar Higgs boson and ϑ ij = α=e,µ,τ U † iα U αj . In the limit of the protective symmetry being exact, the benchmark model adds seven parameters to the SM, the moduli of the neutrino Yukawa couplings (|y νe |, |y νµ |, |y ντ |), their respective phase, or equivalently, the active-sterile mixing angles from Eq. (2.3), and the mass M N . The phases may be accessible in neutrino oscillation experiments (see e.g. [33,34]). We restrict ourselves to the four parameters |θ e |, |θ µ |, |θ τ | and M N . In the following, we also use the neutrino mixing matrix elements |V αN | 2 to present our results, which are commonly used in the literature to quantify the active-sterile neutrino mixing. For a fixed flavor α (usually identified via the charged lepton l α ) this notation relates to the one introduced above in the following way: (2.9)

Search Strategy
Proton colliders provide an environment where the SM can be tested at highest centerof-mass energies. For what follows we consider the HL-LHC with 14 TeV center-of-mass energy and a total integrated luminosity of 3 ab −1 [35]. We also consider the discussed FCC-hh [36][37][38] and the SppC [31], with envisaged center-of-mass energies of 100 TeV and target integrated luminosities of around 20 ab −1 [39]. For brevity, we will only refer to the FCC-hh in the following.

Signal: Mixed-flavor Dilepton Plus Jets from Heavy Neutrinos
Heavy neutrinos can be produced from proton-proton collisions via Drell-Yan processes, Higgs boson decays, and gauge boson fusion, cf. [40][41][42]. We focus here on charged current Drell-Yan production of a heavy neutrino with an associated charged lepton yielding pp → ± α N , cf. Fig. 1. It is the dominant production mechanism for heavy neutrino masses around the electroweak scale and the considered center-of-mass energies. It is worth noting that the W γ fusion production process is the next most important process in the mass range from 200 GeV to ∼1 TeV, and it becomes more important and even surpasses the charged current Drell-Yan production for larger heavy neutrino masses [26]. The contribution from W γ adds about 20∼30% to the LO cross section, cf. Ref. [26]. Due to its limited enhancement on Figure 1: The Feynman diagram depicting the dominant signal production mechanism for heavy neutrino masses and center-of-mass energies as considered in this article.
the final discovery limits for the here considered mass range, the W γ contributions to the signal are not considered in this study.
As shown in Fig. 1, the charged current decays of the Drell-Yan produced heavy neutrinos together with the hadronic decay of the final state W boson yield the semileptonic final state ± α ∓ β jj. To discriminate between these two final state leptons, we label the lepton from the Drell-Yan off-shell W * ± as l ± W * (i.e. l ± α or l ± ), while the lepton from the heavy neutrino as l ∓ N (i.e. l ∓ β or l ∓ ). We note that for the signal these two leptons can have different flavors. The event rate is sensitive to the mixing angle combination of |θ α | 2 and |θ β | 2 /|θ| 2 through the production and decay channel, respectively. Here the flavor indices α, β = e, µ, τ can be inferred from the charged leptons. For α = β, this final state yields a signal for lepton flavour violation, because there is no SM background process at the parton level as discussed in Refs. [16,30]. We emphasize that we study the LNC process with leptons of opposite charge since there the signal strength is not suppressed by the smallness of the neutrino masses. 2 The signal for our study is e ± µ ∓ jj with α = e (µ) and β = µ (e), which tests the mixing angle combination |θ e θ µ | 2 /|θ| 2 or equivalently, |V eN V µN | 2 / α |V αN | 2 . For practical reasons we make the following assumption and discuss the special case for the active-sterile mixing angles: The results derived below are valid for this case only, but they can be translated to any of the other possible set of active-sterile mixing angles with a numerical overall factor.
In Fig. 2, we show the production cross section times branching ratio σ(pp → l ± N ) × BR(N → l ∓ jj) in fb at the HL-LHC and FCC-hh when |V lN | 2 = 10 −2 and |V τ N | = 0. We note that here l = e, µ. Besides the mixed flavor lepton pair e ± µ ∓ , the cross sections in this figure also include the production of the same flavor lepton pairs e + e − and µ + µ − . The FCC-hh HL-LHC Figure 2: Production cross section times branching ratio σ(pp → l ± N ) × BR(N → l ∓ jj) in fb for heavy neutrino mass eigenstates via the Drell-Yan processes pp → W * → ± N → ± ∓ jj at leading order. Here, l = e, µ and the cross section includes di-leptons with all flavor combinations (i.e., e ± µ ∓ , e + e − , and µ + µ − ). The active-sterile mixings are fixed as cross sections for a few mass points can be also calculated from the initial number of events in Table 1 and 2.
It is worth noting that the signal process may feature two jets with an invariant mass around the W boson mass with possible further hadronic activity. We remark that in scenarios where the heavy neutrino mass is large its decay products can be strongly boosted, such that the hadronic decays of the W bosons may be collimated, giving rise to a single jet instead of two.

Standard Model Backgrounds
The dominant SM backgrounds contributing to the e ± µ ∓ jj signature arise for instance from the di-lepton final state with additional missing momentum due to processes with light neutrinos, or from the di-tau final state with both tau's decaying leptonically. In principle, these backgrounds can be rejected with high signal efficiency by requiring the amount of missing energy in the final state to be small. However, due to effects like the finite resolution of the missing momentum, some backgrounds may still survive after such cuts. Thus, we expect that a full detector simulation, which is beyond the scope of the present analysis, can be important.
The background processes considered in our analysis are 1. di-top in fully leptonical decays: , where both l can be either e or µ ; 2. di-boson with di-tau di-jet final states: 3. tri-boson with at least 2 jets and at least 2 leptons (including taus): For the tri-boson, if both taus decay leptonically, the final state will have 3 leptons. When one lepton is out of the detector range or mis-identified, it can still contribute to the backgrounds. The other decay channels of W or Z bosons will either have no e ± µ ∓ final states or are lacking of jets. Therefore, they are not included in our analysis. The production cross sections corresponding to tt, W Z, ZZ, W W Z with decaying into the final states listed above are about 3432 (1.37 × 10 5 ), 1787 (5654), 468 (4483), 6.83 (95.5) fb at the HL-LHC (FCC-hh), respectively.
Furthermore, we checked many other possible background processes, including for instance all the processes listed above with one additional gluon jet or photon (γ) in the final state, and also the processes V V gg , γµµV with V = Z, W , and γµνW . We used an estimated rate of misidentifying γ, g as an electron at FCC-hh of ∼ 10 −3 , comparable to the one at the LHC. We found that especially the requirement of large transverse momenta of the g, γ, renders the cross sections of these processes much smaller than the ones listed above, and we decided not to include them into our analysis.

Simulation, Pre-selection and Analysis
For the simulation of signal and background samples, we use MadGraph5 version 2.4.3 [46] as the event generator. The parton shower and hadronization is done by Pythia6 [47], while the detector simulations are completed by Delphes [48] with the ATLAS configuration card file (version 3.4.1) for the HL-LHC and with the FCC-hh configuration card file (October 2016 version) for the FCC-hh.
Based on the kinematics of the signal and backgrounds, in order to generate the events more effectively, we apply the following cuts at the simulation level: a minimal transverse momentum p T (j) > 20 GeV, p T (l) > 20 GeV and the range of the pseudorapidity |η(j)| < 10, |η(l)| < 7 for jets (including b-jets) and leptons; a maximal missing energy E T < 30 GeV. The cuts on |η| do not affect the analysis because the detector geometry limits this range to be |η| 5. The cut on the missing energy are motivated from the prior knowledge that the signal does not produce missing energy at the parton level and only a limited amount can be created during reconstruction [16]. These cuts at the parton level enhance the quality of the background events and thus save the simulation time.
The following pre-selection cuts are then applied on the simulation events: 1. Exactly 1 muon, exactly 1 electron, with opposite charges (i.e. e ± µ ∓ ); at least 2 jets; no b-jet and no taus; 2. Both jets and leptons with threshold cuts of p T > 30 GeV; 3. Missing energy E T < 20 GeV.
After the pre-selection cuts, the final state will have at least 2 light jets, 1 muon and 1 electron. The first two leading jets j 1 and j 2 are considered to be the jets from the final state W decay (see Fig. 1). To identify the lepton l N from the sterile neutrino decay, we combine the first two leading jets with each lepton and calculate the invariant masses corresponding to two combinations. The combination with invariant mass closer to the sterile neutrino mass indicates l N , while the other lepton will be identified as the lepton l W * from the off-shell W * decay.
The details of the multivariate and statistical analysis are explained in the Appendix A.

Results
In this section, we present the analysis results for the HL-LHC and for the 100 TeV proton collider FCC-hh, which is also valid for the SppC with the same detector performance. We remind ourselves that the HL-LHC (FCC-hh) has center-of-mass energy √ s = 14 (100) TeV and that we consider a total integrated luminosity of 3 (20) ab −1 .

Results at HL-LHC and FCC-hh
To illustrate our results, we show the distributions of some selected observables after applying the pre-selection cuts for the signal with benchmark mass M N = 500 GeV (S, black with filled area), and SM backgrounds of tt (red), WZ (blue), ZZ (cyan), and WWZ (green) in Appendix B. The Fig. 8 and Fig. 9 are for the HL-LHC and FCC-hh, respectively. The distributions of the M (j 1 + j 2 + l N ) of the signal peaks sharply around the sterile neutrino mass 500 GeV, while all backgrounds peak below 250 GeV; In the di-jet invariant mass M (j 1 + j 2 ) plot, the signal and WZ peaks around the W boson mass, while ZZ and WWZ peak around the Z boson mass and tt has a flat peak around 110 GeV; In the di-lepton invariant mass M (l W * + l N ) plot, the backgrounds WZ and ZZ peak around 70 GeV, and tt and WWZ peak around 100 GeV, while the signal has a very flat peak around 400 GeV; For the distributions of M (j 1 + j 2 + l N + l W * ), p T (l N ), p T (j 1 + j 2 ) and p vis T , the signal peaks at larger values compared to the backgrounds. Other useful distributions to distinguish signal from background exist, for instance E T and angular observables, which we list in section 3.3.
As described in section 3.3, all the 40 observables listed in that section are input into the TMVA. We utilize the Boosted Decision Trees (BDT) method to perform the multivariate analysis. The distributions of the BDT response for the signal with M N = 500 GeV (S, black with filled area), and for the SM backgrounds including tt (red), WZ (blue), ZZ (cyan), and WWZ (green) are shown in Fig. 3  response shows that a very good separation between the signal and background is possible. For WWZ background process, although it has larger mixing with the signal, due to its relatively small initial production cross section, its final contributions to the backgrounds after the optimized BDT cut are still limited, cf.  Table 2: Numbers of events at each cut stage for signals with fixed |V lN | 2 = 10 −2 and different sterile neutrino masses MN and for background processes. The numbers correspond to an integrated luminosity of 20 ab −1 at the FCC-hh.
In Table 1, we show the numbers of events at each cut stage for signals with |V lN | 2 = 10 −2 and different sterile neutrino masses M N and for background processes of tt, WZ, ZZ, and WWZ at the HL-LHC with 3 ab −1 integrated luminosity. The numbers of events at the FCC-hh with 20 ab −1 integrated luminosity are presented in Table 2. Since the kinematical distributions vary with M N , the BDT cuts are optimized for different masses.
Based on our analysis, the prospects for sterile neutrino searches via the opposite sign mixed-flavor dilepton plus di-jet (i.e. e ± µ ∓ jj) including a systematic uncertainty of δ sys = 10% on the background are derived, using the Higgs Analysis-Combined Limit tool [50], for details see the explanations in the Appendix A. In Fig. 4, we show the expected limit on the production cross section times branching ratio σ(pp → l ± N ) × BR(N → l ∓ jj) in fb when testing the signal hypothesis at the HL-LHC (left) with √ s = 14 TeV and 3  FCC−hh 1σ CL 2σ CL 3σ CL 5σ CL Figure 5: Same as Fig. 4, including the 1, 2, 3 and 5-σ median expected limits on the production cross section times branching ratio σ(pp → l ± N ) × BR(N → l ∓ jj) in fb at the HL-LHC (left) with 3 ab −1 luminosity and at the FCC-hh (right) with √ s = 100 TeV and 20 ab −1 luminosity. In both panels the solid (dashed) line denotes that a 10% (0%) systematic uncertainty on the background is considered.
In Fig. 5, we show the 1, 2, 3 and 5-σ median expected limits on the production cross section times branching ratio σ(pp → l ± N ) × BR(N → l ∓ jj) in fb at the HL-LHC (left) with √ s = 14 TeV and 3 ab −1 luminosity and at the FCC-hh (right) with √ s = 100 TeV and 20 ab −1 luminosity. In this figure, the solid (dashed) line denotes that 10% (0%) systematic uncertainty on the background is considered. Comparing the solid and dashed curves, one can see that as sterile neutrino mass M N decreases, the effects of the systematic uncertainty on the background become more obvious. This is because that the number of background events after the BDT cut will increase as M N decreases (cf. Table 1 and Table 2). When M N = 500 GeV, with 0% systematic uncertainty on background, the 2 (5)-σ limit on the σ × BR is 4.4 × 10 −2 (1.5 × 10 −1 ) fb at the HL-LHC, while it is 1.6 × 10 −2 (4.3 × 10 −2 ) fb at the FCC-hh. Using the assumption in Eq. (3.1) for the active-sterile mixing angles, we can convert the limits from Fig. 5 into limits on |V lN | 2 , cf. the definition in Eq. (2.9). We show the resulting expected median limit on the total active-sterile mixing |V lN | 2 in Fig. 6 for the HL-LHC (left) with √ s = 14 TeV and 3 ab −1 and at the FCC-hh (right) with √ s = 100 TeV and 20 ab −1 , including the 1 and 2σ confidence interval. When extracting these limits, a systematic uncertainty of 10% on the background has been considered. It is worthwhile to point out that these results are quantitatively close to the first estimates from Ref. [16].  Figure 7: Same as Fig. 6, including the 1, 2, 3 and 5-σ median expected limits on the parameter |V lN | 2 for |V lN | 2 = |VeN | 2 = |VµN | 2 and |VτN | 2 = 0, at the HL-LHC (left) with 3 ab −1 luminosity and at the FCC-hh (right) with √ s = 100 TeV and 20 ab −1 luminosity. In both panels the solid (dashed) line denotes that a 10% (0%) systematic uncertainty on the background is considered.
In Fig. 7, we show the 1, 2, 3 and 5-σ median expected limits on the total active-sterile mixing squared |V lN | 2 for the HL-LHC (left panel) and the FCC-hh (right panel), including a systematic uncertainty of 0% (dashed) and 10% (solid) on the background. Comparing the solid and dashed curves, one can see that at the HL-LHC (FCC-hh), when M N < 400 (600) GeV, the effects of 10% systematic uncertainty on the background become visible. For 200 GeV mass point, due to much larger background events after the BDT cut, the systematic uncertainty can weaken the limits greatly. Therefore, to enhance the discovery power for sterile neutrino with small masses, controlling the systematic uncertainty at such future colliders will be very important. When M N = 500 GeV, with 0% systematic uncertainty on the background, the 2 (5)-σ limit on the |V lN | 2 are 4.9 × 10 −3 (1.7 × 10 −2 ) at the HL-LHC, while it is 7.0 × 10 −5 (1.9 × 10 −4 ) at the FCC-hh.

Discussion
We note that our results on the sensitivity of the proton-proton colliders are qualitatively identical to those in Ref. [16]. Moreover, the sensitivity is comparable to the analyses that consider lepton-number violating final states, cf. e.g. [51].
An important low energy constraint exists, that also tests the here considered activesterile mixings: the µ → eγ measurement from the Mu to E Gamma (MEG) collaboration. Via the null result in their searches for the process µ → eγ [52] they put stringent limits on the combination |V eN V µN | (which is equal to |θ * e θ µ |) [32,53]. Indeed, finding a signal in the e ± µ ∓ jj channel at the HL-LHC or the FCC would be in tension with the present constraints from MEG.
It is interesting to consider the LFV dilepton dijet signature with one tau lepton in the final state. The relevant active-sterile mixing parameters that are tested in this way are then |V eN V τ N | and |V µN V τ N |, respectively. The present constraints on these mixing angle combinations are much weaker compared to those derived from the MEG result, cf. e.g. Ref. [53] and references therein. This could mean a great discovery potential in these channels, if our results for the sensitivities would also hold, at least approximately, for the e ± τ ∓ jj and µ ± τ ∓ jj final states.
However, including the tau flavor necessitates to reconstruct the tau lepton either from a muon or an electron, which requires the finding of a non-vanishing impact parameter and inserts additional missing momentum from the neutrino associated to a tau decay. More promising is the reconstruction of a tau from its hadronic decays, which, on the other hand, introduces many additional backgrounds involving heavy quarks.
Heavy neutrinos with masses above 1 TeV are produced dominantly via the W γ fusion processes. The kinematics of the final state particles is very similar to the ones studied in our analysis. We therefore assume, that for M > 1 TeV, the sensitivity via W γ fusion becomes better compared to our results, such that the latter comprise a conservative limit on these parameters.

Conclusions
Low scale seesaw scenarios allow for large active-sterile neutrino mixings and heavy neutrinos with masses that are kinematically accessible at particle colliders. Due to the approximate symmetry, lepton number violation is suppressed in these scenarios, which motivates the study of lepton flavour violating (LFV) but lepton number conserving (LNC) signal channels.
In this article we investigated the most promising sterile neutrino signature of this type, based on parton level studies from previous works, the LFV but LNC final state e ± µ ∓ jj. This final state does not have SM backgrounds at the parton level, such that the signal and backgrounds can be well separated via a thorough analysis of the distributions from a number of kinematic observables.
For the active-sterile neutrino mixings we assumed, for definiteness, |V lN | 2 = |θ e | 2 = |θ µ | 2 and |V τ N | 2 = |θ τ | 2 = 0. We remark that the e ± µ ∓ jj signature is sensitive to |V eN V µN | 2 / α |V αN | 2 , which means that it is suppressed if one of the two active-sterile mixing parameters is much larger than the other ones, while the signal rate is maximal for the case we assumed (for fixed α |V αN | 2 ).
We considered the HL-LHC (FCC-hh/SppC) with √ s = 14 (100) TeV and a total integrated luminosity of 3 (20) ab −1 . We simulated large event samples for the signal and for the dominant SM backgrounds processes (di-top, di-boson, and tri-boson) including parton shower, hadronization and fast detector simulation. Forty kinematic observables are constructed from each event and are fed into a multivariate analysis tool to perform a BDT analysis. We derive the 1, 2, 3, and 5-σ limits on the production cross section times branching ratio σ(pp → l ± N ) × BR(N → l ∓ jj), and recast it as a limit on the active-sterile mixing parameter |V lN | 2 . The result is comparable to the previous estimates obtained in Ref. [16], but more robust.
It is worth noting that the systematic uncertainties affect smaller heavy neutrino masses more than larger ones. In particular, this effect is relevant when M N < 400 (600) GeV at the HL-LHC (FCC-hh). For 200 GeV mass, the limits can be weakened greatly by adding a 10% systematic uncertainty on the background. Therefore, controlling the systematic uncertainty at the future pp colliders will be very important to enhance the discovery power for sterile neutrinos with small masses.
The results presented here can also be representative for final states with the τ flavor. In this case, additional backgrounds have to be included, and the difficulty of reconstructing the tau lepton has to be taken into account. Consequently, we expect the sensitivities of the LNC-LFV τ ± µ ∓ jj and τ ± e ∓ jj final states to be weaker. However, also the present constraints on the combinations |V eN V τ N | and |V µN V τ N | are much weaker compared to those from MEG on |V eN V µN |. The τ ± µ ∓ jj and τ ± e ∓ jj channels could therefore mean great discovery potential, but require a dedicated analysis which is left for future studies.

Acknowledgments
We thank the members of the FCC study collaboration for many useful discussions. K.W. also wants to thank Christophe Grojean and Cai-dian Lü for helpful discussions and their support. This work has been supported by the Swiss National Science Foundation. K.W. acknowledges support from the International Postdoctoral Exchange Fellowship Program (No.90 Document of OCPC, 2015). O.F. acknowledges support from the "Fund for promoting young academic talent" from the University of Basel under the internal reference number DPA2354 and has received funding from the European Unions Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 674896 (Elusives).

A Multivariate and Statistical Analysis
In this section we describe our setup of the Multivariate analysis (MVA), which is a statistical analysis of large data sets based on machine learning techniques to discriminate between two sets of data. Here we use the Tool for MultiVariate Analysis (TMVA) [49], employing the Boosted Decision Tree (BDT).
We perform a frequentist test which uses the profile Likelihood ratio as test statistics. In addition to the parameters of interest for the limit calculation such as, the total cross section of the process and the integrated luminosity, we include nuisance parameters for background of 10% to account for the unknown systematics at future colliders, assuming a logarithmic-normal distribution.
We construct an upper expected limit for the signal with upper/lower one and two sigma error bands using Higgs Analysis-Combined Limit tool [50]. The limits can be set via the level of agreement between the data collected and a given hypothesis by computing the probability of finding the observed data incompatible with the prediction for a given hypothesis, this probability is referred to as the p-value.
The expected value of finding the number of events in the ith bin of the BDT distribution is given by where the parameter µ is called the signal strength. When a hypothesis with µ = 0 is rejected a discovery can be established, while rejecting the hypothesis with µ = 1 defines our limit for the calculation. The likelihood function is constructed as Poisson probabilities for all bins as: The profile likelihood ratio can be constructed by the Maximum-Likelihood Estimate (MLE) as: λ(µ) = L(µ,θ) L(μ,θ) (A. 3) withθ andμ are the estimated parameters for θ and µ that maximize the likelihood function, i.e., for a given µ and pseudo data atθ, the combinedμ withθ define the point for which the likelihood reaches its global maximum. The fact that the profile likelihood ratio depends on systematical errors broaden the estimate of the maximum likelihood, thus large systematical errors lead to weaker limits. For the statistical test one can construct the profile log likelihood as: q(µ) = −2 ln λ(µ) (A.4) To measure the level of incompatibility we compute p−value as: with F [q(µ)|µ] being the probability distribution function that measures the incompatibility between data and our hypothesis, while higher values of q(µ) correspond to high disagreement between data and hypothesis. The signal is excluded at (1 − α) confidence level if CL s = P (q(µ)|µS + B) P (q(µ)|B) < α, (A.6) where the upper limit on µ is the largest value for µ with P < α, i.e., if α = 0.05 then the signal is excluded with 95% confidence level. Thus one can simply get the upper exclusion limit at different confidence levels by withμ being the estimated expected median and Φ −1 being a cumulative distribution function. We use the following confidence levels: In fact if we restrict the number of events for the signal and the background to be large and ignore the correlation effect between bins, one can calculate the limit from the following formula for the significance with N s , N b being the number of signal and background events, respectively, and σ b parametrising the systematic uncertainty. Figure 8: Kinematic distributions of some selected observables for the signal with MN = 500 GeV (S, black with filled area), and for SM background processes of tt (red), WZ (blue), ZZ (cyan), and WWZ (green) after applying the pre-selection cuts at the HL-LHC. Figure 9: Kinematic distributions of some selected observables for the signal with MN = 500 GeV (S, black with filled area), and for SM background processes of tt (red), WZ (blue), ZZ (cyan), and WWZ (green) after applying the pre-selection cuts at the FCC-hh.