Deep learning searches for vector-like leptons at the LHC and electron/muon colliders

The discovery potential of both singlet and doublet vector-like leptons (VLLs) at the Large Hadron Collider (LHC), as well as at the not-so-far future muon and electron machines, is explored. The focus is on a single production channel for LHC direct searches, while double production signatures are proposed for the leptonic colliders. A Deep Learning algorithm is employed to determine the discovery (or exclusion) statistical significance at the LHC. While doublet VLLs can be probed up to masses of 1 TeV, their singlet counterparts have very low cross sections and can hardly be tested beyond a few hundred GeV at the LHC. This motivates a physics-case analysis in the context of leptonic colliders, where one obtains larger cross sections in VLL double production channels, allowing one to probe higher mass regimes otherwise inaccessible even to the LHC high-luminosity upgrade.

I. INTRODUCTION
The Standard Model (SM) of particle physics has, so far, served as the guide towards unraveling the nature of all subatomic phenomena and its success is now indisputable. However, it is clear that it lacks some of the necessary ingredients to fully describe nature, such as an explanation for neutrino masses, as first observed in neutrino oscillations by experiments like Super-Kamiokande [1], or a particle candidate to explain the experimental evidence for the existence of Dark Matter (DM) [2]. Recently, other discrepancies have also come to the forefront, such as the measurement of the anomalous magnetic moment of the muon at Fermilab [3], indicating a tension of 4.2σ with the SM predictions, as well as the recent results from the LHCb experiment, where hints of lepton flavour universality violation have been reported at a 3.1σ significance [4]. Naturally, such deviations do not meet the 5σ requirement to indicate the existence of new physics (NP), and more data still need to be collected to confirm them. If the experimental deviations are confirmed then, together with the theoretical calculations in the framework of the SM, an explanation of these anomalies requires the addition of NP.
Finding well-motivated NP scenarios that simultaneously address all the aforementioned questions is of utmost importance and, in particular, Grand Unified Theories (GUTs) provide interesting avenues to follow. These classes of models typically predict new particles at the TeV scale (or not too far from it) that may be probed at current and future colliders. Therefore, the study of simplified frameworks that can later be mapped to the low-energy limit of a certain UV-complete framework can help in constraining their viable parameter space. In particular, some of the authors have proposed a GUT framework which unifies all matter, Higgs and gauge sectors within the E_8 group, either in a conventional high-scale scenario [5,6] or in a low-scale one [7,8]. It turns out that a common prediction of all such proposals is the presence of new doublet Vector-Like Lepton (VLL) particles at the TeV scale or below. A phenomenological analysis of doublet-type VLLs in the context of a GUT model was extensively discussed by some of the authors in a previous work [9]. Naturally, such a description was bounded by the constraints and details of the underlying UV theory and, as such, the main VLL features can only be fully captured in simpler phenomenological models, without the vast particle content and constraints of GUT frameworks. Additionally, singlet-type VLLs, which are not predicted in the context of the previously considered GUT, can also be constructed and represent a significant part of this work. Studying the differences between both types of VLL is of relevance not only in the context of the LHC but also at future proposed leptonic colliders. Furthermore, if the anomaly in the measurement of the magnetic moment of the muon is confirmed, which can happen in about one year from the date this article is being written, a possible explanation can arise in the form of TeV-scale VLLs. In this context, we refer to [10][11][12][13][14][15] for further recent phenomenological studies of VLLs.
With the above arguments in mind, we study the phenomenological viability of simple extensions of the SM enlarged with a single VLL and a right-handed neutrino. For completeness, we consider two scenarios, the SU(2)_L doublet and the singlet VLLs. Due to their vector-like nature, they escape constraints from the fourth generation searches [16], while non-vanishing couplings to the SM can not only induce contributions to the muon (g − 2)_µ anomaly, but also open tightly constrained channels such as µ → eγ [17].
This article is organized as follows. In Sec. II, we introduce the simplified models, presenting the relevant theoretical details that will serve as the basis for the subsequent numerical simulations. In Sec. III we introduce the numerical methodology for the studies we conduct in this work. In particular, in Sec. III A, we discuss the impact of the main flavour constraints associated with the addition of a VLL, while in Sec. III B we delve into the collider analysis, both in the context of the LHC and of future electron (linear) or muon colliders. Finally, in Sec. IV we present our main results and conclude in Sec. V.

II. THE MODELS
In this article, we study the collider phenomenology of both SU(2)_L doublet and singlet VLL extensions of the SM accompanied by new right-handed neutrinos. We present the NP and SM fields' quantum numbers in Tabs. I and II, respectively.
Table I: Quantum numbers of the new exotic fermions. We adopt the notation of E L,R being an SU(2) doublet, while E L,R is a singlet, and ν_R is the new right-handed neutrino.
Table II: Quantum numbers for the SM fields. Here, Q_L, d_R, u_R are the quark fields, L and e_R are the lepton fields, and φ is the Higgs doublet.
The SU(2)_L doublets are defined as where i is a generation index, i = 1, 2, 3. The most general and renormalizable Lagrangian density of the doublet VLL model, consistent with the symmetries in Tab. I, reads as where the Yukawa couplings are denoted by Θ, Υ, Σ, Ω and Π, whereas bi-linear mass terms are indicated by M_E, M_LE and M_νR. For the singlet VLL case, the Lagrangian is written as where π, σ and θ are the corresponding Yukawa couplings. The mass matrices for both the singlet and the doublet scenarios, expressed in the R , e R } basis, take the following form with v = 246 GeV the electroweak symmetry breaking (EWSB) Higgs doublet VEV (0, v), and whose eigenstates describe both the chiral SM-like charged leptons as well as the new VLLs.
For the neutrino sector the mass matrices can be expressed as where the basis was used for the doublet scenario whereas that of the singlet model was chosen as In this article, one considers that the lightest beyond-the-SM (BSM) neutrino is in the keV range and sterile, acting as missing energy in the detector. Therefore, while the neutrino mass matrices above have the standard type-I seesaw form, such a light sterile neutrino implies that M_νR is equally small, which means that Σ, Υ and Ω (for the doublet scenario) as well as σ (for the singlet scenario) need to be very small in order for both models to be consistent with the active-neutrino mass scale. While a detailed discussion of the neutrino mass generation mechanism is beyond the scope of this article, it is not a difficult task to minimally extend both the doublet and singlet scenarios, e.g. by introducing the dimension-five Weinberg operator, without affecting our analysis and conclusions [18]. However, the fact that Σ, Υ, Ω and σ need to be very small calls for the introduction of an approximate global lepton number symmetry, which, for convenience, we define as U(1)_e × U(1)_µ × U(1)_τ, such that only leptons transform non-trivially according to the following quantum numbers:
Table III: Quantum numbers under the approximate lepton number symmetry.
Such an approximate symmetry constrains the model parameters as follows:
• Σ, Υ, Ω (for the doublet case) and σ (for the singlet case) must be tiny, consistent with the smallness of the active neutrino masses;
• the charged-lepton-sector Yukawa couplings Π (for the doublet model) and π (for the singlet model) are approximately diagonal, so off-diagonal elements can be ignored in our analysis;
• the VLLs couple solely to muons, such that only Θ_2, M_LE,2 (for the doublet scenario) and θ_2 (singlet scenario) are sizeable and relevant for our numerical studies.
In what follows, we are essentially interested in lepton couplings to both gauge and Higgs bosons. In particular, the focus is on the W νE(E), Z^0 E(E) and HE(E) vertices. All numerical computations are performed in the mass basis such that one needs to rotate the gauge eigenstates via the bi-unitary transformations The list of all Feynman rules that are relevant to the numerical analysis is shown in Appendix A. Before presenting the methodology behind the numerical calculations, it is instructive to revisit the main experimental constraints for VLLs as well as the most relevant couplings for this discussion. First, let us recall that there are not many direct searches for VLLs, at least in comparison with their quark counterparts. In fact, LEP searches have led to a lower bound on the VLL mass of M_VLL > 100.6 GeV [19], while the most restrictive constraints come from CMS, where doublet VLLs with sizeable couplings to taus are excluded at 95% Confidence Level (CL) in the 120-790 GeV mass range [20]. Additionally, there are searches for charged lepton resonances which constrain the mass of a hypothetical VLL to be above a few hundred GeV, depending on the underlying assumptions (114-176 GeV for a VLL and 100-468 GeV for a heavy lepton in the type-III inverse seesaw model [21], as well as 80-210 GeV in Ref. [22]). However, for the model under consideration, the [U(1)]^3 lepton number symmetry dictates that the relevant couplings are those to muons, such that the most restrictive search in [20] does not apply.
The left-mixing matrix that rotates from the flavour to the mass basis can be parametrized by a single mixing angle, α, as follows With the U(1)_e × U(1)_µ × U(1)_τ approximate symmetry the mass matrices for both the doublet and singlet cases take a simple form Besides lepton mixing terms one also considers a neutrino mixing with the following structure, where the 3 × 3 SM block is fixed by the Pontecorvo-Maki-Nakagawa-Sakata (PMNS) matrix [23]. Technically, and considering only such a block, the PMNS matrix must strictly be defined as However, the benchmark scenarios considered in Sec. IV are such that cos α ≈ 1 for both the doublet and the singlet models and therefore the parametrization in Eq. (9) is consistent with a realistic lepton mixing.
The 2 × 2 block in the doublet case represents the mixing between the ν_5 and ν_6 mass eigenstates. These elements do not contribute to the interaction vertices used in our analysis and therefore their size and numerical values are not relevant for the discussion. For consistency, we assume a generic O(1) mixing. It also follows from the approximate lepton number symmetry that the mixing between the right-handed neutrino and the remaining neutral leptons is negligible and can be ignored. This means that (U_ν)_44 = 1, (U_ν)_4j = (U_ν)_j4 = 0 and the couplings between the SM and BSM neutrinos can be neglected.
Let us note that if one requires consistency with the recently observed muon (g − 2)_µ anomaly [3], the preferred values of the cosine of the mixing angle α must not be far from unity. On the other hand, a small α (or equivalently, small Yukawa parameters θ_2 and Θ_2) is preferable in order to keep tree-level modifications to the Feynman rules for the Hµ+µ− and Z^0 µ+µ− vertices under control (the rules are listed in Appendix A). Indeed, a large mixing angle would make the values of the branching fractions BR(Z^0 → µ+µ−) and BR(H → µ+µ−) unacceptably large. Furthermore, noting that the VLL only couples to muons, one can see from [24] that EW global fits constrain the mixing angle to be less than 0.034, which we impose in our numerical analysis. With such a small mixing, lepton flavour violation observables can be kept under control and the VLL single production cross-section is, for the singlet model, dominated by the W ν E vertex and, for the doublet model, by the W ν L E vertex, as we discuss in Sec. III B. It follows from the mixing structure in Eq. (9) (see also the Feynman rules in Eqs. (A2), (A3), (A6) and (A7)) that interactions involving right-handed neutrinos are absent in both scenarios. This is consistent with the approximate lepton number symmetry, where one can assume that the smallness of the σ and Σ Yukawa couplings results in a negligible W ν R E and W ν R E interaction strength. Therefore, in our numerical analysis, one can safely ignore such interactions by setting them to zero.
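To give a sense of scale for the bound on the mixing angle quoted above, the following minimal sketch evaluates the mixing factors at α = 0.034. The 2 × 2 rotation form and its sign convention are assumptions made for illustration (the explicit matrix is not reproduced in this text), and the bound is treated here as a value of α in radians.

```python
import math

ALPHA_MAX = 0.034  # upper bound on the mixing angle quoted in the text (assumed radians)


def mixing_matrix(alpha):
    """Assumed 2x2 rotation between the muon and the VLL (illustrative sign convention)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, s], [-s, c]]


m = mixing_matrix(ALPHA_MAX)
cos_alpha = m[0][0]
sin_alpha = m[0][1]

# cos(alpha) stays very close to unity, while sin^2(alpha) is of order 1e-3.
print(f"cos(alpha)   = {cos_alpha:.6f}")
print(f"sin(alpha)   = {sin_alpha:.6f}")
print(f"sin^2(alpha) = {sin_alpha**2:.3e}")
```

The point of the sketch is only that couplings proportional to cos α are essentially unsuppressed at the bound, whereas couplings proportional to sin α carry a factor of a few times 10^−2.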
Another important element to take into consideration is the VLL decay width. In particular, it can be expressed as Here, U^2_MIX is a non-trivial combination of the neutrino and charged lepton mixing matrices whose specific structure can be determined from the Feynman rules in Eqs. (A2), (A3), (A6) and (A7). One notes that such a decay width grows as a power law in m_E, becoming rather significant for large masses as shown in Fig. 1. For example, taking the case where U_MIX = 1, one can observe that for a mass of 2 TeV the decay width already surpasses the mass itself, with Γ = 2705.4 GeV. Such an effect can be mitigated for smaller mixing angles, as demonstrated in Fig. 1. Indeed, a decay width greater than the mass leads to conceptual issues with respect to the interpretation of such large-width fields as particles. In fact, within the context of Quantum Field Theory, a large value for the width implies that the particle is highly non-local and the narrow-width approximation is no longer valid, calling for a more robust framework to describe it (see e.g. [25][26][27]). Such a scenario is not relevant for the charged VLL, as the mixing element U_MIX must be small due to EW constraints involving Higgs and Z boson decays. On the other hand, the VLL neutrino partners that reside in the same doublet representation can also decay via a W boson and a charged lepton ℓ. In the limit of m_νL ≫ m_ℓ, the expression for the decay width is identical to that of the VLL (with the appropriate changes), where U_MIX can become sizable. Indeed, taking ℓ = µ then, in accordance with the Feynman rules, the mixing elements coming from the left-chiral couplings are given as cos α (U_ν^doublet)_55. Both cos α and (U_ν^doublet)_55 are of order O(1) and therefore the ν_L → W ℓ decay width becomes relevant for large masses. However, notice that the neutrino mass is not independent of that of its charged VLL counterpart and thus cannot be made arbitrarily small in order to keep the width from becoming too large. At leading order, both the VLL and the heavy neutrino masses are controlled by the M_E parameter in Eq. (2), making them almost degenerate, with a small deviation expected to be induced at one-loop level [28]. Typically, such a mass splitting is of the order of a few hundred MeV [28], which is not expected to play a sizeable role. With this in mind, and without loss of generality for this article's analysis, we consider the neutrino and its corresponding charged VLL to be mass degenerate, ensuring that Γ(ν_L → W ℓ) is never unacceptably large. Although the widths must be kept under control, a sizeable value has interesting experimental implications. For instance, it results in wider distributions of kinematic variables such as the mass and the transverse momentum (p_T), where large tails can extend into phase space regions not populated by the SM background events.
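The growth of the width with the VLL mass can be illustrated with a short numerical sketch. It assumes the standard tree-level expression for a heavy lepton decaying into a W boson and a massless lepton, Γ = g² U² m³ / (64π m_W²) · (1 − x)² (1 + 2x) with x = m_W²/m², which may differ in detail from the expression used in this work. The values of g and m_W below are illustrative inputs, so the output is only expected to be of the same size as the number quoted above, not to reproduce it digit for digit.

```python
import math

G_WEAK = 0.65  # assumed SU(2)_L gauge coupling (illustrative input)
M_W = 80.4     # assumed W boson mass in GeV (illustrative input)


def vll_width(m_e, u_mix, g=G_WEAK, m_w=M_W):
    """Tree-level width in GeV for a heavy lepton -> W + massless lepton.

    Assumed textbook form, cubic in the heavy mass and quadratic in the
    effective mixing factor u_mix.
    """
    if m_e <= m_w:
        return 0.0  # decay to an on-shell W is kinematically closed
    x = (m_w / m_e) ** 2
    prefactor = g**2 * u_mix**2 * m_e**3 / (64.0 * math.pi * m_w**2)
    return prefactor * (1.0 - x) ** 2 * (1.0 + 2.0 * x)


for mass in (500.0, 1000.0, 2000.0):
    for u in (1.0, 0.034):
        print(f"m_E = {mass:6.0f} GeV, U_MIX = {u:5.3f}: width = {vll_width(mass, u):10.3f} GeV")
```

With U_MIX = 1 the width at 2 TeV exceeds the mass, while a mixing factor at the level allowed by EW fits suppresses it by roughly three orders of magnitude.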

A. Flavour constraints
It follows from the interactions between the VLL and the muon that non-zero mixing elements can trigger Lepton Flavour Violation (LFV) interactions, in particular those that relate to muon and tau decays. Therefore, it is important to confront all points considered in our numerical analysis with the relevant LFV branching ratios (BR), including BR(τ+ → π^0 e+) and BR(τ+ → π^0 µ+). For this purpose we use the latest version of SPheno [29] to generate the Wilson coefficient cards, which are then passed to flavio [30] in order to compute the corresponding LFV observables. SPheno also computes the BRs itself, so flavio merely serves as an additional cross-check; the numerical outputs of both programs (for the BRs) agree within each other's errors and are therefore compatible. We use the 90% CL experimental limits on the considered LFV observables as reported in the most recent issue of the PDG review [17].
Last but not least, let us comment that VLLs have long been proposed as a potential explanation for the anomalous magnetic moment of the muon. However, it follows from the smallness of the mixing angle α that neither the singlet nor the doublet model discussed in this article can successfully accommodate such an anomaly, as the couplings of muons to VLLs and to Higgs or Z bosons are too small. This was numerically verified and is consistent with the results obtained e.g. in [31]. Therefore, it is not sufficient to extend the SM with one generation of VLLs to explain the muon g − 2 anomaly, and additional new physics with less constrained couplings is necessary. A possibility is to extend the scalar sector as proposed by various authors [32][33][34][35][36][37].

B. Collider analysis: LHC and muon colliders
The analysis techniques and Deep Learning algorithms used in the current study for both the doublet and the singlet models were first implemented in [9] and [37], respectively. Starting with the doublet case, the channel we are probing is characterised by a final state with 4 light jets, originating from the decays of two W bosons, a charged lepton, which we take to be the muon, and missing transverse energy (MET), which results from the undetected neutrinos in the final state. The Leading Order (LO) production diagram at the LHC can be seen in Fig. 2, where the heavy ν_L neutrino belongs to a VLL doublet as shown in Eq. (1). For this signal topology the main irreducible backgrounds are top-quark pair production (t t̄), W+jets (we consider up to 4 jets) and diboson plus jets. Notice that for the t t̄ channel we consider that one of the W bosons decays hadronically while the other decays into a muon and a neutrino. All backgrounds are generated at LO precision. For the singlet scenario, the final state topology is characterized by one isolated charged lepton from the W boson decay and missing transverse energy associated with the undetected neutrinos. In particular, we focus on the channel where the charged lepton is a muon. For this signal topology, we consider the irreducible background together with all physics processes that lead to the production of at least one muon and up to two jets in the final state, pp → µ− ν_µ (+jets).
For the case of the VLL singlet only one BSM coupling contributes to the decay chain, namely the one containing a muon-specific neutrino, whereas for the doublet scenario the couplings involving both the charged and neutral components of the VLL doublet are present. Taking into account the Feynman rules shown in Appendix A, the strength of the relevant left and right chiral charged current projections read as This in turn means that the cross section for the doublet case will be much larger than the one for the singlet case, because the strength of the dominant contribution for the former, in particular the coupling between the VLL and its neutrino counterpart, is proportional to cos α, while for the latter all couplings are controlled by sin α. Notice that the single production channel in Fig. 3 is valid for both the doublet and singlet scenarios where, in the case of equal angles g . Therefore, it is sufficient to study the singlet model and extract identical conclusions for the doublet case.
The analysis follows a well-defined guideline. Both models are first implemented in SARAH [38] to generate all interaction vertices as well as all relevant files that interface with Monte-Carlo generators, namely MadGraph5 [39] for quark-level matrix-element calculations and Pythia8 [40] for hadronization and showering leading to final state particles. In MadGraph5 we simulate proton-proton collisions at a centre-of-mass energy √s = 14 TeV, for a total of 250k events for the signal and backgrounds. We also employ the default LO parton-distribution function NNPDF2.3 [41], which fixes the evolution of the strong coupling constant, α_s. Fast simulation of the ATLAS detector is conducted with Delphes [42]. All kinematic distributions are extracted with the help of ROOT [43]. At this stage, we also impose selection criteria to maximise the signal significance. In particular, all events must satisfy where η represents the muon pseudo-rapidity. Additionally, for the doublet topology, which involves jets in the final state, we require them to be tagged as originating from light chiral quarks, i.e., they cannot be tagged as b-jets. Note that the signal production represented in Fig. 3 is characterized by having no jets in the final state. While the complete absence of jet candidates in a given event at a hadron collider is perhaps too unrealistic, the absence of such a selection can also cause complications in the Deep Learning algorithms used later in the analysis, since the signal/background classes are highly unbalanced and may lead to over-fitting problems. The kinematics of final states for both signal and background events are translated into tabular datasets and used as inputs for neural models, whose job is to separate the signal from the background. The neural network is constructed using Keras [44].
Figure 2: Leading-order Feynman diagram for single production of the doublet VLL. Here, q and q̄ correspond to quarks originating from the colliding protons, E represents the VLL and ν_L denotes the VLL doublet partner. Both W bosons decay in the hadronic channel, with j indicating first and second generation chiral quarks.
Figure 3: Leading-order Feynman diagram for VLL single production. Here, q and q̄ correspond to quarks originating from the initial protons and ν are the SM neutrinos.
For classification of the singlet model, we choose a set of 5 observables computed in the laboratory frame. These low-level features are the muon observables cos θ(µ−), η(µ−), φ(µ−) and p_T(µ−), as well as the MET. For the doublet model, a richer final state is present and, as such, one can compute a much more complete list of observables. Note that, for this physics scenario, the final states contain 4 light jets, which implies that one cannot distinguish between the jets originating from the W+ and those originating from the W−. Therefore, for reconstructed observables, we consider all possible combinations of jet pairs. These are C = (j_1, j_2), (j_1, j_3), (j_1, j_4), (j_2, j_3), (j_2, j_4) and (j_3, j_4), where the jets are ordered in p_T, with j_1 the leading jet (highest p_T) and j_4 the jet with the lowest p_T. The full list of observables used in the training is shown in Tab. IV.

Table IV: Angular and kinematic distributions for the analysis of the doublet production topology, grouped into dimensionful and dimensionless observables. All observables are computed in the laboratory frame of reference. To simplify notation, we define (j_a, j_b) as corresponding to all jet combinations, as described in the text. Variables that involve combinations of final states, e.g. p_T(j_a, j_b, µ−), are reconstructed from the states indicated inside the parentheses.
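The enumeration of jet pairs described in the text can be sketched as follows. This is only an illustration: the dictionary-based jet representation and the helper names are hypothetical, and the transverse momentum of the pair stands in for any of the reconstructed two-jet observables.

```python
import math
from itertools import combinations


def pair_pt(jet_a, jet_b):
    """Transverse momentum of a two-jet system built from each jet's (pt, phi)."""
    px = jet_a["pt"] * math.cos(jet_a["phi"]) + jet_b["pt"] * math.cos(jet_b["phi"])
    py = jet_a["pt"] * math.sin(jet_a["phi"]) + jet_b["pt"] * math.sin(jet_b["phi"])
    return math.hypot(px, py)


def jet_pair_observables(jets):
    """Return {(a, b): pT of the pair} for all pairs of pT-ordered jets.

    Jets are numbered from 1 (highest pT) to N (lowest pT), so four jets
    give the six pairs (1,2), (1,3), (1,4), (2,3), (2,4) and (3,4).
    """
    ordered = sorted(jets, key=lambda jet: jet["pt"], reverse=True)
    result = {}
    for (a, jet_a), (b, jet_b) in combinations(enumerate(ordered, start=1), 2):
        result[(a, b)] = pair_pt(jet_a, jet_b)
    return result


# Hypothetical event with four light jets (pT in GeV, phi in radians).
example_jets = [
    {"pt": 50.0, "phi": 1.0},
    {"pt": 100.0, "phi": 0.0},
    {"pt": 30.0, "phi": -2.0},
    {"pt": 80.0, "phi": math.pi},
]
for pair, value in jet_pair_observables(example_jets).items():
    print(pair, round(value, 2))
```

Computing every pair avoids having to decide which two jets came from which W boson, at the cost of a larger input vector for the network.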
To optimize the neural network architecture, we employ a genetic algorithm following the same steps as described in [9,37] and schematically represented in the diagram of Fig. 4. The algorithm begins by generating an arbitrary number of neural networks, whose architecture is determined by randomly sampling from a list of predefined hyperparameters (number of layers, activation functions, regularisers, etc.). We then train each individual network on the data for a given number of epochs. From the trained networks, we pick the five best performing ones. From these, we create Father-Mother pairs, where we combine 50% of the Father's traits (that is, its hyperparameters) and 50% of the Mother's traits to construct new neural architectures, which we dub Daughters. We also consider the possibility of mutation, that is, after the Daughter networks have been built, each hyperparameter may change to another value with probability P(M) = 20%. We then train the Daughter networks for a given number of epochs and repeat the procedure for a set number of generations. Finally, we select the best performing network based on some metric of choice. In our analysis, the algorithm is designed to maximise the Asimov statistical significance, based on an earlier work of Elwood and Krücker [45]. The Asimov metric is defined as [46]
$$Z_A = \left[ 2\left( (s+b)\,\ln\frac{(s+b)(b+\sigma_b^2)}{b^2+(s+b)\,\sigma_b^2} - \frac{b^2}{\sigma_b^2}\,\ln\left(1+\frac{\sigma_b^2\, s}{b\,(b+\sigma_b^2)}\right) \right) \right]^{1/2} \tag{13}$$
where s is the number of signal events, b is the number of background events and σ_b is the uncertainty of the background. As part of the optimization procedure, we consider the same list of hyperparameters as in our previous work [9]:
• number of hidden layers: 1 to 5
• number of neurons in each layer: 256, 512, 1024 or 2048
• kernel initializer: 'normal', 'he_normal', 'he_uniform'
• L2 regularization penalty: 1e-3, 1e-5, 1e-7
• activation function: 'relu', 'elu', 'tanh', 'sigmoid'
• optimizer: 'adam', 'sgd', 'adamax', 'nadam'
Our evolutionary algorithm is initialized by building a set of ten NNs. The parameters are chosen randomly from the previous lists. Each network is trained for up to 200 epochs, and the training stops early if no improvement is observed for at least 5 epochs. Note that a subset of the neural network properties are not subject to optimisation but instead remain fixed during the runs. Namely, we consider the following:
• The input data of the NNs are standardised, that is, input vectors have a mean of zero and a standard deviation of 1. All observables were extracted from the ROOT files and output into dataframes. The data is reshuffled and then divided into training samples (80% of the total data) and validation samples (the remaining 20%). We also consider cross-validation with a five-fold scheme during training. The statistical significance is computed based on the trained NN predictions on the validation data.
• We employ a cyclic learning rate during the training phase, with an initial value of 0.01 and a maximal value of 0.1.
• The output of the NN is a vector of probabilities; an event is classified as signal if its output probability is greater than 0.5 and as background otherwise.
• The best NN is selected based on the Asimov metric. The loss function for this case is defined as the inverse of Eq. (13), such that when the loss function is minimised, the Asimov significance is maximised. We also fix the value of the background uncertainty to σ_b = 10^−1.
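The two ingredients described above, the Asimov significance and the evolutionary search over hyperparameters, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: σ_b is treated as an absolute uncertainty on the background yield, crossover takes a random half of the hyperparameters from the Father, mutation is applied independently to each hyperparameter, and the `fitness` callable is a hypothetical stand-in for "train the network and evaluate its significance on validation data".

```python
import math
import random


def asimov_significance(s, b, sigma_b):
    """Asimov significance with a background uncertainty sigma_b (assumed absolute)."""
    total, var = s + b, sigma_b**2
    first = total * math.log(total * (b + var) / (b**2 + total * var))
    second = (b**2 / var) * math.log1p(var * s / (b * (b + var)))
    return math.sqrt(max(0.0, 2.0 * (first - second)))


# Hyperparameter lists as given in the text.
SEARCH_SPACE = {
    "n_layers": [1, 2, 3, 4, 5],
    "n_neurons": [256, 512, 1024, 2048],
    "initializer": ["normal", "he_normal", "he_uniform"],
    "l2": [1e-3, 1e-5, 1e-7],
    "activation": ["relu", "elu", "tanh", "sigmoid"],
    "optimizer": ["adam", "sgd", "adamax", "nadam"],
}


def random_network(rng):
    return {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}


def crossover(father, mother, rng):
    """Daughter takes half of the hyperparameters from each parent."""
    keys = list(SEARCH_SPACE)
    from_father = set(rng.sample(keys, len(keys) // 2))
    return {k: (father[k] if k in from_father else mother[k]) for k in keys}


def mutate(network, rng, p_mut=0.2):
    """Redraw each hyperparameter with probability p_mut (may redraw the same value)."""
    return {
        k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < p_mut else v)
        for k, v in network.items()
    }


def evolve(fitness, rng, n_init=10, n_top=5, n_generations=3):
    """Keep the n_top best networks each generation and breed all pairs among them."""
    population = [random_network(rng) for _ in range(n_init)]
    for _ in range(n_generations):
        best = sorted(population, key=fitness, reverse=True)[:n_top]
        daughters = [
            mutate(crossover(father, mother, rng), rng)
            for i, father in enumerate(best)
            for mother in best[i + 1:]
        ]
        population = best + daughters
    return max(population, key=fitness)


# Toy usage: the lambda replaces an actual training run and is purely illustrative.
best = evolve(lambda net: net["n_neurons"] * net["n_layers"], random.Random(0))
print(best)
print(asimov_significance(s=10.0, b=100.0, sigma_b=0.1))
```

In a realistic setting the fitness evaluation dominates the cost, so caching the score of each trained network rather than recomputing it during sorting would be a natural refinement.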
An additional consideration is the fact that our datasets are unbalanced, which is in part a result of the selection criteria imposed on the final states, reducing the allowed phase space for both signal and backgrounds. Unbalanced datasets can lead to overfitted networks and, as such, must be properly dealt with. In this work, we have utilised the Synthetic Minority Oversampling Technique (SMOTE) [47], which oversamples the minority classes of our training data. Note that this algorithm is only applied to the training dataset and we do not perform any re-sampling of the validation samples.
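The core idea of SMOTE, creating synthetic minority-class samples by interpolating between a sample and one of its nearest minority-class neighbours, can be illustrated with the simplified sketch below. It is a stand-in written for clarity, not the implementation used in this analysis (which is not specified in the text); the function name and its arguments are hypothetical.

```python
import math
import random


def smote_like(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic points from a list of minority-class feature tuples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = rng or random.Random()
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        partner = rng.choice(neighbours)
        lam = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(x + lam * (y - x) for x, y in zip(base, partner)))
    return synthetic


# Toy usage on four 2D training points; validation data would be left untouched.
train_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(train_minority, n_new=6, k=2, rng=random.Random(0))
print(len(new_points), new_points[0])
```

Because the synthetic points are derived from the training samples, applying the procedure before the train/validation split would leak information, which is why the re-sampling is restricted to the training set.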
With the LHC being a proton collider, production of coloured particles by the strong interaction is heavily favoured compared to identical processes involving electroweak bosons. As VLLs are colour singlets, they can only be produced via the latter interaction at LO. It is then worth mentioning that, in addition to the obvious motivation of studying VLL production at the LHC already during Run 3 and, later on, in its high-luminosity phase (HL-LHC), it is of utmost importance to understand the sensitivity with which these new particles can be probed at future colliders.
In particular, there has been an active discussion within the community about e+e− colliders like the Compact Linear Collider (CLIC) [48,49] or the International Linear Collider (ILC) [50], and more recently, on the possibility of building a µ+µ− collider [51,52]. Besides offering cleaner environments when compared to hadronic machines, production via electroweak processes is favoured; hence, VLLs should have a higher chance of being observed, in case they exist, at this type of machine. For this purpose, we perform numerical and analytical computations with FeynCalc [53] for the pair production of VLLs, whose tree-level diagrams can be seen in Fig. 5. As previously stated, singlets have smaller couplings compared to doublets; hence we discuss the prospects for VLL discovery at lepton colliders for the singlet scenario and not the doublet, as this can be seen as the worst-case scenario. Note, however, that the conclusions one draws from the singlet model can be easily generalized to the doublet model. That is, the only difference between the collider studies for the two scenarios is the values of the couplings.
In conclusion, single VLL production is favoured at the LHC for the doublet case, mainly due to the strength of the couplings. Double production of VLLs has the obvious disadvantage of the need to produce two heavy states and a more elaborate final state to detect. Additionally, as was shown in previous works [9,37], double production topologies are sub-leading when compared with single production ones and, as such, are not considered in this work. If one wants to probe the singlet model, and assuming that it cannot be done at the LHC, future lepton colliders, where single production is precluded at LO, may give us enough VLLs in the double production channel to test the singlet VLL scenario.

IV. RESULTS
In this section, we present a numerical analysis focusing on the LHC and future lepton colliders. In Eqs. (14) and (15) we show two distinct viable scenarios (one for the doublet and one for the singlet model), where all couplings are fixed and only the VLL mass varies. The colour coding, black for the doublet model and green for the singlet case, merely serves to distinguish the benchmark scenarios and to identify them throughout the text whenever necessary. We first consider a benchmark point where we set the VLL mass to $M_{\rm VLL} = 700$ GeV. The remaining BSM neutrinos present in the doublet model share, at LO, the same mass as their SU(2)$_L$ doublet VLL counterpart and, as noted in Eq. (12), these neutrinos efficiently decay into muons and $W^\pm$ bosons, with an interaction strength of the order of the weak gauge coupling, where, for the considered benchmarks, $[U_\nu^{\rm doublet}]_{55} \approx [U^e_R]_{24} \approx 1$. Indeed, the right-chiral component of the coupling is the dominating factor, as the left coupling is suppressed by a factor of $\sin\alpha$, which is of $\mathcal{O}(10^{-2})$ in our numerical analysis. The selected benchmark point for the doublet model, represented in black and compatible with all flavour observables discussed above, is given in Eq. (14), whereas the one for the singlet scenario, in green, is given in Eq. (15). The neutrino mixing is identical in the two considered cases and is therefore always represented in black. It is evident that the doublet model cross section is far larger than that of the singlet. This is indeed expected, as the dominant contribution in the singlet model contains a $\sin\alpha$ suppression factor, as one can see in Eq. (12). The dependence of the production cross section on the VLL mass is shown in Fig. 6. One can now estimate the total expected number of events, given the cross section above, after event selection. Assuming the target luminosity of the HL-LHC, $L = 3000$ fb$^{-1}$, and that the expected number of events is given by $N = \sigma L$, we have $N = 570.0$ events for the doublet case, while for the singlet we obtain $N = 0.01782$ events. For the singlet model, this implies that the production cross section is not large enough to generate a single event at the LHC. For this reason it is meaningless to present the statistical significance of the singlet model for such a heavy VLL. However, we see from Fig. 6 that for lighter singlet VLLs, in particular for masses of 100 and 200 GeV, one has $N \approx 30$ and $N \approx 3$, respectively. Thus, it becomes possible to produce them at the HL-LHC, motivating a further full analysis.
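The event-count estimate $N = \sigma L$ used above can be made concrete with a minimal Python sketch. The function names are hypothetical, and the effective cross sections are back-computed from the quoted event counts at 3000 fb$^{-1}$ rather than taken from the benchmark equations, so they should be read as illustrative only.

```python
def expected_events(sigma_fb: float, lumi_fb_inv: float) -> float:
    """Expected event count N = sigma * L (sigma in fb, L in fb^-1)."""
    return sigma_fb * lumi_fb_inv


def implied_cross_section(n_events: float, lumi_fb_inv: float) -> float:
    """Invert N = sigma * L to recover an effective cross section in fb."""
    return n_events / lumi_fb_inv


HL_LHC_LUMI = 3000.0  # fb^-1, as quoted in the text
RUN3_LUMI = 300.0     # fb^-1, as quoted in the text

# Effective cross sections back-computed from the quoted event counts
# (an assumption of this sketch, not values read from the paper).
sigma_doublet = implied_cross_section(570.0, HL_LHC_LUMI)
sigma_singlet = implied_cross_section(0.01782, HL_LHC_LUMI)

print(f"doublet: sigma = {sigma_doublet:.3g} fb, "
      f"N(Run III) = {expected_events(sigma_doublet, RUN3_LUMI):.1f}")
print(f"singlet: sigma = {sigma_singlet:.3g} fb, "
      f"N(HL-LHC) = {expected_events(sigma_singlet, HL_LHC_LUMI):.3g}")
```

Under these assumptions the 700 GeV singlet point stays well below one expected event even at the full HL-LHC luminosity, which is the reason given in the text for not quoting a significance there.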
The kinematic features used in the Deep Learning analysis can be seen in Figs. 10 and 11 of Appendix B for the doublet and singlet models, respectively. For the singlet model, the main variables allowing for a good discrimination between the signal and the background are the transverse momentum of the muon, i.e. $p_T(\mu^-)$, as well as the MET distribution. These distributions are characterized by long tails at higher energies, where the SM background is no longer present. On the other hand, the angular distributions for the cosine of the polar angle, as well as the azimuthal angle, offer the least discriminating power, since signal and background have a similar shape. The pseudo-rapidity distributions can also be used in the discrimination, since signal events tend to peak around $\eta \sim 0$ whereas the SM backgrounds spread out over the entire $|\eta| \leq 2.5$ region. In particular, for the $\eta(\mu^-)$ distributions, the SM backgrounds spread uniformly over the entire range. For the doublet model, we note that the kinematic distributions offer the best discriminating power, with both the $p_T$ and mass distributions peaking at higher energies than those of the main irreducible backgrounds. While the $\Delta\theta$ distributions closely follow the SM background prediction, the $\Delta\phi$ and $\Delta R$ distributions are distinct from those of the SM, with $\Delta\phi$ possessing a double-peak structure near zero, whereas $\Delta R$ has its maximum at zero.
With this information at hand, one can construct multi-dimensional distributions to be used as inputs for a neural network that solves a classification task. This is done via an evolution algorithm that optimises the various hyperparameters of the neural model, with the Asimov statistical significance as the metric to be maximised. For completeness, we present our results for different statistical models, some more conservative than others. In this article we use the same measures as in [9,37], which include:
1. The Asimov significance, $Z_A$, with 1% systematic uncertainty, as defined in Eq. (13);
2. A less conservative version of the Asimov significance, which we dub $Z(<1\%)$. In this measure, we assume that backgrounds are known with an error of $10^{-3}$. Of all measures, this is the most lenient one and typically offers the most significant results;
3. The more traditional metric, $s/\sqrt{s+b}$.
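To illustrate how the three measures differ, the following Python sketch evaluates them for a single counting experiment. Since Eq. (13) is not reproduced in this section, the sketch assumes it takes the widely used form of the Asimov discovery significance with a Gaussian background uncertainty $\sigma_b$; the function names and the signal and background yields are hypothetical and chosen only for illustration.

```python
import math


def simple_significance(s: float, b: float) -> float:
    """Traditional metric s / sqrt(s + b)."""
    return s / math.sqrt(s + b)


def asimov_significance(s: float, b: float, rel_unc: float) -> float:
    """Asimov discovery significance with sigma_b = rel_unc * b (rel_unc > 0).

    Assumed standard form; it may differ in detail from the paper's Eq. (13).
    """
    var_b = (rel_unc * b) ** 2
    term1 = (s + b) * math.log((s + b) * (b + var_b) / (b * b + (s + b) * var_b))
    # log1p keeps precision when the background uncertainty is very small
    term2 = (b * b / var_b) * math.log1p(var_b * s / (b * (b + var_b)))
    return math.sqrt(2.0 * (term1 - term2))


# Hypothetical yields after a cut on the classifier score.
s, b = 1000.0, 1.0e6
print("s/sqrt(s+b):", simple_significance(s, b))
print("Z(<1%)     :", asimov_significance(s, b, 1e-3))  # 10^-3 background error
print("Z_A        :", asimov_significance(s, b, 1e-2))  # 1% systematics
```

For background-dominated selections such as this one, the assumed background uncertainty controls the result, which is consistent with the ordering of the three measures described in the list above.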
With this in mind, we compute the significances for the doublet scenario in a wide range of masses, from 100 to 1000 GeV, in steps of 100 GeV. In particular, we plot the various metrics as a function of the neural network score in Fig. 7, taking a VLL mass equal to 700 GeV for illustration purposes. For this mass point and for an integrated luminosity of $L = 3000$ fb$^{-1}$, the resulting significances (see also Tab. V) show that we can exclude, or claim a hypothetical discovery of, a VLL with such a mass, since the $Z(<1\%)$ metric offers a statistical significance larger than $5\sigma$. However, one must keep in mind that this metric is the least conservative of those considered in this work. On the other hand, the Asimov metric is the most stringent and conservative one, resulting in a tiny significance for this particular point. It is also important to study the role of different values of the luminosity. As such, in Fig. 8 we show the evolution of the significance as a function of the collider's luminosity for a VLL with a mass of 700 GeV. In particular, we highlight with dashed black vertical lines the target luminosities at Run III (300 fb$^{-1}$) and at the HL-LHC (3000 fb$^{-1}$). Focusing on Run III, the significances are such that, for a 700 GeV VLL, one does not expect any significant excess and therefore cannot draw conclusions.
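The luminosity dependence can be sketched with a simple statistics-only rescaling, $Z \to Z\sqrt{L_{\rm new}/L_{\rm ref}}$, which holds when signal and background yields both scale linearly with luminosity and systematic uncertainties are negligible. The Python snippet below applies this assumption to the singlet 100 GeV significances quoted in the conclusions; the helper name is hypothetical and the rescaling is not the procedure used to produce Fig. 8.

```python
import math


def rescale_significance(z_ref: float, lumi_ref: float, lumi_new: float) -> float:
    """Statistics-only rescaling Z -> Z * sqrt(L_new / L_ref)."""
    return z_ref * math.sqrt(lumi_new / lumi_ref)


# Singlet VLL, 100 GeV: values quoted in the conclusions of this article.
hl_lhc = {"s/sqrt(s+b)": 2.98, "Z(<1%)": 4.22, "Z_A": 0.0023}        # 3000 fb^-1
run3_quoted = {"s/sqrt(s+b)": 0.94, "Z(<1%)": 1.33, "Z_A": 0.00094}  # 300 fb^-1

for name, z in hl_lhc.items():
    z_run3 = rescale_significance(z, 3000.0, 300.0)
    print(f"{name}: rescaled {z_run3:.3g}, quoted {run3_quoted[name]:.3g}")
```

The two lenient metrics follow the $\sqrt{L}$ rescaling closely, while $Z_A$ does not, as one would expect for a measure dominated by the 1% systematic uncertainty rather than by statistics.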
For an inclusive picture we perform a mass scan in the range $M_{\rm VLL} \in [100, 1000]$ GeV, with the numerical values of the couplings fixed to those shown in Eq. (14). On the other hand, it follows from the discussion in the previous paragraphs that only two example points are selected for the singlet model, both featuring the green mixing matrices in Eq. (15). We also vary the mass of the heavy left-handed neutrinos present in the doublet model, such that they share the same mass as their charged SU(2)$_L$ doublet counterpart. Our results for the scan are summarized in Tab. V, where the VLL masses and the calculated significances for both Run III and the HL-LHC upgrade are shown.

We notice that, both for $L = 3000$ fb$^{-1}$ and $L = 300$ fb$^{-1}$ and for any of the mass values, we are able to obtain significances greater than $5\sigma$ if the simplified Asimov metric and $s/\sqrt{s+b}$ are considered. These results may indeed suggest that significances above $5\sigma$ can be achieved even for larger VLL mass values at the LHC. This is particularly relevant in the doublet model, while for singlet VLLs the significance quickly drops if we go beyond 200 GeV. If only the Asimov significance, $Z_A$, which is the most conservative one, is considered, we can still probe doublet VLLs up to about 200 GeV already at the LHC Run III. For the singlet scenario, considering the HL-LHC program, we cannot exclude or discover VLLs, with the highest significance being $4.22\sigma$ for the $Z(<1\%)$ metric. While below the discovery threshold, it can still be regarded as a potential anomaly, whose significance can be increased with the combination of additional channels. Although LEP constraints have already excluded VLLs up to 100.6 GeV, our results for 100 GeV are merely indicative of how large the significance can become for small doublet masses if our analysis technique is employed.
As one can note, there is an overall trend of the significance dropping as the mass of the VLL decreases. However, we note that for masses above 600 GeV we have consistently obtained significances close to $10\sigma$ for the $Z(<1\%)$ measure. Here, there are two main factors at play. First, for larger masses the kinematic distributions tend to peak at higher energies, which is particularly relevant for mass distributions, enhancing the neural network's discriminating power. Additionally, we apply the evolutionary algorithm to every single point, meaning that each network is optimized for a specific phase space region (the neural networks found to be optimal for each point are shown in Appendix C). In a realistic search scenario we would not know the mass of the VLL and therefore it would not be reasonable to set a stronger preference for one of the various networks optimized towards different VLL masses. However, the networks that we use are engineered to be generic enough and can in principle be applied to distinct masses, up to a certain discrimination power. To illustrate this, we apply the 700 GeV network to all scanned masses and present the results in Tab. VI. As one can see, the numerical values of the significance changed for the more lenient metrics, $s/\sqrt{s+b}$ and $Z(<1\%)$, whereas for the most conservative one the significance remained the same. However, an increase was also experienced, in particular for a VLL mass of 800 GeV, where the lenient Asimov metric grew from $7.84\sigma$ to $11.58\sigma$. These results indicate that, despite the networks being trained in distinct phase space regions, they are versatile enough to offer a good discrimination power, which also indicates the absence of over-fitting. In essence, the hypothetical observation of a statistical excess for a given mass could motivate employing an optimized network to potentially enhance such a signal or excess.
The low cross-sections of the singlet model call for a different approach to probing it. It is in this context that the near-future electron and muon colliders can offer new opportunities. As mentioned in Sec. III B, production of particles via electroweak processes is favoured at these colliders. As such, it is instructive to understand how a lepton collider can enhance the cross-section for pair-produced VLLs via an s-channel process. Note that the analysis that follows is independent of the chosen collider, as the s-channel cross-section is independent of the mass of the initial colliding particles. Therefore, all results shown here are valid for both the electron and muon machines. At LO, the main diagrams involved are shown in Fig. 5. Note that, since we are assuming non-zero couplings between the muon/electron and the VLL, there are also t-channel contributions. However, such contributions are sub-leading when compared to production via the s-channel process, since they depend on the mixing of the VLL with the muon, whereas in the s-channel the interaction vertices feature two leptons of the same flavour coupling directly to vectors via gauge interactions. Furthermore, in the particular case of our models, the approximate family symmetry that only allows couplings between muons and VLLs results in vanishing t-channel contributions at electron colliders, but not at muon machines. Note that this channel can be seen as a direct probe of the flavour structure in the leptonic sector if both muon and electron colliders become operational. It is then safe to neglect the t-channel contribution in the remainder of this analysis, where we use the same mixing structure as defined in Eq. (15). We then consider the $E \to W\nu$ channel, which, depending on how the $W$ bosons decay, can lead to the $2\nu + 4j$, $2\nu + 2j + \ell\nu_\ell$ or $2\nu + \ell\nu_\ell + \ell'\nu_{\ell'}$ final states. For the topologies that involve jets as final states, the main backgrounds include diboson production ($WW$, $WZ^0$ and $Z^0Z^0$) with associated jets as well as tau pair production. For the purely leptonic final state $2\nu + \ell\nu_\ell + \ell'\nu_{\ell'}$, diboson production is relevant. Indeed, a lepton collider is a much cleaner environment than a hadronic machine, in such a way that backgrounds involving jets can be safely discarded. VLLs can also decay into $Z^0$ bosons as $E(E) \to \mu Z^0$. Such a channel would be an important test for a hypothetical VLL discovery, provided that the final states can contain at least 6 charged leptons. Besides being a very clean signature, 6-lepton topologies have SM backgrounds, typically from triboson production, that are expected to be small. A detailed phenomenological analysis at lepton colliders using deep learning methods analogous to those discussed in the context of the LHC is beyond the scope of this article and is left for future work.

Table VII (mixing as in Eq. (15)): We show the cross-section for centre-of-mass energies $E_{\rm CM} = 1.5$ and 3 TeV, which are the target energies for the CLIC collider, whereas $E_{\rm CM} = 3$, 10 and 14 TeV correspond to the current proposal for the future $\mu^+\mu^-$ collider. Points marked with "−" indicate that there is not enough energy to pair-produce the particles at that mass.
For a comprehensive understanding of how the cross section depends on the lepton collider centre-of-mass energy, $E_{\rm CM}$, we use FeynCalc to obtain an analytic expression in which $\alpha_i$, for $i = 1, \ldots, 6$, are dimensionless constants proportional to the product of various couplings, i.e. $\alpha_i = \alpha_i(U^e_L, U^e_R, g, g', \theta_W)$, $M_{Z^0}$ is the mass of the $Z^0$ boson and $M_E$ the mass of the singlet VLL. With the numerical values in Eq. (15) these constants read

$$\alpha_1 = 0.189325, \quad \alpha_2 = -0.213435, \quad \alpha_3 = -0.328608, \quad \alpha_4 = 0.403093, \quad \alpha_5 = 0.152126, \quad \alpha_6 = -0.209655. \tag{18}$$

In Fig. 9 we plot the corresponding cross-section as a function of the centre-of-mass energy, for VLL masses in the range between 500 GeV and 6 TeV. Note that we are studying the singlet scenario, where the decay width always remains below the mass of the VLL. It is interesting to note that the cross sections are rather large, reaching 13.23 fb for a VLL mass of 500 GeV and $E_{\rm CM} = 1.5$ TeV and dropping to 0.095 fb for a VLL mass of 6 TeV and $E_{\rm CM} = 14$ TeV. This increase of the cross section allows one to probe higher mass ranges than those within reach of the LHC, in a much cleaner environment. Immediately noticeable is the fact that the cross section hits a maximum shortly after $E_{\rm CM} \sim 2M_{\rm VLL}$, with a subsequent drop as $E_{\rm CM}$ increases. This implies that when the centre-of-mass energy of the collider is slightly above twice the mass of the VLL, the double production of VLLs is enhanced and the discovery potential is maximized. For lower masses, the drop is more pronounced than in the high-mass regime, essentially because the range $E_{\rm CM} = 3$ to 14 TeV lies far above threshold for low masses, while for high masses the drop occurs at much higher centre-of-mass energies (beyond $E_{\rm CM} = 14$ TeV) and is therefore not as relevant for the proposed lepton colliders. In particular, for $M_{\rm VLL} = 500$ GeV and $E_{\rm CM} = 14$ TeV we have $\sigma = 0.23$ fb, while for the same mass at $E_{\rm CM} = 3$ TeV we have $\sigma = 4.63$ fb. Fixing the centre-of-mass energy and looking at various mass points, as shown in Tab. VII, we notice that for the proposed high-energy colliders the variation of the mass does not cause significant deviations in the cross-section. For example, taking $E_{\rm CM} = 14$ TeV, we observe that the cross-section remains nearly constant for the displayed masses, ranging from 0.23 fb for a 200 GeV VLL to 0.18 fb for a 3.7 TeV one.
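The threshold behaviour described above, a maximum shortly after $E_{\rm CM} \sim 2M_{\rm VLL}$ followed by a roughly $1/s$ fall-off, can be reproduced qualitatively with a toy model. The Python sketch below assumes pure photon exchange to a unit-charge fermion pair, $\sigma = \frac{4\pi\alpha^2}{3s}\,\beta\,\frac{3-\beta^2}{2}$ with $\beta = \sqrt{1 - 4M^2/s}$. This is a textbook simplification and not the FeynCalc expression with the coefficients $\alpha_i$ and $Z^0$ exchange, so its absolute numbers should not be compared with Tab. VII; the function names and the choice of the low-energy fine-structure constant are assumptions.

```python
import math

ALPHA_EM = 1.0 / 137.036  # low-energy fine-structure constant (assumption)
GEV2_TO_FB = 0.3894e12    # (hbar * c)^2 in fb * GeV^2


def sigma_pair_qed(e_cm_gev: float, mass_gev: float, charge: float = 1.0) -> float:
    """Toy s-channel photon-exchange pair-production cross section in fb."""
    s = e_cm_gev ** 2
    if s <= 4.0 * mass_gev ** 2:
        return 0.0  # below the pair-production threshold
    beta = math.sqrt(1.0 - 4.0 * mass_gev ** 2 / s)
    point_like = 4.0 * math.pi * ALPHA_EM ** 2 * charge ** 2 / (3.0 * s)
    return point_like * beta * (3.0 - beta ** 2) / 2.0 * GEV2_TO_FB


def peak_energy(mass_gev: float, n_steps: int = 4000) -> float:
    """Grid scan for the E_CM in (2M, 6M] that maximises the toy cross section."""
    energies = [mass_gev * (2.0 + 4.0 * i / n_steps) for i in range(1, n_steps + 1)]
    return max(energies, key=lambda e: sigma_pair_qed(e, mass_gev))


for mass in (500.0, 3700.0, 6000.0):
    e_peak = peak_energy(mass)
    print(f"M = {mass:.0f} GeV: toy peak at E_CM / M = {e_peak / mass:.2f}")
```

In this toy the maximum sits at $E_{\rm CM} \approx 2.36\,M$ for any mass, which explains why a 500 GeV state is already far into the falling tail at 3 to 14 TeV while a 6 TeV state is still close to its peak at 14 TeV.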
We end this section with a comment about the cross-sections at the high-energy end of the colliders. As the energy grows, the s-channel cross sections decrease. However, there is an alternative process that grows with $\ln^2(s/m_f^2)$, where $f$ stands for the incoming fermion. This is $e^+e^- \to e^+e^-\,E(E)\,\bar{E}(\bar{E})$ for an electron-positron collider and $\mu^+\mu^- \to \mu^+\mu^-\,E(E)\,\bar{E}(\bar{E})$ for a muon collider. These photon fusion processes have cross-sections that grow with the centre-of-mass energy [56,57] and, although they are not competitive at lower energies, they become dominant at high energies. We present in Tab. VII the values of these cross-sections for an energy of 14 TeV, which show that they can play an important role for very high energy lepton colliders.
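The competition between the two energy dependences can be illustrated numerically. The Python sketch below evaluates only the logarithmic enhancement factor $\ln^2(s/m_f^2)$ and the far-above-threshold $1/s$ scaling of an s-channel process; it does not compute the photon fusion cross-sections of Tab. VII. The helper names are hypothetical and the lepton masses are rounded standard values.

```python
import math

M_ELECTRON_GEV = 0.000511  # electron mass in GeV (rounded)
M_MUON_GEV = 0.10566       # muon mass in GeV (rounded)


def log_squared_factor(e_cm_gev: float, m_f_gev: float) -> float:
    """Enhancement factor ln^2(s / m_f^2) for an incoming fermion of mass m_f."""
    return math.log(e_cm_gev ** 2 / m_f_gev ** 2) ** 2


def s_channel_falloff(e_cm_gev: float, e_ref_gev: float) -> float:
    """Far-above-threshold 1/s scaling, normalised to the energy e_ref_gev."""
    return (e_ref_gev / e_cm_gev) ** 2


for e_cm in (3000.0, 10000.0, 14000.0):
    print(f"E_CM = {e_cm / 1000:.0f} TeV: "
          f"ln^2 (e) = {log_squared_factor(e_cm, M_ELECTRON_GEV):.0f}, "
          f"ln^2 (mu) = {log_squared_factor(e_cm, M_MUON_GEV):.0f}, "
          f"1/s relative to 3 TeV = {s_channel_falloff(e_cm, 3000.0):.3f}")
```

Unlike the s-channel result, the logarithmic factor depends on the mass of the incoming fermion, so it is larger for electron beams than for muon beams at the same centre-of-mass energy.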

V. CONCLUSIONS
In this article we have studied two simple SM extensions, each featuring a new vector-like lepton and a sterile neutrino. We have confronted the cases of a doublet and a singlet VLL and discussed their collider phenomenology, both at the LHC and at future leptonic machines. For the former we have employed Deep Learning techniques to compute the statistical significance of a hypothetical discovery. In the selection of benchmark scenarios we have required that the coupling of the VLL to muons is consistent with flavour constraints and with the branching fractions of Higgs and Z bosons to muons. We have also shown that the decay width of both VLL doublet components becomes increasingly large with growing exotic lepton masses. In our analysis we have only considered scenarios where the width is smaller than the mass.
In the context of the LHC studies, we have performed Monte-Carlo simulations to generate data for signal and background topologies. The signal is characterized by the presence of a single isolated lepton and a substantial amount of MET for the singlet case. For the doublet case, the signal involves 4 light jets, a charged lepton and a neutrino as final state particles. To separate the signal from the background we constructed neural networks which follow from an implementation of an evolution algorithm that maximises the Asimov significance. We have shown that, for doublet VLLs, we can exclude masses up to 1 TeV with more than five standard deviations. In particular, for masses of 1 TeV, we obtain a significance of $Z(<1\%) = 11.14\sigma$ for the high-luminosity phase of the LHC, with an integrated luminosity of $L = 3000$ fb$^{-1}$. We have also verified that one can already test the doublet VLL scenario at Run III of the LHC, which will deliver $L = 300$ fb$^{-1}$ of data. For such a luminosity, one can test VLL masses up to about 300 GeV, obtaining $s/\sqrt{s+b} = 8.02\sigma$, $Z(<1\%) = 8.09\sigma$ and $Z_A = 2.56\sigma$. For the singlet scenario, production cross-sections are substantially smaller and the expected number of events is usually zero, with an exception for VLL masses of 100 GeV and 200 GeV. While for the former one finds $s/\sqrt{s+b} = 2.98\sigma$, $Z(<1\%) = 4.22\sigma$ and $Z_A = 0.0023\sigma$ at $L = 3000$ fb$^{-1}$, the Run III estimate gives $s/\sqrt{s+b} = 0.94\sigma$, $Z(<1\%) = 1.33\sigma$ and $Z_A = 0.00094\sigma$ at $L = 300$ fb$^{-1}$, with a large drop for a mass of 200 GeV. In particular, the latter scenario cannot be excluded, its statistical significance being below the discovery threshold.
Owing to the low production cross-sections of the singlet scenario, a supplementary analysis was made within the context of lepton colliders. We have performed numerical computations for the expected VLL pair-production cross-section in the s-channel. We find that, in general, larger cross-sections are obtained when compared to the LHC analysis, allowing for a much wider range of masses to be probed, with relevance for singlet VLLs. In particular, we note that for luminosities of the order of ab$^{-1}$, even for singlet VLLs with a mass of 3.7 TeV one can expect 260 events at $E_{\rm CM} = 10$ TeV and 180 events at $E_{\rm CM} = 14$ TeV. Furthermore, even for a VLL as heavy as 6 TeV, a 14 TeV lepton machine delivering a luminosity of 1 ab$^{-1}$ is expected to produce almost 100 events. With this in mind we conclude that the study of VLL particles at future electron or muon colliders is a rather relevant physics case to be explored, allowing one to significantly extend the current reach of the LHC.
Figure 11: Kinematic variables that the neural network uses for classification. Data is simulated for a doublet VLL with mass of 700 GeV and in the ATLAS detector. From left to right and top to bottom, the variables are $\cos\theta_{\mu^-}$, transverse momentum of the muon, pseudo-rapidity of the muon, azimuthal angle of the muon and MET. In the y-axis, it is indicated that events are normalized (NE). We consider 30 bins for all background and signal histograms.

Figure 1: Decay width as a function of the VLL's mass, both displayed in GeV. Each coloured line is representative of a different mixing element, whose numerical value is displayed in the box to the right of the plot.
[Figure 4 flowchart; legible labels: "traits", "P(M) = 20%", "train daughters"]

Figure 4: Flowchart representative of all iterative steps involved in the genetic algorithm that we employ in this work.

Figure 5: LO Feynman diagrams for VLL double production at a lepton/anti-lepton collider via exchange of either a photon or a Z 0 boson.

Figure 6: Production cross section for both doublet and singlet topologies, in femtobarn (fb), as a function of the VLL's mass, in GeV. The y-axis is shown in logarithmic scale. The black and green curves correspond to the same colour mixing scenarios in Eqs. (14) and (15).

Figure 7: Statistical significance as a function of the classifier score given by a neural network for different metrics, assuming a doublet-type VLL with $M_{\rm VLL} = 700$ GeV and a collider luminosity $L = 3000$ fb$^{-1}$. The computation of the statistical significance is made using the best neural network that the evolution algorithm found. From left to right we plot the significance $s/\sqrt{s+b}$ in (a), the adapted Asimov significance where we assume a background uncertainty of $10^{-3}$ in (b), and the Asimov significance with systematics of 1% in (c).

Figure 8: Statistical significance as a function of the collider luminosity, for a fixed doublet VLL mass of 700 GeV (mixing in Eq. (14)). Target luminosities of the HL-LHC (3000 fb$^{-1}$) and Run III (300 fb$^{-1}$) are marked with vertical dashed black lines. Both axes are shown in logarithmic scale.


Figure 9: VLL pair production cross section, in fb, as a function of centre-of-mass energy for a lepton collider, in GeV. Each individual box is representative of a different mass of the singlet VLL, which is also given in GeV.

Figure 12: Receiver operator characteristic (ROC) plots for the neural networks of the doublet VLLs from the masses of 100 GeV (top) to 500 GeV (bottom).

Figure 13: Receiver operator characteristic (ROC) plots for the neural networks of the doublet VLLs from the masses of 600 GeV (top) to 1000 GeV (bottom).

Table V: Signal significance for VLL single-production at the LHC calculated with an evolution algorithm that maximises the Asimov metric. The last two rows represent the singlet model whereas the remaining ten are benchmark points of the doublet model. The colour coding is the same as in Eqs. (14) and (15).

Table VI: Signal significance for VLL single-production at the LHC calculated with the neural network trained on the 700 GeV data, with the network details shown in Tab. IX.