Phenomenology of vector-like leptons with Deep Learning at the Large Hadron Collider

In this paper, a model inspired by Grand Unification principles, featuring three generations of vector-like fermions, new Higgs doublets and a rich neutrino sector at the low scale, is presented. Using state-of-the-art Deep Learning techniques, we perform the first phenomenological analysis of this model, focusing on the study of new charged vector-like leptons (VLLs) and their possible signatures at CERN's Large Hadron Collider (LHC). In our numerical analysis we consider signal events for vector-boson-fusion and VLL pair-production topologies, both involving a final state containing a pair of charged leptons of different flavour and two sterile neutrinos that provide missing energy. We also consider the case of VLL single production where, in addition to a pair of sterile neutrinos, the final state contains only one charged lepton. All calculated observables are provided as data sets for Deep Learning analysis, where a neural network, whose architecture is obtained via an evolutionary algorithm, is trained to maximise either the accuracy metric or the Asimov significance for different VLL masses. Taking into account the effect of the three analysed topologies, we find that the combined significance for the observation of new VLLs at the high-luminosity LHC can range from $5.7\sigma$, for a mass of $1.25~\mathrm{TeV}$, all the way up to $28\sigma$ if the VLL mass is $200~\mathrm{GeV}$. We also show that by the end of the LHC Run-III a $200~\mathrm{GeV}$ VLL can be excluded with a confidence of $8.8$ standard deviations. These results show that our model can be probed well before the end of the LHC operations and, in particular, can provide important phenomenological information to constrain the energy scale at which new gauge symmetries emerging from the considered Grand Unification picture become manifest.


Introduction
The ultimate goal of any scientific endeavour is to uncover the mysteries of the universe and the world around us and, so far, the best model we have devised to describe all the matter that surrounds us at the most fundamental level is modestly called the Standard Model (SM). The SM is a particle physics model based upon the modern quantum field theory (QFT) framework, whose predictions have withstood the most stringent of experimental tests [1][2][3][4][5][6][7]. However, there are clear indications that something is missing, from the fact that neutrinos have mass, as confirmed by neutrino oscillation phenomena [8], to the fact that it does not account for the existence of dark matter (DM) [9]. Besides such experimental evidence, there are also theoretical motivations, such as the origin of the family replication found in nature, the fermion mass and mixing hierarchies, and the origin of the SM gauge structure, for which a consensual understanding is still lacking.
We can thus notice that there are certain deficiencies in our understanding of fundamental particle physics, which leave us with the obvious question: what is missing and how do we fix it? So far, rather exotic theories have been put into the forefront, ranging from models where extra spacetime dimensions exist [10] to models with a new symmetry between bosons and fermions known as supersymmetry (SUSY) [11]. While somewhat separate, these theories have a common underlying idea: the SM is an effective description of a more fundamental theory and is only valid up to a certain energy scale, beyond which New Physics (NP) is needed. Therefore, the problems of the SM all result from our lack of understanding of what such a theory really is and at which energy scale it becomes manifest.
High-scale theories like string theory and SUSY, despite their mathematical complexity, provide a solid theoretical framework upon which one can build in order to, e.g., obtain NP models well motivated from first principles. However, the amount of new states and model parameters emerging from such scenarios can be overwhelming. One possibility is to use a brute-force method to analyse each combination of parameters and select the most promising ones or, alternatively, to follow a smarter approach based upon Deep Learning (DL) techniques and optimization algorithms to find the best parameter space. Furthermore, one has to overcome the typical challenges inherent to collider phenomenology, where the impact of background events can easily bury possible signal events, preventing potential NP signatures from becoming observable. A better approach to handle such a problem is the use of multivariate analysis to identify possible deviations from expected events which can be caused by NP. These deviations can be further amplified by combining multiple distributions into multidimensional distributions [12]. This state of affairs, the need to quickly identify subtle effects in multidimensional distributions of information, clearly calls for artificial intelligence methods, particularly Machine Learning and DL techniques [13].
With this in mind, in this paper we revisit the key properties of a Grand Unified model recently introduced in [14][15][16][17], which attempts to unify all matter and fundamental interactions in a framework inspired by the $E_8$ symmetry. Among its key features one can highlight a possible explanation for the fermion mass and mixing hierarchies observed in nature, as well as the prediction of NP states such as vector-like fermions, additional scalar doublets and a rich neutrino sector, well motivated by the model structure and the unification picture.
The goal of this article is to construct and study the low-energy limit of such a framework, which offers interesting phenomenological implications for future explorations at particle colliders. In particular, we focus on the phenomenological study of vector-like leptons (VLLs) and propose potential smoking-gun signatures as direct search channels to probe our model both at the LHC Run-III and at its high-luminosity upgrade. Furthermore, the techniques that we develop are rather generic and can be used well beyond the scope of the model under consideration. The numerical analysis is performed using standard Monte Carlo tools, where the final step consists in applying DL techniques for statistical significance studies. This paper is organized as follows. First, in Sec. 2, we discuss the model structure. Here, we briefly review its basic properties both at the unification scale and in its low-energy (SM-like) limit, motivating the parameter choices used in the numerical analysis. The latter represents the main focus of this work and is performed in Sec. 3, where a detailed description of the methods employed in our analysis and the results obtained is given. In Sec. 4 we conclude and discuss future work and research directions.

Theoretical background
In this section, we introduce the model explored in this article. We divide it into three main parts. In Sec. 2.1 we motivate and introduce the concept of Grand Unified Theories (GUTs) our model is based upon. In particular, we consider an attractive low-scale picture where the presence of new vector-like fermions is well motivated alongside three Higgs doublets. Then, in Sec. 2.2 we give a short overview of the key properties of the unified framework. While the main purpose of this article is to study the phenomenological implications at the LHC, and in particular the potential observability of VLLs, it is important to explain the origin of such NP and which parameter choices are relevant and well motivated. We finalise in Sec. 2.3 with an effective low-energy description, providing the full Lagrangian density and the particle masses, and giving a brief discussion of the benchmark scenarios that we use in our numerical analysis.

Motivation
One of the most attractive features of SUSY is an elegant solution to the well-known hierarchy problem. Among its key predictions, every known particle in nature should have a SUSY partner with the same mass. However, no current or previous experiment has ever observed the existence of such particles. This means that SUSY cannot be an exact symmetry, at least at phenomenologically relevant scales, and should be broken in such a way as to generate larger mass contributions to the superpartners of the SM particles. The actual mass scale of such particles is not known, but the current lack of observation at the LHC [18][19][20][21][22][23] indicates that SUSY breaking should occur well above the electroweak (EW) scale. However, this by no means excludes SUSY as a well-motivated formalism to describe realistic theories. This is the case for the model designed in [14][15][16][17] that we analyse in this article. While SUSY does not necessarily manifest itself at low scales, and the effective theory can be treated as a standard non-SUSY model, its high-scale limit is indeed supersymmetric, with remarkable implications.
As we will see, such a model belongs to a class of GUTs that can potentially emerge from a single gauge $E_8$ group, the unifying force. One of the key properties of this framework is that flavour is promoted to a gauge symmetry that is part of $E_8$ and treated on the same footing as conventional gauge interactions. The model aims at addressing various issues of the SM, delving into fundamental questions such as the origin of gauge interactions and the origin of mass hierarchies for the different matter particles, typically known as the flavour problem. As a byproduct of this unification picture, NP in the form of vector-like fermions (VLFs) may be manifest at the TeV scale. The emergence of light VLFs from other GUT models had previously been proposed in [24,25], where one of the key advantages of the presence of their leptonic counterparts is the possibility of explaining the muon anomalies [25,26]. In this article we will pay special attention to this sector, since a potential discovery of VLLs at the LHC can offer relevant phenomenological probes of the high-energy theory and hints of the unification picture.

Figure 1: Symmetry breaking scheme from the original $E_8$ gauge symmetry down to the strong and electromagnetic gauge groups ($SU(3)_C \times U(1)_{\mathrm{EM}}$). The various terms between the different boxes ($M_6$, $M_3$, etc.) represent the distinct mass scales involved in this scheme according to the discussion in [17]. $M_8$ encodes details inspired by theories with extra compact dimensions.

High-Energy limit
Here, we present the high-energy scale formulation of the model under consideration, with focus on the main properties needed for a basic understanding relevant for our numerical analysis. A more detailed description can be found in [14][15][16][17][27], where we highlight [17] as the most recent and complete reference.
The main idea of a GUT model is to embed all SM gauge interactions, i.e. $SU(3)_C \times SU(2)_L \times U(1)_Y$, into a larger group. As already stated, an interesting possibility resides in the $E_8$ symmetry. It has been presented as a GUT candidate in various superstring theories [28,29] and is, in fact, a motivation inherent to our model.
Our model was engineered to address some of the main concerns one encounters in the SM. It proposes a first-principles explanation for a common origin of the strong and EW interactions as well as of the flavour structure observed in nature. The Higgs and matter fields are unified into a single superfield, equipping both the scalar and the fermion sectors with the same flavour structures. This results in rather reduced freedom in the Yukawa interactions, allowing only for two free parameters, $Y_1$ and $Y_2$, which provide the dominant contributions to three generations of exotic vector-like quarks (VLQs) as well as to the third- and second-generation SM-like quark masses. All remaining fermions, including the first-generation quarks and the charged leptons, have their masses radiatively generated, making them naturally lighter. The CKM mixing is also emergent in this framework provided that there are at least three Higgs doublets developing vacuum expectation values (VEVs). Given its rather unique properties, this model has been named the SUSY Higgs-matter Unified Trinification, or SHUT for short. Note that the trinification group emerges as a subgroup of $E_8$, which we consider below the $M_6$ scale in Fig. 1.
As stated before, the starting point is the $E_8$ gauge symmetry, and the first symmetry breaking step reads $E_8 \to E_6 \times SU(3)_F$, where the subscript F denotes the family symmetry. From this point on, the sequence of steps by which we obtain the SM gauge group is schematically illustrated in Fig. 1. The SM particle content and all NP emergent at low-energy scales correspond to the states that remain light after the various breaking stages. Only second- and third-generation SM-like quarks and all three VLQ masses are tree-level generated, with their relative sizes controlled by the only two Yukawa couplings in the theory, which are of SUSY origin. To see this, let us consider the theory after the breaking step denoted by $M_3$ in Fig. 1, whose superpotential is given in [17]. There, $\varepsilon_{ij}$ is the two-dimensional Levi-Civita symbol in the generations' (or flavour, in what follows) space, $Y_1$ and $Y_2$ are the Yukawa couplings, and $\ell_{L/R}$ denotes the $SU(2)_{L/R}$ doublet superfields. Note that while $\chi$ is an $SU(2)_L \times SU(2)_R$ bi-doublet where the light Higgs sector resides, $\phi$ is a singlet carrying only family symmetry charges, typically dubbed a flavon.
Despite some allowed mixing after the symmetries are sequentially broken, the left- and right-handed leptons are essentially embedded in $\ell_L$ and $\ell_R$ respectively, whereas the SM-like quark sector belongs to both $q_L$ and $q_R$. Note also that this model addresses neutrino masses due to the existence of six right-handed states, three in $\ell_R$ and three in $\phi$. Last but not least, new down-type $SU(2)_L$-singlet VLQs and $SU(2)_L$-doublet VLLs are predicted in the SHUT model, emerging from the fermionic components of $D_{L,R}$ and $\chi$, respectively. In this work, we will study the collider phenomenology of the latter and discuss possible implications for the high-scale picture. All exotic scalars are assumed to be decoupled at the soft SUSY breaking scale, beyond the reach of the LHC. As we have mentioned above, one of the features of the SHUT model is that only second- and third-generation chiral quarks, as well as the three VLQ generations, are allowed to obtain masses at tree-level. For a better understanding of this statement let us inspect the superpotential in Eq. (2.2). First, after the second-to-last breaking stage in Fig. 1, all six neutral scalars in $\tilde{\phi}_i$ and $\tilde{\ell}^i_R$ develop VEVs, which are denoted by $p$, $f$, $\omega$ and $s_i$ (see [17] for details). We immediately see that mass terms for VLQs are generated from $\phi\, D_L D_R$ and $\tilde{\ell}_R D_L q_R$ types of terms, resulting in the expressions given in [17], where, for simplicity, we have ignored the subdominant effect of the $s_i$ VEVs and where we adopt a notation such that the lightest VLQ is the D-quark. Along the same lines, SM-like quark masses are generated from $\chi\, q_L q_R$ types of terms where, even for a generic setting with all six EW doublets in $\tilde{\chi}$ developing nonzero VEVs, the up- and down-quark masses are always zero.
Furthermore, it was shown in [16,17] that a proto-realistic description of the CKM matrix requires a minimum of three light Higgs doublet VEVs, with the quark masses depending on $u_i$ and $d_i$, the $i$-th family EW-symmetry breaking VEVs from the Higgs doublets coupling to up-type and down-type quarks, respectively. If we consider, for simplicity, that $p \approx f \approx \omega$, we obtain mass ratios implying the presence of up to two generations of VLQs within the reach of the LHC if $\omega$ and $f$ are around 100 TeV. This relation fixes the size of $Y_1$ and $Y_2$ and implies that the mass ratios in the VLQ sector are the same as the ones found among their chiral counterparts. Note that in this work we will only study the VLL sector, leaving a detailed numerical study of VLQs for future work. For both the SM-like leptons and the VLLs, there are no allowed terms of the form $\chi\, \ell_L \ell_R$ and $\phi\, \chi\chi$, respectively, which means that, at tree-level, their masses are zero, just like those of the first-generation quarks. However, below the second-to-last symmetry breaking stage in Fig. 1, such types of operators become allowed, which means that these masses can be radiatively generated via loops with internal heavy scalar and fermion propagators.

Low-Energy effective limit
While a direct probe of the high-energy limit of the SHUT model at, or above, the $\omega$-$f$-$p$ scales is far beyond the reach of the LHC, exploring the corresponding NP signatures at the TeV scale can offer us solid indications about the structure of the model at higher scales. Furthermore, such an analysis will provide an important piece of information about the low-scale properties of the model which, although not explored in this work, can become relevant for matching the low- and high-scale regimes of the theory.
We consider in this section a possible low-energy scale limit of the SHUT model, whose gauge symmetry is given in the second-to-last box of Fig. 1. All the quantum numbers for the gauge groups are shown in Tabs. 1, 2 and 3.

Table 1: SM-like sector for the fermions and quarks.

Table 2: Beyond-the-SM sector for the fermions and quarks.

The $SU(2)_L$ doublets are defined with $Q^i_L$ denoting the fermionic components of the $q^{i=1,2,3}_L$ superfields, whereas $L^i$ are the lepton doublet components of $\ell^i_L$, and $E^i_{L,R}$ belong to the $\chi^i$ bi-doublet superfields. Let us now describe the low-scale version of the SHUT model, step by step. The gauge bosons' quantum numbers are not shown, since they are identical to the SM ones. On the other hand, the matter sector can be subdivided into two sub-sectors. The first, shown in Tab. 1, represents the SM-like fermions from which ordinary matter emerges. The second, shown in Tab. 2, is where NP appears, including three new VLL generations, $E_{L,R}$, and two light VLQ generations which we denote as $D_{L,R}$. The Beyond-the-SM (BSM) sector also offers a rich neutrino content, including six left-handed states originating from the $E_L$ and $E_R$ $SU(2)_L$ doublets and six right-handed SM-singlet Majorana neutrinos which we denote as $\nu_R$. Recall that the latter are embedded in three $\ell^i_R$ $SU(2)_R$ doublets and three $\phi_i$ flavons, as stated above. Note that the lightest of the right-handed neutrinos, which we cast as $\nu_{\mathrm{BSM}}$ in the remainder of this article, can be sterile enough to provide a good DM candidate [30]. While we do not perform DM studies in the current work, we will consider this scenario in our numerical analysis by setting its mass in the keV-MeV range and its mixing with the SM-like neutrinos to zero. In such a scenario $\nu_{\mathrm{BSM}}$ escapes the detector and is treated as missing energy. While the scalar sector also offers NP, we will not study it further in this paper, leaving the details for future work.
We can now introduce the relevant interaction terms for our analysis. We start with the low-scale Yukawa Lagrangian, where $\Gamma$, $\Delta$, $\Theta$ and $\Pi$ are $3\times 3$ Yukawa matrices, $\Upsilon$, $\Sigma$ and $\Omega$ are $3\times 6$ matrices, and $Y$ is a $3\times 2$ one. Note that only $Y$, $\Gamma$ and $\Delta$ contain entries whose leading contributions are proportional to $Y_1$ and $Y_2$; the remaining ones are purely of radiative origin. Unlike in the SM, here the gauge symmetries allow for the explicit construction of invariant bilinear and mass terms. These arise from the vector-like nature of the involved fields, for which $SU(2)_L$ transformations do not distinguish between left and right chiralities. All such mass terms in (2.9) are generated at the $\omega$-$f$-$p$ scales, and are thus larger than the EW scale. Note that the neutrino mass matrix $M_{\nu_R}$ is generated once the $p$, $f$, $\omega$ and $s_i$ VEVs are developed. However, contrary to all remaining bilinear and Yukawa terms in the leptonic sector, its entries are generated by tree-level diagrams once the corresponding operators become allowed (see [17] for details). Therefore, small loop factors will not suppress the size of $M_{\nu_R}$, whose entries can be up to the order of the $p$, $f$ and $\omega$ scales. As a byproduct, the neutrino sector automatically contains a seesaw mechanism and hence an explanation for the smallness of the SM neutrino masses, as we further discuss below. For completeness, we show the remaining Lagrangian terms in appendix A. With the model fully defined, we finalise this section by showing the fermion mass matrices in the gauge eigenbasis that are implemented in our numerical analysis. First, for the quarks, and considering the components of the $Q_L$ $SU(2)_L$ doublets as in (2.7), the new Lagrangian is written in terms of $v_a$, the VEV of the respective Higgs doublet $\phi_a$.
The up-type quark mass matrix, written in the basis $\{u^1_L, u^2_L, u^3_L\} \otimes \{u^1_R, u^2_R, u^3_R\}$, has eigenvalues giving the masses of the up-type quarks, whose leading contributions are proportional to (2.4). A similar strategy can now be employed for the down-quark sector. We can then extend this analysis to the lepton sector and write down the mass matrices for the charged leptons and neutrinos, the latter in the basis $\{\nu^i_{e_L}, \nu^i_{E_L}, \nu^i_{E_R}, \nu^j_R\} \otimes \{\nu^i_{e_L}, \nu^i_{E_L}, \nu^i_{E_R}, \nu^j_R\}$, where $i = 1, 2, 3$ as usual and $j = 1, \ldots, 6$. For the charged leptons, besides the SM-like states we also have exotic VLLs, which we name $e_4$, $e_5$ and $e_6$, defined in such a way that $m_{e_6} > m_{e_5} > m_{e_4}$. The neutrino sector is quite rich in new particles: besides the three SM-like ones, we have a total of twelve new states. The numerical analysis will only consider the three lightest, keV-MeV scale BSM neutrinos, which are denoted as $\nu_4 \equiv \nu_{\mathrm{BSM}}$, $\nu_5$ and $\nu_6$.

Physically viable benchmark scenarios for masses
Before moving to the numerical analysis, we present possible benchmark scenarios for couplings and masses, chosen in such a way as to preserve the key properties emergent from the unification picture while complying with the measured phenomenological quantities.
The main focus of this work is the construction of an analysis framework dedicated to the study of VLLs and of how important DL techniques can be. This will enable us to propose robust signal events to be tested via direct searches at the LHC, as well as to understand whether the model under consideration can be probed in such a sector. As shown in [17], under certain approximations and before EW symmetry breaking (EWSB), the lepton mass matrix is reduced to the form in (2.15), where the various $\kappa_i$ terms are radiatively generated Yukawa couplings, thus expected to be smaller than unity. The VLL masses are then given by (2.16), where we defined $\Lambda_1 = \kappa_5^2 + \kappa_6^2 + \kappa_7^2 + \kappa_8^2$, $\Lambda_2 = (\kappa_5\kappa_6 - \kappa_7\kappa_6)^2$, $\Lambda_3 = (\kappa_5\kappa_7 + \kappa_6\kappa_8)\kappa_1\kappa_3$, $\Lambda_4 = (\kappa_5^2 + \kappa_8^2)\kappa_1^2$ and $\Lambda_5 = (\kappa_6^2 + \kappa_7^2)\kappa_3^2$. The plus sign in (2.16) corresponds to $e_5$ and the minus sign to $e_4$.
Considering a scenario where $\omega \sim f \ll p$, a Taylor expansion of (2.16) leads to the simplified expressions
$$m_{e_6} \approx p\,\kappa_2\,, \qquad m_{e_5} \approx p\,\kappa_1\,, \qquad m_{e_4} \approx \omega\sqrt{\kappa_5^2 + \kappa_8^2}\,. \tag{2.17}$$
Along the lines of what was discussed in [17], let us consider a set of possible solutions. It is important to note that (2.15) does not represent a one-to-one correspondence with (2.13): one should interpret (2.15) as the matrix obtained by following all the symmetry breaking steps in Fig. 1, while (2.13) corresponds to the stage immediately after the $\omega$, $f$ and $p$ VEVs. This benchmark scenario leads to the following mass ranges:
• $m_{e_6} \sim \mathcal{O}(5-10~\mathrm{TeV})$,
• $m_{e_5} \sim \mathcal{O}(0.15-10~\mathrm{TeV})$,
• $m_{e_4} \sim \mathcal{O}(0.1-1~\mathrm{TeV})$,
which we will use as a guiding principle for our numerical analysis. In particular, we see that for the model under consideration $e_4$ can be light enough to be probed at the LHC. On the other hand, $e_6$ will always be rather heavy and its potential observation at the LHC would likely be very challenging. Regarding $e_5$, we see that it can either be as heavy as $e_6$ or as light as $e_4$, depending on yet-unexplored model details. Based on this estimation, we will consider both possibilities in the numerical studies.
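The approximate relations (2.17) can be checked numerically. The sketch below assumes illustrative values for the radiative couplings $\kappa_i$ and the VEV scales; these numbers are hypothetical choices, picked only to land inside the quoted mass ranges, not benchmark values from the text:

```python
import math

def vll_masses(p, omega, k1, k2, k5, k8):
    """Approximate VLL masses from the Taylor-expanded relations (2.17),
    valid when omega ~ f << p. VEVs in GeV, kappas dimensionless."""
    m_e6 = p * k2                      # heaviest VLL
    m_e5 = p * k1                      # intermediate VLL
    m_e4 = omega * math.hypot(k5, k8)  # lightest VLL: omega * sqrt(k5^2 + k8^2)
    return m_e4, m_e5, m_e6

# Hypothetical inputs: omega ~ 100 TeV, p ~ 1000 TeV, loop-suppressed kappas
m4, m5, m6 = vll_masses(p=1.0e6, omega=1.0e5, k1=5e-3, k2=8e-3, k5=2e-3, k8=2e-3)
print(f"m_e4 = {m4:.0f} GeV, m_e5 = {m5:.0f} GeV, m_e6 = {m6:.0f} GeV")
```

With these inputs one obtains an $e_4$ of a few hundred GeV and $e_5$, $e_6$ in the multi-TeV range, consistent with the mass hierarchy quoted above.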
To finalise this subsection, let us consider the neutrino sector. Before EWSB, the mass matrix is block diagonal, where $\bar{M}$ represents the neutral components belonging to $SU(2)_L$ doublets while $M$ denotes the SM singlets corresponding to $\nu_R$ in Tab. 2. Starting with the $M_{6\times 6}$ block, which corresponds to $M_{\nu_R}$ in (2.9), its components offer the largest contributions to the neutrino mass matrix. In this sector, hierarchies among gauge eigenstates result from the relative sizes of the EW-preserving VEVs. On the other hand, the $\bar{M}$ components are radiatively generated and share the same properties as the VLLs. Thus, after the $p$, $f$ and $\omega$ VEVs, one can write $\bar{M}$ in the form (2.19), with eigenvalues (2.20), such that the left-handed neutrino components, at this stage, share the same masses as their charged lepton partners. In total, and before EWSB, we have three massless and twelve massive neutrinos (six from the doublets and six from the singlets). In the corresponding mass basis, if we identify the massive states as $\mu_i$ ($i = 1, \ldots, 12$), we can recast the neutrino mass matrix in a condensed notation, where the contribution of the EWSB VEVs is already included. Note that $y_\nu$ are the $3 \times 12$ Yukawa matrices, whose entries are all radiatively generated. While a more dedicated analysis is beyond the scope of this work, this structure can potentially offer three sub-eV states as well as light keV-MeV order sterile neutrinos, as we will assume in our numerical studies.
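The seesaw suppression invoked above can be illustrated with a generic type-I-like toy example, not the model's full mass matrix; the Dirac and Majorana entries below are hypothetical numbers chosen only to exhibit the hierarchy:

```python
import numpy as np

# Toy 2x2 seesaw matrix [[0, m_D], [m_D, M_R]] with m_D << M_R, mimicking
# small radiatively generated Dirac entries against large Majorana masses.
m_D = 1.0       # Dirac-type entry, GeV (hypothetical)
M_R = 1.0e6     # heavy Majorana mass, GeV (hypothetical)

M = np.array([[0.0, m_D],
              [m_D, M_R]])

m_light, m_heavy = np.sort(np.abs(np.linalg.eigvalsh(M)))

# Seesaw relation: the light eigenvalue is approximately m_D**2 / M_R
print(m_light, m_D**2 / M_R)
```

The light state comes out suppressed by $m_D^2/M_R$, which is how the model can accommodate sub-eV active neutrinos alongside heavy singlets.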

Physically viable benchmark scenarios for couplings
In our numerical analysis we will be using MadGraph5 [31], a tool that requires the theory to be written in the mass basis. Therefore, not only the masses but also the couplings need to be rotated to such a basis. To this end we use SARAH [32], which also provides a complete set of Feynman rules with physical fields. All relevant diagrams for our studies are shown in appendix A. Note that all signal and background processes that we will consider involve only triple gauge self-interactions as well as fermion-fermion-gauge vertices. While the gauge sector is purely SM-like, with well-known parameters, the Feynman rules involving fermion vertices will be sensitive to the elements of the mixing matrices in the charged lepton (including VLLs) and neutrino sectors, defined by bi-unitary transformations. Let us now discuss which phenomenological constraints are applied to these matrices. First, for the charged leptons, we consider the limit where the SM-like sector is flavour-diagonal. Therefore, in $U^e_L$ and $U^e_R$, we add a $3\times 3$ identity block and consider a limiting scenario where there is no mixing with VLLs. While this may not be the case in general, a realistic scenario cannot strongly deviate from the flavour alignment limit that we impose; a complete study with flavour mixing is beyond the scope of this work. For the VLL block, we consider a generic mixing with the only restriction being that both $U^e_L$ and $U^e_R$ are unitary. To summarize, the lepton mixing matrices used in the numerical analysis take this block-diagonal form. For the neutrino sector, we also consider a limiting scenario where, for simplicity, the mixing between the three light active neutrinos and the remaining twelve BSM states is zero. Once again, a more generic case with flavour mixing is beyond the scope of our analysis and does not significantly affect our main conclusions. Note, however, that mixing among the light neutrinos is allowed and fixed by the PMNS matrix.
For the remaining BSM $12 \times 12$ block, we recall that the mixing among right-handed and left-handed components is radiatively generated and thus likely small. Here, we consider that those elements are always smaller than $10^{-3}$. In the full neutrino mixing matrix, the elements in $D_1$ are set to be of order $\mathcal{O}(10^{-3}-10^{-8})$, while those in $U_{1,2}$ are randomly generated in a way that preserves unitarity and guarantees that $e_4$ couples democratically to the sterile neutrinos. With the above ingredients we have defined a possible benchmark scenario with which to start our collider phenomenology studies while preserving the essential features of the model under consideration.
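One standard way to draw the random $U_{1,2}$ blocks while preserving unitarity is via a QR decomposition of a complex Gaussian matrix. This is a generic construction, not necessarily the procedure used by the authors; the dimension and seed below are illustrative:

```python
import numpy as np

def random_unitary(n, rng):
    """Draw an n x n unitary matrix via QR decomposition of a complex
    Gaussian matrix, with a diagonal phase fix (standard Haar-like draw)."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))   # rescale column phases; still unitary

rng = np.random.default_rng(42)
U = random_unitary(6, rng)       # e.g. one 6 x 6 sub-block of the mixing matrix
print(np.allclose(U @ U.conj().T, np.eye(6)))  # unitarity check
```

The phase fix on the diagonal of $R$ makes the draw well defined; any such matrix satisfies $U U^\dagger = \mathbb{1}$ by construction.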
Searching for vector-like leptons at the LHC

As a first step, we create the necessary UFO [33] files using SARAH. These are later used by MadGraph5 (MG5) to generate the signal and background Monte Carlo events. Hard-scattering events are generated with MG5 for pp collisions at a 14 TeV center-of-mass energy, with the nn23lo1 parton distribution function and the strong coupling constant $\alpha_s$ fixed automatically by MG5; Pythia8 [34] then performs parton showering and hadronization, and Delphes [35] is used to include detector effects. We have generated a total of 250000 events. The background channels with up to two extra jets are generated with the MLM matching scheme [36]. While substantial theoretical work has been done over the last decades [24][25][26][37][38][39][40][41][42][43][44][45][46][47][48], only recently have searches for exotic charged leptons started. The most recent analysis was done by the CMS collaboration at the LHC in 2019 [49], where a search for VLLs coupling to taus was performed. In fact, one of the three topologies that we propose in our analysis is very similar to the one in Fig. 1 of [49], and more in line with what we see, for example, in Fig. 30 of [50]. In the context of our model, such a topology can be seen in Fig. 2 and, in what follows, will be referred to as "ZA".
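The generation chain above can be sketched as an MG5 command script assembled from Python. The model name `SHUT_UFO` and particle name `e4` are placeholders; the actual identifiers depend on the UFO output produced by SARAH for this model:

```python
# Sketch of an MG5 command script; model and particle names are hypothetical.
mg5_commands = [
    "import model SHUT_UFO",   # hypothetical UFO model name from SARAH
    "generate p p > e4 e4~",   # hypothetical VLL pair-production syntax
    "output vll_pair",
    "launch",
    "set ebeam1 7000",         # 7 + 7 TeV, i.e. 14 TeV center-of-mass energy
    "set ebeam2 7000",
    "set nevents 250000",      # statistics quoted in the text
]

with open("run_vll.mg5", "w") as f:
    f.write("\n".join(mg5_commands) + "\n")
```

Such a script would be passed to the MG5 command-line interface, with showering and detector simulation handled downstream by Pythia8 and Delphes.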
Recall that we are treating the lightest, sterile, BSM neutrinos as missing energy, since for masses in the keV-MeV range their possible decays are kinematically forbidden and, therefore, such neutrinos escape the detector. Of course, it would be interesting to study a scenario with SM neutrinos instead of the BSM ones, since the final state would then be purely SM. However, due to the structure of the mixing matrix used in this study (2.23), such a mixing is non-existent. We leave for future work the scenario where a non-zero mixing between SM leptons and VLLs, consistent with both the flavour-observable constraints and the predictions of the high-scale theory, is implemented.

Figure 2: Leading-order Feynman diagram for the ZA topologies. Here, $q$ and $\bar q$ correspond to quarks originating from the colliding protons, $V$ represents VLLs and $\nu_{\mathrm{BSM}}$ denotes the lightest BSM neutrino. There are two purely SM leptonic final channels, identified with $\ell$ and $\nu_\ell$.
We also consider vector-boson fusion events, whose topology is shown in Fig. 3 and which we shall refer to as "VBF". While the latter is expected to have a smaller cross section, the presence of two well-defined forward jets enables us to tag such events using the high transverse mass of the forward jets. The signal channels we propose here provide a good starting point for our analysis. However, due to the expected low cross section of the signal events in comparison with the overwhelming cross section of the irreducible background, searching for such particles at the LHC solely with these two processes can become rather challenging. A third channel, denoted as "VLBSM", with only four internal vertices (VBF diagrams contain eight vertices while ZA ones contain six), is then considered and can be seen in Fig. 4. Furthermore, we use DL techniques inspired by previous works [13,51,52] and tailored for our analysis, in order to efficiently discriminate signal from background.
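The Asimov significance that later serves as a DL training objective is commonly taken to be the median expected discovery significance; a minimal sketch of its systematics-free form, assuming expected signal and background yields $s$ and $b$:

```python
import math

def asimov_significance(s, b):
    """Median expected discovery significance for signal s over background b,
    Z_A = sqrt(2 * ((s + b) * ln(1 + s/b) - s)), without systematics."""
    if s <= 0.0 or b <= 0.0:
        return 0.0
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

# In the s << b limit, Z_A approaches the familiar s / sqrt(b) estimate
print(asimov_significance(10.0, 100.0))  # close to 10 / sqrt(100) = 1.0
```

Used as an objective, this quantity rewards selections that keep signal while cutting background, rather than raw classification accuracy alone.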
Another possible signal topology would be to consider diagrams similar to Fig. 4 but with a neutral boson $Z^0/A$ decaying directly into a pair $\bar e_4 e_4$, that is, $pp \to Z^0/A \to \bar e_4 e_4$. This would provide a sizeable cross section and a clean signal, and could appear as charged tracks in the detector, potentially offering a good smoking gun for our model. However, preliminary numerical calculations showed that $e_4$ decays too quickly and does not reach the detector track chamber, such that one would have to reconstruct $e_4$ from its main decay products. Therefore, we will only consider the ZA, VBF and VLBSM topologies in our analysis and leave other possibilities for future work.
The main irreducible background for each signal channel is chosen as follows:
• For ZA topologies, we consider tt̄ and W + W − , both with fully leptonic final states, tt̄ + Z 0 with the Z 0 decaying into a lepton/anti-lepton pair and the tt̄ pair decaying fully leptonically, and finally tt̄ + Z 0 with the Z 0 decaying into neutrinos.
• For VBF topologies, we consider W + W − with fully leptonic final states and tt̄ + (j, jj), where the tops decay leptonically, accompanied by either one or two light jets.
• For VLBSM topologies, we consider single lepton production, pp → ℓν, with zero, one and two light jets.

Figure 3: Leading-order Feynman diagrams for the VBF topologies. The same nomenclature as in Fig. 2 applies here; ν BSM corresponds to any BSM neutrino.
Both the background and signal leptonic final states are chosen to be identical. The Feynman diagrams for W + W − and tt̄ with fully leptonic final states are displayed in Figs. 5 and 6. To facilitate the signal detection and reduce the background contamination we consider final-state leptons of different flavour. In particular, we choose the W + decaying to µ + and ν µ , while for the W − we consider the e − ν̄ e channel. We also choose the following simple kinematic cuts as event selection criteria:
1. Charged leptons (e − and µ + ) are required to have a transverse momentum p T > 25 GeV and |η| ≤ 5;
3. For events with jets, we use the Cambridge/Aachen jet clustering algorithm with ∆R = 1.0, transverse momentum p T > 35 GeV and pseudo-rapidity |η| ≤ 5.
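As an illustration, the selection criteria above amount to simple masks over per-event kinematics (a numpy sketch with made-up toy values; in the actual analysis the cuts are applied at the generator/Delphes level):

```python
import numpy as np

# Toy per-object kinematics (illustrative values, not simulation output)
lep_pt  = np.array([10.0, 30.0, 120.0, 27.5])   # charged-lepton pT in GeV
lep_eta = np.array([0.3, 5.4, -1.2, 2.1])       # charged-lepton pseudo-rapidity

# Charged-lepton cuts: pT > 25 GeV and |eta| <= 5
lep_pass = (lep_pt > 25.0) & (np.abs(lep_eta) <= 5.0)

# Jet cuts (after Cambridge/Aachen clustering with Delta R = 1.0):
# pT > 35 GeV and |eta| <= 5
jet_pt  = np.array([20.0, 50.0, 36.0])
jet_eta = np.array([0.1, -4.0, 5.5])
jet_pass = (jet_pt > 35.0) & (np.abs(jet_eta) <= 5.0)
```

Only objects passing both conditions enter the subsequent reconstruction; everything else is discarded at this stage.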
At this point we are able to reconstruct all particles up to the VLLs with reasonable precision, using the information from the final-state visible particles, tracks and calorimeter towers provided by the Delphes output. All chosen observables are detailed in Sec. 3.2, and we compute them both in the lab frame and in the W frame. We also emphasize that all Monte Carlo simulations and subsequent data processing (PYTHIA8, MadGraph, Delphes) are performed on the blafis 6 and ARGUS computer clusters, part of the overall computing infrastructure at the University of Aveiro.

Figure 5: tt̄ background topologies with fully leptonic final states. All particles shown here are purely SM ones. The final leptonic channels in both the signal events and the tt̄ background processes shown here are chosen to be identical.

Figure 6: W + W − background topologies. All particles shown here are purely SM ones. The final leptonic channels in both the signal events and the W + W − background processes shown here are chosen to be identical.

Methodology: Deep Learning models and dataset
In this section, we describe the construction of our neural network (NN) models and the best architectures we found to accurately separate and identify signal events from their respective backgrounds. For the unfamiliar reader, NNs, and by extension DL algorithms, are rooted in the universal approximation theorem, which essentially states that a NN with a given number of hidden layers and a finite number of hidden units, a.k.a. neurons, can approximate any continuous function on compact subsets of R n . Such a function can describe a hyperdimensional surface which separates samples from distinct classes (classification problems) or predicts new samples based on previous ones (regression problems). However, the universal approximation theorem tells us neither how deep the NN must be nor the number of hidden units needed to better approximate the desired function. This is the problem one faces when identifying clever solutions and finding better designs for NNs.

6 Technical details can be found at the Gr@v's website [53].
The main goal of our NN model is to classify events, separating the signal channels defined in the previous section from their respective backgrounds. For a better performance one must determine what kind of architecture should be employed, the number of layers, how many neurons each layer needs, etc. An appropriate choice of such parameters can lead to models capable of giving very accurate predictions, which in the context of high-energy physics can potentially lead to discoveries using the available data.
The problem of selecting the correct parameters is referred to as hyperparameter optimization or tuning. There are many methods to search for the best combination of parameters, including a brute-force approach that tests every possible combination until the optimal NN model is recovered. However, as one might expect, such a method is very time consuming. A more efficient procedure consists in using an evolutionary algorithm search [13,52]. Our evolutionary algorithm is initialized by building a set of ten NNs using Keras [54] with TensorFlow [55] as back-end, with the hyperparameters randomly chosen from a pre-defined list of options. Each NN is trained for 200 epochs and, once the training phase is complete, we select the top five NNs that have shown the best performance in order to "breed" new NNs for the next iteration. Such NNs are then initialized with the hyperparameters of the selected ones, treated as "parent traits", while randomly including new ones as mutations; we have set a 20% probability for a random mutation to occur. We then construct a new population set and repeat the training/evaluation process five times, i.e. for five generations, until finally retrieving the best NN architecture.
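Schematically, the evolutionary loop just described can be sketched as follows (the hyperparameter names and values below are illustrative placeholders, and a toy fitness function stands in for the 200-epoch Keras training):

```python
import random

# Hypothetical hyperparameter space (illustrative, not the paper's actual list)
SPACE = {
    "n_layers": [1, 2, 3, 4],
    "n_units": [32, 64, 128, 256],
    "activation": ["relu", "tanh", "selu"],
}

def random_individual():
    """One candidate NN architecture: a random choice per hyperparameter."""
    return {k: random.choice(v) for k, v in SPACE.items()}

def breed(parents, mutation_rate=0.2):
    """Child inherits each trait from a random parent, with a 20% mutation chance."""
    child = {}
    for key, options in SPACE.items():
        child[key] = random.choice(parents)[key]
        if random.random() < mutation_rate:
            child[key] = random.choice(options)
    return child

def evolve(fitness, pop_size=10, n_parents=5, generations=5):
    """Population of 10, keep the top 5 as parents, evolve for 5 generations."""
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:n_parents]
        population = parents + [breed(parents) for _ in range(pop_size - n_parents)]
    return max(population, key=fitness)

# Toy fitness standing in for "train for 200 epochs, return the test metric"
def toy_fitness(ind):
    return ind["n_layers"] * ind["n_units"]

best = evolve(toy_fitness)
```

In the real search, `fitness` would build and train a Keras model from the candidate hyperparameters and return either the test accuracy or the Asimov significance.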
Another important aspect of the evolutionary algorithm is the fitness function, which helps the algorithm select the best architectures based on a pre-defined metric. In our case, we set two fitness functions: one where the best models are ranked according to their accuracy on the test set, and a second one that ranks the models according to their Asimov significance, defined as

Z_A = \sqrt{2\left[(s+b)\ln\left(\frac{(s+b)(b+\sigma_b^2)}{b^2+(s+b)\sigma_b^2}\right)-\frac{b^2}{\sigma_b^2}\ln\left(1+\frac{\sigma_b^2\, s}{b(b+\sigma_b^2)}\right)\right]} ,   (3.1)

with s and b being the number of signal and background events, respectively, and σ 2 b the variance of the background events. We follow a procedure similar to that described in [56] and incorporate the necessary modifications to the training methodology into our evolutionary algorithm.
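For concreteness, Eq. (3.1) can be evaluated with a short helper (a sketch; the function names are ours and not part of the analysis code):

```python
import math

def asimov_significance(s, b, sigma_b):
    """Median expected (Asimov) significance for s signal and b background
    events, with absolute background uncertainty sigma_b, as in Eq. (3.1)."""
    sb2 = sigma_b ** 2
    term1 = (s + b) * math.log((s + b) * (b + sb2) / (b * b + (s + b) * sb2))
    term2 = (b * b / sb2) * math.log(1.0 + sb2 * s / (b * (b + sb2)))
    return math.sqrt(2.0 * (term1 - term2))

def naive_significance(s, b):
    """Simple counting significance s / sqrt(s + b)."""
    return s / math.sqrt(s + b)
```

As expected, shrinking the background uncertainty increases the Asimov significance, which approaches the simple counting estimate in the small-uncertainty limit.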
Although the main "body" of our NN is built using the principles of natural selection [57], some characteristics of the model construction are universal to all NN models, summarized as follows:
• As input, the NN receives standard-normalised vectors, i.e. each feature has mean 0 and standard deviation 1, with the data for all observables extracted from the ROOT detector output. This data set is then reshuffled and divided into a training set (80% of the data) and a test set (20% of the data). To avoid overfitting, we use five-fold cross-validation during the training of the NN.
• We employ a cyclic learning rate during the training phase with 0.01 initial value and maximal value of 0.1.
• In the output layer, the data is mapped to a vector with entries between 0 and 1 (corresponding to probabilities), in the format (S, B j ), where S is the signal and B j corresponds to the different backgrounds. The index j runs over the number of backgrounds chosen for a given signal. As an example, in ZA we consider 4 distinct backgrounds (j = 1, . . . , 4), so the vector corresponds to (S, B 1 , B 2 , B 3 , B 4 ).
• Batch size of 32768 entries.
• We impose a total limit of 200 epochs with a patience of 5 epochs and a validation-loss monitor, i.e. if the loss value on the test/validation set does not improve for 5 epochs the training is stopped and all metrics are computed and stored to be passed to the evolutionary algorithm.
• To select the NN models with better accuracy we use the binary cross entropy (BCE) loss, while for the selection of those models that maximize the Asimov significance we use Eq. (3.1) as a loss function during the evolutionary scan of the best hyperparameters.
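The universal ingredients listed above (standard normalisation, the 80/20 split, the cyclic learning rate between 0.01 and 0.1, and early stopping with a patience of 5 epochs) can be sketched in plain numpy (toy data; the actual models are built and trained in Keras/TensorFlow):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the observables extracted from the ROOT output
X = rng.normal(size=(1000, 8))
y = rng.integers(0, 2, size=1000)

# Standard normalisation: each feature to mean 0, standard deviation 1
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Reshuffle and split: 80% training, 20% test
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
train_idx, test_idx = idx[:split], idx[split:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

def cyclic_lr(epoch, base_lr=0.01, max_lr=0.1, step_size=10):
    """Triangular cyclic learning rate oscillating between base_lr and max_lr."""
    cycle = np.floor(1 + epoch / (2 * step_size))
    x = abs(epoch / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

def should_stop(val_losses, patience=5):
    """Early stopping: halt when the validation loss has not improved
    for `patience` consecutive epochs."""
    if len(val_losses) <= patience:
        return False
    return min(val_losses[-patience:]) >= min(val_losses[:-patience])
```

The `step_size` of the cyclic schedule is an assumed value for illustration; only the 0.01 and 0.1 endpoints are fixed by the setup described above.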
Another important aspect to mention is the fact that our data is unbalanced, i.e., we have more data points for some of the classes we are analyzing (in some cases with a ratio of s/b ≈ 1.83). This is a result of the event selection criteria that we impose for the signal and background. Unbalanced data can lead to models with lower predictive power for substantially outnumbered classes, which is a serious issue if one wants to search for NP phenomena. To avoid this, we use the Synthetic Minority Oversampling Technique (SMOTE) [58] to create synthetic entries for the minority classes in our training dataset. Note that we do not employ any re-sampling technique on the test dataset. This method is faster and more efficient than generating additional Monte Carlo events and passing them through subsequent hadronization and detector simulation.
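The core idea behind SMOTE, interpolating between a minority-class sample and one of its nearest minority-class neighbours, can be sketched in a few lines (a minimal illustration; the analysis itself uses the SMOTE implementation of [58]):

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: create n_new synthetic minority samples by
    interpolating between a random sample and one of its k nearest
    minority-class neighbours."""
    rng = rng or np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from sample i to all minority samples
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbours)
        lam = rng.random()                    # interpolation fraction in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Because synthetic points are convex combinations of real minority samples, they never leave the region populated by the minority class; as stressed above, this is applied to the training set only.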
Our dataset is stored in a table format where each row corresponds to an event entry that has successfully passed the selection cuts described in Sec. 3, and each column corresponds to the observables described in Tabs. 4 and 5. The dimensions of the training and test datasets are displayed in Tab. 6.

Table 4: Kinematic (dimension-full) and angular (dimensionless) observables selected to study the ZA and VBF channels. We include observables in four different frames of reference: the laboratory frame, the W − rest frame, the W + rest frame and the ℓℓ̄ frame. θ i,j denotes the angle between the respective particles from either the final state or reconstructed objects.

Results
We start our discussion by presenting a specific benchmark point whose parametric choice was guided by our discussion in Secs. 2.3.1 and 2.3.2. Note that the analysis methodology is independent of such a parametric choice. We study events for a light VLL (e 4 ) accompanied by the lightest BSM neutrino (ν 4 ), which is treated as missing energy, with masses m e 4 = 677 GeV and m ν 4 = 216 keV.

As one can see, the main problem we face is the overwhelming background resulting from tt̄ events, whose cross-section largely overtakes that of the ZA and VBF channels, as well as from pp → ℓν + (0, 1 and 2) jets, whose cross-section exceeds that of the VLBSM channel by at least eight orders of magnitude. While each diagram in Fig. 3 carries a larger suppression factor, associated with the presence of more interaction vertices and internal propagators, it ends up generating more contributions to the overall cross section. Indeed, while in Fig. 2 we only have e 4 as an intermediate state, so fewer combinations are involved, in Fig. 3 all BSM neutrinos contribute to the ν BSM propagator, implying a larger number of possible VBF processes. Furthermore, two of the BSM neutrinos in ν BSM are rather light, with masses of the order of 100 keV (for ν 4 ) and 100 MeV (for ν 5 ), which, on its own, offers an enhancement factor of at least 6 and 3 orders of magnitude, respectively, in comparison to massive EW-scale (or above) propagators.

The relevant observables for VLBSM signals are detailed in Tab. 5, where both angular and kinematic variables are determined in the laboratory frame. The most obvious distinction between this dataset and the previous two is the number of features. While the VLBSM signal, with fewer internal propagators and only one charged lepton in the final state, yields cross-sections larger than those of VBF and ZA events, as a drawback we do not have as wide a wealth of distinct variables to choose from.
Moreover, the VLBSM topology and its corresponding backgrounds do not allow for a direct one-to-one correspondence with the variables of the other channels: VLL invariant mass distributions, as well as the azimuthal and polar angles cos(∆φ) and cos(∆θ), are absent in VLBSM events. A schematic representation of these angular variables can be seen in Fig. 7, where ∆θ is the angle between the W − and W + planes and ∆φ is the azimuthal angle between those two planes. The only relevant distributions, as specified in Tab. 5, can be seen in Fig. 20.

Starting with the ZA/VBF signal and background topologies, one can reconstruct the top, W and e 4 masses within the expected range, with a noticeable difference between signal and background. We also observe a sizeable separation between signal and background events, especially for the ∆R distributions, where, for the former, ∆R e − ν e and ∆R µ + ν µ showcase a peak near zero, well separated from background events. For pseudo-rapidity distributions, the majority of signal events peak at around zero, indicating particle trajectories perpendicular to the beam axis, which helps to separate events from some of the background channels (tt̄, W + W −). While there are indeed sizeable differences in some variables, others are clearly dominated by background, especially the angular variables cos(θ). Similar conclusions arise when observing the distributions for the VLBSM channel (see Fig. 20), with pseudo-rapidity distributions providing a good signal-to-background separation, whereas angular distributions suffer from the considered backgrounds.
Generally speaking, for all studied channels, kinematic distributions such as the transverse momentum and the missing transverse energy offer some degree of distinction between signal and background events. In fact, for backgrounds, these distributions tend to accumulate at lower energies when compared to signals. In particular, for transverse momentum distributions we have such an accumulation of events at p T < 200 GeV, and for missing energy the preferred region is MET < 200 GeV. On the other hand, for signal events, we have a significant accumulation in the high-energy region where p T /MET > 300 GeV. In fact, due to a rich neutrino sector, missing energy distributions are of particular interest, as the signals we are considering here contain both BSM and SM sources of missing energy.
However, it is important to note that the information available at experiments is limited, typically referred to as low-level observables. These operate mostly on counting the number of hits (or events) "observed" by a given detector. A high-level approach combines the information from the various sub-detectors into further complex and sophisticated observables, such as the ones we explore in this work. The use of such a multitude of observables, including the variables in the W frame, serves as an important step in the subsequent analysis: it allows us to build a vast dataset for DL studies, which, in turn, allows for quicker training and a greater overall accuracy despite the lower cross-sections.
For the benchmark scenario considered above (a VLL with m e 4 = 677 GeV and a BSM neutrino in the keV range), the architecture that maximizes the accuracy can be seen in Tabs. 10, 11 and 12. On the other hand, the architecture that maximizes the Asimov significance is shown in Tabs. 13, 14 and 15. A quantitative evaluation of our NN models can be done with the help of ROC (receiver operating characteristic) curves, which measure how well the NN classification has performed. In Fig. 8 we show our results for the best accuracy, whereas the best Asimov significance can be seen in Fig. 9. As one can observe, the models which perform with the best accuracy are capable of separating signal events from background with almost 100% efficiency. In particular, we see that signal events lie above background events with signal efficiencies of about ε S = 1, with 97% accuracy for the ZA, 100% for the VBF and 98% for the VLBSM channels.
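The area under a ROC curve has a simple probabilistic reading: it equals the probability that a randomly chosen signal event receives a higher NN score than a randomly chosen background event. A minimal sketch of this rank-based computation (the function name is ours, for illustration only):

```python
import numpy as np

def roc_auc(scores_sig, scores_bkg):
    """AUC as the probability that a random signal event out-scores a
    random background event; ties count for one half."""
    s = np.asarray(scores_sig)[:, None]
    b = np.asarray(scores_bkg)[None, :]
    return float(np.mean(s > b) + 0.5 * np.mean(s == b))
```

A perfect classifier gives AUC = 1, a random one gives AUC = 0.5, and values below 0.5 indicate systematically inverted scores.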
However, we note that a large significance does not necessarily imply a good accuracy. In fact, for the NN architectures that maximize the Asimov significance, the accuracy is substantially reduced. A particularly relevant example is that of the VLBSM channel, exhibiting the lowest value, AUC = 0.32. The predicted confidence scores can be found on the right panels of Fig. 8. Note that the NN assigns a different score to each prediction. For example, taking Fig. 8(a), an NN score of 1.0 labels an event that is either a signal or a W + W − background event.
The significance of signal events is typically regarded by experimental physicists as the most meaningful measure to either claim a discovery or conclude that a given NP candidate is excluded. Therefore, we compute the significance for all channels proposed so far, showing our results in Figs. 10 and 11. While in the former the accuracy metric is maximized, in the latter it is the Asimov significance. For the sake of rigour and completeness of our analysis we compute the significance for three distinct statistics:
1. First, we consider what we denote as the naive significance. This is calculated solely by counting the number of background b and signal s events according to the well-known formula s/√(s + b).
2. The second metric to consider is the plain Asimov significance Z A as given in Eq. (3.1). This is typically the most conservative measure in our analysis and it assumes that the background is known with 1% uncertainty.
3. Finally, we consider the Asimov significance in the case of backgrounds known with an uncertainty much smaller than 1%, referring to it as Z(< 1%). In particular, we choose a background uncertainty of 10 −3 % in our studies. This measure typically gives the best results but requires that all physics backgrounds are under control by the experiment. We expect this to be realizable by the ATLAS and CMS communities with the knowledge and experience accumulated over time. Note that in the limit of vanishing background uncertainty we recover the naive significance formula.
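The three statistics above can be compared in a small self-contained sketch (illustrative event counts; the Asimov expression follows Eq. (3.1), parametrized here by the relative background uncertainty, so that Z(< 1%) corresponds to 10^−3 % = 1e-5):

```python
import math

def asimov(s, b, rel_unc):
    """Asimov significance, Eq. (3.1), for a relative background
    uncertainty rel_unc (so sigma_b = rel_unc * b)."""
    sb2 = (rel_unc * b) ** 2
    t1 = (s + b) * math.log((s + b) * (b + sb2) / (b * b + (s + b) * sb2))
    t2 = (b * b / sb2) * math.log(1.0 + sb2 * s / (b * (b + sb2)))
    return math.sqrt(2.0 * (t1 - t2))

s, b = 50.0, 1000.0                 # illustrative event counts
z_naive = s / math.sqrt(s + b)      # 1. naive significance
z_asimov = asimov(s, b, 1e-2)       # 2. Z_A: background known to 1%
z_below = asimov(s, b, 1e-5)        # 3. Z(<1%): uncertainty of 10^-3 %
```

For these counts the ordering is Z_A < s/√(s + b) < Z(< 1%), matching the qualitative behaviour described above: Z_A is the most conservative measure, while Z(< 1%) gives the best results.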
We then study these three significance measures in terms of the NN scores, specializing to the case of the High-Luminosity (HL) LHC runs, i.e. L = 3000 fb −1 . Considering the results from both scenarios, we note some general features common to all channels. For the case of the VLBSM channel, note that we have not selected the highest significance that our algorithm has found. The reason is that there are regions of the NN parameter space where only signal is present. For example, in Fig. 10(g), we observe that for scores of ∼ 0.80 we obtain a significance greater than 10σ. However, for a realistic evaluation, we have asked the NN to guarantee the existence of both signal and background events. This is not the case for the ZA and VBF channels, where the largest significance is found in a region where both signal and background are always present. Under the assumption that all signal topologies represent independent events, we can define the combined significance as the sum of all three contributions, i.e. σ C = σ ZA + σ VBF + σ VLBSM .
For the results in Fig. 10 we see that, if we privilege an evolutionary algorithm that looks for a better accuracy, we get
• Z(< 1%): σ C = 13.71σ,
while if we choose to maximize the Asimov significance, the results depicted in Fig. 11 translate into
• Z(< 1%): σ C = 10.93σ,
• s/√(s + b): σ C = 6.16σ.
Note that for both metrics we surpass the 5σ threshold if the Z(< 1%) measure is considered. However, this statistic works under the assumption that all backgrounds are very well under control. Nicely, for the case of the Asimov metric, we obtain for the naive significance s/√(s + b) = 6.16σ, providing a stronger argument towards the possibility of probing VLLs with masses around 700 GeV, perhaps even before the end of the HL-LHC runs. We also observe that an evolutive algorithm engineered to maximize the Asimov significance offers overall better results. We should mention that the combined (and even individual) significance grows with luminosity. Therefore, based on the results discussed so far, this provides a compelling argument in favour of high-luminosity machines in the longer term.
The results presented so far for a single point are already rather interesting not only in the context of the model formulation we discuss here but also for any other model with VLLs and sterile neutrinos. For completeness and better scrutiny we will study the impact of varying the mass of both the two lightest VLLs as well as of the lightest BSM neutrino. In particular, we are interested in understanding under which circumstances one can reach or surpass a signal significance of 5 standard deviations in order to motivate direct VLL searches for this class of models at the LHC. To do this we repeat the numerical procedure explained above considering the following cases: 4. the same as in 1 but with varying luminosity.
For a fixed luminosity of L = 3000 fb −1 we study the signal significance calculated for the VLL masses m e 4 = 200, 486, 868 and 1250 GeV. First, we consider an evolutive algorithm that maximizes the accuracy metric, showing our results in Tab. 7. We have repeated the same exercise for an evolutive algorithm that maximizes the Asimov significance, with results shown in Tab. 8. From the aforementioned tables we notice that for heavy VLL scenarios with m e 4 = 1.25 TeV the combined significance for all signal events drops to values near zero, indicating that such channels can be rather challenging for direct searches at the LHC. However, if the background is well under control, the Z(< 1%) statistic offers a combined significance of 5.66σ. Note that the largest component of the combined significance results from the VLBSM channel with 4.9σ, which means that a potential observation of heavy VLLs with masses in the TeV range is only possible if the pp → ℓν + (0j, j, jj) backgrounds are known with high precision. The fast decrease in significance for larger masses is a consequence of the equally fast decrease of the cross-section with increasing mass, as shown in the left panel of Fig. 12. A small significance is also obtained for the heavier VLLs, e 5 and e 6 , whose masses lie beyond O(3 TeV) and likely out of the reach of the HL-LHC. However, recall that all three signal events represent independent variables, which means that we can evaluate a global significance as the sum of the individual ones from each process. This implies that we can consider additional signal events that would boost this global significance. In particular, including channels with jets from W decays to quarks can be relevant due to a larger expected number of events. In fact, the W decay width is larger for light jets, with a branching ratio (BR) of approximately 67.4%, than for leptons, whose BR is 10.86% [59].
For the present case, when the accuracy metric maximisation is concerned, the combined significance at m e 4 = 200 GeV reads 8.57σ for Z A and 20.75σ for the naive significance. These results, and in particular those for the naive significance, indicate that a light VLL characteristic of our model is expected to have a strong presence in a high-luminosity run of the LHC and can be probed well before the end of the LHC programme.
In general, we note that for an increasing mass of e 4 the overall significance decreases, as already discussed above. We also see that the rate of decrease is faster for the ZA channel than for the VBF and VLBSM ones. Note that VBF signals yield a maximum significance in Z(< 1%) of 2.83σ for m e 4 = 200 GeV, for both the Asimov and accuracy metric maximizations. On the other hand, for the ZA and VLBSM channels with maximized accuracy, the same VLL mass yields a significance of 12.18σ for ZA events and 12.95σ for the VLBSM channel (see Tab. 7). However, for results obtained upon maximization of the Asimov metric, the naive significance can be as large as 6.10σ for ZA events and 12.65σ for VLBSM ones (see Tab. 8).
For completeness, we show in Fig. 13 that, fixing m e 4 = 200 GeV and for m ν 4 < m ν 5 , the effect of varying the sterile neutrino mass m ν 4 is negligible for any value of the lightest BSM neutrino mass in the range 100 keV to 100 MeV, a behaviour tied to the seesaw nature of the neutrino sector.

Provided that lighter VLLs represent a more interesting case for forthcoming explorations at the LHC, we focus our attention on the mass range [200, 700] GeV for both e 4 and e 5 , such that m e 4 < m e 5 . This choice is based on our discussion in Sec. 2.3.1, just below Eq. (2.17), where two light VLLs below the 1 TeV scale together with a heavy one at around 5 TeV is a viable scenario. First, we fix m e 4 = 200 GeV, as this represents the case where we obtain the greatest significance, as shown in Tabs. 7 and 8. We also consider the BSM neutrino mass to be of the order of hundreds of keV. The results of these scans can be seen in Fig. 14, where we show e 4 significance contours in terms of m e 4 and m e 5 . One can immediately see that a varying e 5 mass has a very marginal impact on the e 4 significance, indicating that the latter is essentially independent of m e 5 . This can be understood from the e 5 → W ν i decay branching fractions shown in Tab. 9: the overall BRs do not suffer significant alterations with a varying e 5 mass and, as such, the computed e 4 production and decay cross-section remains very much the same. The impact on the significance is therefore understandably small.
So far, all results have been computed for a luminosity of L = 3000 fb −1 . However, it is also instructive to study how the significance evolves with luminosity. Looking first at Figs. 15 and 16, for L = 300 fb −1 , we note that a signal significance at or beyond the 5σ level is only achievable for a light VLL of 200 GeV. In particular, the combined significance is 8.84σ for Z(< 1%). This means that, if all backgrounds introduced in Sec. 3 are known with high precision, it will be possible to either discover or exclude a VLL with mass 200 GeV, possibly even before the end of Run III.
For the case of heavy VLLs, the Z(< 1%) significance is no larger than 1.78σ for m e 4 = 1.25 TeV, where VLBSM signals provide the largest contribution. With these results in mind, we would only expect to observe such heavy states in a high-luminosity run. In fact, since we are simulating proton-proton collisions at √s = 14 TeV, at such high masses pair production of VLLs is increasingly unlikely, meaning that we only expect such heavy states to become visible either in high-luminosity runs or at higher-energy colliders. These results confirm that light states are favoured to be probed at Run III of the LHC. This is especially noticeable when we use an evolutive algorithm that maximizes the Asimov significance. In Figs. 17, 18 and 19 we obtain, for a VLL of 200 GeV, significances well above 5σ at L = 300 fb −1 in all three statistics. Therefore, a 200 GeV VLL characteristic of our model can already be probed at the LHC Run III. We also note that for a VLL of 486 GeV we are already able to obtain a significance of 4.25σ for Z(< 1%) and 4.64σ for s/√(s + b). While these two values do not pass the 5σ baseline, they already represent significant deviations from pure SM processes. We argue that the addition of new signals, such as the ones with jets mentioned before, should offer the necessary boost to achieve 5σ. The same argument can also apply, for example, to a VLL of 677 GeV, for which we read a combined significance of 3.45σ.
Figure 15: Significance as a function of the luminosity for different statistics and topologies, resulting from an evolutive algorithm that maximises accuracy. The x axis is in logarithmic scale for all plots, and in the bottom plot the y axis is also in logarithmic scale. Green curves correspond to VBF signal events, red curves to ZA signals and blue curves to VLBSM topologies. The top plot corresponds to the Asimov significance where backgrounds are known with a systematic uncertainty of 1%, the middle plot shows the naive significance (σ = s/√(s + b)) and the bottom plot corresponds to the Asimov significance where backgrounds are known with an uncertainty much smaller than 1%. The combined significance, shown for L = 300 fb −1 and L = 3000 fb −1 , is defined as σ C = σ VBF + σ ZA + σ VLBSM . The lightest VLL has a mass of 200 GeV, while the second lightest has a mass of 3.2 TeV.
Figure 17: The same as in Fig. 15 but for an evolutive algorithm that maximizes the Asimov significance.
Figure 19: The same as in Fig. 17 but for a lightest VLL mass m = 677 GeV.
Conclusions
In this work we have studied the collider phenomenology inherent to SU(2) L -doublet VLLs at the LHC, relevant for Run III and beyond. The properties of such exotic leptons were based on a framework built upon unification principles where the strong and EW interactions are ultimately unified with a local family symmetry. One of the major goals of the model under consideration is to offer a potential solution to the flavour problem, including a description of neutrino masses. It features a low-scale theory where new TeV-scale VLLs and vector-like quarks (VLQs) are a natural consequence of the unification of Higgs and matter in common representations, sharing the same gauge and flavour quantum numbers.
We have performed Monte Carlo simulations relying on DL techniques with the aim of determining the statistical significance of a hypothetical VLL discovery at future LHC runs. In this work, simple neural networks were considered, following the implementation of an evolutive algorithm that maximizes either the accuracy metric or the Asimov significance. For the first scenario, we are able to distinguish background events from signal events with an accuracy between 98% and 100%, depending on the signal topology in question, while the second scenario provides an increase in the Asimov significance at the cost of a lower overall accuracy (32% to 70%).
We have proposed three distinct signatures for VLLs in our model which can be searched for at the LHC in the ZA, VBF and VLBSM channels with purely leptonic final states. Three distinct statistical significances were the subject of our analysis, namely the Asimov significance Z A , an adapted version of the Asimov significance Z(< 1%) and the well-known naive significance s/√(s + b). A combined result of 27.71σ for Z(< 1%), 20.75σ for s/√(s + b) and 8.87σ for Z A was obtained for a luminosity of 3000 fb −1 , a centre-of-mass energy √s = 14 TeV, a VLL mass m e 4 = 200 GeV and a maximized Asimov-metric significance. Under the same conditions, but for an accuracy-metric search, we have obtained a combined significance of 27.96σ for Z(< 1%), 4.33σ for s/√(s + b) and 3.46σ for Z A . In this mass range, the ZA and VLBSM channels provide the dominant contributions. Note that we have also considered relatively light BSM neutrinos with masses of the order of O(100 keV). Furthermore, we have shown that varying the mass of the lightest BSM neutrino up to O(100 MeV) has a residual effect on the significance. As expected, the luminosity also has a noticeable impact on the significance. In particular, a value at or above 5σ can be achieved for L = 300 fb −1 , meaning that a 200 GeV VLL as predicted in our model can already be probed at the LHC Run-III. However, for larger VLL masses, and in particular for m e 4 = 1.25 TeV, we have observed that a combined 5σ significance in the fully leptonic channels can only be achieved for L = 3000 fb −1 , that is, at the high-luminosity LHC, with a combined significance of 5.66σ if the backgrounds are known with high precision.
A scenario with two light VLLs, both below 650 GeV, was also studied, where an identical significance for an e 4 discovery was achieved. This follows from a negligible effect played by different m e 5 masses on the decay branching fractions and thus on the signal cross-section.
For all studied cases we have observed that the significance quickly drops if the VLL masses lie beyond 1 TeV. One of the first steps beyond the work presented here is to add jets to the final states from W decays into light quarks. Since such channels offer larger branching ratios, it is likely that the significance for higher masses can be increased. Adding these channels would possibly increase the significance in the region around 1 TeV, but also provide a decisive boost for the m e 4 = 486 GeV scenario, where significances above 4σ are already obtained for Run III luminosities. As such, we conclude that VLLs in the mass range ∼ [200, 500] GeV can be either discovered or excluded before the end of Run III. Based on the model under consideration, the observation of VLLs within the reach of forthcoming LHC runs can offer a crucial probe to falsify our model and obtain hints about the high-scale dynamics. For example, if the New Physics scale above the EW one, defined by the p, f and ω VEVs, is of the order of 100 TeV, the e 4 mass as given in (2.17) would imply that the radiatively generated Yukawa couplings κ 5,8 need to be approximately O(10 −2.7 ). However, we should comment that an even stronger link to the high scale can be obtained through the study of VLQs due to the tree-level nature of their masses. In fact, while for VLLs there is still some degree of uncertainty stemming from a non-trivial functional dependence of the κ i parameters on masses and couplings, the leading contributions to the two lightest VLQ masses are well understood and are proportional to the Yukawa coupling Y 2 ∼ O(10 −2 ). In turn, this would allow us to fix the ω ∼ f and p scales, establishing a direct link to the scale where larger symmetries are broken. Having said this, performing collider phenomenology studies for VLQs using methods similar to those employed in this work is one of our key priorities for the near future.
A combined study of both the VLQ and VLL sectors can offer rather complete information about the $\omega$, $f$ and $p$ scales, the sizes of the radiatively generated Yukawa couplings and the physics involved at such high energies.
Other important studies to perform concern Higgs and flavour physics which, due to the presence of three $SU(2)_L$ scalar doublets, are highly relevant. In fact, we have chosen a basis where the SM lepton sector has zero mixing with the other BSM fermions. A more complete approach, however, should consider small deviations from this limit, subject to flavour-physics constraints. Last but not least, the presence of keV-MeV scale neutrinos can potentially offer a DM candidate if, in a basis with more generic neutrino mixing, it is sufficiently stable. In the longer term, with all such phenomenological studies, we can determine more precisely the viable regions of the parameter space, which will help us perform a direct matching between the low-scale and high-scale regimes of the theory.

A Low-scale SHUT model Lagrangian and Feynman rules
In this appendix the tree-level Lagrangian and the Feynman rules for the interactions between the EW gauge bosons and leptons are presented. To simplify the notation, chirality projection operators are defined. The Yukawa and fermion bilinear interactions are presented in Sec. 2.3, Eqs. (2.8) and (2.9). Here we write the remaining Lagrangian terms of the low-scale SHUT model, considering all renormalizable, Lorentz- and gauge-invariant operators. We start by writing out all kinetic terms for the fermions, where a repeated index $i$ denotes summation over the different generations, together with the definition of the covariant derivative. The kinetic terms for the bosonic sector follow, where $b$ and $c$ represent $SU(2)_L$ and $SU(3)_C$ adjoint indices, while $a$ denotes scalar generations. The scalar potential is that of a generic 3HDM model. The full Lagrangian density for the effective low-energy 3HDM is the sum of all the previous sectors. Expanding and rotating (A.6) to the mass basis, one arrives at the following Feynman rules:

• Lepton and gauge-boson interactions.

Figure 22: Dimensionless (angular) and dimension-full (kinematic) observables in the lab reference frame for the VLBSM channel (solid red), with $pp \to \nu$ (green), $pp \to \nu + j$ (yellow) and $pp \to \nu + jj$ (purple) backgrounds; 30 bins are used for all histograms. From top left to bottom right: transverse momentum and pseudo-rapidity of $\mu^+$, $\cos\theta_{\mu^+}$, $\cos\theta_{\nu\mu^+}$, MET, transverse mass and transverse momentum of the $W$, $\cos\theta_W$, pseudo-rapidity of the $W$ and azimuthal angle of $\mu^+$.

Figure 23: Dimensionless (angular) and dimension-full (kinematic) observables in the $W^\pm$ reference frame for the VBF and ZA channels (solid red); 30 bins are used for all histograms. The VBF channel corresponds to the plots with 3 distinct backgrounds, while ZA has 4; the same backgrounds as in the previous plots apply here. From top left to bottom right, for both channels: pseudo-rapidity of $e^-$, $\mu^+$, $e_4$ and $\bar{e}_4$, and transverse momentum of $e^-$, $\mu^+$, $e_4$ and $\bar{e}_4$.
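The chirality projection operators referred to at the start of this appendix take the standard form (the $\gamma_5$ sign convention assumed here is the conventional one):

```latex
P_{\mathrm{L}} = \frac{1}{2}\left(1 - \gamma_5\right), \qquad
P_{\mathrm{R}} = \frac{1}{2}\left(1 + \gamma_5\right), \qquad
\psi_{\mathrm{L,R}} = P_{\mathrm{L,R}}\,\psi ,
```

so that, e.g., the charged-current vertices below couple only to the left-handed components of the lepton fields.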

C Neural Network models for different masses
Mass ZA