1 Introduction

The Standard Model (SM) particle spectrum appears to be complete after the Higgs discovery [1, 2], despite several shortcomings, both theoretical and experimental. The resolution of any of these shortcomings begs for an extension of the SM framework, which invariably leads us to scenarios with additional bosonic or fermionic degrees of freedom in the particle spectrum. One such possible scenario is an extension with vector-like fermions, whose left- and right-chiral components transform identically under the SM gauge symmetry. Vector-like leptons (VLLs) of this kind can appear naturally in grand unified theories [3,4,5,6], theories with non-minimal supersymmetric extensions [7,8,9,10,11,12,13], warped or universal extra dimensions [14,15,16,17,18,19,20,21,22], composite Higgs models [23,24,25,26,27,28,29], little Higgs models [30,31,32,33,34], etc. VLLs can acquire masses from gauge-invariant bilinear dimension-3 bare mass terms. Since they do not acquire their masses from the Yukawa couplings alone, unlike a fourth generation of chiral fermions, vector-like fermions are only weakly constrained by electroweak precision observables and Higgs data [35]. Although vector-like quarks participate in both the production and the decay of the Higgs boson, only the decay of the Higgs boson is modified in the presence of VLLs. The rate of the Higgs decay to two photons receives an additional contribution from the charged VLLs in the loop and is therefore modified [36]. The SM extended with one or more generations of VLLs has been studied earlier in [37,38,39]. Extensions with a Higgs singlet [40], a Higgs doublet [41,42,43,44], a Higgs triplet [35, 45, 46] and the left-right symmetric model [47,48,49], along with VLLs, have been widely studied in the literature in the context of dark matter (DM) and collider searches.

In this work, we study the \(S_3\)-symmetric two Higgs doublet model (2HDM) [50, 51] augmented with two generations of VLLs. Two generations of VLLs, rather than one, are needed to maintain an exact \(S_3\)-symmetry in the Yukawa sector. One of the primary motivations of the \(S_3\)-symmetric 2HDM is to understand the fermion mass hierarchy within the SM, as it provides a proper description of the mass hierarchy and mixing among the quarks. Non-zero quark masses and a non-block-diagonal CKM matrix, compatible with experiment, make this special kind of 2HDM endowed with the non-abelian \(S_3\) group very attractive. One also notes that, unlike the general 2HDM, the \(S_3\)-symmetric 2HDM naturally provides a 125 GeV SM-like Higgs boson, which we discuss later. As we want to study the DM phenomenology of the VLLs in this model, we impose an additional \(Z_2\) symmetry under which all the VLLs are odd while the SM particles and the two Higgs doublets are even. In Ref. [43], the authors have used a CP-conserving 2HDM along with one generation of VLLs to study the DM phenomenology. Our study differs from theirs not only in the particle spectrum, owing to the extra generation of VLLs mandated by the \(S_3\)-symmetry, but also in the way these VLLs interact with the visible-sector particles. To be more specific, since the dimension-4 part of the Yukawa Lagrangian is \(S_3\)-symmetric, there exist additional interactions with respect to Ref. [43] due to the presence of the extra generation of VLLs. In addition, in our framework the \(S_3\)-symmetry is softly broken by the dimension-3 Dirac as well as Majorana mass terms, unlike in Ref. [43], where the authors considered only Dirac mass terms for soft breaking.
In our framework the lightest neutral mass eigenstate of the VLLs is a viable DM candidate: it produces the correct relic density, and its direct detection cross-sections and thermally averaged annihilation cross-section in indirect detection are compatible with the experimental bounds.

We choose some representative benchmark points from the multi-dimensional parameter space which satisfy the relic density, direct and indirect search constraints, and perform a collider analysis for specific multi-lepton channels containing one, two, three or four leptons along with missing transverse energy in the final state. Multi-lepton signals have already been analysed in the context of additional VLLs [52, 53]. In addition, there already exist several searches by ATLAS and CMS, in the context of a few beyond-SM models, for final states comprising mono-lepton [54,55,56], di-lepton [57], tri-lepton [58, 59] and four leptons along with missing transverse energy [60, 61]. We have considered the limits arising from these studies in our work, and we highlight how the signals differ in each of our individual cases and why the cuts need to be modified to optimise our signal events over the SM background.

The paper is structured as follows. In Sect. 2 we briefly discuss the necessary extensions over the \(S_3\)-symmetric 2HDM [50, 51] done in our model. Section 3 deals with the relevant theoretical and experimental constraints to be considered which is followed by the DM phenomenology in Sect. 4. Then we move on to Sect. 5 where we present the collider analysis of the model in the leptonic channels, namely the signals having mono-lepton, di-lepton, tri-lepton and four-lepton along with missing transverse energy in the final state. Finally we summarise and conclude in Sect. 6.

2 Model

We consider the \(S_3\)-symmetric 2HDM augmented with two generations of VLLs. The reason for adding two generations of VLLs is to ensure an \(S_3\)-symmetric Yukawa Lagrangian. Each generation of VLLs consists of one left-handed lepton doublet \(L_{L_i}'\), one right-handed charged lepton singlet \(e_{R_i}'\), one right-handed singlet neutrino \(\nu _{R_i}'\) and their mirror counterparts with opposite chirality but the same gauge charges, i.e. \(L_{R_i}'', e_{L_i}''\) and \(\nu _{L_i}''\), with \(i=1,2\). These two generations of VLLs are doublets under the \(S_3\)-symmetry. The quantum numbers associated with the particles are shown in Tables 1 and 2. In Table 1, \(Q_{iL}, L_{iL}\) are the left-handed quark and lepton doublets of the SM respectively, with \(i=1,2,3\), and \(u_{iR}, d_{iR}\) are the right-handed up-type and down-type quark singlets respectively, with \(i=1,2,3\).

Table 1 \(SU(2)_L \times SU(3)_C \times U(1)_Y \times Z_2\) quantum numbers assigned to the particles in the model
Table 2 \(S_3\) quantum number assigned to the particles in the model

2.1 Scalar Lagrangian

In the \(S_3\)-symmetric 2HDM, there are two \(SU(2)_L\) doublets \(\phi _1\) and \(\phi _2\) with hypercharge \(Y= + 1\) (see footnote 1), which collectively behave like a doublet under the \(S_3\)-symmetry, i.e. \(\begin{pmatrix} \phi _1 \\ \phi _2 \end{pmatrix} = \Phi \). For this specific doublet representation, the elements of \(S_3\) are given by [62],

$$\begin{aligned} \begin{pmatrix} \cos \psi &{}\quad \sin \psi \\ -\sin \psi &{}\quad \cos \psi \end{pmatrix} , ~~ \begin{pmatrix} \cos \psi &{}\quad \sin \psi \\ \sin \psi &{}\quad -\cos \psi \end{pmatrix}, ~~ \mathrm{for}~~ \psi = 0, \pm \frac{2 \pi }{3} \end{aligned}$$
(1)
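As a quick illustration of Eq. (1), the following minimal Python sketch builds the six matrices numerically and checks that they close under multiplication and do not all commute, as expected for the doublet representation of a non-abelian group of order six. The function names (`s3_doublet_elements`, `matmul`, `close`, `is_closed`) and the numerical tolerance are our own illustrative choices and are not part of the original analysis.

```python
import math
from itertools import product


def s3_doublet_elements():
    """Return the six 2x2 matrices of Eq. (1) as nested tuples."""
    elements = []
    for psi in (0.0, 2 * math.pi / 3, -2 * math.pi / 3):
        c, s = math.cos(psi), math.sin(psi)
        elements.append(((c, s), (-s, c)))  # first (rotation-type) matrix of Eq. (1)
        elements.append(((c, s), (s, -c)))  # second (reflection-type) matrix of Eq. (1)
    return elements


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def close(a, b, tol=1e-12):
    """Entry-wise comparison within an (assumed) numerical tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


def is_closed(elements):
    """True if every product of two elements is again one of the elements."""
    return all(
        any(close(matmul(a, b), g) for g in elements)
        for a, b in product(elements, repeat=2)
    )
```

Calling `is_closed(s3_doublet_elements())` checks the group closure numerically; the same helpers can be used to confirm that a given bilinear in \(\phi _1, \phi _2\) is invariant under all six transformations.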

After symmetry breaking, \(\phi _i\) can be expressed as,

$$\begin{aligned} \phi _i = \begin{pmatrix} \phi _i^+ \\ \frac{1}{\sqrt{2}} (v_i + h_i + i \rho _i) \end{pmatrix} \end{aligned}$$
(2)

Here \(v_i\)’s are vacuum expectation values (VEV) and \(v_1 = v \cos \beta , ~v_2 = v \sin \beta , v = \sqrt{v_1^2 + v_2^2} = 246\) GeV. \(\tan \beta \) can be defined as the ratio of two vacuum expectation values: \(\tan \beta = \frac{v_2}{v_1}\).

The quartic part of the most general renormalisable \(S_3\)-symmetric scalar potential is given by [50],

$$\begin{aligned} V_4(\phi _1, \phi _2)= & {} \lambda _1 (\phi _1^\dagger \phi _1+\phi _2^\dagger \phi _2)^2 +\lambda _2 (\phi _1^\dagger \phi _2 -\phi _2^\dagger \phi _1)^2 \nonumber \\&+ \lambda _3 \left\{ (\phi _1^\dagger \phi _2+\phi _2^\dagger \phi _1)^2 +(\phi _1^\dagger \phi _1-\phi _2^\dagger \phi _2)^2\right\} \,.\nonumber \\ \end{aligned}$$
(3)

The most general quadratic part of the scalar potential is [50]:

$$\begin{aligned} V_2(\phi _1, \phi _2) =&m_{11}^2 (\phi _1^\dagger \phi _1) + m_{22}^2 (\phi _2^\dagger \phi _2)\nonumber \\&- \{m_{12}^2 (\phi _1^\dagger \phi _2) + \mathrm{h.c.}\}\nonumber \\ \end{aligned}$$
(4)

In Eq. (3), the quartic couplings \(\lambda _1, \lambda _2\) and \(\lambda _3\) are real owing to the hermiticity of the scalar potential. In the quadratic part of the potential in Eq. (4), \(m_{11}^2\) and \(m_{22}^2\) are real, while \(m_{12}^2\) can be complex in principle. Throughout the analysis, we take \(m_{12}^2\) to be real to avoid CP-violation. Though \(m_{11}^2 = m_{22}^2\) and \(m_{12}^2 = 0\) ensure that the quadratic part of the potential is \(S_3\)-symmetric, this configuration ends up with a massless scalar [50]. Thus for our analysis, we stick to the configuration \(m_{11}^2 = m_{22}^2\) and \(m_{12}^2 \ne 0\) (which breaks the \(S_3\)-symmetry softly). This fixes \(\tan \beta =1\) through the minimisation conditions of the scalar potential (see footnote 2):

$$\begin{aligned}&m_{11}^2 = m_{12}^2 \tan \beta - (\lambda _1 + \lambda _3)v^2 \,, \nonumber \\&m_{22}^2 = m_{12}^2 \cot \beta - (\lambda _1 + \lambda _3)v^2 \end{aligned}$$
(5)
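The step from Eq. (5) to \(\tan \beta = 1\) can be made explicit as follows. Subtracting the two minimisation conditions and imposing \(m_{11}^2 = m_{22}^2\) with \(m_{12}^2 \ne 0\) gives

$$\begin{aligned} m_{11}^2 - m_{22}^2 = m_{12}^2 \left( \tan \beta - \cot \beta \right) = 0 ~~\Rightarrow ~~ \tan ^2 \beta = 1 \,, \end{aligned}$$

so that, taking both VEVs to be positive (an assumption made here for definiteness), \(\tan \beta = 1\) and \(v_1 = v_2 = v/\sqrt{2} \approx 174\) GeV.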

The scalar sector of this model consists of SM-like Higgs (h), heavy Higgs (H), pseudoscalar Higgs (A) and charged Higgs (\(H^{\pm }\)). The limit at which h behaves as SM-like Higgs boson is defined as the alignment limit. This limit is naturally achieved in this model [50].

Transformations from flavour basis to mass basis of the scalars occur through the following \( 2 \times 2 \) orthogonal matrix:

$$\begin{aligned} \begin{pmatrix} w^\pm (z) \\ H^\pm (A) \end{pmatrix} = \begin{pmatrix} \cos \beta &{}\quad \sin \beta \\ -\sin \beta &{}\quad \cos \beta \end{pmatrix} \begin{pmatrix} \phi _1^\pm (\rho _1) \\ \phi _2^\pm (\rho _2) \end{pmatrix} \end{aligned}$$
(6)

\(w^\pm \) and z being the charged and neutral Goldstone bosons respectively.

The light Higgs and the heavy Higgs of the model are connected to their flavour eigenstates via,

$$\begin{aligned} \begin{pmatrix} h \\ H \end{pmatrix} = \begin{pmatrix} \cos \beta &{} \quad \sin \beta \\ -\sin \beta &{}\quad \cos \beta \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} \end{aligned}$$
(7)

In Eq. (3), the quartic couplings \(\lambda _1, \lambda _2, \lambda _3\) can be expressed in terms of the physical scalar masses as:

$$\begin{aligned} \lambda _1= & {} \frac{M_h^2 - M_H^2 + M_{H^\pm }^2}{2 v^2} \,, \nonumber \\ \lambda _2= & {} \frac{(M_{H^\pm }^2 - M_A^2)}{2 v^2} \,, \nonumber \\ \lambda _3= & {} \frac{(M_H^2 - M_{H^\pm }^2)}{2 v^2} \,, \end{aligned}$$
(8)

2.2 Yukawa Lagrangian

The dimension-4 terms in \(S_3\)-symmetric Yukawa Lagrangian involving two generations of VLLs can be written as,

$$\begin{aligned} {\mathcal {L}}_4= & {} -y_2[(\overline{L_{L_1}'}\tilde{\phi _2}+\overline{L_{L_2}'}\tilde{\phi _1})\nu _{R_{1}}' + (\overline{L_{L_1}'}\tilde{\phi _1}-\overline{L_{L_2}'}\tilde{\phi _2})\nu _{R_{2}}']\nonumber \\&-y_4[(\overline{L_{R_1}''}\tilde{\phi _2}+\overline{L_{R_2}''}\tilde{\phi _1})\nu _{L_{1}}'' \nonumber \\&+ (\overline{L_{R_1}''}\tilde{\phi _1}-\overline{L_{R_2}''}\tilde{\phi _2})\nu _{L_{2}}''] - y_2'[(\overline{L_{L_1}'}\phi _2+\overline{L_{L_2}'}\phi _1)e_{R_{1}}'\nonumber \\&+ (\overline{L_{L_1}'}\phi _1 - \overline{L_{L_2}'}\phi _2)e_{R_{2}}'] \nonumber \\&- y_4' [ (\overline{L_{R_1}''} \phi _2 + \overline{L_{R_2}''} \phi _1) e_{L_{1}}'' + (\overline{L_{R_1}''}\phi _1-\overline{L_{R_2}''}\phi _2)e_{L_{2}}''] \nonumber \\&+ \mathrm{h.c.} \end{aligned}$$
(9)

Next we write down the dimension-3 Dirac and Majorana mass terms present in the Yukawa Lagrangian, which break the \(S_3\)-symmetry softly (see footnote 3):

$$\begin{aligned} {\mathcal {L}}_{3}= & {} - M_{1}\, \overline{L_{L_1}'} \, L_{R_1}'' - M_{2}\, \overline{L_{L_1}'} \, L_{R_2}'' - M_{3}\, \overline{L_{L_2}'} \, L_{R_1}'' \nonumber \\&- M_{4}\, \overline{L_{L_2}'} \, L_{R_2}'' - \frac{1}{2} M_{5} \, \overline{\nu _{L_{1}}^{c ~ ''}} \, \nu _{L_{1}}''\nonumber \\&- \frac{1}{2} M_{6} \, \overline{\nu _{L_{2}}^{c~''}} \, \nu _{L_{2}}'' \nonumber \\&- \frac{1}{2} M_{7} \, \overline{\nu _{R_{1}}^{c~'}} \, \nu _{R_{1}}' - \frac{1}{2} M_{8} \, \overline{\nu _{R_{2}}^{c~'}} \, \nu _{R_{2}}' - M_{9} \, \overline{\nu _{L_{1}}''} \, \nu _{R_{1}}' \nonumber \\&- M_{10} \, \overline{\nu _{L_{1}}''} \, \nu _{R_{2}}' - M_{11} \, \overline{\nu _{L_{2}}''} \, \nu _{R_{1}}'\nonumber \\&- M_{12} \, \overline{\nu _{L_{2}}''} \, \nu _{R_{2}}' \nonumber \\&- M_{L_{1}} \, \overline{e_{L_{1}}''} \, e_{R_{1}}' - M_{L_{2}} \, \overline{e_{L_{2}}''} \, e_{R_{2}}' - M_{L_{3}} \, \overline{e_{L_{1}}''} \, e_{R_{2}}' \nonumber \\&- M_{L_{4}} \, \overline{e_{L_{2}}''} \, e_{R_{1}}' \, + \, \mathrm{h.c.} \end{aligned}$$
(10)

Here the fields with superscript “c” are the charge-conjugated fields. The subscripts of “\({\mathcal {L}}\)” in Eqs. (9) and (10) denote the mass dimensions of the operators. Thus the whole Yukawa Lagrangian can be written as the sum of \({\mathcal {L}}_3\) and \({\mathcal {L}}_4\):

$$\begin{aligned} {\mathcal {L}}_{\mathrm{Yuk}} = {\mathcal {L}}_3 + {\mathcal {L}}_4 \,. \end{aligned}$$
(11)

Here we can construct eight neutral mass eigenstates (\(N_i,~ i = 1,\ldots ,8\)) out of the two generations of VLLs. To ensure that the lightest neutral VLL \(N_1\) is the DM candidate, we impose a \(Z_2\)-symmetry (mentioned in Table 1) under which all the VLLs are odd and all the SM leptons are even. This forbids mixing between the SM leptons and the VLLs (see footnote 4). The two Higgs doublets are assumed to be even under this \(Z_2\)-symmetry.

In this setup, the \(8 \times 8 \) neutral VLL mass matrix in the basis \((\nu _{L_1}^{'c}, \nu _{R_1}', \nu _{R_1}'', \nu _{L_1}^{''c}, \nu _{L_2}^{'c}, \nu _{R_2}', \nu _{R_2}'', \nu _{L_2}^{''c})^T\) reads:

$$\begin{aligned}&\frac{1}{2} \left( ~~\overline{\nu _{L_1}'} ~~ \overline{\nu _{R_1}^{'c}} ~~ \overline{\nu _{R_1}^{''c}} ~~ \overline{\nu _{L_1}''} ~~ \overline{\nu _{L_2}'} ~~ \overline{\nu _{R_2}^{'c}} ~~ \overline{\nu _{R_2}^{''c}} ~~ \overline{\nu _{L_2}''} ~~\right) M_\nu \begin{pmatrix} \nu _{L_1}^{'c} \\ \nu _{R_1}' \\ \nu _{R_1}'' \\ \nu _{L_1}^{''c} \\ \nu _{L_2}^{'c} \\ \nu _{R_2}' \\ \nu _{R_2}'' \\ \nu _{L_2}^{''c} \end{pmatrix} \end{aligned}$$

with

$$\begin{aligned} M_\nu = \begin{pmatrix} 0 &{} \frac{y_2 v_2}{\sqrt{2}} &{} M_1 &{} 0 &{} 0 &{} \frac{y_2 v_1}{\sqrt{2}} &{} M_2 &{} 0 \\ \frac{y_2 v_2}{\sqrt{2}} &{} M_7 &{} 0 &{} M_9 &{} \frac{y_2 v_1}{\sqrt{2}} &{} 0 &{} 0 &{} M_{11} \\ M_1 &{} 0 &{} 0 &{} \frac{y_4 v_2}{\sqrt{2}} &{} M_3 &{} 0 &{} 0 &{} \frac{y_4 v_1}{\sqrt{2}} \\ 0 &{} M_9 &{} \frac{y_4 v_2}{\sqrt{2}} &{} M_5 &{} 0 &{} M_{10} &{} \frac{y_4 v_1}{\sqrt{2}} &{} 0 \\ 0 &{} \frac{y_2 v_1}{\sqrt{2}} &{} M_3 &{} 0 &{} 0 &{} \frac{ - y_2 v_2}{\sqrt{2}} &{} M_4 &{} 0 \\ \frac{y_2 v_1}{\sqrt{2}} &{} 0 &{} 0 &{} M_{10} &{} \frac{ - y_2 v_2}{\sqrt{2}} &{} M_8 &{} 0 &{} M_{12} \\ M_2 &{} 0 &{} 0 &{}\frac{y_4 v_1}{\sqrt{2}} &{} M_4 &{} 0 &{} 0 &{} \frac{ - y_4 v_2}{\sqrt{2}} \\ 0 &{} M_{11} &{} \frac{y_4 v_1}{\sqrt{2}} &{} 0 &{} 0 &{} M_{12} &{} \frac{ - y_4 v_2}{\sqrt{2}} &{} M_6 \end{pmatrix}\nonumber \\ \end{aligned}$$
(12)

Since \(M_\nu \) is hermitian, it can be brought to diagonal form by the following transformation via unitary matrix V:

$$\begin{aligned} V^\dag M_\nu V = \mathrm{diag} ~(M_{N_1}, M_{N_2}, M_{N_3}, M_{N_4}, M_{N_5}, M_{N_6}, M_{N_7}, M_{N_8}). \end{aligned}$$
(13)

Among all states \(N_1\) is the lightest, and \(M_{N_{j+1}} > M_{N_j}\) for \(j = 1,2,\ldots ,7\).
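To make the texture of the neutral mass matrix concrete, the following minimal Python sketch assembles the \(8 \times 8\) matrix from the couplings of Eqs. (9) and (10) in its symmetric form, in the basis quoted above. The function name `neutral_mass_matrix`, the dictionary convention for the bare masses \(M_1, \ldots , M_{12}\) and any numerical inputs are illustrative assumptions, not values used in our analysis.

```python
import math


def neutral_mass_matrix(y2, y4, M, v=246.0, tan_beta=1.0):
    """Assemble the symmetric 8x8 neutral VLL mass matrix.

    y2, y4   : Yukawa couplings of Eq. (9)
    M        : dict mapping k = 1..12 to the bare mass M_k (GeV) of Eq. (10)
    v        : electroweak VEV in GeV
    tan_beta : v2 / v1 (fixed to 1 in this model)
    """
    beta = math.atan(tan_beta)
    v1, v2 = v * math.cos(beta), v * math.sin(beta)
    a1, a2 = y2 * v1 / math.sqrt(2), y2 * v2 / math.sqrt(2)
    b1, b2 = y4 * v1 / math.sqrt(2), y4 * v2 / math.sqrt(2)

    # Non-zero entries of the upper triangle, (row, column) with 0-based indices.
    upper = {
        (0, 1): a2, (0, 2): M[1], (0, 5): a1, (0, 6): M[2],
        (1, 1): M[7], (1, 3): M[9], (1, 4): a1, (1, 7): M[11],
        (2, 3): b2, (2, 4): M[3], (2, 7): b1,
        (3, 3): M[5], (3, 5): M[10], (3, 6): b1,
        (4, 5): -a2, (4, 6): M[4],
        (5, 5): M[8], (5, 7): M[12],
        (6, 7): -b2,
        (7, 7): M[6],
    }

    mat = [[0.0] * 8 for _ in range(8)]
    for (i, j), value in upper.items():
        mat[i][j] = value
        mat[j][i] = value  # the matrix is symmetric by construction
    return mat
```

The resulting nested list can be passed to any symmetric eigenvalue solver (for instance `numpy.linalg.eigh`, if NumPy is available); the physical masses \(M_{N_i}\) are then the absolute values of the eigenvalues, sorted in increasing order.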

In the charged VLL sector, the mass matrix is,

$$\begin{aligned}&\left( ~~\overline{e_{L_1}'} ~~ \overline{e_{L_1}''} ~~ \overline{e_{L_2}'} ~~ \overline{e_{L_2}''} ~~\right) M_c \begin{pmatrix} e_{R_1}' \\ e_{R_1}'' \\ e_{R_2}' \\ e_{R_2}'' \end{pmatrix} \end{aligned}$$

where,

$$\begin{aligned} M_c = \begin{pmatrix} \frac{y_2' v_2}{\sqrt{2}} &{} M_1 &{} \frac{y_2' v_1}{\sqrt{2}} &{} M_2 \\ M_{L_1} &{} \frac{y_4' v_2}{\sqrt{2}} &{} M_{L_3} &{} \frac{y_4' v_1}{\sqrt{2}} \\ \frac{y_2' v_1}{\sqrt{2}} &{} M_3 &{} -\frac{y_2' v_2}{\sqrt{2}} &{} M_4 \\ M_{L_4} &{} \frac{y_4' v_1}{\sqrt{2}} &{} M_{L_2} &{} -\frac{y_4' v_2}{\sqrt{2}} \\ \end{pmatrix} \end{aligned}$$
(14)

The non-hermitian \(M_c\) can be diagonalised by using the following bi-unitary transformation, with the unitary matrices \(U_L\) and \(U_R\),

$$\begin{aligned} U_L^\dag M_c U_R = \mathrm{diag} ~(M_{E_1^+}, M_{E_2^+}, M_{E_3^+}, M_{E_4^+}) \end{aligned}$$
(15)

Here we follow the same convention as the neutral sector, i.e. \(M_{E_{i+1}^+} > M_{E_{i}^+}\) for \(i= 1,2,3.\)

3 Constraints to be considered

The \(S_3\)-symmetric 2HDM has an extended scalar sector, and we have included VLLs in our model, some of which are \(SU(2)_L\) doublets. The addition of particles, charged under a new symmetry, which are not singlets under the SM gauge symmetry leads to interactions and mixings that could affect several existing experimental observations. In addition, the new parameters of the model have to satisfy theoretical constraints for the model to be mathematically consistent. We look at the most relevant ones and extract the constraints they put on the model parameters.

3.1 Theoretical constraints

  • Perturbativity: The quartic couplings \(\lambda _1, \lambda _2, \lambda _3\) are taken to be perturbative: \(|\lambda _i| < 4 \pi ,~~ i=1,2,3.\) For the Yukawa couplings the corresponding bound reads: \(|y_2|, |y_2'|, |y_4|, |y_4'| < \sqrt{4 \pi }.\)

  • Stability conditions of the potential: The quartic couplings \(\lambda _1, \lambda _2\) and \(\lambda _3\) are also constrained from the stability conditions, so that the potential remains bounded from below in any field direction:

    $$\begin{aligned}&\lambda _1 + \lambda _3 \ge 0 \,, \nonumber \\&\lambda _1 \ge 0 \,, \nonumber \\&2 \lambda _1 - \lambda _2 + \lambda _3 \ge 0 \,. \end{aligned}$$
    (16)

  • Higgs mass: We keep the SM-like Higgs mass within the range 125.1 ± 0.14 GeV [64]. It is used as an input parameter and fixed to \(M_h = 125\) GeV throughout the analysis.
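The theoretical constraints listed above can be checked point by point once the quartic couplings are expressed through the physical masses via Eq. (8). The following minimal Python sketch does this; the function names `quartic_couplings` and `passes_theory_constraints` and the sample masses mentioned afterwards are illustrative assumptions rather than part of our numerical setup.

```python
import math

V_EW = 246.0  # electroweak VEV in GeV


def quartic_couplings(m_h, m_H, m_A, m_Hpm, v=V_EW):
    """Quartic couplings (lambda_1, lambda_2, lambda_3) from Eq. (8); masses in GeV."""
    lam1 = (m_h**2 - m_H**2 + m_Hpm**2) / (2 * v**2)
    lam2 = (m_Hpm**2 - m_A**2) / (2 * v**2)
    lam3 = (m_H**2 - m_Hpm**2) / (2 * v**2)
    return lam1, lam2, lam3


def passes_theory_constraints(lams, yukawas=()):
    """Check perturbativity and the stability conditions of Eq. (16)."""
    lam1, lam2, lam3 = lams
    perturbative = all(abs(lam) < 4 * math.pi for lam in lams) and all(
        abs(y) < math.sqrt(4 * math.pi) for y in yukawas
    )
    stable = (
        lam1 + lam3 >= 0
        and lam1 >= 0
        and 2 * lam1 - lam2 + lam3 >= 0
    )
    return perturbative and stable
```

For example, degenerate heavy scalars at 600 GeV give \(\lambda _1 = M_h^2/(2v^2) \approx 0.13\) and \(\lambda _2 = \lambda _3 = 0\), which passes all the checks, whereas a charged Higgs much lighter than H drives \(\lambda _1\) negative and fails the stability conditions.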

3.2 Constraints from electroweak precision observables

The additional scalars and VLLs that are not SM singlets interact with the W and Z bosons. Their contributions to the self-energy correction diagrams modify the masses of the weak gauge bosons and the related electroweak precision observables, parametrised by the oblique parameters S, T and U [65, 66]. Using \( M_h = 125\) GeV and a top mass of 172.5 GeV, the allowed ranges [67] are

$$\begin{aligned} \Delta S = 0.04 \pm 0.11 , \Delta T = 0.09 \pm 0.14, \Delta U = -0.02 \pm 0.11 \end{aligned}$$
(17)
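A simple way to apply Eq. (17) to a given parameter point is to compute the pull of each oblique parameter and require it to stay below a chosen number of standard deviations. The sketch below does this in Python; it treats the three parameters as uncorrelated and uses a default \(2\sigma \) threshold, both of which are simplifying assumptions made here for illustration (the names `OBLIQUE_FIT`, `oblique_pulls` and `within_oblique_bounds` are likewise hypothetical).

```python
# Central values and 1-sigma uncertainties quoted in Eq. (17).
OBLIQUE_FIT = {"S": (0.04, 0.11), "T": (0.09, 0.14), "U": (-0.02, 0.11)}


def oblique_pulls(delta):
    """Pull (in units of sigma) of each predicted shift with respect to Eq. (17).

    delta: dict with keys "S", "T", "U" holding the model predictions.
    """
    return {
        name: (delta[name] - central) / sigma
        for name, (central, sigma) in OBLIQUE_FIT.items()
    }


def within_oblique_bounds(delta, n_sigma=2.0):
    """True if every pull lies within n_sigma (correlations are neglected)."""
    return all(abs(pull) <= n_sigma for pull in oblique_pulls(delta).values())
```

A full treatment would include the correlations among S, T and U through a \(\chi ^2\) built with the covariance matrix of the fit, which is not reproduced here.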

Notably, the allowed deviation of the T-parameter from its SM value restricts the mass splitting between the neutral and the charged scalars to be less than \(\sim 50-100\) GeV. Regarding the contributions from the VLLs, the differences between the Yukawa couplings \(|y_2 - y_2'|\) and \(|y_4 - y_4'|\) should be small to evade the bound coming from the T-parameter [35].

3.3 Higgs signal strength

Since we demand that the lightest CP-even scalar h is the SM-like Higgs, it is imperative to check whether the production and decay of h in our model are consistent with the current experimental data. The compatibility can be checked by computing the signal strength in the ith decay mode of h as,

$$\begin{aligned} \mu _i = \frac{\sigma ^{S_3 \mathrm{2HDM}}(pp \rightarrow h)~ {\mathrm{BR}}^{S_3 \mathrm{2HDM}}(h \rightarrow i)}{\sigma ^{{\mathrm{SM}}}(pp \rightarrow h)~ {\mathrm{BR}}^{\mathrm{SM}}(h \rightarrow i)}\,. \end{aligned}$$
(18)

Assuming gluon-gluon fusion to be the most dominant Higgs production process at the LHC, one can rewrite the signal strength \(\mu _i\) as,

$$\begin{aligned} \mu _i= & {} \frac{\sigma ^{\mathrm{{S_3 \mathrm {2HDM}}}}(gg \rightarrow h)}{\sigma ^{\mathrm{{SM}}}(gg \rightarrow h)} ~\frac{\Gamma _i^\mathrm{{S_3 \mathrm {2HDM}}}(h \rightarrow i)}{\Gamma _{\mathrm{{tot}}}^\mathrm{{S_3 \mathrm {2HDM}}}} ~\frac{\Gamma _{\mathrm{{tot}}}^{\mathrm{{SM}}}}{\Gamma _i^{\mathrm{{SM}}}(h \rightarrow i)} \nonumber \\= & {} \frac{\Gamma ^{\mathrm{{S_3 \mathrm {2HDM}}}}(h \rightarrow gg)}{\Gamma ^{\mathrm{{SM}}}(h \rightarrow gg)} ~\frac{\Gamma _i^\mathrm{{S_3 \mathrm {2HDM}}}(h \rightarrow i)}{\Gamma _{\mathrm{{tot}}}^\mathrm{{S_3 \mathrm {2HDM}}}} ~\frac{\Gamma _{\mathrm{{tot}}}^{\mathrm{{SM}}}}{\Gamma _i^{\mathrm{{SM}}}(h \rightarrow i)}\nonumber \\ \end{aligned}$$
(19)

where \(\Gamma _{\mathrm{{tot}}}\) stands for the total decay width of the SM-like Higgs.
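Equation (19) reduces the signal strength to a product of width ratios, which is straightforward to evaluate once the partial and total widths are known. The Python sketch below encodes this product; the function name `signal_strength` and its argument names are illustrative, and the widths themselves have to be supplied by the user (none are computed here).

```python
def signal_strength(
    gamma_gg_model,
    gamma_i_model,
    gamma_tot_model,
    gamma_gg_sm,
    gamma_i_sm,
    gamma_tot_sm,
):
    """Signal strength mu_i of Eq. (19), assuming gluon-gluon fusion production.

    All widths must be given in the same units; only ratios enter the result.
    """
    production_ratio = gamma_gg_model / gamma_gg_sm  # sigma(gg -> h) ratio
    br_model = gamma_i_model / gamma_tot_model       # BR(h -> i) in the model
    br_sm = gamma_i_sm / gamma_tot_sm                # BR(h -> i) in the SM
    return production_ratio * br_model / br_sm
```

In the alignment limit with no new coloured states, the production ratio is unity, so a deviation of \(\mu _{\gamma \gamma }\) from one arises only from the modified \(h \rightarrow \gamma \gamma \) partial width and its small effect on the total width.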

Since the alignment limit is maintained naturally in this model, the measured signal strengths in the decay channels of h into WW [68, 69], ZZ [70, 71], \(b\overline{b}\) [72, 73] and \(\tau ^+\tau ^-\) [74, 75] are automatically satisfied. On the other hand, the loop-induced decay mode \(h \rightarrow \gamma \gamma \) receives additional contributions from the charged VLLs and the non-standard charged scalars. For the chosen benchmark points, the \(h \rightarrow \gamma \gamma \) signal strength remains within the 2\(\sigma \) range of the current experimental value [76, 77]. The detailed formula for the \(h \rightarrow \gamma \gamma \) decay width is relegated to Appendix A.

Table 3 DM masses along with DM relic density, spin-independent and spin-dependent cross-sections, thermally averaged annihilation cross-sections and dominant annihilation modes for five benchmarks

4 Dark matter phenomenology

As mentioned earlier, the lightest neutral VLL \(N_1\) is a viable DM candidate due to its stable nature ensured by the imposed \(Z_2\)-symmetry. In this section we investigate the parameter space spanned by relevant and independent model parameters which are compatible with relic density [78], direct and indirect DM searches. Setting the mass of the SM-like Higgs to 125 GeV, we scan the independent parameters of the model in the following range:

$$\begin{aligned}&|y_2| \in [1.0: 3.0] \,, ~ |y_2'| \in [1.0: 3.0] \,, ~ |y_4| \in [0.5 : 2.0] \,, ~ |y_4'| \nonumber \\&\quad \in [1.0 : 3.0] \,, \nonumber \\&M_{L_1} \in [50 ~\mathrm{GeV} : 500 ~\mathrm{GeV}], ~~ M_{L_2} \in [50 ~\mathrm{GeV} : 500 ~\mathrm{GeV}] \,, \nonumber \\&M_5 \in [10 ~\mathrm{GeV} : 500 ~\mathrm{GeV}], ~ M_6 \in [10 ~\mathrm{GeV} : 500 ~\mathrm{GeV}] \,, \nonumber \\&M_7 \in [10 ~\mathrm{GeV} : 1 ~\mathrm{TeV}], ~ M_8 \in [10 ~\mathrm{GeV} : 500 ~\mathrm{GeV}] \,, \nonumber \\&M_H \in [500 ~\mathrm{GeV} : 800 ~\mathrm{GeV}], ~ M_{H^+} \in [500 ~\mathrm{GeV} : 800 ~\mathrm{GeV}] \,, \nonumber \\&M_A \in [500 ~\mathrm{GeV} : 800 ~\mathrm{GeV}] \,. \end{aligned}$$
(20)

For the analysis, we derive the interactions, masses and mixings in the model, which are then implemented in FeynRules [79]. The CalcHEP [80] model files obtained from FeynRules are made compatible with micrOMEGAs [81]. The DM observables such as the relic density (\(\Omega _{\mathrm{DM}} h^2\)) (see footnote 5), the spin-dependent (\(\sigma _\mathrm{SD}\)) and spin-independent (\(\sigma _{\mathrm{SI}}\)) cross-sections, the thermally averaged annihilation cross-section (\( \langle \sigma v \rangle \)), etc. are computed with the help of micrOMEGAs. The observed relic abundance obtained from the PLANCK experiment [78] lies in the band: \(0.1166 \le \Omega _\mathrm{DM} h^2 \le 0.1206\). Furthermore, the parameter space is also constrained by the bounds coming from several direct detection experiments like LUX [82], PANDAX-II [83], Xenon-1T [84], PICO [85], etc. and by the indirect detection bounds coming from the FERMI-LAT [86], MAGIC [87] and PLANCK [78] experiments.
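The scan of Eq. (20) and the relic density requirement can be organised as in the following minimal Python sketch. It draws each parameter uniformly within its quoted range and tests a relic density value against the PLANCK band; the uniform sampling, the dictionary keys and the function names `sample_point` and `in_relic_band` are our illustrative assumptions, and the relic density itself must come from an external computation (micrOMEGAs in our case), which is not called here.

```python
import random

# Scan ranges of Eq. (20): magnitudes of the Yukawa couplings, masses in GeV.
SCAN_RANGES = {
    "y2": (1.0, 3.0), "y2p": (1.0, 3.0), "y4": (0.5, 2.0), "y4p": (1.0, 3.0),
    "M_L1": (50.0, 500.0), "M_L2": (50.0, 500.0),
    "M5": (10.0, 500.0), "M6": (10.0, 500.0),
    "M7": (10.0, 1000.0), "M8": (10.0, 500.0),
    "M_H": (500.0, 800.0), "M_Hpm": (500.0, 800.0), "M_A": (500.0, 800.0),
}

# Observed relic abundance band from PLANCK quoted in the text.
RELIC_BAND = (0.1166, 0.1206)


def sample_point(rng):
    """Draw one parameter point uniformly within the ranges of Eq. (20)."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in SCAN_RANGES.items()}


def in_relic_band(omega_h2, band=RELIC_BAND):
    """True if the supplied relic density lies inside the observed band."""
    return band[0] <= omega_h2 <= band[1]
```

A typical use is to create `rng = random.Random(seed)` for reproducibility, generate points with `sample_point(rng)`, pass each point to the external relic density calculation and keep only those for which `in_relic_band` returns `True`.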

Having discussed the possible constraints coming from the DM sector, let us illustrate them in a more model-specific manner. The relic density can be computed as a function of the thermally averaged DM pair annihilation cross-section. Since the lightest neutral VLL \(N_1\) in this model is an admixture of SU(2) singlets and doublets, it couples to both the \(W^\pm \) and Z bosons. Depending on the DM mass, s-channel pair annihilation of the DM to \(W^+ W^-, ZZ, Zh, hh, f\overline{f}\) mediated by h, H, A and the Z boson can occur. Besides, t-channel annihilation to ZZ, Zh, hh (\(W^+ W^-\)) with \(N_i,~ i=1,2,\ldots ,8\) (\(E_j^\pm ,~ j= 1,2,\ldots ,4\)) as mediators also contributes to \( \langle \sigma v \rangle \).

In Table 3, we present five representative points BP1, BP2, BP3, BP4 and BP5 with increasing DM mass, which satisfy the relic density constraint as well as the direct and indirect detection bounds. The corresponding values of \(\sigma _{\mathrm{SI}}\), \(\sigma _{\mathrm{SD}}\) and \(\langle \sigma v \rangle \) are tabulated in the same table. Since the DM is Majorana-like and scatters through the Z-mediated process, \(\sigma _{\mathrm{SI}} < \sigma _{\mathrm{SD}}\) for all the benchmarks. As mentioned earlier, from the minimisation conditions of the scalar potential of the \(S_3\)-symmetric 2HDM, with \(m_{11}^2 = m_{22}^2\) and \(m_{12}^2 \ne 0\), \(\tan \beta \) is fixed to 1. The \(Hf \overline{f}\) and \(A f \overline{f}\) couplings (“f” being an SM fermion), being proportional to \((\cos \beta -\sin \beta )\), vanish in the \(\tan \beta = 1\) limit. Thus s-channel annihilations into SM fermions mediated by H or A are absent in this framework, and the only surviving s-channel annihilation to SM fermions is mediated by h. For the first two points, since \(M_{N_1} < M_Z\), the DM pair dominantly annihilates into \(W^+ W^-\). Above the ZZ threshold, the major annihilation occurs to the final state ZZ along with \(W^+ W^-\). Since the alignment limit is maintained naturally in this model, the s-channel annihilation to \(W^+ W^-\) and ZZ mediated by H or A is forbidden and does not contribute to \( \langle \sigma v \rangle \); t-channel annihilation to \(W^+ W^-\) and ZZ via \(E_i^\pm \) therefore becomes the major contributor. Moderate \(Z N_i N_j\) couplings (with \( i \ne j\)) participating in the annihilation are the main reason behind this dominance. To put this in perspective, we also list the dominant annihilation modes for the five benchmark points in Table 3.
The scattering of DM off the nuclei of the detector material, mediated by Z or h, gives rise to spin-dependent and spin-independent cross-sections (\(\sigma _{\mathrm{SD}}\) and \(\sigma _\mathrm{SI}\)) respectively, which in turn are constrained by direct detection experiments. This forces the \(h N_1 N_1\) and \(Z N_1 N_1\) couplings to be small enough to evade the direct detection bounds. The smallness of these couplings is a choice, achieved by tuning the relevant parameters of the model. Due to the Majorana nature of the DM, the WIMP-nucleon cross-section is dominated by spin-dependent interactions mediated by the Z boson. Hence we have to consider the direct detection bound on \(\sigma _{\mathrm{SD}}\) coming from the PICO experiment [85].

In Fig. 1, we depict the variation of the spin-independent cross-section (\(\sigma _{\mathrm{SI}}\)) with the DM mass predicted by our model (magenta curve). The black line and the green band correspond to the 90% confidence level (C.L.) limit and the 2\(\sigma \) sensitivity of the Xenon-1T experiment. For the DM mass range allowed by the relic density constraint, \(\sigma _{\mathrm{SI}}\) lies below the experimental limit shown by the black line in Fig. 1. Therefore the spin-independent cross-sections for all the chosen benchmarks evade the constraints coming from the direct detection experiments. As mentioned earlier, the strongest bound on the spin-dependent cross-section comes from the PICO experiment [85]. For the chosen benchmark points, the spin-dependent cross-section remains below the experimental limit, as can be seen from Fig. 2.

Fig. 1 Spin-independent cross-section of the proton as a function of the DM mass in our model (magenta curve), and the experimental upper limits from LUX (red curve) [82], PANDA-X (blue curve) [83] and XENON1T (dotted black curve at 90% C.L., with the 2\(\sigma \) sensitivity band in green) [84]

Fig. 2 Spin-dependent cross-section of the proton as a function of the DM mass in our model (magenta curve), and the experimental upper bound from the PICO-60 [85] experiment

Indirect detection experiments look for the annihilation of DM pairs to SM particles through various channels that could produce distinctive signatures in cosmic rays. In Fig. 3, we show the variation of the thermally averaged annihilation cross-section as a function of the DM mass. The magenta curve shows the annihilation cross-section in our model. Combined results from the FERMI-LAT and MAGIC experiments [87] are represented by the dashed lines. Here the blue, black, red and green dashed curves show the limits on \(\langle \sigma v \rangle \) as a function of the DM mass for annihilation to \(\mu \mu , \, \tau \tau , \, b \overline{b}\) and \(W^+ W^-\) respectively. We find that the parameter space characterised by our benchmarks survives the bounds coming from the indirect detection experiments. We note, however, that for DM masses in the range 100-200 GeV the model prediction lies quite close to the experimental bounds and may become sensitive to future data from indirect detection experiments. We have also incorporated the experimental results obtained from PLANCK data [78] in our analysis, though we have not shown them in Fig. 3. We have checked that the curve representing our model in Fig. 3 lies well below the experimental limits from PLANCK.

Fig. 3 Variation of the thermally averaged annihilation cross-section with the DM mass. The magenta and the dashed curves represent \(\langle \sigma v \rangle \) for the model and the limits on various annihilation channels from the combined results of the FERMI-LAT and MAGIC experiments [87] respectively. The colour coding is given in the legend

Table 4 Masses of neutral and charged VLLs for five benchmarks

5 Collider searches

In this section we focus on the collider phenomenology of our model. We study the most likely signals of the model that may manifest themselves at the current and future runs of the Large Hadron Collider (LHC). As the model extends the spectrum in the electroweak and leptonic sectors, the production of the new exotic particles would be limited by their cross-section if they are too heavy. In fact, the LHC limits on weakly interacting BSM particles are still quite weak. In this model, VLLs with unbroken \(Z_2\)-symmetry have no mixing with the SM fermions. Thus, the VLLs have to be produced in pairs, and each would decay to an SM particle and a lighter component of the VLL spectrum. We therefore focus on a relatively light spectrum of exotics whose lightest neutral component is the DM candidate, represented by the states for the five benchmark points (BPs) shown in Table 3 and consistent with the DM phenomenology presented in the previous section. The masses of the VLL components corresponding to the same five BPs, viz. \(M_{N_i}\) with \(i=1,2,\ldots ,8\) and \(M_{E_j^\pm }\) with \(j=1,2,\ldots ,4\), are tabulated in Table 4.

The pair production of the VLLs would give rise to lepton-rich final states, which may include mono-lepton, di-lepton, tri-lepton and four-lepton channels along with missing transverse energy in the final state. Note that in the absence of any mixing between the SM leptons and the VLLs, the all-hadronic multi-jet plus missing transverse energy channel is the dominant signal. However, this signal would be swamped by huge SM backgrounds, which leads us to consider multi-lepton final states with at least one charged lepton (\(e/\mu \)) as a more useful signal for this model. We perform the analysis of the collider signals for the five benchmark points (BP1, BP2, BP3, BP4, BP5) given in Table 4. We tabulate the two-body and three-body decay branching ratios of the charged and neutral VLLs in Appendix B (Tables 18, 19 and 20). We must note, however, that for all benchmark points the relative mass splittings between the mother and daughter particles in the VLL cascade decays are not very large, leading to a somewhat compressed spectrum. This implies relatively soft decay products in the final state for some of the benchmark points, which makes signal-background discrimination challenging, as we will see in our analysis. We therefore use machine learning methods in a few channels to check what improvement one may achieve over the traditional cut-based analysis.

To check that our choice of benchmark points does not conflict with any existing LHC analysis in a given leptonic channel, we validate these points against existing multi-lepton searches by the ATLAS and CMS collaborations. For example, the mono-lepton plus missing transverse energy final state originating from the decay of a \(W'\) has been studied by ATLAS at \(\sqrt{s} = 8\) TeV with an integrated luminosity of 20.3 fb\(^{-1}\) in [54], with similar studies carried out by CMS at both \(\sqrt{s} = 8\) TeV and 13 TeV [55, 56]. The electroweak production of charginos and sleptons decaying into di-lepton plus missing transverse energy final states has been explored by ATLAS at the 13 TeV LHC [57]. Similarly, search results for the tri-lepton plus missing transverse energy final state arising from the decay of pair-produced chargino-neutralino states with degenerate masses (with mass splitting at the electroweak scale) have been reported by ATLAS at \(\sqrt{s} = 13\) TeV in Ref. [59] ([58]). In addition, a search for the more robust final state with four or more charged leptons in a supersymmetric framework by ATLAS at the 13 TeV LHC has been summarised in Ref. [60]. Finally, a detailed study of the multi-lepton final state coming from the decay of doubly- and singly-charged Higgs bosons has also been performed by ATLAS at the 13 TeV LHC [61]. Although the above studies are in the context of other BSM scenarios, the overlap with our signal topology allows us to use them to check whether our representative points are allowed. The checks have been performed for the five benchmarks using the mono-lepton [88], di-lepton [89] and multi-lepton [90] searches in MadAnalysis 5 [91,92,93,94,95,96].

For the chosen benchmark points, we implement the model using FeynRules [79], which provides the required UFO model files. These are fed into MG5aMC@NLO [97] to generate the signal and background events, with cross-sections computed at leading order (LO).

The LO production cross-sections at the LHC for the signal and SM backgrounds are calculated using the NNPDF3.0 parton distribution functions (PDF). To simulate showering and hadronisation, the parton-level events are passed through Pythia8 [98]. Finally, we implement the detector effects using the default CMS detector simulation card available in Delphes-3.4.1 [99]. For jet reconstruction, the anti-\(k_t\) clustering algorithm has been used throughout. Besides the traditional cut-based analysis used to compute the signal significance, a more sophisticated technique, the Decorrelated Boosted Decision Tree (BDTD) algorithm, is used to improve it. For this analysis, the Toolkit for Multivariate Data Analysis (TMVA) package [100] has been used. Details of the package are discussed in Sect. 5.1. The signal significance \({\mathcal {S}}\) is derived using \({\mathcal {S}} = \sqrt{2\Big [(S + B) \log \Big (\frac{S + B}{B}\Big )- S\Big ]}\), with S(B) denoting the number of signal (background) events surviving the cuts applied on the kinematic variables. The number of signal (S) and background (B) events can be calculated as:

$$\begin{aligned} S(B) = \sigma _{S(B)} \times {\mathcal {L}} \times \epsilon _{S(B)} \,, \end{aligned}$$
(21)

where \(\sigma _{S(B)}\), \({\mathcal {L}}\) and \(\epsilon _S (\epsilon _B)\) (Footnote 6) denote the signal (background) cross-section, the integrated luminosity and the signal (background) cut-efficiency respectively. Following this strategy, we now perform the collider analysis of the aforementioned channels at the 14 TeV high-luminosity (HL) LHC.
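The event-yield relation of Eq. (21) and the significance formula quoted above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are ours, and the cross-sections and efficiencies in the example are hypothetical placeholders, not values from the tables of this paper.

```python
import math

def expected_events(sigma_fb, lumi_fb_inv, efficiency):
    """Eq. (21): events = cross-section (fb) x luminosity (fb^-1) x cut efficiency."""
    return sigma_fb * lumi_fb_inv * efficiency

def significance(s, b):
    """S = sqrt(2[(S+B) ln((S+B)/B) - S]); defined only for B > 0."""
    if b <= 0:
        raise ValueError("background yield must be positive")
    if s <= 0:
        return 0.0
    return math.sqrt(2.0 * ((s + b) * math.log((s + b) / b) - s))

# Hypothetical inputs, chosen only to illustrate the arithmetic.
S = expected_events(sigma_fb=1.0, lumi_fb_inv=3000.0, efficiency=0.10)
B = expected_events(sigma_fb=50.0, lumi_fb_inv=3000.0, efficiency=0.02)
print(significance(S, B), S / math.sqrt(B))
```

For \(S \ll B\) the expression reduces to the familiar \(S/\sqrt{B}\), while for sizeable \(S/B\) it gives a smaller value than \(S/\sqrt{B}\).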

5.1 Mono-lepton final state

To include all possible processes leading to a signal containing a single lepton and missing transverse energy (\(E_T^{\mathrm{miss}}\)) in the final state, we take into account the pair production and associated production of the VLLs:

$$\begin{aligned}&p p \rightarrow E_i^+ E_j^- \nonumber \\&p p \rightarrow N_k N_m \nonumber \\&p p \rightarrow E_i^\pm N_k \end{aligned}$$
(22)

where \(i,j = 1\ldots 4\) and \( k,m = 1\ldots 8\). Following the decay cascades listed in Appendix B, of all the final states arising from the decay of the VLLs, we choose those yielding one lepton \(\ell _1\) (electron or muon) with a minimum transverse momentum \(p_T^{\ell _1} > 10\) GeV and reject any additional lepton with \(p_T^{\ell } \ge 10\) GeV. We also veto hadronic activity by rejecting events containing jets with \(p_T^{j} > 20\) GeV. This ensures that the signal consists of a single charged lepton with \(p_T^{\ell _1} > 10\) GeV and missing transverse energy.
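The mono-lepton pre-selection described above amounts to a simple counting requirement. A minimal sketch follows, assuming each event is reduced to plain lists of lepton and jet transverse momenta in GeV; this event representation and the function name are hypothetical and are not the format used in the actual analysis chain.

```python
def passes_monolepton_selection(lepton_pts, jet_pts,
                                lep_pt_min=10.0, jet_pt_min=20.0):
    """Exactly one lepton above threshold and no jet above threshold (pT in GeV)."""
    n_leptons = sum(1 for pt in lepton_pts if pt > lep_pt_min)
    n_jets = sum(1 for pt in jet_pts if pt > jet_pt_min)
    return n_leptons == 1 and n_jets == 0

# One hard lepton, one lepton below threshold, one soft jet: kept.
print(passes_monolepton_selection([35.0, 6.0], [15.0]))
# Two leptons above threshold: rejected.
print(passes_monolepton_selection([35.0, 12.0], []))
```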

The irreducible SM background for this final state is the W-boson-mediated process \(p p \rightarrow \ell \nu \), with \(\ell = e, \mu \). There can be additional contributions from di-boson production, such as \(W^\pm \,Z, \, W^+ \,W^- \) and ZZ, which yield one or more charged leptons in the final state when the bosons decay leptonically and the additional charged leptons are missed. Similarly, the multi-jet QCD background could also be a source of mono-lepton events, provided one of the jets is mis-tagged as a charged lepton. Although the probability of mis-identifying a jet as a charged lepton is rather small (\(\lesssim 10^{-5}\)) [101], the sheer size of the QCD cross-section makes it non-negligible. However, we also require a large missing transverse energy in the final state along with a jet veto, which suppresses the QCD background to negligible values. We therefore ignore this background in the study.

To generate the signal and backgrounds, we apply the following criteria to identify the isolated objects (\(\Delta \, R_{ij}>0.4\)):

$$\begin{aligned} p_{T}^j> 20 ~\mathrm{GeV},~~~~|\eta _j|< 5.0, \nonumber \\ p_{T}^{\ell } > 10 ~\mathrm{GeV},~~~~|\eta _\ell | < 2.5, \end{aligned}$$
(23)

In Table 5, the cross-sections of the mono-lepton signal for the chosen benchmark points are tabulated for the 14 TeV LHC.

Table 5 The LO cross-sections for the signal
Fig. 4 Normalised distributions of (a) \(p_T^{\ell _1}\), (b) \(E_T^{\mathrm{miss}}\), (c) \(M_T\), and (d) \(M_{\mathrm{eff}}\) for the mono-lepton channel at 14 TeV HL-LHC

To perform the cut-based analysis, we apply the following selection cuts on the chosen kinematic variables to disentangle the signal from the SM backgrounds (Footnote 7):

  • \(A_1\): From Fig. 4a it can be seen that the \(p_T\) distribution of the lepton for the SM background coming from the decay of the W boson has a sharp Jacobian peak at \(\sim M_W/2\), whereas the corresponding distribution is smeared for the signal, where the charged lepton comes from the cascade decays of the heavy VLLs. However, a large part of the signal overlaps with the region where the background peaks. We therefore demand that the charged lepton has a minimum transverse momentum \(p_T^{\ell } > 20\) GeV, which excludes a significant part of the sharply peaked background (magenta line) without losing too many signal events.

  • \(A_2\): A lower cut on the missing transverse energy helps to reduce the background drastically, as the \(E_T^{\mathrm{miss}}\) distribution for the background (magenta line) in Fig. 4b peaks sharply at a lower value than that of the signal. Unlike the background, where the neutrinos carry most of the missing transverse energy, the signal gets contributions from neutrinos as well as from the much more massive DM candidate (\(N_1\)), which is the end-product of all cascades. This gives rise to a much larger \(E_T^{\mathrm{miss}}\) in the signal distribution.

  • \(A_3\): The next kinematic variable used for separating the signal from the background is the transverse mass \((M_T)\), which is defined as [56],

    $$\begin{aligned} M_T = \sqrt{2\, p_T^{\ell }\, E_T^{\mathrm{miss}}\, \big (1 - \cos \Delta \phi _{\ell , E_T^{\mathrm{miss}}}\big )} \,, \end{aligned}$$
    (24)

    where \(\Delta \phi _{\ell , E_T^{\mathrm{miss}}}\) is the azimuthal angle between the lepton and \(E_T^{\mathrm{miss}}\). In Fig. 4c, the \(M_T\) distribution peaks sharply around \(M_W\) for the background as expected, while the signal has a comparatively smeared distribution. Thus we demand \(M_T > 100\) GeV to eliminate the sharp background peak, which in turn enhances the signal significance.

  • \(A_4\): The distribution of \(M_{\mathrm{eff}}\) is depicted in Fig. 4d. \(M_{\mathrm{eff}}\) is defined as the scalar sum of the lepton \(p_T\) and \(E_T^{\mathrm{miss}}\). We find that a lower cut \(M_{\mathrm{eff}} > 110\) GeV for all the benchmark points helps to enhance the signal over the background.
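The two composite variables used in cuts \(A_3\) and \(A_4\) can be computed as in the sketch below. It assumes the standard transverse-mass definition \(M_T = \sqrt{2\, p_T^{\ell } E_T^{\mathrm{miss}} (1-\cos \Delta \phi )}\) and takes \(M_{\mathrm{eff}}\) as the scalar sum of the lepton \(p_T\) and the missing transverse energy. The function names and example numbers are ours and purely illustrative.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into [0, pi]."""
    d = abs(phi1 - phi2) % (2.0 * math.pi)
    return 2.0 * math.pi - d if d > math.pi else d

def transverse_mass(lep_pt, met, dphi):
    """M_T = sqrt(2 pT(lepton) MET (1 - cos(dphi))), all energies in GeV."""
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(dphi)))

def effective_mass(lepton_pts, met):
    """Scalar sum of the lepton transverse momenta and the missing transverse energy."""
    return sum(lepton_pts) + met

# Hypothetical event: back-to-back lepton and missing transverse energy.
dphi = delta_phi(0.0, math.pi)
m_t = transverse_mass(40.0, 40.0, dphi)
m_eff = effective_mass([40.0], 40.0)
print(m_t, m_eff)
```

A back-to-back configuration maximises \(M_T\) for given \(p_T^{\ell }\) and \(E_T^{\mathrm{miss}}\), which is why the W background ends near \(M_W\) while heavier cascades can populate the region beyond it.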

Table 6 The cut-flow for signal and backgrounds along with the significances for BP1, BP2, BP3, BP4 and BP5 at 14 TeV HL-LHC for 3 ab\(^{-1}\) integrated luminosity for the mono-lepton channel

We summarize the cut-flow for both signal and background and calculate the signal significances for all five benchmarks in Table 6. The table also shows the efficacy of the applied cuts in enhancing the signal significance. With 3 ab\(^{-1}\) integrated luminosity, BP1, BP2 and BP3 can be probed with significances of 7.0, 5.3 and 2.4 respectively. The remaining two benchmarks, BP4 and BP5, yield negligible significances owing to their small signal cross-sections.

Having completed the cut-based analysis, we now perform a multivariate analysis (MVA) using the Decorrelated Boosted Decision Tree (BDTD) algorithm within the Toolkit for Multivariate Data Analysis (TMVA) framework, aiming to improve the signal significance over the cut-based result. Before presenting the BDTD analysis of the channels, let us give a brief overview of the method.

Decision trees are used as classifiers to separate signal-like from background-like events. Each node of a decision tree is associated with one discriminating variable and an optimised cut value on it, chosen to provide the best possible separation between signal-like and background-like events according to the purity of the sample. Within TMVA, this can be done by tuning the BDTD variable NCuts. The training of the decision trees starts from the root node (zeroth node) and continues until a user-specified depth, termed MaxDepth, is reached. Finally, at the leaf nodes an event is classified as signal or background according to the purity \(p\) of the node (Footnote 8): an event is tagged as signal (background) when \(p > 0.5\) (\(p < 0.5\)).

Individual decision trees are weak classifiers, as they are prone to statistical fluctuations of the training sample. To circumvent this problem, one can combine a set of weak classifiers into a stronger one, creating new decision trees by modifying the weights of the events. This procedure is termed boosting. For this analysis, we choose adaptive boosting with the input variables transformed in a decorrelated manner, since this is very useful for weak classifiers. In TMVA it is implemented as Decorrelated AdaBoost. Several BDTD parameters, namely the number of decision trees NTrees, the maximum allowed depth of a decision tree MaxDepth, the minimum percentage of training events in each leaf node MinNodeSize, and NCuts, are tabulated in Table 7 for the five benchmarks of our analysis. To avoid overtraining of the signal and background samples, the result of the Kolmogorov–Smirnov test, i.e. the Kolmogorov–Smirnov score (KS-score), should always be \(> 0.1\) (Footnote 9) and stable.
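To make the role of the BDTD parameters concrete, the sketch below assembles a TMVA-style option string from them. It is an illustration only: the helper name is hypothetical, the numerical values are the defaults of the public TMVA classification example rather than the tuned values of Table 7, and the option spellings (in particular `nCuts` and `VarTransform=Decorrelate`) follow our understanding of the TMVA conventions and should be checked against the TMVA documentation before use.

```python
def bdtd_option_string(n_trees, max_depth, min_node_size_pct, n_cuts):
    """Assemble a colon-separated option string for a decorrelated AdaBoost BDT."""
    options = [
        "!H", "!V",
        f"NTrees={n_trees}",                  # number of decision trees
        f"MinNodeSize={min_node_size_pct}%",  # minimum % of training events per leaf
        f"MaxDepth={max_depth}",              # maximum depth of each tree
        "BoostType=AdaBoost",                 # adaptive boosting
        "SeparationType=GiniIndex",           # assumed node-splitting criterion
        f"nCuts={n_cuts}",                    # grid points scanned for each cut
        "VarTransform=Decorrelate",           # decorrelate the input variables
    ]
    return ":".join(options)

# Placeholder values, not those of Table 7.
print(bdtd_option_string(n_trees=400, max_depth=3, min_node_size_pct=5, n_cuts=20))
```

In a TMVA training macro, a string of this form would typically be passed as the option argument when booking the BDT method.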

Table 7 Tuned BDT parameters for BP1, BP2, BP3, BP4 and BP5 for the mono-lepton channel

Ranked by their discriminating power between the signal and the backgrounds, the important kinematic variables are:

(25)

These relevant kinematic variables are constructed for each and every channel to discriminate between the signal and the backgrounds.

Fig. 5 KS-scores corresponding to (a) BP1 and (b) BP3 for the mono-lepton channel

Figure 5 shows the KS-scores for both signal (blue distribution) and backgrounds (red distribution) for two representative benchmark points, BP1 and BP3. For convenience, we have tabulated the KS-scores for both signal and background in the sixth column of Table 7 for all benchmark points. To make the KS-score stable, one can tune the BDTD parameters given in Table 7. With the aforementioned discriminating variables at hand, we tune the BDT cut value (BDT score) such that the significance is maximized. We plot the Receiver Operating Characteristic (ROC) curves for all benchmarks in Fig. 6a (Footnote 10), which quantify the background rejection as a function of the signal efficiency. The variation of the signal significance with the BDT cut value for all the benchmarks is shown in Fig. 6b. It can be clearly seen that the signal significance attains a maximum for each benchmark at a particular value of the BDT score.
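The optimisation of the BDT cut value can be pictured as a one-dimensional scan: for each candidate cut, count the weighted signal and background events above it and evaluate the significance. The sketch below is a toy version with hypothetical scores and per-event weights; it does not reproduce the TMVA machinery or any number in Table 8.

```python
import math

def significance(s, b):
    """Significance formula of this section; returns 0 when it is undefined."""
    if s <= 0 or b <= 0:
        return 0.0
    return math.sqrt(2.0 * ((s + b) * math.log((s + b) / b) - s))

def best_bdt_cut(sig_scores, bkg_scores, sig_weight, bkg_weight, cuts):
    """Return (cut, significance) maximising the significance over the given cuts."""
    best = (None, 0.0)
    for cut in cuts:
        s = sig_weight * sum(1 for x in sig_scores if x > cut)
        b = bkg_weight * sum(1 for x in bkg_scores if x > cut)
        z = significance(s, b)
        if z > best[1]:
            best = (cut, z)
    return best

# Toy classifier outputs (hypothetical); weights convert MC counts into expected yields.
sig = [0.1, 0.3, 0.4, 0.5, 0.6]
bkg = [-0.5, -0.3, -0.1, 0.0, 0.2, 0.35]
cut, z = best_bdt_cut(sig, bkg, sig_weight=10.0, bkg_weight=100.0,
                      cuts=[-0.4, -0.2, 0.0, 0.15, 0.3, 0.45])
print(cut, z)
```

Cuts that leave no background events are skipped here because the formula is undefined for \(B = 0\); in a real analysis such regions would need a dedicated treatment of the limited Monte Carlo statistics.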

Fig. 6 (a) ROC curves for the chosen benchmark points for the mono-lepton channel. (b) Variation of significance with BDT-score for the mono-lepton channel

Signal and background yields with 3 ab\(^{-1}\) integrated luminosity after performing BDTD analysis have been tabulated in Table 8. The significances for BP1, BP2, BP3, BP4 and BP5 are 7.8, 5.9, 2.6, 0.5 and 0.4 respectively. It becomes quite clear that compared to the cut-based analysis, the signal significances get typical improvements of \(11.4\, \%,\, 11.3 \,\%,\, 8.3\,\%,\, 25\,\%\) and \(33.3\, \%\) for BP1, BP2, BP3, BP4 and BP5 respectively. An integrated luminosity of \(\sim \)1232 fb\(^{-1}\) is required to achieve 5\(\sigma \) significance for BP1 after performing the BDTD analysis.
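The luminosity quoted for a 5\(\sigma \) reach can be cross-checked with a simple scaling argument: if the signal and background yields both grow linearly with \({\mathcal {L}}\) and systematic uncertainties are neglected (an assumption of this sketch), the significance formula used here scales as \(\sqrt{{\mathcal {L}}}\). The helper names below are ours.

```python
def lumi_for_target(z_ref, lumi_ref, z_target=5.0):
    """Luminosity needed for z_target, assuming significance scales as sqrt(L)."""
    return lumi_ref * (z_target / z_ref) ** 2

def relative_gain_percent(z_cut, z_bdt):
    """Relative improvement of the BDTD significance over the cut-based one."""
    return 100.0 * (z_bdt - z_cut) / z_cut

# BP1, mono-lepton channel: 7.8 (BDTD) and 7.0 (cut-based) at 3000 fb^-1.
print(lumi_for_target(7.8, 3000.0))
print(relative_gain_percent(7.0, 7.8))
```

With the rounded significance 7.8 this gives roughly 1.23 ab\(^{-1}\), in line with the value quoted above, and a relative gain of about 11.4% for BP1.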

Table 8 The signal and background yields at 14 TeV LHC with 3 ab\(^{-1}\) integrated luminosity for BP1, BP2, BP3, BP4 and BP5 along with signal significances for the mono-lepton channel after performing the BDTD analysis

5.2 Di-lepton final state

We now consider the final states containing same- or different-flavour, opposite-sign (OS) di-leptons along with missing transverse energy, which can arise from the following subprocesses in our model:

$$\begin{aligned}&p p \rightarrow E^{+}_i E^{-}_j, E^{\pm }_{i,j} \rightarrow \ell ^{\pm } N_1 \nonumber \\&p p \rightarrow N_1 N_k, N_k \rightarrow N_1 \ell ^{+} \ell ^{-} \end{aligned}$$
(26)

where \(i,j = 1,2,\ldots 4\) and \(k = 2,3,\ldots 8\). The dominant signal contribution comes from the pair production of the charged VLLs followed by their decay to the DM and a lepton. Production of a vector-like neutrino along with the DM can also give rise to a similar final state, albeit with a small cross-section. For the sake of completeness, we take into account all such processes that may give rise to a di-lepton final state. The major SM background for the signal comes from the inclusive di-lepton plus missing transverse energy process, which includes contributions from \(W^{+}W^{-}\) and ZZ pair production. Due to its large cross-section, \(t\overline{t}\) production followed by leptonic top-quark decays (leading to a di-lepton final state with b-jets and missing transverse energy) also contributes as one of the major backgrounds. Even after a b-jet veto along with a jet veto, the small fraction of events surviving from the \(t \overline{t}\) process can still lead to a significant number of events in the final state. In addition, processes with smaller cross-sections such as \(W^{\pm }Z\) and \(W^+W^-Z\), followed by the leptonic decays of \(W^\pm \) and Z, can also be a possible source of background for the final state if one or more leptons escape detection. For the analysis, we consider the above three SM subprocesses as the major contributions to the SM background. In Table 9 the LO signal cross-sections for the di-lepton final state are tabulated for all our benchmark points.

Table 9 The LO cross-sections for the signal

To generate the signal and backgrounds we apply the same set of generation cuts as mentioned in Sect. 5.1. We select events with exactly two charged leptons with \(p_T^{\ell } > 10\) GeV and \(|\eta _{\ell }| < 2.5\) and reject any additional lepton with \(p_T^{\ell } > 10\) GeV. To ensure a hadronically quiet final state, we veto all light-jets, b-jets and \(\tau \)-jets with \(p_T > 20\) GeV. We then analyse the signal containing OS di-leptons and compute the signal significance using the traditional cut-based method. To differentiate our signal from the SM background, we focus on the following kinematic variables: the transverse momenta of the leading and sub-leading leptons, the missing transverse energy, and the invariant mass of the two OS same- or different-flavour leptons, \(M_{\ell ^{+} \ell ^{-}}\). We define the cuts applied on them as \(B_1, B_2, B_3, B_4\) respectively and describe them below:

  • \(B_1\): In Fig. 7a, b, we depict the normalised \(p_T\) distributions of the leading and sub-leading leptons \(\ell _1\) and \(\ell _2\) for both signal and SM background. It can be seen that the distributions have a significant overlap owing to their origin being from W decay. Thus we apply \(p_T^{\ell _1} > 20\) GeV to suppress the SM backgrounds.

  • \(B_2\): The normalised distribution of the missing transverse energy is shown in Fig. 7c. The distributions for BP1 and BP3 (green and blue lines) are much harder than that of the background, as in the mono-lepton case. Thus we apply a lower cut on the missing transverse energy, which helps to reduce the background. As the mass splitting between the VLLs becomes smaller for heavier DM, the distribution is shifted towards the softer side.

  • \(B_3\): The normalised distribution of \(M_{\ell ^{+} \ell ^{-}}\) is shown in Fig. 7d. The distribution for the WZ background (red line) shows a peak at \(M_Z\), since two same-flavour opposite-sign leptons out of the three in the final state originate from the Z-boson decay. As the signal does not have a Z peak in its distribution, we reject events with \( 75< (M_{\ell ^+ \ell ^-})_{1,2} < 105\) GeV to exclude the Z-peak. This cut helps in suppressing the WZ background. In addition, we demand \((M_{\ell ^+ \ell ^-})_{1,2} > 12\) GeV to reduce the Drell–Yan background contribution [102].
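The invariant-mass requirements of cut \(B_3\) can be sketched as follows. The leptons are treated as massless, so that \(M_{\ell \ell }^2 = 2\, p_{T1}\, p_{T2}\, (\cosh \Delta \eta - \cos \Delta \phi )\); the function names and the example kinematics are ours and purely illustrative.

```python
import math

def dilepton_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass (GeV) of two leptons treated as massless."""
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))

def passes_mll_cuts(mll, z_low=75.0, z_high=105.0, dy_min=12.0):
    """Keep events above the low-mass threshold and outside the Z window."""
    return mll > dy_min and not (z_low < mll < z_high)

# Two central, back-to-back 45.6 GeV leptons reconstruct to about 91.2 GeV
# and are therefore removed by the Z veto.
mll = dilepton_mass(45.6, 0.0, 0.0, 45.6, 0.0, math.pi)
print(mll, passes_mll_cuts(mll))
```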

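Fig. 7 Normalised distributions of (a) \(p_T^{\ell _1}\), (b) \(p_T^{\ell _2}\), (c) \(E_T^{\mathrm{miss}}\) and (d) \(M_{\ell ^{+} \ell ^{-}}\) for the di-lepton channel at 14 TeV HL-LHC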

We summarise the number of surviving signal and background events after applying the aforementioned cuts in Table 10. It can be seen that with 3 ab\(^{-1}\) integrated luminosity, the significance reach for BP1 and BP2 is 6.7 and 2.5 respectively. The search prospects of this channel at the same integrated luminosity for BP3, BP4 and BP5 are considerably poorer, with signal significances of 1.3, 0.8 and 0.3 respectively, owing not only to the small signal production cross-sections but also to the large irreducible SM background. Note that for the di-lepton channel, the signal and background distributions have significant overlap in most kinematic variables, which makes it difficult to suppress the backgrounds without also reducing the signal.

Table 10 The cut-flow for signal and backgrounds for the di-lepton channel along with the significance for BP1, BP2, BP3, BP4 and BP5 at 14 TeV LHC for 3 ab\(^{-1}\) integrated luminosity
Table 11 Tuned BDT parameters for BP1, BP2, BP3, BP4 and BP5 for the di-lepton channel
Fig. 8 KS-scores corresponding to (a) BP1 and (b) BP3 for the di-lepton channel

Fig. 9 (a) ROC curves for the chosen benchmark points for the di-lepton channel. (b) BDT-scores corresponding to BP1, BP2, BP3, BP4 and BP5 for the di-lepton channel

Table 12 The signal and background yields at 14 TeV LHC and 3 ab\(^{-1}\) integrated luminosity for BP1, BP2, BP3, BP4 and BP5 along with signal significances for the di-lepton channel after the BDTD analysis

After discussing the cut-based analysis, let us move on to the multivariate (BDTD) analysis, which improves the signal significance by enhancing the discriminatory power between the signal and the backgrounds. For this analysis, we consider the following kinematic variables with maximal discerning ability:

(27)

Using these variables we train the signal and backgrounds so that the signal significance is maximized.

We present the set of tuned BDT parameters for all the benchmarks in Table 11, chosen to make the KS-score stable following the criteria mentioned in Sect. 5.1. The KS-scores for BP1 and BP3 (both for signal and background) are given in Fig. 8. The KS-scores for all benchmarks are quoted in the sixth column of Table 11. Having fixed the KS-score, we next tune the BDT score to yield the maximum significance. The background rejection efficiency vs. signal efficiency is plotted as ROC curves in Fig. 9a using the aforementioned kinematic variables. From the ROC curves of the di-lepton channel, it is evident that the background rejection efficiency is somewhat poorer than in the mono-lepton channel. The significances are plotted against the BDT score for all benchmarks in Fig. 9b.

Signal and background yields with 3 ab\(^{-1}\) integrated luminosity for our chosen benchmark points along with the significances are listed in Table 12. From Table 12 it can be inferred that the signal significance improves compared to its cut-based counterpart. For BP1, BP2, BP3, BP4 and BP5 the improvements in signal significance are \(16.4\%, \, 60.0\%, \, 30.8\%, \, 12.5\%\) and \(6.7\%\) respectively.

5.3 Tri-lepton final state

The tri-lepton final state can originate from the following subprocesses:

$$\begin{aligned}&p p \rightarrow E_i^\pm N_j, \nonumber \\&E_i^\pm \rightarrow W^\pm N_1, ~ W^\pm \rightarrow \ell ^\pm \nu _\ell \nonumber \\&N_j \rightarrow N_1 \ell ^+ \ell ^-, \nonumber \\&\mathrm{with}~~ i=1,2,3,4, ~ j = 2,3,4,\ldots ,8. \end{aligned}$$
(28)

We generate the events with the tri-lepton final state using the same generation-level cuts and following the method discussed in Sect. 5.1. Among all possible decay products of the pair-produced neutral and charged VLLs, we select only those events which have three charged leptons and missing transverse energy in the final state. We consider tri-lepton plus missing transverse energy production with zero jets as the dominant irreducible SM background for our signal, which includes both on-shell and off-shell contributions from diboson and triboson production. In addition, the pair production of Z bosons with \(ZZ \rightarrow 4\ell \) can also give rise to a similar final state if one of the leptons is missed. All LO cross-sections for this signal and the backgrounds at the 14 TeV LHC are given in Table 13 (Footnote 11).

Table 13 The cross-sections for the signal
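Fig. 10 Normalised distributions of (a) \(M_{\mathrm{eff}}\), (b) \(E_T^{\mathrm{miss}}\), (c) \(p_{T, 3 \ell }^{\mathrm{vector}}\), (d) \(p_{T, 3 \ell }^{\mathrm{scalar}}\) and (e) the azimuthal angle between the unpaired lepton and \(E_T^{\mathrm{miss}}\) for the tri-lepton channel at 14 TeV HL-LHC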

Since this channel has more leptons and is cleaner, with a smaller SM background, we restrict ourselves to the cut-based analysis only. To discriminate the signal from the background, we demand that the final state has exactly three charged leptons with \(p_T^\ell > 10\) GeV, of which two leptons are of the same sign and the third is of opposite sign. Among these three leptons, at least two are expected to be of the same flavour. We also impose a b-jet veto (rejecting events with \(p_T(b) > 20\) GeV) to eliminate the b-jets in the final state coming from the \(t\overline{t}\) background. Next we identify a few kinematic variables that help to discriminate the signal from the background, as follows:

Table 14 The cut-flow for signal and backgrounds along with the significance for BP1, BP2, BP3, BP4 and BP5 at 14 TeV LHC for 3 ab\(^{-1}\) integrated luminosity for the tri-lepton channel
  • \(C_1\): Out of the two same-sign leptons and one opposite-sign lepton in the final state, one can construct two invariant-mass combinations \((M_{\ell ^+ \ell ^-})_{1,2}\), pairing the opposite-sign lepton with each same-sign lepton in turn. Demanding \((M_{\ell ^+ \ell ^-})_{1,2} < 75\) GeV or \((M_{\ell ^+ \ell ^-})_{1,2} > 105\) GeV removes the Z-peak, which in turn reduces the \(W^\pm Z\) and ZZ backgrounds drastically. We also impose a lower cut \((M_{\ell ^+ \ell ^-})_{1,2} > 12\) GeV to suppress the Drell–Yan background [102].

  • \(C_2\): We define the variable \(M_{\mathrm{eff}}\) as the scalar sum of all the lepton \(p_T\)’s and the missing transverse energy. In Fig. 10a the distribution of the background (magenta line) is flatter and more smeared than the distributions of the signal (green and blue lines) and of the other background, ZZ (brown line). Setting \(M_{\mathrm{eff}} < 500\) GeV helps in reducing the background.

  • \(C_3\): Since the ZZ background does not have genuine missing transverse energy in the final state, the corresponding distribution peaks at a lower value than that of the signal, as can be seen from Fig. 10b. Thus a minimum cut on the missing transverse energy helps to reduce the ZZ background drastically, as can be seen in Table 14.

  • \(C_4\): We choose the magnitude of the vector sum of the transverse momenta of the three leptons (\(p_{T, 3 \ell }^{\mathrm{vector}}\)) and the scalar sum of the same (\(p_{T, 3 \ell }^{\mathrm{scalar}}\)) and show their distributions in Fig. 10c, d respectively. We find that the kinematic selections \(p_{T, 3 \ell }^{\mathrm{vector}} < \) 200 GeV and \(p_{T, 3 \ell }^{\mathrm{scalar}}<\) 250 GeV help to reduce the background efficiently.

  • \(C_5\): We also construct the azimuthal angle between the unpaired third lepton, out of the three leptons in the final state, and the missing transverse energy. The corresponding distributions are shown in Fig. 10e. We find that a cut at 1.5 on this angle helps in eliminating the SM background further.
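The difference between the two variables in cut \(C_4\) is easy to see numerically: the vector sum is sensitive to how the three leptons are distributed in azimuth, while the scalar sum is not. The sketch below assumes each lepton is given as a hypothetical \((p_T, \phi )\) pair, with \(p_T\) in GeV; the function name is ours.

```python
import math

def pt_sums(leptons):
    """leptons: list of (pt, phi). Returns (|vector sum of pT|, scalar sum of pT)."""
    px = sum(pt * math.cos(phi) for pt, phi in leptons)
    py = sum(pt * math.sin(phi) for pt, phi in leptons)
    return math.hypot(px, py), sum(pt for pt, _ in leptons)

# Three 30 GeV leptons evenly spread in azimuth: the vector sum cancels.
balanced = [(30.0, 0.0), (30.0, 2.0 * math.pi / 3.0), (30.0, 4.0 * math.pi / 3.0)]
# Three collinear leptons: the vector sum equals the scalar sum.
collinear = [(50.0, 0.3), (40.0, 0.3), (30.0, 0.3)]
print(pt_sums(balanced), pt_sums(collinear))
```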

The numbers of signal and background events surviving the selection cuts on the aforementioned variables with 3 ab\(^{-1}\) integrated luminosity are quoted along with the significances in Table 14. For the five benchmarks BP1, BP2, BP3, BP4 and BP5, the cut-based signal significances are 11.1, 4.5, 3.5, 1.2 and 0.5 respectively. This is a substantial improvement over the two final-state topologies considered earlier. In fact for BP1, \({\mathcal {L}} \sim \) 609 fb\(^{-1}\) of integrated luminosity is enough to achieve a 5\(\sigma \) significance in the tri-lepton channel.

5.4 Four-lepton final state

In this section, we analyse the final state comprising four charged leptons and missing transverse energy. The final state for the signal can be obtained from the following processes:

$$\begin{aligned} p p \rightarrow N_i N_i, N_i \rightarrow N_1 \ell ^{+} \ell ^{-}, ~~\mathrm{with} ~~i = 2,3,\ldots 8. \end{aligned}$$
(29)
Table 15 The LO cross-sections for the signal
Table 16 The cut-flow for signal and backgrounds along with the significances for BP1, BP2, BP3, BP4 and BP5 at 14 TeV HL-LHC for 3 ab\(^{-1}\) integrated luminosity for the four-lepton channel
Table 17 Significance reach with 3 ab\(^{-1}\) luminosity for all the five benchmark points for the mono-lepton, di-lepton, tri-lepton and four-lepton channels respectively

The events are generated using the same generation-level cuts and following the same method discussed in Sect. 5.1. The most dominant SM background [60] that gives rise to a similar final state is \(VVV\) (\(V= W^{\pm },Z\)). The next irreducible background is \(ZZ \rightarrow 4\ell \). In principle, \(t\overline{t}Z\) can also mimic the signal, but a b-veto (rejecting \(p_T(b) > 20\) GeV) kills this background. The other SM process, \(Z+2\) jets, also results in the same topology if the jets are mis-tagged as leptons. However, we find that this background can be reduced considerably when a proper cut is applied. Due to its large cross-section, \(t\overline{t}\) could also be a possible background, but demanding four leptons with \(p_T^\ell > 10\) GeV and imposing a b-jet veto reduces it drastically. Thus from now on we shall only consider the dominant backgrounds VVV and \(ZZ \rightarrow 4\ell \) (Footnote 12). In Table 15 we have tabulated the LO cross-sections for signal and background at the 14 TeV LHC.

To disentangle the signal and background, we select four leptons with \(p_T^\ell > 10\) GeV and \(|\eta _\ell | < 2.5\) and reject any additional charged lepton satisfying the same criteria. We also apply a veto on light-jets and b-jets in the final state. We consider the following set of kinematic variables to improve the signal sensitivity over the background:

  • \(D_1\): Out of the four leptons in the final state, we first select two pairs of leptons (each pair of same flavour and opposite sign), considering all possible combinations. We then calculate the invariant mass of each pair and check whether it is close to \(M_Z\). Denoting the invariant masses of the first and second pair by \((M_{\ell ^+ \ell ^-})_{1,2}\) respectively, we reject all events where \( 105> (M_{\ell ^+ \ell ^-})_{1,2} > 75\) GeV to exclude the Z-peak of the ZZ background. For the signal, the four charged leptons are not produced from the decay of two Z bosons, which makes this cut very useful in boosting the signal significance.

  • \(D_2\): For the ZZ background, the only source of missing transverse energy is the mis-tagging or mis-measurement of one or more leptons, so the \(E_T^{\mathrm{miss}}\) distribution for the dominant ZZ background peaks at a lower value than that of the signal. The VVV background, on the other hand, still has a substantial overlap with the signal distribution, which has a softer \(E_T^{\mathrm{miss}}\) due to the compression in the spectrum that governs the cascade decays. Thus we choose a moderately low cut on \(E_T^{\mathrm{miss}}\), which helps to reduce the ZZ background significantly while not killing too many of the signal events, as we can see in Table 16.

  • \(D_3\): We define the kinematic variable \(M_{\mathrm{eff}}\) as the sum of the transverse momenta of the four leptons and the missing transverse energy. The background in this case has a longer tail than the signal. To exclude the tail of the \(M_{\mathrm{eff}}\) distribution of the ZZ background, we demand \(M_{\mathrm{eff}} < 500\) GeV, which enhances the signal significance.
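The pairing step of cut \(D_1\) can be sketched as an enumeration of the three ways of splitting four leptons into two pairs, keeping only those in which both pairs are same-flavour and opposite-sign (SFOS). The lepton representation as hypothetical (flavour, charge) tuples and the function names are ours. Rejecting the event when either pair mass falls inside the Z window is one reading of the cut and is an assumption of this sketch.

```python
def sfos_pairings(leptons):
    """leptons: four (flavour, charge) tuples.
    Returns every split into two SFOS pairs, as pairs of index tuples."""
    def is_sfos(a, b):
        return leptons[a][0] == leptons[b][0] and leptons[a][1] == -leptons[b][1]
    splits = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    return [s for s in splits if is_sfos(*s[0]) and is_sfos(*s[1])]

def passes_z_veto(pair_masses, z_low=75.0, z_high=105.0):
    """Assumed reading of the cut: reject if any pair mass lies in the Z window (GeV)."""
    return all(not (z_low < m < z_high) for m in pair_masses)

# Mixed-flavour event: a single valid pairing.
print(sfos_pairings([("e", +1), ("e", -1), ("mu", +1), ("mu", -1)]))
# Four electrons: two valid pairings, each of which would be checked against M_Z.
print(sfos_pairings([("e", +1), ("e", -1), ("e", +1), ("e", -1)]))
```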

The cuts applied on the aforementioned kinematic variables along with the significances are listed in Table 16. For the five benchmarks, the significances at an integrated luminosity of 3 ab\(^{-1}\) are 10.1, 7.1, 5.2, 3.2 and 1.0 respectively. Note that the first four benchmarks achieve a significance \(> 3 \sigma \) (the first three having \({\mathcal {S}} > 5\sigma \)). Thus we find that final states with higher lepton multiplicity tend to achieve a better signal sensitivity in our model, which is expected since the added vector-like fermions decay to charged leptons.

Before concluding this section, let us present a comparative study of all the aforementioned channels according to their performance. For convenience, we have tabulated the signal significances corresponding to all BPs for all channels at an integrated luminosity of 3 ab\(^{-1}\) in Table 17. For the mono-lepton and di-lepton channels we present the signal significances from both the BDTD and the cut-based analyses, whereas for the tri-lepton and four-lepton channels we present the significances obtained from the cut-based analysis only. For all the channels there is a generic pattern: the signal significance goes down with increasing DM mass. As mentioned earlier, the smaller signal cross-sections for larger DM masses account for this pattern. The four-lepton channel fares the best among all for BP2, BP3, BP4 and BP5. With 3 ab\(^{-1}\) luminosity, the first four benchmark points can be probed in this channel with significance > \(3 \sigma \). The next best-performing channel after the four-lepton one is the tri-lepton channel for BP3, BP4 and BP5, while for BP2 the mono-lepton channel (5.3 cut-based, 5.9 with BDTD) comes ahead of the tri-lepton one (4.5). In fact for BP1, the tri-lepton channel turns out to be the best performing, with signal significance 11.1 at 3 ab\(^{-1}\) integrated luminosity. For the mono-lepton and di-lepton channels the significances for BP1 are 7.0 and 6.7 respectively with the cut-based analysis, which improve to 7.8 using the BDTD analysis.

6 Conclusion

In this work, we extend the \(S_3\)-symmetric 2HDM with two generations of VLLs. The introduction of two generations of VLLs in the minimal version of the model is essential to ensure an \(S_3\)-symmetric Yukawa Lagrangian. Since the VLLs are odd under the imposed \(Z_2\)-symmetry and the SM fermions are even under the same, the mixing between the SM leptons and the VLLs is forbidden. Thus we end up with a dark sector in our model which interacts with the SM matter fields only through the SM force mediators and the scalar sector. In this set-up, the lightest neutral VLL mass eigenstate serves as a viable DM candidate.

After imposing constraints from perturbativity, vacuum stability, electroweak precision data and Higgs signal strengths, we show that a large portion of the parameter space spanned by the model parameters is allowed by the observed relic density and by direct and indirect detection experiments. We choose five representative points BP1, BP2, BP3, BP4 and BP5, covering low, medium and high DM masses, to perform the collider analysis of channels with one, two, three and four charged leptons plus missing transverse energy in the final state at the 14 TeV HL-LHC. We must point out here that our choice of benchmarks for the LHC analysis represents only a part of the parameter space allowed by DM data, as it gives a relatively light spectrum which can yield appreciable signal sensitivity to our model at the LHC. To highlight this, we have chosen a benchmark point (BP5) which represents a point near the threshold (for the spectrum) which will be out of reach at the LHC, even with the very high luminosity (vHL-LHC) option.

To start with, we first analyse the final state containing , which can originate from the pair production of the charged VLLs and neutral VLLs as well as from the associated production of the charged and neutral VLLs. The major background for this channel is . With a traditional cut-based analysis, we show that with 3 ab\(^{-1}\) luminosity BP1 and BP2 can be probed with significance \(> 5\sigma \), which in turn improves with the multivariate (BDTD) analysis. Next we perform the collider analysis of the final state comprising di-lepton along with missing transverse energy, which mainly comes from the pair production of the charged VLLs and of the neutral VLLs individually. The main background is , which takes care of \(W^{+}W^{-}\) and \(ZZ\) pair production. \(t\overline{t}\) and \(W^{\pm }Z\) also contribute as subdominant SM backgrounds for the di-lepton channel. After performing the cut-based analysis we find that only BP1 can be probed with a significance \(> 5 \sigma \), while the sensitivity in the di-lepton channel diminishes for the rest of the benchmarks. The primary reason for this is that the mass splittings amongst the VLLs are not too large, which leads to a relatively compressed spectrum. The resulting decay products in the cascade are therefore not very hard, leading to a significant overlap of the kinematic distributions with those of the SM background. This leads to a lower signal significance in the di-lepton mode, though it can be improved marginally with a BDTD analysis. The final state containing tri-lepton with missing transverse energy can be generated from the associated production of the charged and neutral VLLs. The corresponding irreducible background originates from process. The situation is found to improve here as the SM background is now smaller compared to the di-lepton final state.
With a simple cut-based analysis we find that with 3 ab\(^{-1}\) luminosity, BP1 can be probed with significance \(> 10\sigma \), while now even BP3 has a \(>3\sigma \) sensitivity. The situation improves further with an increase in charged lepton multiplicity, which we show with the analysis of the four-lepton final state, arising mostly from neutral VLL pair production. The major SM backgrounds for this process are \( p p \rightarrow ZZ \rightarrow 4\ell \) and \(p p \rightarrow V\,V\,V \,\, (V\equiv W,Z)\), which have small cross-sections. We find that now the first four benchmark points can be probed with significance \(> 3 \sigma \) with 3 ab\(^{-1}\) integrated luminosity. Thus, comparing all four channels of the multi-lepton final states, we find that the four-lepton channel turns out to be the most promising among all, owing to it being the cleanest and most background-free final state.

We conclude our discussion by stating that this model can provide a viable Majorana-type DM candidate and that a part of the allowed parameter space (with DM masses up to \(\sim 300\) GeV) can be tested at the 14 TeV HL-LHC in the multi-lepton channel. The relative compression in the mass spectrum of the VLLs does not allow very clean kinematic thresholds that could serve as a good discriminator for the signal against the SM background. This limits the search sensitivity of the model to relatively light VLL masses of about 350 GeV, beyond which it is very difficult to achieve any signal sensitivity even with the HL-LHC option. To probe higher DM masses one may benefit from looking for such a model at the 1 TeV ILC, which warrants a separate study in future [103].