1 Introduction

The discovery of the 125 GeV Higgs boson (denoted h) [1, 2] not only deepens our understanding of the mechanism of electroweak symmetry breaking but also opens new avenues for searching for physics beyond the Standard Model (SM), which is required to address unexplained theoretical and observational issues such as the naturalness problem, the existence of dark matter and the observed baryon asymmetry of the universe. One such avenue is exotic Higgs decay, which is only loosely constrained by Higgs signal strength measurements. The combination of ATLAS and CMS Run I results constrains the undetected Higgs decay branching ratio to be smaller than about 34% at 95% C.L. assuming \(\kappa _V\le 1\) [4] (\(\kappa _V\) denotes the \(hVV(V=W,Z)\) coupling strength relative to the SM, assuming \(\kappa _V\equiv \kappa _W=\kappa _Z\)). The ultimate sensitivity to the undetected Higgs decay branching ratio via indirect measurements at the High Luminosity Large Hadron Collider (HL-LHC) is estimated to be \(\mathcal {O}\) (5–10%) [5]. On the other hand, due to the expected extremely narrow width of the Higgs boson, even a rather weak coupling between it and any new light degrees of freedom can naturally induce a sizable exotic decay branching fraction. One such possibility is \(h \rightarrow \phi \phi \), where \(\phi \) denotes a light spin-0 particle with mass less than about 62.5 GeV, so that this decay channel is kinematically allowed. \(\phi \) can be CP-even, CP-odd, or even a CP-mixed state. If its mass is greater than \(2m_{b}\), then in most models which approximately obey Yukawa ordering \(\phi \) will mainly decay to \(b\bar{b}\). This decay channel is well motivated in a wide class of Beyond the Standard Model (BSM) theories [5], such as the Next to Minimal Supersymmetric Standard Model (NMSSM), the Higgs singlet extension of the SM, general extended Higgs sector models [6], and little Higgs models.
Quite a few phenomenological studies of this channel at the LHC already exist [7,8,9,10,11], with or without using jet substructure techniques. Due to large QCD backgrounds in the gluon fusion and vector boson fusion channels, the LHC searches generally focus on the VH associated production channel. However, this channel suffers from large top quark backgrounds. A recent ATLAS analysis [12] using \(3.2~{\text{ fb }^{-1}}\) of \(13~{\text{ TeV }}\) data made the first attempt to constrain this channel using WH associated production, but the sensitivity is currently quite weak (even \({\mathrm {Br}}(h\rightarrow \phi \phi \rightarrow 4b)=100\%\) cannot be constrained assuming \(\kappa _V=1\)).

The not-so-clean hadron–hadron collision environment motivates us to consider better places to search for this exotic Higgs decay channel. Here we consider using the Large Hadron Electron Collider (LHeC) [13] to explore \(h\rightarrow \phi \phi \rightarrow 4b\). The LHeC is a proposed lepton–hadron collider which is designed to collide a \(60~{\text{ GeV }}\) electron beam with the \(7~{\text{ TeV }}\) proton beam of the HL-LHC. It is supposed to run synchronously with the HL-LHC and may deliver an integrated luminosity as high as \(1000~{\text{ fb }^{-1}}\) [14]. The electron beam may have \(-0.9\) polarization [14]. It is worth noting that with such high collision energy and luminosity, the LHeC effectively becomes a Higgs boson factory [14]. With a Higgs boson production cross section of about \(200~{\text{ fb }}\), the LHeC will provide excellent opportunities for precision Higgs physics, because the major QCD backgrounds will be much smaller than at the LHC and the complications due to pile-up will be greatly reduced. Previous studies on Higgs physics at the LHeC include measuring the bottom Yukawa coupling [13, 15, 16], anomalous gauge–Higgs couplings [17,18,19], invisible Higgs decay [20] and MSSM Higgs production [21]. Studies on charm Yukawa measurements have been reported in [22]. The impact of double Higgs production at the higher energy ep collider FCC-he on the Higgs self-coupling measurement has also been studied [23, 24].

To quantitatively estimate the sensitivity of the LHeC to the exotic Higgs decay \(h\rightarrow \phi \phi \rightarrow 4b\), we perform a parton level study of the signal and background in the next section. The signal definition depends on the required number of b-tagged jets. Here, for simplicity and a clear identification of the signal, we require at least four b-tagged jets. We provide the expected LHeC sensitivity for \(\phi \) masses between 15 and \(60~{\text{ GeV }}\) and investigate the robustness of our results under variation of the b-tagging performance and pseudorapidity coverage. We also translate our results into the expected exclusion power in the parameter space of the Higgs singlet extension of the SM. In the last section we present our discussion and conclusion.

2 Collider sensitivity

The exotic Higgs decay \(h\rightarrow \phi \phi \rightarrow 4b\) can be simply characterized by the following effective interaction Lagrangian for a new real scalar degree of freedom \(\phi \):

$$\begin{aligned} \mathcal {L}_{eff}=\lambda _{h}vh\phi ^2+\lambda _{b}\phi \bar{b} b +\mathcal {L}_{\phi \,\text {decay,other}} \end{aligned}$$
(1)

In the above \(v=246~{\text{ GeV }}\). \(\lambda _{h}\) and \(\lambda _{b}\) are real dimensionless parameters and \(\mathcal {L}_{\phi \,\text {decay,other}}\) denotes the part of Lagrangian which mediates the decay of \(\phi \) into final states other than \(b\bar{b}\). The part of Lagrangian \(\mathcal {L}_{eff}-\mathcal {L}_{\phi \,\text {decay,other}}\) has been taken as CP-even without loss of generality. New physics may also modify \(hVV (V=W,Z)\) coupling which affects the Higgs production rate and kinematics. We assume the \(hVV (V=W,Z)\) coupling is purely CP-even. Assuming narrow width approximation is valid for both h and \(\phi \), we can express the collider reach for \(h\rightarrow \phi \phi \rightarrow 4b\) via the following quantity:

$$\begin{aligned} C_{4b}^2=\kappa _{V}^2\times {\mathrm {Br}}(h\rightarrow \phi \phi )\times {\mathrm {Br}}^2 (\phi \rightarrow b\bar{b}) \end{aligned}$$
(2)

for a given value of the \(\phi \) mass \(m_\phi \).
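As a minimal illustration of Eq. (2), the sketch below evaluates \(C_{4b}^2\) from its three ingredients. The function name and the numerical inputs are hypothetical and chosen only to show the arithmetic; they are not values from the analysis.

```python
def c4b_squared(kappa_v: float, br_h_phiphi: float, br_phi_bb: float) -> float:
    """Return C_4b^2 = kappa_V^2 * Br(h -> phi phi) * Br(phi -> b bbar)^2, as in Eq. (2)."""
    for name, br in (("br_h_phiphi", br_h_phiphi), ("br_phi_bb", br_phi_bb)):
        if not 0.0 <= br <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {br}")
    return kappa_v**2 * br_h_phiphi * br_phi_bb**2


# Illustrative (assumed) inputs: SM-like hVV coupling, a 10% exotic branching
# ratio and Br(phi -> b bbar) = 0.9.
example = c4b_squared(kappa_v=1.0, br_h_phiphi=0.10, br_phi_bb=0.9)
print(f"C_4b^2 = {example:.3f}")
```

Because the signal rate scales linearly with \(C_{4b}^2\) in the narrow width approximation, a limit on the event yield translates directly into a limit on this single quantity for each \(m_\phi \).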

There are two major Higgs production channels at the LHeC: charged current (CC) and neutral current (NC). Due to the accidentally suppressed electron NC coupling, NC Higgs cross section is much less than that of CC [16]. Therefore in the following we only focus on CC process, although in a more detailed analysis the NC process should also be included to enhance the overall statistical significance.

Fig. 1 Feynman diagram of the CC signal process

The signal process of CC Higgs production is

$$\begin{aligned} eq\rightarrow \nu _{e}hq'\rightarrow \nu _{e}\phi \phi q'\rightarrow \nu _{e}b\bar{b}b\bar{b}q' \end{aligned}$$
(3)

The corresponding Feynman diagram is shown in Fig. 1. The signal signature thus contains at least five jets (in which at least four jets are b-tagged) plus missing transverse energy.

Fig. 2 Representative Feynman diagrams of the following backgrounds: CC multijet (top left), CC \(W+\)jets (top right), CC \(Z+\)jets (bottom left), CC \(t+\)jets (bottom center), CC \(h+\)jets (bottom right)

The backgrounds can be classified into charged current (CC) deeply inelastic scattering (DIS) backgrounds and photoproduction (PHP) backgrounds. From a parton level point of view, CC DIS backgrounds all have genuine missing transverse energy in the final state, which comes from neutrinos produced in the hard scattering or in the decay of heavy resonances (W, Z, t, h). Among PHP backgrounds, only PHP production of heavy resonances (W, Z, t, h) can produce such genuine missing transverse energy. However, these PHP processes (which involve the on-shell production of heavy resonances) are found to be negligible in the total background. On the other hand, PHP multijet production (including heavy flavor jets) can produce missing transverse energy only via energy mismeasurement and neutrinos from hadron decay, which means that it can be suppressed efficiently through a sufficiently large missing transverse energy requirement.

The CC backgrounds can be further classified according to the number of heavy resonances (W, Z, t, h) produced, whose decays result in a large number of b-tagged jets. We found that if the number of heavy resonances involved in a process is greater than or equal to two, then its contribution to the total background is always negligible. Therefore in the following we only consider these CC backgrounds: CC multijet, CC \(W+\)jets, CC \(Z+\)jets, CC \(t+\)jets, CC \(h+\)jets. Here “multijet” and “jets” contain jets of all flavors (g, u, d, s, c, b). Higgs decay to 4b via SM processes is also included as a background, in CC \(h+\)jets. Figure 2 displays representative Feynman diagrams for these background processes.

To simulate the signal and backgrounds, we implement the effective interaction in Eq. (1) in FeynRules [25]. The generated model file together with the SM is then imported into MadGraph5_aMC@NLO [26]. The Higgs boson mass is taken to be \(m_h=125~{\text{ GeV }}\). The \(\phi \) mass is scanned in the region \([15,60]~{\text{ GeV }}\) with a \(1~{\text{ GeV }}\) step size. The collider parameters are taken to be \(E_e=60~{\text{ GeV }}, E_p=7~{\text{ TeV }}\), with the electron beam being \(-0.9\) polarized. The signal and background samples are generated by MadGraph5_aMC@NLO at leading order with the NNPDF2.3 LO PDF [27], and the renormalization and factorization scales are set dynamically by the MadGraph default. The NLO QCD corrections to the signal process are known to be small [28]. In the following we take all the signal and background K-factors to be 1, although we expect that the correct background normalization could be obtained from data. We apply jet energy smearing according to the following energy resolution formula:

$$\begin{aligned} \frac{\sigma _E}{E}=\frac{\alpha }{\sqrt{E}}\oplus \beta \end{aligned}$$
(4)

where \(\alpha =0.45~{\text{ GeV }}^{1/2},\beta =0.03\) [13]. We consider the following four scenarios of b-tagging performance for jets with \(p_T>20~{\text{ GeV }}\) (\(\epsilon _b\) denotes the efficiency of b-jet, while \(\epsilon _c\) and \(\epsilon _{g,u,d,s}\) denote the faking probability of c-jet and guds-jet, respectively):

(A) \(\epsilon _b=70\%, \; \epsilon _c=10\%,\; \epsilon _{g,u,d,s}=1\%\);

(B) \(\epsilon _b=70\%, \; \epsilon _c=20\%,\; \epsilon _{g,u,d,s}=1\%\);

(C) \(\epsilon _b=60\%,\; \epsilon _c=10\%,\; \epsilon _{g,u,d,s}=1\%\);

(D) \(\epsilon _b=60\%,\; \epsilon _c=20\%,\; \epsilon _{g,u,d,s}=1\%\).
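The jet energy resolution of Eq. (4) can be sketched as follows, reading \(\oplus \) as a sum in quadrature and using the quoted \(\alpha \) and \(\beta \). The Gaussian form of the smearing and the function names are assumptions made for illustration; the text does not specify how the smearing is drawn.

```python
import math
import random

ALPHA = 0.45  # GeV^(1/2), stochastic term quoted in the text
BETA = 0.03   # constant term quoted in the text


def relative_resolution(energy_gev: float, alpha: float = ALPHA, beta: float = BETA) -> float:
    """sigma_E / E = alpha / sqrt(E) (+) beta, with (+) a sum in quadrature."""
    if energy_gev <= 0.0:
        raise ValueError("energy must be positive")
    return math.hypot(alpha / math.sqrt(energy_gev), beta)


def smear_energy(energy_gev: float, rng: random.Random) -> float:
    """Draw a smeared energy. Assumption: Gaussian smearing, clipped at zero."""
    sigma = relative_resolution(energy_gev) * energy_gev
    return max(0.0, rng.gauss(energy_gev, sigma))


rng = random.Random(1)
for e in (20.0, 60.0, 200.0):
    print(f"E = {e:6.1f} GeV: sigma_E/E = {relative_resolution(e):.3f}, "
          f"one smeared draw = {smear_energy(e, rng):.1f} GeV")
```

The stochastic term dominates for the soft jets relevant here, while the constant term sets the floor at high energy.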

The LHeC detector (including the tracker) is expected to have a very large pseudorapidity coverage [29] and therefore we assume the b-tagging performance listed above is valid up to \(|\eta |<5\). We will also show the expected sensitivities with smaller b-tagging pseudorapidity coverage \(|\eta |<4\) and \(|\eta |<3\), which turn out to change only slightly compared to the \(|\eta |<5\) case. Event analysis is performed by MadAnalysis 5 [30].
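To get a feeling for how the four scenarios differ, the sketch below computes the probability that four given jets are all b-tagged, assuming each jet is tagged independently with the efficiency of its flavor. The independence assumption, the dictionary layout and the example flavor combination \(bbcc\) are illustrative choices, not part of the analysis description.

```python
# Tagging probabilities per jet flavor for scenarios (A)-(D) listed in the text.
SCENARIOS = {
    "A": {"b": 0.70, "c": 0.10, "light": 0.01},
    "B": {"b": 0.70, "c": 0.20, "light": 0.01},
    "C": {"b": 0.60, "c": 0.10, "light": 0.01},
    "D": {"b": 0.60, "c": 0.20, "light": 0.01},
}


def all_tagged_probability(flavors, eff):
    """Probability that every jet in `flavors` is b-tagged (independent tagging assumed)."""
    prob = 1.0
    for flavor in flavors:
        prob *= eff[flavor]
    return prob


# Four true b-jets (signal-like) versus a b b c c combination (fake-like).
signal_like = {k: all_tagged_probability(["b"] * 4, v) for k, v in SCENARIOS.items()}
fake_like = {k: all_tagged_probability(["b", "b", "c", "c"], v) for k, v in SCENARIOS.items()}

for name in SCENARIOS:
    print(f"({name}) 4b: {signal_like[name]:.4f}   bbcc: {fake_like[name]:.4f}")
```

Under this simple counting, lowering \(\epsilon _b\) from 70% to 60% reduces the four-tag probability for genuine b-jets from about 0.24 to about 0.13, whereas a \(bbcc\) combination is suppressed by two orders of magnitude or more in every scenario.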

Table 1 The cross section (in units of fb) of the signal and major backgrounds after application of each cut in the corresponding row. Lepton veto and electron anti-tagging are implicit in the basic cuts. Signal corresponds to \(C_{4b}^2=1,m_\phi =20~{\text{ GeV }}\). Here we assume b-tagging performance scenario (A) and a b-tagging pseudorapidity coverage \(|\eta |<5.0\). \(E_0=40~{\text{ GeV }}\) is assumed, except that in the last row for the signal and total background we show in parentheses the values corresponding to \(E_0=60~{\text{ GeV }}\)

The event selection in the 4b-tagging case first requires at least five jets satisfying the following basic cuts:

$$\begin{aligned} p_{Tj}>20~{\text{ GeV }}, \quad |\eta _j|<5.0, \quad \Delta R_{jj}>0.4 \end{aligned}$$
(5)
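The basic cuts of Eq. (5) can be sketched as below. The `Jet` container and the choice to require the \(\Delta R_{jj}\) condition for every pair of selected jets are assumptions for illustration; the text does not spell out how the separation cut is applied.

```python
import math
from itertools import combinations
from typing import NamedTuple


class Jet(NamedTuple):
    pt: float   # transverse momentum in GeV
    eta: float  # pseudorapidity
    phi: float  # azimuthal angle in radians


def delta_r(a: Jet, b: Jet) -> float:
    """Angular distance, with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (a.phi - b.phi + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a.eta - b.eta, dphi)


def passes_basic_cuts(jets, pt_min=20.0, eta_max=5.0, dr_min=0.4, n_min=5) -> bool:
    """At least n_min jets with pT > pt_min and |eta| < eta_max, all pairwise separated."""
    selected = [j for j in jets if j.pt > pt_min and abs(j.eta) < eta_max]
    if len(selected) < n_min:
        return False
    return all(delta_r(a, b) > dr_min for a, b in combinations(selected, 2))


# Hypothetical event with five well separated central jets.
event = [Jet(30.0, 0.0, phi) for phi in (0.0, 1.0, 2.0, 3.0, -2.0)]
print(passes_basic_cuts(event))
```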

Events with additional charged leptons are vetoed. To suppress the photoproduction background, we exclude events which can be tagged by an electron tagger and also require

$$\begin{aligned} E_T^{\text {miss}}>E_0 \end{aligned}$$
(6)

Here \(E_0\) denotes the threshold of transverse missing energy. In the following we take \(E_0=40~{\text{ GeV }}\) as the default choice and assume PHP backgrounds can be accordingly suppressed to a negligible level compared to the total background. This rough estimate of the missing energy threshold is inspired by a naive simulation of the direct photoproduction \(j+4b\) process. A thorough and detailed detector simulation of multijet photoproduction would be needed to determine the best \(E_0\) (perhaps in synergy with appropriate missing energy isolation cuts or a cut on the ratio \(V_{ap}/V_p\) of transverse energy flow anti-parallel and parallel to the hadronic final state transverse momentum vector [32]), which is, however, beyond the scope of the present paper. In the cut flow tables below we will also show the signal and total background in the case that \(E_0\) needs to be increased to \(60~{\text{ GeV }}\).

Table 2 The cross section (in units of fb) of the signal and major backgrounds after application of each cut in the corresponding row. Lepton veto and electron anti-tagging are implicit in the basic cuts. Signal corresponds to \(C_{4b}^2=1,m_\phi =40~{\text{ GeV }}\). Here we assume b-tagging performance scenario (A) and a b-tagging pseudorapidity coverage \(|\eta |<5.0\). \(E_0=40~{\text{ GeV }}\) is assumed, except that in the last row for the signal and total background we show in parentheses the values corresponding to \(E_0=60~{\text{ GeV }}\)
Table 3 The cross section (in units of fb) of the signal and major backgrounds after application of each cut in the corresponding row. Lepton veto and electron anti-tagging are implicit in the basic cuts. Signal corresponds to \(C_{4b}^2=1,m_\phi =60~{\text{ GeV }}\). Here we assume b-tagging performance scenario (A) and a b-tagging pseudorapidity coverage \(|\eta |<5.0\). \(E_0=40~{\text{ GeV }}\) is assumed, except that in the last row for the signal and total background we show in parentheses the values corresponding to \(E_0=60~{\text{ GeV }}\)

Then we impose the 4b-tagging requirement:

$$\begin{aligned} \text {At least four }b\text {-tagged jets in }|\eta |<5.0 \end{aligned}$$
(7)

The four b-tagged jets which have the closest invariant mass to \(m_h\) are required to have their invariant mass \(m_{4b}\) lie in the following mass window:

$$\begin{aligned} |m_{4b}-m_h|<20~{\text{ GeV }} \end{aligned}$$
(8)

Finally we utilize the event structure of the signal: for the four b-tagged jets picked out in the previous step, we group them into two pairs such that the absolute value of the invariant mass difference between these two pairs is smallest among all grouping possibilities. Then we require the invariant masses of these b-jet pairs both lie in the following mass window:

$$\begin{aligned} |m_{2b,i}-m_\phi |<10~{\text{ GeV }},\quad i=1,2 \end{aligned}$$
(9)

Here \(m_{2b,i},i=1,2\) denote the invariant mass of the two “correctly” grouped b-jet pairs, respectively.
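The selection in Eqs. (8) and (9) can be sketched as follows: pick the four b-tagged jets whose invariant mass is closest to \(m_h\), test the \(m_{4b}\) window, then choose among the three possible pairings the one with the smallest mass difference and test both pair masses against \(m_\phi \). The four-vector layout `(E, px, py, pz)` and the function names are assumptions for illustration, and the toy event at the end is constructed by hand.

```python
import math
from itertools import combinations

M_H = 125.0  # GeV


def inv_mass(vectors) -> float:
    """Invariant mass of a set of four-vectors given as (E, px, py, pz) in GeV."""
    e, px, py, pz = (sum(v[i] for v in vectors) for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def select_candidate(bjets, m_phi, m_h=M_H, window_4b=20.0, window_2b=10.0):
    """Return the two pair masses if the event passes Eqs. (8) and (9), else None."""
    if len(bjets) < 4:
        return None
    # Four b-tagged jets with invariant mass closest to m_h, then the m_4b window.
    quad = min(combinations(bjets, 4), key=lambda q: abs(inv_mass(q) - m_h))
    if abs(inv_mass(quad) - m_h) >= window_4b:
        return None
    a, b, c, d = quad
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    # Pairing with the smallest absolute difference between the two pair masses.
    first, second = min(pairings, key=lambda p: abs(inv_mass(p[0]) - inv_mass(p[1])))
    m1, m2 = inv_mass(first), inv_mass(second)
    if abs(m1 - m_phi) < window_2b and abs(m2 - m_phi) < window_2b:
        return m1, m2
    return None


# Toy event: h at rest -> phi phi (m_phi = 40 GeV) -> four massless b quarks.
p_phi = math.sqrt(62.5**2 - 40.0**2)
g, bg = 62.5 / 40.0, p_phi / 40.0  # boost factors gamma and beta*gamma of each phi
toy = [
    (g * 20.0, 20.0, 0.0, bg * 20.0),
    (g * 20.0 - bg * 20.0, 0.0, 0.0, g * 20.0 - bg * 20.0),
    (g * 20.0, -20.0, 0.0, bg * 20.0),
    (g * 20.0 + bg * 20.0, 0.0, 0.0, -(g * 20.0 + bg * 20.0)),
]
print(select_candidate(toy, m_phi=40.0))
```

Minimizing the mass difference does not use \(m_\phi \) itself, so the pairing step is the same for every point of the mass scan and only the final window depends on the hypothesis being tested.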

Fig. 3 Expected \(95\%\) CLs exclusion limit (solid line) and \(5\sigma \) discovery reach (dashed line) in the (\(C_{4b}^2,m_\phi \)) plane at the LHeC. Left \(100~{\text{ fb }^{-1}}\) luminosity. Right \(1~{\text{ ab }^{-1}}\) luminosity. Different colors correspond to different b-tagging scenarios (A)(B)(C)(D) (see the text and legend). \(E_0=40~{\text{ GeV }}\) is assumed

Fig. 4 Expected \(95\%\) CLs exclusion limit (solid line) and \(5\sigma \) discovery reach (dashed line) in the (\(C_{4b}^2,m_\phi \)) plane at the LHeC. Left \(100~{\text{ fb }^{-1}}\) luminosity. Right \(1~{\text{ ab }^{-1}}\) luminosity. Different colors correspond to different b-tagging pseudorapidity coverages (see the legend). \(E_0=40~{\text{ GeV }}\) is assumed

We present cut flow tables (Tables 1, 2, 3) for three benchmark masses \(m_\phi =20,40,60~{\text{ GeV }}\) under b-tagging performance scenario (A). Only CC backgrounds are listed because PHP backgrounds are expected to be negligible due to electron tagging and an appropriate missing energy requirement. For the decays of t, W and Z in the backgrounds, the following two cases are both considered and included in our results. One is the decay to a minimal number of partons, i.e. \(t\rightarrow bqq, W\rightarrow qq,Z\rightarrow qq\), with each parton identified as one jet. The other case is that one additional \(b\bar{b}\) pair is radiated from the decay products of t, W or Z. For \(t+\)jets the second kind of process is found to contribute sizably to the total background. For the \(h+\)jets background, only \(h\rightarrow bb\) and \(h\rightarrow 4b\) via tree-level SM processes are considered. Due to limited Monte Carlo statistics, there are slight differences among the three tables for the first four cuts on backgrounds. The cross section numbers shown in the tables correspond to the default choice \(E_0=40~{\text{ GeV }}\), except that in the last row of each table for the signal and total background we show in parentheses the final cross sections corresponding to \(E_0=60~{\text{ GeV }}\). From the tables it can be concluded that the \(h\rightarrow \phi \phi \rightarrow 4b\) channel at the LHeC is almost background free: with \(100~{\text{ fb }^{-1}}\) luminosity the expected number of background events is at most \(\mathcal {O}(0.1)\), while the remaining signal cross section is \(\mathcal {O}(1~{\text{ fb }})\) for \(C_{4b}^2=1\). This is in sharp contrast to the situation at the (HL-)LHC, where the signal is buried in large top quark backgrounds.

Figure 3 shows the expected \(95\%\) CLs [33, 34] exclusion limits and \(5\sigma \) discovery reach at the LHeC for the \(C_{4b}^2\) quantity in the mass range \([15,60]~{\text{ GeV }}\) assuming \(100~{\text{ fb }^{-1}}\) and \(1~{\text{ ab }^{-1}}\) luminosity. Various b-tagging performance scenarios are considered in the plots, all assuming a b-tagging pseudorapidity coverage \(|\eta |<5\). Because the expected number of background events is quite small, in setting exclusion limits and discovery reach we use exact formulas of the Poisson distribution for a discrete random variable. This leads to some small discontinuities at certain \(m_\phi \) values when the expected limits/reach are interpreted as the limits/reach for the median of background-only or signal plus background hypothesis, as can be seen from the plots.
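A minimal sketch of an exact Poisson CLs calculation for a single counting experiment is given below. It takes the median count under the background-only hypothesis as the "observed" count and solves for the signal yield at which CLs equals 5%. This is one common way to define an expected limit and is an assumption here; the function names are hypothetical and the exact procedure used for the figures is not spelled out in the text.

```python
import math


def poisson_cdf(n: int, mu: float) -> float:
    """P(N <= n) for N ~ Poisson(mu), summed term by term."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def median_count(mu: float) -> int:
    """Smallest n with P(N <= n) >= 0.5."""
    n = 0
    while poisson_cdf(n, mu) < 0.5:
        n += 1
    return n


def cls(n_obs: int, s: float, b: float) -> float:
    """CLs = P(N <= n_obs | s + b) / P(N <= n_obs | b)."""
    return poisson_cdf(n_obs, s + b) / poisson_cdf(n_obs, b)


def expected_signal_limit(b: float, cl: float = 0.95) -> float:
    """Signal yield excluded at confidence level `cl` for the median background-only count."""
    n_med = median_count(b)
    lo, hi = 0.0, 1.0
    while cls(n_med, hi, b) > 1.0 - cl:  # bracket the solution
        hi *= 2.0
    for _ in range(100):                 # bisection; CLs decreases monotonically in s
        mid = 0.5 * (lo + hi)
        if cls(n_med, mid, b) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return hi


for b in (0.01, 0.1, 0.6, 0.8):
    print(f"b = {b:4.2f}: median n = {median_count(b)}, s_95 = {expected_signal_limit(b):.3f}")
```

For a median count of zero the limit is \(s=-\ln 0.05\approx 3.0\) events regardless of the background. The median count jumps from 0 to 1 once the expected background exceeds \(\ln 2\), which is the kind of step that a discrete treatment produces. Dividing the event limit by the luminosity times the post-cut signal cross section for \(C_{4b}^2=1\) converts it into a limit on \(C_{4b}^2\); about 3 events with \(1~{\text{ ab }^{-1}}\) and an \(\mathcal {O}(1~{\text{ fb }})\) cross section gives a few per mille.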

From Fig. 3 it can easily be seen that for \(m_\phi \) in the \([20,60]~{\text{ GeV }}\) range the LHeC with \(100~{\text{ fb }^{-1}}\) luminosity is capable of probing \(C_{4b}^2\) to the few percent level, while with \(1~{\text{ ab }^{-1}}\) luminosity the LHeC will eventually probe \(C_{4b}^2\) down to the few per mille level, both at \(95\%\) CLs. We note that with \(1~{\text{ ab }^{-1}}\) luminosity, for \(m_\phi =20,40,60~{\text{ GeV }}\), the \(95\%\) CLs upper limit on \(C_{4b}^2\) is about \(0.3\%(0.5\%),0.2\%(0.4\%),0.1\%(0.2\%)\) respectively, for b-tagging scenario (A), assuming \(E_0=40~{\text{ GeV }}(60~{\text{ GeV }})\). The result is generally insensitive to the mistag rate of c-jets, because with the requirement of at least four b-tagged jets, fake backgrounds do not contribute much to the total background. On the other hand, the final signal rate is approximately proportional to the fourth power of the b-tagging efficiency, so it is relatively important to maintain a high b-tagging efficiency to retain more signal events. As can be expected, the sensitivity drops quickly when \(m_\phi \) becomes smaller than about \(20~{\text{ GeV }}\), due to the collimation of the \(\phi \) decay products, which renders the resolved analysis inefficient. A jet substructure analysis is needed to improve the sensitivity in this mass region, which we leave for future study. On the other hand, the sensitivity improves as \(m_\phi \) increases from about 40 to \(60~{\text{ GeV }}\). This is mainly because the b-jets from \(h\rightarrow \phi \phi \rightarrow 4b\) decay in the \(m_\phi =60~{\text{ GeV }}\) case are more likely to pass the basic cuts (especially the \(p_{Tj}>20~{\text{ GeV }}\) cut) compared to the \(m_\phi =40~{\text{ GeV }}\) case.
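The collimation argument can be made semi-quantitative with the usual rule of thumb that the two decay products of a boosted particle are separated by roughly \(2m/p\). The sketch below applies it to a \(\phi \) from a Higgs boson decaying at rest; both the rule of thumb and the at-rest assumption are simplifications introduced here for illustration, and the estimate is only meaningful for light \(\phi \).

```python
import math

M_H = 125.0  # GeV


def approx_bb_separation(m_phi: float, m_h: float = M_H) -> float:
    """Rough angular separation 2 * m_phi / p_phi of the b quarks from phi -> b bbar.

    Assumption: the Higgs boson decays at rest, so E_phi = m_h / 2.
    """
    e_phi = m_h / 2.0
    if not 0.0 < m_phi < e_phi:
        raise ValueError("m_phi must satisfy 0 < m_phi < m_h / 2")
    p_phi = math.sqrt(e_phi**2 - m_phi**2)
    return 2.0 * m_phi / p_phi


for m in (15.0, 20.0, 25.0):
    print(f"m_phi = {m:4.1f} GeV: separation ~ {approx_bb_separation(m):.2f}")
```

For \(m_\phi =15~{\text{ GeV }}\) the estimate is about 0.5, close to the \(\Delta R_{jj}>0.4\) requirement of the basic cuts, which is in line with the loss of efficiency of the resolved analysis at low mass.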

Figure 4 also shows the expected \(95\%\) CLs exclusion limits and \(5\sigma \) discovery reach at the LHeC for the \(C_{4b}^2\) quantity in the mass range \([15,60]~{\text{ GeV }}\) assuming \(100~{\text{ fb }^{-1}}\) and \(1~{\text{ ab }^{-1}}\) luminosity. Here b-tagging performance is fixed to scenario (A) but various b-tagging pseudorapidity coverage conditions are considered. The plots indicate that the sensitivity reach of the LHeC for this channel is not very sensitive to b-tagging pseudorapidity coverage.

Fig. 5 Future LHeC capability in probing the parameter space of the Higgs singlet extension of the SM [35], plotted in the \(\sin \alpha \)–\(\tan \beta \) plane for three benchmark light Higgs masses \(m_\phi =20,40,60~{\text{ GeV }}\). Each point is colored according to its \(C_{4b}^2\) value for reference. Also shown are current LEP and LHC bounds, and expected future HL-LHC bounds. See text for details

3 Constraints on the Higgs singlet extension of the SM

We now interpret the expected sensitivity of the LHeC in the context of the Higgs singlet extension of the SM. For simplicity we consider the Higgs singlet extension studied in [35]. In this model, an additional real singlet scalar S is added to the SM. The Lagrangian of the Higgs kinetic and potential terms is extended to the following form:

$$\begin{aligned} \mathcal {L}_s=(D^{\mu }\Phi )^\dag D_\mu \Phi +\partial ^{\mu }S\partial _{\mu }S-V(\Phi ,S) \end{aligned}$$
(10)

with scalar potential

$$\begin{aligned} V(\Phi ,S)= & {} -m^2\Phi ^{\dag }\Phi -\mu ^2 S^2+\lambda _{1}(\Phi ^{\dag }\Phi )^2 \nonumber \\&+\lambda _{2}S^4+\lambda _{3}\Phi ^{\dag }\Phi S^2 \end{aligned}$$
(11)

Here \(\Phi \) denotes the original SM Higgs doublet. The scalar potential obeys a \(Z_2\) symmetry. We allow S to acquire a vacuum expectation value and express the Higgs fields in unitary gauge as

$$\begin{aligned} \Phi \equiv \begin{pmatrix}0 \\ \frac{\tilde{h}+v}{\sqrt{2}}\end{pmatrix}, \quad S\equiv \frac{h'+x}{\sqrt{2}}. \end{aligned}$$
(12)

Here \(v=246~{\text{ GeV }}\) ensures the correct mass generation for the W and Z bosons and the SM fermions. The gauge eigenstates \(\tilde{h},h'\) can be related to the mass eigenstates \(\phi ,h\) via an orthogonal rotation

$$\begin{aligned} \begin{pmatrix}\phi \\ h\end{pmatrix} =\begin{pmatrix}\cos \alpha &{} -\sin \alpha \\ \sin \alpha &{} \cos \alpha \end{pmatrix} \begin{pmatrix}\tilde{h} \\ h'\end{pmatrix}. \end{aligned}$$
(13)
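Reading off the rows of the rotation in Eq. (13) gives the doublet and singlet content of each mass eigenstate. Since only the doublet component \(\tilde{h}\) couples to W and Z bosons in this model, the doublet fraction can be taken as a proxy for the \(hVV\) coupling relative to the SM; this identification and the names below are assumptions made for illustration.

```python
import math


def mass_eigenstate_content(alpha: float) -> dict:
    """Doublet (h~) and singlet (h') components of phi and h from Eq. (13).

    Eq. (13): phi = cos(alpha) * h~ - sin(alpha) * h',  h = sin(alpha) * h~ + cos(alpha) * h'.
    """
    c, s = math.cos(alpha), math.sin(alpha)
    return {
        "phi": {"doublet": c, "singlet": -s},
        "h": {"doublet": s, "singlet": c},
    }


# Illustrative (assumed) mixing angle with sin(alpha) = -0.99.
content = mass_eigenstate_content(math.asin(-0.99))
kappa_v_sq_h = content["h"]["doublet"] ** 2      # proxy for kappa_V^2 of the 125 GeV state
kappa_v_sq_phi = content["phi"]["doublet"] ** 2  # doublet admixture of the light state
print(f"kappa_V^2(h) ~ {kappa_v_sq_h:.4f}, doublet fraction of phi ~ {kappa_v_sq_phi:.4f}")
```

In the limit \(\sin \alpha \rightarrow -1\) the 125 GeV state becomes purely doublet-like, so its couplings to SM particles approach their SM values up to an overall sign, while the light state \(\phi \) becomes mostly singlet.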

Now it is convenient to parameterize the model in terms of five more physical quantities (\(m_\phi ,m_h\) are the masses of \(\phi \) and h, respectively):

$$\begin{aligned} m_\phi ,m_h,\alpha ,v,\tan \beta \equiv \frac{v}{x}. \end{aligned}$$
(14)

The translation formulas between these quantities and the original parameters in the Lagrangian can be found in [35]. We are interested in the case in which the additional Higgs boson is lighter; therefore we fix \(m_h=125~{\text{ GeV }}\) and allow the three parameters \(m_\phi ,\alpha ,\tan \beta \) to vary. Here we focus on the more interesting region where \(\sin \alpha \rightarrow -1\), which also allows for a special direction \(\tan \beta =-\cot \alpha \) that results in a vanishing \({\mathrm {Br}}(h\rightarrow \phi \phi )\) [35]. We consider three benchmark values of [4] \(m_\phi \) (\(m_\phi =20,40,60~{\text{ GeV }}\)) and plot the current LEP and LHC constraints and the future HL-LHC and LHeC constraints in the \(\tan \beta -\sin \alpha \) plane; see Fig. 5. Each point is colored according to its \(C_{4b}^2\) value for reference. The factor \({\mathrm {Br}}^2(\phi \rightarrow b\bar{b})\) appearing in the definition of \(C_{4b}^2\) in Eq. (2) is nearly constant at 0.77 in the mass range \(m_\phi \in [20,60]~{\text{ GeV }}\) [36]. The deep black regions, which correspond to very small \(C_{4b}^2\) values, tilt slightly upwards as \(|\sin \alpha |\) decreases from about 0.995 to 0.980. In this range these regions are centered on the abovementioned special direction \(\tan \beta =-\cot \alpha \), which makes \({\mathrm {Br}}(h\rightarrow \phi \phi )\) vanish [35] and renders its vicinity difficult to probe through exotic Higgs decay searches. The LEP constraints (green dashed line) come from direct searches for additional Higgs bosons and are taken directly from [35]. Points on the right side of the green dashed line are excluded at \(95\%\) confidence level. This indicates that the LEP searches force the mixing between the two Higgs bosons to be very small for the scenario in which there is a light Higgs boson in the mass range \(m_\phi \in [20,60]~{\text{ GeV }}\). In such a case there cannot be a sizable deviation of the Higgs signal strengths due to Higgs mixing.
However, the opening of the exotic Higgs decay \(h\rightarrow \phi \phi \) could lead to a sizable suppression of the \(125~{\text{ GeV }}\) Higgs signal strengths. The LHC Run I constraints (white solid line) come from the \(125~{\text{ GeV }}\) Higgs signal strength measurements [4]. The regions between the two white solid lines for \(m_\phi =20,40~{\text{ GeV }}\) (and the region below the white solid lines for the \(m_\phi =60~{\text{ GeV }}\) case) are allowed by LHC Run I measurements at the \(2\sigma \) level. We translated the HL-LHC projection of the precision of Higgs signal strength measurements [37] into constraints (yellow solid line “HL-LHC ind.”) on the parameter space of the Higgs singlet extension of the SM (assuming half theoretical uncertainties, according to [37]). At the (HL-)LHC, \(h\rightarrow \phi \phi \rightarrow 4b\) can be directly probed via Wh associated production, as has been done by ATLAS [12]. However, the current constraint from this method is quite weak and even \(C_{4b}^2=1\) cannot be bounded. We extrapolate the current constraint [12] to the \(3~{\text{ ab }^{-1}}\) HL-LHC, with the very optimistic assumption that all selection efficiencies can be maintained and all systematic uncertainties scale with the square root of luminosity. The corresponding \(95\%\) CLs exclusion limit is plotted as the yellow dotted line “HL-LHC dir.(opt.)”. It can be seen that even with this very optimistic assumption the sensitivity of the direct search in the Wh channel is at most comparable to the indirect constraint from the HL-LHC \(125~{\text{ GeV }}\) Higgs signal strength measurements. The LHeC \(1~{\text{ ab }^{-1}}\) \(95\%\) CLs sensitivity is plotted as the red solid lines, assuming b-tagging scenario (A), b-tagging pseudorapidity coverage \(|\eta |<5.0\) and \(E_0=40~{\text{ GeV }}\). The LHeC is expected to exclude the region outside the red solid lines if no new physics exists.
It is obvious that the LHeC exclusion capability extends to the deep black region which represents very small \(C_{4b}^2\) values. If no lepton colliders are available before the end of the HL-LHC, much of the parameter space of the Higgs singlet extension model could only be reached via the ep machine.

4 Discussion and conclusion

In this paper we studied the LHeC sensitivity to the exotic Higgs decay process \(h\rightarrow \phi \phi \rightarrow 4b\) in which \(\phi \) denotes a spin-0 particle lighter than half of \(125~{\text{ GeV }}\). We performed a parton level analysis and showed that with \(1~{\text{ ab }^{-1}}\) luminosity the LHeC is able to exclude \(C_{4b}^2\) at the few per mille level (\(95\%\) CLs), when only statistical uncertainties are included. To maintain the sensitivity, it is important to choose a b-tagging working point with relatively large b-tagging efficiency. The sensitivity changes only slightly when the b-tagging pseudorapidity coverage is varied from 3 to 5. Using the Higgs singlet extension of the SM as an illustration, we showed that the LHeC direct search for \(h\rightarrow \phi \phi \rightarrow 4b\) is the most sensitive future probe of much of the parameter space of the model, if no lepton colliders are available. Of course this LHeC search will also have a significant impact on the scalar sector of other BSM theories when one of the scalar bosons lies in the mass range \({\sim }(2m_b,m_h/2)\).

The analysis presented here can be further improved in several respects. First is of course a more realistic estimation of the signal and backgrounds, including parton shower and more detailed detector effects. Especially for multijet final states, a parton shower correctly merged with the matrix element will be highly desirable. Secondly, we could further utilize the sample with fewer required b-tagged jets or even fewer reconstructed jets, e.g. three b-tagged jets. This technique has already been used in [12] and is expected to further improve the sensitivity, especially in the first stages of data collection when statistics are limited. Thirdly, we have only applied a cut-based analysis with very simple variables. A multivariate analysis may deliver additional gain in sensitivity. Furthermore, the sensitivity in the \(m_\phi <20~{\text{ GeV }}\) mass range could be improved via a jet substructure analysis, as has been emphasized. Besides these directions of exploration, it should, however, be emphasized that in the current analysis PHP backgrounds are assumed to be negligible compared to CC backgrounds under the condition discussed in Sect. 2. A more detailed detector simulation is thus needed to pin down the event selection conditions required to suppress PHP backgrounds. We also note that in the present study systematic uncertainties have not been included. However, since the expected background event number is very small, we expect that the obtained sensitivity (discovery and exclusion reach) would be qualitatively stable against systematic uncertainties, which means that the \(1~{\text{ ab }^{-1}}\) LHeC could still do much better than the HL-LHC with respect to the \(h\rightarrow \phi \phi \rightarrow 4b\) search.

Exotic Higgs decays constitute an intriguing and important part of Higgs physics that deserves comprehensive theoretical and experimental investigation. Previous attempts and attention have nearly all been devoted to hadron–hadron collisions or \(e^{+}e^{-}\) collisions. We demonstrate in this paper that for certain important processes which suffer from large backgrounds in hadron–hadron collisions, it is clearly superior to conduct the search at a concurrent ep collider, if an \(e^{+}e^{-}\) machine with sufficient center-of-mass energy is not available. In that case, the ep machine can be expected to play an important role in precision Higgs studies, including the study of exotic Higgs decays like the invisible decay [20], \(h\rightarrow \phi \phi \rightarrow 4b\) and other channels beset by jets or missing transverse energy [38, 39].