Feasibility of the observation of a heavy scalar through the fully hadronic final state at the LHeC

The proposed future Large Hadron electron Collider (LHeC) provides sufficient center-of-mass energy, $\sqrt{s}$, to probe heavy particles of mass $> 2m_W$ $(2m_Z)$ decaying into $W^\pm$ $(Z)$ bosons.
In this work we present a study of the production of one such heavy CP-even scalar $H$, of mass $2m_h < m_H < 2m_t$, through the charged-current production mode with $H \rightarrow W^+W^-$, where the hadronic decay of the $W^\pm$ bosons is used to reconstruct $m_H$. Owing to the presence of missing energy and a forward jet in this channel, it is challenging to reconstruct $m_H$ from this final state; we therefore employ three different reconstruction methods and discuss the significance of each one.
For this analysis we consider a benchmark value of $m_H = 270$ GeV and $\sqrt{s} \approx 1.3$ TeV with an assumed luminosity of 1 ab$^{-1}$.


Introduction
To date, many models beyond the Standard Model (BSM), such as the two-Higgs doublet models [1] and their extensions, incorporate scalars with masses lower or higher than that of the SM Higgs boson ($m_h = 125$ GeV) [2], with model parameters heavily constrained by existing experimental data and theoretical limits. The multi-lepton anomalies seen in Run 1 data at ATLAS and CMS are explained in a two-Higgs doublet model with an additional real singlet scalar (2HDM+S) [3][4][5][6][7][8]. In this model the mass of the heaviest CP-even scalar $H$ is considered in the interval $2m_h \le m_H < 2m_t$, where $m_t$ is the mass of the top quark. The 2HDM+S model with different mass ranges of the scalars is also well motivated by BSM theories [10][11][12][13][14], by the possible existence of BSM scalars in Large Hadron Collider data [15,16] and at future $e^+e^-$ colliders [17,18], and by attempts to explain the dark matter abundance [19][20][21], di-Higgs production [22], the excess seen at 96 GeV [23,24] and the recent CDF [25] $W$-mass measurement [26]. Searches for heavy scalars in the $WW/ZZ$ channels have been performed at CMS and ATLAS [27][28][29]. The discovery potential of a heavy Higgs boson through resonant di-Higgs production at the HL-LHC and FCC-hh has been studied with the $4\tau$ and $b\bar{b}\gamma\gamma$ channels in the "xSM" model [30]. Even the physics of dark matter and of axions or axion-like particles can be connected with CP-even or CP-odd scalars [31][32][33].
In this work we investigate the possibility of probing $H$ at the proposed future electron-proton colliders via the deep-inelastic scattering charged-current (CC) process. The proposed Large Hadron Electron Collider (LHeC) facility at CERN provides a sufficient center-of-mass energy, $\sqrt{s} \approx 1.3$ TeV for electron (proton) beam energies of $E_{e(p)} = 60$ GeV (7 TeV), to explore the allowed mass range of $H$. Interestingly, in this mass range one can explore the resonance $H$ via its decays to $W^\pm$ and $Z$ bosons. In this work we consider $H \to W^+W^-$, where the $W^\pm$ decay to hadronic final states. However, the mass reconstruction of $H$ through this final state is challenging because (a) the $W^\pm$ bosons emanating from the heavy $H$ are boosted with respect to the laboratory frame, and hence the jets coming from the $W^\pm$ are collimated, and (b) in the $e^-p$ production process, the scattered jet from the proton line is not easily distinguishable from the jets coming from the $W^\pm$ (Fig. 1). Nevertheless, the high pseudo-rapidity ($\eta_j$) region of the scattered jets can be exploited to reconstruct the signal. We also employ a machine learning approach to distinguish the signal from the potential backgrounds.
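The quoted center-of-mass energy follows from the beam energies. As a minimal check, assuming a head-on collision with both beam masses neglected, so that $\sqrt{s} = 2\sqrt{E_e E_p}$ (the function name is illustrative):

```python
import math


def sqrt_s_gev(e_electron_gev: float, e_proton_gev: float) -> float:
    """Centre-of-mass energy of a head-on collision of two
    ultra-relativistic beams (beam masses neglected)."""
    return 2.0 * math.sqrt(e_electron_gev * e_proton_gev)


# Beam energies quoted in the text: E_e = 60 GeV, E_p = 7 TeV.
lhec = sqrt_s_gev(60.0, 7000.0)
print(f"sqrt(s) = {lhec:.0f} GeV")  # about 1.3 TeV
```

The result, roughly 1296 GeV, lies well above the upper edge of the considered mass window, $2m_t$.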
In sect. 2 we discuss the framework needed to perform this analysis. The event simulation and the tools used are described in sect. 3. The mass reconstruction methods are described in sect. 4. A summary and discussion of this work are presented in sect. 5.

Model
To investigate the discovery potential of a heavy Higgs boson of mass $2m_h \le m_H < 2m_t$ in the $e^-p$ environment, we consider a model where $H$ corresponds to a real singlet scalar field $\Phi_H$ which mixes with the SM SU(2) doublet Higgs field $\Phi$. The Higgs-boson Lagrangian is then modified and can be written as [34][35][36]:

In general, the parameters $\mu_h$, $\mu_H$, $\lambda_h$ and $\lambda_H$ are all positive in order to have a stable potential, whereas $\xi$ is not required to have any particular sign. We assume that the scalar fields in the above Lagrangian acquire vacuum expectation values, so that the component fields can be written as:

Here the fields $G$ are Goldstone bosons absorbed by the vector bosons, so no physical pseudoscalar states are left in the spectrum. The scalar spectrum, however, has two physical states, $h$ and $H$, rather than the single state of the SM. Since the singlet does not couple to the SU(2)$_L \times$ U(1)$_Y$ gauge bosons, it does not contribute to $m_W$ and $m_Z$, and hence $v$ must take the SM value $v = 246$ GeV. We can also redefine the coefficients of eq. 1 such that $v_H = 0$. Note that we do not impose any extra symmetries such as $Z_2$ on the scalar sector, so in general $\phi$ mixes with $\phi_H$ to form the mass eigenstates. As before, we assume $m_h < m_H$, where $m_h = 125$ GeV is the mass of the SM Higgs boson and $m_H$ is the mass of the heavy scalar singlet. The mass eigenstates $h$ and $H$ are related to the gauge eigenstates $\phi$ and $\phi_H$ by a $2 \times 2$ unitary matrix $V$ (footnote 3):

Hence the couplings of the gauge bosons and fermions to $h$ are the same as in the SM if $|V_{11}| = 1$, which implies $|V_{12}| = 0$. However, in this work we consider $|V_{11}| \neq 1$ and $|V_{12}| \neq 0$. Then the production rates of the $h$ and $H$ are suppressed by a factor $|V_{1i}|^2$ relative to the SM $h$ production rates. The branching ratios (BRs) of $h$ to the SM particles are identical to the SM BRs, while the BRs of the heavy $H$ depend on whether the channel $H \to hh$ is kinematically accessible. For our analysis we scale the $HW^+W^-$ coupling with respect to the SM Higgs boson $hW^+W^-$ coupling.

Footnote 3: In general, a $2 \times 2$ unitary matrix $V$ can be formed with one parameter $\theta$ as:

Fig. 1 Leading order diagram for the signal process $pe^- \to \nu_e H j$, $H \to W^+W^-$, $W^\pm \to jj$. Here, q ≡ u, c, d, s and q ≡ d, s, ū, c.
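The suppression of the production rates by $|V_{1i}|^2$ can be illustrated numerically. The sketch below assumes the common real-rotation form of a one-parameter $2 \times 2$ mixing matrix, $V = ((\cos\theta, \sin\theta), (-\sin\theta, \cos\theta))$; this parametrisation and its sign convention are assumptions made here for illustration, not taken from the text, and the function names are hypothetical.

```python
import math


def mixing_matrix(theta: float) -> list[list[float]]:
    """Real 2x2 rotation, ASSUMED here as the one-parameter form of V."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]


def rate_suppression(theta: float) -> tuple[float, float]:
    """Return (|V_11|^2, |V_12|^2), read here as the production-rate
    factors of h and H relative to the SM h rate."""
    v = mixing_matrix(theta)
    return v[0][0] ** 2, v[0][1] ** 2


# Illustrative mixing angle only; not a value used in the analysis.
h_factor, big_h_factor = rate_suppression(0.2)
print(h_factor, big_h_factor, h_factor + big_h_factor)
```

Unitarity fixes the sum of the two factors to one, so any rate gained by $H$ is lost by $h$; the SM-like limit $|V_{11}| = 1$ corresponds to $\theta = 0$.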

Event Simulation and Tools
The charged-current (CC) signal process for heavy scalar $H$ production is simulated as $pe^- \to \nu_e H j$, where $\nu_e$ is the electron neutrino (the source of missing energy) and $j$ represents the jet emanating from the proton line (we refer to this $j$ as the scattered or forward jet in the text). The decays $H \to W^+W^-$ and $W^\pm \to jj$ are included at the matrix-element level for this signal process (see Fig. 1). Note that $H$ can also be produced in the neutral-current process through the fusion of $Z$ bosons at tree level, $pe^- \to e^- H j$, but its cross-section is sub-dominant, approximately 5.5 times smaller than that of the CC process, which proceeds through $W^\pm$ fusion, for an unpolarized $e^-$ beam.
To generate event samples for the signal and the potential backgrounds we use the Monte Carlo generator MadGraph5 [37], interfaced with a customised Pythia-PGS [38] for parton showering and hadronization (for details see Ref. [39]). The detector simulation is performed using Delphes [40] with parameters optimised for the LHeC detector. The jets are clustered using FastJet [41] with the anti-$k_T$ algorithm [42] and distance parameter $R = 0.4$. The factorisation and renormalisation scales for the signal simulation are fixed to the heavy Higgs boson mass $m_H$. The background simulations are done with the default MadGraph5 dynamic scales. The polarization of the electron beam is assumed to be $-80\%$. This enhances the cross-sections by a factor of $\sim 1.8$ with respect to the unpolarized $e^-$ beam for both signal and background.

Table 2 In the first column the selection criteria are given. The second column contains the weight of the signal process $pe^- \to \nu_e H j$, $H \to W^+W^-$, $W^\pm \to jj$ for $m_H = 270$ GeV. The third to sixth columns give the weights of the dominant backgrounds. The seventh column is the weighted total number of background events. All weights are calculated with $L = 1$ ab$^{-1}$. The significance of the signal over the total background is given in the eighth column. In the last column the significance with $\delta_{sys} = 2\%$ is estimated.
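The factor of $\sim 1.8$ quoted for the $-80\%$ polarized electron beam can be understood from the chiral structure of the charged current: only left-handed electrons couple to the $W$, so a pure CC cross-section scales as $(1 - P_e)$ relative to the unpolarized one. A minimal sketch of this scaling, strictly valid for CC processes only (the function name is illustrative):

```python
def cc_polarisation_factor(p_e: float) -> float:
    """Ratio sigma(P_e) / sigma(unpolarised) for e- p charged-current
    scattering, where only left-handed electrons contribute."""
    if not -1.0 <= p_e <= 1.0:
        raise ValueError("polarisation must lie in [-1, 1]")
    return 1.0 - p_e


# Beam polarisation assumed in the text: P_e = -80%.
print(cc_polarisation_factor(-0.8))
```

For a fully left-handed beam ($P_e = -1$) the factor would reach 2, and for a fully right-handed beam the CC cross-section vanishes.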
The cross-sections for the signal (footnote 4) and the potential background processes are calculated at leading order using MadGraph5 with minimal cuts on the jet transverse momentum, $p_{T_j} > 20$ GeV, and the jet pseudo-rapidity, $-1 < \eta_j < 5$, and with no requirement on the missing transverse energy $E_T^{miss}$; they are presented in Table 1 for the benchmark value $m_H = 270$ GeV. Before proceeding to the mass reconstruction of $H$, we apply preliminary selection criteria to estimate the significance: (a) since the final state of the signal (Fig. 1) contains five jets, at least five jets are required, and (b) $E_T^{miss} > 20$ GeV is required. In Table 2 we present the number of weighted signal ($S$) and background ($B$) events at a luminosity of $L = 1$ ab$^{-1}$ after these selection criteria, where the significance of the signal over the background is calculated with the formula $\sigma = S/\sqrt{B}$. It is interesting to note that there is a slight increase ($\approx 2.3\%$) in $\sigma$ after the selection of five leading jets, though $E_T^{miss} > 20$ GeV reduces $\sigma$ by $\approx 7\%$ in comparison with the initial weighted events. In order to account for systematic errors in the shape of the signal and background distributions due to detector resolution, the $E_T^{miss}$ measurement, reconstruction efficiency etc., as well as in the expected number of events, we calculate the significance as a function of a systematic factor $\delta_{sys}$, $\sigma(\delta_{sys}) = S/\sqrt{B + (\delta_{sys} \cdot B)^2}$, and add this estimate to Table 2.

Footnote 4: We scaled the $HW^+W^-$ coupling such that the cross-section for the signal is $\sim 20$ times smaller than the corresponding cross-section of $h$ with $m_h = 270$ GeV. This factor is very optimistic, chosen so as not to evade any theoretical or experimental limits on the $m_H$ cross-section in the considered signal.
It is important to investigate and account for these observations during the mass reconstruction of $H$, which is discussed further in the next section.
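The two significance definitions used for Table 2 are straightforward to evaluate. The following sketch implements $\sigma = S/\sqrt{B}$ and $\sigma(\delta_{sys}) = S/\sqrt{B + (\delta_{sys} \cdot B)^2}$; the yields in the example are hypothetical placeholders, not entries of Table 2.

```python
import math


def significance(s: float, b: float, delta_sys: float = 0.0) -> float:
    """S / sqrt(B + (delta_sys * B)^2); reduces to S / sqrt(B)
    when delta_sys = 0."""
    if b <= 0.0:
        raise ValueError("background yield must be positive")
    return s / math.sqrt(b + (delta_sys * b) ** 2)


# HYPOTHETICAL yields for illustration only (not taken from Table 2).
s_events, b_events = 100.0, 400.0
print(significance(s_events, b_events))        # statistical only
print(significance(s_events, b_events, 0.02))  # with a 2% systematic
```

Because the systematic term grows as $B^2$ while the statistical term grows as $B$, the penalty from $\delta_{sys}$ becomes more severe for selections with large background yields.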

Reconstruction of the invariant mass
In order to reconstruct $m_H$ it is important to select the appropriate hadronic jets in the signal and to compare their features with those of the dominant backgrounds. To begin, we must isolate and identify the hadronic jets after detector simulation. Fig. 2a shows the number of hadronic jets, constructed with the requirement $\Delta R = 0.4$. It is clear that the jet multiplicity of the $ZZ$ backgrounds is comparable to that of the signal. A similar feature can be observed in the pseudo-rapidity of the forward jet, $\eta_j$, shown in Fig. 2b. Therefore, the $ZZ$ backgrounds need to be reduced with the help of the missing transverse energy cut $E_T^{miss} > 20$ GeV (see Fig. 3); the corresponding significant reduction in weighted events can be seen in Table 2.
To compare the reconstructed invariant mass $m_H$ with the truth-level mass, the hadronic jets originating from the $W^+$ and $W^-$ bosons are selected using truth-level information (note that in the signal the $W^\pm$ come from the decay of $H$). The invariant mass of the two jets, $m_{jj}$, from the $W^+$ ($W^-$) is shown in Fig. 4a (Fig. 4b). Note that, along with the signal, we show only backgrounds with $W^\pm$ final states, as no information is stored for $Z$ bosons at truth level.
After analysing these observables, we apply three different methods to reconstruct $m_H$ in this channel and compare their significances. In Method 1, the four leading $p_T$-ordered jets are selected. In Method 2, four hadronic jets are selected after excluding the most forward jet (the one with the largest $\eta_j$), while Method 3 uses high-level machine learning (ML) techniques.

Method 1: selection of four $p_T$-ordered leading jets
In this method, all jets are sorted according to their $p_T$ and the four leading ($p_T$-ordered) jets out of five are selected from the weighted signal and background events. We expect an inherent uncertainty in this method from the forward jet (which may not originate from either the $W^+$ or the $W^-$), and this may contaminate the reconstruction of $m_H$ in the signal. The invariant mass distribution of the four selected jets, $m_{4j} \equiv m_H$, obtained with this method is shown in Fig. 5a. The corresponding significances $\sigma$ are shown in Table 3 (second column). Here $\sigma_{m_{4j}}$ denotes the significance over the full available range of $m_{4j}$, and $\sigma_{max}$ the maximum significance, obtained in a restricted range of $m_{4j}$. With the initial events, this method yields a maximum of $4.0\sigma$ within the invariant mass range $m_{4j} \in [190, 540]$ GeV, a 2.5% improvement over the full $m_{4j}$ range. After requiring $E_T^{miss} > 20$ GeV, the significance improves to $4.9\sigma$ in $m_{4j} \in [190, 540]$ GeV (a 4.1% improvement over the full range), which is an improvement of $\sim 16\%$ in comparison with the significance shown in Table 2.
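The selection of Method 1 can be sketched as follows. The jet container and function names are hypothetical and do not correspond to the analysis code; jets are assumed to be given as $(p_T, \eta, \phi, m)$ and are converted to Cartesian four-vectors before summation.

```python
import math
from typing import NamedTuple


class Jet(NamedTuple):
    pt: float    # transverse momentum [GeV]
    eta: float   # pseudo-rapidity
    phi: float   # azimuthal angle [rad]
    mass: float  # jet mass [GeV]


def four_vector(j: Jet) -> tuple[float, float, float, float]:
    """Convert (pT, eta, phi, m) to (E, px, py, pz)."""
    px = j.pt * math.cos(j.phi)
    py = j.pt * math.sin(j.phi)
    pz = j.pt * math.sinh(j.eta)
    e = math.sqrt(px * px + py * py + pz * pz + j.mass ** 2)
    return e, px, py, pz


def invariant_mass(jets: list[Jet]) -> float:
    """Invariant mass of the summed four-momenta of the given jets."""
    e = px = py = pz = 0.0
    for j in jets:
        je, jx, jy, jz = four_vector(j)
        e += je
        px += jx
        py += jy
        pz += jz
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def method1_m4j(jets: list[Jet]) -> float:
    """Method 1: m_4j from the four leading-pT jets of an event
    with at least five jets."""
    if len(jets) < 5:
        raise ValueError("Method 1 requires at least five jets")
    leading = sorted(jets, key=lambda j: j.pt, reverse=True)[:4]
    return invariant_mass(leading)
```

Note that nothing in this selection prevents a hard forward jet from entering the four-jet system, which is the source of contamination mentioned above.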

Table 3 The significance is calculated at each stage of the optimised selection criteria using $\sigma = S/\sqrt{B}$ and $\sigma(\delta_{sys} = 2\%) = S/\sqrt{B + (\delta_{sys} \cdot B)^2}$, where $S$ and $B$ are the expected signal and background yields at a luminosity of 1 ab$^{-1}$, respectively. Here $\sigma_{m_{4j}}$ denotes the significance over the full available range of $m_{4j}$, and $\sigma_{max}(m_{4j})$ the maximum significance; the corresponding range $m_{4j} \in [m_{4j}^{min}, m_{4j}^{max}]$ is specified for each approach (the corresponding $S$ and $B$ are given in the next row).
From the $m_{4j}$ distribution in Fig. 5a it can be seen that the invariant mass peak is wide, possibly because of the contamination from forward jets discussed above. A method that narrows the width by removing the forward jet is therefore expected to give a better mass reconstruction; it is discussed in the next subsection.
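The mass window $[m_{4j}^{min}, m_{4j}^{max}]$ that maximises the significance, as quoted for $\sigma_{max}$ in Table 3, can be found by a brute-force scan over contiguous bin ranges of the binned signal and background $m_{4j}$ distributions. The sketch below is one possible way of doing so, assuming identically binned histograms given as plain lists; it is not necessarily the procedure used for Table 3.

```python
import math


def best_window(signal: list[float],
                background: list[float]) -> tuple[int, int, float]:
    """Return (first_bin, last_bin, significance) of the contiguous
    window of bins that maximises S / sqrt(B)."""
    if len(signal) != len(background):
        raise ValueError("histograms must have the same binning")
    best = (0, 0, 0.0)
    for lo in range(len(signal)):
        s = b = 0.0
        for hi in range(lo, len(signal)):
            s += signal[hi]
            b += background[hi]
            if b > 0.0:  # skip windows with no background estimate
                z = s / math.sqrt(b)
                if z > best[2]:
                    best = (lo, hi, z)
    return best
```

In practice a minimum background yield per window would be advisable, since windows with very small $B$ can give artificially large values of $S/\sqrt{B}$.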

Method 2: elimination of forward jet
As Method 1 only slightly improved the accuracy of the $m_H$ measurement through the four leading $p_T$-ordered jets using $m_{4j}$ (compared with the significance obtained in Table 2), we employ a second approach in which the forward jet, i.e. the jet with the largest $\eta_j$, is eliminated and the remaining four $p_T$-ordered jets are selected. In addition, we verified using truth-level information that the selected jets originate from the $W^\pm$ bosons. The corresponding invariant mass distribution is shown in Fig. 5b. The $m_{4j}$ distribution is clearly narrower than that of Method 1 (Fig. 5a), so this approach should improve the accuracy of the $m_H$ measurement. This approach uses the same number of initial weighted events as Method 1. When reconstructing the invariant mass of $H$, this method achieves a maximum significance of $5.0\sigma$ before the missing energy cut is applied. A maximum significance of $6.1\sigma$, a 24% improvement, is attained after requiring $E_T^{miss} > 20$ GeV. The significances obtained with Method 2 are shown in Table 3 (third column). Overall, this method improves the significance by about 33% in comparison with that obtained by selecting at least five jets with $E_T^{miss} > 20$ GeV as in Table 2.
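The jet selection of Method 2 differs from Method 1 only in the removal of the most forward jet before the $p_T$ ordering. A minimal sketch, with a hypothetical jet container carrying only the two quantities needed for the selection:

```python
from typing import NamedTuple


class Jet(NamedTuple):
    pt: float   # transverse momentum [GeV]
    eta: float  # pseudo-rapidity


def method2_select(jets: list[Jet]) -> list[Jet]:
    """Method 2: drop the most forward jet (largest eta), then keep
    the four leading-pT jets among the remaining ones."""
    if len(jets) < 5:
        raise ValueError("Method 2 requires at least five jets")
    forward = max(range(len(jets)), key=lambda i: jets[i].eta)
    remaining = [j for i, j in enumerate(jets) if i != forward]
    return sorted(remaining, key=lambda j: j.pt, reverse=True)[:4]


# Toy event: the hardest jet is also the most forward one, so
# Method 1 would keep it while Method 2 removes it.
event = [Jet(90.0, 0.1), Jet(80.0, 0.5), Jet(70.0, -0.3),
         Jet(60.0, 1.0), Jet(100.0, 4.2)]
print([j.pt for j in method2_select(event)])
```

The signed $\eta_j$ is used here, following the text; with the acceptance $-1 < \eta_j < 5$ the forward (proton) direction corresponds to large positive $\eta_j$.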

Method 3: machine learning technique
Although Method 2 yields a higher significance of about $6\sigma$, which shows the efficacy of this approach for reconstructing $m_H$, we also analyse the event samples using a high-level machine learning technique as Method 3 and compare the significance. For our analysis we employ the Toolkit for Multivariate Data Analysis (TMVA) package [43], in which all multivariate methods rely on supervised learning only, i.e. the input information is mapped in feature space to the desired outputs.
To start with, the four-momentum information of the jets from the signal and background event samples is used to construct low-level observables such as the jet transverse momentum $p_{T_j}$, pseudo-rapidity $\eta_j$, azimuthal angle $\phi_j$, energy $E_j$ and mass $m_j$. The signal samples with these observables are split into two equal parts for training and testing, respectively, to reconstruct $m_{4j}$. We include three different analysis routines: Boosted Decision Trees with gradient boosting (BDTG), Deep Neural Network (DNN) and Linear Discriminant (LD). The details of all three routines are documented in Ref. [43]. All background samples are evaluated with the default parameters of the TMVA regression application for BDTG, DNN and LD. The combined outputs are shown in Fig. 6. The default parameters are later tested and tuned to give the maximum significance with a target mass of 270 GeV. In Table 3, the significances obtained with all three analysis techniques are presented. All three routines give a maximum significance for the mass measurement of $\sim 5\sigma$, which is slightly lower than that of Method 2 and similar to that of Method 1, although the improvements after the $E_T^{miss} > 20$ GeV requirement are larger than for Method 1. Among the three routines, the DNN performs best, with a maximum significance of $4.9\sigma$ in $m_{4j} \in [210, 270]$ GeV.
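The preparation of the inputs described above can be sketched independently of TMVA. The example below only illustrates how per-jet low-level observables might be flattened into a fixed-length feature vector and how a sample might be divided into equal training and testing halves; the $p_T$ ordering, the zero padding, the number of jets kept and all names are assumptions made for this illustration, and no TMVA call is shown.

```python
import random
from typing import NamedTuple


class Jet(NamedTuple):
    pt: float      # transverse momentum [GeV]
    eta: float     # pseudo-rapidity
    phi: float     # azimuthal angle [rad]
    energy: float  # energy [GeV]
    mass: float    # jet mass [GeV]


def event_features(jets: list[Jet], n_jets: int = 5) -> list[float]:
    """Flatten the n_jets leading-pT jets into one feature vector,
    zero-padded if the event has fewer jets (ASSUMED convention)."""
    ordered = sorted(jets, key=lambda j: j.pt, reverse=True)[:n_jets]
    flat = [value for jet in ordered for value in jet]
    flat += [0.0] * (len(Jet._fields) * n_jets - len(flat))
    return flat


def split_half(events: list, seed: int = 0) -> tuple[list, list]:
    """Shuffle reproducibly and split into equal training and
    testing halves."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]
```

A regression target, here the benchmark mass of 270 GeV for signal events, would then be attached to each training vector before it is passed to the chosen multivariate method.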
From the $m_{4j}$ distributions shown in Fig. 6, the ML algorithms used here appear to pull both the signal and the background distributions towards the target mass. Nevertheless, the significances are consistent with those of the other two methods and, for the DNN, even better than that of Method 1, as shown in Table 3.

Scanning $m_H$
Among the three methods, Method 2, the elimination of the forward jet corresponding to the largest $\eta_j$, is the most efficient for reconstructing $m_H$. We therefore use this technique for two further masses, $m_H = 250$ and 300 GeV, and compare the significance with that of the benchmark $m_H = 270$ GeV to understand how the mass affects the sensitivity of the method. This allows us to investigate such masses at the LHeC with $\sqrt{s} \approx 1.3$ TeV. For completeness we also compare with the significances of Method 1 and of the DNN routine (which gives the highest significance among BDTG, DNN and LD).
In Fig. 7a, Fig. 7b and Fig. 7c we compare $m_{4j}$ (signal only) using Method 1, Method 2 and the DNN routine for $m_H = 250$, 270 and 300 GeV, respectively. Table 4 shows the maximum significances obtained using Method 1, Method 2 and the DNN, in the same format as Table 3. A comparison with $m_H = 270$ GeV shows a difference of $\sim 1\sigma$ in significance for both masses. Since the cross-section for $m_H = 250$ (300) GeV is higher (lower) than that for $m_H = 270$ GeV, the enhancement (suppression) in significance is expected.

Discussion and summary
Heavy particles are a common feature of BSM physics, and strategies to search for such particles at colliders are very important. This is especially so in the scalar sector, since these particles are responsible for the mass generation of several bosons and fermions in the SM as well as in BSM scenarios. In this article we have proposed mass reconstruction methods for a heavy scalar boson in the mass range $m_H \in (2m_h, 2m_t)$, where $H$ decays to hadronic jets through $W^\pm$ and is produced through the charged-current process in the LHeC environment.
As a benchmark, we considered a heavy scalar of mass $m_H = 270$ GeV produced in the CC channel at the LHeC with $E_e = 60$ GeV and $E_p = 7$ TeV. We further considered the $H \to W^+W^-$, $W^\pm \to jj$ channel to develop a prescription for the mass reconstruction. In doing so we described possible methods of selecting the final-state hadronic jets, since the scattered jets in this channel are a source of contamination. Overall, Method 2 gives a significance of about $6\sigma$ using $m_{4j}$, which is better than the other two methods discussed. We also note that $E_T^{miss} > 20$ GeV plays a significant role in improving the significance only when a proper selection of four hadronic jets is made out of at least five jets. Similarly, the significances of $7\sigma$ and $5\sigma$ obtained in the mass reconstruction for $m_H = 250$ and 300 GeV, respectively, indicate the potential to discover such heavy scalars at the future LHeC. Accounting for the 2% systematic effect mentioned above, the significance reduces from $6.1\sigma$ to $5.4\sigma$ for $m_H = 270$ GeV in Method 2.
Future opportunities: A similar analysis can be performed with $H \to ZZ$, $Z \to \ell^+\ell^-, jj$, in addition to the neutral-current channel $pe^- \to e^- H j$. These studies can also be carried forward to the HL-LHC and the proposed FCC facilities.

Fig. 2 (a) Multiplicity of jets in signal and backgrounds. (b) The pseudo-rapidity distribution of the forward jet after the five-jet selection in signal and backgrounds.

Fig. 4 Invariant di-jet mass distribution $m_{jj}$ from truth-level information of (a) $W^+$ and (b) $W^-$, where $H \to W^+W^-$ with $m_H = 270$ GeV.

Fig. 6 Invariant mass distribution $m_{4j}$ of the trained signal and evaluated background samples using the BDTG, DNN and LD methods.

Table 1
Total cross-sections (in fb) for signal production (see text) and potential backgrounds with $E_e = 60$ GeV and $E_p = 7$ TeV. The polarisation of the $e^-$ beam is taken to be $-80\%$. The first row represents the signal process and the other four rows are for the dominant background processes.

Table 2
A summary table of event selections.
Cuts | Signal ($S$) | $e^- WW + j$ | $e^- ZZ + j$ | $\nu_e WW + j$ | $\nu_e ZZ + j$ | Total background ($B$) | $S/\sqrt{B}$ | $\sigma(\delta_{sys})$

Table 4
Same as Table 3 for $m_H = 250$ and 300 GeV in comparison with $m_H = 270$ GeV.