Studies on the $H\rightarrow bb$ cross section measurement at the LHeC with a full detector simulation

The future Large Hadron electron Collider (LHeC) would allow collisions of an intense electron beam with protons or heavy ions at the High-Luminosity Large Hadron Collider (HL-LHC). Owing to a center-of-mass energy above 1 TeV and a very high integrated luminosity (~1 ab⁻¹), the LHeC would be not only a new-generation collider for deep-inelastic scattering (DIS) but also an important facility for precision Higgs physics, complementary to pp and e⁺e⁻ colliders. Previous studies found that uncertainties of 0.8% and 7.4% can be achieved on the Higgs boson coupling strength to b- and c-quarks, respectively. These results were obtained with fast simulation frameworks for the LHeC detector. Focusing on the dominant Higgs boson decay channel, H → bb, the present work compares these results with those from a fully simulated detector, using the publicly available ATLAS software infrastructure. Based on state-of-the-art reconstruction algorithms, a novel analysis of the H → bb decay could be performed, leading to an independent verification of the existing results to an exceptionally high precision.


Introduction
A future Large Hadron Electron Collider (LHeC) [1,2] at CERN would collide 7 TeV LHC proton beams with 60 GeV electron beams at a luminosity of 10³⁴ cm⁻² s⁻¹. This would take place in parallel to the proton-proton collisions of the LHC. The design of the electron accelerator is based on a linac-ring ep collider configuration with two superconducting linacs, each below 1 km in length, operating in continuous wave (CW) energy recovery mode [2,3]. The main focus of the LHeC physics program is deep inelastic scattering (DIS), probing a completely new area of the low-x phase space and allowing the precise determination of proton and nuclear parton distribution functions (PDFs) [4]. PDFs are an essential prerequisite for any future high-energy hadron collider, including the LHC. In addition to the DIS program, searches for physics beyond the Standard Model, such as leptoquarks, contact interactions, as well as RPV and SUSY, promise significantly higher sensitivities than are currently possible at existing colliders. The LHeC also has significant potential for measurements in the Higgs sector. Here, Higgs boson production via vector-boson fusion and its decay into b- and c-quarks could be much cleaner than at the LHC [5], allowing the relevant Higgs couplings to be probed with higher precision. This is due to the clean final state, the absence of pile-up, the unique and simple Higgs production mechanism, and the redundant reconstruction of the DIS kinematics using both the leptonic and the hadronic final state. It is therefore of great interest to study the prospects for Higgs production in ep collisions and to examine the possible decay modes carefully. The previous results, published in Refs. [1,2,5], were obtained in several independent analyses. They used a Delphes-based [6] simulation framework adapted specifically to the DIS environment of the LHeC detector.
The current analysis aims to independently verify the previous results using the full event simulation framework of the ATLAS detector, which, within the limitations imposed by the detector differences, appears well suited to such a comparison. The study proceeds in three steps. First, we reproduce the results of previous studies based on the Delphes LHeC detector simulation, using a cut-based approach for the signal selection. Second, we repeat the same study using the fast simulation of the ATLAS detector, also based on the Delphes framework, which has slightly different acceptance and detector response functions. Finally, we repeat the study using the official full simulation and reconstruction infrastructure of the ATLAS experiment and compare the results with those of the fast simulations. This three-step procedure permits an evaluation of the relevance and validity of certain assumptions in such analyses. It should be noted that in Ref. [2] a dedicated effort was made to refine the final result through careful acceptance and background studies using a Boosted Decision Tree (BDT) algorithm, and that the result was extended to the six dominant Higgs decay channels. The focus here is on a cut-based comparison of H → bb analysis approaches, in order to look for possible principal differences, for which this reaction was simulated.
The paper is structured as follows. The differences between a future LHeC detector and the ATLAS experiment at the LHC are briefly discussed in section 2. Details of the Monte Carlo (MC) event generation and detector simulation are given in section 3. In section 4, our implementation of the previous analysis approach [2] is validated for the study of H → bb at the LHeC, and its transfer to the ATLAS experiment, based on fast detector simulations, is discussed. The results based on the full detector simulation are summarised and discussed in section 5. Forward electron tagging and its impact on the backgrounds are briefly discussed in section 6. Finally, we conclude in section 7.

Comparison of a dedicated LHeC detector and the ATLAS experiment
The proposed LHeC detector design has to maximise the coverage in the forward and backward regions of the colliding beams and is asymmetric along the beam direction. This reflects the asymmetry in the energy of the colliding particles [2]. The detector dimensions are of the order of 13 m in length and 9 m in diameter, allowing the reuse of the magnet from the L3 experiment. Hence, this experiment has a much smaller footprint than the ATLAS and CMS detectors (footnote 1). Starting from the beam line and moving outward, the innermost component is a tracking detector for the reconstruction of charged particles, with a transverse momentum resolution down to 10⁻³ GeV⁻¹ and an impact parameter resolution of 10 µm. The pseudo-rapidity coverage of the inner barrel is |η| < 3.3, larger than that of the current ATLAS detector, which allows track reconstruction within |η| < 2.5 (footnote 2). The inner detector is followed by an electromagnetic (EM) calorimeter, which could be based on liquid argon technology, similar to ATLAS. The hadronic calorimeter of an LHeC detector could be based on an iron-scintillator setup; it surrounds the 3.5 T magnet coil system and is enclosed within a muon tracker system. The coverage of the full calorimeter system is the same as that of the tracking detector, i.e. −4.3 < η < 4.9, which is comparable to the ATLAS coverage of |η| < 4.9. The muon systems of an LHeC detector and of ATLAS cover pseudo-rapidity ranges up to 4.0 and 2.7, respectively. However, this difference is of minor importance for this study. One of the largest differences between the ATLAS and LHeC detectors is the presence of dedicated calorimeters in the end caps of the LHeC detector to precisely measure forward high-energy products (silicon-tungsten) or the scattered electron (silicon-copper). The identification and reconstruction of electrons in the forward region is of particular importance for classifying neutral current processes.
Footnote 1: In the following we use a right-handed coordinate system to describe the ATLAS and LHeC detectors, with its origin at the nominal interaction point (IP) and the positive z-axis along the proton beam direction. The x-axis points from the IP to the center of the LHC ring, and the y-axis points upward. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis. The pseudo-rapidity is defined in terms of the polar angle θ as η = −ln tan(θ/2). Angular distance is measured in units of $\Delta R \equiv \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$.
Footnote 2: It should be noted that the upgrade of the ATLAS detector for the high luminosity phase of the LHC foresees an extension of the tracker coverage.
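The coordinate conventions used in this section (η = −ln tan(θ/2), and ΔR as the quadrature sum of Δη and Δφ) can be illustrated with a minimal sketch. The function names below are hypothetical helpers chosen for this illustration and are not taken from any experiment software.

```python
import math


def pseudorapidity(theta: float) -> float:
    """Pseudo-rapidity eta = -ln tan(theta / 2).

    theta is the polar angle in radians, measured from the positive z-axis
    (the proton beam direction in the convention used in the text).
    """
    return -math.log(math.tan(theta / 2.0))


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference wrapped into the interval (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)  # now in [0, 2*pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Angular distance Delta R = sqrt(Delta eta^2 + Delta phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))
```

Wrapping Δφ before taking the quadrature sum matters for objects on either side of the φ = 0 boundary, which would otherwise appear almost 2π apart.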
In order to study H → bb processes, particle jets that originate from b-quarks have to be identified. This identification, known as b-tagging, is typically based on information from secondary vertices and track impact parameters, i.e. observables provided by the inner detector of the experiment. Given the significantly larger coverage of the LHeC tracking detector compared to ATLAS, the b-tagging coverage is expected to be significantly larger. However, the minimal transverse momentum requirement on particle jets in the H → bb process is 20 GeV for the cut-based analysis, implying nearly no jets originating from b-quarks beyond η > 3.0 (see section 4 below). The effective difference in b-tagging between an LHeC detector and ATLAS is therefore rather small. We argue that the effective coverage of the ATLAS detector is thus comparable to the acceptance of the LHeC detector when it comes to the study of the H → bb and H → cc processes.

Monte Carlo samples and detector simulation
The proposed baseline energy of the electron beam is 60 GeV, which in combination with the 7 TeV proton beam results in a center-of-mass energy of 1.3 TeV. This is about four times that of its predecessor, HERA [7] at DESY, which had a center-of-mass energy of 319 GeV. In addition, the expected luminosity is about three orders of magnitude higher.
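The quoted center-of-mass energies follow from the standard relation for head-on collisions of two beams whose particle masses are negligible compared to their energies. As a short worked check, assuming HERA beam energies of about 27.6 GeV (electrons) and 920 GeV (protons), values that are not quoted in the text:

```latex
\sqrt{s} \simeq 2\sqrt{E_e E_p}
\qquad
\text{LHeC: } 2\sqrt{60\,\mathrm{GeV} \times 7000\,\mathrm{GeV}} \approx 1296\,\mathrm{GeV} \approx 1.3\,\mathrm{TeV}
\qquad
\text{HERA: } 2\sqrt{27.6\,\mathrm{GeV} \times 920\,\mathrm{GeV}} \approx 319\,\mathrm{GeV}
```

The ratio of the two values is about 4.1, consistent with the factor of about four stated above.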
Higgs bosons in ep collisions will be produced through vector boson fusion via either a charged current (CC) or a neutral current (NC) interaction, as depicted in Figure 1. The MadGraph5 generator [8] has been used to model the hard scattering of proton-electron collisions for all relevant signal and background processes, using the CTEQ6L1 PDF set [9]. The factorisation and renormalisation scales are set to the mass of the Higgs boson, m_h = 125 GeV, for the signal processes, while a dynamical scale setup has been used for the background processes. The showering and hadronisation of the hard scattering events were carried out using Pythia 8.303 [10]. To control the cross section of the background processes during the event generation, several requirements on the transverse momentum, p_T, and the pseudo-rapidity, η, of the final-state charged leptons and quarks, as well as on the invariant mass of two final-state quarks, have been applied.
At least 100k events have been produced for each signal and background sample in order to retain sufficient statistics after the final selection. All generated samples, including the applied generator-level cuts and the corresponding cross section predictions, are summarised in Table 1.
Once all samples for the signal and background processes were available at generator level, the detector response of a future LHeC detector and of ATLAS was simulated. Two different approaches have been used here. The Delphes framework [6] allows a fast simulation of an approximate detector response for a typical detector in high-energy particle physics. The simulation includes a tracking system within a magnetic field, electromagnetic (ECAL) and hadronic (HCAL) calorimeters, as well as a muon system. High-level objects such as isolated electrons, particle jets or missing transverse energy are reconstructed from observables such as tracks and energy deposits in the calorimeter. Stable charged particles at generator level with a minimal transverse momentum (e.g. p_T > 100 MeV) are subjected to track reconstruction. The track reconstruction efficiency, as well as the resolution and the momentum scale, is parameterised as a function of p_T, η and φ. Particles at generator level that reach the calorimeter system deposit energy in the electromagnetic and hadronic calorimeter cells. The relevant cells can then be grouped into calorimeter towers, which are used for jet reconstruction as well as for the calculation of the missing transverse energy. The resolutions of the electromagnetic and hadronic calorimeters are parameterised independently, depending on the particle kinematics, through a stochastic term, a noise term and a constant term. Reconstruction and identification efficiencies of leptons are also parameterised within the Delphes software. The resulting reconstructed objects, e.g. particle jets, electrons or muons, which are used for the actual physics analysis, provide only a first approximation of a real detector response. Similar to the previous physics studies for the LHeC, we use Delphes for the fast simulation of the LHeC detector response, based on a dedicated configuration file [11].
In addition, we use the standard ATLAS configuration file available in the Delphes software framework, based on Refs. [12][13][14], for the fast simulation of the ATLAS detector.
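The calorimeter parameterisation described above, with a stochastic, a noise and a constant term, is commonly written as a quadrature sum, σ(E) = S·√E ⊕ N ⊕ C·E. The following minimal sketch evaluates this form and applies a simple smearing. The function names are hypothetical, the numerical coefficients used in any call are purely illustrative and do not come from the LHeC or ATLAS configuration files, and the Gaussian smearing is a simplification chosen for this illustration.

```python
import math
import random


def calo_sigma(energy: float, stochastic: float, noise: float, constant: float) -> float:
    """Absolute energy resolution sigma(E) in GeV.

    Quadrature sum of a stochastic term (S * sqrt(E)), a noise term (N)
    and a constant term (C * E). All coefficients are illustrative inputs.
    """
    return math.sqrt(stochastic**2 * energy + noise**2 + (constant * energy) ** 2)


def smear_energy(energy: float, stochastic: float, noise: float, constant: float,
                 rng: random.Random) -> float:
    """Return a Gaussian-smeared energy, truncated at zero (simplified model)."""
    sigma = calo_sigma(energy, stochastic, noise, constant)
    return max(0.0, rng.gauss(energy, sigma))
```

For example, with a purely stochastic term of S = 0.10 GeV^(1/2), `calo_sigma(100.0, 0.10, 0.0, 0.0)` gives a resolution of about 1 GeV at 100 GeV, i.e. a relative resolution of about 1%.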
In contrast to a fast detector simulation, a full simulation of particle collisions in an LHC detector starts from the theoretical modeling of the interaction (event generation), resulting in particles which can be considered stable during their passage through the actual detector. The interactions between these particles and the detector are typically simulated using the Geant4 framework [15], which is able to simulate the interaction between final-state particles and the detector at the microscopic level. In the Geant4 simulation, each particle produced by the event generator is tracked step by step through the simulated detector. At each step, physical processes such as decays and interactions with material are simulated. If the interaction takes place in an active part of the detector, a hit is recorded. From these hits, the simulated response of the sub-detector is calculated in a process called digitisation. The output of this process forms raw data objects (RDOs), which should have the same format as the data that the real detector electronics are expected to deliver. Based on these RDOs, dedicated reconstruction algorithms are applied, which infer all relevant physical observables, such as the momentum, the trajectory, the charge and the flavor of particles. The full simulation therefore results in significantly more realistic predictions, in particular when it comes to the reconstruction of fake signatures, e.g. a reconstructed electron which was actually caused by a particle jet. The ATLAS software framework, Athena [16], which is based on the Gaudi framework [17], contains the full simulation workflow of the experiment, starting from event generation, simulation and digitisation, up to reconstruction. It is publicly available [18] and was set up on the Mainz computing cluster Mogon, independently of the ATLAS Collaboration. It was used to fully simulate all relevant signal and background samples in Table 1. In a second step, we convert the event information at reconstruction level of the full simulation to the same format used for the studies based on the fast simulation samples.
Table 1. The cross sections of the signal and all possible background samples for the corresponding generator-level cuts (footnote 4). Here, q represents either a quark or anti-quark of any SM flavor except the top quark, and ℓ = (e±, µ±). Whenever we mention a b-quark (anti-b-quark), we specify that the generator has at least the same number of b-quarks (anti-b-quarks) at the parton level.
Footnote 4: The signal cross-section is corrected for the correct branching ratio of H → bb.
While the effect of triggers at the LHeC or ATLAS was not studied here explicitly, we argue that a dedicated H → bb trigger will have a sufficiently high efficiency so that the impact on the subsequent analysis is minimal.

Validation of the analysis strategy using the fast simulation of an LHeC detector
The full potential of Higgs physics at the LHeC can only be realised using advanced analysis techniques, as discussed in detail in [2]. However, we need a baseline analysis model which allows a direct and simple comparison of the detector response for several different signal and background processes. We argue that a validation of a simple cut-based analysis with a full simulation consolidates the more advanced techniques.
In a first step, the cut-based LHeC H → bb analysis [2,19] has been repeated using the Delphes LHeC detector simulation on our samples. Jets reconstructed with the anti-kT algorithm with a radius parameter of R = 0.4, a minimal transverse momentum of p_T > 5 GeV and a pseudo-rapidity of |η| < 6.0 are pre-selected. Events with a reconstructed electron in the forward region are vetoed in order to suppress NC interactions. For the LHeC study it is assumed that an additional forward electron tagging will be available, which efficiently reduces NC processes as well as the final-state signatures of photo-production [2]. We therefore do not consider the pe⁻ → e⁻bbq and pe⁻ → e⁻tt → e⁻bqqbqq processes for the validation of our results.
All remaining events are required to pass several kinematic requirements that select DIS-induced processes. The missing transverse energy, E_T^miss, defined as the negative vector sum of all reconstructed cluster energies in the transverse plane, is required to be greater than 30 GeV. Moreover, the fraction of the electron energy carried by the (virtual) propagator in the proton rest frame, y_h, calculated using the Jacquet-Blondel method [20] as $y_h = \sum_{\mathrm{hadrons}} (E - p_z) / (2 E_e)$ with E_e = 60 GeV, is required to be smaller than 0.9. In addition, the negative four-momentum transfer squared, $Q_h^2 = p_{T,h}^2 / (1 - y_h)$, where p_{T,h} denotes the total transverse momentum of the hadronic final state, has to be larger than 500 GeV². Since the signal process yields two b-quarks and one light quark in the final state, each event is required to contain at least three reconstructed particle jets with p_T > 20 GeV. Two of these jets must be b-tagged, i.e. identified as originating from a b-quark, within the detector region |η| < 2.5. The jet with the highest p_T which is not b-tagged is referred to as the light jet in the following. The top-quark-related background processes are vetoed by excluding events in which the invariant mass of the two b-jets and the light jet is below 250 GeV, and events in which the invariant mass of one b-jet and the light jet is below 130 GeV.
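The Jacquet-Blondel reconstruction and the DIS requirements described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code: the function names are hypothetical, hadronic final-state particles are given as (E, px, py, pz) tuples in GeV with the positive z-axis along the proton direction, and the thresholds are the ones quoted in the text.

```python
import math


def jacquet_blondel(hadrons, e_beam=60.0):
    """Return (y_h, Q2_h) from the hadronic final state.

    hadrons: iterable of (E, px, py, pz) tuples in GeV.
    e_beam:  electron beam energy in GeV (60 GeV in the text).
    y_h  = sum(E - pz) / (2 * E_e)
    Q2_h = pT_h^2 / (1 - y_h), with pT_h the summed hadronic transverse momentum.
    """
    hadrons = list(hadrons)
    sum_e_minus_pz = sum(e - pz for e, _px, _py, pz in hadrons)
    sum_px = sum(px for _e, px, _py, _pz in hadrons)
    sum_py = sum(py for _e, _px, py, _pz in hadrons)
    y_h = sum_e_minus_pz / (2.0 * e_beam)
    pt2_h = sum_px**2 + sum_py**2
    # Guard against the unphysical region y_h >= 1, where Q2_h is undefined.
    q2_h = pt2_h / (1.0 - y_h) if y_h < 1.0 else math.inf
    return y_h, q2_h


def passes_dis_cuts(met, y_h, q2_h):
    """DIS requirements quoted in the text: MET > 30 GeV, y_h < 0.9, Q2_h > 500 GeV^2."""
    return met > 30.0 and y_h < 0.9 and q2_h > 500.0
```

For two massless hadronic objects `(50, 30, 0, 40)` and `(50, 0, 30, 40)`, the sketch gives y_h = 20/120 and Q2_h = 1800/(1 − y_h) = 2160 GeV².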
Furthermore, the events are required to have at least one reconstructed jet in the forward region, η > 2.0, and the azimuthal separation ΔΦ between the b-jets and E_T^miss is required to be larger than 0.2. The invariant mass of the two b-jets, m_bb, is required to lie between 100 and 130 GeV, which defines the final signal region. The expected event yields as well as the m_bb distribution for the signal and background processes after the event selection, for an integrated luminosity of ∫L dt = 1 ab⁻¹, are shown in Table 2 and Figure 3. These include both the cut-based LHeC CDR analysis and our analysis.
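The invariant-mass requirements of the selection can be illustrated with a short sketch that builds four-vectors from (p_T, η, φ, m) and evaluates the m_bb signal window and the three-jet top veto. The function names are hypothetical. The additional veto on the invariant mass of one b-jet and the light jet is deliberately left out, since the text does not specify which b-jet is used.

```python
import math


def four_vector(pt, eta, phi, mass=0.0):
    """Convert (pT, eta, phi, m) in GeV/radians to (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return energy, px, py, pz


def invariant_mass(*jets):
    """Invariant mass of the summed four-vectors of the given jets.

    Each jet is a (pT, eta, phi) or (pT, eta, phi, m) tuple.
    """
    vectors = [four_vector(*jet) for jet in jets]
    e, px, py, pz = (sum(component) for component in zip(*vectors))
    m2 = e**2 - px**2 - py**2 - pz**2
    return math.sqrt(max(m2, 0.0))  # protect against tiny negative rounding


def passes_mass_cuts(bjet1, bjet2, light_jet):
    """Signal window 100 < m_bb < 130 GeV and top veto m_bbj >= 250 GeV."""
    m_bb = invariant_mass(bjet1, bjet2)
    m_bbj = invariant_mass(bjet1, bjet2, light_jet)
    return 100.0 < m_bb < 130.0 and m_bbj >= 250.0
```

Two massless back-to-back b-jets with p_T = 60 GeV at η = 0 give m_bb = 120 GeV, which falls inside the signal window.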
The expected signal over background ratio using the LHeC fast simulation changes from 2.9 (in the LHeC CDR) to 2.5 ± 0.2 (in our study) for the cut-based analysis of the samples. The expected event yields in the signal region agree well for most processes within the statistical uncertainty. Several differences can be explained by different generator settings, e.g. the usage of Pythia8 instead of Pythia6 [21] for the showering in this study. We observe a significant difference in the predicted charged current processes involving Z bosons that decay into b-quarks.
Table 2. Expected event yields for signal and background processes in the signal region (100 < m_bb < 130 GeV), for ∫L dt = 1 ab⁻¹, from the cut-based analysis of the official LHeC CDR [2] (taken from Figure 3) alongside this study.
While the expected number of those background events is smaller in the signal region (i.e. for 100 < m_bb < 130 GeV) in our study, the overall number of CC-Z events is smaller by a factor of roughly two. However, this difference does not significantly affect the further analysis, as the process contributes less than 10% to the overall background.
It should be noted that the estimated photo-production background in our study does not rely on any electron tagging in the forward region. The cross section for the multijet photo-production background (pγ → qqq), calculated following the LHeC CDR, is estimated to be ∼170 pb with a reduced invariant mass cut on two light- or b-quarks (i.e. requiring a minimum m_bb = m_jj = 65 GeV).
In order to optimise the MC production, we demand at least two b-jets at generator level (for the process pγ → bbq), which reduces the cross section to ∼0.9 pb (see Table 1). Figure 3 also indicates that the shapes of the signal and background processes in the m_bb distribution can, in general, be successfully reproduced. We observe a shift towards lower masses in the signal samples when comparing these results to the LHeC CDR, which was traced to differences in the underlying rapidity distribution of the Higgs boson. However, these differences are not relevant for the purpose of our study, i.e. the validation of the expected physics performance using a full detector simulation.

Electron-proton collisions at the ATLAS detector using a fast and a full detector simulation
The same signal selection cuts are applied in the analysis using the ATLAS detector simulations, for both the fast and the full detector simulation. The remaining number of signal events after each cut is shown for the three independent simulations in Table 3. The largest difference is seen for the jet requirement cuts, where 25-35% fewer events survive for the ATLAS detector, mainly due to the smaller η coverage of the detector components and differences in the b-tagging efficiencies for different η values of the particle jets. However, this difference is largely mitigated by the subsequent rejection cuts for top-quark events, where relatively more events with b-jets at large rapidity fail the selection. The expected signal yields of the fast simulations of ATLAS and the LHeC agree within 15%.
Table 3. Cutflow for the signal samples pe → νH(→ bb)j, normalised to ∫L dt = 1 ab⁻¹, for different detector simulations: Delphes for the LHeC detector and for ATLAS, as well as a full simulation of the ATLAS detector.
In a second step, the signal selection has been applied to the fully simulated signal and background samples of the ATLAS detector. The largest difference, of about 10% compared to the fast simulation, is induced by the kinematic requirements on E_T^miss, y_h and Q_h², where the dominant effect arises from the E_T^miss distribution. The E_T^miss resolution is significantly worse in the full simulation than assumed within Delphes, so that significantly fewer events pass the E_T^miss requirement. The cutflows of the fast and the full simulation are consistent up to the requirement of a light jet, where a large difference of 20% is observed. A further significant difference is introduced by the requirement on the invariant mass of the two b-tagged jets, yielding a final difference of 30%. This is caused by the inferior jet energy resolution in the full simulation compared to the fast simulation, which broadens the Higgs signal in the full simulation.
The differences between the fast and the full simulation for all background samples are summarised in Table 4. Overall, good agreement is seen. A comparison of selected kinematic distributions, namely the invariant mass of the two b-jets as well as the p_T distribution of all selected jets, for the signal and background samples using the fast and the full simulation is shown in Figure 4.
The expected signal over background ratio using the ATLAS fast simulation is 3160/2560 ≈ 1.2, while the full simulation yields 2270/2520 ≈ 0.9. This difference impacts the expected precision of the cross section of the H → bb process, which can be experimentally determined via
$\sigma = (N_{\mathrm{Data}} - N_{\mathrm{Background}}) / (\epsilon \cdot \int L\,dt)$,
where N_Data and N_Background denote the expected number of data and background events, ε the acceptance and selection efficiency of the signal process, and ∫L dt the expected integrated luminosity.
Table 4. Expected event yields for the signal and background processes in the signal region (100 < m_bb < 130 GeV), for ∫L dt = 1 ab⁻¹, for both the ATLAS fast and full simulation.
Process | ATLAS fast simulation | ATLAS full simulation
CC-Z | 240 ± 20 | 220 ± 20
PA bbq | 1340 ± 110 | 1520 ± 130
NC-tt | 1200 ± 100 | 1160 ± 110
The statistical uncertainties on the data and the background for the different scenarios are summarised in Table 5. The systematic uncertainties on the selection efficiency are expected to be of a similar size to those in recent studies of top-quark pair production at the LHC [22] and are assumed to be ≈ 0.7%. The systematic uncertainties on the background contributions are assumed to be 2%, since all background processes can, in principle, be studied in dedicated control regions, and hence the full theoretical uncertainty on the background prediction need not be applied. This results in overall uncertainties of 2.5% for the LHeC scenario, and of 3.3% and 4.4% for the fast and the full simulation of the ATLAS detector, respectively. This indicates a difference of only about 1 percentage point between the fast and the full ATLAS simulation.
Table 5. Calculation of the expected uncertainties on the cross section for a given number of signal and background events in the different scenarios.
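To illustrate how such an overall uncertainty can arise, the sketch below combines the ingredients named above in quadrature: the Poisson uncertainty on the data, a Poisson uncertainty on the background, the assumed 2% background systematic and the assumed 0.7% efficiency systematic. This is one plausible combination written for illustration only; the function name is hypothetical, the treatment of the background statistical uncertainty is an assumption, and the procedure behind Table 5 may differ in detail, so the numbers it returns need not coincide with those quoted in the text.

```python
import math


def relative_xsec_uncertainty(n_signal, n_background,
                              rel_syst_background=0.02,
                              rel_syst_efficiency=0.007):
    """Relative uncertainty on sigma ~ (N_data - N_bkg) / (eps * L).

    Assumptions of this sketch (not confirmed by the source):
      - N_data = N_signal + N_background with Poisson uncertainty sqrt(N_data),
      - a Poisson uncertainty sqrt(N_background) on the background estimate,
      - all contributions are uncorrelated and added in quadrature.
    """
    n_data = n_signal + n_background
    stat_data = math.sqrt(n_data) / n_signal
    stat_background = math.sqrt(n_background) / n_signal
    syst_background = rel_syst_background * n_background / n_signal
    return math.sqrt(stat_data**2 + stat_background**2
                     + syst_background**2 + rel_syst_efficiency**2)


# Yields quoted in the text for the ATLAS fast and full simulation.
fast = relative_xsec_uncertainty(3160, 2560)
full = relative_xsec_uncertainty(2270, 2520)
```

With the quoted yields, both scenarios come out at the level of a few percent, with the full simulation giving the larger value, which reflects its lower signal yield at a similar background level.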
While the signal over background ratio is smaller in the full simulation, mainly due to the limited jet energy resolution, implying larger uncertainties, the general validity of the previously reported physics performance for the H → bb process is confirmed. The cross-section uncertainty for the H → bb process is expected to be at the percent level for an integrated luminosity of ∫L dt = 1 ab⁻¹. This assumes that no further background processes contribute and that NC-induced interactions can be efficiently vetoed by a forward electron tagging system. However, the signal selection is currently not optimised, and only a cut-based approach has been implemented for the ATLAS-based result. In particular, the use of advanced signal classifiers such as boosted decision trees or deep neural networks has the potential to increase the signal selection efficiency by a factor of three to six, while reducing the background contribution. It has been suggested in [2] that the precision on the cross section can be improved significantly, to below 1%, since most experimental uncertainties, such as b-tagging efficiencies, can be measured with high precision in data, in particular using Z → bb decays.
Given the small kinematic differences within the fiducial phase-space definition of the H → bb study, it is valid to assume that the observed differences between the fast and the full ATLAS detector simulation are a good first-order approximation of the expected differences between a fast and a full simulation of an LHeC detector with state-of-the-art reconstruction algorithms. It should be noted that this only holds in the context of the H → bb process and cannot be naively transferred to other processes, such as the study of DIS, since there the forward detectors play a significantly larger role. Nevertheless, our study using a full detector simulation gives no indication that the expected physics performance in the Higgs boson sector at a future LHeC detector is unrealistic.

Background expectations without forward electron tagging
The expected rapidity range of the scattered electrons in photo-production processes that lead to a signal in the final state configuration is between −15 and −5. However, the corresponding cross sections are sufficiently small that no forward electron tagging is necessary. The scattered electrons in the neutral current interactions exhibit a very low p_T < 0.5 GeV (for more than 90% of the events) and are expected to cover a rapidity range of −10 < η < −4, where the NC top-quark pair production is expected to peak at −10. It has so far been assumed that a forward electron tagger could reject those processes. Since no simulation of such forward electron tagging exists in the Delphes framework, nor in any available Geant4-based simulation, we also studied the expected background contributions when no forward electron tagging can be applied. The expected number of additional background events is also shown in Table 4. The absence of forward electron tagging would nearly double the number of expected background events compared with the case with forward electron tagging, for both the fast and the full simulation. The resulting distributions for the signal and background processes for an LHeC detector and for the fast and fully simulated ATLAS detector are shown in Figure 5.
Significantly fewer NC top-quark pair events are expected at the LHeC, due to the significantly larger coverage of the electron identification of the LHeC detector. The top-quark pair background at an ATLAS-type detector can be reduced by more than a factor of 3 by employing the forward electron reconstruction, which is now possible within the available detector design.
As a preliminary conclusion, the measurement of the H → bb cross section will also be possible when no forward electron tagging is applied, but with a reduced precision. An independent cross check of the expected contributions from photo-production processes is necessary.

Conclusion
In this work, we estimated the prospects of the H → bb cross section measurement at the LHeC with an integrated luminosity of ∫L dt = 1 ab⁻¹, using the full detector simulation and state-of-the-art reconstruction algorithms of the ATLAS experiment. A signal over background ratio of 0.9 and a cross-section uncertainty below 4.5% are expected, where approximate statistical and systematic uncertainties have been considered. The signal selection efficiency is lower by 20% in the full detector simulation, which can be explained by the differences in the jet energy and E_T^miss resolutions. In order to reach sub-percent precision, the signal over background ratio would need to improve by a factor of 5 to 6. Given that simple optimisations of the top-quark rejection cut and of the signal region definition in the full simulation would lead to an improvement by a factor of 2.5, it is realistic that a multivariate analysis could yield a final signal over background ratio of the required value. Overall, our result is in agreement with the result previously obtained in a cut-based analysis. However, the expected background from photo-production processes might require an additional cross check. In summary, our study further consolidates the strong case for the LHeC as an excellent opportunity for precision studies within the Higgs sector.