Evaluation of measurement accuracies of the Higgs boson branching fractions in the International Linear Collider

Precise measurement of Higgs boson couplings is an important task for International Linear Collider (ILC) experiments and will facilitate the understanding of the particle mass generation mechanism. In this study, the measurement accuracies of the Higgs boson branching fractions to $b$ and $c$ quarks and gluons, $\Delta Br(H\to b\bar{b},\ c\bar{c},\ gg)/Br$, were evaluated with the full International Large Detector model (\texttt{ILD\_00}) for a Higgs mass of 120 GeV at center-of-mass (CM) energies of 250 and 350 GeV, using the neutrino, hadronic, and leptonic channels and assuming an integrated luminosity of $250~{\rm fb^{-1}}$ and an electron (positron) beam polarization of -80% (+30%). We obtained measurement accuracies of the Higgs cross section times branching fraction ($\Delta (\sigma \cdot Br)/\sigma \cdot Br$) for Higgs decays into $b\bar{b}$, $c\bar{c}$, and $gg$ of 1.0%, 6.9%, and 8.5% at a CM energy of 250 GeV and 1.0%, 6.2%, and 7.3% at 350 GeV, respectively. After correcting for the measurement accuracy of the cross section ($\Delta\sigma/\sigma$), taken from studies at 250 GeV and extrapolated to 350 GeV, the derived measurement accuracies of the branching fractions ($\Delta Br/Br$) to $b\bar{b}$, $c\bar{c}$, and $gg$ were 2.7%, 7.3%, and 8.9% at a CM energy of 250 GeV and 3.6%, 7.2%, and 8.1% at 350 GeV, respectively.


I. INTRODUCTION
Precise measurement of the Higgs boson branching ratios (BRs) is an important task for the International Linear Collider (ILC) program. It is also crucial for understanding the nature of electroweak symmetry breaking and provides a window onto physics beyond the standard model (SM). The relatively low background and well-defined initial state of the ILC experiments allow a precise, model-independent study of the Higgs boson, which is not an easy task for Large Hadron Collider experiments [1,2]. Measurements of the Higgs BRs to $b\bar{b}$ and $c\bar{c}$ at an $e^+e^-$ linear collider were reported in Refs. [3-7]. In this study, we investigate the accuracies of the BRs of the Higgs to $b\bar{b}$, $c\bar{c}$, and $gg$ using a realistic Geant4-based [8] simulation implemented with the proposed International Large Detector (ILD) [9].
* Electronic address: ono@ngt.ndu.ac.jp
In this study, we assume a Higgs mass of 120 GeV/$c^2$ and an integrated luminosity of 250 fb$^{-1}$, and estimate the accuracies of the BRs at center-of-mass (CM) energies of 250 and 350 GeV. The former is close to the threshold of Higgs production and is thus considered the initial target of ILC experiments. The latter is close to the threshold of top quark pair production; therefore, Higgs data can be collected simultaneously with a top threshold study. The difference between the kinematical conditions at 250 and 350 GeV could yield different detection efficiencies and thus different BR accuracies. The accuracies at 250 and 350 GeV are studied under the same conditions and compared.
The experimental conditions for this study are described in Section II. We selected the Higgs events in three channels: neutrino, hadronic, and leptonic. The event selection and background suppression processes are described in Section III. The derivation of the BRs is presented in Section IV, and the conclusion is given in the last section.

A. ILC experiment and Higgs production
The ILC is a future electron-positron ($e^-e^+$) linear collider for experiments at an initial center-of-mass (CM) energy ($\sqrt{s}$) of up to 500 GeV, which can be extended to 1 TeV. The production cross section of the Higgs boson is shown in Fig. 1(a) as a function of the CM energy for a Higgs mass of 120 GeV. At low CM energies, the Higgs boson is produced primarily through the Higgs-strahlung process $e^+e^- \to ZH$, whose cross section has a maximum around 250 GeV when the effect of initial state radiation is considered. This is about 20 GeV higher than the maximum without initial state radiation.
At $\sqrt{s}$ = 350 GeV, the total cross section is reduced, although the contribution of W/Z fusion is greater than that at 250 GeV. The decay BRs of the Higgs boson in the SM are shown as a function of its mass in Fig. 1(b). The Higgs decays mainly to $b\bar{b}$ if its mass is below 140 GeV and to $WW^*$ if its mass is above 140 GeV.
Higgs analysis modes are categorized in terms of the three Z boson decay channels: $Z \to \nu\bar{\nu}$ (neutrino), $q\bar{q}$ (hadronic), and $\ell^+\ell^-$ (leptonic), as shown in Fig. 2. We assumed beam polarizations of -80% and +30% for the electron and positron beams, respectively. Higgs production is dominated by the Higgs-strahlung (ZH) process at low CM energies, although the neutrino and leptonic channels also include the WW and ZZ fusion processes, respectively.

B. ILD concept
We used the ILD [9] model for this study. The ILD, which is the validated detector concept for the ILC, is equipped with a highly segmented calorimeter and a hybrid tracking system consisting of gaseous, silicon-strip, and silicon-pixel trackers. These provide an excellent jet energy resolution by particle flow analysis, as well as the excellent momentum resolution and vertex flavor tagging capability that are necessary for measuring multi-jet final states in the ILC energy region. All sub-detector components of the ILD are shown in Fig. 3. The vertex detector (VTX) system consists of three double layers of silicon pixel sensors with a 2.8 µm point resolution, located at radii between 16 mm and 60 mm; the total radiation length is 0.74%.
The time projection chamber (TPC) occupies a volume up to a radius of 1.8 m and a half-length in $z$ of 2.3 m, providing a stand-alone momentum resolution of $\sigma_{1/P_T} \sim 9 \times 10^{-5}$ GeV$^{-1}$. The silicon inner tracker (SIT) and silicon external tracker (SET) are placed at the inner and outer sides of the TPC with 7 and 50 µm point resolutions in the $R$-$\phi$ and $z$ directions, respectively. The overall momentum resolution of the tracking system ($\sigma_{1/P_T}$) is $2 \times 10^{-5}\ {\rm GeV}^{-1} \oplus 1 \times 10^{-3}/(P_T \sin\theta)$ for the momentum range 1-200 GeV [9]. The electromagnetic calorimeter (ECAL) consists of $24\,X_0$ tungsten absorbers with highly segmented ($5 \times 5$ mm$^2$) readouts. The hadron calorimeter (HCAL) consists of $5.5\,\lambda_I$ steel absorbers with a $3 \times 3$ mm$^2$ scintillator tile readout. With the ILD particle flow algorithm package, PandoraPFA [12], a dijet energy resolution of $25\%/\sqrt{E\,({\rm GeV})}$ has been achieved for a 45-GeV dijet, which corresponds to a single-jet energy resolution of $\sigma_{E_j}/E_j = 3.7\%$ [9].
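The resolution figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that the stochastic term $25\%/\sqrt{E}$ is evaluated at $E$ = 45 GeV and that the symbol ⊕ denotes addition in quadrature; the function names are illustrative and do not belong to any ILD software package.

```python
import math


def jet_energy_resolution(energy_gev, stochastic_term=0.25):
    """Relative jet energy resolution sigma_E / E = stochastic_term / sqrt(E)."""
    return stochastic_term / math.sqrt(energy_gev)


def inverse_pt_resolution(pt_gev, theta_rad, a=2e-5, b=1e-3):
    """sigma(1/pT) in GeV^-1: constant term a added in quadrature
    with the multiple-scattering-like term b / (pT * sin(theta))."""
    return math.hypot(a, b / (pt_gev * math.sin(theta_rad)))


if __name__ == "__main__":
    # 0.25 / sqrt(45) is about 0.037, i.e. the 3.7% quoted in the text.
    print(f"jet resolution at 45 GeV: {100 * jet_energy_resolution(45.0):.1f}%")
    # Example evaluation for a central 100 GeV track (theta = 90 degrees).
    print(f"sigma(1/pT) at 100 GeV: {inverse_pt_resolution(100.0, math.pi / 2):.2e} GeV^-1")
```

At low transverse momentum the second term dominates, whereas at high momentum the resolution approaches the constant term.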

C. Analysis framework and Monte Carlo samples
Monte Carlo (MC) generator samples for the physics study were produced using Whizard [10], and the fragmentation and hadronization processes were simulated by PYTHIA [11]. The SM Higgs branching fractions in PYTHIA are 65.7%, 3.6%, and 5.5% for $b\bar{b}$, $c\bar{c}$, and $gg$, respectively.
The generated particles were passed through the Geant4-based [8] detector simulator Mokka [13] with the ILD model. The simulated hits were digitized and then reconstructed by the MarlinReco package; the resulting skimmed data were then analyzed. The statistics of the simulated Higgs signal samples were 500 fb$^{-1}$ at both CM energies (250 and 350 GeV), whereas those for the background processes varied with the signal-to-noise ratio (S/N). The samples are scaled in the analysis to obtain results corresponding to an integrated luminosity of 250 fb$^{-1}$. The major SM background processes for the $e^+e^- \to ZH$ analysis are $e^+e^- \to ZZ$ and $W^+W^-$; thus, we considered the final states $\nu\nu qq$, $\nu\ell qq$, $\ell\ell qq$, $\nu\nu\ell\ell$, $qqqq$, and $\ell\ell\ell\ell$. In addition, the $q\bar{q}$ and $t\bar{t}$ backgrounds were considered for the neutrino and hadronic channels (the $t\bar{t}$ background only for $\sqrt{s}$ = 350 GeV, because we used a top mass of 174.9 GeV/$c^2$). In the leptonic channel, most of the multi-jet backgrounds are well suppressed if dilepton identification is required; thus, only the $\ell\ell qq$ and $\nu\ell qq$ backgrounds were considered. We used the 250-GeV samples produced for the ILD letter of intent (LOI) studies [9]; thus, their beam parameters correspond to those defined in the ILC Reference Design Report [3].
On the other hand, the 350-GeV samples were newly produced for this study using the updated beam parameter set SB2009 [14]. The instantaneous luminosities were $0.75 \times 10^{34}$ and $1 \times 10^{34}$ cm$^{-2}$ s$^{-1}$ for 250 and 350 GeV, which yield integrated luminosities of 188 and 250 fb$^{-1}$, respectively, for about 3 years at 100 days of operation per year.
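The conversion from instantaneous to integrated luminosity can be sketched as follows. This is a rough check rather than part of the analysis; it assumes continuous running during each operating day and uses the unit conversion 1 fb$^{-1}$ = $10^{39}$ cm$^{-2}$. With exactly 300 full days the result is slightly above the quoted round numbers, which correspond to a live time of about $2.5 \times 10^7$ s. The function and constant names are illustrative.

```python
SECONDS_PER_DAY = 86_400
INVERSE_CM2_PER_INVERSE_FB = 1e39  # 1 fb^-1 = 1e39 cm^-2 (1 fb = 1e-39 cm^2)


def integrated_luminosity_fb(inst_lumi_cm2_s, days):
    """Integrated luminosity in fb^-1 for a constant instantaneous
    luminosity (cm^-2 s^-1) delivered over the given number of full days."""
    return inst_lumi_cm2_s * days * SECONDS_PER_DAY / INVERSE_CM2_PER_INVERSE_FB


if __name__ == "__main__":
    days = 3 * 100  # 3 years at 100 days of operation per year
    for energy_gev, lumi in ((250, 0.75e34), (350, 1.0e34)):
        print(energy_gev, "GeV:", round(integrated_luminosity_fb(lumi, days), 1), "fb^-1")
```

Whatever the exact live time, the ratio of the two integrated luminosities is fixed at 0.75 by the ratio of the instantaneous luminosities, which is the quantity that matters for the equal-running-time comparison.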

III. EVENT RECONSTRUCTION AND BACKGROUND SUPPRESSION
Depending on the Z decay mode, the analysis channels are categorized as the neutrino (dijet), hadronic (four-jets) and leptonic channels (dileptons + dijets), which are described in the following subsections.

A. Neutrino channel (ννH)
For neutrino channel analysis, particles in the event are first forcibly clustered into two jets by the Durham jet-finding algorithm. After the dijet clustering, background reductions are applied according to the selection criteria in Table I. At a CM energy of 250 GeV, the Higgs is produced almost at rest because it is close to the production threshold, whereas it is boosted at 350 GeV.
Thus, the cut conditions are optimized to obtain the best S/N at each energy. In this channel, the Z boson decays invisibly ($\nu\bar{\nu}$); thus, the $\nu\nu qq$ and $\nu\ell qq$ processes in the SM are the main backgrounds.
To reduce them, a cut on the missing mass ($M_{\rm miss}$) is applied. Although this cut decreases the Higgs signal from the WW fusion process, the $\nu\ell qq$, $\ell\ell qq$, and $qqqq$ backgrounds are effectively reduced. The $q\bar{q}$ background is reduced by cuts on the following kinematical variables: the transverse momentum ($P_t$), longitudinal momentum ($P_l$), and maximum momentum ($P_{\rm max}$). The $\ell\ell\ell\ell$ background is strongly reduced by a cut on the number of charged tracks in an event ($N_{\rm chd}$). In addition, the $\nu\ell qq$ background reduction is improved by the $Y_{12}$ and $Y_{23}$ cuts. $Y_{12}$ and $Y_{23}$ are the maximum and the minimum of the $y$ values (scaled jet masses), respectively, required to cluster the event into two jets.
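For readers unfamiliar with the jet-clustering variables, the following sketch illustrates forced clustering with the Durham distance $y_{ij} = 2\min(E_i^2, E_j^2)(1 - \cos\theta_{ij})/E_{\rm vis}^2$ and records the $y$ value at each merging step. It is a simplified illustration only: it assumes that pairs are recombined by summing their four-vectors, it is not the implementation used in this analysis, and the function names and the convention for labeling the transition values are hypothetical.

```python
import math


def durham_y(p1, p2, e_vis):
    """Durham distance between two four-vectors given as (E, px, py, pz)."""
    mag1 = math.sqrt(p1[1] ** 2 + p1[2] ** 2 + p1[3] ** 2)
    mag2 = math.sqrt(p2[1] ** 2 + p2[2] ** 2 + p2[3] ** 2)
    if mag1 == 0.0 or mag2 == 0.0:
        cos_theta = 1.0  # a particle at rest has no direction; treat as collinear
    else:
        dot = p1[1] * p2[1] + p1[2] * p2[2] + p1[3] * p2[3]
        cos_theta = dot / (mag1 * mag2)
    return 2.0 * min(p1[0], p2[0]) ** 2 * (1.0 - cos_theta) / e_vis ** 2


def force_cluster(particles, n_jets):
    """Merge the closest pair repeatedly until n_jets remain.

    Returns the jets and a dict mapping n -> the y value at which the
    event went from n + 1 to n objects (labeling convention assumed here).
    """
    jets = [tuple(p) for p in particles]
    e_vis = sum(p[0] for p in jets)
    y_values = {}
    while len(jets) > n_jets:
        y_min, i, j = min(
            (durham_y(jets[a], jets[b], e_vis), a, b)
            for a in range(len(jets))
            for b in range(a + 1, len(jets))
        )
        merged = tuple(u + v for u, v in zip(jets[i], jets[j]))
        jets = [jets[k] for k in range(len(jets)) if k not in (i, j)] + [merged]
        y_values[len(jets)] = y_min
    return jets, y_values
```

A small $y$ value at the last merging step indicates that the event is naturally two-jet-like, whereas a large value suggests additional hard, well-separated objects such as an isolated lepton.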
The background reductions for each cut are summarized in Table I.
For hadronic channel analysis, particles in the event are first forcibly clustered into four jets.
Next, the pairing of the four jets into Higgs and Z candidate dijets that minimizes a $\chi^2$ is selected.
To select four-jet-like events, cuts on the number of charged tracks ($N_{\rm charged}$) and the jet clustering parameter $Y_{34}$ are applied. $Y_{34}$ is the minimum scaled jet mass $y$ required for four-jet clustering.
The leptonic backgrounds (ℓℓℓℓ, ℓℓqq) are reduced effectively by these selections. In addition, cuts on the thrust and thrust angle are applied to reduce the ZZ background, utilizing the difference between the event shape of the signal (spherical) and ZZ, qq (back-to-back). The numbers of qqqq and qq background events are reduced by a cut on the angle between the Higgs candidate jets (θ H ).
The WW and ZZ backgrounds are further suppressed by cuts on the Higgs and Z candidates after a kinematical constraint fit is applied to the four-jet system as follows. Each jet is parameterized by $E_{j_i}$, $\theta_i$, and $\phi_i$ ($i$ = 1-4) and fitted with constraints on the total energy ($\sum_i E_{j_i} = \sqrt{s}$), the total momentum ($\sum_i \vec{P}_{j_i} = 0$), and the Higgs and Z mass difference. Here $E_{j_i}$, $\vec{P}_{j_i}$, $\theta_i$, and $\phi_i$ are the energy, momentum, and polar and azimuthal angles of the $i$-th jet, respectively. After these cuts are applied, an additional cut is applied on the likelihood ratio (LR) derived from the following input variables: thrust, $\cos\theta_{\rm thrust}$, minimum angle between all jets ($\theta_{\rm min}$), number of particles in the Higgs candidate jets, fitted Z mass, and fitted Higgs mass. The likelihood cut position is selected to maximize the signal significance: LR > 0.375 for 250 GeV and LR > 0.15 for 350 GeV.
All background reduction procedures are summarized in Table II. The background fractions after all cuts are 80% $qqqq$ and 20% $q\bar{q}$ at 250 GeV and 60% $qqqq$, 30% $q\bar{q}$, and 10% $t\bar{t}$ at 350 GeV.
For leptonic channel analysis, we considered the cases where the lepton is an electron or a muon.
We considered only the $\ell\ell qq$ and $\ell\nu qq$ background processes. First, the following cuts were applied to select isolated leptons:
• Lepton isolation: $E_{\rm cone}$ < 20 GeV (cone angle: 10°),
• Lepton track momentum: 10 < $E_{\rm lep}$ < 90 GeV at $\sqrt{s}$ = 250 GeV,
where $E_{\rm cone}$ is the energy sum of the particles within 10° of the lepton. A prompt lepton has a smaller $E_{\rm cone}$ than nonprompt leptons. Electrons and muons are identified from their charged tracks using $E_{\rm ECAL}$, $E_{\rm Total}$, and $P$, which denote the ECAL energy associated with the track, the total energy deposited in the ECAL and HCAL, and the track momentum, respectively. If more than two isolated lepton candidates remain after the electron or muon identification, the pair whose invariant mass is closest to the Z mass is selected. After dilepton identification, forced two-jet clustering is applied to the remaining particles. The background reduction is summarized in Table III. After all cuts were applied, the background was dominated by $\ell\ell qq$, whereas $\nu\ell qq$ was well suppressed.

IV. BRANCHING RATIO MEASUREMENT
After event selection, the measurement accuracies of the Higgs BRs to $b\bar{b}$, $c\bar{c}$, and $gg$ are evaluated on the basis of a template fit to the flavor likeness of the Higgs dijets, obtained by using the LCFIVertexing package [15]. The probabilities of $b$ and $c$ quarks for each jet are calculated in LCFIVertexing using neural nets trained with $Z \to q\bar{q}$ samples at the Z-pole.
In addition, another $c$ probability ($bc_{1,2}$), whose neural net is trained with only the $Z \to b\bar{b}$ sample as the background, is also calculated. For Higgs dijets, we define the flavor likeness $X$ ($X = b, c, bc$) from the flavor probabilities $x_i$ [$x_i = b_i, c_i, bc_i$ ($i = 1, 2$)] of the two jets.
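The defining equation of the flavor likeness is not reproduced in this text. A form commonly used to combine two per-jet probabilities, and assumed here for illustration only, is $X = x_1 x_2 / [x_1 x_2 + (1 - x_1)(1 - x_2)]$. The sketch below implements this assumed form; the function name and the handling of the degenerate case are hypothetical choices.

```python
def flavor_likeness(x1, x2):
    """Combine two per-jet tag probabilities (each in [0, 1]) into one
    dijet likeness, using the assumed form x1*x2 / (x1*x2 + (1-x1)*(1-x2))."""
    if not (0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    tagged = x1 * x2
    untagged = (1.0 - x1) * (1.0 - x2)
    if tagged + untagged == 0.0:
        # Fully contradictory inputs (one jet at 0, the other at 1):
        # return the neutral value rather than dividing by zero.
        return 0.5
    return tagged / (tagged + untagged)


if __name__ == "__main__":
    for pair in ((0.5, 0.5), (0.9, 0.9), (0.9, 0.2)):
        print(pair, round(flavor_likeness(*pair), 3))
```

With this form, two jets that both look $b$-like push the likeness closer to 1 than either probability alone, while a pair of uninformative inputs (0.5, 0.5) stays at 0.5.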
The flavor tagging performance in the $ZZ \to \nu\nu qq$ sample at $\sqrt{s}$ = 250 and 350 GeV is shown in Fig. 4. The $ZZ \to \nu\nu qq$ samples are compared for each CM energy because they form the same final state as $Z \to q\bar{q}$, which was used to train the flavor tagging neural network. Figure 4 shows that no significant difference in the flavor tagging performance at $\sqrt{s}$ = 250 and 350 GeV is observed for any of the flavors.
To evaluate the measurement accuracy of the BRs, the $b$-, $c$-, and $bc$-likenesses of the selected events were binned in a three-dimensional histogram and fitted with those of the template samples, which consist of $H \to b\bar{b}$, $c\bar{c}$, and $gg$ and other background processes. Figure 5 shows the three-dimensional histogram projected onto the two-dimensional $b$- and $c$-likeness axes for the hadronic channel. The probability of entries in each template sample bin is expected to be given by Poisson statistics:
$$P_{ijk} = \frac{\left(N^{\rm template}_{ijk}\right)^{N^{\rm data}_{ijk}} e^{-N^{\rm template}_{ijk}}}{N^{\rm data}_{ijk}!}, \tag{3}$$
where $P_{ijk}$ and $N^{\rm data}_{ijk}$ are the probability of entries and the number of data entries at the $(i, j, k)$ bin, and $N^{\rm template}_{ijk}$ is the expected number of entries in that bin,
$$N^{\rm template}_{ijk} = \sum_{s = bb, cc, gg} r_s \cdot N^{s}_{ijk} + N^{\rm bkg}_{ijk}, \tag{4}$$
where $N^{s}_{ijk}$ is the number of entries at the $(i, j, k)$ bin in each of the $H \to b\bar{b}$, $c\bar{c}$, and $gg$ templates, and $N^{\rm bkg}_{ijk}$ is the number of entries in the background template sample, which is the sum of the SM background events and the Higgs-to-nonhadronic decay events. Furthermore, $r_{bb}$, $r_{cc}$, and $r_{gg}$ are the parameters to be determined by the template fitting. They are defined as the Higgs cross section times branching fraction for $H \to b\bar{b}$, $c\bar{c}$, and $gg$, respectively, normalized by that of the SM,
$$r_s = \frac{\sigma \cdot Br(H \to s)}{\sigma^{\rm SM} \cdot Br(H \to s)^{\rm SM}}. \tag{5}$$
Here $\sigma$ is the Higgs production cross section, and $\sigma^{\rm SM}$ and $Br(H \to s)^{\rm SM}$ are the cross section and branching fraction in the SM, respectively. From Eq. (5), the measurement accuracies of $\sigma \cdot Br$ are obtained as follows:
$$\frac{\Delta(\sigma \cdot Br)}{\sigma \cdot Br}(H \to s) = \frac{\Delta r_s}{r_s} \quad (s = bb, cc, gg). \tag{6}$$
The values of $r_s$ were determined by a binned log-likelihood fit, in which each bin probability is given by Eq. (3).
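The structure of such a binned likelihood can be sketched in a few lines. The example below is an illustration under stated assumptions, not the fitting code used in this study: the three-dimensional bins are flattened into a single list, the expected count in each bin is taken as the sum of the signal templates scaled by $r_s$ plus the background template, and the toy template contents are invented purely for demonstration. In practice a numerical minimizer would be applied to the function shown here.

```python
import math


def expected_counts(r, signal_templates, background):
    """Expected entries per bin: sum over s of r[s] * N^s plus N^bkg."""
    return [
        sum(r[s] * signal_templates[s][b] for s in signal_templates) + background[b]
        for b in range(len(background))
    ]


def negative_log_likelihood(r, data, signal_templates, background):
    """Negative log of the product of per-bin Poisson probabilities."""
    nll = 0.0
    for n, mu in zip(data, expected_counts(r, signal_templates, background)):
        if mu <= 0.0:
            return math.inf  # an unphysical expectation cannot describe the data
        # -log( mu^n e^-mu / n! ), with lgamma(n + 1) = log(n!)
        nll += mu - n * math.log(mu) + math.lgamma(n + 1.0)
    return nll


if __name__ == "__main__":
    # Invented four-bin templates, for demonstration only.
    signal = {
        "bb": [50.0, 20.0, 5.0, 1.0],
        "cc": [1.0, 4.0, 8.0, 2.0],
        "gg": [2.0, 3.0, 4.0, 9.0],
    }
    background = [10.0, 10.0, 10.0, 10.0]
    sm = {"bb": 1.0, "cc": 1.0, "gg": 1.0}
    data = expected_counts(sm, signal, background)  # data equal to the SM expectation
    for r_bb in (0.9, 1.0, 1.1):
        trial = dict(sm, bb=r_bb)
        print(r_bb, round(negative_log_likelihood(trial, data, signal, background), 4))
```

When the data equal the SM expectation, the function is smallest at $r_s = 1$, and the curvature around the minimum reflects the statistical precision on each $r_s$.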
On the basis of the three-dimensional (3D) histogram, 5000 toy MC experiments were generated using the Poisson distribution function for each bin, and each was fitted to obtain $r_{bb}$, $r_{cc}$, and $r_{gg}$. The results are summarized in Tables IV and V, which also show the accuracies after correction for the total cross section. From a study of the recoil mass in the processes $e^+e^- \to eeH$ and $\mu\mu H$, the accuracy of the total cross section ($\Delta\sigma/\sigma$) was estimated to be 2.5% at 250 GeV [9,16]. For 350 GeV, we assumed an accuracy of 3.5% because the recoil mass measurement relies on the ZH process, whose cross section is inversely proportional to the square of the CM energy; thus, the statistical uncertainty of the total cross section measurement, which scales as the inverse square root of the number of events, would be proportional to the CM energy (2.5% × 350/250 = 3.5%).
From Tables IV and V, we see that the Higgs cross section times branching ratio can be measured to about 1% for $H \to b\bar{b}$ and to 7-9% for $H \to c\bar{c}$ and $gg$. For $H \to c\bar{c}$ and $gg$, the measurement is approximately 10-20% better at 350 GeV than at 250 GeV. The instantaneous luminosity at 350 GeV is 25% greater than that at 250 GeV according to the ILC beam parameters. Thus, for an equal running time, measurements at 350 GeV will give about 20-30% better accuracy than those at 250 GeV. On the other hand, the accuracy of the BR to $b\bar{b}$, $\Delta Br/Br(H \to b\bar{b})$, is limited by the uncertainty of the total cross section; thus, the measurement at 250 GeV gives a better result than that at 350 GeV. In the other decay channels, comparable BR measurements are possible even if the same integrated luminosities are assumed.
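The relation between the two sets of accuracies can be illustrated numerically. The sketch below assumes that the uncertainties on $\sigma \cdot Br$ and on $\sigma$ are uncorrelated and therefore add in quadrature, and that $\Delta\sigma/\sigma$ scales linearly with the CM energy as argued above. Under these assumptions it reproduces the quoted $\Delta Br/Br$ values to within the rounding of the inputs. The function names are illustrative.

```python
import math


def scaled_cross_section_accuracy(acc_ref, e_ref_gev, e_new_gev):
    """Extrapolate Delta(sigma)/sigma assuming it grows linearly with CM energy."""
    return acc_ref * e_new_gev / e_ref_gev


def branching_fraction_accuracy(sigma_br_acc, sigma_acc):
    """Delta(Br)/Br from Delta(sigma*Br)/(sigma*Br) and Delta(sigma)/sigma,
    assuming the two are uncorrelated (sum in quadrature)."""
    return math.hypot(sigma_br_acc, sigma_acc)


if __name__ == "__main__":
    sigma_acc = {250: 2.5, 350: scaled_cross_section_accuracy(2.5, 250.0, 350.0)}
    # Accuracies of sigma * Br in percent, as quoted in the text.
    sigma_br_acc = {
        250: {"bb": 1.0, "cc": 6.9, "gg": 8.5},
        350: {"bb": 1.0, "cc": 6.2, "gg": 7.3},
    }
    for energy, modes in sigma_br_acc.items():
        for mode, acc in modes.items():
            total = branching_fraction_accuracy(acc, sigma_acc[energy])
            print(f"{energy} GeV  H -> {mode}: {total:.2f}%")
```

The $b\bar{b}$ mode is the only one in which the cross-section term dominates the sum, which is why its BR accuracy degrades from 250 to 350 GeV even though the $\sigma \cdot Br$ accuracy is unchanged.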

V. CONCLUSION
The measurement accuracies of the Higgs branching fractions for $H \to b\bar{b}$, $c\bar{c}$, and $gg$ were evaluated at $\sqrt{s}$ = 250 GeV and 350 GeV. In terms of signal significance, $\sqrt{s}$ = 350 GeV yields better background suppression than $\sqrt{s}$ = 250 GeV for each channel. The combined results for the measurement accuracies of the Higgs cross section times BRs ($\Delta(\sigma \cdot Br)/\sigma \cdot Br$) for $H \to b\bar{b}$, $c\bar{c}$, and $gg$ are 1.0%, 6.9%, and 8.5% at a CM energy of 250 GeV and 1.0%, 6.2%, and 7.3% at 350 GeV, respectively, assuming the same integrated luminosity of 250 fb$^{-1}$. At the ILC, the total Higgs cross section $\sigma$ is measured using the Z recoil mass. Using $\Delta\sigma/\sigma$ = 2.5% for 250 GeV and assuming 3.5% at 350 GeV, the accuracies of the Higgs BRs ($\Delta Br/Br$) to $b\bar{b}$, $c\bar{c}$, and $gg$ are derived as 2.7%, 7.3%, and 8.9% at a CM energy of 250 GeV and as 3.6%, 7.2%, and 8.1% at 350 GeV. Therefore, we conclude that the Higgs cross section times BR ($\sigma \cdot Br$) can be measured better at 350 GeV than at 250 GeV owing to the higher S/N at the higher energy. However, when the accuracy of the total cross section obtained from the recoil mass measurement is considered, the BR of $H \to b\bar{b}$ can be measured better at 250 GeV, even if the integrated luminosity is the same at both energies.