Probing light-quark Yukawa couplings via hadronic event shapes at lepton colliders

We propose a novel idea for probing the Higgs boson couplings through the measurement of hadronic event shape distributions in the decay of the Higgs boson at lepton colliders. The method provides a unique test of the Higgs boson couplings and of QCD effects in the decay of the Higgs boson. It can be used to probe the Yukawa couplings of the light quarks and to further test the mechanism of electroweak symmetry breaking. From a case study for the proposed Circular Electron-Positron Collider, assuming a hypothesis of SM-like theory, light-quark couplings with a strength greater than 9% of the bottom-quark Yukawa coupling in the standard model can be excluded.


Introduction
The successful operation of the CERN Large Hadron Collider (LHC) and the ATLAS and CMS experiments have led to the discovery of the Higgs boson, the final piece of the standard model (SM) [1,2] of particle physics. Future high precision experimental investigations on the couplings of the Higgs boson are required for a refined understanding of the nature of electroweak symmetry breaking and for searches for possible new physics beyond the SM. Higgs boson couplings can be measured to percent level precision at future lepton colliders, e.g., the International Linear Collider [3] and the Circular Electron-Positron Collider (CEPC) [4], or with less precision at the high luminosity run of the LHC (HL-LHC) [3]. In addition to high precision, e + e − colliders provide direct access to all possible decay channels of the Higgs boson, including invisible decays, in a clean environment. They can also measure the total width of the Higgs boson in a model-independent way.
An important prediction for the SM Higgs boson is that the couplings to other SM particles are proportional to their mass. It will be essential to test this relation experimentally. In the SM the Yukawa couplings of the Higgs boson to light quarks q (u, d, or s) are negligibly small due smallness of their mass. There have been, however, theoretical models that have predicted enhanced light-quark Yukawa couplings [5,6]. Experimentally, if such an enhancedcoupling scenario is observed, it will must indicate the presence of new physics; the quarks also receive masses from sources other than the Higgs boson in order to maintain a relatively small mass. However, a direct measurement of light-quark Yukawa couplings is impossible at hadron colliders due to the huge QCD backgrounds for hadronic decays of the Higgs boson. Indirect constraints can be obtained based on different kinematic distributions induced by gluon and quark production mechanisms [7][8][9] or through rare decays of the Higgs boson [10][11][12][13][14][15].
At lepton colliders, the main measurement difficulty is separation of the qq decay channel from the loop-induced gluon channel, both of which generate similar final states of two untagged jets (jj). In this work, we propose a novel idea of using hadronic event shape observables from the Higgs boson decays to separate qq from gg channels and to measure the light-quark Yukawa couplings at lepton colliders. Another possibility for lepton colliders involves utilizing discrimination of quark jets and gluon jets [16]. We leave this for future investigations. The idea is motivated by the measurement of the QCD coupling constant at LEP from hadronic event shape distributions. 1 Intuitively, in that case the next-to-leading order QCD corrections, ∼ O(α s ), generate the distribution in three-jet region. A change of α s can induce changes of the event shape distributions, e.g., the position and height of the peak. Similarly, in the case of the Higgs boson decay, the real radiation is of O(C X α s ), where C X is the QCD color factor, i.e. C A = 3 for decay to gluons and C F = 4/3 for decay to quarks. Thus, a measurement of event shape distributions can reveal the average color factor and the ratio of decay branching ratios (BR) of the gluon and the quark channel.
In the remaining paragraphs we demonstrate theoretically how the distributions differ for quark and gluon channels, and we consider a scenario of the CEPC and demonstrate a precision of < 1% can be achieved on the measurement of the decay BR to light quarks.

Event shapes
There have been 6 major observables of hadronic event shapes measured at LEP and used for the extraction of α s (M Z ), including thrust T (or τ = 1 − T ), heavy hemisphere mass M H , C parameter, total hemisphere broadening B T , wide hemisphere broadening B W , and the Durham 2 to 3-jet transition parameter y D 23 [19,20]. For example, the thrust is defined as where p i is the three-momentum of particle i and the summation runs over all measured particles. One advantage of the global event-shape observables is that their distributions can be calculated systematically in perturbative QCD [21,22]. In case of two-body hadronic decay, at the leading order (LO), the thrust distribution is a δ function at τ = 0. Finite thrust values are generated through high-order QCD radiations. Soft and collinear emissions introduce large logarithmic contributions ∼ α n s ln τ 2n−1 /τ at small-τ , the deep two-jet region. They must be resummed to all orders in QCD to make reliable predictions, e.g., the state of art Next-to-Next-to-Next-to-leading logarithmic (N 3 LL) resummation [23][24][25] for Z/γ * → qq in the extraction of α s (M Z ). Meanwhile, in the three-jet region the resummed results can be further matched with the fixed-order results, e.g., the Next-to-Next-to-leading order (NNLO) calculation for Z/γ * → 3 jets production [26,27]. Usually, for calculations done at parton level, a correction factor due to hadronization effects needs to be applied when comparing to experimental data, which can be estimated through various event generators [28][29][30][31].
To our best knowledge, no predictions at comparable precision exist for hadronic decays of the Higgs boson, although most of the ingredients are already available. Predictions at N 3 LL+NNLO level for the Higgs boson are expected in near future. In this study, we calculate the event shape distributions using the MC event generator Sherpa 2.2 [31] with the effective coupling approach of the Higgs boson. We use the CKKW scheme [32], matching parton showers with tree-level matrix elements with up to three jets, which is effectively partial next-to-leading-logarithmic and leading-order accuracy. The hadronization corrections are included automatically in Sherpa simulation through hadronization models and decays of hadrons.   1 shows the normalized distribution of the variable thrust for several different hadronic decay channels of the Higgs boson, including gg, qq, bb, and W (qq)W * (qq). We also plot the distribution in the hadronic center-of-mass (CMS) frame for e + e − → Z * /γ * → qq with a CMS energy of 125 GeV and e + e − → Zqq with a CMS energy of 250 GeV and requiring a recoil mass of 125 GeV of the Z boson as comparisons. The distribution peaks at τ ∼ 0.02 for light-quark decay channel. The peak shifts to τ ∼ 0.05 for the gluon channel, corresponding to a scaling of roughly C A /C F . The distribution is much broader for the gluon case due to the stronger QCD radiation. The distribution for the bb channel is very close to the qq case, except at very small τ , where the mass and hadronization effects become important. For the W W * channel there exist already four quarks at LO and the distribution is concentrated in the large-τ region. The distribution for qq from Z * /γ * differs from that for the Higgs boson in the three-jet region because of the different spin. The distribution for qq in Zqq production has a slightly higher peak than the case of Z * /γ * mostly due to the different hard radiation patterns and different tunings of parton showering.
In the lower panel of Fig. 1, we plot the estimated theoretical uncertainties of the normalized thrust distribution for the decay to gluons. Theoretical uncertainties due to the truncation of the perturbation series are conventionally estimated through QCD scale variations. These include variations due to the change of the renormalization scale and the matching scale [33]. The latter variation mostly affects the distribution in the large-τ region. As one includes higher-order resummation and fixed-order matching contributions, the scale variations will decrease. We assume a N 3 LL+NNLO calculation for the Higgs boson decay to gluons will be available and estimate the scale variations based on the calculation for Z/γ * [23, 34] using a scaling factor of C A /C F . Since the distribution is normalized, the uncertainties are small in the peak region. The uncertainty due to the α s (M Z ) input is negligible if the world average [35] is used.
There are also uncertainties due to the hadronization model used. Sherpa uses a cluster fragmentation model implemented in AHADIC++ [36] by default with which the results in Fig. 1 are simulated. In Fig. 2 the left plot shows the size of hadronization corrections by taking ratio of the normalized distributions with and without turning on the hadronization module in Sherpa. We can see roughly three patterns of the hadronization corrections in Fig. 2. All distributions initiated from qq and bb final states receive similar corrections. The distributions are enhanced by more than 30% around the peak region and are greatly reduced when thrust goes to one as a balance. That can be understood since the hadronization effects will distribute energies away form the jet axis. Shape of hadronization corrections for distribution of H(gg) is much broader and shifted to the right side as comparing to qq cases. Lastly the distribution of H(W W ) is further suppressed at small τ region by hadronization corrections. To estimate uncertainties due to the hadronization corrections we recalculate all the distributions with the alternative hadronization model in Sherpa by linking to the Lund string fragmentation in PYTHIA 6.4 [37]. We plot ratios of predictions from the two different hadronization models in the right plot in Fig. 2. The differences can be large for thrust greater than 0.9, about +10(-5)% for H(bb)(H(gg)) at thrust ∼ 0.95, and become even larger when entering fully non-perturbative dominant region. We can take above differences as the size of hadronization uncertainties which are summarized in Table 1 for two representative bins of τ . All the qq cases have small uncertainties in the peak region. Relative signs in Table 1 indicate the uncertainties in different bins are either fully correlated or anti-correlated. Though hadronization uncertainties of all channels discussed are derived from the same models, we decorrelate the uncertainties of different channels to be conservative, which are described by individual nuisance parameters. Below, we will discuss the possibility of measuring the distributions discussed above at a lepton collider and the sensitivity of these measurements to the light-quark Yukawa couplings.

CEPC
A circular electron-positron collider has been proposed recently with a center-of-mass energy of 250 GeV and a total integrated luminosity of 5 ab −1 [4]. It can serve as a Higgs factory with the dominant production channel being the associated production with a Z boson, with a total cross section of about 212 fb [38]. One great advantage of the e + e − collider is that the Higgs boson events can be selected by measuring the recoil mass m recoil , e.g., for ZH production with the Z boson decay into a pair of visible fermions ff , where E ff and m ff are the total energy and invariant mass of the fermion pair. The recoil mass spectrum should present a sharp peak at the Higgs boson mass. The Higgs boson events can be selected with a high signal to background ratio independent of the decay modes of the Higgs boson. Using the kinematic information of the recoil system, we can boost all decay products back to the rest frame of the Higgs boson and measure the event shape distributions in that frame. Table 2 summarizes the decay BRs of the hadronic decays of the SM Higgs boson and the expected numbers of events at the CEPC through ZH production, with the Z boson decaying into electron or muon pairs. As one can see, the qq (light quarks) channel is negligible in the case of the SM Higgs boson. All the hadronic channels in Table 2 contribute to the distribution of the event shapes. We must carefully select the one that we are interested in, which is the jj (gg+qq) channel. To suppress the heavy-quark contributions, one can use flavor tagging of the heavy quarks, b and c, a technique which is well established at hadron and lepton colliders [40]. It has been shown that, assuming an efficiency of 97.2% for identification of gluon or light quarks j, the misclassification rate of a b or c quark to j at CEPC could reach 8.9% and 40.7% respectively [4,41]. Since there are two quarks/gluons from the decay, by requiring both of them untagged one can remove 99(84)% of the bb(cc) background while only changing the signal jj by 6%. There are also backgrounds from other SM processes, especially from the SM Zqq production, which have a flat distribution in the recoil mass. After applying further selection cuts, e.g., on recoil mass, dilepton invariant mass, and the polar angle of the Higgs boson, we estimate a total signal (jj) efficiency of 50% [4,38]. We assume a total qq-like background of 30% of the signal rate from Higgs boson decays to bb, cc and the SM Zqq production of which about 10% is from bb and cc as can be calculated from the misidentification rates and various decay BRs. The normalization of Zqq background is estimated according to Fig. 7 in Ref. [38]. A second category of backgrounds are from decays to W W * , ZZ * and further to four quarks. Since they are away from the peak region of our signal, as shown in Fig. 1, they do not have a large impact to the measurement of the light-quark couplings. We estimate a total rate of 60% of the signal for these fourquark backgrounds after all selection cuts. They can be further suppressed if additional cuts on dijet invariant masses are used. Noted we do not impose any selection cuts directly in our calculations of the signal and backgrounds but rather estimate their effects on signal and background normalizations.  Table 2. The decay branching ratios of the SM Higgs boson with a mass of 125 GeV to different hadronic channels [39] and the corresponding expected numbers of events in ZH production, with subsequent decays at a e + e − collider with √ s = 250 GeV and an integrated luminosity of 5 ab −1 . Only decays of the associated Z boson to electrons and muons are included. h represents any of the quarks except the top quark and q are light quarks.
Including both the signal and backgrounds, the event shape distributions at hadron level can be expressed as (3.2) where N S , N B,1 , N B,2 , and N B,3 are the expected number of events for the signal, the qq-like backgrounds from heavy quarks in Higgs decay and from Zqq production, and the four-quark background, respectively. The interference effects between the Higgs gluonic and fermionic couplings from higher-orders in QCD are suppressed by an additional factor of quark mass f ZZ(qq) is the normalized distribution for Zqq production. We simply assume a shape of f H(bb) for the heavy-quark components of the backgrounds. Impact of using the actual mixture of bottom-and charm-quark distributions are small. We take into account 11 independent systematic uncertainties for the thrust distribution. Two of them are the perturbative uncertainties of the normalized distribution f H(gg) , as shown in Fig. 1. Each of them is (anti-)correlated among all bins. We include five systematic errors for various normalized shapes in Eq. (3.2) due to the hadronization uncertainties as discussed earlier. The other four are for the normalization of the signal N S and of the backgrounds N B,1 , N B,2 , and N B,3 in Eq. (3.2). We do not assume any correlations among them. Normalization uncertainties on each of the backgrounds are set to 4%. Normalization of the signal can be measured separately using hadronic decays of the Z boson in ZH production with the Higgs boson decay to jj, and the uncertainty is estimated to be 3% [4]. We have not included any perturbative uncertainties for the normalized shapes of qq signal and various backgrounds. We estimate their effects to be comparable or smaller than those of hadronization uncertainties with future high precision calculations.
We study the expected exclusion limit on r, as a function of λ, assuming the decay to qq vanishes. We generate a large ensemble of pseudo-data according to Eq. (3.2) with the hypothesis of r = 0. Systematic uncertainties are treated using nuisance parameters. Statistical fluctuations are included according to Gaussian distributions based on the expected event rates in each bin. For each of the pseudo-data we determine the exclusion limit on r by using the profiled log-likelihood ratio q µ as our test-statistic [42] together with the CL s method [43]. Fig. 3 shows the expected 95% CL s exclusion limit on r (in the dashed line) from the thrust distribution. The colored bands indicate the 1σ and 2σ fluctuations of the expected exclusion limit. In case the true theory is the SM, the expected exclusion limit on r can reach 0.056, which is the intersection of the curve and the vertical line. That corresponds to a decay BR of 0.48% to qq. In term of the Yukawa coupling strength, that implies y q < 0.091y b for any of q = u, d, s, with y b being Yukawa coupling of the bottom quark in the SM.
The sensitivity on r can be understood as below. There are two major discrimination powers when testing finite r against the SM case. One is from the qq-peak region and the other is from the gg-peak region. If neglecting statistical errors, in the qq-peak region, a finite r (an enhancement) can only be mimic by a systematic shift of N B,1 (2) . Thus the 95% CL s no sys. +pert. +nor. +had. limit on r 0.036 0.040 0.045 0.056 limit on r (lumi.×10 3 ) 0.0012 0.0014 0.018 0.019 Table 3. Impact of various systematic uncertainties on the expected 95% CL s exclusion limit of r with λ = 1 and a luminosity of 5 ab −1 or 5000 ab −1 . Numbers correspond to the exclusion limit without any systematic errors, and adding various systematic errors in succession.
limit approximately corresponds to r ≈ 0.3 * 0.04 * 1.64 ≈ 0.02. On the other hand, in the gg-peak region, a finite r (a deficit) can only be compensated by a systematic shift of N S . The limit is about r = 0.03 * 1.64 ≈ 0.05. When combining both the limit is better than 0.02. After considering the statistical fluctuations and other systematic errors the limit increases to 0.056 as shown in Fig. 3. We further illustrate impact of various systematic uncertainties on the exclusion limit of r in Table 3. We show numbers correspond to the exclusion limit without any systematic errors, and adding various systematic errors in succession. We can see the uncertainty is dominated by the statistical error. The hadronization uncertainties show a moderate impact. For comparison we also list the results with a data sample of 1000 times larger.
We can also include invisible decays of the associated Z boson in the analysis. They have a total rate 3 times larger than to electrons and muons and suffer from a relatively larger Zqq background due to a degradation of the signal-background separation power from the recoil mass. Thus, we simply assume that once the νν channels are included, both the signal and backgrounds will double. The expected limit is again plotted in Fig. 3, which can reach 0.047 with the SM assumption.
In principle, several of the backgrounds, e.g. f H(bb) and f ZZ(qq) can be measured directly in a controlled region from independent data sample. We briefly comment on the possibilities in below.
• Heavy-quark components: in this case one can require the quark/gluon being flavor tagged rather than untagged. With a typical b-tagging efficiency of 60%, the misidentification rate for light flavors are negligible [40] not mentioning further suppression from the Higgs boson decay branching ratio. Thus we can arrive at a pure sample of bb of around 2 × 10 4 events for CEPC. That corresponds to an uncertainty of 1.5% from statistical fluctuations for the bin [0.02, 0.03] of τ , comparable to the number in Table 1.
One question needs to be addressed is how various flavor-tagging algorithm may change distributions of the event shape observables.
Similar exclusion limits can be set based on other event shape observables which are summarized in Fig. 4 for λ = 1. Definitions of the event shape observables shown in Fig. 4 can be found in Refs. [19,20]. Here, only the statistical error and the systematic uncertainty on the signal and background normalizations are included in the analysis. Thus the limits shown here are optimistic concerning various theoretical uncertainties. As already seen in Table 3 various theoretical uncertainties contribute equally as the statistical uncertainty for the thrust distribution. The binnings used in the analysis for all other distributions are chosen to be the same as in Ref. [44]. All distributions show a similar sensitivity to the light-quark Yukawa couplings.  Figure 3. Expected 95% CL s exclusion limit on r and the 1σ and 2σ fluctuations as a function of the total cross section of the Higgs boson decay to jj normalized to the SM value. The dot-dashed line is the expected exclusion limit when invisible decays of the Z boson are also included in the analysis.

Discussion and summary
It is interesting to compare our sensitivity to the light-quark Yukawa couplings with the projection of the LHC and HL-LHC. Ref. [9] claims an expected 95% CL limit of the Yukawa couplings y u,d < 0.4y b , for LHC 13 TeV run with a total luminosity of 300 fb −1 , based on analyzing the p T distribution of the Higgs boson. Ref. [8] reports a sensitivity of y s ∼ 0.52y b for the strange quark at the HL-LHC. Comparing with results above, our method provides a much stronger sensitivity of y u,d,s < 0.091y b (95% CL s ). The major limitation on probing the light-quark Yukawa couplings at the LHC/HL-LHC is that the gg parton luminosity is much  larger than the qq ones for a Higgs boson mass of 125 GeV. Thus, a small downward shift of the gg induced cross sections comparing to experimental data, either due to the experimental or theoretical uncertainties, can allow for a much larger light-quark Yukawa coupling. We also comment on the comparison of our proposal with the possibility of using gluon/quark jet discriminators. On the theory side, the event shape distributions can be calculated systematically in perturbative QCD, and the theoretical uncertainties are under control. Experimentally, the hadronic even-shape observables have been studied extensively at LEP. The experimental systematics are well understood. By comparing with the experimental results on the α s (M Z ) measurement [44,45], we found the sensitivity obtained in this study is realistic. Even after all the experimental systematics are included, the expected exclusion limit should not change greatly.
In summary, we have proposed a novel idea for measuring the light-quark Yukawa couplings using hadronic event shape distributions in addition to the conventional measurement of Higgs couplings at lepton colliders. We show that for a e + e − collider with a center-of-mass energy of 250 GeV and an integrated luminosity of 5 ab −1 one can expect to exclude a decay BR of 0.48% for the Higgs boson decay to qq, at 95% CL s , with q be any of the u, d, s quarks, assuming a hypothesis of SM-like theory and only modifications to the Higgs boson couplings to gluon and light quarks. That corresponds to an exclusion limit on a light-quark Yukawa coupling of about 9% of the strength of the bottom quark coupling in the SM.
conversations, and J. Huston for proofreading of the paper. The work of JG is sponsored by Shanghai Pujiang Program.