Direct Higgs-top CP-phase measurement with $t\bar{t}h$ at the 14 TeV LHC and 100 TeV FCC

The study of the Higgs boson's properties is a cornerstone of the LHC and future collider programs. In this paper, we examine the potential to directly probe the Higgs-top interaction strength and CP-structure in the $t\bar t h$ channel, with the Higgs boson decaying to bottom-quark pairs and the top quarks decaying in the di-leptonic mode. We adopt the BDRS algorithm to tag the boosted Higgs and exploit the $M_2$-assisted reconstruction to compute observables sensitive to the CP-phase in the $t\bar{t}$ rest frame, where the new physics sensitivity can be enhanced. Performing a side-band analysis at the LHC to control the continuum $t\bar{t}b\bar{b}$ background, we find that the Higgs-top strength and CP-phase can be probed up to $\delta\kappa_t\lesssim 20\%$ and $| \alpha | \lesssim 36^\circ$ at 95% CL, respectively. We also find that a similar analysis at a 100 TeV future collider could further improve the precision to $\delta\kappa_t\lesssim 1\%$ and $| \alpha| \lesssim 1.5^\circ$, where the CP-odd observables play a crucial role, boosting the sensitivity to the CP-phase.

Focusing on the 100 TeV Future Circular Collider (FCC) and semi-leptonic top pair final states, Ref. [29] shows that a combination of side-bands and $t\bar{t}h/t\bar{t}Z$ ratios can improve the determination of the top-quark Yukawa strength in the $t\bar{t}(h \to b\bar{b})$ channel. Inspired by this finding, we apply a similar methodology to control the background uncertainties for both the 14 TeV HL-LHC and 100 TeV FCC, deriving the top Yukawa CP-phase sensitivity. Instead of the semi-leptonic top pair final states, we consider the di-leptonic mode. Besides the significant background suppression, this final state benefits from the larger top quark spin analyzing power associated with charged leptons [30], resulting in stronger probes of CP violation through spin correlations.
This paper is organized as follows. In Section 2, we present the theoretical setup and discuss the relevant observables sensitive to the Higgs-top CP-phase. In Section 3, we review the adopted reconstruction method for the di-leptonic top pair final state, which is needed to build observables sensitive to new physics. We then move on to a detailed analysis in Section 4, where we derive the projected sensitivities to the CP-phase from the HL-LHC and FCC, exploring the side-bands and the correlation between the $t\bar{t}h$ and $t\bar{t}Z$ uncertainties. Finally, we present a summary in Section 5.

Theoretical Setup
We parameterize the Higgs-top interaction as

$$\mathcal{L} \supset -\frac{m_t}{v}\,\kappa_t\,\bar{t}\left(\cos\alpha + i\gamma_5 \sin\alpha\right) t\, h\,, \qquad (2.1)$$

where $\kappa_t$ is a real number that modulates the interaction strength, $\alpha$ is the CP-phase, and $v = 246$ GeV is the SM Higgs vacuum expectation value. The SM hypothesis corresponds to $\kappa_t = 1$ and $\alpha = 0$. In contrast, a purely CP-odd particle would have $\alpha = \pi/2$.

Among the several probes sensitive to the CP-structure of the Higgs-top interaction in the $t\bar{t}h$ channel, the observables defined in the $t\bar{t}$ center-of-mass frame play a special role [19]. First, this reference frame allows the definition of phenomenologically relevant CP-odd observables arising from fully anti-symmetric tensor products. A prominent example of this sort is the tensor product involving the two top quarks and the two final-state charged leptons, which carry maximal top spin analyzing power, $\epsilon_{\mu\nu\rho\sigma}\, p_t^{\mu}\, p_{\bar t}^{\nu}\, p_{\ell^+}^{\rho}\, p_{\ell^-}^{\sigma}$. This initially complex phenomenological probe simplifies in the top pair frame to a triple product, $\vec p_t \cdot (\vec p_{\ell^+} \times \vec p_{\ell^-})$, which is more suitable for collider studies. In particular, this mathematical property can be used to define the angular correlation between the charged leptons in the $t\bar{t}$ rest frame [19], $\Delta\phi_{\ell\ell}^{t\bar t}$, which is also sensitive to the sign of the CP-phase. Second, there are additional robust observables that also display relevant sensitivity in this frame, such as the Collins-Soper angle $\theta^*$. The $\theta^*$ observable is the production angle of the top with respect to the beam axis in the $t\bar{t}$ center-of-mass frame. While the definition of several phenomenological probes in the top pair center-of-mass frame is a desirable ingredient to improve the sensitivity to the CP-phase, the presence of two neutrinos in the di-leptonic $t\bar{t}h$ channel brings a challenge for the event reconstruction in a hadron collider environment. In the next section, we will describe a mass minimization method to efficiently overcome this obstacle. This approach has been proven robust against parton shower, hadronization, and detector effects [19].

Figure 1. The event topology considered in this paper. The blue dotted, the green dot-dashed, and the black solid boxes indicate the subsystems $(b)$, $(\ell)$, and $(b\ell)$, respectively.
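As a rough illustration of the $t\bar{t}$-frame observables discussed above, the following Python sketch evaluates the triple product, a signed azimuthal angle between the two charged leptons, and $\cos\theta^*$ from three-momenta that are assumed to be already expressed in the $t\bar{t}$ rest frame. The explicit form of the signed angle (the angle between the planes spanned by the top with each lepton, with the sign taken from the triple product) is an assumption based on the description in the text, not a transcription of the definition in Ref. [19]. The function names and the choice of the $z$ axis as the beam direction are likewise illustrative.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def triple_product(p_top, p_lep_plus, p_lep_minus):
    """p_t . (p_l+ x p_l-), all three-momenta in the ttbar rest frame."""
    return dot(p_top, cross(p_lep_plus, p_lep_minus))

def signed_delta_phi(p_top, p_lep_plus, p_lep_minus):
    """Signed angle between the (top, l+) and (top, l-) planes.

    Assumed construction: the magnitude is the angle between the two plane
    normals, the sign is that of the triple product.
    """
    n_plus = cross(p_top, p_lep_plus)
    n_minus = cross(p_top, p_lep_minus)
    cos_phi = dot(n_plus, n_minus) / (norm(n_plus) * norm(n_minus))
    cos_phi = max(-1.0, min(1.0, cos_phi))  # guard against rounding
    sign = 1.0 if triple_product(p_top, p_lep_plus, p_lep_minus) >= 0 else -1.0
    return sign * math.acos(cos_phi)

def cos_theta_star(p_top):
    """Cosine of the top production angle w.r.t. the beam (z) axis."""
    return p_top[2] / norm(p_top)
```

Exchanging the two leptons flips the sign of the returned angle, which is the feature that makes such an observable sensitive to the sign of $\alpha$.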

Brief Review of the $M_2$-Assisted Reconstruction of the Top Pair
The event topology considered in this study is depicted in Fig. 1, where the blue dotted, the green dot-dashed, and the black solid boxes indicate the three subsystems $(b)$, $(\ell)$, and $(b\ell)$, respectively [31]. The Higgs decays to a pair of bottom quarks and the associated top quarks both decay leptonically. For such events with two missing particles, the on-shell constrained $M_2$ variable provides a good estimate of the unobserved invisible momenta and thus can be used to resolve the combinatorial ambiguities [32][33][34]. The $M_2$ variable [32] is defined as a (3 + 1)-dimensional version of the stransverse mass $M_{T2}$ [35]:

$$M_2(\tilde m) \equiv \min_{\vec q_1,\, \vec q_2}\left\{\max\left[M_{P_1}(\vec q_1, \tilde m),\; M_{P_2}(\vec q_2, \tilde m)\right]\right\}, \qquad \vec q_{1T} + \vec q_{2T} = \vec{\slashed{P}}_T\,,$$

where the actual parent masses, $M_{P_i}$ ($i = 1, 2$), are considered in the minimization instead of their transverse masses, as is done in $M_{T2}$. Note that the minimization is performed over the 3-component momentum vectors $\vec q_1$ and $\vec q_2$ of the two missing particles [32] under the missing transverse momentum constraint, $\vec{\slashed{P}}_T = \vec q_{1T} + \vec q_{2T}$. We use the zero test mass ($\tilde m = 0$), as the two missing particles are neutrinos in our study. At this point $M_{T2}$ and $M_2$ are known to be equivalent, in the sense that the two variables lead to the same numerical value, $M_{T2} = M_2$ [32,36,37].
However, $M_2$ provides more flexibility in incorporating additional kinematic constraints. For example, in the $t\bar{t}$-like production considered in this paper ($t\bar{t}+X$, where the transverse momentum of $X$ is known), we can use the experimentally measured $W$-boson mass, $m_W$, and introduce the following variable in the $(b\ell)$ subsystem:

$$M_{2CW}^{(b\ell)} \equiv \min_{\vec q_1,\, \vec q_2}\left\{\max\left[M_{t_1}(\vec q_1, \tilde m),\; M_{t_2}(\vec q_2, \tilde m)\right]\right\}, \qquad \vec q_{1T} + \vec q_{2T} = \vec{\slashed{P}}_T\,, \quad M_{t_1} = M_{t_2}\,, \quad M_{W_1} = M_{W_2} = m_W\,.$$

Here, the second constraint, $M_{t_1} = M_{t_2}$, requires the equality of the two parent masses without use of a specific numerical value, while the true $W$ mass is used in the third constraint. Similarly, taking the top quark mass $m_t$ in the minimization, we can define a new variable in the $(\ell)$ subsystem:

$$M_{2Ct}^{(\ell)} \equiv \min_{\vec q_1,\, \vec q_2}\left\{\max\left[M_{W_1}(\vec q_1, \tilde m),\; M_{W_2}(\vec q_2, \tilde m)\right]\right\}, \qquad \vec q_{1T} + \vec q_{2T} = \vec{\slashed{P}}_T\,, \quad M_{W_1} = M_{W_2}\,, \quad M_{t_1} = M_{t_2} = m_t\,.$$

These variables exhibit sharper end-point structures in their kinematic distributions, due to the additional mass information in the minimization, i.e., $M_{2CW}^{(b\ell)} \le m_t$ and $M_{2Ct}^{(\ell)} \le m_W$ [32,37]. While these mass-constraining variables were originally proposed for mass measurements, one can use them for other purposes, such as the measurement of spins and couplings [33,38]. In our study, we use these variables to fully reconstruct the final state of interest, with the unknown neutrino momenta obtained via the minimization procedure. These momenta may or may not be the true momenta of the missing neutrinos, but they provide important non-trivial correlations with the other visible particles in the final state, which improves the reconstruction.
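Based on Ref. [33], we take advantage of the kinematic features of the following 3-dimensional mass space, built from the constrained variables of the $(b\ell)$ and $(\ell)$ subsystems, $M_{2CW}^{(b\ell)}$ and $M_{2Ct}^{(\ell)}$, and from the invariant mass $m_{b\ell}^{(i)}$ of the $i$-th $b\ell$ pair:

$$\left(m_{b\ell}^{\max} - \max_i\{m_{b\ell}^{(i)}\},\;\; m_t - M_{2CW}^{(b\ell)},\;\; m_W - M_{2Ct}^{(\ell)}\right),$$

where $m_{b\ell}^{\max}$ is the kinematic end point of the $b\ell$ invariant mass. For the correct pairing of $b$-jets and leptons, each mass variable lies below its end point, leading to positive values of the three components in the above 3-dimensional mass space. On the other hand, the wrong pairing could give either sign. By taking the partition that gives more "plus" signs as the "correct" one, we can resolve the two-fold ambiguity. Then, we treat the corresponding momenta of the two missing particles, which are obtained via the minimization procedure, as the "real" momenta of the two missing neutrinos. If both partitions give the same numbers of positive and negative signs, we discard such events, since they are "unresolved cases". We note that we assign the negative sign to a partition if a viable solution is not found during the minimization. This is because the wrong pairing would fail more often than the correct one.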
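The sign-counting rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each partition is represented by a tuple holding the three components of the mass space (or `None` when the minimization found no viable solution), and the names `count_plus` and `choose_partition` are hypothetical, not part of OPTIMASS or of the authors' code.

```python
def count_plus(components):
    """Number of positive components for one b-lepton partition.

    `components` is a tuple of the three mass-space components, or None if
    the minimization failed; a failed minimization counts as all negative.
    """
    if components is None:
        return 0
    return sum(1 for value in components if value > 0)

def choose_partition(partition_a, partition_b):
    """Return 0 or 1 for the preferred partition, or None if unresolved.

    The partition with more "plus" signs is taken as the correct one;
    ties are discarded as unresolved events.
    """
    n_a = count_plus(partition_a)
    n_b = count_plus(partition_b)
    if n_a == n_b:
        return None
    return 0 if n_a > n_b else 1
```

Returning `None` for ties mirrors the choice in the text of discarding unresolved events in favour of a higher-purity sample.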
From Ref. [33], the efficiency of this method at the parton level is known to be about 88% when unresolved events are assigned by a coin flip, i.e., with a 50% probability of picking the right combination. Since we discard those events to obtain a high-purity sample, the corresponding efficiency becomes 83%. In our analysis, we find that the final efficiency is about 78% once more realistic effects, such as parton shower and hadronization, are included.
We use OPTIMASS [39] for the minimization to obtain the momenta of the two invisible neutrinos, following the reconstruction method described above. With the obtained neutrino momenta, we can reconstruct the momenta of the $W$-bosons and top quarks for the measurement of the CP-phase. The fully reconstructed top quark momenta allow the Lorentz boost from the lab frame to the $t\bar{t}$ rest frame, which is crucial for our analysis.
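For concreteness, the boost to the $t\bar{t}$ rest frame can be sketched as follows. This is a generic special-relativity helper written for illustration, assuming four-vectors stored as `(E, px, py, pz)` tuples in natural units; it is not taken from the analysis code, and the function names are hypothetical.

```python
import math

def add4(a, b):
    """Sum of two four-vectors (E, px, py, pz)."""
    return tuple(x + y for x, y in zip(a, b))

def boost_to_rest_frame(p, system):
    """Boost four-vector `p` into the rest frame of `system`.

    Uses E' = gamma (E - beta.p) and
    p' = p + [(gamma - 1)(beta.p)/beta^2 - gamma E] beta,
    with beta the velocity of `system` in the lab frame.
    """
    e_sys, px_sys, py_sys, pz_sys = system
    beta = (px_sys / e_sys, py_sys / e_sys, pz_sys / e_sys)
    beta2 = sum(b * b for b in beta)
    if beta2 == 0.0:
        return tuple(p)  # system already at rest
    gamma = 1.0 / math.sqrt(1.0 - beta2)
    energy, momentum = p[0], p[1:]
    beta_dot_p = sum(b * q for b, q in zip(beta, momentum))
    coeff = (gamma - 1.0) * beta_dot_p / beta2 - gamma * energy
    boosted = tuple(q + coeff * b for q, b in zip(momentum, beta))
    return (gamma * (energy - beta_dot_p),) + boosted

def mass(p):
    """Invariant mass of a four-vector."""
    return math.sqrt(p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2)
```

In use, one would sum the reconstructed top and anti-top four-vectors with `add4`, then boost the tops and the charged leptons with that sum as `system`; the boosted top and anti-top momenta are then back-to-back.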

Analysis
To directly probe the Higgs-top CP-structure, we explore $pp \to t\bar{t}h$ production with the Higgs boson decaying to bottom quarks, $h \to b\bar{b}$, in association with di-leptonic top quarks. We derive the new physics sensitivity for both the 14 TeV HL-LHC and 100 TeV FCC [40]. Our signal is characterized by two opposite-sign charged leptons, $\ell = e$ or $\mu$, and four $b$-tagged jets. The major backgrounds, in order of relevance, are $t\bar{t}b\bar{b}$ and $t\bar{t}Z$. The signal and background event samples are simulated with MadGraph5_aMC@NLO [41]. To include higher-order effects, we rescale the $t\bar{t}h$, $t\bar{t}b\bar{b}$, and $t\bar{t}Z$ cross-sections with flat next-to-leading order $k$-factors derived with MadGraph5_aMC@NLO. The parton shower, hadronization, and underlying event effects are included with Pythia6 [42]. To preserve the top-quark spin-correlation effects, the top-quark decays are performed with MadSpin [43].

The adopted analysis strategy explores the boosted Higgs regime. Along with the background suppression [44,45], this kinematic configuration enhances the top-quark spin correlation effects [4]. We begin our analysis by requiring two isolated and oppositely charged leptons with $p_{T\ell} > 20$ GeV and $|\eta_\ell| < 2.5$. The hadronic part of the event is first reclustered using the Cambridge/Aachen jet algorithm with $R = 1.2$, requiring one or more boosted fat-jets with $p_{TJ} > 200$ GeV and $|\eta_J| < 2.5$. The jet reclustering is performed with FastJet [46]. We demand one of the fat-jets to be Higgs-tagged via the BDRS algorithm [44], imposing that its two hardest subjets are $b$-tagged. Since the complete analysis involves four $b$-tags, we take advantage of the improvements reported by ATLAS, associated with the central tracking system for the operation at the HL-LHC, and use a working point with a large $b$-tagging efficiency [47]. We assume an 85% $b$-tagging efficiency associated with a 1% (25%) mistag rate for light-jets ($c$-jets), consistent with the experimental studies from the ATLAS collaboration.
Since the signal event does not display another hadronic heavy particle decay, we can safely suppress the underlying event contamination by using a smaller jet size for the rest of the event. Hence, after the Higgs tagging, we remove the fat-jet associated with the Higgs boson and recluster the remaining hadronic activity with the anti-$k_t$ jet algorithm with $R = 0.4$, $p_{Tj} > 30$ GeV, and $|\eta_j| < 2.5$. We demand two extra $b$-tagged jets to control possible extra backgrounds. More details on the event selections are described in Tab. 1 and Tab. 2 for the 14 TeV HL-LHC and 100 TeV FCC, respectively.
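The kinematic thresholds quoted above can be collected in a small preselection sketch. It is only an illustration of the cut values in the text: the event representation (lists of dictionaries with hypothetical keys `pt`, `eta`, `charge`, `btag`) is an assumption, lepton isolation, the BDRS tagging step, and the $b$-tagging efficiencies are not modelled, and the requirements of exactly two leptons and at least two extra $b$-jets are assumed readings of the text rather than the definitions in Tab. 1 and Tab. 2.

```python
def select_objects(objects, pt_min, eta_max=2.5):
    """Keep objects passing the pT and |eta| thresholds."""
    return [o for o in objects if o["pt"] > pt_min and abs(o["eta"]) < eta_max]

def has_opposite_sign_leptons(leptons):
    """Two leptons with pT > 20 GeV, |eta| < 2.5 and opposite charge."""
    good = select_objects(leptons, pt_min=20.0)
    return len(good) == 2 and good[0]["charge"] * good[1]["charge"] < 0

def passes_preselection(leptons, fat_jets, small_jets):
    """Apply the lepton, fat-jet and extra b-jet requirements.

    fat_jets:   R = 1.2 Cambridge/Aachen jets (pT > 200 GeV, |eta| < 2.5)
    small_jets: R = 0.4 anti-kt jets reclustered after removing the Higgs
                candidate (pT > 30 GeV, |eta| < 2.5)
    """
    if not has_opposite_sign_leptons(leptons):
        return False
    if not select_objects(fat_jets, pt_min=200.0):
        return False
    b_jets = [j for j in select_objects(small_jets, pt_min=30.0)
              if j.get("btag", False)]
    return len(b_jets) >= 2
```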

CP measurement at 14 TeV HL-LHC
We present in Fig. 2 the fat-jet invariant mass distribution, $m_J^{\rm BDRS}$, for the signal and background samples at the 14 TeV HL-LHC. [...] Higgs mass. In particular, this is a result of the BDRS filtering that promotes the invariant mass associated with the fat-jet to a robust observable, efficiently controlling the pile-up effects [48].
We show in Fig. 3 the relevant CP-sensitive probes for the $t\bar{t}h$ samples used in this analysis, namely $\theta^*$ (left), $\Delta\phi_{\ell\ell}^{\rm lab}$ (middle), and $\Delta\phi_{\ell\ell}^{t\bar t}$ (right). At the bottom of each panel, we show the ratio of the non-zero CP-phase prediction to the SM prediction ($\alpha = 0$). These distributions are presented after reconstruction of the top quark pair, with the selections outlined in Tab. 1. The $\Delta\phi_{\ell\ell}^{t\bar t}$ distribution is sensitive to the sign of the CP-phase, while both $\theta^*$ and $\Delta\phi_{\ell\ell}^{\rm lab}$ are CP-even variables. We observe that the $t\bar{t}$ reconstruction described in Section 3 is robust, resulting in observables with strong modulations for distinct top Yukawa CP-phases even at the hadron level.
To enhance the signal sensitivity, we perform a binned log-likelihood analysis exploring the Higgs candidate invariant mass profile, in the signal range $m_J^{\rm BDRS} \in [110, 135]$ GeV, together with the CP-sensitive observable $\theta^*$, defined in the $t\bar{t}$ center-of-mass frame. Since the considered $t\bar{t}h$ channel with $h \to b\bar{b}$ typically confronts a large $t\bar{t}b\bar{b}$ background, which has a significant uncertainty [27,28], the final result is correlated with the assumed background uncertainties. To estimate this effect, we derive the new physics sensitivity in the $(\alpha, \kappa_t)$ plane for two scenarios. In the first case, we assume that the $t\bar{t}b\bar{b}$ background rate has a 20% uncertainty, which is included as a nuisance parameter. The magnitude of this error is similar to that of the current experimental analyses [27,28]. For the second case, we assume an optimistic scenario with a 5% error. The uncertainties on the $t\bar{t}h$ and $t\bar{t}Z$ samples are assumed to be 10% in both scenarios [29]. The result of this analysis is presented in the left panel of Fig. 4. We find that the CP-mixing angle can be constrained to $|\alpha| \lesssim 32^\circ$ at 68% CL at the HL-LHC for both scenarios. At the same time, the sensitivity of $\kappa_t$ to the systematic error is more pronounced. While in the first scenario we can constrain the top Yukawa strength to $\delta\kappa_t \lesssim 0.3$, the more optimistic case leads to $\delta\kappa_t \lesssim 0.15$.

Figure 4. The exclusion at 68% (red) and 95% (green) CL in the $\alpha$-$\kappa_t$ plane at the 14 TeV LHC with 3 ab$^{-1}$ for a narrow (left) and wide (right) mass window. A 20% (5%) systematic uncertainty for $t\bar{t}b\bar{b}$ is assumed in the solid (dotted) curves, while a 10% systematic uncertainty is used for both $t\bar{t}Z$ and $t\bar{t}h$.

We note that in the absence of the shape information of the $\theta^*$ distribution, there exists a flat direction in the $(\alpha, \kappa_t)$ plane, irrespective of the considered uncertainty scenarios, where two red (or green) curves meet tangentially, as shown in the left panel of Fig. 4. In other words, along the flat direction, the rate alone places no constraint on the values of $\kappa_t$ and $\alpha$. The constraint stems from the shape of the $\theta^*$ distribution. Therefore, along that flat direction, the limits on $(\alpha, \kappa_t)$ will not change and the considered uncertainties of $t\bar{t}b\bar{b}$ do not affect the fit. Further, when $\kappa_t \sim 0.4$, there is no sensitivity to $\alpha$ for the case with large systematics (20% for $t\bar{t}b\bar{b}$). This is because the signal rate is suppressed for $\kappa_t$ around that region, and thus we gain no information from the $\theta^*$ distribution, while the large systematics from $t\bar{t}b\bar{b}$ can compensate the total event rate. When $\kappa_t$ is even smaller, the signal rate is further suppressed, and the fluctuation from the background alone cannot explain the total event rate, excluding the small $\kappa_t$ region.
This observation in the left panel of Fig. 4 can be more clearly understood by studying the separate exclusions. In Fig. 5, we show the individual exclusion at 68% CL from the rate-only measurement (in magenta) and the shape-only measurement (in blue). The rate-only exclusion does not have significant sensitivity to $\alpha$, since the signal rate remains roughly fixed as the CP angle $\alpha$ varies. This arises from a combination of two effects that approximately cancel out. While the inclusive $t\bar{t}h$ production cross-section decreases when scanning $\alpha$ from 0 to $\pi/2$ [49], the signal acceptance increases for larger $\alpha$ due to a sizable difference in kinematics between CP-even ($\alpha = 0$) and CP-odd ($\alpha = \pi/2$) in the boosted regime [19]. The two factors roughly cancel, leading to suppressed differences in event rate for distinct $\alpha$. On the other hand, the shape-only exclusion exhibits sensitivity to $\alpha$. Hence, one can recover the general profile of the exclusion in Fig. 4 by combining the four curves in Fig. 5.
To illustrate how to reduce the systematics for $t\bar{t}b\bar{b}$ in a realistic measurement, we enlarge the mass range of the Higgs candidate to $m_J^{\rm BDRS} \in [50, 150]$ GeV. In this way, the events outside the Higgs peak, which mainly come from $t\bar{t}b\bar{b}$, can be used together with the shape of the $m_J^{\rm BDRS}$ distribution of $t\bar{t}b\bar{b}$ from MC simulation within the binned log-likelihood method. By fitting to a broader range of $m_J^{\rm BDRS}$, we have better control of the uncertainties of $t\bar{t}b\bar{b}$. The results are shown in the right panel of Fig. 4. We find that this analysis depletes the influence of the systematic uncertainties, leading to similar results for the two considered systematic uncertainty scenarios. The obtained limits are $|\alpha| \lesssim 26^\circ$ ($36^\circ$) and $\delta\kappa_t \lesssim 0.12$ ($0.2$) at 68% (95%) CL. Using the wider mass window, the log-likelihood analysis takes full advantage of the shape information of the $t\bar{t}h$ and $t\bar{t}b\bar{b}$ events.
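To make the role of the background nuisance parameter concrete, the following is a minimal Python sketch of a binned Poisson log-likelihood with a single Gaussian-constrained normalization uncertainty on the background, profiled by a simple grid scan. It is an assumed, simplified stand-in for the fit described above (one signal-strength parameter `mu`, one nuisance `theta`, Asimov data with best fit at `mu = 1`), not the authors' implementation. All names and the toy bin contents in the usage note are hypothetical.

```python
import math

def nll(observed, signal, background, mu, theta, sigma_b):
    """Negative log-likelihood (up to constants).

    Expected yield per bin: mu * s + (1 + theta) * b, with a Gaussian
    constraint of width sigma_b on the background normalization shift theta.
    """
    total = 0.5 * (theta / sigma_b) ** 2
    for n, s, b in zip(observed, signal, background):
        expected = mu * s + (1.0 + theta) * b
        total += expected - n * math.log(expected)
    return total

def profiled_nll(observed, signal, background, mu, sigma_b,
                 n_steps=2001, span=4.0):
    """Minimize the NLL over theta on a grid of +- span * sigma_b."""
    best = float("inf")
    for i in range(n_steps):
        theta = sigma_b * (-span + 2.0 * span * i / (n_steps - 1))
        if 1.0 + theta <= 0.0:
            continue  # skip unphysical negative background
        best = min(best, nll(observed, signal, background, mu, theta, sigma_b))
    return best

def q_value(observed, signal, background, mu, sigma_b):
    """2 * Delta(NLL) relative to mu = 1 (assumes Asimov data at mu = 1)."""
    reference = profiled_nll(observed, signal, background, 1.0, sigma_b)
    return 2.0 * (profiled_nll(observed, signal, background, mu, sigma_b)
                  - reference)
```

With a toy signal that peaks in one bin on top of a flat background, evaluating `q_value` at `mu = 0` for `sigma_b = 0.20` and `sigma_b = 0.05` shows the qualitative effect discussed in the text: a looser background constraint lets the nuisance parameter absorb part of the signal and weakens the exclusion, while bins away from the peak help pin the background down.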
It is illuminating to analyze the number of signal and background events in the Higgs peak and side-bands to infer the uncertainty suppression. Around the Higgs peak, $m_J^{\rm BDRS}$ [...] $N_{\rm sideband} = 3{,}543$), which can be estimated from MC or from the shape of the inferred background distribution using the side-bands. As an illustration, we fix $\kappa$ as $\kappa = N_B/N_{\rm sideband}$. Then the uncertainty on $N_S$ is calculated as [...], which is significantly lower with the aid of the events from the side-bands [29]. Similar background control regions are actively used in experimental analyses, for example, for the $h \to \gamma\gamma$ channel [50], and $Wh/Zh$ production in the $h \to b\bar{b}$ decay channel [51].
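The gain from the side-bands can be illustrated with a standard counting estimate. The sketch below assumes Poisson fluctuations, a transfer factor $\kappa$ known exactly, and the usual error propagation for $N_S = N_{\rm peak} - \kappa N_{\rm sideband}$; this is a textbook form chosen for illustration and is not necessarily the expression used in the analysis. Only $N_{\rm sideband} = 3{,}543$ is taken from the text, while the peak count and the value of $\kappa$ in the usage note are hypothetical.

```python
import math

def sideband_estimate(n_peak, n_sideband, kappa):
    """Signal yield and uncertainty with the background taken from side-bands.

    N_S = N_peak - kappa * N_sideband
    (delta N_S)^2 = N_peak + kappa^2 * N_sideband   (Poisson, kappa exact)
    """
    n_signal = n_peak - kappa * n_sideband
    return n_signal, math.sqrt(n_peak + kappa ** 2 * n_sideband)

def flat_systematic_estimate(n_peak, n_background, rel_syst):
    """Same yield, but with a flat relative systematic on the background.

    (delta N_S)^2 = N_peak + (rel_syst * N_B)^2
    """
    n_signal = n_peak - n_background
    return n_signal, math.sqrt(n_peak + (rel_syst * n_background) ** 2)
```

For example, with a hypothetical `n_peak = 1000` and `kappa = 0.2`, `sideband_estimate(1000, 3543, 0.2)` can be compared with `flat_systematic_estimate(1000, 0.2 * 3543, 0.20)`: the side-band version replaces the 20% normalization uncertainty by the much smaller statistical uncertainty of the control region.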

CP measurement at 100 TeV FCC-hh
The Higgs-top CP-phase measurement would see remarkable gains at a future 100 TeV collider due to the immensely increased statistics. In Fig. 6, we show the cross-section for $pp \to t\bar{t}h$ and $pp \to t\bar{t}Z$ production as a function of the collider energy. We require the Higgs and $Z$ bosons in the boosted regime, $p_T^{h,Z} > 200$ GeV, and account for their branching ratios to bottom quarks, ${\rm BR}(h, Z \to b\bar{b})$. While the $t\bar{t}(h \to b\bar{b})$ and $t\bar{t}(Z \to b\bar{b})$ processes are phase-space suppressed at the 14 TeV LHC, with limited production cross-sections of 0.04 pb and 0.02 pb, the 100 TeV collider would result in a roughly one hundred-fold enhancement, with cross-sections of 3.8 pb and 2.1 pb, respectively. Considering the leptonic top pair decay, this corresponds to an uplift in the number of events for the $t\bar{t}h$ signal from $5.8 \times 10^3$ at the HL-LHC with 3 ab$^{-1}$ to $5.5 \times 10^6$ at 100 TeV with 30 ab$^{-1}$. This estimate shows that the 100 TeV FCC, with a combination of the increased energy and luminosity, can push precision measurements with the $t\bar{t}h$ channel further. Instead of focusing on the semi-leptonic top pair mode, as in Ref. [29], we explore the di-leptonic $t\bar{t}h$ system. In addition to the extra background suppression, this channel provides a better probe of the top polarization, using the charged leptons. The larger spin analyzing power associated with the charged leptons results in stronger CP-violation observables, such as $\Delta\phi_{\ell\ell}^{t\bar t}$, strengthening our CP-sensitivity.

Figure 6. Cross-section for $pp \to t\bar{t}h$ and $pp \to t\bar{t}Z$ production at the parton level as a function of the $pp$ collider energy. We require the Higgs and $Z$ bosons in the boosted regime, $p_T^{h,Z} > 200$ GeV, and account for their branching ratios to a bottom-quark pair, ${\rm BR}(h, Z \to b\bar{b})$. Top quarks are set stable.
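The event counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below only uses the rounded cross-sections, luminosities, and event counts given in the text; the "implied fraction" it extracts is an inferred quantity (the effective di-leptonic fraction needed to connect them), not a number stated by the authors, and the function names are illustrative.

```python
PB_INV_PER_AB_INV = 1.0e6  # 1 ab^-1 = 10^6 pb^-1

def produced_events(xsec_pb, lumi_ab_inv):
    """Number of events for a cross-section in pb and luminosity in ab^-1."""
    return xsec_pb * lumi_ab_inv * PB_INV_PER_AB_INV

def implied_fraction(n_quoted, xsec_pb, lumi_ab_inv):
    """Fraction of the produced events that matches the quoted count."""
    return n_quoted / produced_events(xsec_pb, lumi_ab_inv)

# Numbers quoted in the text for boosted tt(h -> bb)
frac_14tev = implied_fraction(5.8e3, 0.04, 3.0)
frac_100tev = implied_fraction(5.5e6, 3.8, 30.0)
xsec_enhancement = 3.8 / 0.04
```

Both energies give an implied fraction of roughly 5%, which is the order expected when both $W$ bosons decay to an electron or a muon, and the cross-section ratio is close to the "one hundred-fold" enhancement mentioned in the text.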
We begin our discussion with the fat-jet invariant mass $m_J^{\rm BDRS}$ distribution for the signal and background samples at the 100 TeV FCC-hh with 30 ab$^{-1}$ of data, as shown in Fig. 7 (for the full hadron-level analysis). Note the $\mathcal{O}(10^3)$-fold enhancement in event rate compared to that in Fig. 2 for 14 TeV. The full stacked histogram is presented in black. CP-sensitive angular variables are shown in Fig. 8, where we present distributions of $\theta^*$ (left), $\Delta\phi_{\ell\ell}^{\rm lab}$ (middle), and $\Delta\phi_{\ell\ell}^{t\bar t}$ (right) at both 14 TeV and 100 TeV for comparison. In the laboratory frame, the $\Delta\phi_{\ell\ell}^{\rm lab}$ distributions look similar, while $\theta^*$ and $\Delta\phi_{\ell\ell}^{t\bar t}$ tend to be slightly forward or backward in the $t\bar{t}$ rest frame. However, the ratio of the new physics contribution to the SM prediction remains similar, as shown at the bottom of each panel.
As mentioned previously, one main difference between the 14 TeV LHC and the 100 TeV FCC is the significant increase in the rate of signal and backgrounds. In particular, both $t\bar{t}h$ and $t\bar{t}Z$ (i) result in hugely improved statistics, (ii) have similar production mechanisms, and (iii) probe comparable energy scales. Hence, their uncertainties are highly correlated [29]. The theoretical uncertainties in the signal cross-section, which are in the range 7-10% at a 100 TeV collider, can be reduced to approximately 1% in a ratio measurement [29]. This reduction of the uncertainties is also depicted in Fig. 9 for the 14 TeV LHC (left) and the 100 TeV FCC (right), where we only consider the precision on the $\kappa_t$ measurement by fixing $\alpha = 0$. We consider two different scenarios: (1) binned log-likelihood (red-solid); and (2) binned log-likelihood with $t\bar{t}h$ and $t\bar{t}Z$ correlated in uncertainties (red-dashed). At the 14 TeV LHC, whether or not we consider the correlation between the uncertainties of $t\bar{t}h$ and $t\bar{t}Z$ (i.e., use the same nuisance parameter for both) does not significantly affect the results, as the uncertainties are dominated by the continuum $t\bar{t}b\bar{b}$ background. However, the situation improves dramatically at the 100 TeV FCC. Scenario (1) is systematically limited around $\delta\kappa_t \sim 5\%$ due to the 10% systematics on the $t\bar{t}h$ rate. When considering the correlation between $t\bar{t}h$ and $t\bar{t}Z$ in scenario (2), the $\kappa_t$ measurement improves and can reach sub-percent precision. Note that our results for $\kappa_t$ at a 100 TeV collider, $\delta\kappa_t \sim 0.5$-$0.7\%$, are consistent with those from Ref. [29], which explores the semi-leptonic top pair final state.

Figure 10. 68% (red) and 95% (green) CL limits in the $\alpha$-$\kappa_t$ plane for the 100 TeV FCC with 30 ab$^{-1}$ without (left) and with (right) $\Delta\phi_{\ell\ell}^{t\bar t}$. For the solid curves, 10% systematics is used for both $t\bar{t}h$ and $t\bar{t}Z$ individually, while for the dashed curves, the uncertainties for $t\bar{t}h$ and $t\bar{t}Z$ are assumed to be correlated. 20% systematics is used for $t\bar{t}b\bar{b}$ in both scenarios.
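The two numbers in this discussion can be related by simple error propagation. The following is a sketch under stated assumptions: at fixed $\alpha = 0$ the signal rate is taken to scale as $\kappa_t^2$, as follows from Eq. (2.1), and the $t\bar{t}h$ and $t\bar{t}Z$ rates are assigned equal relative uncertainties $\delta$ with a correlation coefficient $\rho$. The symbols $R$, $\delta$, and $\rho$ are introduced here for illustration only and do not appear in the original analysis.

```latex
% Rate-limited precision on kappa_t (scenario 1), assuming sigma ~ kappa_t^2:
\sigma_{t\bar t h} \propto \kappa_t^2
\quad\Longrightarrow\quad
\frac{\delta\kappa_t}{\kappa_t} \simeq \frac{1}{2}\,
\frac{\delta\sigma_{t\bar t h}}{\sigma_{t\bar t h}}
= \frac{1}{2}\times 10\% = 5\% .

% Ratio measurement with correlated uncertainties (scenario 2):
R \equiv \frac{\sigma_{t\bar t h}}{\sigma_{t\bar t Z}} , \qquad
\left(\frac{\delta R}{R}\right)^2
= \delta^2 + \delta^2 - 2\rho\,\delta^2
= 2\,\delta^2\,(1-\rho)
\;\xrightarrow{\;\rho \to 1\;}\; 0 .
```

In this simplified picture, a residual uncertainty of about 1% on the ratio with $\delta = 10\%$ would correspond to $\rho \approx 0.995$, which illustrates how strongly the two rates need to be correlated for the ratio to reach the quoted precision.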
In light of the aforementioned improvements in the $\kappa_t$ sensitivity, we perform a similar analysis considering both $\kappa_t$ and $\alpha$. With the increased cross-section and luminosity, the 100 TeV FCC can boost the sensitivities to $(\alpha, \kappa_t)$, using the binned log-likelihood method, as summarized in Fig. 10. We choose a wide mass window, $m_J^{\rm BDRS} \in [50, 150]$ GeV, for better control of the continuum $t\bar{t}b\bar{b}$ background, along with $\theta^*$ in the left panel. In both panels, the solid curves correspond to the case with 20% systematics for $t\bar{t}b\bar{b}$ and 10% systematics for $t\bar{t}h$ and $t\bar{t}Z$, while we assume the $t\bar{t}h$ and $t\bar{t}Z$ uncertainties are correlated for the dashed curves. It is clear that, at high luminosity, the solid curves are limited by the systematic uncertainties, similarly to the solid red line scenario in the right panel of Fig. 9. However, by assuming that the systematics of $t\bar{t}h$ is correlated with $t\bar{t}Z$, the precision can be improved, as shown by the dashed curves, which can achieve $\delta\kappa_t \lesssim 1\%$ and $|\alpha| \lesssim 3^\circ$ at 95% CL.
Finally, extending the analysis to the three-dimensional $(m_J^{\rm BDRS}, \theta^*, \Delta\phi_{\ell\ell}^{t\bar t})$ space, we find that the CP-odd observable $\Delta\phi_{\ell\ell}^{t\bar t}$ brings an additional improvement in the measurement of $\alpha$ by a factor of 2, $|\alpha| \lesssim 1.5^\circ$, as shown in the right panel of Fig. 10, which highlights the importance of the CP-odd observable in the $t\bar{t}$ rest frame.

Summary
The discovery of the Higgs boson at the LHC jump-started a comprehensive program of precision measurements for the Higgs couplings. In this context, the direct measurement of the Higgs-top coupling strength and CP-phase would have a significant impact on our understanding of the Yukawa sector and possible new sources of CP violation. In this paper, we have examined the direct probe of the top quark Yukawa coupling and the Higgs-top CP-structure in $t\bar{t}h$ production, with the Higgs boson decaying to a bottom pair and the top quarks decaying in the di-leptonic mode. We have utilized several state-of-the-art strategies to reconstruct the final state with missing transverse momentum and to control systematic uncertainties. We take advantage of the BDRS algorithm to tag the boosted Higgs, and exploit the $M_2$-assisted reconstruction to compute observables sensitive to the CP-phase in the $t\bar{t}$ rest frame. Our log-likelihood analysis, using the side-band control region, takes full advantage of the shape information of the signal and background. We have shown that the proposed analysis significantly reduces the uncertainty in the measurement of the CP-phase and the Higgs-top Yukawa coupling. Our results show that the Higgs-top CP-phase ($\alpha$) can be probed up to $|\alpha| \lesssim 36^\circ$ and the top Yukawa ($\kappa_t$) up to $\sim 20\%$ accuracy (95% CL) at the HL-LHC, as shown in Fig. 4. A similar analysis at a 100 TeV future collider further improves the precision on the coupling modifier and CP-phase to $\delta\kappa_t \lesssim 1\%$ and $|\alpha| \lesssim 3^\circ$, respectively, as shown in Fig. 10. We find that the CP-odd observable $\Delta\phi_{\ell\ell}^{t\bar t}$ improves the precision by a further factor of 2, to $|\alpha| \lesssim 1.5^\circ$. We note that these limits represent only an upper bound, which can be further improved by combining the other relevant top-quark and Higgs decays in $t\bar{t}h$ production.