Resurrecting bb¯h\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ b\overline{b}h $$\end{document} with kinematic shapes

The associated production of a bb¯\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ b\overline{b} $$\end{document} pair with a Higgs boson could provide an important probe to both the size and the phase of the bottom-quark Yukawa coupling, yb. However, the signal is shrouded by several background processes including the irreducible Zh, Z →bb¯\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ b\overline{b} $$\end{document} background. We show that the analysis of kinematic shapes provides us with a concrete prescription for separating the yb-sensitive production modes from both the irreducible and the QCD-QED backgrounds using the bb¯γγ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ b\overline{b}\gamma \gamma $$\end{document} final state. We draw a page from game theory and use Shapley values to make Boosted Decision Trees interpretable in terms of kinematic measurables and provide physics insights into the variances in the kinematic shapes of the different channels that help us complete this feat. Adding interpretability to the machine learning algorithm opens up the black-box and allows us to cherry-pick only those kinematic variables that matter most in the analysis. We resurrect the hope of constraining the size and, possibly, the phase of yb using kinematic shape studies of bb¯h\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ b\overline{b}h $$\end{document} production with the full HL-LHC data and at FCC-hh.


Introduction
After the discovery of the Higgs at the LHC [1,2], the measurement of its couplings to the other Standard Model (SM) particles through its production and decay channels has been an ongoing quest [3]. The subsequent measurement of the tth signal marks the direct probe of the coupling of the Higgs to the third-generation fermions [4,5]. Despite the large branching ratio of the Higgs decaying to the bottom pair, the h → bb eluded measurement until much later through the V h, h → bb channel [6][7][8], bringing about sensitivity to the bottom-quark Yukawa, y b , as well. The current sensitivity on κ b = y b /y SM b is of the order of 7% and it is expected to improve to 2.2% at HL-LHC [9]. Other direct probes to the Higgs couplings to the bottom quark can be envisaged through the measurements of the JHEP04(2021)139 associated production of the Higgs with the bottom quark in the associated bh or bbh production channels.
The bbh signal is sizable at the LHC with a cross-section of about 0.5 pb. Calculations of the bbh cross-section focusing on the contribution proportional to y b has been a developing effort, with improvements from including higher order contribution, matching and resumming bottom-mass effects, evaluation of four and five-flavor matching scheme and optimal scale choices for Monte Carlo implementation with parton showers [10][11][12][13][14][15][16][17][18][19][20][21]. Despite the large inclusive rate, bbh production remains a challenging channel for experimental measurements. Given that the bulk of the signal contains only soft p T b-jets, which evade selection cuts, the signal drops by orders of magnitude after two, or even one, b-jets are explicitly tagged. It is even more challenging to achieve sensitivity on the y b coupling among all the amplitudes contributing to bbh production. Even with the large amount of luminosity that will be available at HL-LHC, the y b mediated diagrams contribute only a small fraction of the total cross section to the inclusive rate [20,21].
In this work we shall appeal to kinematic shapes, machine learning and game theory to glean information on y b from the associated production of the Higgs with a bottom-quark pair. In doing so we would like to clarify the following open questions: • The irreducible background coming from Zh production obfuscates the measurement of y b [21] in the bbh channel. Can multivariate analysis using kinematic shapes overcome this hurdle?
• Machine learning tools are infamous for providing little insight into the underlying physics. Can using high-level kinematic variables (as opposed to momenta fourvectors) be used in conjunction with interpretable machine learning tools to understand what variables are important in orchestrating a signal to background separation at HL-LHC and pave a path to measuring y b from bbh production?
• What are the prospects of making a far better measurement of y b from the bbh channel at FCC-hh?
• What can be said about the constraints on the rescaling of y b , κ b , and its phase attributed to new physics (NP), from bbh production?
Armed with these questions we proceed as follows. In section 2 we briefly explore the possibility of measuring y b at the future lepton colliders through bbh production before moving on to describing in detail the simulation of events for HL-LHC and FCC-hh. In section 3 we delve into describing the machine learning algorithms that we use to explore higher dimensional kinematic shapes. We also provide an introduction to Shapley value [22] and measures derived from it to make machine learning interpretable. As an example of the effectiveness of Boosted Decision Trees (BDT) augmented with Shapley values, we show that the Zh background can be discriminated from the signal. In section 4 we further sharpen our analysis for HL-LHC including all irreducible backgrounds and the reducible QCD-QED bbγγ dominant background. We then perform a similar analysis for FCC-hh. We round off the section with a discussion of the manner in which including systematics JHEP04(2021)139 changes our conclusions. In section 5 we extend our analysis to estimate bounds on κ b and its CP phase. The adoption of the simple κ b framework in presenting our results serves as a common and convenient tool in comparison with existing studies from other y b sensitive channels. Ultimately, our analysis should be included in a truly global EFT analysis to fully capture the complex structures that new physics might bring about and that could not be captured by a single parameter. We touch base with constraints from the EDM measurements to complete our analysis. Various necessary mathematical relations and the discrimination between the y b -and y t -driven channels are presented in the appendices. All codes necessary for reproducing our results including the simulated events are made available in a Github repository: https://github.com/talismanbrandi/Interpretable-ML-bbh.git.

Probing the bbh channel at future colliders
In this section we will discuss the various prospects that arise for studying the bbh channel at future colliders. First, we will give a short overview of what might be possible at future lepton colliders like the FCC-ee, CEPC and muon colliders. Then we move on to a more detailed study of the prospects at HL-LHC and FCC-hh which will form the bulk of the work.

Prospects at future lepton colliders
At a foreseeable electron-positron collider, the bbh signal could be produced above the threshold of about 135 GeV. Once the center-of-mass energy is sufficient for the Zh Higgsstrahlung process, the dominant production of bbh is through the decay of resonant Z to a bottom-quark pair, which is about five orders of magnitude larger than the subdominant y b -sensitive production channel. With such a large irreducible background, the channel proportional to y 2 b , with its cross-section of 2.6 × 10 −4 fb, will be completely overwhelmed by that of the irreducible background of about 35 fb from the Higgs-strahlung process.
Unlike the Higgs-strahlung process, for which the rate drops as inverse of the center-ofmass energy, √ s, at the amplitude level, the y b -sensitive channel has a weaker dependence on √ s. Thus, similar to the hadron machines, the rate of y b -sensitive channel benefits from going to higher √ s as compared to the Higgs-strahlung contribution. At 1 TeV, the cross-section for the channel proportional to y 2 b rises to about 3.5 × 10 −4 fb, while that for the Higgs-strahlung contribution drops to 2 fb. It is, however, still difficult to achieve competitive sensitivity on y b .
Another hope arises from studying the bbh channel below the Higgs-strahlung (Zh) threshold. At about 160 GeV center-of-mass energy [23], with sufficient luminosity, the bbh signal could be produced and observed. The sensitivity to a possible interference term that is linearly dependent on y b is, however, difficult to achieve. The study would thus provide sensitivity on y 2 b but not to the sign or phase of y b . Another recent work suggested the study at a lepton collider of the Higgs decaying to bbg, where they focus on the region with 2 b-jets being collinear [24]. The selection favors the kinematic region which is sensitive to a sizable interference between the ggh and bbh JHEP04(2021)139 Table 1.
Total cross section for bbh signal at leading order at various possible future lepton colliders. mediated channels. When fixing the other SM values and varying only a complex y b , the phase of the Yukawa was shown to be constrained to about 60 • at CEPC.
For completeness, we estimate the possibilities of probing the bbh signal at a muon collider at 14 TeV [25]. The bbh cross-section from y b -mediated diagram is estimated to be about 1 × 10 −5 fb. As a comparison, the Higgs-strahlung induced bbh production at the proposed muon collider has a cross-section of about 0.01 fb. Here the two contributions differ by only three orders of magnitude in total cross section. Once more, the bulk of the Higgs-strahlung process has the bottom-quark pair invariant mass peaking at around the Z mass. Provided there is a sizable luminosity to produce the y b -mediated channel, we expect bbh to be a clean signal at a muon collider to probe y b .
We summarize the leading order (LO) cross section for the bbh signal at various future lepton colliders in table 1. The interference term between y b -mediated and Zh type of signal in inclusive rate is orders of magnitude smaller than either of the contribution, and are not shown. From the results of the LO calculation, y b -sensitive channel is overwhelmed by the Zh-mediated contribution by over three orders of magnitude across collider setups considered here. With the current projected luminosity at future e + e − colliders, it is unforeseeable that sensitivity to bbh production driven by y b can be achieved.

Events simulation for HL-LHC and FCC-hh
There are four types of sizable contributions at leading order (LO) that need to be considered separately for the bbh signal at hadron colliders. The first are the pure y b -inducing diagrams, (y b ) in figure 1, which contributes at the amplitude level and are proportional to y b . Then there are the gluon-fusion induced Higgs production diagrams, (y t ) in figure 1, with an additionally radiated gluon splitting to a bottom pair, which are dominantly proportional to y t . Thirdly, there is a non-negligible interference between the first two types of diagrams proportional to y t y b . Lastly, there is the Zh production mode with the Z decaying to a pair of bottom quarks, the right-most diagram (Zh) in figure 1.
The importance of the contribution proportional to y t in the bbh signal has been recently emphasized in ref. [20], calculated to next-to-leading order (NLO) QCD in the effective hgg coupling framework, and the results are presented from the fixed order calculation. Recently, the Zh and vector-boson fusion (VBF) contribution were included to NLO in QCD and electroweak (EW) couplings, and found comparable to the contribution proportional to y b as well [21]. The various contributions to the bbh signal increase, in general, the total rate and observability of the signal at the HL-LHC. But it also poses a challenge to disentangle the different contribution necessary to improve sensitivity to y b , at least, when looking at 1D differential distributions from fixed order calculations presented in the aforementioned papers. Thus, as a first attempt to estimate a "realistic" sensitivity to the contributions from the different channels, to the size of y b and, possibly its CP-phase, we start the study with the relatively clean and well-reconstructed channel of the Higgs decaying to two photons. 1 In this work we include all the channels contributing to the bbγγ final state and are either statistically overwhelming like the bbγγ QCD-QED background, or have similar shapes, like the Zh channel or are both, like the y 2 t -driven channel. There are some other backgrounds that are worth mentioning but are much easier to separate out.
• VBF (NLO-EW): As discussed before, this channel is particularly tricky since it can be as sizable as or larger than the y b -sensitive channels and pose as a challenging background. However, as shown in ref. [21], it can be brought under control by demanding an additional veto of light jets given the topology through which this channel is produced. The light-jet veto essentially kills the VBF contribution while not affecting any of the channels that we discuss above. This can be seen from table 3 when comparing the "NLO 3 " contribution, dominated by VBF with the "LO 3 " contribution dominated by qq → Zh. In addition, from the top-right panel of figure 3 of the reference we see the same suppression of the VBF channel due to the light-jet veto. Given that the VBF contribution can be suppressed independently, we do not consider it in our work.
• hh production: The di-Higgs production can pose as a background if it decays into the bbγγ final state. The cross-section for this is comparable to the y b -sensitive channels both at the HL-LHC and the FCC-hh. However, since both m bb and m γγ will be JHEP04(2021)139 clustered around the Higgs-mass peak, the shape of the final state will be distinct enough to separate it from other channels that contribute to bbh.
• gg → Zh: This channel has a small cross-section at HL-LHC and can be safely ignored. However, it grows rapidly with √ s, about a factor of 40 from HL-LHC to FCC-hh energies for the inclusive rate. Hence, this channel will become comparable, but subdominant, to the y b -sensitive channels at FCC-hh. Nevertheless, it can be distinguished from the y b -sensitive channel because of the difference in the shape that it will have, akin to the case of qq → Zh.
For the "non-Higgs" background, there is the dominant irreducible QCD-QED background of bbγγ. There are also potential backgrounds from various fakes such as jγγ, jjγγ, cjγγ, ccγγ, or bbjγ. The rate of light jets faking a bottom or photon is at the percentage or sub-percentage level, and makes the fake backgrounds mostly negligible after considering the dominant bbγγ background. 2 Hence, we ignore these in this work.
The total cross-section at 14 TeV of the various contributing channels after basic cuts as defined in eq. (2.4) is shown in table 2. We calculate the production and decay of the signal in the four-flavor scheme at LO, following the results presented in ref. [20] with their public code for contributions proportional to y 2 b and y b y t . The interactions between Higgs and gluon are treated as point-like effective couplings ggh and gggh with massive quark effects included at one-loop defined as: As shown in table 2, the computation is first done at leading order (LO) and then an overall k-factor is applied on the LO bbh cross-sections according to the different production channels, based on the fixed order inclusive cross-section results provided in ref. [20]. The kfactors are relatively flat over the kinematic distributions for the bbh final states. We assume 2 The dominant fake contribution to bbγγ comes from the ccγγ channel. Indeed, it can be seen from table 1 of [26], with similar basic selection cuts, ccγγ is about 8 times bbγγ. A c → b mistag rate of about 0.1 makes it sub-dominant but comparable to the dominant bbγγ background. The main argument for dropping this mistagged QCD-QED background is that they shall have similar shape to the bbγγ background, and hence, should be separable by the BDT. Secondly, since c-tagging is still in the process of being perfected, we believe that this background might be better separated in the future, although a clear projection is missing from the literature.

JHEP04(2021)139
Channel LO σ (fb) NLO-k-fact 6  a further SM decay of the Higgs to di-photon and parton shower does not significantly affect this simplification. The branching ratio for the decay of Higgs to di-photon is further normalized as the Higgs cross-section working group recommended value [27]. For the bbγγ background, the k-factor is taken from the ratio between NLO and LO total cross-sections as given in ref. [28]. New topologies of VBF-like contribution that are not included in this work arise as NLO-EW effects. At the generator level, we apply the following cuts to avoid divergences arising from the bbγγ background.

(2.3)
Here Xp T implies a minimum p T cut for at least one b-jet.
For the PDF we use lhaid=320500 from the LHAPDF6 package [29] (NNPDF31_nlo_as_0118_nf_4 [30]). 3 Parton-showering is then performed on the generated LO event sample, with Pythia8 [31], without multi-parton interaction simulated. Detector simulation is applied next using Delphes [32]. The parameters are set according to the HL-LHC card, where the b-tagging rate for a central b-jet is about 75%. The jets are then reconstructed with the anti-k T algorithm with R = 0.4. We further require two photon jets 4 and at least one b-tagged jet which should satisfy: basic cuts: (2.4) 3 We use the default current version of the PDF set which is called DataVersion=2, an update from an earlier version. This causes a discrepancy of about 10% in our total cross section from the study [20], while we otherwise use the same calculation setup and input parameters. We, however, check the ratios between different jet selection and pT cut requirements that these two papers provide and find good agreement with their fixed order calculation results. 4 We use the standard isolated photons definition from Delphes, and call them photon jets in the sense of reconstructed objects.

JHEP04(2021)139
Channel LO σ (fb) NLO-k-fact 30   With these sets of basic cuts, we get the cross-section and estimate the number of events at HL-LHC (6 ab −1 ) shown in table 2. Note that the interference term between y b -and y t -types of diagrams is negative. It, in fact, contains two separate contributions which are qq-initiated and gg-initiated. The former is positive and the latter, larger and negative, due to the different types of diagrams involving s or t/u channels with a massive bottom quark propagator.
All this being said about HL-LHC, we move our focus to FCC-hh. Much like the tth process, the bbh cross-section rises much faster with the center-of-mass energy as compared to the other Higgs production channels like ggh, VBF or VH. While the cross-section of the latter processes see a growth of about a factor of ∼ 10-15 when going from 14 TeV to 100 TeV at a pp collider, the cross-section of bbh grows by about a factor of 22. This gives a distinct advantage to the bbh channel over the main irreducible background coming from Zh rendering the measurement of y b easier at the FCC-hh even beyond a simple statistical scaling to 30 ab −1 .
We use the same parameter setting, generator level cuts, LO simulation and k-factor estimates as used for HL-LHC. It is known that the k-factors generally increase with centerof-mass energy of the collider. This increase is of the order of 10%-20% in the channels that we consider. If this increase is taken into consideration, the signals will actually stand to gain over the background. In this sense, our analysis for the FCC-hh is on the more conservative side. Keeping the same cuts does not affect the analysis since they are very basic cuts used to prune out the QCD-QED background as the rest of the discrimination is left to the multivariate analysis we describe in section 3. Likewise, we perform Pythia showering, hadronization, and a detector simulation at the FCC-hh using the FCC-hh working group recommended Delphes card where the b-tagging rate for a central b-jet is projected to be 85%. We present the cross-section and expected number of events in table 3 after basic cuts.
In ref. [21], it was shown that non-negligible EW contribution from Zh and VBF-like (NLO-EW) topologies arise and could possibly overwhelm the y b -sensitive channels. In their fixed order calculation, the authors discuss distributions of p T and η of the Higgs as well as those of the leading b-jet and light-jet, and the rapidity gap between the Higgs and leading b-jet, both at LO and adding QCD and EW (N)LO contributions. From these JHEP04(2021)139 plots, it is not easy to discern the differences in the distributions in the bulk of the signal region. We will show that a careful selection of kinematic distributions backed with the discriminatory powers of simple machine learning algorithms can be effective in separating the y b -driven channel. For example m b 1 h , the invariant mass of the leading b-jet and the reconstructed Higgs, which is the second most important kinematic variable for separating out Zh contribution from the y 2 b -driven one, is not included in ref. [21]. Hence, the claim that discerning the y 2 b contribution in the bbh signal will be quite hopeless at HL-LHC, where different categories with selected number of b-jets were studied as in table 4 of ref. [21], is somewhat premature given that a more refined statistical analysis proves otherwise. As we will see in our analysis, using decision trees and Shapley values, the number of b-jets is not important for distinguishing the various Higgs channels contrary to what was attempted in ref. [21]. Instead there are other variables and, more importantly, their correlations that come from higher dimensional distributions that will be the key to extracting a signal that has been declared as a lost cause.

Exploring higher dimensional kinematic distributions
The total bbh signal with the Higgs decaying to di-photon is a resonance-bump hunt in a background of a flat di-photon spectra, with an additional requirement of at least one tagged b-jet in our selection. At the HL-LHC, with 6 ab −1 of data (ATLAS+CMS combined), and summing up the contributions from the four types of bbh signals at LO, we get a statistical significance of 6.8σ on the total SM bbh signal, with the basic cuts as defined in eq. (2.4). Performing a mass window cut of 123 < m γγ (GeV) < 127 around the Higgs mass peak further increases the significance to about 10σ. With this estimate, there is no doubt that we should be able to see a clear bbh signal on top of the dominant QCD-QED bbγγ background at the HL-LHC.
The next goal is to evaluate the sensitivity to the contribution proportional to y b out of the total signal, which is the primary motivation of looking at the bbh channel in this work. We can gather from the number of events with 6 ab −1 given in table 2, the sensitivity to the y 2 b -driven channel is only about 1.6σ after basic cuts. We will try to see if this can be improved by exploring the higher dimensional kinematic shapes using multivariate analysis.
The Zh, Z → bb channel has the distinct feature that the invariant mass of the two b-jets can be reconstructed to the Z boson mass. However, given the basic cuts, only about 20% of Zh channel has both b-jets tagged in our simulation. The fraction of events having two tagged b-jets is even smaller from other bbh channels, as shown in the last column of table 2. Given the limited signal statistics, especially at the HL-LHC, we do not require two tagged b-jets or stringent mass window cuts, to allow for more events from the y 2 b and y b y t channels. Instead we stay as inclusive as possible with generous cuts, and resort to kinematic shapes and multivariate analyses to further explore the variance in shapes of the higher dimensional distributions amongst the different channels.
As discussed before, the y 2 b -driven channel could be overwhelmed by the other y bindependent contributions such as Zh and those proportional to y 2 t . Despite the sizable contribution from these other bbh channels, and the similarity between the 1D distributions JHEP04(2021)139 of the kinematic observables, the Zh channel can still be separated from the y 2 b channel with relative ease given information from higher dimensional distributions. We will see a nice separation from the multivariate analysis, and understand the physics ramifications brought about from the higher dimensional kinematic distributions as well. The y 2 t contribution is a bit harder to disentangle, and remains as the dominant background which reduces sensitivity to y b . However, we will show systematic approaches to enhance sensitivity to contribution proportional to y 2 b or y b y t while suppressing y 2 t and all other background contamination.
After detector simulation and jet definition, we have, for most events, a final state of two photon jets and at least one b-jet, where the two photons reconstruct back to a real scalar Higgs mass for all the bbh channels. We first define and evaluate a comprehensive set of kinematic observables as the following: and η b/γ 1,2 are the p T and rapidity for the tagged leading and sub-leading b/γ-jets (in our definition the subleading b-jet could be a null 4-vector since we require one b-jet inclusive), n bj is the number of tagged and passed b-jets. Lastly, ∆R bγ 1 and ∆φ bγ 1 are the R-distance and the φ-angle between the leading b-jet and the photon jet. The remaining variables are the invariant masses, and H T is the scalar sum of the transverse mass of the system. We shall show in what follows, that it is not necessary to be very selective about the kinematic variables one chooses to use in the analysis. What is necessary is that all possibly useful kinematic variables are included. As can be seen from the list above, some of the variables seem to be interdependent and, probably, highly correlated. The beauty of using interpretable machine learning is that a hierarchy of importance for the variables will be built during the analysis using an over-complete basis of collider observables from which the most important ones can be chosen to fine tune the process.

Boosted decision trees and Shapley values
Having to assess and discriminate between multiple competing signals, we face a classification problem which requires an understanding of underlying correlations and the necessity to be able to identify kinematic distributions that will optimize the extraction of the signal while suppressing the background. Hence, in the implementation of our multivariate analysis we resort to using decision trees that are very efficient at addressing multi-class classification problems. We will work with the BDT algorithm implemented in XGBoost [33], a publicly available scalable end-to-end boosting system for decision trees. We follow the normal procedures for training and testing the BDT with simulated data. To assess the performance of XGBoost, we compare it to the BDT implementation in ROOT and see better performance with XGBoost in terms of both speed and accuracy. As a further validation JHEP04(2021)139 of our results, we implement a Deep Neural Network (DNN) in TensorFlow [34] to address the classification problem. We note that the performance of the DNN is comparable to and validates the BDT. For our final results we work with the BDT framework so that we can efficiently implement the analysis of variable importance with Shapley values as explained below.
The adoption of machine learning algorithms for theoretical or phenomenological particle physics analyses has been slower than for their experimental counterpart because of the perceived lack of interpretability of the models built by frameworks such as BDT and DNN to map variables onto outcomes. This has often led to a labeling of such algorithms as "black-box" tools. We would like to argue that this lack of interpretability is not an inherent property of machine learning algorithms and can be overcome with constructs such as the Shapley values. In conjunction with a concrete understanding of the kinematic distribution that can be attributed to the channels being studied, the use of Shapley values in interpreting machine learning algorithms can provide a far better understanding of the map between kinematic variables and how signal is extracted from dominant, and even irreducible, backgrounds than any simple cut-based analysis. In this work we show how the black-box of machine learning algorithms can be demystified leading us to an understanding of kinematic shapes in a truly multivariate manner.
In our work we choose to use the high-level kinematic variables listed in the previous section instead of low-level energy-momentum four vectors since the former provide a more concrete insight into the underlying dynamics. We test and confirm that working with high-level variables instead of low-level ones does not affect the accuracy of the classifier, be it a BDT or a DNN. One, of course, needs to build a over-complete set of high-level variables to make sure that none that is important is overlooked. With this setup we train the machine learning algorithm performing the necessary tuning to get optimal validation accuracy.
This procedure, however, provides very little insight into which kinematic distributions are paramount in discriminating between the different channels. To do this we introduce the Shapley values [22]. Formulated by Shapley in the mid-20 th century, this is a measure in game theory of how important a player is in a game of n players and a preset outcome defined as the objective of the game. A simple way of looking at Shapley values is to imagine what the outcome could have been without a particular player. Now, if one brings the player in the game, the outcome might be changed. The Shapley value provides a measure of the change of the outcome due to this player on an average. In essence, the Shapley value tells us how important the presence of a variable is in determining a certain class when compared to its absence from the multivariate problem being addressed. The process naturally and mathematically lends itself to studying the correlations between different variables since all possible combinations of variables can be taken out of the game to check the outcome. 5

JHEP04(2021)139
Typically, Shapley values are very expensive to compute for most problems. Hence, we use SHAP (SHapley Additive exPlanations) [36], which is based on Shapley values generated by the tree-explainer [37,38]. To determine the Shapley values, S v , a game is played with all possible combinations of variables as the players and with the payout being the difference of the predicted value of the outcome from the marginalized value. The S v for the i th variable in determining the value of the outcome is measured from the difference between a value function that includes the i th variable and the value function which is marginalized over the i th variable. For decision trees this translates into conditional probabilities that are determined by the end-nodes that can be reached in each game. Using the additive property of S v , the importance of the i th variable is determined by the ensemble average of the absolute Shapley values, |S v |. The higher the |S v | for a variable, the more important is the variable in determining the outcome. 6 To facilitate this, we use a BDT framework in our final analysis and not a DNN.
Let us assume that we perform a binary classification of a dependent variable, y, using a BDT (y ∈ [0, 1], y = f (x i ) and i = 1, . . . , n) with x being the vector of independent variables. The probability of belonging to the two classes is the output of a prediction of the BDT for each event. The S v , for a class for any particular event would just represent the shift to the marginalized probability for the class: and p(y = 0) + p(y = 1) = 1. (3.1) which is just the additive property of S v [22]. In a multi-class classification using a BDT with the XGBoost implementation, the output of the tree ensemble is log(odds), where odds is defined as:

The game of reducing the irreducible Zh background
As shown in ref. [21] one of the significant hurdles in measuring y b from bbh associated production is the Zh, Z → bb irreducible background. The goal of this section will be to show that this is not the case and that the Zh-driven channel and the contribution proportional to y b are indeed separable. This will also allow us to show how the BDT framework we use is interpreted using the Shapley values, S v , thus acting as a precursor to the detailed analysis with the QCD-QED background in the next section. 6 In fact, there are several other measures of variable importance used in machine learning like Gini or permutation based measures to name a few. However, these measures are known to suffer from inconsistencies. We checked the Gini measure of variable importance and the permutation measure and found reasonable agreement in the most important variables. We begin the analysis with the whole set of kinematic variables defined in section 3. After training the BDT we compute the average of the absolute Shapley values, |S v |, from a representative sample of events to assess the hierarchy of importance of the kinematic variables. From this we discard those kinematic variables that have very low |S v | since they have a very low impact in discriminating the two channels. We check to make sure that the accuracy of the BDT is not reduced by discarding these variables. This not only reduces the training time drastically, but also provides a clearer picture of the physics behind the kinematic distributions in terms of a smaller number of variables.
The BDT is freshly retrained with the reduced set of variables to produce a final hierarchy of |S v | which can be seen in the top left panel of figure 2. From the figure it is clear that the top five kinematic variables, p γγ T , m b 1 h , p b 1 t , m bb , and H T , are the most important ones in discriminating the two channels. We show the normalized distributions for these five kinematic variables corresponding to both the channels. It is clear that any one of those kinematic variables cannot provide a significant discrimination between the channels. Any attempt at a simple cut based analysis will clearly fail, especially since the JHEP04(2021)139 total cross-section of the channel proportional to y 2 b is lower than that of Zh production. In addition, we also notice that m b 1 h is an unlikely candidate as a discriminating variable since the 1D distributions for the two channels overlap significantly.

HL-LHC
The strength of a BDT algorithm lies in its ability to model shapes and correlations in multivariate spaces, and give us a linearized measure of the projections of these higher dimensional shapes. Therefore, to understand the hierarchical ordering of |S v |, we look at the correlation between the kinematic variables explicitly. The network plots on the left panel of figure 3 shows the correlation between the kinematic variables in the two channels. It can be clearly seen that the correlation structure of the two channels are quite distinct hinting at variation of the shape of the multivariate kinematic distribution. The heatmap in the right panel of figure 3 shows the difference between the correlation in the two channels. One can see that there is a misalignment of the correlation pattern for m b 1 h with several variables, and most importantly, with p γγ T . The m bbh variable also seems to be quite misaligned in the two channels from the heatmap, but it has a low |S v |. This can be understood by noting that m bbh is tightly correlated with m b 1 h in both the channels as seen from the network plots in figure 3 and hence, the misalignment of its correlation is inconsequential as evident from the low |S v | it gets.
To further understand the importance of kinematic shapes, especially in light of their relevance to the importance of a variable like m b 1 h , which would by itself be of no use in discriminating the two channels, we look at  and Zh contributions at HL-LHC. While 1D distributions show that m b1h by itself cannot be used to discriminate between these two contributions, its correlation with other kinematic variables creates shapes that a BDT can use to discriminate between the two contributions.
distributions of m b 1 h along with the other four of the five top variables in the |S v | ranking. It is evident from these density plots that even when m b 1 h does not seem like a variable worth mentioning from the 1D distributions, it becomes important in a shape based multivariate analysis. This aptly displays why a multivariate analysis performs much better in discriminating between kinematic shapes than a traditional cut-based analysis. In figure 5, we show the separation of the channel proportional to y 2 b and Zh in the signal probability space with the former channel being the signal and the latter channel the background. The possibility to separate the two channels is indisputable.
Similarly, we evaluate the separation between the contributions proportional to y 2 b and y 2 t . The analysis is shown in appendix A. The separation is not as good as the previous example. Nevertheless, judging by |S v |, we find that the leading distinguishing variable is  Figure 5. The clear discrimination between the channels proportional to y 2 b and the Zh channel from a multivariate BDT analysis with SM signal and 6 ab −1 total luminosity assumed at HL-LHC.
to y 2 b , the two bottom quarks tend to be back to back leading to a softer Higgs p T . The second most important variable is m bb where the contribution proportional to y 2 t peaks towards zero and the one proportional to y 2 b has a much larger m bb value for the small percentage of events where both b-jets are tagged. As will be shown in the next section where we discuss the details of all the background channels at HL-LHC and FCC-hh, the dominant QCD-QED bbγγ background and the irreducible contribution proportional to y 2 t will still be the limiting factors to achieving sensitivity to y b .
In what follows we will use the multivariate methods we discussed in this section to extract the bbh signal at the HL-LHC and FCC-hh. More importantly, we will study the possibility of isolating the contribution proportional to y 2 b and y b y t to show that its a viable enterprise at the future hadron colliders.

The bbh channel at future hadron colliders
Having sorted out the irreducible background, let us focus on the complete analysis including the QCD-QED background. We focus on all five channels in this section and consider the signal channels as those driven by y 2 b and y b y t terms since our goal is to probe y b . In addition, we include the background channels driven by contributions proportional to y 2 t , from Zh and bbγγ QCD-QED production. We train a multi-class BDT with simulated data using an over-complete set of high-level variables. Then, using |S v |, we filter out the variables that are the most important without compromising the accuracy of the BDT. Finally, using these variables, we retrain the BDT to project out the possibility of measuring y b at the HL-LHC with 6 ab −1 of total luminosity and at the FCC-hh with 30 ab −1 of luminosity. In addition, with the help of |S v | we point out the kinematic measurables that are the most important in these analyses to gain physics insights.
The interference term proportional to y b y t has to be dealt with considerable care. Being an interference term, it has a propensity for providing a negative contribution to the total cross-section as can be seen from table 2. This means that the weight of every JHEP04(2021)139 event needs to be kept track of throughout the analysis and correctly used when projecting the significance of measuring y b from this contribution. It must be kept in mind that the BDT is agnostic to the nature of physics encoded in the simulation and is only sensitive to the kinematic shapes. Therefore, classifying a signal with a negative cross-section is not a hurdle for the BDT but rather the prediction made after the training of the BDT must be convoluted with the correct weights.
The simplicity of a binary classification is lost in a multi-class BDT classification like the one we will now describe. The prediction of a BDT comes in the form of a n-tuple of probabilities of an event belonging to each of the n classes with the element of the n-tuple adding up to unity. An event is said to belong to the class corresponding the highest element in the n-tuple. So a plot like figure 5 is not very informative. To understand the results of the multi-class BDT analysis, we construct a confusion matrix. This is a n × n matrix, for n classes. The sum of the elements in the i th row, j N ij , gives the actual number of events of class i that would be generated in a pseudo-experiment corresponding to the collider under study. The sum of the j th column, i N ij , gives the number of events of the class j predicted (including correct classifications and misclassifications) by the BDT in this pseudo-experiment. Hence the (i, j) element of the matrix gives the number of events of the i th class that is classified as belonging to the j th class with i = j representing a misclassification. The significance of the j th channel given by signal/ √ signal + background can be defined as: where i is the row index and j is the column index. Armed with these procedures and definitions we move on to examine the prospects of measurement of y b at the future colliders.

Prospects at HL-LHC
The results for |S v | from the BDT analysis are shown in the top left corner plot of figure 6. Note that the ordering of the kinematic variables according to their importance in discriminating the channels have changed from the previous example in figure 2. This is not unexpected since the classes being discriminated determine what kinematic shapes are important. Secondly, unlike the |S v | for the case of Zh vs. y 2 b , different channels have different |S v | for each kinematic variable. From this, one can understand the importance of each kinematic variable in discriminating each channel from the others.
For example, m γγ has the largest discriminating power for separating out the bbγγ background (pink part) from the y 2 b -driven channel (red part) as can be seen from the |S v | plot in figure 6. The blue, purple and yellow parts corresponding to Zh, y 2 t -driven and y b y tdriven channels respectively are of the same length as the red part of the bar corresponding to the y 2 b -driven channel as they should be because they all have the same shape as is clear for the upper middle plot of figure 6. The distribution of bbγγ is flat in m γγ , while the others are peaked around m h since the γ pair comes from a Higgs decay. On the other hand the separation of the y 2 t -driven channel from the y 2 b -driven channel is orchestrated by H T , which is second in the hierarchy of kinematic variables as judged by |S v |. The matrix shown in table 4 shows the confusion matrix produced from a pseudoexperiment for HL-LHC assuming 6 ab −1 of data. The right-most column gives the actual number of events produced in each channel. The bottom-most row gives the significance of each channel. Note that this is purely a statistical significance and no systematics has been rolled into this estimate. We leave the discussion of systematics in section 4.3. Following these procedures, we get a resulting statistical significance of the y 2 b -driven channel of 3.33 σ. The sensitivity for y b y t channel is 0.47 σ. We shall see in section 5.1 how the projection from this analysis translates into bounds on the effective rescaling of y b .

Prospects at FCC-hh
Isolating the y b -sensitive channels at the FCC-hh is a much easier task not only due to the much larger luminosity but also because of the disproportionately larger growth of bbh production compared to Zh production as explained in section 2.2. However, keeping in mind that a multivariate analysis will still outperform a cut-based analysis, we use the combination of BDT and S v to fathom the prospects of measuring y b at FCC-hh. Since the importance of the irreducible Zh background falls, and the only other backgrounds that  need to be discriminated from are the contributions proportional to y 2 t and the QCD-QED bbγγ background, an analysis with |S v | rightly shows that m γγ is by far the most important kinematic variable in discriminating the y 2 b signal from the dominant background. The results of the BDT analysis is given in table 5. Noting that the significance quoted in the table pertains to those obtained from assuming the presence of statistical uncertainties only, we see that the contribution proportional to y 2 b can be isolated at about 64 σ significance while the interference term proportional to y b y t can be isolated to about 10 σ significance.

Discussion of theoretical and systematic uncertainties
As discussed above, compared to the Zh channel and the QCD-QED bbγγ backgrounds, the bbh channels gain a relative enhancement on statistics at the FCC-hh. This results in a much better sensitivity to the measurement of y b that scales roughly with the square-root of statistical gain. Till now, we have not included any experimental or theory systematic uncertainties, which might not be negligible at HL-LHC or FCC-hh. The current theory uncertainty from fixed order NLO calculation ranges from 20% to 50% for the channels proportional to y 2 b , y b y t , y 2 t [20], and are the dominant sources of uncertainty. No pro-

JHEP04(2021)139
systematics HL-LHC (6 ab −1 ) FCC-hh (30 ab −1 ) jection of the errors at future colliders is available, but, extrapolating from the HL-LHC analyses [9], we would expect that in the future similar or smaller theory uncertainty as compared with statistical uncertainty would be reached. As for the experimental systematic uncertainties, it is expected to be important especially when low statistical errors are reached, like at the FCC-hh. We thus include an approximate estimate of the significances including experimental systematic at the level of 0.5%, 1% and 5% stemming from the dominant background bbγγ. In practice we take the number of events classified as bbγγ in the confusion matrix, multiplied by the different systematic error estimate, and added in quadrature to the statistical error, the latter being the square-root of the total signal and background events. The results are shown in table 6. We see that even O(1%) experimental systematic uncertainty from the dominant background reduces the significance of the signal considerably. Furthermore, the errors in the bbh measurements are mostly statistics dominated at the HL-LHC but can easily get systematics dominated at the FCC-hh. Thus, a good control over the systematic uncertainties will be necessary to draw the full strength of the statistics allowed at the FCC-hh.

Constraints on the effective rescaling of y b
In this section we will take a look at the rescaling of the effective Higgs couplings to the bottom quark. To account for possible CP violation in the Yukawa sector, it is natural to go to the more general complex Yukawa assumption and constrain the 2D space of the complex Yukawa. We will address the bounds on the CP-even coupling, κ b , and its CP odd counterpart,κ b or equivalently, the magnitude, |κ b |, and the phase φ b of the complex coupling. The bounds we explore from bbh come indirectly from the combined contribution of CP-even and odd effects in the inclusive (interference) contribution. 7 Exclusive CP-odd observables give more direct and model-independent constraints on the CP violation for various couplings. 7 Its worth mentioning here that model parameters can also be extracted using the Fisher Information that sets a lower bound to the covariance matrix of the parameters according to the Fréchet-Cramér-Rao bound [39]. This concept has been revived recently in ref. [40] with later works using machine learning to learn the likelihood function. Since we build the machine learning classifier in a way that the y b (or κ b ) dependence can be isolated, a simple rescaling allows us to set bounds on the parameter within certain assumptions (cf. footnote 9).

JHEP04(2021)139
In the κ-framework, the indirect bounds are currently more stringent compared to bounds from exclusive measurements such as those for y t , when comparing bounds from inclusive Higgs rate with bounds from tth kinematic observable studies [41][42][43][44][45][46][47][48][49][50]. Exclusive CP measurements will benefit greatly from more statistics at HL-LHC and especially at FCC-hh. In comparison, the sensitivity to the CP-violating phase in y b (or strictly speaking, the relative phase between y b and g eff ggh ) is derived from the sensitivity to the y b y t channel in the bbh production process. It is in the same spirit as for the sensitivity to the CPphase of y t from inclusive th [51] production and the th + V [52] process, where sizeable and measurable interference between y t -and electroweak coupling g V -driven diagrams are expected at future colliders making it possible to probe the relative phase. Separately, there are exclusive studies of CP-sensitive observable in the tth channel, where the decay topology of the massive top quark provides additional information on the top quark spin. Hence, CP-sensitive observable can be constructed from the decay final states. The equivalent of these CP-sensitive observable are lacking in the bbh process since the decay topology is buried in the b-jet. Constructing observable which involve both jet-substructure and the rest of the event shape might be helpful and worth exploring in future works. Hence, an exclusive study of bbh with h → 4 , γγ to directly probe possible CP-violating effects in the process is left for a future work. Assuming NP effects in y b , we can interpret experimental constraints on the complex rescaling of y b including our bbh study. A global fit with a relevant set of EFT operators including corrections order by order, or interpretation of a motivated NP scenario is left for a future work too.

Constraints on a real bottom-quark Yukawa κ b
Let us first take a look at a real rescaling of y b . In the κ-framework, modifying only the Higgs coupling to the bottom quarks, we have 18 GeV, depending on the choice of scheme, is the bottom quark mass and v is the vacuum expectation value of the Higgs field normalized to 246 GeV. The rest of the couplings in the SM Lagrangian are assumed to remain unmodified. The results from the previous section can now be used to put bounds on κ b . The contribution proportional to y 2 b scales at κ 2 b , while the interference term scales with κ b and κ g , the effective coupling for gg → h. Here we need to assume that the additional gluon in the y 2 t -driven contribution is not too far off-shell and the gg → h is effectively energy independent. 8 This assumption also makes it independent of the center-of-mass energy. The rescaling of a general CP-violating gg → h coupling, κ g , as function of a CP-violating complex κ b is given in appendix B and applies for both HL-LHC and FCC-hh. From that, one can reduce to the real κ b case.
From the confusion matrices given in table 4 for HL-LHC and table 5 for FCC-hh we can get the number of signal and background events for both the y 2 b -driven part and the interference term. This can be translated into a significance with which a certain value of κ b can be excluded by assuming a SM signal injection. The significance is given by: for j = 1, 2 representing the y 2 b and y b y t channels respectively in the confusion matrix. The index i sums over all five channels with i and j being row and column indices respectively. The significance for y 2 b and y b y t and their statistical combination as functions of κ b are shown in figure 7. From the term proportional to κ 2 b there is a degeneracy in the determination of the sign of κ b . This degeneracy is broken in the determination of κ b from the interference term. Hence, combining the two channels lifts the degeneracy in the determination of κ b . While at the HL-LHC, the exclusion of the negative parameter space for κ b is not possible with high significance due to the limited sensitivity to the interference term, it is possible at FCC-hh with the negative solution being excluded at about 19σ. This translates to a 1σ confidence limits on κ b of [-0.99,-0.82] ∪ [0.84,1.14] at HL-LHC and [0.99,1.01] at FCC-hh. In comparison, the direct bound on |κ b | from h → bb is 2.2% at HL-LHC and 0.49% at FCC-hh. However, the sign of κ b cannot be resolved with h → bb measurements alone as discussed in discourse that follows.

Constraints on a 2D CP-violating bottom-quark Yukawa {κ b ,κ b } space
Extending our analysis to a CP-violating rescaling of y b with {κ b ,κ b }, we get an additional term in the SM Yukawa Lagrangian since in the SMκ b = 0: The CP violating phase is thus φ b = arctan(κ b )/(κ b ), with |κ b | = 1 and φ b = 0 being the SM prediction. All other couplings are fixed to their SM values in this analysis. The use of this simple {κ b ,κ b } framework has to be understood as a proxy to compare the sensitivity of the different channels and cannot be a replacement of a dedicated global

JHEP04(2021)139
analysis considering simultaneous deformations of the SM as expect in any realistic BSM scenario.
To translate the constraints from the studied bbh channels in this work, we replace κ 2 b in the real y b scenario discussed in eq. (5.2) by κ 2 b +κ 2 b . 9 The y 2 b -driven contribution scales as κ 2 b +κ 2 b when including the non-zero CP-odd component. On the other hand, the y b y tdriven amplitude scales linearly with κ b . From figure 1 it can be seen that the amplitudes proportional to y t generated by the top-quark loop, can also be generated by a bottomquark loop. This can be done by expressing κ g , the rescaling of the gg → h coupling, as a function of κ b . Hence, the interference term is proportional to Re[κ g ]κ b + Re[κ g ]κ b and is given by: Note that κ b andκ b are real by definition. Additional details about the functional form can be found in appendix B.
Since we shall attempt to draw a comparison between the constraints on the rescaling of y b from bbh production with other modes such as h → bb, gg → h and h → γγ, it is worthwhile to give a brief overview of the current status of h → bb measurements and the projected bounds from measurements of all the channels at HL-LHC and FCC-hh. The recently measured channel of h → bb by ATLAS [7] and CMS [8] reports bounds on the signal strengths of: 16 −0.15 , µ CMS h→bb = 1.04 ± 0.14 ± 0.14, (5.5) with the first and the second errors being statistical and systematic respectively. This translates to a bound on κ b of about 7% from the combined experiments. We will see that the decay, h → bb, will remain the most sensitive measurement for the absolute value of y b at the HL-LHC and FCC-hh. Additionally, indirect constraints from various inclusive Higgs production and decay measurements are important for the determination of a CP-violating Yukawa component, especially from the large gluon-fusion production and the di-photon decay of the Higgs. For completeness, we show in appendix B the dependence of κ g,γ andκ g,γ on κ b andκ b after integrating out the one-loop quark contribution with the external legs on-shell. With these assumptions the projected constraints on κ g and κ γ from the inclusive production and decay rate can be directly mapped to bounds on κ b .
The projected 1σ bounds at HL-LHC (ATLAS+CMS combined, 6 ab −1 , including systematics) for κ g (0.8%), κ γ (1.3%) and κ b (2.2%) are derived from the projected bounds on the production cross-section and decay branching ratios σ ggh (1.6%), BR(h → γγ) (2.6%) and BR(h → bb) (4.4%) from figures 28-29 of the HL-LHC projections' study [9], where no sources of CPV are assumed. The CP-phase contribution to the total rate comes 9 In principle, the κ b andκ b contribution are different at the amplitude level and thus produce different kinematic shapes, and this naive extrapolation should not hold. In practice, the threshold behavior in squared-amplitudes of the CP-even and odd contributions differ by a factor of h at threshold. The difference is thus suppressed by three orders of magnitude and negligible.

JHEP04(2021)139
in through |κ g,γ,b | 2 + |κ g,γ,b | 2 . This is true when the bounds on κ g,γ,b are dominated by inclusive Higgs measurements which have no additional Yukawa coupling or NP dependence other than the κ's being constrained.
For the bbh channel, we do not include systematics since no estimates are available other than the naive scaling we showed in section 4.3. It should be noted that while the error estimates for κ g , κ γ and κ b from h → bb are expected to be systematics dominated at HL-LHC, the error estimate for the bbh channel can be argued to be statistics dominated because of the low statistics for this channel at HL-LHC. This can also be seen from our naive estimate of systematics in table 6. We proceed under this assumption to perform the combined fit.
At FCC-hh, the expected sensitivity from a global κ-fit to κ g , κ γ and κ b are about 0.49%, 0.29% and 0.43%, estimated, for example in table 3 of ref. [53] including experimental and theory uncertainties. This is somewhat different from what we use as projections for HL-LHC since the numbers are taken from a global κ-framework fit for FCC-hh rather than from the fits to the individual channels. 10 Given that the aim of this section is to make a quantitative comparison between the channels widely advocated for constraining the size and phase for y b , vis-à-vis what can possibly be added by bbh, we feel this is a reasonable approach. As for HL-LHC, we do not include systematic errors for bbh measurement at FCC-hh for lack of clarity on such estimates.
Collecting all these numbers we perform a fit to all the constraints from HL-LHC and FCC-hh. In the top two panels of figure 8 we show the constraints from measurements at HL-LHC and FCC-hh of all the channels we just discussed. The weakest constraint, understandably, comes from h → γγ (pink region) since the fermionic loops generate subdominant contributions with the bulk of the contribution coming from the top loop. However, due to a constant shift in the amplitude dependent on y b , the h → γγ channel can break the degeneracy in the sign of κ b at FCC-hh. The same argument holds for gg → h production since the bottom-loop contribution is sub-leading and the constraint on κ b is shifted due to the larger constant term coming from the top-loop contribution. However, the constraints are much stronger for gg → h both at the HL-LHC and the FCC-hh (purple region) than for h → γγ. The most stringent constraint comes from the measurement of the branching fraction h → bb (red region). Being proportional to |κ b | 2 +|κ b | 2 , it does not allow for the determination of the CP-violating phase φ b . The constraints from the contributions proportional to y 2 b and y b y t are shown in blue and yellow respectively. They are not competitive with the constraint from h → bb at HL-LHC while being quite comparable at FCC-hh. Since there is a linear dependence of y b in the interference term, these two constraints are shifted from being centered at zero.
In the plots in the middle of figure 8 we show the constraints on the |κ b | and φ b parameter space from all measurements other than bbh associated production. These fits are done using a Bayesian MCMC implemented in PyMC3 [54]. The 1σ high-density intervals are given for each parameter in the corner plots.

JHEP04(2021)139
At the HL-LHC, h → bb and gg → h completely determine |κ b | and φ b as evident from the three left panels of figure 8. The contribution from bbh does not leave a mark. The picture is quite different for FCC-hh. Taking a careful look at the inset in the top right panel of figure 8, one can notice that the phase, φ b is constrained by an interplay between h → bb, and the bounds from the y b y t -and y 2 b -driven contributions of bbh. Indeed, it is the misalignment of the interference term that improves the bounds on the phase by about 15% over what is possible without bbh measurements as can be seen from comparing the middleright and the bottom-right corner plots. This clearly shows that a more comprehensive study of the bbh channel is necessary for reducing the theoretical errors and estimating the systematics for future colliders.

Bounds from EDM
We feel that a discussion on the current bounds from EDM onκ b is necessary since they are quite constraining. Recent update on electron EDM measurement from ACME [55] gives stringent bound on T -violating (or CP-violation assuming CPT symmetry) effects in the system. Hadronic EDMs such as those from neutron or mercury EDM measurements give complementary information on hadronic CP-violating effect and do not rely on the existence of leptonic Yukawa couplings. However, they usually suffer from sizable theory uncertainties. Recent discussions and review on EDM measurements and theory constraints can be found in refs. [56][57][58]. Future proposals to measure both leptonic and hadrnoic EDM [59][60][61][62][63] aim to improve these bounds by one or more orders of magnitude at these small scale experiments in the near future. When focusing on the CP-violating terms in the Yukawa sector, EDM measurements are interpreted as bounds on the CP-violating phase in the Yukawa couplings [64][65][66].
When not relying on assumption of a SM electron Yukawa, Neutron EDM currently gives the strongest bound on the CP-violating coupling,κ b 5. Assuming a SM electron Yukawa coupling, the most stringent bound onκ b comes from electron EDM withκ b 0.5 [65]. The constraints onκ b from the neutron (or other hadronic) EDM can also be diluted or evaded, if more than one Yukawa CP-phases from different quark flavors are present and cancel among themselves. In this case, studies of CP-violating observable or interference terms in the tth, bbh or h → τ τ processes at colliders would become valuable and provide complimentary information to pin down the individual Yukawa couplings and their CP-phases.

Summary
Gleaning tiny and obfuscated signals from dominating backgrounds has been the boon and the bane of particle physics, in both the theoretical and experimental realms. The evolution of methods used to do so has gone through several stages with multivariate analysis being the most evolved. The strength of multivariate analyses in the form of machine learning methods like decision trees and neural networks have dominated experimental analyses for quite a few years now. In this work we augment phenomenological analyses by bringing forth to it the strength of interpretable machine learning.

JHEP04(2021)139
The extraction of y 2 b from the associated production of the Higgs with a bb pair has been written off due to the presence of dominant irreducible backgrounds, especially from Zh production [21]. The arguments were based on the fact that Zh, Z → bb and the y 2 b -driven contribution have very similar kinematic signatures making it difficult to distinguish the weaker contribution proportional to y 2 b , from the irreducible background. In this work we show that, the dismissal of the measurability of the y 2 b -driven contribution in an attempt to isolate y b is premature. The use of multivariate analysis in the form of BDT can still allow for the separation of the hidden signal.
We go a step further and add interpretability to the machine learning method by appealing to a measure derived from game theory called the Shapley values. This facilitates us in working with high-level kinematic variables, instead of momenta four-vectors, and gives us insight into the hierarchy of importance of these kinematic variables. 11 This, in turn, allows us to narrow down and focus on those kinematic variables that are the most important, providing a way of understanding the physics underlying the possibilities of separating the y 2 b -driven contribution from the backgrounds. In summary, this work provides the following innovations and insights: • An irreducible background like Zh production can be tamed with multivariate methods that can be better understood with interpretable machine learning tools. Kinematic shapes are the key to distinguishing small signals that cut-based analyses miserably fail at.
• Machine learning algorithms do not need to be black-boxes [67]. While it is difficult to avoid this for several machine learning algorithms, much progress is being made to make machine learning interpretable. In this work we show that decision trees can be made interpretable and certain nuances of the distributions that they probe can be understood with metrics such as the Shapley values.
• At the FCC-hh, the associated production of bbh stands to gain as its production rate grows much faster than the Zh production rate and the dominant bbγγ background with rising energies. This can be exploited to get a good measurement of the magnitude and sign of y b from bbh production which can be comparable to that from a h → bb measurements.
• Since constraint from h → bb is very stringent on the magnitude of κ b and gg → h can fix the phase, φ b , in combination with h → bb, probing bbh does not add to these constraints significantly at HL-LHC ( figure 8). While at the FCC-hh, bbh production does not add much to the constraint on |κ b |, the constraint on φ b can gain by about 15% from bbh measurements at FCC-hh (figure 8) through the interplay between the y 2 b -and y b y t -driven contributions along with h → bb, resulting in the bbh channel proving to be more significant than the gg → h in constraining φ b . 11 Without the use of Shapley values as a variable importance measure it would not be possible to choose the minimal set of kinematic variable required to separate the signal from the background and we would be left with a very large set of kinematic variables to deal with that would drastically reduce the interpretability of the procedure.

JHEP04(2021)139
The emergence of machine learning as an effective tool-set to address regression and classification problems has led to a wave of applications in particle physics often at the expense of overshadowing the importance of the physics case. However, our objective lies in showing how interpretable machine learning can augment phenomenological analyses. The primary message of this work lies in the display of a method that can be instrumental in future phenomenological analyses focused on using kinematic shapes as the primary tool to probe for tiny signatures. In the precision era, this will be an important component that provides maximal statistical advantage along with a clear understanding of the underlying dynamics with interpretability. We clearly show this by resurrecting the possibility of measurement of y b from bbh production after tackling rather complex irreducible backgrounds. The robustness of our analysis brings the hope of extension of the methods developed in this work to extract difficult signatures in other phenomenological analyses.  Figure 10. BDT discrimination for separating the contributions proportional to y 2 b and y 2 t at the HL-LHC with 6 ab −1 of data. A SM signal is injected. the y 2 t -driven channel. Suffice it to say, that this further highlights the ability of the BDT to separate two channels with very similar kinematic distributions and the added insight provided by the Shapley values in understanding why that is possible.  Figure 11. Joint non-normalized 1D and 2D plots, with SM signal, representing how kinematic shapes allow the BDT to discriminate between the y 2 b and y 2 t contributions at HL-LHC. While 1D distributions show that m bbh by itself cannot be used to discriminate between these two contributions, its correlation with other kinematic variables creates shapes that a BDT can use to discriminate between the two contributions.

B Effective κ g and κ γ
Following the procedure in ref. [68], it is simpler to work out the constraints on κ b in terms of constraints on the effective ggh and γγh couplings which are defined as, The CP-even and CP-odd couplings are, Hence, κ g and κ γ , the rescaling of the effective ggh and γγh couplings, can be written as functions of κ b andκ b as, such that the modified inclusive gluon fusion rate has µ g = |κ g | 2 + |κ g | 2 . After plugging in the masses as defined in section 2.2, these gives the numerical relations, We have used these expressions in section 5 to set bounds on real and complex κ b .

C Deep neural network analysis
To construct the DNN [69,70] we used TensorFlow [34]. The following are the details of the fully-connected DNN architecture: required for training a BDT to similar accuracies with the early stopping coming into effect at about 500 epochs. The trained DNN was used for constructing the confusion matrix shown in table 7. The architecture described above was required for the five channel classification. To understand how the DNN scales with the number of channels, we tried a signal vs. background discrimination for y 2 b vs. Zh as described in section 3.2. To achieve accuracies comparable to the BDT implementation a DNN with 2 hidden layers with 16 nodes each was necessary. The rest of the architecture was the same as the larger DNN described above.
Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.