Learning to pinpoint effective operators at the LHC: a study of the $t\bar{t}b\bar{b}$ signature

In the context of the Standard Model effective field theory (SMEFT), we study the LHC sensitivity to four fermion operators involving heavy quarks by employing cross section measurements in the $t\bar{t}b\bar{b}$ final state. Starting from the measurement of total rates, we progressively exploit kinematical information and machine learning techniques to optimize the projected sensitivity at the end of Run III. Indeed, in final states with high multiplicity containing inter-correlated kinematical information, multi-variate methods provide a robust way of isolating the regions of phase space where the SMEFT contribution is enhanced. We also show that training for multiple output classes allows for the discrimination between operators mediating the production of tops in different helicity states. Our projected sensitivities not only constrain a host of new directions in the SMEFT parameter space but also improve on existing limits demonstrating that, on one hand, $t\bar{t}b\bar{b}$ production is an indispensable component in a future global fit for top quark interactions in the SMEFT, and on the other, multi-class machine learning algorithms can be a valuable tool for interpreting LHC data in this framework.


Introduction
The lack of evidence for signatures of new physics at the Large Hadron Collider (LHC) has led to an increased interest in the Standard Model Effective Field Theory (SMEFT) as a model-independent approach to interpret the existing experimental measurement in the context of physics Beyond the Standard Model (BSM). The main phenomenological consequences of the presence of SMEFT operators involve heightened energy dependence and modified kinematics in Standard Model (SM) processes. It is therefore important to go beyond inclusive measurements and access the full kinematical information available in a given final state. Machine learning classifiers are well suited to the task of discriminating between SM and SMEFT effects, particularly with increasing final state multiplicity and complexity in which a great deal of inter-correlated kinematical information is present. Furthermore, the possibility of multi-class discriminants, lends itself very well to the intrinsically large parameter space of the SMEFT in which many operators can contribute to a given final state. One of the aims of our study is to quantify the potential of these methods not only to optimally constrain SMEFT operators but also to distinguish operators amongst themselves in order to more accurately pinpoint the origin of an observed deviation in the parameter space.
We focus our investigation on SMEFT operators that contribute to top pair production in association with two b-jets. Indeed, the top-quark sector provides an interesting place to search for deviations from the SM, given the relatively large production rate of tops at the LHC. Additionally, the large mass of the top is often considered as a motivation to expect BSM physics to be connected to the top quark itself. The large Yukawa coupling of the top quark makes it an ideal probe of the Higgs sector and therefore the mechanism behind electroweak symmetry breaking.
Moreover, the production of top quark pairs with additional heavy-flavour jet activity is an active field of research for the CMS and ATLAS experiments and forms part of a rich top physics programme at the LHC. More precisely the production of two top quarks in association with two bottom quarks has recently gained traction as an important background for ttH (H → bb) analyses which have recently contributed to the discovery of this particular Higgs boson production mode [1,2]. ttbb production has therefore long been investigated by CMS at 8 TeV [3] and 13 TeV [4] as well as by the ATLAS experiment at 7 TeV [5] and 8 TeV [6]. These analyses have however not yet received a lot of attention in terms of BSM interpretations.
In this work we will present the possible reach of the ttbb process at 13 TeV centreof-mass energy to a set of four-heavy-quark EFT operators of dimension six. The process provides sensitivity to previously unconstrained directions in the SMEFT parameter space. We begin with a discussion on the sensitivity that can be achieved with the current inclusive 13 TeV CMS measurement [4] with 2.3 fb −1 , as well as a projection to 300 fb −1 . Afterwards, we estimate the improvement in sensitivity obtained when exploiting the kinematical information contained in this multi-body final state. Our analysis relies on the sizeable statistics that will be available in this process at the end of Run III. In particular, we explore the use of machine learning methods to improve the LHC reach. We show that these techniques optimally combine kinematical properties to select the region in phase space that is enriched in SMEFT contributions. Moreover, by exploiting a multiclass shallow neural network we can additionally distinguish amongst different classes of SMEFT operators. This leads to improved performance in the presence of more than one SMEFT contribution, by focusing on phase space regions preferred by each class of operators. Our results suggest that the methods we explore could be beneficial for generic SMEFT interpretation beyond the final state that we consider.
The paper is structured as follows: In Section 2, we identify the relevant four-fermion operators involving heavy quarks that contribute to ttbb and discuss the complementarity provided with respect to four top production. We further discuss the validity/perturbativity of the EFT expansion concerning this process by considering some explicit power-counting schemes and ultra-violet completions. In Section 3 we outline the analysis strategy, including details on the sample generation, detector simulation, event selection and the statistical procedure used when deriving limits on the Wilson coefficients. In Section 4 the current CMS result will be used to constrain the EFT operators one at a time based purely on their effect on the cross section in the fiducial detector volume. Also prospects for an integrated luminosity of 300 fb −1 will be presented. In Section 5 we go beyond inclusive level by cutting on a single observable to enhance the EFT contributions. Section 6 is devoted to exploring the use of machine learning algorithms to search for an EFT enriched phase space. Using the latter approach we also demonstrate the benefits of using multi-class neural networks to discriminate between different classes of operators. Also the possible advantages of template fitting methods on the network outputs are demonstrated. Finally, we summarize the constraints as well as the prospects in Section 7 and formulate a conclusion.
2 ttbb in the SMEFT and its virtues 2

.1 Four-fermion operators for ttbb
In the construction of an EFT one typically extends the SM Lagrangian with operators of dimension larger than four [7][8][9]. Since dimension five operators only generate baryon or lepton number violating couplings, the first extension happens with the addition of dimension six effective operators that are suppressed by the square of an energy scale Λ, as expressed in Equation (1) where C i is the Wilson coefficient corresponding to the EFT operator O i .
Assuming flavor universality, a total of 59 independent operators are present [10].
The most important feature that makes the ttbb process different than many others, is its capability of exploring new contact interactions among the third-generation quarks. The study of the corresponding operators is well motivated in non-flavour-universal scenarios, where couplings to the third generation could be enhanced. Randall-Sundrum models of a warped extra dimension could be one example, see e.g. Refs. [11,12]. It is also natural in models addressing naturalness of the electroweak symmetry breaking scale where the third generation typically plays a special role (see e.g. [13] for composite Higgs models). In the SMEFT approach, to focus on this class of operators in a model-independent way, it is sensible to introduce some flavor assumptions such that we can single out the coefficients of operators involving the t-and b-quarks.
Inspired by Minimal Flavor Violation [14], we impose a U (2) q × U (2) u × U (2) d flavor symmetry in the light quark sector. Accordingly, four-quark operators composed of vector currents break down into three sub-classes involving either four-light, two-heavy two-light or four-heavy quarks each with an independent Wilson coefficient and a U (2) flavour symmetry among the first two generations where present (see [15] for a comprehensive review). This also permits scalar current operators only among the third generation quarks. In this work we focus on the operators in the four-heavy class. The primary reason for this is that the two-heavy two-light operators contribute to tt and bb production via the qq initial state. Precise measurements of top pair production already constrain the qqtt operators quite well [16] and some additional sensitivity is also gained by including four-top measurements [17]. Consequently, we do not expect to gain further information from ttbb. The qqbb operators can be constrained by differential dijet cross section measurements in, e.g., [18]. Such analyses do not make use of any b-tagging information and are therefore completely blind to jet flavour. From this analysis, a bound of the order 0.03 TeV −2 can be inferred on this kind of operator which lies well beyond the expected sensitivity of our study.
There are 12 independent 4-fermion operators involving only heavy quarks: Q represents the left-handed SU(2) doublet of third generation quarks (top and bottom), t and b represent the right-handed top and bottom quarks, T A denotes the SU (3) generators and ε is the totally antisymmetric Levi-Civita tensor in SU (2)-space. We additionally specify whether each operator contains ttbb and tttt interactions. It should be noted that a subset of the color singlet operators appearing in Eq. (2a)-(2l) have been indirectly constrained through RG induced contributions to electroweak precision observables [19]. Our study presents the first direct constraints on the full set of four heavy quark operators containing ttbb interactions. The Wilson coefficient corresponding to each of these operators as they appear in the Lagrangian will be denoted by replacing the O in the name of the operator by a C. We absorb the 1/Λ 2 factor into the definition of the Wilson coefficients and assume Λ = 1 TeV throughout this work. The dependence of the ttbb cross section on the Wilson coefficients, in general, forms a 10-dimensional quadratic function in this Wilson coefficient space Both the SM and EFT contributions to this process are predominantly mediated by the gg → ttbb subprocess. The dominant Feynman diagrams involving a single insertion of the EFT vertex are shown in Figure 1.  Figure 1: Dominant EFT contributions to ttbb production.

Complementarity to four top production
Out of the 12 operators in Equations (2), those that contain a tttt component can also be constrained by four top production processes, for example in References [17,20].
where the first term in O 8 QQ has a color-singlet structure because it has been Fierzed. As a result, in four-top production only one combination of the two operator coefficients is probed. In contrast, in ttbb production both degrees of freedom are probed independently, because the ttbb terms in O 1 QQ and in O 8 QQ have different color structures. This lifts the flat direction C 8 QQ = −3C 1 QQ in the C 1 QQ -C 8 QQ plane, left from the four-top measurements. The underlying reason is that, while the color singlet and octet structures for a t L t L t L t L interaction term are equivalent due to Fierz identity, it is not the case for   [17] from 4-top production. These are derived assuming an upper limit of the signal strength, µ < 1.87 is obtainable at the LHC with 300 fb −1 , as estimated in Ref. [21]. The limits are reported as a function of an upper bound on the total invariant mass of the events, M cut . The last column compares these intervals to best projections from ttbb production obtained in our work.
interaction. The projected individual LHC sensitivities with 300 fb −1 from four-top production on the operators it shares with ttbb, translated from Ref. [17] are summarised in Table 1. The final column represents the best sensitivities obtained from our ttbb study for comparison. One of the interesting points of this study will be to compare the sensitivity of ttbb to the existing and future limits from four top. In this context, one major difference between the two processes is the comparative rarity of four top production. In 13 TeV pp collisions, its cross section is of order 9 fb, compared to the ∼3 pb prediction for ttbb. The limited statistics of four top measurements at the LHC will most likely mean that it will only ever be measured at inclusive level for the foreseeable future. ttbb production does not suffer from this and the methods developed in this paper are designed to exploit the sufficiently large statistics present in 300 fb −1 of integrated luminosity to enhance the relative sensitivity to the EFT parameter space. As mentioned, operators (2c), (2d), (2i)-(2l) have never been directly constrained before and we obtain sensitivity to them in the ttbb topology. In summary, the EFT interpretation of ttbb measurements at the LHC presents the following advantages: • A sufficiently large inclusive cross section that allows for the use of differential information after 300 fb −1 of integrated luminosity.
• It directly constrains 6 four heavy quark operators for the first time.
• It breaks the degeneracy in a blind direction of the parameter space with respect to four top measurements.

EFT validity, power-counting and UV connection
The most basic requirement to satisfy when interpreting a particular measurement in the SMEFT framework is to ensure that one is probing scales below the cutoff of the theory. Above this scale, one expects resonant physics to appear which cannot be captured by the EFT description. For this reason, we introduce a new mass scale M cut and impose that all energy scales that can be associated to the production of ttbb be less than this value. Moreover we require that this new scale is smaller than the scale of new physics Λ N P , i.e.
This will be the validity criterion of the EFT. However, from the low-energy perspective one does not have access to the information on the precise scale of the new resonances. Equivalently, the scale of new physics is degenerate with the value of Wilson coefficient, as a measurement can only put limits on C, whose definition has absorbed a factor of the cutoff. Hence, in order to make quantitative statements on the EFT validity, one has to make some assumptions on the power counting rule of SMEFT [22]. Under certain conditions, this power counting can also be used to justify the truncation of the EFT expansion at dim-6, as we will discuss. From now on we assume that there is one single BSM coupling denoted g * and that the EFT operators can be expanded in terms of the following building blocks From this power-counting prescription, we see that the four-fermion operator coefficients are of order C = g 2 * /Λ 2 N P . The validity of the EFT description in Eq. (7) can then be rewritten as |C i |M 2 cut g 2 * , which is minimally conservative when g * takes its largest value g * ∼ 4π, i.e.
In this limit, such condition is equivalent to the model independent requirement of quantum perturbativity in the EFT [15]. The latter is examined through the contributions involving more and more operator insertions with higher and higher numbers of loops. The convergence of the series requires that C i M 2 cut /(4π) 2 be less than a constant, which should be roughly of order one. In the following sections we will make use of condition (9) to identify an appropriate value for the upper bound on the energy scale of the process M cut . The experimental sensitivities on C i that we will eventually find in our analysis (see Figure 15) are more stringent than (9) and hence will be in a valid regime assuming the simple power counting of (8). On the other hand, they will typically correspond to strong coupling values for g * .
With the power counting introduced in Eq. (8), one can also investigate the relative size of interference and quadratic terms in dim-6 and dim-8 SMEFT operators 1 . In particular, we will argue that the EFT truncation at dimension six can be consistently performed even if g * is large enough to compensate for the energy suppression M cut /Λ N P . Indeed, analyzing the EFT contributions to the relevant process at the amplitude level, one observes that the EFT dim-6 quadratic contributions can be as large as the dim-6 interference with the SM, while any further contribution at and beyond dim-8 can be neglected.
Concretely, using the power-counting prescription in Eq. (8) we estimate that the dim-6 interference and quadratic terms in gg → ttbb production are dim-6 interference: dim-6 quadratic term: where E is the largest energy scale characterizing the process and can be at most E ∼ M cut in our analysis. The two terms in (10) and (11) could have similar size, and the quadratic terms could even dominate over the interference, if g * is large enough such that (g * /g s ) 2 E 2 /Λ 2 N P 1. This is actually the relevant regime for the typical constraints that we will find in this work on the Wilson coefficients (see again Figure 15), meaning that both terms should be included.
On the other hand, the dim-8 interference is subleading, even though it is suppressed by the same power of Λ N P as the dim-6 squared terms. A dim-8 four-fermion operator would have the schematic form f f f f D 2 , and, according Eq. (8), a coefficient of order g 2 * /Λ 4 N P , which is not enhanced by higher powers of g * . This gives a contribution of order dim-8 interference: which is subleading compared to the dim-6 interference in Eq. (10), as far as M cut is below the scale Λ N P , i.e. the validity criterion. The dim-8 interference contribution is also definitely subleading with respect to the dim-6 quadratic contributions (11) as soon as g * > g s . Apart from four-fermion operators, a general dim-8 operator could involve more fields and thus have more powers of g * in its coefficient. For the process of interest, the relevant operators are those that lead to contact gttbb and ggttbb interactions, and should have the schematic forms f f f f G µν and f f f f D µ D ν . Note that in Eq. (8), the coupling that comes with G µν is g s , instead of g * . This is of course a model-dependent assumption, but seems natural, as the coupling between a gauge boson and a BSM particle is likely to be its own gauge coupling. Based on this assumption, the coefficients of the operators f f f f G µν and f f f f D µ D ν are of the order g 2 * g s /Λ 4 N P and g 2 * /Λ 4 N P respectively, and thus their interference contributions to the gg → ttbb amplitude, from either gttbb or ggttbb vertices, are the same as Eq. (12), and also subleading to Eq. (10), again if the validity criterion is satisfied.
In summary, we assume that the operators obey the power-counting depicted in Eq. (8), apply the analysis cut M cut < Λ N P to ensure EFT validity, and truncate the EFT expansion at dim-6 at the amplitude level, which amounts to include the dim-6 quadratic contribution while neglecting dim-8 operators and beyond. This corresponds to including the contributions with sizes given in Eqs. (10) and (11), and neglecting any additional contributions suppressed by E 2 /Λ 2 N P < M 2 cut /Λ 2 N P < 1. In Appendix A we give a concrete BSM example with a strongly coupled new particle, and derive its relevant operators in gg → ttbb, to illustrate that the above power counting assumption is satisfied and that our strategy would indeed capture the dominant BSM contributions.

Analysis strategy Simulation
We begin by describing our signal sample generation for the ttbb process that will be used through our sensitivity study. We obtain our signal samples using the Universal Feyn-Rules Output (UFO) model dim6top [15] that includes both the SM and the four heavy quark operators of Equations (2a)-(2l). We also validate our generation with an independent implementation of the same operators using the FeynRules package [25]. The ttbb final state in which both top quarks decay leptonically, is simulated at LO in the four-flavour scheme 2 from proton-proton collisions at 13 TeV center-of-mass energy us-ing MadGraph5 aMC@NLO 2.6.0 [27] (MG5 aMC@NLO). The so-called "visible" phase space as quoted in the CMS measurement of the ttbb cross section is mimicked as closely as possible by requiring the two charged leptons (electrons or muons) to have transverse momentum (p T ) > 20 GeV and pseudorapidity (η) between −2.4 and 2.4, and the four particle-level b-jets to satisfy p T > 20 GeV and |η| < 2.5. The angular separation 3 in ∆R between different jets or between jets and leptons is required to be larger than 0.5. Where necessary, parton shower/hadronisation is simulated with Pythia8 [28] and object reconstruction is modelled with the Delphes [29] detector simulation software, using the default CMS card.

Event reconstruction
The parton level, "visible" phase space prediction will be compared to the CMS inclusive measurement and future prospects for 300 fb −1 will be estimated. We will then progressively refine the selection procedure, assuming 300 fb −1 of LHC data, in order to increase the sensitivity to the operators. This however implies that one has to step away from the unfolded cross section to the fiducial detector volume, and instead impose further selection requirements on the reconstructed objects. We impose an event selection following as closely as possible the CMS analysis [4]. Each event must have two reconstructed, isolated leptons (electrons or muons) with p T > 20 GeV and |η| < 2.4, which are arbitrarily assigned the labels 1 and 2 . Missing transverse energy has to be larger than 30 GeV. At least four jets must be present with p T > 30 GeV and |η| < 2.5, of which at least two are b-tagged.
In the CMS analysis the jets with the highest b-tagging discriminator are identified as the b-jets from the top quark decay. This information is however not available in Delphes which merely parametrises the b-tagging efficiency. Instead, out of the four highest-p T jets in the event, the one closest in ∆R to 1 is assigned the label b 1 and is considered to be the b-jet associated to the top quark decaying into 1 and b 1 and the same association is applied to identify b 2 associated to 2 . Finally the two remaining jets are ordered by decreasing p T and then assigned labels add 1 and add 2 . Although this assignment is not perfect, one can expect that the higher p T b-jets from this process (typically around 100 GeV) are those most likely to be b-tagged efficiently and therefore return a higher b-tag discriminant. This is reflected in the b-tagging efficiency parametrisations of Reference [30] which show an increase with p T up to around 300 GeV. We assume here that this procedure can be improved by the end Run III of the LHC.
is able to select a pure ttjj sample, from which the ttbb component can be extracted using the b-tag discriminant information as in the original inclusive measurement. This relies on the expectation that both the presence of the EFT operators and the restriction of the selected phase space do not significantly affect the b-tagging discriminants of the additional jets. The verification of this assumption requires detailed detector simulation and lies well beyond the scope of our study.
We obtain a reconstruction and selection efficiency from Delphes that is roughly a factor of two smaller than the one quoted by CMS. This is mostly due to the parametrised lepton reconstruction and isolation requirements as well as the jet reconstruction. It should be noted that the definition of the visible phase space by CMS includes the presence of at least four particle-level jets (clustered from generated particles rather than reconstructed objects), whereas our fiducial phase space prediction does not include parton-shower and jet clustering effects. Nevertheless, this should not affect the results of our analysis as long as the acceptance and efficiency of the event selection are the same for SM and EFT contributions. It has indeed been checked that these are the same up to Monte Carlo statistical uncertainties. We therefore identify the parton-level predictions with the visible phase space measurement of CMS and safely use the outlined event selection without biasing the results of this study.

Sensitivity analysis
Having obtained reconstructed samples that should be similar to those obtained by current and future analyses in this final state, we proceed to estimate the sensitivity of various selection methods to the Wilson coefficients, one at a time. We first consider a cut on the invariant mass of the 4 b-jets in the final state (M 4b ), which should access the energy growth of the EFT contributions. Next, we construct a multi-class discriminant using a shallow neural network, trained to identify classes corresponding to the SM point, left-handed top EFT operators and right-handed top EFT operators. The discriminant should draw from the full 20-dimensional phase space of the 8-body final state and learn to distinguish samples with different top helicities in the final state through the angular correlations among the top decay products. The sensitivities are first evaluated by requiring a lower threshold on the value of the discriminant. Additionally we also evaluate the sensitivity by performing a template fit to the full discriminant distribution. Finally, we compare the constraining power of the different methods in a two dimensional parameter space spanned by a pair of left and right handed operators.
The evaluation of the sensitivity generally proceeds via the same general method. We first construct the functional dependence of a observable O on each Wilson coefficient in Equation (2), one at a time, according to where O f it is the total observed value of the observable, O SM is the SM prediction, C i is the value of the Wilson coefficient and p i (i ∈ [1,2]) are parameters to be determined. p 1 signifies the fractional importance of the interference of the EFT with the SM and p 2 represents the fractional EFT squared contribution to the observable at quadratic order in the Wilson coefficient of the EFT operator. The observable, O, may be a cross-section or a number of events given a certain integrated luminosity observed in a signal region or extracted from a template fit. Taking the experimentally measured value or assuming the SM prediction is observed in future projections and combining statistical and estimated systematic uncertainties in quadrature, we construct a ∆χ 2 where O f it and O obs are the predicted and observed observables, δO is the uncertainty on the observable and χ 2 min is the minimum value of the χ 2 function in the EFT parameter space. Typically the uncertainty is composed of statistical and systematic uncertainties at the LHC as well as some MC statistical uncertainties but not theory uncertainties. The 95% CL sensitivity interval on the individual Wilson coefficients C i is then determined by the region in which the χ 2 value is lower than 3.84, corresponding to a p-value of 0.05 for a χ 2 distribution with 1 degree of freedom in the Gaussian limit. Only in Section 6.3, where two EFT operators are allowed with non-zero Wilson coefficients simultaneously, the number of degrees of freedom is augmented to 2, with a corresponding threshold of 5.991 for the same p-value.

Cross section in the fiducial detector volume
The values for the coefficients of the fitted visible cross section for all the operators can be found in Figure 2. One immediately observes a trend between color singlet operators and color octet operators for the values of p 1 and p 2 . As expected, the singlet operators have comparatively small interference with the SM and their contribution to the cross section i ). In this notation σ SM represents the SM cross section, p 1 signifies the fractional importance of the interference of the EFT with the SM and p 2 represents the fractional pure EFT contribution to the cross section at quadratic order in the coupling strength of the EFT operator.
is dominated by the squared order in the Wilson coefficient. We observe a preferential interference of the singlets operators with opposite top and bottom helicities (O 1 Qb and O 1 Qt ) while those that mediate same-helicity ttbb configurations (O 1 QQ and O 1 tb ) are suppressed. The color octet operators, however, clearly have a stronger interference with the SM because the SM processes leading to a ttbb final state are dominantly mediated by QCD. Their quadratic contribution to the cross section is smaller compared to the color singlet operators, which can be explained by a relative color factor of 2/9 in the EFT vertex, consistent with the observed values of p 2 between the two types of operators. This factor does not apply to the scalar current operators, due to the fact that they contain both bttb and ttbb whose interference contribution will have a different color factor. The interference of these operators is m b suppressed. In any case, the interference terms only dominate the squared term for color octet operators when the Wilson coefficients are below roughly 3.3 [TeV −2 ], which is below the sensitivity that can be achieved with this measurement, meaning that the squared order contributions dominate the limits in all cases.
The measured ttbb cross section from proton-proton collisions at 13 TeV by CMS is found to be σ ttbb,CM S = 0.088 ± 0.012 (stat.) ± 0.029 (syst.) pb. The LO computation of the SM ttbb cross section with MG5 aMC@NLO in the visible phase space defined above  yields a value of 78 fb. This is comparable to the NLO prediction with Powheg [31][32][33] of 70 ± 9 fb, as quoted in the CMS measurement and within the uncertainties of the CMS measurement. The total uncertainty is obtained by adding the statistical and systematical uncertainty in quadrature and is taken to be δ ttbb,CM S = 0.031 pb (or 35%). Indicative results for the sensitivity to C 1 Qb and C 8 Qb are shown in Figure 3, where in the top panel the red band shows the fitted cross section to the sample points with uncertainties (the fitted function is also quoted in red on top of the Figure). The light brown band represents the CMS measurement with uncertainties. In the bottom panel the full line is the resulting χ 2 as a function of the Wilson coefficient and the light brown band shows the 95% CL interval for the corresponding Wilson coefficient. The minima of the χ 2 are not centred at 0, indicating the fact that the cross section obtained by the calculation of MG5 aMC@NLO at leading order is slightly below the measured value by CMS, but still well within the uncertainty of the measurement.
Overall, both the linear and quadratic EFT contributions to the cross section stay below the percent level. We obtain limits of around [-14,14] (TeV −2 ) and [-30,26] (TeV −2 ) for the example color singlet and octet operators respectively. To assess the sensitivity in the future when 300 fb −1 of integrated luminosity is available, we assume a total uncertainty on the measured cross section of 10% and assume that the SM prediction is observed. We deem this to be a realistic degree of improvement based on the fact that the current systematic uncertainty is dominated by the b-tagging scale factors. These uncertainties have already shown a multi-fold improvement between b-tagging studies performed with 2.6 fb −1 [30] and 36.1 fb −1 [34]. The next most important source of uncertainty comes from theoretical modelling and is of the order of 17%. Given the importance of this final state in the context of Higgs physics it is reasonable to expect that these uncertainties will be significantly reduced by the end of Run III. The projected sensitivities are shown with the dark brown bands and the dotted lines in Figure 3, improving the limits to around [-6,6] (TeV −2 ) and [-14,11] (TeV −2 ) respectively. The limits on all Wilson coefficients are summarized in the left panel of Figure 15 (black and red lines for 2.3 fb −1 and 300 fb −1 respectively) in the final section. Now we address whether or not these limits (with the current sensitivity or the prospects for the future) are within the perturbative regime. As discussed in Section 2, we apply a cut on all energy (or mass) scales that appear in the events 4 . Lowering the value of M cut enlarges the range of new physics mass scales compatible with EFT validity criterion at the price of a reduced sensitivity. This is illustrated in Figure 4, where the non-valid region defined by Eq. (9) is indicated by the light pink shaded area as a function of the value of M cut , together with the limits on the Wilson coefficient C 1 Qb (left) and C 8 Qb (right), both for the measurement at 2.3 fb −1 (full black line) and for the prospects at 300 fb −1 (dashed black line). The limits are almost insensitive to value of M cut down to 1.5 TeV and we can therefore safely fix it to 2 TeV throughout the rest of this study. This way, almost no sensitivity is lost and the resulting limits on the Wilson coefficients (even for the color octet operators) stay well within the perturbative regime. The dark pink shaded region indicates a more stringent perturbativity requirement on |C i |M 2 cut (4π) 2 than in Eq. (9). The specific threshold has been chosen such that the edge of the new valid region intersects the projected upper limit for M cut =2 TeV (at 300 fb −1 ). This provides a conservative estimate of the perturbative uncertainty of the EFT predictions at the edge of our sensitivity. Finally in Figure 5, the normalised distributions of the scalar sum of the transverse momentum of all visible objects, H T , in the final state is shown comparing the SM contributions (black), with those of the O 1 Qb operator with C 1 Qb fixed at 10 TeV −2 (blue) and 20 TeV −2 (red) respectively. This is a representative variable for the typical energy scale of the ttbb events and indeed we see that only a small fraction of the events are present above H T = 2 TeV.   Figure 4: Limits at 95% CL on C 1 Qb (left) and C 8 Qb (right) as a function of the mass cut M cut for an integrated luminosity of 2.3 fb −1 (full line) and projections to 300 fb −1 (dashed line). The non-perturbative regime of the EFT in which |C i |M 2 cut > (4π) 2 is indicated with the light pink shaded region. The darker red region represents a more stringent perturbativity requirement for which the upper limit on the Wilson coefficient (at 300 fb −1 ) intersects the perturbativity threshold at M cut = 2 TeV.

Tailoring the kinematical phase space
In order to optimise the sensitivity of our process to the operators of interest, we go beyond inclusive level and consider observables with and enhanced dependence on the presence of EFT operators. A first step is to select a part of the phase space in which the EFT contributions are more abundant relative to the SM ones, as first proposed for this process in Ref. [35]. After the full event selection outlined in Section 3 is applied, we define a set of reconstructed variables and identify those that show a clear difference in shape between the SM and EFT operators. Different such quantities were tested, including the transverse momenta, invariant masses and ∆R separation between final state objects. These variables are summarised in Table 3 of Appendix B. The invariant mass of the 4 b-jets in the final state (M 4b ) was found to be one of the most discriminating variables and is illustrated in Figure 6, comparing the shape of the SM (black) prediction to that of the O 1 Qb operator with the Wilson coefficient fixed at 10 TeV −2 (blue) and 20 TeV −2 (red). Note that the the tail of this distribution beyond 2 TeV is never included in our analysis due to the validity requirement on M cut . We define a signal cross section by applying the selection M 4b > M sel 4b = 1.1 TeV, chosen to maximise the sensitivity to the Wilson coefficients, as shown in Figure 7 for C 1 Qb and C 8 Qb .
[TeV] After the reconstruction and the application of the above mentioned event selections (including M 4b > 1.1 TeV), we determine the functional dependence of the effective cross section on the value of the Wilson coefficients and the resulting 95% CL interval on those coefficients. This is illustrated again for C 1 Qb and C 8 Qb in Figure 8, from which it can be    Figure 15 (blue) on the right in the final Section. The sensitivity improves by around a factor of two compared to the unfolded cross section interpretation.

Learning the effective operators
Instead of selecting a favourable part of the phase space based on one variable, one can use machine learning algorithms to optimally select a part of this higher-dimensional phase space. In this work we will demonstrate how a simple neural network (NN) can combine the information from a set of kinematical properties of the final states to separate SM events from those including an insertion of an EFT operator 5 . In the last part we will also demonstrate that by using multi-class outputs of the neural network, we are additionally able to distinguish amongst different classes of operators. This will be shown to be especially beneficial in cases where more than one Wilson coefficient is switched on.

Neural network design
To illustrate the method, we defined a set of 18 kinematical variables, consisting of transverse momenta of the final-state particles, invariant masses (m inv ) of combinations of two, four or even all six of these final-state particles and angular separations in ∆R between combinations of two particles. The list of variables can be found in Table 3 of Appendix B, and includes the invariant mass of the four b-jets used in Section 5. These are fed as input to a shallow neural network with one hidden layer, containing 50 neurons and 3 output classes. The outputs represent the probabilities (P) of an event belonging to one of the following three categories: a Standard Model event (SM), an event from an EFT operator with a left-handed top quark (t L ) current and an event from an EFT operator with a right-handed top quark (t R ) current. This means that the training is performed only on the squared order contributions from the EFT operators, neglecting possible interference effects at this stage. The full parametric dependence is, of course, included in the LHC signal samples on which the discriminant is evaluated. A proper treatment of the interference during training would ideally require a parametrized learning approach as the signal shapes would no longer be independent of the value of the Wilson coefficient. We leave for future work this interesting possibility which may improve sensitivity to certain regions of parameter space. The choice of splitting the EFT output class into two separate contributions is motivated by the fact that we expect to see differences between the kine-matics of the decay products of left-handed and right-handed top quarks. For example, the W bosons from right-handed top quark decays give a harder leptonic p T spectrum compared to those from left-handed top quarks. The complete set of distributions of the input variables, comparing the three categories are shown in Appendix B and suggest that a considerable amount of information is present that could be used to distinguish them. This will allow us to demonstrate that the network can not only identify events including an insertion of an EFT operator, but can additionally identify the nature of the EFT operator itself. The neural network was implemented with the Keras [36] software using the TensorFlow [37] backend. For more information on the neural network architecture and training, the reader is referred to Appendix B.
From the combination of the three outputs of the network, different observables can be constructed, each targeting a specific discrimination between two categories. The different options used in this work are summarized in Table 2. In case only one operator is considered at a time, as will be discussed in Section 6.2, the combined NN output is constructed to optimally separate SM events from EFT operators in its category (upper two rows of Table 2). However, when more than one operator is allowed to vary simultaneously, and contributions from both the t L and t R categories are present as will be discussed in Section 6.3, a combination of two observables is used (bottom rows of Table 2). By adding the output probabilities of the left-handed and right-handed top quark EFT outputs (P (t L ) + P (t R )) one obtains a good discrimination between SM events and EFT events in general. This is illustrated in the top of Figure 9, where on the left the normalized distributions of this combined output are shown for SM events in red and for events with a single insertion of an EFT operator in black. The corresponding receiver operating curve (ROC) is shown on the top right, showing on the x-axis the efficiency of selecting events with an insertion of an EFT operator and on the y-axis the selection efficiency for selecting a pure SM event.
Similarly, an observable can be constructed to distinguish between the second and third category, namely between events from operators containing t L or t R currents. This variable is defined as P (t L )+P (t R ) and is displayed in Figure 9 on the bottom left, together with the ROC curve on the bottom right. It can be clearly seen that the network has learned to differentiate between these two classes. We will use this distinction further along in Section 6.3 to illustrate a method which improves sensitivity when two Wilson coefficients are allowed to be non-zero at a time.

Desired Discrimination
Combined NN Output used for limits

Network predictions for individual operators
Selecting on the NN output Constraints on the Wilson coefficients are presented using the combined outputs of the network defined in the first two rows of Table 2 (depending on the chirality of the top quark 6 in the operator). By selecting only events for which this value is larger than 0.83 (again chosen to optimise the constraints as shown in Figure 10 Template fits to the NN output One can further use the separation power of the neural network by analysing the shape information of its outputs. The fully differential outputs can be used in a binned likelihood 6 For the scalar operators O 1 QtQb and O 8 QtQb , where both left-and right-handed top quarks are involved, the choice was made to assign them to the t R category. This was motivated by the fact that the distributions of the kinematical variables show more similarity to this category. fit of the data to some predefined templates for the different categories of events. The advantage of such a fit lies in the fact that the relative normalization of the different categories can be deduced from the region in phase space where that category is dominantly abundant. This reduces the systematic uncertainty related to the normalization of the measured SM cross section and may, in practise, improve the limits that can be obtained on the Wilson coefficients. To illustrate the strength of such a fit, template histograms (T 1D ) are defined for the three categories such that the NN outputs for a general point in our EFT parameter space can be parametrised as functions of the event yields for the different event categories (N SM and N L or N R )   Figure 11: Limits at 95% CL on C 1 Qb (left) and C 8 Qb (right) after requiring the network output to be above 0.83 and assuming an integrated luminosity of 300 fb −1 .
These yields normalize each template (T 1D SM , T 1D L and T 1D R ) and are extracted by fitting to data. The RooFit package [38] incorporated in the ROOT data analysis framework [39] was used to perform the fit of pseudo-data to Eqs. (16) and (17), generated for different values of the Wilson coefficients assuming 300 fb −1 of integrated luminosity. The fitted yields are used as described in Section 3 to obtain limits on the individual Wilson coefficients, which are summarised in Figure 15 (brown lines). This shows that a similar sensitivity can be achieved with this method.

Combined limits for two non-zero EFT operators
We finally illustrate the strength of the multi-class output structure of the network, which becomes apparent when both EFT operators with a t L current and with a t R current are given non-zero Wilson coefficients at the same time. We illustrate this with an example using events generated with both C 1 Qb and C 1 tb non-zero. To visualize the separation potential of the neural network between the three classes, Figure 12 shows how the different classes are distributed in the plane of the combined neural network outputs outlined in the last two rows of Table 2. The x-axis represents the summed probability P (t L ) + P (t R ) that is able to separate the SM events (red) from any kind of event that includes the insertion of an EFT operator. On the y-axis, the normalized probability is displayed, designed to distinguish between the t L (green) and the t R (blue) categories. These distributions show a clear concentration of SM events to the left, whereas the t L and t R contributions dominantly populate the upper and lower right hand corners, respectively. We therefore define two signal regions (SR1) and (SR2) as delimited in Figure 12.
Adopting a similar strategy to the individual operator case, we can make a single selection on P (t L ) + P (t R ) asking this value to be larger than 0.83. The observed cross section is now fitted according to the generalised function for two simultaneously non-zero Wilson coefficients. Under the assumption of observing the SM, this yields a two-dimensional contour of the 95% CL limit on the Wilson coefficients C 1 Qb and C 1 tb as shown by the full red line in Figure 13 on the left. When the limits are obtained additionally selecting SR1(SR2) separately, one becomes more sensitive to C 1 Qb (C 1 tb ), as indicated by the green (blue) contours. By combining these two signal regions (red dashed contour), an increased sensitivity is observed compared to the one obtained by the one-dimensional selection on P (t L ) + P (t R ).
More interesting observations can be made in the case of a potential discovery of new physics. Under the hypothesis of observing an EFT signal, this strategy can help in the determination of which type of operators are involved. To illustrate this effect, we inject a benchmark signal with C 1 Qb = 5 TeV −2 and C 1 tb = 3 TeV −2 into our pesudo-data. The 2D limit obtained at 95% CL by the one-dimensional selection on P (t L ) + P (t R ) is shown  Table 2. The dashed lines define SR1 and SR2. See text for more details.
in red in Figure 13 on the right. The shape of the contour shows a symmetry around the central point (0,0), indicating that this selection is insensitive to the sign of the Wilson coefficient as well as to relative contribution of each operator. However, a combination of confidence intervals obtained in SR1 and SR2 (dashed red) is able to reduce the best fit region. It excludes at 95% CL a value of 0 TeV −2 for C 1 Qb , which was not possible with the one-dimensional selection.
The use of template fitting methods becomes even more interesting in this two dimensional example. A two-dimensional binned maximum likelihood fit to predefined templates (T 2D ) is performed by fitting the function to pseudo-data corresponding to the SM observation and also to the observation of a potential excess as above. A χ 2 value is calculated from the sum of each of the EFT event categories separate (t L and t R ) for each sample point in the parameter space of Wilson  Figure 13: (left) Two-dimensional limits at 95% CL assuming a measurement consistent with the SM-only hypothesis (blue cross) and allowing two couplings, C 1 Qb and C 1 tb to vary simultaneously: (red) one dimensional cut on P (t L ) + P (t R ) output; (green) SR1; (blue) SR2; (red dashed) combination of SR1 and SR2; (black) two dimensional template fit. (right) Same as on the left plot, but for the EFT signal injection hypothesis. See text and Figure 12 for more details.
coefficients. The 95% CL contours of this distribution are shown in black in Figure 13 on the left (SM-only hypothesis) and the right (possible observation of a signal due to EFT operators). In the former case the more rectangular shape of the contour leads to the strongest observed limits in some parts of the parameter space. In the latter case it is clear that the template fitting procedure is able to pinpoint with more precision the values of the Wilson coefficients. By using template fits, a value of 0 (TeV −2 ) for C 1 tb is now also excluded at 95% CL, which was not the case for combined limits in SR1 and SR2. Figure  14 shows the projected distributions of the fitted templates for P (t L ) + P (t R ) on the left and for P (t L ) P (t L )+P (t R ) on the right.

Summary and conclusions
In this work, we present new methods designed to exploit the full kinematical information to interpret Standard Model searches in the SMEFT framework. The high multiplicity and complexity of the final-state, in combination with the possible contributions from multiple effective operators, make machine learning classifiers a promising candidate to maximise our sensitivity. We identify the production of a top-quark pair in association with two b-jets as an interesting process, given its 8-body final state and its dependence on 10 fourquark operators of dimension six. Its production cross section is large enough to provide the required statistics for a differential analysis of the kinematical properties with 300 fb −1 of integrated luminosity. We show that it provides sensitivity to previously unconstrained directions in the SMEFT parameter space and would therefore be an indispensable component in a future global fit for the top-quark interactions. We also present a discussion of various issues concerning the validity and perturbativity of the EFT and its eventual UV completion. We motivate making an upper cut of 2 TeV on all energy scales involved in the process, to provide a measure of control while hardly sacrificing any sensitivity to the operators. Using power-counting arguments, we show that, in the case of a strongly coupled UV completion with coupling g * > g s , the dominant SMEFT contribution arises quadratically in the dimension-6 operators while all others are parametrically suppressed. This is supported by an explicit example of an Axigluon scenario. As a starting point of our phenomenological analysis, we present new individual limits on the Wilson coefficients of four-quark operators involving only third generation quarks, using the latest inclusive measurement of the ttbb final state by the CMS Collaboration (black line, Figure 15 left plot). A fully marginalized result over all 10 operators considered is left for the future when the uncertainty on this measurement is reduced. Our individual projections for 300 fb −1 suggest an improvement by a factor 2-3 as shown in left plot of Figure 15 (red).
To improve the sensitivity, we first select a part of the phase space with a 1.1 TeV cut on the invariant mass of the four b-jets (M 4b ). The expected limits are summarized by the blue lines in Figure 15 on the right and show a further improvement of a factor of ∼2 compared to the unfolded cross section measurement (shown in red for comparison).
Then, in order to combine all the available kinematical information, a shallow neural network is trained to classify events into three categories: events from pure SM contributions, events that include an insertion of an operator with a left-handed top quark and events that include an insertion of an operator with a right-handed top quark. We note here that the network was optimised on the squared terms of the EFT amplitude (neglecting interference effects), which are shown to be the dominant contributions with the sensitivity reached in our study. We leave the interesting possibility of including interference effects in the training phase to future work, as this may increase the sensitivity in specific regions of parameter space. By cutting on the left/right-dedicated NN outputs, summarized in Table 2, the limits on the Wilson coefficients improve mildly with respect to M 4b alone, suggesting that the bulk of the sensitivity is captured by this variable. These results are displayed in Figure 15 by the green lines.
The strength of the multi-class architecture of the NN becomes apparent when multiple operators, with different helicity structures, are considered simultaneously. By selecting a dedicated signal region in the phase space of the 2-dimensional discriminant, one can improve the sensitivity to a specific class of operators. This strategy can even help in pinpointing the strength of the contribution of each class of operators involved in a possible excess. We have shown that the combination of the two dedicated signal regions results in better constraints on the Wilson coefficients. Finally, we performed binned likelihood fits to templates of the higher dimensional discriminant to extract the maximum information available and further increase the sensitivity. These methods can of course be extended to more advanced network architectures in combination with more optimal input variables and larger training datasets to further exploit the power of these machine learning algorithms to constrain the SMEFT [40,41].
Here we have presented one example where the kinematics of top decay are employed to discriminate between two broad classes of EFT operators. A comprehensive exploration of how far such a strategy could be pushed towards a discriminator capable of distinguishing individual SMEFT operators would be extremely interesting. For example, one can infer from the interference patterns shown in Figure 2 that an extension of the training to include interference effects could in principle disentangle operators with singlet and octet color structures.
To conclude, we have presented a detailed investigation of the application of various levels of machine learning classifiers in extracting SMEFT signals in the ttbb final state. This process has shown itself to be a vital component to constrain top EFT interactions. Furthermore, our study serves as a proof of principle that motivates the use of multi-class discriminants in the context of globally constraining the SMEFT at the LHC.   where C A µ is the axigluon field with mass M . The mixing angle is given by where g 1 and g 2 are the coupling strength of the SU (3) L and SU (3) R gauge fields, respectively. The QCD strong coupling is given by Below, we will demonstrate that the power counting assumption of Eq. (8) is satisfied in this model.

Gauge coupling of fermions
The gluon and axigluon couplings to the fermions are given by the covariant derivative: where the couplings are g s = g 1 g 2 The axigluon C µ couples to the fermions with coupling strength g * . In the following we consider the limit s θ 1, where we have g * g s , so the theory is strongly coupled. In this limit, the fermions couple strongly to the heavy axigluons, leading to the g * f Λ 3/2 in the power counting assumption of Eq. (8). Note that under this limit the axial coupling C A ≈ 1.

Gauge coupling of axigluon
The couplings between gluons and axigluons come from the kinetic terms of G 1 and G 2 . In terms of mass eigenstates, we find the following gauge interaction terms This implies that the gluon couples to axigluon with strength g s , not g * . This is exactly what we have argued for the power counting rule for G µν , where the coupling strength for the gGµν Λ 2 term in Eq. (8) is g s instead of g * .

Matching
We now derive the coefficients for the relevant operators, to explicitly show that the assumption in Eq. (8) indeed applies to the matched operator coefficients.

Four-fermion operator
At leading order, the BSM contribution to the ttbb amplitude is given by Figure 16 (a). The corresponding contribution is reproduced by effective operators as in Figure 16 (b). The full amplitude can be expanded: where J Aµ is the top or bottom quark current. The first term can be reproduced by the following dim-6 operator (neglecting SU(2) as it is irrelevant for our purpose): with coefficient while the second term can be reproduced by the following dim-8 operator with coefficient The above coefficients are exactly consistent with what we have expected from Eq. (8), taking Λ N P = M . It implies that dim-8 four-fermion operators will not be enhanced by more powers of g * , relative to dim-6 operators, and thus the truncation of dim-8 operators is well-motivated given that E 2 /M 2 < 1 is ensured by M cut . This is also obvious from Eq. (29), where the validity of the expansion is guaranteed, if s < M 2 . Note that this is independent of the relative size of the dim-6 quadratic and interference terms, which relies on the size of g 2 * .
f f f f DD and f f f f G µν operators We also have to check whether the dim-8 contribution from a contact ggf f f f interaction could be enhanced by more powers of g * . The amplitude in the full theory is given by Figure 16 (c). To reproduce the amplitude we find that two additional operators are needed: Together with O These coefficients are again consistent with the assumption of Eq. (8), and so as we have argued, they all lead to subleading contributions as they are not enhanced by more powers of g * . We have also checked that these three dim-8 operators reproduce the correct gttbb amplitude, as in Figures 16 (e) and (f). Since the axigluons can only contribute through the three one-light-particle-irreducible (1LPI) diagrams, i.e. Figure 16 (a), (c), and (e), up to O(Λ −4 ), we can now conclude that truncating the SMEFT at dim-6 is justified in this model, regardless of the size of g * and the relative size of dim-6 quadratic term and dim-6 interference.
A final remark is that the operator O DD is a redundant one. Its contribution to the gg → ttbb from Figure 16 (d) and from Figure 16 (f) will cancel each other. We include this operator simply to have a diagram-by-diagram matching, i.e. all three one-light-particleirreducible diagrams ttbb+0g, 1g and 2g are matched, which is intuitively more transparent.

B Neural network setup
Training and validation datasets The network was trained on 18 input variables, which are summarized in Table 3. Events are simulated in three classes: events including only SM contributions, events that have a single insertion of an EFT operator with a left-handed top quark and events that have a single insertion of an EFT operator with a right-handed top quark. Each of these categories contains 27,404 events for training and 6,852 events for testing. It is important to note that when the network is used to calculate limits on the Wilson coefficients, it is applied to events which do not strictly belong to one of these three classes. Instead the events used for the determination of the limits are generated with both the SM contributions and the EFT contributions (including possible interference) included.

Network architecture
The neural network was trained using Keras with the Tensorflow backend. The 18 input nodes are linked to a fully-connected dense layer with 50 neurons with a rectified linear unit activation. A dropout layer is added which randomly freezes 10% of the neurons in this inner layer in every mini-batch to avoid overfitting. This layer is connected to the 3 ouputs with a softmax activation such that the outputs sum up to one. A categorical crossentropy loss function is used and the minimization of this loss function is performed with a stochastic gradient descent set to an initial learning rate of 0.005 and a decay of 10 −6 . Nestrov momentum is used and is fixed to a value of 0.8. The training is performed in mini-batches of 128 events and is stopped after 100 epochs. The training curve is shown in Figure 17, showing a convergence to a plateau both for training (blue) and testing (green) datasets. The top panel shows the accuracy whereas the bottom panel shows the value of the loss function. These curves are shown both for the training (red) and for an independent validation data set (blue). These curves converge towards each other and reach a plateau after about 100 epochs.

Variable Distributions
Below the distributions of all the input variables of the neural network are shown for SM only events (red), for events with a single insertion of an EFT operator with a left-handed top quark (green) and for those with a right-handed top quark (blue).