A Monte Carlo global analysis of the Standard Model Effective Field Theory: the top quark sector

Abstract
 
We present a novel framework for carrying out global analyses of the Standard Model Effective Field Theory (SMEFT) at dimension-six: SMEFiT. This approach is based on the Monte Carlo replica method for deriving a faithful estimate of the experimental and theoretical uncertainties and enables one to construct the probability distribution in the space of the SMEFT degrees of freedom. As a proof of concept of the SMEFiT methodology, we present a first study of the constraints on the SMEFT provided by top quark production measurements from the LHC. Our analysis includes more than 30 independent measurements from 10 different processes at $\sqrt{s} = 8$ and 13 TeV such as inclusive $t\overline{t}$ and single-top production and the associated production of top quarks with weak vector bosons and the Higgs boson. State-of-the-art theoretical calculations are adopted both for the Standard Model and for the SMEFT contributions, where in the latter case NLO QCD corrections are included for the majority of processes. We derive bounds for the 34 degrees of freedom relevant for the interpretation of the LHC top quark data and compare these bounds with previously reported constraints. Our study illustrates the significant potential of LHC precision measurements to constrain physics beyond the Standard Model in a model-independent way, and paves the way towards a global analysis of the SMEFT.


Introduction
The Large Hadron Collider (LHC) is pursuing an extensive program of direct searches for physics beyond the Standard Model (BSM) by exploiting its unique reach in energy. Whilst these searches have not yet returned any convincing evidence for BSM physics, only a small fraction of the final LHC dataset has been analysed so far, and ample room for surprises remains. A complementary approach to the searches for direct production of new particles is that of indirect BSM searches, where precise measurements of total cross-sections and differential distributions are compared to Standard Model (SM) predictions in the hope of uncovering glimpses of BSM dynamics in the interactions between SM particles. For instance, if new particles are too heavy to be directly produced at the LHC, they could still leave imprints in the kinematical distributions of the SM particles via interference or virtual effects. A powerful framework to identify, constrain, and parametrise potential deviations with respect to the SM predictions in a model-independent way is the Standard Model Effective Field Theory (SMEFT) [1][2][3]. In this framework, the effects of BSM dynamics at high scales $E \simeq \Lambda$ are parametrised for $E \ll \Lambda$ in terms of higher-dimensional (irrelevant) operators built up from the SM fields and respecting symmetries such as gauge and Lorentz symmetry. This approach is robust and general, since one can construct non-redundant bases of independent operators at any given mass dimension ($\hbar = c = 1$) that can then be systematically matched to explicit ultraviolet-complete scenarios for their interpretation at any order in $1/\Lambda$.
Analysing experimental data in the SMEFT framework is non-trivial; even restricting oneself to operators that conserve baryon and lepton number [3], one ends up with $N_{\rm op} = 59$ operators at dimension six for one generation, growing to more than 2000 in the absence of flavour assumptions. This implies that global and model-independent SMEFT analyses need to explore a complicated parameter space with a large number of degenerate ("flat") directions and local minima.
In this context, the wealth of precision measurements presented by the LHC collaborations in recent years, together with the significant progress in the corresponding theoretical calculations and modelling of collider processes, has motivated many groups to pursue (partial) SMEFT analyses of the LHC data [4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21], often complemented with input from lower-energy experiments such as the LEP electroweak precision tests. In these fits, constraints on the SMEFT operators can be provided not only by "traditional" processes such as electroweak gauge boson and Higgs production, but also by other high-$p_T$ processes such as jet and top quark production. Interestingly, even when only considering electroweak processes, these constraints are comparable or even superior to those provided by LEP [4,22]. Indeed, SMEFT corrections often grow quadratically with the energy and thus directly benefit from the large kinematic reach, up to several TeV, provided by present and future LHC measurements.
From the methodological point of view, a global fit of the SMEFT from LHC measurements requires combining state-of-the-art theoretical calculations (in the SM and in the SMEFT) with a wide variety of experimental cross-sections and distributions. This should be accomplished by means of a robust statistical analysis allowing for the reliable estimation of all sources of uncertainty and for the minimisation of procedural and theoretical biases. SMEFT fits therefore represent, conceptually, a similar problem to that arising in the global QCD analysis of the quark and gluon structure of the proton in terms of parton distribution functions (PDFs) [23][24][25]. By exploiting these conceptual similarities, in this work we develop a novel strategy for global SMEFT analyses inspired by the NNPDF framework, successfully applied to the determination of the parton distributions of the proton [26][27][28][29][30][31][32][33][34][35][36] and of hadron fragmentation functions [37,38]. This approach, which we denote by SMEFiT, combines the generation of Monte Carlo (MC) replicas, to estimate and propagate uncertainties, with cross-validation to prevent over-fitting.
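The replica idea mentioned above can be illustrated with a deliberately simplified toy. The sketch below is not the SMEFiT implementation: it assumes a single Wilson coefficient entering linearly, uncorrelated Gaussian uncertainties, and an analytic least-squares fit per replica, and it omits cross-validation entirely. All numbers and the names `make_replica` and `fit_coefficient` are hypothetical.

```python
import random
import statistics

def make_replica(data, errors, rng):
    """Fluctuate each data point within its (assumed uncorrelated) uncertainty."""
    return [d + e * rng.gauss(0.0, 1.0) for d, e in zip(data, errors)]

def fit_coefficient(data, errors, sm, kappa):
    """Analytic chi^2 minimum for the toy model: theory_k = sm_k + kappa_k * c."""
    num = sum(k * (d - s) / e**2 for d, s, k, e in zip(data, sm, kappa, errors))
    den = sum(k**2 / e**2 for k, e in zip(kappa, errors))
    return num / den

rng = random.Random(1)          # fixed seed so the toy is reproducible

# Hypothetical inputs: SM predictions, linear SMEFT sensitivities, data, errors.
sm = [10.0, 20.0, 30.0]
kappa = [1.0, 2.0, 0.5]
errors = [0.5, 1.0, 1.5]
data = [10.4, 21.1, 30.2]

# One fit per replica; the ensemble of fitted values samples the
# probability distribution of the coefficient.
replica_fits = [
    fit_coefficient(make_replica(data, errors, rng), errors, sm, kappa)
    for _ in range(1000)
]

central = statistics.mean(replica_fits)   # central value of the coefficient
spread = statistics.stdev(replica_fits)   # one-sigma uncertainty estimate
print(f"c = {central:.3f} +- {spread:.3f}")
```

In this linear toy the replica spread simply reproduces the standard error of the least-squares fit; the appeal of the method is that the same recipe carries over unchanged when the dependence on the coefficients is non-linear and the $\chi^2$ profile is not quadratic.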
As a proof of concept of the SMEFiT methodology, we apply it here for the first time to the detailed study of top quark production at the LHC in the SMEFT framework at dimension six. The top quark, the only fermion with an O(1) Yukawa coupling, plays a privileged role in most BSM scenarios aiming to explain the origin of electroweak symmetry breaking and stabilise the weak scale. From the experimental data point of view, a global SMEFT analysis of top quark production at the LHC is motivated by the large number of precision measurements at $\sqrt{s} = 7$, 8 and 13 TeV that have become available recently. This data includes total rates and differential distributions in inclusive $t\bar{t}$ and single-top production, associated production of top quarks with vector bosons and the Higgs boson, and helicity fractions in top quark decay. The wealth of data collected by the LHC is mirrored by the advancements on the theoretical side, where significant progress in higher order calculations in the top quark sector has been achieved. This is true both from the SM point of view, with the calculation of NNLO QCD and NLO electroweak corrections for inclusive top quark pair and single top production, as well as from the SMEFT side. In the latter case, LO calculations are now automatised in codes such as MadGraph5_aMC@NLO [39] within a framework agreed upon by the LHC Top WG [10], and NLO QCD corrections have been presented for a continuously growing number of processes.
Several SMEFT analyses of the top quark sector have been presented based on either hadron collider [40][41][42][43][44][45] or lepton collider [46] processes, in the latter case also considering the sensitivity of future machines such as the International Linear Collider (ILC). The top quark sector of the SMEFT has been in particular studied by the TopFitter collaboration [15,47,48]. Our analysis exhibits several improvements as compared to the available studies, allowing us to assess the impact of several important aspects in the fit. First, we include a broader range of input experimental measurements from different processes, which allow us to constrain a larger number of SMEFT operators. Second, we include the NLO QCD corrections to the SMEFT contributions. This entails an improved accuracy and a reduction of the theory systematic errors. Third, we always compute both the leading linear ($\mathcal{O}(\Lambda^{-2})$) and the subleading quadratic ($\mathcal{O}(\Lambda^{-4})$) contributions to the SMEFT predictions, so that the effect of including or omitting the quadratic terms can be systematically studied. Fourth, our methodology avoids any assumption about the specific profile of the $\chi^2$ function and in particular we do not rely on any quadratic approximation for error propagation.
By exploiting the SMEFiT methodology, here we derive the probability distribution in the space of SMEFT Wilson coefficients that follows from all available top quark production cross-sections. We study the impact of individual processes on the SMEFT parameter space and the role of higher order corrections, such as NLO QCD and the SMEFT $\mathcal{O}(\Lambda^{-4})$ corrections. In general, we find that higher order effects are non-negligible and can significantly affect the results. We also quantify the correlations between the operators, and compare the bounds derived here with previous constraints reported in the literature. Our analysis illustrates the significant potential of LHC precision measurements to constrain, and possibly identify, BSM physics in a model-independent way.
The outline of this paper is the following. In Sect. 2 we summarise the SMEFT description of the top quark sector at dimension six and introduce our choice of operator basis for the fit. In Sect. 3 we describe the experimental measurements of top quark production at the LHC which are used to constrain the SMEFT operators and the settings of the corresponding theoretical calculations of the SM and SMEFT cross-sections. The SMEFiT methodology is presented in Sect. 4, where it is validated by means of closure tests. The main results of this work are presented in Sect. 5, where we determine the confidence level intervals for the coefficients of the $N_{\rm op} = 34$ SMEFT operators and their correlations, and compare them with the bounds reported in previous studies. In Sect. 6 we summarise our main conclusions and outline possible directions for generalising our analysis to other processes.

The SMEFT in the top quark sector
In this section we describe the theoretical formalism that will be adopted in this work to interpret the LHC top quark production data within the SMEFT framework. First, we provide an introduction to the SMEFT, focusing on those operators that affect the description of the top quark sector. Then, we define the degrees of freedom that are more relevant to studying top quark production at the LHC. Operators that do not involve top quarks and their constraints are also briefly discussed. We finally describe our theory calculations at NLO QCD accuracy, and comment on some additional aspects of the SMEFT formalism relevant for this study.

The SMEFT framework
Let us begin by reviewing the SMEFT formalism [2,49], with emphasis on its description of the top quark sector. As mentioned in the introduction, the effects of new heavy BSM particles with typical mass scale $M \simeq \Lambda$ can under general conditions be parametrised at lower energies $E \ll \Lambda$ in a model-independent way in terms of a basis of higher-dimensional operators constructed from the SM fields and their symmetries. The resulting effective Lagrangian then admits the following power expansion

$$ \mathcal{L}_{\rm SMEFT} = \mathcal{L}_{\rm SM} + \sum_i^{N_{d6}} \frac{c_i}{\Lambda^2}\,\mathcal{O}_i^{(6)} + \sum_j^{N_{d8}} \frac{b_j}{\Lambda^4}\,\mathcal{O}_j^{(8)} + \ldots \,, \qquad (2.1) $$

where $\mathcal{L}_{\rm SM}$ is the SM Lagrangian, and $\{\mathcal{O}_i^{(6)}\}$ and $\{\mathcal{O}_j^{(8)}\}$ stand for the elements of the operator basis of mass-dimension $d = 6$ and $d = 8$, respectively, with $c_i$ and $b_j$ the corresponding Wilson coefficients. Operators with $d = 5$ and $d = 7$, which violate lepton and/or baryon number conservation [50,51], are not considered here. Whilst the choice of operator basis used in Eq. (2.1) is not unique, it is possible to relate the results obtained in different bases [52]. In this work we adopt the Warsaw basis for $\{\mathcal{O}_i^{(6)}\}$ [3], and neglect effects arising from operators with mass dimension $d \geq 8$.
For specific UV completions, the Wilson coefficients $\{c_i\}$ in Eq. (2.1) can be evaluated in terms of the parameters of the BSM theory, such as its coupling constants and masses. However, in a bottom-up approach, they are a priori free parameters and they need to be constrained from experimental data. In general, the effects of the dimension-6 SMEFT operators in a given observable, such as cross-sections at the LHC, differential distributions, or other pseudo-observables, can be written as follows:

$$ \sigma = \sigma_{\rm SM} + \sum_i^{N_{d6}} \sigma_i\,\frac{c_i}{\Lambda^2} + \sum_{i,j}^{N_{d6}} \sigma_{ij}\,\frac{c_i c_j}{\Lambda^4}\,, \qquad (2.2) $$

where $\sigma_{\rm SM}$ indicates the SM prediction and the Wilson coefficients $c_i$ are considered to be real for simplicity. In Eq. (2.2), the second term arises from operators interfering with the SM amplitude. The resulting $\mathcal{O}(\Lambda^{-2})$ corrections to the SM cross-sections represent formally the dominant correction, though in many cases they can be subleading for different reasons. The third term in Eq. (2.2), representing $\mathcal{O}(\Lambda^{-4})$ effects, arises from the squared amplitudes of the SMEFT operators, irrespective of whether or not the dimension-6 operators interfere with the SM diagrams. In principle, this quadratic term need not be included, depending on whether the truncation at $\mathcal{O}(\Lambda^{-2})$ is performed at the Lagrangian or at the cross-section level, but in practice there are often valid reasons to include it in the calculation. We will discuss in more detail the impact of these $\mathcal{O}(\Lambda^{-4})$ corrections at the end of this section.
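To make the structure of Eq. (2.2) concrete, the following minimal sketch evaluates an observable from its SM value, the linear (interference) terms $\sigma_i$ and the quadratic terms $\sigma_{ij}$, assuming the form $\sigma = \sigma_{\rm SM} + \sum_i \sigma_i c_i/\Lambda^2 + \sum_{ij} \sigma_{ij} c_i c_j/\Lambda^4$. The function name `smeft_cross_section` and all numerical inputs are hypothetical and chosen only for illustration.

```python
def smeft_cross_section(sigma_sm, sigma_lin, sigma_quad, coeffs,
                        lam=1.0, quadratic=True):
    """Evaluate sigma_SM + sum_i sigma_i c_i / Lambda^2
    (+ sum_ij sigma_ij c_i c_j / Lambda^4 if quadratic=True)."""
    n = len(coeffs)
    linear = sum(sigma_lin[i] * coeffs[i] for i in range(n)) / lam**2
    if not quadratic:
        return sigma_sm + linear
    quad = sum(sigma_quad[i][j] * coeffs[i] * coeffs[j]
               for i in range(n) for j in range(n)) / lam**4
    return sigma_sm + linear + quad

# Toy inputs (arbitrary units, Lambda = 1 TeV); not taken from any calculation.
sigma_sm = 100.0
sigma_lin = [2.0, -1.0]
sigma_quad = [[0.5, 0.1],
              [0.1, 0.3]]
coeffs = [1.0, 2.0]

lin_only = smeft_cross_section(sigma_sm, sigma_lin, sigma_quad, coeffs,
                               quadratic=False)
full = smeft_cross_section(sigma_sm, sigma_lin, sigma_quad, coeffs)
print(lin_only, full)
```

The toy values are picked so that the two linear contributions cancel exactly while the quadratic ones do not, which is one simple way in which the formally dominant $\mathcal{O}(\Lambda^{-2})$ term can end up numerically subleading.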
An important aspect of any SMEFT analysis is the need to include all relevant operators that contribute to the processes whose data is used as input to the fit. Only in this way can the SMEFT retain its model and basis independence. However, unless specific scenarios are adopted, the number of non-redundant operators $N_{d6}$ becomes unfeasibly large: 59 for one generation of fermions [3] and 2499 for three [53]. This implies that a global SMEFT fit, even if restricted to dimension-6 operators, will have to explore a huge parameter space with potentially a large number of flat (degenerate) directions.
Due to the above consideration, in this work we follow closely the strategy documented in the LHC Top Quark Working Group note [10]. In particular, we adopt the Minimal Flavour Violation (MFV) hypothesis [54] in the quark sector as the baseline scenario. We further assume that the Cabibbo-Kobayashi-Maskawa (CKM) matrix is diagonal, and that the Yukawa couplings are nonzero only for the top and bottom quarks. In other words, we impose a $U(2)_q \times U(2)_u \times U(2)_d$ flavour symmetry among the first two generations. In addition, we restrict ourselves to the CP-even operators only, and focus on those operators that induce modifications in the interactions of the top quark with other SM fields. As we will now show, under the above assumptions, we will explore the parameter space associated with the $N_{\rm op} = 34$ linear combinations of dimension-6 operators that are relevant for the description of the top quark sector. Following Ref. [10], we will then define the specific degrees of freedom relevant for the interpretation of top quark measurements.

The top quark sector of the SMEFT
Given the scope of this study, we will consider here only those dimension-6 operators that affect the production and decay of top quarks at the LHC through the modifications of their couplings to other SM fields. Following Ref. [10], we adopt the Warsaw basis [3] of nonredundant, gauge-invariant dimension-six operators, and then we define the specific degrees of freedom relevant for each measurement. These degrees of freedom are linear combinations of the Warsaw-basis operator coefficients, which appear in the interference with SM amplitudes, and in interactions with physical fields after electroweak symmetry breaking. These combinations are then aligned with physically relevant directions of the SMEFT parameter space. They represent the maximal information that can be extracted from measuring a certain process. The rationale for using them in a global fit instead of the basis operator coefficients directly is that they may reduce the number of relevant parameters and unconstrained combinations.
Since we only consider here those operators which contain at least one top quark under the assumed flavour symmetries, we are implicitly assuming that other operators affecting the considered processes are well constrained from measurements of other processes that do not involve top quarks. This assumption may not always be justified, but it is helpful for a better understanding of the top quark sector, and also for setting up the scope of this work. Without this assumption, it is likely that one would have to resort to a much more global analysis, including all currently available data, which goes beyond the scope of the present analysis. We will discuss explicitly how in our case this assumption is justified in the next subsection.
We are now ready to define the relevant degrees of freedom that will be used in this analysis in terms of the dimension-6 operators of the Warsaw basis. The complete set of degrees of freedom can be found in Ref. [10], and for completeness we collect in Appendix A the definitions and conventions that will be adopted in the following. To begin with, concerning the operators involving four heavy quarks (that is, either a right-handed top $t$, or a right-handed bottom $b$, or a left-handed top-bottom doublet $Q$), we define the degrees of freedom given in Eq. (2.3), together with an additional set of definitions (see Appendix A and Ref. [10]). From these definitions we see that within the specific flavour assumptions adopted here there are 11 operators involving four heavy quarks. These operators can only be constrained from processes involving four heavy quarks in the final state, such as four-top quark production or $t\bar{t}b\bar{b}$ production, as we will discuss below in Sect. 3.4.

Concerning the dimension-6 operators of the Warsaw basis involving two light quarks and two heavy quarks, see the list in Eq. (A.2), we first note that operators involving a light-quark scalar or tensor current are vetoed by the flavour assumptions adopted here. On the other hand, vector-like interactions such as $\bar{L}L\bar{L}L$, $\bar{L}L\bar{R}R$, and $\bar{R}R\bar{R}R$ type operators are allowed by our flavour scenario. We can therefore define the corresponding degrees of freedom in terms of two-light-two-heavy operators (see Appendix A), where $i$ corresponds to a light quark index, that is, it is either 1 or 2; recall that the first two generations are massless and thus exhibit an SU(2) flavour symmetry. For these degrees of freedom involving two heavy quarks and two light quarks, we therefore end up with 14 independent coefficients. These degrees of freedom can be constrained by processes such as inclusive $t\bar{t}$, through the quark-antiquark component of the initial state, as well as by $t\bar{t}$ production in association with gauge vector bosons. The SU(2) triplet degrees of freedom can also be constrained by single top processes.
Finally, we need to take into account the degrees of freedom involving operators built from two heavy quarks and bosonic fields, including the Higgs field, namely those listed in Eq. (A.2). For these operators, a set of combinations is defined (see Table 2.1), which includes $c_{\varphi t} \equiv C_{\varphi u}^{(33)}$ and $c_{\varphi tb} \equiv \mathrm{Re}\{C_{\varphi ud}^{(33)}\}$. We see for example that the $c_{tZ}$ degree of freedom is a combination of the $O_{uB}^{ij}$ and $O_{uW}^{ij}$ operators with $i = j = 3$, weighted by the sine and the cosine of the Weinberg angle respectively. Since we account here only for CP-conserving effects, the imaginary parts of the last five coefficients, being CP-odd, will not be included. Note that there are two additional degrees of freedom that fall into the same category (two heavy quark fields plus bosonic fields), but they are not independent from those defined above. First of all, we have the combination $c_{\varphi Q}^{+}$ of the $O_{\varphi q}^{1(33)}$ and $O_{\varphi q}^{3(33)}$ operators that modifies the SM coupling of the $b$ quark to the $Z$ boson, defined in Eq. (2.7), as well as the combination $c_{tA}$ of operators that affects the electromagnetic dipole of the top quark. These two degrees of freedom, $c_{\varphi Q}^{+}$ and $c_{tA}$, are useful for instance in the interpretation of processes such as $Z \to b\bar{b}$ and $t\bar{t}\gamma$. Since they can be simply written as linear combinations of other degrees of freedom, we will not discuss them further in this work.
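The statement that $c_{\varphi Q}^{+}$ and $c_{tA}$ are not independent can be made explicit with a short worked example. The definitions below are an assumption of this sketch: they follow the sign and normalisation conventions usually quoted from Ref. [10], are not reproduced in the text above, and should be checked against Appendix A. Here $s_W$ and $c_W$ denote the sine and cosine of the Weinberg angle.

```latex
% Assumed definitions (conventions of Ref. [10]; to be checked against Appendix A)
\begin{align*}
c_{tW} &\equiv \mathrm{Re}\{C_{uW}^{(33)}\}, \\
c_{tZ} &\equiv \mathrm{Re}\{-s_W\, C_{uB}^{(33)} + c_W\, C_{uW}^{(33)}\}, \\
c_{tA} &\equiv \mathrm{Re}\{c_W\, C_{uB}^{(33)} + s_W\, C_{uW}^{(33)}\}, \\
c_{\varphi Q}^{3} &\equiv C_{\varphi q}^{3(33)}, \qquad
c_{\varphi Q}^{\mp} \equiv C_{\varphi q}^{1(33)} \mp C_{\varphi q}^{3(33)} .
\end{align*}
% Consequences: the two extra degrees of freedom are linear combinations
% of quantities that already enter the fit.
\begin{align*}
c_{tW} - c_W\, c_{tZ}
  &= \mathrm{Re}\{ s_W^2\, C_{uW}^{(33)} + s_W c_W\, C_{uB}^{(33)} \}
   = s_W\, c_{tA}
  \quad\Longrightarrow\quad
  c_{tA} = \frac{c_{tW} - c_W\, c_{tZ}}{s_W}, \\
c_{\varphi Q}^{+} &= c_{\varphi Q}^{-} + 2\, c_{\varphi Q}^{3} .
\end{align*}
```

Under these assumed conventions, any bound on $c_{tW}$, $c_{tZ}$, $c_{\varphi Q}^{-}$ and $c_{\varphi Q}^{3}$ translates directly into a bound on $c_{tA}$ and $c_{\varphi Q}^{+}$, which is why the latter two need not be fitted separately.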
Taking stock, in total we have 9 CP-conserving degrees of freedom constructed from operators that involve two heavy quarks and gauge and Higgs bosonic fields. Operators involving gauge boson fields can be constrained either by single top production, if they modify the charged current coupling, or by the associated production of top quark pairs and single tops with electroweak bosons, i.e. processes such as $t\bar{t}V$ and $tV$, if they modify only the neutral current couplings. The degree of freedom $c_{tG}$ will enter at leading order in top pair and top pair associated production and $tW$, and at NLO in $t$- and $s$-channel single top production. The degree of freedom $c_{t\varphi}$, on the other hand, can only be constrained from the associated production of a top quark pair with a Higgs boson, as we will discuss in the next section. Fortunately in this case, the first cross-section measurements for $t\bar{t}H$ production have recently become available.
Putting everything together, in total our fitting basis will be composed of $N_{\rm op} = 34$ independent degrees of freedom constructed from the dimension-6 SMEFT operators relevant for the description of the top quark sector: 11 four-heavy-quark operators, 14 two-heavy-two-light-quark operators, and 9 operators involving two heavy quarks and bosonic fields. In Table 2.1 we summarise the definition of these 34 degrees of freedom in terms of the SMEFT operators in the Warsaw basis, as well as the internal notation that we will use in the following to refer to them. As in the above discussion, the degrees of freedom are divided into the three relevant classes: four-heavy-quark operators, two-heavy-two-light-quark operators, and operators that couple two heavy quarks to gauge and Higgs bosonic fields. We will discuss in the next section (see, in particular, Table 3.5) which of these operators are constrained by each of the LHC top quark measurements included in the analysis.
Note that some of the degrees of freedom defined in Table 2.1 have already been studied in the context of SMEFT fits of the top quark sector, see e.g. [48,55,56] and references therein. For instance, the chromomagnetic operator $c_{tG}$ was constrained to be within $[-1.3, 1.2]$ for $\Lambda = 1$ TeV at the 68% confidence level in the analysis of [48]. However, so far the simultaneous determination of the complete set of degrees of freedom of Table 2.1 has never been carried out. In most cases, existing fits either vary one operator at a time or marginalise over a smaller subset of operators, and the bounds derived in this way can differ significantly from those derived in a more global analysis. We will come back in Sect. 5 to the comparison of the results of our analysis with previous studies in the literature.
The SMEFT degrees of freedom defined in Table 2.1 have been implemented at LO in the UFO model dim6top as discussed in [10]. Results obtained with dim6top have been benchmarked with the independent UFO implementation available in the SMEFTsim package [57]. In this work the dim6top model is complemented with the necessary counter-terms to enable NLO computations. The UFO model has been interfaced to MadGraph5_aMC@NLO to compute the $\mathcal{O}(\Lambda^{-2})$ and $\mathcal{O}(\Lambda^{-4})$ SMEFT corrections to the relevant SM cross-sections as indicated in Eq. (2.2).

Operators not involving top quarks
In this work, we follow Ref. [10] and only include those operators that explicitly modify the couplings of the top quark with the other SM fields. We therefore assume that other relevant operators are well constrained by processes that do not involve top quarks. In the following, we give a brief overview of these operators and discuss how they are constrained.
Table 2.1. The notation that we will use to denote the results of the fits presented in this work. In each case, we indicate the internal notation for the degree of freedom and the corresponding definition in terms of the operators in the Warsaw basis. The degrees of freedom are divided into three classes: four-heavy-quark operators (QQQQ), two-heavy-two-light-quark operators (QQqq), and operators that couple two heavy quarks to gauge and Higgs bosonic fields (QQ + V, G, ϕ).

Firstly, the triple-gluon operator, with coefficient $c_G$, enters $t\bar{t}(V/H)$ production through a modification of the triple gluon coupling. Although it has been suggested that $t\bar{t}$ production can possibly constrain it due to its non-interference with the SM in di-jet production [58], it has been shown recently that this operator is most tightly constrained by multi-jet production measurements [59,60]. The bounds on the coefficient $c_G$ are found to lie within $[-0.04, 0.04]$ TeV$^{-2}$, beyond the sensitivity of top quark pair production or associated production. Secondly, operators involving a modification of the electroweak gauge-boson couplings to light fermions are in principle relevant to the interpretation of the single top, $tZ$ and $t\bar{t}Z$ measurements. Most of these operators are however reasonably well constrained by electroweak precision observables. In the Warsaw basis, and under the assumed flavour structure, they comprise 10 operators, among them $O_{\varphi D}$. Among these 10 operators, 8 degrees of freedom are stringently constrained by electroweak observables [61], while two flat directions remain which are constrained only by diboson production processes, as discussed in Refs. [53,62,63] for example. The two flat directions can be conveniently parametrised with the combinations of Eq. (2.11) [64], which are linear combinations of Warsaw basis operators. Together with another basis operator $O_W$, they form the full set of operators that modify the triple-gauge-boson couplings (TGC).
While in principle these couplings would enter the $tZ$ process considered in this analysis (as well as measurements of $t\gamma$ that we do not include), they are well constrained from diboson production at LEP2 and the LHC. An interesting question is whether processes like $tZ$ production could enhance the sensitivity to the anomalous TGCs, as the diagrams of this process in the SM display large cancellations among each other as required by unitarity, which are then spoiled by anomalous TGCs, leading to enhanced cross sections at large energy. The study performed in Ref. [41] shows that while the effect is indeed present, it is not significant enough to compete with the sensitivities provided by diboson production. Therefore neglecting these operators in the associated production of single top quarks is well justified. Another operator that potentially would need to be taken into account is the one associated with the modification of the $Zb\bar{b}$ coupling, characterized by the coefficient $C_{\varphi b}$, together with the degree of freedom $c_{\varphi Q}^{+}$ defined in Eq. (2.7). These are constrained by the decay rate of $Z \to b\bar{b}$ and the forward-backward asymmetry of $e^+e^- \to b\bar{b}$ at the $Z$ pole measured by LEP. The corresponding constraints on the two coefficients are below O(0.1) TeV$^{-2}$. Therefore $C_{\varphi b}$ can be safely ignored in this analysis. Conversely, the constraint on $c_{\varphi Q}^{+}$ is in principle relevant as it is a linear combination of $c_{\varphi Q}^{-}$ and $c_{\varphi Q}^{3}$, which enter this fit. However, in this work we choose not to include this information, because our goal is to quantify the direct constraints provided by top quark measurements.
Another operator that one might have to consider is the Higgs-gluon operator, with coefficient $C_{\varphi G}$, which enters in the $t\bar{t}H$ production process. While this operator is already tightly constrained by Higgs production in gluon fusion, $gg \to H$, this process is also affected by exactly the same top quark degrees of freedom that enter $t\bar{t}H$, namely $c_{t\varphi}$ and $c_{tG}$. Therefore in principle the marginalised limit should be derived by combining $gg \to H$ together with the other top quark measurements, and fitting simultaneously $C_{\varphi G}$ and the relevant top quark operators. Such a combined fit to both the $t\bar{t}H$ and $gg \to H$ processes has been studied in [40], showing that, within its marginalised bound, $C_{\varphi G}$ does have an impact on the $t\bar{t}H$ rate, but it is not very significant. Therefore in this work we include the data on the $t\bar{t}H$ cross-sections but not the $gg \to H$ ones, since the latter will only fix the value of $C_{\varphi G}$ without affecting the description of the $t\bar{t}H$ process too much. We should keep in mind that it would be possible to improve upon this by explicitly including the $gg \to H$ cross-sections in our experimental inputs. Another interesting operator is the one with coefficient $C_{bG}$, which, in addition to all four-fermion operators involving two light quarks and two right-handed $b$ quarks, will affect the description of $t\bar{t}b\bar{b}$ production. Ref. [65] has reported O(1) TeV$^{-2}$ bounds on $C_{bG}$ from the analysis of $pp \to b\bar{b}$ production. The other four-fermion operators will enter the same process, and it is unlikely that $t\bar{t}b\bar{b}$ production will provide an even stronger constraint.
Finally, two-lepton-two-top-quark operators, such as $(\bar{t}\gamma^{\mu} t)(\bar{e}\gamma_{\mu} e)$ and $(\bar{Q}\gamma^{\mu}\tau^{I} Q)(\bar{l}\gamma_{\mu}\tau^{I} l)$, could in principle affect the description of $t\bar{t}Z$ and $tZ$ production as well as the measurement of the $W$-helicity fractions in top quark decay, if the decays of $W$ and $Z$ bosons are taken into account. However, depending on the details of the analysis, the inclusion of these operators requires a reinterpretation of the experimental measurements, since for example the extrapolation from the fiducial to the total phase space could be affected by SMEFT effects. We thus postpone the inclusion of these operators to future studies.
We emphasize that the decoupling of the operators that do not involve top quarks from the interpretation of top quark measurements at the LHC is in principle only an approximation. Not all of these operators are currently strictly constrained by other processes, but they could be dealt with either by possible improvements in the future, or by extending our fit by including additional measurements. However, until the most complete global fit can be performed, approximations like the ones we adopt here are always useful, because they allow us to focus on a certain sector of SMEFT, to study a certain type of processes, and to obtain an intuitive understanding of the underlying physics. Fortunately, as we can see from the discussions above, for a study focused on the interpretation of top quark measurements, the assumption of the decoupling of non-top operators is in general already very good, and future improvements can be envisioned. We therefore expect our results to be robust and not significantly affected by the possible inclusion of SMEFT operators that do not involve top quarks.

NLO QCD effects in the SMEFT calculation
In Eq. (2.2), the coefficients $\sigma_i$ and $\sigma_{ij}$ can be evaluated either at leading order in both the QCD and electroweak couplings, or by also including higher-order perturbative corrections. Given the high precision of available top quark measurements, particularly from the LHC Run II, as well as the further improvements expected at Run II and during the High-Luminosity (HL) LHC, it is important to take into account the NLO QCD corrections to SMEFT effects. This is necessary for a number of reasons, including:
• QCD corrections to total rates are often quite large, especially for processes that are proportional to $\alpha_s$ at the Born level. Taking them into account results in general in an improvement of the bounds on the SMEFT Wilson coefficients. Additionally, NLO QCD corrections also reduce the theoretical uncertainties from scale variations, which is helpful in discriminating between different BSM scenarios.
• QCD corrections can distort the distributions of key observables. Given that the interpretation of differential distributions plays an important role in SMEFT global fits, providing reliable predictions for them is crucial. For instance, it is shown in Ref. [66] that in the presence of a deviation from the SM, missing QCD corrections to certain differential distributions could lead us to make incorrect conclusions on the nature of BSM physics.
• The experimental sensitivity to SM deviations can be improved by using the most accurate SMEFT predictions and by optimizing the experimental strategies in a top-down way. However, the large QCD corrections at the LHC make this improvement unrealistic without consistently taking into account NLO predictions.
A novel feature of the present work, as compared to previous SMEFT studies of the top quark sector, is that we will exploit this framework and include the theoretical predictions at NLO in QCD whenever possible. This allows us to obtain the currently most accurate bounds on the coefficients of the SMEFT operators affecting the top quark couplings. Furthermore, by switching on and off the NLO QCD corrections in the fit, we can understand better the importance of the higher-order corrections in the SMEFT calculation when constraining different operators. In the following, we briefly explain which corrections will be included. As we will discuss in Sect. 3, the SM calculations are always performed using the highest perturbative order available for each process.
Of all the degrees of freedom relevant in this work, the operators involving only two fermion fields have already been fully automated in this framework, so their associated NLO QCD corrections can be evaluated straightforwardly. Four-fermion operators are being studied, and their complete implementation is expected to become publicly available on a short timescale. In this work we will include the NLO QCD corrections to the four-fermion operators only in the inclusive single top and top-pair production processes, which are the most accurately measured processes.
One practical difficulty in obtaining stable numerical results at NLO is that the MC errors on the interference terms σ i and on the cross terms σ ij with i ≠ j can be large. These σ terms are obtained by sampling the parameter space spanned by the full set of relevant degrees of freedom, and computing the total cross section (or other observables) iteratively. The results are then fitted to the most general quadratic function of the coefficients. As a consequence, the MC error on the interference and cross terms can be large, in particular when these terms are suppressed (see discussions in Ref. [77]). Examples of suppressed interference contributions will be discussed in the next subsection. Given the large number of SMEFT operators relevant for the description of top quark measurements, a full simulation at NLO QCD would be very time consuming.
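The extraction of the σ terms described above can be illustrated with a minimal sketch. It assumes that the cross section has already been simulated at a set of sampled points in coefficient space and fits the general quadratic form by (optionally weighted) least squares. The function name `fit_quadratic`, the convention of summing cross terms over i ≤ j, the absorption of Λ into the coefficients, and all toy numbers are assumptions made for illustration; they are not taken from the analysis.

```python
import itertools
import numpy as np

def fit_quadratic(c_points, sigma_points, mc_errors=None):
    """Least-squares fit of sigma(c) to a general quadratic in the coefficients.

    c_points:     array of shape (n_points, n_ops), sampled coefficient values
    sigma_points: array of shape (n_points,), simulated cross sections
    mc_errors:    optional MC errors, used as 1/error weights
    """
    c = np.asarray(c_points, dtype=float)
    y = np.asarray(sigma_points, dtype=float)
    n_points, n_ops = c.shape
    pairs = list(itertools.combinations_with_replacement(range(n_ops), 2))
    # Design matrix columns: 1, c_i, then c_i * c_j for i <= j.
    columns = [np.ones(n_points)]
    columns += [c[:, i] for i in range(n_ops)]
    columns += [c[:, i] * c[:, j] for i, j in pairs]
    design = np.column_stack(columns)
    if mc_errors is None:
        weights = np.ones(n_points)
    else:
        weights = 1.0 / np.asarray(mc_errors, dtype=float)
    coeffs, *_ = np.linalg.lstsq(design * weights[:, None], y * weights, rcond=None)
    return {
        "sm": coeffs[0],
        "linear": coeffs[1:1 + n_ops],
        "quadratic": dict(zip(pairs, coeffs[1 + n_ops:])),
    }

# Toy check with invented numbers (hypothetical, not from the paper).
rng = np.random.default_rng(0)
c_toy = rng.uniform(-3.0, 3.0, size=(20, 2))
sigma_toy = (10.0 + 2.0 * c_toy[:, 0] - 0.5 * c_toy[:, 1]
             + 0.3 * c_toy[:, 0] ** 2 + 0.05 * c_toy[:, 0] * c_toy[:, 1]
             + 0.1 * c_toy[:, 1] ** 2)
fit = fit_quadratic(c_toy, sigma_toy)
print(fit["sm"], fit["linear"], fit["quadratic"])
```

In this noise-free toy the fit recovers the input terms. With finite MC statistics, a suppressed linear or cross term can be comparable to the statistical noise on the simulated points, which is the situation the text warns about.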
In this work, we adopt the following strategy:
• For tt and single top production, the experimental measurements exhibit the highest precision. We therefore use the full NLO simulation. This is done by sampling the parameter space following [77]. For each point, we generate 8 × 10^5 events, and estimate the corresponding MC errors for each observable included in the fit. These MC uncertainties can be taken into account when constructing the χ 2 function, as discussed in Sect. 4.
• For associated production processes, the measurements are less accurate. We generate the full LO predictions using the implementation provided in [10]. We then apply K-factors from previous calculations of ttZ, ttH and tZj production, wherever available [40,41,69]. For contributions or processes that have not been previously calculated (e.g. contributions from the four-fermion operators, and the ttbb and tttt processes), we simply apply the SM K-factor.
• The W -helicity in top quark decay is available at NLO in the form of analytical results [78].
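The K-factor treatment of the associated production processes in the list above can be sketched as follows. This is a minimal illustration, assuming each LO contribution is stored under a label and rescaled by its own K-factor when one is known, with the SM K-factor as the fallback. The function name, the dictionary layout, the labels and all numbers are hypothetical.

```python
def apply_k_factors(lo_terms, k_factors, k_sm):
    """Rescale LO contributions by K-factors.

    lo_terms:  dict mapping a contribution label to its LO value
    k_factors: dict of known K-factors for individual contributions
    k_sm:      SM K-factor, used wherever no dedicated K-factor is known
    """
    return {label: value * k_factors.get(label, k_sm)
            for label, value in lo_terms.items()}

# Invented toy inputs: one contribution has a dedicated K-factor,
# the other two fall back to the SM value.
lo = {"sm": 0.50, "two_fermion_term": 0.010, "four_fermion_term": 0.020}
known = {"two_fermion_term": 1.2}
print(apply_k_factors(lo, known, k_sm=1.4))
```

Using the SM K-factor as a fallback assumes that the missing higher-order corrections to the SMEFT term resemble those of the SM, which is an approximation rather than a computed result.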
In Sect. 5 we will assess the stability of our results with respect to the inclusion or not of NLO QCD corrections to the SMEFT dimension-6 effects. Finally, the tttt process is a special one, because the dominant contributions can come from O(c 4 /Λ 8 ) terms, i.e. it goes beyond the parametrisation of Eq. (2.2). This is due to diagrams with two insertions of qqtt operators at the amplitude level, which leads to a rapid growth of the cross section as a function of energy, potentially causing problems with SMEFT validity [55]. In this work we only keep the terms up to O(Λ −4 ) from one insertion of the operators. This represents a good approximation if the coefficients of qqtt operators are constrained to be within the order of a few TeV −2 at most.
Beyond this limit, O(Λ −8 ) terms dominate, and our predictions are not accurate, but they do give a lower bound on the true SMEFT contribution. Given that in practice only the upper bound from this measurement is useful for constraining operators, our approximation will always lead to a conservative result. In addition, to avoid possible EFT validity problems due to the particular energy-growth behaviour of this process, we impose a hard cut of 2 TeV on the four-top invariant mass in our prediction. Once again this leads to a lower bound on the true SMEFT contribution. When compared with the upper bound on the full cross section measured without the cut, it gives a conservative bound. In Sect. 5.4 we will discuss the dependence of our final result on this cut.

General discussion
To complete this section, we briefly discuss some additional aspects of the SMEFT framework relevant to the present analysis of the top quark sector.
RG running and mixing. In general, the SMEFT operators {O (6) i } in Eq. (2.1) will run with the scale and thus the coefficients {c i } will depend on the typical momentum transfer of the process. This dependence can be evaluated using renormalisation group (RG) equations [53,79,80], but here since we focus on processes with a similar energy scale, E ∼ m t , we will not include these operator running effects. In any case, the inclusion of NLO QCD corrections will reduce this scale dependence, making the RG effects less significant [40,81].
Energy enhancements. One important feature of Eq. (2.2) is that certain SMEFT operators will induce a growth of the cross-sections σ(E) with the energy E [4]. This is a consequence of the fact that, in four space-time dimensions, field theories involving operators with mass dimensions d > 4 exhibit a strong sensitivity to the UV cut-off of the theory Λ. In other words, the coefficients {σ i } in Eq. (2.2) will include terms that grow quadratically with E. Restricting ourselves to the O(Λ −2 ) corrections, Eq. (2.2) can be written schematically as:

$$ \sigma(E) = \sigma_{\rm SM}(E)\left(1 + \sum_{i} \omega_i\, \frac{c_i v^2}{\Lambda^2} + \sum_{i} \widetilde{\omega}_i\, \frac{c_i E^2}{\Lambda^2} + \mathcal{O}\left(\Lambda^{-4}\right)\right) \qquad (2.15) $$

where v is the Higgs boson vacuum expectation value (vev). The coefficients {ω i } and {ω̃ i } are process-dependent and arise from the {σ i } coefficients in Eq. (2.2), once we separate the energy-growing contributions.
Whether a dimension-6 operator leads to a nonzero value of the energy-growing coefficient ω̃ i depends on many factors. For example, four-fermion operators in tt and single top processes can interfere with the SM amplitude with the same helicity configurations without any additional suppression, and therefore their contributions are proportional to E 2 /Λ 2 by simple power counting. On the contrary, operators involving only two fermions contribute as v 2 /Λ 2 : the current-current operators like O (1) ϕQ always enter with two powers of the Higgs vev, while the dipole operators come with one power of the Higgs vev, but the flip of the fermion chirality leads to an additional suppression factor m t or m b upon interfering with the SM.
The associated production channels are on the other hand more complicated. There, even two-fermion operators can lead to E 2 /Λ 2 contributions. This is because already in the SM, each Feynman diagram could lead to energy-growing terms, but overall cancellations occur among the leading contributions in different diagrams as required by unitarity. With higher-dimensional operators instead, even a O(v 2 /Λ 2 ) change in one diagram could spoil the cancellation and lead to O(E 2 /Λ 2 ) modification of the total rate.
Finally, the specific observable under consideration also matters. The interference between SM and SMEFT amplitudes with different helicities is suppressed by mass factors, but this suppression can be lifted by considering the decay products of the particle with different helicities [82,83]. Note that the suppression due to different helicities only applies to the interference term. As a result, at large energy the SMEFT contribution could be dominated by the O(Λ −4 ) terms. This is actually one of the reasons to include the quadratic contributions from the operators. A similar situation occurs for instance in diboson production at the LHC, see the discussion in [84].
The schematic decomposition in Eq. (2.15) indicates that the effects of those operators for which ω̃ i ≠ 0 will be enhanced with the energy E of the process. In turn, this will lead to an increased sensitivity of the experimental measurements at high energies to the values of the corresponding c i coefficients. This property can be uniquely exploited at the LHC, where multiple processes probe the TeV region. It has been shown that in some cases the enhancement due to energy dependence in LHC processes already leads to a sensitivity competitive with that of the LEP measurements [85][86][87].
It is therefore an interesting question whether the same happens for top quark measurements, and which operators can benefit from the high energy reach provided by the LHC. In this work we are going to study the impact of high energy measurements by comparing results with and without high mass bins in differential distributions, for instance in the invariant mass distribution in tt production. This will be discussed in Sect. 5.4.
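To make the power counting of this discussion concrete, the following sketch evaluates the relative O(Λ −2 ) shift of a cross section for a single operator, split into a piece scaling as v 2 /Λ 2 and a piece scaling as E 2 /Λ 2 . The function name and the values chosen for ω, ω̃, c and Λ are purely illustrative assumptions; only the scaling with v and E follows the text.

```python
V_HIGGS_TEV = 0.246  # Higgs vacuum expectation value in TeV

def relative_shift(c, omega, omega_tilde, energy_tev, lambda_tev=1.0):
    """Relative O(1/Lambda^2) shift of the cross section from one operator.

    omega multiplies the constant v^2/Lambda^2 piece, omega_tilde the
    energy-growing E^2/Lambda^2 piece (both treated as toy inputs here).
    """
    constant_part = omega * c * V_HIGGS_TEV ** 2 / lambda_tev ** 2
    growing_part = omega_tilde * c * energy_tev ** 2 / lambda_tev ** 2
    return constant_part + growing_part

# Compare an operator without energy growth to one with energy growth.
for energy in (0.5, 1.0, 2.0):
    no_growth = relative_shift(c=1.0, omega=1.0, omega_tilde=0.0, energy_tev=energy)
    growth = relative_shift(c=1.0, omega=0.0, omega_tilde=1.0, energy_tev=energy)
    print(f"E = {energy} TeV: v^2 term {no_growth:.3f}, E^2 term {growth:.3f}")
```

For equal toy coefficients the energy-growing piece overtakes the constant one once E exceeds v, which is why high-mass bins can dominate the sensitivity to such operators.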
Validity of the SMEFT. Following up on the previous discussion, while it is useful to make use of the large energy transfer in the observed events, one has to pay special attention to remain in the region E ≪ Λ. Otherwise, the whole validity of the SMEFT power expansion would be questionable, and it would become impossible to interpret the resulting constraints within any explicit BSM model.
To make sure that SMEFT analyses remain in their validity region, it has been proposed in Ref. [88] and recommended in the Top LHC WG EFT note [10] that a kinematic cut E cut should be imposed on the events being analysed, as an upper bound on the energy transfer, so that the condition E < E cut < Λ is always guaranteed. Given that Λ is the scale of the BSM dynamics and is model-dependent, different values of E cut should be used, and results should be derived for each of these values. While the input data used in this work are not provided with such an explicit cut, for specific distributions it is possible to remove bins with a scale higher than a given value of E cut . A strong dependence of the final results on high mass bins would imply that the sensitivity is dominated by the high energy events, and that the constraints can only be interpreted for those BSM models where Λ lies above the scale of the largest bin used in the fit. Among the input data used in this work, only the m tt distribution in tt production extends to energy scales above 1 TeV. Therefore, in Sect. 5.4 we will study the dependence of our results on the high mass bins in the m tt distributions of ATLAS and CMS.
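The bin removal described above can be sketched in a few lines. The example assumes each bin of a distribution is stored with its lower and upper edges and keeps only bins lying entirely below E cut . The bin edges, the data layout and the function name are hypothetical and do not correspond to the actual ATLAS or CMS binning.

```python
def apply_energy_cut(bins, e_cut):
    """Keep only the bins whose upper edge does not exceed e_cut (in GeV)."""
    return [b for b in bins if b["high"] <= e_cut]

# Invented m_tt binning in GeV (not the experimental binning).
mtt_bins = [
    {"low": 345.0, "high": 500.0},
    {"low": 500.0, "high": 800.0},
    {"low": 800.0, "high": 1100.0},
    {"low": 1100.0, "high": 1600.0},
]

# Repeating the selection for several cut values mimics the recommended scan.
for e_cut in (800.0, 1100.0, 1600.0):
    kept = apply_energy_cut(mtt_bins, e_cut)
    print(f"E_cut = {e_cut} GeV: {len(kept)} bins kept")
```

Comparing the fit results across the different cut values then indicates how much of the sensitivity comes from the highest-mass bins.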
Quadratic dimension-6 contributions. The last term in Eq. (2.2) arises from the squares of dimension-six amplitudes. At order O(Λ −4 ), they are formally subleading contributions, so one can decide to include them without modifying the accuracy of the prediction of the central value. Indeed, there are good reasons why it could be worth including them in the analysis. To begin with, in BSM models with relatively large couplings the quadratic dimension-6 terms can become dominant without exiting the realm of validity of the EFT, see for instance Refs. [88][89][90]. Therefore including the quadratic terms allows the results of a SMEFT analysis to be interpreted in the context of these scenarios.
Furthermore, the interference terms in Eq. (2.2) are often suppressed, so that the leading contributions arise from quadratic dimension-6 terms. In these cases, one relies on the quadratic terms to extract meaningful bounds from the measurements. As an extreme case, the SM amplitude may not interfere with the SMEFT amplitude at all, because of different helicity and colour structures or different CP parity. As an illustration, in our analysis c ϕtb and c bW cannot interfere with the SM in the limit of m b → 0, so they can only be constrained once O(Λ −4 ) terms are included in the fit. Similarly, several of the qqtt operators do not interfere with the SM because of their colour-singlet structure, though this suppression is slightly lifted once NLO corrections are added.
It is also possible that, while the interference term exists, it does not lead to an energy growth behaviour. We have already mentioned that some operators cannot lead to a nonzero ω̃ i at the order O(Λ −2 ). One would then expect that the dominant sensitivity for these operators comes from O(Λ −4 ) contributions at large energies. For instance, it has been observed that in diboson production at the LHC, the helicity selection rule [91] leads to an energy suppression in the interference term, so the sensitivity to TGC couplings is dominated by O(Λ −4 ) terms [84]. Finally, a suppression of the interference term could be simply accidental. This has been observed for weak dipole operators in ttZ and ttγ production processes [69], and is relevant also in the present analysis.
Here we will follow the recommendations of Ref. [10] and repeat the analysis with and without including the quadratic SMEFT contributions. This comparison will then tell us where quadratic dimension-6 contributions are subleading and where the truncation at dimension-6 at the cross section level can be a good approximation. In addition, under certain assumptions, the O(Λ −4 ) corrections could also provide an estimate of how reliable the SMEFT fit results are with respect to higher orders in the effective theory parameter expansion. Note, however, that for the degrees of freedom whose contributions at O(Λ −2 ) are extremely suppressed, such as c ϕtb and c bW , and for the colour-singlet four-fermion interactions c 1,1 Qq , c 1 tu , c 1 td , c 1 tq , c 1 Qu , and c 1 Qd , our numerical approach would lead to large MC errors on the interference, and therefore the resulting bounds from an O(Λ −2 ) fit will be at most qualitative.
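The difference between the two fit variants can be illustrated with a small sketch that evaluates a SMEFT cross section with and without the quadratic terms. The function name, the convention of storing cross terms for i ≤ j only, and the toy inputs are assumptions for illustration. The example uses an operator with vanishing interference, for which the linear-only prediction carries no dependence on the coefficient.

```python
def smeft_xsec(c, sigma_sm, sigma_lin, sigma_quad, lam=1.0, quadratic=True):
    """Cross section as SM + linear (+ optional quadratic) SMEFT terms.

    c:          list of Wilson coefficients
    sigma_lin:  list of interference terms, one per coefficient
    sigma_quad: nested list, entry [i][j] used for i <= j only
    lam:        scale Lambda, in the same units as the terms assume
    """
    n = len(c)
    total = sigma_sm + sum(c[i] * sigma_lin[i] for i in range(n)) / lam ** 2
    if quadratic:
        total += sum(c[i] * c[j] * sigma_quad[i][j]
                     for i in range(n) for j in range(i, n)) / lam ** 4
    return total

# Toy operator with zero interference (invented numbers).
coeffs = [2.0]
linear_only = smeft_xsec(coeffs, 1.0, [0.0], [[0.1]], quadratic=False)
with_quadratic = smeft_xsec(coeffs, 1.0, [0.0], [[0.1]], quadratic=True)
print(linear_only, with_quadratic)
```

In this toy case the linear-only prediction equals the SM value for any coefficient, so only the quadratic fit can bound the operator, mirroring the situation described for c ϕtb and c bW in the limit m b → 0.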

Experimental data and theoretical calculations
In this section we describe the experimental measurements of top quark production at the LHC which will be used to constrain the SMEFT operators related to the top sector. For each dataset we discuss its main features, the information that it provides on the SMEFT effects, and the treatment of experimental uncertainties. We also describe the settings of the theoretical calculations of the SM and SMEFT contributions to the cross-sections that are used for each process. Finally, we summarise the main features of our choice of fitting basis in terms of the sensitivity of each of the input LHC processes to the individual operators.

Top quark production at the LHC
In the present analysis, we will constrain the top quark sector of the SMEFT by using experimental measurements from the LHC Run I at √ s = 8 TeV and from Run II at √ s = 13 TeV. We do not consider previous, less precise data at √ s = 7 TeV nor data from the Tevatron. The various experimental datasets used as input in this work are summarised in Tables 3.1, 3.2 and 3.3, for inclusive top quark pair production, tt production in association with gauge and Higgs bosons, and single top production measurements, respectively. For each dataset, we indicate the type of process, its label, the centre-of-mass energy √ s, information on the final state or the specific production mechanism, the available observables, the number of data points N dat , and the corresponding publication reference.
As we will discuss in more detail below, information on correlations between systematic uncertainties is available only for a subset of the data, specifically, for all the 8 TeV distributions in Table 3.1 (including W helicity fractions) [92][93][94][95][96] and for the top quark pair production measurement at 13 TeV of Ref. [97]. For the rest of the datasets, since this information is missing, we add statistical and systematic uncertainties in quadrature. Also, the correlation matrix among various differential distributions is not usually available. To avoid double counting, only one distribution per dataset can therefore be included in the fit. The ATLAS 8 TeV lepton+jet dataset is an exception, where such correlations have recently become available [98]. However, because the effect of correlating all differential distributions has not been studied yet in the fitting framework used here, we do not utilise them.
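The two treatments of uncertainties mentioned above can be sketched as follows. The example assumes that systematic uncertainties are given as a matrix of absolute shifts per bin and per source, and that, when correlation information is available, each source is treated as fully correlated across bins; that correlation model, the function name and the numbers are assumptions for illustration and may differ from the treatment used in the fit.

```python
import numpy as np

def build_covariance(stat, syst, correlated):
    """Experimental covariance matrix for one dataset.

    stat: statistical uncertainties, shape (n_bins,)
    syst: systematic shifts, shape (n_bins, n_sources)
    correlated: if True, each source is fully correlated across bins;
                if False, everything is added in quadrature bin by bin
    """
    stat = np.asarray(stat, dtype=float)
    syst = np.asarray(syst, dtype=float)
    if correlated:
        return np.diag(stat ** 2) + syst @ syst.T
    return np.diag(stat ** 2 + np.sum(syst ** 2, axis=1))

# Invented two-bin example with two systematic sources.
stat = [1.0, 2.0]
syst = [[3.0, 0.0], [4.0, 1.0]]
print(build_covariance(stat, syst, correlated=True))
print(build_covariance(stat, syst, correlated=False))
```

Both options give the same diagonal, so the total uncertainty per bin is unchanged; only the off-diagonal entries, and hence the weight of correlated shifts in the χ 2 , differ.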
As indicated in Table 3.1, in the case of differential distributions we always use absolute rather than normalised cross-sections. The rationale is that absolute distributions are more sensitive to SMEFT effects than the normalised ones, except when the corresponding fiducial cross-sections are included in the fit at the same time. If the measurements are presented only in terms of normalised differential distributions, absolute distributions are reconstructed from the former using the fiducial cross-section. Uncertainties are added in quadrature. To avoid double counting, total and/or fiducial cross-sections are excluded from the fit whenever the corresponding absolute differential distributions are part of the input dataset.
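As a minimal sketch of the reconstruction described above, the following function rescales a normalised distribution by the fiducial cross-section and combines the relative uncertainties of the two in quadrature. Treating the two uncertainties as uncorrelated and combining them in relative terms is an assumption of this sketch, as are the function name and the toy numbers.

```python
import math

def to_absolute(norm_values, norm_errors, sigma_fid, sigma_fid_error):
    """Convert a normalised distribution into an absolute one.

    Returns a list of (value, error) pairs. Relative uncertainties of the
    normalised bin and of the fiducial cross-section are added in quadrature.
    Assumes all normalised bin values are nonzero.
    """
    rel_fid = sigma_fid_error / sigma_fid
    absolute = []
    for value, error in zip(norm_values, norm_errors):
        abs_value = value * sigma_fid
        rel_total = math.sqrt((error / value) ** 2 + rel_fid ** 2)
        absolute.append((abs_value, abs_value * rel_total))
    return absolute

# Invented inputs: two normalised bins and a fiducial cross-section in pb.
print(to_absolute([0.5, 0.25], [0.05, 0.05], sigma_fid=100.0, sigma_fid_error=10.0))
```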
To gain some intuition about the expected sensitivity of the input dataset to each of the SMEFT operators defined in Sect. 2, it is useful to recall what the dominant production mechanisms are for each top-related observable in the SM. In Fig. 3.1 we display representative Feynman diagrams at the Born level for the production of top quarks at the LHC in the channels that we consider in this analysis. Specifically, we show top-quark pair production; single-top production in association with a W or Z boson and in the t- and s-channels; tt production in association with tt or bb; and tt production in association with a W or Z gauge boson or with the Higgs boson H.
Figure 3.1: Representative Feynman diagrams at the Born level for the dominant production channels of top quarks at the LHC that are considered in the present SMEFT analysis. We show top-quark pair production; single-top production in association with a W or Z boson and in the t- and s-channels; tt production in association with tt or bb; and tt production in association with a W or Z gauge boson or with the Higgs boson H.
From these diagrams, one can see that measurements of inclusive top quark pair production will be particularly sensitive to SMEFT operators that induce or modify interactions of the form gtt and ggtt, such as the chromomagnetic operator c tG . In this case, the interference with the most relevant SM production mechanism will dominate over the small quark-antiquark-initiated contributions. Likewise, single-top production and associated tW and tZ production will constrain SMEFT operators that involve both top quarks and electroweak gauge bosons, such as c tW . As a third example, ttbb production should provide direct information on operators involving four heavy quarks, such as c 1 QQ and c 8 QQ . Following this overview of our input dataset, we move to describe in more detail the features of the individual measurements listed in Tables 3.1, 3.2 and 3.3.

Top quark pair production
We begin by presenting the LHC datasets of top quark pair production used in this work. We consider inclusive production first, and then tt production in association with heavy quarks, with an electroweak gauge boson, and with the Higgs boson.
Table 3.1: The experimental measurements of inclusive top quark pair production at the LHC considered in the present analysis to constrain the coefficients of the SMEFT dimension-6 operators in the top sector. For each dataset, we indicate the type of process, the dataset label, the center of mass energy √ s, the final state or the specific production mechanism, the available observables, the number of data points N dat , and the publication reference. Most distributions are statistically correlated among them and one needs to be careful to avoid double counting.
Inclusive top-quark pair production. At the LHC, the dominant mechanism for the production of top quarks is through the production of tt pairs. The inclusive tt process is dominated by the gluon-gluon initial state, with a small admixture of the quark-antiquark partonic luminosity [123]. In this analysis, we will limit ourselves to parton-level distributions constructed in terms of the kinematical variables of the top and anti-top quark, for which NNLO QCD corrections are available in the SM [124]. See [125] for recent progress in higher order calculations at the particle level for decayed top quarks, in terms of leptons and b-jets. For all the inclusive tt processes computed here, the SM prediction is computed up to NNLO in the QCD coupling. Theoretical predictions are obtained at NLO with Sherpa [126], for 8 TeV measurements, and with MCFM [127], for 13 TeV measurements, and are then supplemented with the NNLO QCD K-factors computed in Ref. [128].
In the present analysis we include the ATLAS and CMS differential distributions from tt production at √ s = 8 TeV in the lepton+jets final state [92,93]. These measurements are  those used in the study of [129] to constrain the large-x gluon PDF from the tt differential cross-sections, and are part of the NNPDF3.1 input dataset [36]. In both cases, the distributions in top quark transverse momentum and rapidity, p t T and y t , as well as in top-quark pair invariant mass and rapidity, m tt and y tt are available, both as absolute cross-sections and normalised to the inclusive results; only the former are used here. As discussed in [129], to avoid double counting only one distribution per experiment can be added to the fit, as long as correlations between different distributions are not available or neglected.
Besides these two datasets, we take into account the constraints from the double-differential distributions from CMS at 8 TeV, which provide a good handle on the underlying partonic kinematics [94]. Note that this dataset is based on the dilepton final state, therefore it does not overlap with the dataset used in [93], which instead is based on the lepton+jets channel. We also include the CMS differential distributions at √ s = 13 TeV in the lepton+jets [97] and dilepton [100] final states based on an integrated luminosity of L = 2.3 fb −1 , as well as the more recent measurements in the lepton+jet channel based on L = 35.8 fb −1 [99]. A measurement based on the same dataset but with the dilepton final state was presented in [130]. Double-differential distributions from CMS at 13 TeV [97] are excluded since they overlap with the single-inclusive distributions from the lepton+jets datasets.
We do not include ATLAS measurements at 13 TeV since the published differential cross-sections at 13 TeV in the lepton+jets [131] and dilepton [132] channels are provided at the particle level. In this work, we restrict ourselves to parton-level observables. Note that in principle ATLAS measurements at 13 TeV are also available for the fully hadronic final state in the highly boosted regime [133]. These measurements are not considered here since their analysis requires jet substructure information alongside the consistent inclusion of electroweak [134] and threshold resummation [135] corrections.
Helicity fractions and spin correlations in tt production. A further window on the underlying dynamics of top quark pair production is provided by the measurement of observables sensitive to the spin structure of top quark production and decay. Among them, polarisation, W helicity fractions, and spin correlations provide direct constraints on the structure of the tW b vertex. In this work, we include the helicity fractions F L , F 0 , and F R [...] [136]. Related types of available angular observables in tt production include the polarisation asymmetry A P ± , the spin correlations variable A ∆φ , and the A c 1 c 2 and A cos φ asymmetries, which discriminate between the correlated and uncorrelated t and t̄ spins. For example, CMS has presented measurements of tt spin correlations and top quark polarisation in the lepton+jets and dilepton final states at √ s = 8 TeV [137]. Measurements of the tt spin correlations at √ s = 13 TeV in the eµ final state are also available from ATLAS, specifically the differential cross-section in the angular separation between the two leptons. This measurement deviates from the SM predictions by more than three sigma [138]. We leave the inclusion of these observables in the SMEFT global fit to future work.
Table 3.3 (partial): single top production datasets, listed as process, dataset label, √ s, production channel, observable, N dat , reference.
Single t, CMS_t_sch_8TeV, 8 TeV, s-channel, σtot(t + t̄), 1, [110]
Single t, ATLAS_t_sch_8TeV, 8 TeV, s-channel, σtot(t + t̄), 1, [111]
Single t, ATLAS_t_tch_8TeV
Single t, CMS_t_tch_13TeV_inc, 13 TeV, t-channel, σtot(t + t̄) (Rt), 1 (1), [114]
Single t, CMS_t_tch_8TeV_dif, 8 TeV, t-channel, dσ/dp
Single t, CMS_t_tch_13TeV_dif, 13 TeV, t-channel, dσ/dp
ttV production. In this analysis we also include data for the production of a tt pair in association with either a Z or a W boson, which is directly sensitive to the top quark couplings with the gauge bosons (see Fig. 3.1). Specifically, we include the measurements of the total inclusive cross-sections for ttZ and ttW production at √ s = 8 TeV and √ s = 13 TeV from ATLAS [105,106] and CMS [103,104]. Note that, for ttW , the W boson is often emitted from initial-state light quarks. When it is instead emitted from a final-state leg, the process becomes sensitive to operators involving only one heavy quark, which is a unique feature of this process. We do not include the ttγ production measurements [139-141], whose interpretation is hampered by issues related to photon isolation and fragmentation, as well as to initial- and final-state radiation. Because of electroweak symmetry, the ttV process is closely related to the ttH one, to be discussed next.
Higgs production in association with a tt pair. The production of a top-antitop pair together with a Higgs boson allows for a direct probe of the Yukawa coupling of the top quark, as illustrated by the dominant mechanism indicated in Fig. 3.1. Recently, 5σ evidence for this production mode was presented by both the ATLAS and CMS collaborations [107,108]. In the CMS case [107], we utilise their measurement of the signal strength µ ttH at √ s = 13 TeV (normalised to the SM prediction), rather than the cross-section, because the latter is obtained by combining data at different centre-of-mass energies. In the ATLAS case [108], we utilise their measurement of the total cross-section for ttH production at √ s = 13 TeV, extrapolated to the full phase space.
ttbb and tttt production. The production of a top quark pair in association with a bottom-antibottom pair is a purely QCD process, where a bb pair is radiated either from a gluon emitted from the initial state or from the final state (see Fig. 3.1). The production of four top quarks at the LHC, tttt, obeys a similar underlying mechanism in the SM, with the cross-section now being considerably smaller because of the larger top quark mass. The relevance of this process for the top quark sector of the SMEFT has been discussed in Ref. [45].
Concerning ttbb, the total cross-section for ttbb production, extrapolated to the full phase space at √ s = 13 TeV, is available from CMS [101], together with the corresponding ratio to ttjj production. This single data point is included in our fit. Differential cross-sections for ttbb production at √ s = 8 TeV have also been presented as a function of the kinematics of the b-jets [142]. These measurements, however, are not included in the fit. The ATLAS collaboration has presented the results of the measurements of top quark pair production in association with multiple b-jets at √ s = 13 TeV. Fiducial cross-sections for tt in association with more than two b quarks, both in the lepton+jets and in the dilepton channels, are provided [143], which supersede a previous measurement at 8 TeV [144]. We do not include these results in the fit because the cuts needed to simulate the cross-sections are not fully provided, so the measurements cannot be reproduced.
Concerning tttt, a first measurement of its cross-section at √ s = 13 TeV has been presented by CMS [102], albeit with a statistical significance of only 1.6σ. This measurement supersedes previous upper bounds at 8 TeV [145] and 13 TeV [146]. In the case of ATLAS, upper bounds based on the 2015 dataset at 13 TeV were presented in [147] and then updated in [148] from the 2016 dataset. In this analysis we utilise the CMS cross-section measurement of [102].

Single top quark production
We turn to discuss single top quark production, first inclusively in either the t− or the s−channel, and then in association with an electroweak gauge boson.
Inclusive single top quark production. As highlighted in Table 2.1, some of the SMEFT d = 6 operators that contribute to single top production via interference with the SM amplitudes are different from the corresponding ones in top quark pair production, whence their relevance in a global fit. There exist three main modes to produce single top quarks [149]: by means of the exchange of a W ± boson, either in the t-channel or in the s-channel, and by means of the associated production with a W ± (Z) boson that leads to the q t, tb, and tW ± (tZq) Born-level final states. Representative diagrams for these three modes are shown in Fig. 3.1. In this work, we include all relevant single top production datasets in the t− and s−channels from ATLAS and CMS at 8 and 13 TeV, see Table 3.3. We restrict ourselves to parton-level measurements, that is, to un-decayed top quarks.
From ATLAS, we include the differential cross-sections at √ s = 8 TeV [113], specifically the dσ(tq)/dp t T and dσ(tq)/dy t T distributions, as well as the corresponding measurements for anti-top quarks. From CMS, we include the inclusive cross-sections for t and for t̄ production at √ s = 8 TeV [109], as well as the corresponding differential distributions in p (t+t̄) T and |y t+t̄ | [115]. In the case of the inclusive measurements, the ratio R t = σ(tq)/σ(t̄q) is also provided. Using this ratio would be advantageous if the information on correlations were lacking, because experimental and theoretical systematic uncertainties partially cancel between the numerator and the denominator.
We now move to single top t-channel production based on the Run II dataset at √ s = 13 TeV. We include the transverse momentum p t+t̄ T and rapidity |y t+t̄ | differential distributions for single top production from CMS [116], as well as the ATLAS and CMS measurements of the total inclusive cross-sections for single t and t̄ production [113,114]. The ratio R t is once more provided in both cases.
Concerning single top s−channel measurements, we include the CMS total cross-sections in the s−channel at 8 TeV [110]. We also include the total cross-sections at 8 TeV from ATLAS [111]. No measurements of s-channel single top production at 13 TeV are available from either experiment. Neither the ATLAS nor the CMS differential distributions are provided with a full breakdown of experimental systematic uncertainties. Therefore, we sum all statistical and systematic uncertainties in quadrature. To avoid double counting, we do not include total cross-sections if the corresponding absolute differential distributions are already part of the input dataset. For example, using the labelling of Table 3.3, if the CMS_t_tch_13TeV_dif distributions are used, then the associated CMS_t_tch_13TeV_inc total cross-sections are excluded from the fit.
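The exclusion rule in this paragraph can be sketched as a simple filter on dataset labels. The sketch assumes that a differential dataset and its inclusive counterpart differ only by the suffixes `_dif` and `_inc`, as in the CMS_t_tch_13TeV example above; labels that do not follow this pattern would need an explicit mapping instead. The function name is hypothetical.

```python
def drop_double_counted(labels):
    """Remove '<name>_inc' datasets when '<name>_dif' is also selected."""
    selected = set(labels)
    for label in labels:
        if label.endswith("_dif"):
            # The matching inclusive label is assumed to share the same stem.
            selected.discard(label[: -len("_dif")] + "_inc")
    # Preserve the original ordering of the surviving labels.
    return [label for label in labels if label in selected]

datasets = ["CMS_t_tch_13TeV_inc", "CMS_t_tch_13TeV_dif", "CMS_t_sch_8TeV"]
print(drop_double_counted(datasets))
```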
For both inclusive and differential single-top measurements in the t-channel, NNLO QCD corrections have been computed [150]. For all single-top processes for which the measurements have been published (all measurements except the CMS differential measurements), we use as a theory input the NNLO calculation. For unpublished measurements, we use NLO QCD.
tV associated production. The associated production of a top quark and a W boson has a very distinctive signature that allows one to reconstruct the decay products from both the top quark and the W decay. Measurements of tW associated production have been presented by ATLAS and CMS. Here we include the ATLAS measurements at 8 and 13 TeV of the total σ(tW ) cross-section [117,119] extrapolated to the full phase space. A measurement of differential distributions at 13 TeV based on a luminosity of L = 36 fb −1 [151] was also presented by ATLAS. However, this measurement is at particle level (of leptons and b-jets from the W and top quark decays), therefore we do not include it in the fit. We also include the CMS measurements of σ tot (tW ) at 8 and 13 TeV [118,120]. These measurements supersede the previous ones at 7 and 8 TeV [118,152].
The production of a single top quark in association with a Z boson, shown in Fig. 3.1, is also an interesting probe of the top quark sector of the SMEFT. The tZ production cross-section has been measured by CMS at 13 TeV in the $Wb\ell^+\ell^-q$ final state, where the dilepton pair arises from the decay of the Z boson [121] (see [153] for an update based on L = 77.4 fb −1 ). The tZ production cross-section has also been measured at 13 TeV by ATLAS in the tri-lepton final state and extrapolated to the full phase space [122]. We use these two measurements as data points in the fit, for a total of four tV input cross-sections.

Theory overview and sensitivity to the SMEFT degrees of freedom
In Table 3.4, we summarise the details of the theoretical calculations used for the description of the LHC top quark production measurements included in the present analysis. We indicate, for both the SM and the SMEFT contributions to the cross-sections in Eq. (2.2), the perturbative accuracy and the codes used to produce the corresponding predictions. In all cases, the same theoretical settings have been used for the calculation of both the total cross-sections and the differential distributions, where available. We emphasise that we have used state-of-the-art theory calculations for both the SM and the SMEFT pieces, which are instrumental in reducing the theoretical uncertainties associated with missing perturbative higher orders. We have adopted common input settings for the theory calculations; in particular, all the SM cross-sections are consistently evaluated with a NNPDF3.1 PDF set at NNLO accuracy that does not include any top data (henceforth labelled NNPDF3.1NNLO no-top). We do not include top quark data in the PDF fit because the datasets used in NNPDF are also used in the current SMEFT fit. Adding the $t\bar{t}$ distributions to both fits would amount to double-counting the data; alternatively, since MC replicas are also used in NNPDF, it would require us to keep track of the correlations between the two sets of MC replicas. We therefore choose to exclude the top quark data from the PDF fit.
It should be clear from the above discussion, as well as from the considerations presented in Sect. 2, that each of the input LHC processes will have a rather different sensitivity to each of the N op = 34 SMEFT degrees of freedom considered in the analysis. To illustrate this point, in Table 3.5 we indicate the sensitivity of each of the LHC processes included in the present analysis to the degrees of freedom in our fitting basis (for their definition, see Table 2.1). A check mark outside (inside) brackets indicates that a given process constrains the corresponding operator at O(Λ −2 ) (O(Λ −4 )). A check mark in square brackets indicates that the operator enters at O(Λ −2 ) but only at NLO.
The comparison in Table 3.5 illustrates the importance of a global approach to the SMEFT analysis of top quark production. On the one hand, several operators are constrained by many different processes, and this allows independent and complementary constraints. For instance, the chromomagnetic operator c tG is relevant for the description of all the input processes with the exception of single-top production. On the other hand, other operators are constrained by one or two processes at most, so that information on them can be obtained only by including a wide range of different input observables. Examples are c tϕ , which is constrained only by ttH production; c bW , which is sensitive only to single-top production at O(Λ −4 ); and the four-heavy-quark operators, for which the only available information comes from ttbb and tttt.
From Table 3.5 we also observe that adding the formally subleading O(Λ −4 ) contributions from the dimension-six operators increases the sensitivity of many different processes. For example, the ObW and Off operators can only be constrained once O(Λ −4 ) terms are included in the fit. This is also true for several of the four-fermion operators, for which additional constraints can be obtained from the tt, ttV and ttH production processes once O(Λ −4 ) corrections are taken into account.
Table 3.4. Summary of the theoretical calculations used for the description of the LHC top production cross-sections included in the present analysis. We indicate, for both the SM and the SMEFT contributions to the cross-sections, the perturbative accuracy and the codes used to produce the corresponding predictions.
Needless to say, it is in principle inconsistent to account only for the O(Λ −4 ) effects arising from the dimension-six operators and not from the dimension-eight operators. However, there are good reasons why it could be worth including them in the analysis. First, including or not the O(Λ −4 ) corrections provides an estimate of whether the SMEFT fit results are stable upon higher orders in the effective field theory parameter expansion. Second, one might consider scenarios where the dimension-eight operators do not interfere with the SM amplitudes. In this case the only physically relevant O(Λ −4 ) effects are those arising from the dimension-six operators, which we include here.

The SMEFiT fitting methodology
In this section, we describe the SMEFiT fitting approach that we adopt here to constrain the SMEFT operators summarised in Table 2.1. We discuss the propagation of experimental and theoretical uncertainties, and explain how we determine the best-fit parameters in a way that avoids over-fitting. Finally, we describe how the fitting methodology can be validated by means of closure tests, analogously to the PDF case, and apply this strategy to study the robustness of the results and their dependence on a number of fit settings.

The Monte Carlo replica method
In this work, we adopt the MC replica method to propagate the experimental uncertainties from the input experimental cross-sections to the fitted SMEFT coefficients {c i }. The idea underlying this method is to construct a sampling of the probability distribution in the space of the experimental data, which then translates into a sampling of the probability distribution in the space of the SMEFT coefficients by means of the fitting procedure. This strategy can be implemented by generating a large number (N rep ) of artificial replicas of the original data.
The replica generation is based on the available information on the experimental central values, uncertainties, and correlations associated to each of the input data points. It can then be shown that averages, variances, and correlations computed over the sample of N rep MC replicas reproduce the corresponding experimental values.
In practice, the MC replica method works as follows. Given an experimental measurement of a hard-scattering cross-section, denoted by $O_i^{(\rm exp)}$, with relative uncorrelated uncertainty $\sigma_i^{(\rm stat)}$, $N_{\rm sys}$ relative additive systematic uncertainties $\sigma_{i,\alpha}^{(\rm sys)}$, and $N_{\rm norm}$ relative multiplicative normalisation uncertainties $\sigma_{i,n}^{(\rm norm)}$, the artificial MC replicas of the experimental data are generated as
$$ O_i^{(\rm art)(k)} = S_{i,N}^{(k)}\, O_i^{(\rm exp)} \left( 1 + \sum_{\alpha=1}^{N_{\rm sys}} r_{i,\alpha}^{(k)}\, \sigma_{i,\alpha}^{(\rm sys)} + r_i^{(k)}\, \sigma_i^{(\rm stat)} \right), \qquad k = 1, \ldots, N_{\rm rep}, \qquad (4.1) $$
where the index $i$ runs from 1 to $N_{\rm dat}$, the total number of points in a specific dataset, and where the normalisation prefactor is given by
$$ S_{i,N}^{(k)} = \prod_{n=1}^{N_{\rm norm}} \left( 1 + r_{i,n}^{(k)}\, \sigma_{i,n}^{(\rm norm)} \right). \qquad (4.2) $$
In Eqns. (4.1) and (4.2), $r_i^{(k)}$, $r_{i,\alpha}^{(k)}$, and $r_{i,n}^{(k)}$ are univariate Gaussian random numbers. Correlations between data points induced by systematic uncertainties are accounted for by ensuring that $r_{i,\alpha}^{(k)} = r_{i',\alpha}^{(k)}$. A similar condition is applied for multiplicative normalisation uncertainties if the $n$-th normalisation uncertainty is common to the entire dataset, i.e. $r_{i,n}^{(k)} = r_{i',n}^{(k)}$.
The MC approach is conceptually different from the commonly adopted Hessian method, based on the expansion of the χ 2 around its best-fit minimum assuming a quadratic behaviour. Nevertheless, under specific conditions, the two methods can be shown to reproduce equivalent results for the determination of the uncertainties in fitted parameters, see e.g. Ref. [23] for studies in the PDF context. The main advantage of the MC method is that it does not require any assumption about the underlying probability distribution of the parameters, and in particular it is not restricted to Gaussian distributions. Moreover, it is suited to problems where the parameter space is large and complicated, with a large number of quasi-degenerate minima and flat directions. For these reasons, adopting the MC approach rather than the Hessian method is advantageous in the case of SMEFT fits.
An important aspect to address in the MC method is how many replicas N rep need to be generated for each specific application. In order to determine this, we assess the robustness of our results with respect to the number of MC replicas used in the fit. To do so, in Fig. 4.1 we show the dependence of the bounds δc i /Λ 2 , determined at the 95% confidence level, on the value of N rep from a level 2 closure test, discussed in detail in Sect. 4.4.
Each line corresponds to one of the N op = 34 degrees of freedom defined in Table 2.1. From Fig. 4.1, we find that for N rep ≲ 100 the fit estimate for the bounds is affected by large fluctuations. These fluctuations are dampened as the number of replicas increases, and for N rep ≳ 500 the results become independent of N rep . In order to ensure that no residual MC fluctuations remain, we will use N rep = 1000 as our baseline. We note however that the validity of this conclusion, in general, will depend on the input dataset, and should therefore be reconsidered if this is modified, in particular if the dataset is significantly extended.
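The replica generation described above can be illustrated with a minimal NumPy sketch. It assumes a single dataset in which every additive systematic and every normalisation uncertainty is fully correlated across data points, and that all uncertainties are given as relative values; the function name `generate_replicas` and its argument layout are hypothetical and do not correspond to the SMEFiT code.

```python
import numpy as np

def generate_replicas(central, stat, sys, norm, n_rep, seed=0):
    """Toy version of the replica generation for one dataset.

    central : (n_dat,)        experimental central values
    stat    : (n_dat,)        relative uncorrelated uncertainties
    sys     : (n_dat, n_sys)  relative additive systematics
    norm    : (n_dat, n_norm) relative normalisation uncertainties
    Assumption: each systematic source is shared by all data points.
    """
    rng = np.random.default_rng(seed)
    n_dat, n_sys = sys.shape
    n_norm = norm.shape[1]
    r_stat = rng.standard_normal((n_rep, n_dat))   # independent per point
    r_sys = rng.standard_normal((n_rep, n_sys))    # one number per source
    r_norm = rng.standard_normal((n_rep, n_norm))  # one number per source
    # Additive bracket: 1 + sum_alpha r_alpha sigma_sys + r_i sigma_stat
    additive = 1.0 + r_sys @ sys.T + r_stat * stat
    # Normalisation prefactor: prod_n (1 + r_n sigma_norm)
    prefactor = np.prod(1.0 + r_norm[:, None, :] * norm[None, :, :], axis=2)
    return prefactor * central * additive          # shape (n_rep, n_dat)
```

Averaging the returned array over its first axis should reproduce the central values up to fluctuations that shrink with the number of replicas, which is the behaviour that the choice of N rep is meant to control.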

χ 2 definition
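For each of the MC replicas generated with Eq. (4.1), the corresponding best-fit values $\{c_l^{(k)}\}$ are determined from the minimisation of a figure of merit, the error function, defined as
$$ E(\{c_l^{(k)}\}) \equiv \frac{1}{N_{\rm dat}} \sum_{i,j=1}^{N_{\rm dat}} \left( O_i^{(\rm th)}(\{c_l^{(k)}\}) - O_i^{(\rm art)(k)} \right) ({\rm cov}^{-1})_{ij} \left( O_j^{(\rm th)}(\{c_l^{(k)}\}) - O_j^{(\rm art)(k)} \right), \qquad (4.3) $$
where $N_{\rm dat}$ is the number of data points used in the fit, and $O_i^{(\rm th)}(\{c_l^{(k)}\})$ is the theoretical prediction for the $i$-th cross-section evaluated with the values $\{c_l^{(k)}\}$ of the degrees of freedom. Note that in Eq. (4.3) the theory predictions are compared to the MC replicas, rather than to the original experimental central values. Once the best-fit parameters have been determined for all the $N_{\rm rep}$ replicas, the overall fit quality can be quantified by means of the χ 2 ,
$$ \chi^2 \equiv \frac{1}{N_{\rm dat}} \sum_{i,j=1}^{N_{\rm dat}} \left( O_i^{(\rm th)}(\{\langle c_l \rangle\}) - O_i^{(\rm exp)} \right) ({\rm cov}^{-1})_{ij} \left( O_j^{(\rm th)}(\{\langle c_l \rangle\}) - O_j^{(\rm exp)} \right), \qquad (4.4) $$
where now the theoretical predictions, computed using the expectation value (the mean) for the degree of freedom $c_l$, are compared to the central experimental data. This is evaluated as the average over the resulting MC best-fit sample $\{c_l^{(k)}\}$,
$$ \langle c_l \rangle = \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} c_l^{(k)}. \qquad (4.5) $$
Both the error function, Eq. (4.3), and the χ 2 , Eq. (4.4), are expressed in terms of the total covariance matrix, ${\rm cov}_{ij}$, which should contain all the relevant sources of experimental and theoretical uncertainties. Assuming that theoretical uncertainties follow an underlying Gaussian distribution, and that they are uncorrelated to the experimental uncertainties, it can be shown [154] that the total covariance matrix can be expressed as
$$ {\rm cov}_{ij} = {\rm cov}_{ij}^{(\rm exp)} + {\rm cov}_{ij}^{(\rm th)}, \qquad (4.6) $$
that is, as the sum of the experimental and theoretical covariance matrices. Concerning the experimental covariance matrix, we use the so-called 't 0 ' definition [155],
$$ ({\rm cov}_{t_0})_{ij}^{(\rm exp)} \equiv \left( \sigma_i^{(\rm stat)} O_i^{(\rm exp)} \right)^2 \delta_{ij} + \sum_{\alpha=1}^{N_{\rm sys}} \sigma_{i,\alpha}^{(\rm sys)} \sigma_{j,\alpha}^{(\rm sys)}\, O_i^{(\rm exp)} O_j^{(\rm exp)} + \sum_{\beta=1}^{N_{\rm norm}} \sigma_{i,\beta}^{(\rm norm)} \sigma_{j,\beta}^{(\rm norm)}\, O_i^{(\rm th,0)} O_j^{(\rm th,0)}, \qquad (4.7) $$
where $\sigma_i^{(\rm stat)}$, $\sigma_{i,\alpha}^{(\rm sys)}$, and $\sigma_{i,\beta}^{(\rm norm)}$ denote the relative uncorrelated, additive systematic, and multiplicative normalisation uncertainties, and where one treats the additive ('sys') relative experimental systematic errors separately from the multiplicative ('norm') ones. In the additive case, one uses the central value of the experimental measurement, $O_i^{(\rm exp)}$. In the multiplicative case, one uses instead a fixed set of theoretical predictions, $\{O_i^{(\rm th,0)}\}$. These theoretical predictions are typically obtained from a previous fit; the fit is then iterated until consistency is reached. The use of the t 0 covariance matrix defined in Eq. (4.7) avoids the bias associated to multiplicative uncertainties, which would lead to a systematic undershooting of the best-fit values as compared to their true values [156].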
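As an illustration of how the error function and the t 0 covariance matrix fit together, the following sketch builds a t 0 -style covariance matrix from relative uncertainties and evaluates a normalised quadratic form of the type in Eqs. (4.3)-(4.4). The function names and array layout are hypothetical, and the treatment of the uncorrelated uncertainty as a relative quantity is an assumption of this sketch.

```python
import numpy as np

def t0_covariance(stat, sys, norm, data, theory0):
    """t0-style experimental covariance matrix (toy version).

    stat    : (n_dat,)        relative uncorrelated uncertainties
    sys     : (n_dat, n_sys)  relative additive systematics
    norm    : (n_dat, n_norm) relative multiplicative uncertainties
    data    : (n_dat,)        experimental central values
    theory0 : (n_dat,)        fixed theory predictions from a previous fit
    """
    additive = sys * data[:, None]            # scaled by the measurement
    multiplicative = norm * theory0[:, None]  # scaled by the t0 theory
    return (np.diag((stat * data) ** 2)
            + additive @ additive.T
            + multiplicative @ multiplicative.T)

def error_function(theory, target, cov):
    """Quadratic form diff^T cov^{-1} diff, divided by the number of points.

    With target = one MC replica this plays the role of E; with
    target = experimental central values it plays the role of chi^2.
    """
    diff = theory - target
    return diff @ np.linalg.solve(cov, diff) / diff.size
```

Solving the linear system with `np.linalg.solve` rather than inverting the covariance matrix explicitly is a design choice for numerical stability, not something stated in the text.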
As mentioned in Sect. 3, we construct the experimental covariance matrix, Eq. (4.7), from all available sources of statistical and systematic uncertainties for a given dataset. Information on the bin-by-bin correlations of systematic uncertainties is available only for a subset of the data listed in Tables 3.1-3.2, specifically, for all the 8 TeV top-quark pair differential distributions in Table 3.1 and for the corresponding CMS distributions at 13 TeV from Ref. [97]. For all the other measurements, we add all uncertainties in quadrature; our analysis can be easily updated should more correlations become available.
In addition to the experimental uncertainties, there are at least two main classes of theoretical uncertainties that are in principle relevant for the present fits: (i) uncertainties associated to missing higher orders (MHOs) in the perturbative calculation, and (ii) PDF uncertainties. The impact of the former is not expected to affect this analysis significantly, because we perform the SM calculation at the highest available perturbative order. In particular, we take into account NNLO QCD corrections for the two families of processes that are more precisely known experimentally, namely the absolute differential distributions in inclusive tt and single top (t-channel) production. Furthermore, as discussed in Sect. 2.4, for most of the SMEFT contributions, the NLO QCD calculation is used.
The inclusion of PDF uncertainties, instead, is more important. In this work, we use as input to all our theory calculations the NNPDF3.1 NNLO no-top PDF set [36], which differs from the NNPDF3.1 baseline set only for the exclusion of the top-quark pair production data from the dataset. As explained in Sect. 3, this is necessary to avoid double-counting in the fit. However, this implies that the SM calculation of top quark pair production could be affected by sizeable PDF uncertainties, especially in the tails of the differential distributions, which are not constrained by alternative gluon-sensitive processes in the fit such as transverse momentum Z-boson [157], jet [158], and direct photon production [159]. Therefore, not accounting for PDF uncertainties may bias the results of the fit.
With this motivation, we construct the theoretical covariance matrix from the contributions of the PDF uncertainty as
$$ {\rm cov}_{ij}^{(\rm th)} = \left\langle O_i^{(\rm th)(r)} O_j^{(\rm th)(r)} \right\rangle_{\rm rep} - \left\langle O_i^{(\rm th)(r)} \right\rangle_{\rm rep} \left\langle O_j^{(\rm th)(r)} \right\rangle_{\rm rep}, \qquad (4.8) $$
where the theoretical predictions $O_i^{(\rm th)(r)}$ are computed using the SM theory and the $r$-th replica from the NNPDF3.1NNLO no-top PDF set, and averages $\langle \cdot \rangle_{\rm rep}$ are performed over the $N_{\rm rep} = 100$ replicas of this PDF set. Note that replicas in the PDF set are not directly related to replicas in the SMEFT set, since the two sets represent different probability distributions.
In general, the theoretical covariance matrix, Eq. (4.8), induces correlations between all the datasets included in the fit. However, we account for them only within a given dataset, in the same way as for experimental measurements. If the PDF-induced correlations between data points $i$ and $j$ are neglected, Eq. (4.8) reduces to
$$ {\rm cov}_{ii}^{(\rm th)} = \left\langle \left( O_i^{(\rm th)(r)} \right)^2 \right\rangle_{\rm rep} - \left\langle O_i^{(\rm th)(r)} \right\rangle_{\rm rep}^2, \qquad (4.9) $$
and vanishes for $i \neq j$. This corresponds to adding the PDF error in quadrature to the experimental uncertainties.
For consistency, PDF uncertainties should be included in the fit not only via the covariance matrix in Eqs. (4.3)-(4.4), but also in the MC replica generation. That is, the generation of the data replicas according to Eq. (4.1) includes an additional source of fluctuation determined from the theoretical covariance matrix, Eq. (4.8). Note that, for the $k$-th data replica, the theory predictions $O_i^{(\rm th)}(\{c_l^{(k)}\})$ are evaluated using a different PDF replica from the NNPDF3.1NNLO no-top set. Since in general the number of data replicas, N rep = 1000, is much larger than the number of PDF replicas, N rep = 100, the latter are selected at random with repetition for each data replica.
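The construction of the PDF-induced covariance matrix and the assignment of PDF replicas to data replicas can be sketched as follows. The array `preds` is assumed to hold the SM predictions for each PDF replica (one row per replica); the function names are hypothetical.

```python
import numpy as np

def pdf_covariance(preds, diagonal_only=False):
    """Covariance over PDF replicas: <O_i O_j> - <O_i><O_j>.

    preds : (n_pdf_rep, n_dat) theory predictions, one row per PDF replica.
    If diagonal_only is True, PDF-induced correlations between data
    points are dropped, i.e. the PDF error is added in quadrature.
    """
    mean = preds.mean(axis=0)
    cov = preds.T @ preds / preds.shape[0] - np.outer(mean, mean)
    return np.diag(np.diag(cov)) if diagonal_only else cov

def assign_pdf_replicas(n_data_rep, n_pdf_rep, seed=0):
    """Pick one PDF replica index per data replica, with repetition."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_pdf_rep, size=n_data_rep)
```

For instance, `assign_pdf_replicas(1000, 100)` returns 1000 indices between 0 and 99, so that each of the 100 PDF replicas is reused about ten times on average.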

Minimisation and stopping
In the case of current SMEFT fits, the minimisation of the error function, E, Eq. (4.3), may be achieved by exploiting gradient descent methods, which rely on variations of E. This is because the relationship between the theory cross-sections and the fitted parameters is at most quadratic, see Eq. (2.2). Taking this into account, the optimiser that we use here to determine the best-fit values of the degrees of freedom {c i } is the sequential least squares programming algorithm SLSQP [160] available in the SciPy package. It belongs to the family of sequential quadratic programming methods, which are based on solving a sequence of optimisation subproblems, where each of them optimises a quadratic model. An advantage of using SLSQP is that it allows one to provide the optimiser with any combination of constraints on the coefficients, including existing bounds, a feature that might become useful for future studies.
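A minimal sketch of such a minimisation with SciPy is given below. It assumes a toy theory model that is at most quadratic in the coefficients, with hypothetical arrays `sm`, `lin`, and `quad` standing in for the SM, linear, and quadratic contributions; the gradient is left to SciPy's finite-difference estimate, and the optional `bounds` argument illustrates how existing constraints on the coefficients could be passed to SLSQP.

```python
import numpy as np
from scipy.optimize import minimize

def predict(c, sm, lin, quad):
    """Toy cross-sections: SM + linear + quadratic terms in c.

    sm : (n_dat,), lin : (n_dat, n_op), quad : (n_dat, n_op, n_op)
    """
    return sm + lin @ c + np.einsum("ilm,l,m->i", quad, c, c)

def fit_replica(target, cov_inv, sm, lin, quad, c0, bounds=None):
    """Minimise the error function for one replica with SLSQP."""
    def error(c):
        diff = predict(c, sm, lin, quad) - target
        return diff @ cov_inv @ diff / target.size
    return minimize(error, c0, method="SLSQP", bounds=bounds,
                    options={"maxiter": 500, "ftol": 1e-12})
```

The returned object is a standard `scipy.optimize.OptimizeResult`, whose attribute `x` holds the best-fit coefficients for that replica.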
Since the dimensionality of this parameter space is not that different from the total number of input cross-sections (N dat = 103 points), one needs to avoid over-fitting, i.e. fitting the statistical fluctuations of the experimental data rather than the underlying physical law. Such an effect is particularly dangerous in a situation like the current one, where there are a large number of flat directions with several parameters strongly (anti)-correlated.
To prevent the minimiser from over-fitting the data, we use (MC) cross-validation. For each replica, the data is randomly split with equal probability into two disjoint sets, known as the training and validation sets. Only the data points in the training set are then used to compute the figure of merit being minimised, Eq. (4.3), while the validation set is used only to monitor the fit. Comparing the training and validation error functions as a function of the number of iterations, we observe the expected behaviour: the validation χ 2 , once it reaches its lowest value, increases rapidly, while the training χ 2 continues to decrease. We find E tr < E val as the number of iterations increases, which is an indication that the optimisation algorithm is over-fitting. It is therefore clear that, without adopting cross-validation, the absolute minimum found by the optimisation algorithm would not correspond to the true underlying law, but rather to fitting statistical noise. We will quantify the importance of cross-validation in the SMEFT fit results in Sect. 4.5, where we will show that without it one obtains unreliable results, including spurious deviations from the SM predictions.
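The cross-validation logic can be sketched as follows for a toy linear model. The sketch makes two simplifying assumptions that are not taken from the text: the data are split into two halves of fixed size via a random permutation, and the stopping point is identified through the SciPy `callback` hook, which records the iterate with the lowest validation error. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def chi2_per_point(c, idx, target, cov, sm, lin):
    """Error function restricted to the data points in idx."""
    diff = (sm + lin @ c - target)[idx]
    sub = cov[np.ix_(idx, idx)]
    return diff @ np.linalg.solve(sub, diff) / idx.size

def fit_with_stopping(target, cov, sm, lin, c0, seed=0, maxiter=200):
    """Minimise on the training set, keep the best validation point."""
    rng = np.random.default_rng(seed)
    c0 = np.asarray(c0, dtype=float)
    perm = rng.permutation(target.size)
    tr, val = perm[: target.size // 2], perm[target.size // 2:]
    best = {"c": c0.copy(),
            "e_val": chi2_per_point(c0, val, target, cov, sm, lin)}

    def look_back(ck):
        # Called by SciPy after each iteration with the current iterate.
        e_val = chi2_per_point(ck, val, target, cov, sm, lin)
        if e_val < best["e_val"]:
            best["c"], best["e_val"] = ck.copy(), e_val

    minimize(chi2_per_point, c0, args=(tr, target, cov, sm, lin),
             method="SLSQP", callback=look_back,
             options={"maxiter": maxiter})
    return best["c"], best["e_val"], tr, val
```

Because a new random partition is drawn for every value of `seed`, calling this function once per replica with different seeds gives each replica its own training and validation sets.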

Closure test validation
A reliable fitting framework should be able to fit a wide range of different datasets without tuning the methodology and without biasing the results. Validating a new methodology can be complicated by issues such as potential inconsistencies (internal or external) in the experimental data, or by limitations in the theoretical calculations. To validate the fitting methodology used in this SMEFT analysis, we carry out a series of closure tests, based on pseudo-data generated with a known underlying physical law, see [33] for more details.
The basic idea underlying a closure test is to test the SMEFT fitting procedure by performing fits where the "correct" result is known, i.e. by fitting pseudo-data generated from a fixed reference set of values for the SMEFT degrees of freedom, $\{c_i^{(\rm ref)}\}$. Closure tests allow one to check that the fitting methodology can reproduce the underlying law, which is known by construction. The SM and SMEFT theory calculations can be assumed to be exact, since we use the same theory settings to generate and fit the pseudo-data. As a consequence, the theoretical uncertainties associated to MHOs and PDFs do not enter closure tests, where only methodological and experimental uncertainties are checked. In the case of SMEFT fits, we can perform a closure test assuming that the underlying truth is the SM, i.e. $\{c_i^{(\rm ref)} = 0\}$. In the following, we consider three levels of closure tests according to the type of pseudo-data that is used as input to the fit.
• In a level zero (L0) closure test the pseudo-data coincides with the true underlying law, without any additional fluctuations. Then N rep fits are performed to exactly the same pseudo-data, with the only difference being the random initial conditions in each case. For instance, if the pseudo-data is generated with the SM hypothesis $\{c_i^{(\rm ref)} = 0\}$, then the same values should be reproduced at the fit level within uncertainties. For a L0 closure test, the training/validation partition is not necessary, since the information contained in both sets would be identical.
In a L0 closure test, one expects the error function E to tend to zero for a large enough number of iterations. Therefore, direct evidence that a L0 test is successful is to show how the error function decreases with the number of iterations. A L0 closure test therefore allows one to check that the minimiser is efficient enough to properly explore the entire parameter space.
• In a level one (L1) closure test, one adds noise on top of the pseudo-data. Two types of noise may be added. In a L1a closure test, we generate MC replicas of the pseudo-data generated in a L0 closure test in the same way as in a real fit to data. Alternatively, in a L1b closure test one adds stochastic noise directly to the pseudo-data, in order to replicate experimental uncertainties included in the fits to data. In this work we adopt a L1a-type closure test, in contrast to NNPDF, where L1b-type closure tests are used as a default. We note that adopting a L1a-type test over L1b simply means that a different type of uncertainty is being probed at L1; the various types of uncertainty are discussed in Sect. 4.5, where we characterise the fit uncertainties.
In comparison to a L0 closure test, a L1 closure test propagates the experimental uncertainties into the fitted coefficients, and can therefore be used to demonstrate that the quoted uncertainties in the fit parameters admit a robust statistical interpretation.
One expects E ∼ 1 for a successful closure test.
• In a level two (L2) closure test, one adds the aforementioned stochastic noise on top of the MC replicas included in the L1 closure test. This statistical noise is generated according to the experimental covariance matrix of the real data. A L2 closure test is therefore equivalent to a fit to the real data, the only difference being that data and theory are perfectly consistent by construction.
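The three levels of pseudo-data described above can be sketched as follows. This is one possible reading of the definitions, under the simplifying assumptions that all fluctuations are Gaussian and drawn from a single covariance matrix, and that the L2 case corresponds to one stochastic shift of the pseudo-data followed by MC replica generation around the shifted values. The function name is hypothetical.

```python
import numpy as np

def closure_pseudo_data(theory_ref, cov, level, n_rep, seed=0):
    """Return an (n_rep, n_dat) array of fit inputs for a closure test.

    theory_ref : (n_dat,) predictions for the reference coefficients
    cov        : (n_dat, n_dat) experimental covariance matrix
    level      : 0 (no noise), 1 (L1a: MC replicas), 2 (noise + replicas)
    """
    rng = np.random.default_rng(seed)
    central = np.array(theory_ref, dtype=float)
    if level == 2:
        # One stochastic shift of the pseudo-data, common to all replicas.
        central = rng.multivariate_normal(central, cov)
    if level == 0:
        # Every fit sees exactly the same pseudo-data.
        return np.tile(central, (n_rep, 1))
    # L1a and L2: MC replicas fluctuating around the central values.
    return rng.multivariate_normal(central, cov, size=n_rep)
```

In a L0 test the spread among fits then comes only from the random initial conditions of the minimiser, whereas in L1a and L2 tests it also reflects the fluctuations of the input pseudo-data.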

L0 closure tests.
First of all, we want to demonstrate that the optimiser is efficient enough to explore the full 34-dimensional parameter space. With this motivation, pseudo-data corresponding to the SM has been generated for all the cross-sections described in Tables 3.1-3.3 and fitted without introducing any additional noise. As mentioned above, here all data is fitted since the training/validation separation is not required. In Fig. 4.3 we show the error function E for L0 closure tests based on the SM scenario as a function of the number of iterations in the minimiser for three replicas with different initial conditions. We see how the error function decreases with the number of iterations, approaching the limit E → 0 which corresponds to the case where the fit results reproduce the reference values {c i = 0}. In Fig. 4.4 we show the results of the L0 closure tests; in the left plot we show the fit residuals for the N op = 34 degrees of freedom included in the fit. They are defined as
$$ r_i \equiv \frac{\langle c_i \rangle - c_i^{(\rm ref)}}{\delta c_i}, \qquad (4.10) $$
where $\langle c_i \rangle$ and $\delta c_i$ indicate the expectation value, Eq. (4.5), and the 95% CL bound, respectively.
As before, $c_i^{(\rm ref)}$ represent the reference values of the SMEFT degrees of freedom used to generate the pseudo-data, which here are set to zero. We find that the residuals are all very close to zero, i.e. the optimiser has managed to identify with good accuracy the true underlying values of the fit parameters.
In the same figure, we also show the corresponding values of the 95% CL on the fit parameters δc i . The units of the δc i are TeV −2 , and as in the rest of this work for reference we are assuming that Λ = 1 TeV. We find that for the L0 closure tests the values of the δc i bounds are small, indicating that all replicas coincide at essentially the same point in the parameter space at the end of the fit.
corresponding results at the level of the fits to the real data. We see from this comparison that some degrees of freedom will be constrained rather better than others: for instance one expects the bounds on OtG to be in the range of δc i ∼ 0.1 TeV −2 , while the bounds on Otp to be in the range δc i ∼ 100 TeV −2 .
In Fig. 4.6 we show the values of the SMEFT degrees of freedom and their uncertainties, c i ± δc i , for the L2 closure test in the BSM scenario where one has set c 8 tu /Λ 2 = 20 TeV −2 , which is approximately twice as large as the 95% CL found by the L2 closure test to the SM. This allows us to ensure the starting point of the fit is BSM for this operator, but is not so far away from the SM as to make the closure test redundant. We first observe that the closure test does indeed find a best-fit value of c 8 tu /Λ 2 ≈ 13 TeV −2 , which is outside the error for this operator reported in the SM closure test. If one computes the fit residuals, Eq. (4.10), we find that the central value lies outside the 95% CL, which roughly corresponds to 2σ. Therefore at the fit level we would expect to find at least a 2σ deviation from the SM, and larger if the value of the coefficient is much larger than the size of the error associated to it.
It is however important to emphasise that the bounds reported in Figs. 4.5 and 4.6 need to be taken with a grain of salt, since some degrees of freedom are highly (anti-)correlated. To quantify this, in Fig. 4.7 we show the values of the correlation coefficient between the different degrees of freedom c i for the L2 closure test with SM reference values. The correlation coefficient between two of the degrees of freedom in the fit, c i and c j , is computed using the standard MC expression, namely
$$ \rho(c_i, c_j) = \frac{\langle c_i c_j \rangle - \langle c_i \rangle \langle c_j \rangle}{\sqrt{\langle c_i^2 \rangle - \langle c_i \rangle^2}\, \sqrt{\langle c_j^2 \rangle - \langle c_j \rangle^2}}, $$
where the averages are taken over the sample of N rep replicas. From this comparison we see that some degrees of freedom are very correlated; for example, the chromomagnetic operator OtG is highly correlated with the two-heavy-two-light operators O81qq and O11qq.
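The quantities used in these comparisons (expectation values, 95% CL bounds, residuals, and correlation coefficients) can all be obtained from the sample of best-fit replicas. The sketch below assumes that δc i is taken as the half-width of the central 95% interval of the replica distribution; this is an assumption of the example, since the precise definition is not spelled out in this section, and the function name is hypothetical.

```python
import numpy as np

def replica_summary(c_rep, c_ref=None):
    """Summary statistics over an (n_rep, n_op) array of best-fit values.

    Returns the mean, the 95% CL half-width, the residuals with respect
    to c_ref (zeros if not given), and the correlation matrix.
    """
    n_rep = c_rep.shape[0]
    mean = c_rep.mean(axis=0)
    lo, hi = np.percentile(c_rep, [2.5, 97.5], axis=0)
    delta = 0.5 * (hi - lo)                 # assumed definition of delta c
    ref = np.zeros_like(mean) if c_ref is None else np.asarray(c_ref)
    residual = (mean - ref) / delta
    # Standard MC correlation: (<c_i c_j> - <c_i><c_j>) / (sigma_i sigma_j)
    cov = c_rep.T @ c_rep / n_rep - np.outer(mean, mean)
    sigma = np.sqrt(np.diag(cov))
    rho = cov / np.outer(sigma, sigma)
    return mean, delta, residual, rho
```

Two coefficients that lie along a flat direction of the fit would show up here as an off-diagonal entry of `rho` close to +1 or -1.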

Methodological variations
We now turn to study the robustness of the baseline results with respect to a number of variations in the fitting methodology. In particular: (i) the impact of cross-validation; (ii) the effects of experimental uncertainties in determining the bounds on the SMEFT degrees of freedom; and (iii) the role of O(Λ −4 ) corrections on these same bounds. We will always assume the SM; as we have shown above, closure tests work equally well in the case of BSM scenarios.

Cross-validation.
As discussed in Sect. 4.3, it is important to ensure that over-fitting is avoided, and, to do so, we adopt cross-validation. To quantify the role that cross-validation plays on fit results, we perform two L2 closure tests, with the only difference that cross-validation is absent in one of them.
In Fig. 4.8 we compare the fit residuals and the 95% CL of the fit parameters obtained from the two closure tests. When cross-validation is absent, the central values of the fitted degrees of freedom c i fluctuate around the true result (the SM in this case) rather more than when cross-validation is used. This is a consequence of the fact that the fit without cross-validation has overfitted the experimental data, and therefore the fluctuations around the true result have been enhanced. For example, r bW ≃ 2.5 without cross-validation, while it should be r bW ≃ 0, as we can see from the left panel. Moreover, from the right panel of Fig. 4.8 we see that the bounds obtained by the fit are usually stronger when cross-validation is switched off. However, in this case, they are methodologically biased, and one would incorrectly claim to have derived more stringent limits than the truth. These comparisons highlight that reliable results in a global SMEFT analysis can be obtained only if overfitting is avoided. Otherwise, deviations between experimental data and theory calculations, and/or stringent bounds on the fitted degrees of freedom can be misinterpreted as a sign of new physics, while they are instead a sign of methodological bias.
Characterising fit uncertainties. As explained in Sect. 4.4, L2 closure tests differ from L1 closure tests by the introduction of an additional set of fluctuations. Comparing closure tests at different levels allows one to identify the different components that build up the total uncertainty on the fit parameters δc i ; for a more in-depth discussion applied to PDFs, see Ref. [33]. To begin with, L0 closure test results might have interpolation and extrapolation uncertainties: even if the fit to the data points is perfect, there will be non-zero uncertainties in-between and outside the data region. In the SMEFT case, however, these uncertainties vanish in L0 closure tests, since the associated parameter space is discretised over the N op = 34 independent degrees of freedom, and additional directions are never explored. The comparison between the values of δc i in L1 and L2 closure tests is more subtle. In the L1 case, the data uncertainty is propagated into the fit, see Eq. (4.1). Therefore, the component of δc i that L1 closure tests identify is associated to the finite precision of the input experimental measurements, and hence we call this the experimental component of the uncertainty. At L2, we additionally account for the fact that there are infinitely many different sets of {c i } that optimise the error function equally well. The spread among these solutions represents the irreducible redundant component of the uncertainty.
To illustrate the relative weight of these two components on the overall size of δc i , in Fig. 4.9 we show the bounds that are obtained in L0, L1 and L2 closure tests, leaving everything else unchanged. We find that there is a significant increase in the size of δc i when going from L0 to L1, but then there is only a very slight increase when going from L1 to L2.
The role of O(Λ −4 ) corrections. Closure tests can also be used to assess the dependence of the fit results upon variations of the details of the theory calculations. Specifically, we are interested in the role played by O(Λ −4 ) corrections in the determination of the bounds on the fitted degrees of freedom. As highlighted in Table 3.5, including O(Λ −4 ) terms in the theoretical model modifies rather significantly the parameter space, by opening up new directions and by enhancing the sensitivity to those directions already covered by O(Λ −2 ) terms. Therefore, despite the fact that pseudo-data are generated according to a given theory in a closure test, including or not O(Λ −4 ) corrections implies that the corresponding results should in general be different.
In Fig. 4.10 we show the comparison of the residuals r i (left panel) and of the bounds δc i (right panel) for L2 closure tests between two fits that differ only in the inclusion (or not) of the O(Λ −4 ) terms. Two degrees of freedom, Off and ObW, are not constrained in the fit without O(Λ −4 ) terms, and are therefore set to zero. From this comparison, we see that the bounds on the coefficients δc i generally improve when O(Λ −4 ) corrections are included in the theoretical calculation. For example, the bound on OtZ decreases from δc tZ ≃ 6 TeV −2 to δc tZ ≃ 2 TeV −2 . The slight worsening observed for the bounds on a few operators when only linear terms are included is consistent with statistical fluctuations, and is therefore not significant. In any case, the fit results are qualitatively similar irrespective of the inclusion of O(Λ −4 ) corrections. Note that some of the degrees of freedom are highly correlated, therefore the interpretation of the results at the individual bound level should be taken with care.

The top quark sector of the SMEFT at NLO
In this section, we present the main results of this work, namely we derive the constraints on the N op = 34 SMEFT dimension-6 degrees of freedom relevant for the interpretation of top quark production measurements at the LHC. We first discuss the fit quality and the agreement between experimental data and theoretical predictions for individual processes. We then present the best-fit values, the 95% confidence level intervals and the correlations for these degrees of freedom, and we compare our results with other related analyses in the literature. We also study the impact that both NLO QCD perturbative corrections and quadratic O(Λ −4 ) terms have on the results. Finally, we assess the dependence of the fit results on the choice of input dataset, and quantify the dependence of the derived bounds on the high-energy limit of the cross-sections included in the fit.

Fit quality and comparison with data
We will first assess the quality of the fit at the level of both the total dataset and of individual measurements, and then compare the fit results with the input experimental cross-sections. In the following, as discussed in Sect. 2, our baseline fit is based on N rep = 1000 MC replicas and includes both NLO QCD corrections for the SMEFT contributions and the quadratic O Λ −4 higher order terms. In Table 5.1 we indicate the values of the χ 2 per datapoint for each of the datasets included in the fit. In each case, we indicate the values of χ 2 /n dat first when the theory calculations include only the SM contributions (second column) and then once they account for the SMEFT corrections after the fit (third column). In the last column, we indicate the number of data points n dat for each dataset. The datasets are classified into three groups following the structure of Tables 3.1-3.3: inclusive tt, tt in association with V , H, or heavy quarks, and single top production. In the case of datasets consisting of multiple differential distributions, we indicate the one that has been included in this analysis.
From the values in Table 5.1 we find that the overall fit quality to the n dat = 103 data points included in the fit is satisfactory, with χ 2 /n dat = 1.06 (1.11) after (before) the fit. We therefore find a slight improvement in the overall fit quality once the dimension-6 SMEFT corrections are taken into account. Note however that this improvement is consistent with statistical fluctuations, since for 103 points one expects ∆ χ 2 /n dat ≃ 0.1. For most of the individual datasets, the SM description of the input measurements is already good to begin with. In several cases, the χ 2 decreases once the SMEFT corrections are accounted for. For instance, the ATLAS m tt distribution at 8 TeV improves from χ 2 /n dat = 1.51 to 1.25, and the CMS ttbb cross-section improves from 5.0 to 1.29. As expected in a global fit, given that the figure of merit being optimised is the total χ 2 , Eq. (4.4), for some datasets the overall fit quality is unchanged or slightly worsened as compared to the SM prediction.
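The size of the expected fluctuation quoted above can be checked with a short calculation. The sketch below assumes Gaussian uncertainties, so that the total χ 2 follows a chi-squared distribution with n dat degrees of freedom, and it ignores the reduction in degrees of freedom due to the fitted parameters; the function name is illustrative only.

```python
import math


def chi2_per_point_spread(n_dat: int) -> float:
    """Standard deviation of chi2/n_dat for a chi-squared variable
    with n_dat degrees of freedom (variance of chi2 is 2*n_dat)."""
    return math.sqrt(2.0 / n_dat)


# Values quoted in the text: 103 data points, chi2/n_dat = 1.11 -> 1.06.
spread = chi2_per_point_spread(103)
improvement = 1.11 - 1.06

print(f"expected spread of chi2/n_dat: {spread:.2f}")
print(f"observed improvement:          {improvement:.2f}")
print("within one standard deviation:", improvement < spread)
```

Under these assumptions the observed change in χ 2 /n dat is smaller than the typical statistical spread, which is the point made in the paragraph above.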
From Table 5.1, we notice that the only dataset for which the χ 2 /n dat worsens significantly after the fit is the ATLAS ttZ cross-section measurement at 8 TeV, whose SM value of χ 2 /n dat = 1.32 increases to 5.29 after the fit. The origin of this poor χ 2 value can be traced back to some tension between the ATLAS and CMS measurements of the same observable. Indeed, as shown in Fig. 5.3, the ATLAS ttZ cross-section at 8 TeV lies somewhat below other measurements of the same quantity, in particular the precise CMS measurement at 13 TeV. This exception aside, we find overall good agreement between the theory calculations and the data used in the fit.
Table 5.1. The values of the χ 2 per data point for each of the datasets included in the fit. In each case, we indicate the values of χ 2 /n dat first when the theory calculations include only the SM contributions (second column) and then once they account for the SMEFT corrections, after the fit (third column). In the last column we indicate the number of data points n dat . Datasets are classified in three groups following the structure of Tables 3.1-3.3: inclusive top quark pair production; tt production in association with heavy quarks, vector bosons, and Higgs bosons; and inclusive and associated production of single top quarks. In the case of datasets made of multiple differential distributions, we indicate the one that has been used in the analysis.
Figure 5.1. Comparison between ATLAS and CMS experimental data on the total inclusive tt (left) and single top t-channel (right) production cross-sections at 8 TeV and 13 TeV with the corresponding SM calculations and with the results of the SMEFT analysis. In the case of the SM calculations, we also show the associated PDF uncertainties. Results are shown normalised to the central value of the SM prediction. Note that these inclusive cross-sections are not used as input to the fit (to avoid double counting with the corresponding differential distributions).
We now turn to present the comparisons between the results of the present SMEFT fit and the ATLAS and CMS input experimental data. We will also show comparisons for observables that are not included in the fit to avoid double counting, but which are anyway interesting to visualise in order to understand the main features of our results.
To begin with, in Fig. 5.1 we show a comparison between the ATLAS and CMS experimental data on the total inclusive tt and single top t-channel production cross-sections at 8 TeV and 13 TeV with the corresponding SM calculations and with the results of the SMEFT fit. For the single top case, we show separately the top and the anti-top cross-sections. In the case of the SM calculations, we also show the associated PDF uncertainties. Note that none of these total cross-sections (apart from the ATLAS 13 TeV single-top cross-section) are included in the SMEFT fit, since we already include the corresponding differential distributions. See Sect. 3 for more details about measurements shown in this comparison. In Fig. 5.1, and in all subsequent comparisons, results are shown normalised to the central SM prediction.
From the comparisons in Fig. 5.1, we find good agreement between the data and the SM predictions. The SMEFT fit result typically moves in the direction of the central experimental data point by an amount which corresponds to at most |δ th | ≃ 1% and 3% of the SM prediction for inclusive tt and single-top production respectively, well below the experimental uncertainties. This SMEFT-induced shift δ th is defined as the relative change of the theory prediction with respect to the SM one at the fit level, evaluated using the averages ⟨c i ⟩ and ⟨c i c j ⟩ of the fitted SMEFT coefficients computed over the MC replica sample. While this shift is small for these precisely measured inclusive processes, this is not necessarily the case for differential distributions and for rarer top production processes, such as single and top-pair production in association with vector bosons, as we will show below. Next, in Fig. 5.2 we show a comparison similar to that of Fig. 5.1, now for differential distributions in inclusive top quark pair and single top t-channel production. Specifically, we show the invariant mass distribution in tt production from ATLAS at 8 TeV and CMS at 13 TeV (2016 dataset), and the rapidity distributions in single top quark production in the t-channel from ATLAS at 8 TeV and from CMS at 13 TeV. In the latter case the top and anti-top quarks are combined into a single distribution.
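As an illustration of how such a shift could be evaluated from a replica sample, the sketch below assumes that the theory prediction has the usual dimension-6 form, σ = σ SM + Σ i c i σ i /Λ 2 + Σ ij c i c j σ ij /Λ 4 , and divides the SMEFT part by σ SM . This explicit form, the function name, and all numerical inputs are assumptions made for the example and are not taken from the analysis.

```python
def smeft_shift(replicas, sigma_sm, sigma_lin, sigma_quad, lam=1.0):
    """Relative SMEFT-induced shift for one cross-section.

    replicas   : list of coefficient vectors, one per MC replica
    sigma_sm   : SM cross-section
    sigma_lin  : linear (interference) coefficients sigma_i
    sigma_quad : matrix of quadratic coefficients sigma_ij
    lam        : scale Lambda in TeV (assumed 1 TeV by default)
    """
    n_rep = len(replicas)
    n_op = len(sigma_lin)

    # Linear term uses the replica average <c_i>.
    linear = 0.0
    for i in range(n_op):
        mean_c = sum(r[i] for r in replicas) / n_rep
        linear += mean_c * sigma_lin[i]

    # Quadratic term uses <c_i c_j>, which differs from <c_i><c_j>.
    quadratic = 0.0
    for i in range(n_op):
        for j in range(n_op):
            mean_cc = sum(r[i] * r[j] for r in replicas) / n_rep
            quadratic += mean_cc * sigma_quad[i][j]

    return (linear / lam**2 + quadratic / lam**4) / sigma_sm


# Toy inputs (hypothetical numbers, two operators, two replicas).
toy_replicas = [(0.1, -0.2), (0.3, 0.0)]
toy_shift = smeft_shift(
    toy_replicas,
    sigma_sm=100.0,
    sigma_lin=[10.0, 5.0],
    sigma_quad=[[1.0, 0.0], [0.0, 2.0]],
)
print(f"delta_th = {100 * toy_shift:.2f}%")
```

Averaging the products c i c j replica by replica, rather than multiplying the averages, is what keeps the correlations between coefficients in the quadratic term.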
From these comparisons, we find a similar level of agreement for the differential distributions as for the inclusive cross-sections. In the case of the m tt distributions from ATLAS and CMS, the most marked effect comes from the rightmost bin of the distributions, where energy-growing effects are more important. We find that the SMEFT-induced shift is δ th = +13% (+40%) at m tt ≃ 1.4 TeV (1.6 TeV) for the ATLAS 8 TeV (CMS 13 TeV) measurements. In Sect. 5.4 we will show that results do not change if the m tt distributions are replaced by the corresponding y tt ones where the energy-growing effects are absent. In the case of the ATLAS y t+t distribution in single top t-channel production, we observe how the data pulls the fit results. For this process, the SMEFT-induced shifts are around δ th ≃ −2.5% for all the data bins of the rapidity distributions, both at 8 TeV and at 13 TeV.
In Fig. 5.3 we show the corresponding comparison between experimental data and theory predictions for the ATLAS (labelled as 'A') and CMS (labelled as 'C') measurements of the cross-sections for single top production in the s-channel and in the tW and tZ associated production channels. We include in this comparison the results for the most recent measurements both at 8 TeV and at 13 TeV. In general, there is good agreement between the theory calculations and experimental data. The biggest SMEFT-induced shift is found for the s-channel cross-sections at 8 TeV, where δ th ≃ +35%. For single top production in association with a W boson, there is a negative shift of δ th ≃ −6%, similar for the two centre-of-mass energies. From the comparison of Fig. 5.3 we can also observe how in some cases the SMEFT fit interpolates between the ATLAS and CMS measurements, for instance for the t + W cross-sections at 13 TeV and the s-channel cross-sections at 8 TeV.
Considering now the ttV processes, in Fig. 5.3 we show the corresponding plot for the measurements of the production cross-section of a top quark pair associated with a W or Z vector boson. We may observe here the origin of the poor agreement of the ATLAS ttZ measurement at 8 TeV with the theory prediction after the fit reported in Table 5.1. Indeed, we find that for this process the ATLAS 8 TeV measurement (normalised to the SM prediction) barely agrees within uncertainties with the corresponding CMS 13 TeV cross-section, which exhibits the smallest uncertainties and thus dominates in the fit. For these tt + V processes, the SMEFT-induced shifts are δ th ≃ +23% (+11%) for tt + W at 8 TeV (13 TeV) and δ th ≃ +26% (+31%) for tt + Z at 8 TeV (13 TeV). These shifts are considerably larger than for the corresponding inclusive cross-sections shown in Fig. 5.1, as allowed by the larger experimental uncertainties.
Finally, to complete this set of comparisons between the input experimental data and the corresponding theory calculations before and after the fit, we show in Fig. 5.4 the W helicity fractions F 0 , F 1 , and F 2 from ATLAS and CMS. There is good agreement between data and theory, and the δ th shifts are quite small. In the same figure, we also show the corresponding comparisons between data and theory predictions for the CMS measurements of ttbb and tttt at 13 TeV, as well as for the ttH cross-section measurements from ATLAS and CMS at 13 TeV. Here the SMEFT-induced shifts are larger than for other processes, and we find δ th ≃ +10% for tttt production, δ th ≃ −21% for ttbb production, and δ th ≃ +15% for Higgs boson production in association with a tt pair.
As expected from the good agreement between the experimental data and the theory calculations already at the SM level reported in Table 5.1, the overall pattern that is observed in these data/theory comparisons is that the SMEFT-induced shifts are (in relative terms) larger for observables with larger experimental uncertainties, and smaller for more precisely measured cross-sections such as in inclusive tt production. In all cases, these shifts δ th are smaller than or at most comparable to the corresponding uncertainties of the experimental data.

The top quark degrees of freedom of the SMEFT
We now discuss the main results of this work. In the following, we present the fit results for the central values c i , Eq. (4.5), and the corresponding 95% CL uncertainties, δc i , for the N op = 34 dimension-6 SMEFT degrees of freedom relevant for the interpretation of top quark production measurements at the LHC. We also study the cross-correlations between these degrees of freedom. They provide an important piece of information since we know from the closure tests of Sect. 4.4 that these correlations might be large because of flat directions in the parameter space. From the comparison in Fig. 5.5, we find that the fit results are in good agreement with the SM within uncertainties, the fit residuals satisfying |r i | ≤ 0.4 for all operators. Note that the correlations between degrees of freedom imply that the fluctuations around the best-fit results are in general smaller than in the case in which all operators are completely independent.
From Fig. 5.5, we also observe that there is a rather wide range of values for the fit uncertainties δc i obtained for the different degrees of freedom. For example, a very small uncertainty is found for the coefficients associated to OtG or O83qq, while much larger uncertainties are obtained for the fit coefficients associated to other degrees of freedom, including all the four-heavy-quark operators, such as OQQ1, and for Otp. In most cases, the origin of these differences in the size of the δc i uncertainties can be traced back to Table 3.5: different degrees of freedom are constrained by different processes, and in each case the available amount of experimental information varies widely. For instance, the four-heavy-quark operators are constrained by only two data points (the ttbb and tttt cross-sections), hence the large uncertainties of the associated coefficients. Likewise, Otp is only constrained by the ttH cross-section measurements.
The interpretation of the 95% CL uncertainties shown in Fig. 5.5 requires some care. The reason is that the available data on top production at the LHC, summarised in Tables 3.1-3.3, do not allow us to fully separate all possible independent directions in the SMEFT parameter space. As a consequence, as illustrated in Sect. 4.4 at the closure test level, there will be in general large (anti-)correlations between the fit parameters, reflecting this degeneracy in the parameter space. As we will show now, in general more stringent bounds are obtained if each degree of freedom is constrained individually. To quantify this point, in Fig. 5.6 we show a heat map indicating the values of the correlation coefficient, Eq. (4.11), between the 34 degrees of freedom constrained from the fit. In this heat map, dark blue regions correspond to degrees of freedom that are significantly correlated, while light green regions are instead degrees of freedom that are significantly anticorrelated. Indeed, we find that specific pairs of coefficients c i exhibit a significant amount of (anti-)correlation, such as for instance O1qd and Otp. The effects of such correlations are ignored in fits where these degrees of freedom are constrained individually rather than marginalised from the global fit results, and this leads in general to artificially tighter constraints.
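For readers who want to reproduce a heat map of this kind from a replica sample, a minimal sketch follows. It assumes that the correlation coefficient is the standard Pearson coefficient evaluated over the MC replicas; the replica values below are toy numbers chosen for the example, not fit results.

```python
import math


def correlation(replicas, i, j):
    """Pearson correlation between coefficients i and j over a replica sample."""
    n_rep = len(replicas)
    mean_i = sum(r[i] for r in replicas) / n_rep
    mean_j = sum(r[j] for r in replicas) / n_rep
    cov = sum((r[i] - mean_i) * (r[j] - mean_j) for r in replicas) / n_rep
    var_i = sum((r[i] - mean_i) ** 2 for r in replicas) / n_rep
    var_j = sum((r[j] - mean_j) ** 2 for r in replicas) / n_rep
    return cov / math.sqrt(var_i * var_j)


def correlation_matrix(replicas):
    """Full matrix of correlation coefficients, as used for a heat map."""
    n_op = len(replicas[0])
    return [[correlation(replicas, i, j) for j in range(n_op)] for i in range(n_op)]


# Toy sample: the second coefficient follows the first, the third opposes it.
toy_sample = [(1.0, 2.0, -1.0), (2.0, 4.0, -2.0), (3.0, 6.0, -3.0)]
rho = correlation_matrix(toy_sample)
print([[round(x, 2) for x in row] for row in rho])
```

Entries close to +1 or −1 flag pairs of coefficients that the data constrain only in combination, which is where marginalised and individual bounds are expected to differ most.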
Given the overall agreement between the fit results and the SM, it becomes possible to interpret the uncertainties δc i as upper bounds on the parameter space of the SMEFT degrees of freedom. Such upper bounds provide important information for BSM model building, since they need to be satisfied by any UV-complete theory at high energies that has the SM as its low-energy effective theory. These bounds can also be compared with previous SMEFT studies of the top quark sector reported in the literature. While on the one hand our global SMEFT analysis is based on a wider LHC dataset than previous analyses of top quark production, on the other hand it explores a larger parameter space with reduced model assumptions. Therefore, a priori, one could expect either stronger (from the larger dataset) or weaker (from the reduction in model assumptions) bounds as compared to previous studies: only performing the actual fit itself can shed light on this question.
In order to compare with previous results, we will follow here the discussion in Appendix A of the Top LHC Working Group EFT note [10], to which we refer the reader for further details. We note that the results quoted in [10] are in many cases restricted to fitting one operator at a time, or at most marginalising over a small subset of operators, and thus these limits might be too optimistic due to neglecting correlations with other directions in the SMEFT parameter space. We will quote here both the direct limits obtained from the top-quark measurements, and the indirect limits derived from non-top processes such as low-energy observables, the decays of B mesons, electroweak precision observables, and Drell-Yan production. See Sect. 2.3 for a related discussion of the existing experimental constraints on SMEFT degrees of freedom that do not involve top quarks. In Table 5.2 we report the values of the 95% confidence level bounds (in units of TeV −2 , assuming Λ = 1 TeV) for the coefficients of the 34 SMEFT degrees of freedom derived from the marginalisation of the results of the SMEFiT global analysis. We compare our results with those obtained elsewhere in the literature either from the direct analysis of top quark production ("direct") or from indirect bounds from other processes not involving top quarks ("indirect"). We note that for several degrees of freedom, such as for Off and Otb1, the bounds reported here have been obtained for the first time. In Table 5.3 we additionally show the results for the differing theory settings used in the global fit, namely using only O(Λ −2 ) corrections and LO QCD in the SMEFT calculations.
As recommended in [10], it is important to also quote the bounds derived from fitting individual coefficients, one at a time, in order to compare them with the global fit results. The results from such single-operator fits are provided in Table 5.4 using the same settings as in the baseline global fit (as well as by varying the theory settings, see the discussion in Sect. 5.3). In the case of the individual fits of the operators that are very loosely constrained (in particular, for most of the four-heavy-quark degrees of freedom) we find that the SMEFiT approach is affected by numerical stability issues. Therefore, for such operators (identified in italics), it is more reliable to quote instead the 95% CL bounds obtained from the analytical minimisation of the χ 2 , which for these cases has a relatively simple form.
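To illustrate what an analytical χ 2 minimisation looks like in the simplest setting, the sketch below treats a single coefficient entering linearly, with uncorrelated uncertainties, so that the χ 2 is an exact parabola in c. These simplifications, the function name, and the toy numbers are assumptions made for the example; the expressions used in the analysis itself, which include quadratic terms and correlated uncertainties, are not reproduced here.

```python
import math


def linear_fit_bounds(data, sm, lin, err, z=1.96):
    """Best fit and 95% CL interval for one coefficient c, assuming
    theory = sm + c * lin per data point and uncorrelated errors.

    chi2(c) = sum_k ((data_k - sm_k - c * lin_k) / err_k)**2 is a parabola,
    so its minimum and curvature are available in closed form.
    """
    num = sum((d - s) * l / e**2 for d, s, l, e in zip(data, sm, lin, err))
    den = sum(l**2 / e**2 for l, e in zip(lin, err))
    c_best = num / den
    sigma_c = 1.0 / math.sqrt(den)  # one-sigma width from chi2 = chi2_min + 1
    return c_best - z * sigma_c, c_best, c_best + z * sigma_c


# Toy example: a single cross-section measurement (hypothetical numbers).
low, best, high = linear_fit_bounds(data=[1.2], sm=[1.0], lin=[0.1], err=[0.1])
print(f"c = {best:.2f}, 95% CL interval [{low:.2f}, {high:.2f}]")
```

With only one or two data points and a small linear coefficient, the denominator becomes small and the interval very wide, which is the regime where a closed-form solution is more stable than a numerical minimiser.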
By comparing the bounds obtained in the global and individual fits, Tables 5.2, 5.3 and 5.4 respectively, one finds that for essentially all degrees of freedom the bounds obtained from the individual fits are either more stringent than or comparable to the marginalised results from the global fit. As discussed above, the reason for this can be traced back to the fact that within the single-operator fits one is neglecting cross-correlations between the different directions spanned by the fitted degrees of freedom. For instance, the 95% CL bound associated to OtG is [−0.4, +0.4] in the global fit, while it is [−0.08, +0.03] if the corresponding coefficient is fitted individually. Another example is Otp, whose bound is [−60, +10] in the global fit, and [−5.3, +1.6] in the individual fit, i.e. it is more stringent by about an order of magnitude. Another important advantage of providing the results for the individual operators is that it allows us to better assess the impact that varying the theory settings has on the fit results. For instance, as we will discuss in Sect. 5.3, accounting for the quadratic O(Λ −4 ) terms leads to an improvement in the bounds of most operators, but assessing this effect is more transparently done in the case of the individual than in the global fits, where one has additional factors to take into account in the interpretation of the results.
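The reason why individual bounds tend to be tighter can be made explicit in a toy Gaussian model with two correlated coefficients. For a bivariate Gaussian with correlation ρ, fixing one coefficient to zero reduces the width of the other by a factor √(1 − ρ 2 ) relative to the marginalised width. The sketch below assumes this Gaussian approximation; the numbers are illustrative and not taken from the tables.

```python
import math


def individual_width(marginalised_width: float, rho: float) -> float:
    """Width of one coefficient when the other is fixed, for a bivariate
    Gaussian with correlation rho and the given marginalised width."""
    return marginalised_width * math.sqrt(1.0 - rho**2)


for rho_value in (0.0, 0.5, 0.9, 0.98):
    width = individual_width(1.0, rho_value)
    print(f"rho = {rho_value:4.2f}: individual / marginalised = {width:.2f}")
```

In this picture, an order-of-magnitude difference between individual and marginalised bounds points to a nearly flat direction in the parameter space, that is, to |ρ| very close to one.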
The graphical representation of the comparison between the global fit results and the bounds reported in the LHC top WG EFT note (Table 5.2), as well as with the individual fit results (Table 5.4), is shown in Fig. 5.7. The individual bounds are in general considerably tighter than the marginalised ones, except for some of the four-heavy-quark operators (and for OtZ) where they are instead comparable. Another useful way to present our results is by representing the bounds on Λ/√|c i | that are derived from the fit. This is interesting because, assuming UV completions where the values of the fitted degrees of freedom c i are O(1), plotting the results this way indicates the approximate reach in energy that is being achieved by the SMEFT global analysis. This comparison is shown in Fig. 5.8, which is the analogue of Fig. 5.7, now representing the same bounds as bounds on the ratio Λ/√|c i | (and only for the marginalised bounds from the global fit). We find that for the degrees of freedom that are better constrained we achieve sensitivity up to scales as high as Λ ≃ 1.5 TeV, in particular thanks to the chromomagnetic operator OtG, which is well determined from the differential measurements of top quark pair production. Future measurements based on larger statistics should allow us to probe even higher scales, in particular by means of the high-luminosity LHC datasets.
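The conversion from a bound on a coefficient to an energy reach is a one-line calculation. The sketch below assumes that the bound is quoted in TeV −2 for Λ = 1 TeV, as in Table 5.2, so that Λ/√|c i | in TeV is simply the inverse square root of the bound; the function name is illustrative.

```python
import math


def scale_reach_tev(bound_tev_m2: float) -> float:
    """Scale Lambda/sqrt(|c|) in TeV for a bound on c/Lambda^2 in TeV^-2."""
    return 1.0 / math.sqrt(abs(bound_tev_m2))


# Marginalised bounds quoted in the text for OtG and Otp.
reach_otg = scale_reach_tev(0.4)
reach_otp = scale_reach_tev(-60.0)
print(f"OtG: {reach_otg:.2f} TeV")
print(f"Otp: {reach_otp:.2f} TeV")
```

For the OtG bound of 0.4 TeV −2 this gives a reach of about 1.6 TeV, in line with the value quoted above for the best constrained degrees of freedom.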

The impact of the NLO QCD and O(Λ −4 ) corrections
The baseline fit results presented above are based on theory calculations that account both for the NLO QCD corrections to the SMEFT contributions and for the quadratic O(Λ −4 ) terms in Eq. (2.2), see also the discussion in Sect. 2. Here we aim to assess the robustness and stability of our results by comparing the baseline fit results with those of fits based on two alternative theory settings. Firstly we compare with a fit where only LO QCD effects are included for the SMEFT contributions, and then with a fit that includes only the linear O(Λ −2 ) terms in the effective theory expansion (but still based on NLO QCD for the SMEFT contributions).
These comparisons have been carried out in the case of both the marginalised results obtained from the global fit and of the fits to individual degrees of freedom. In Table 5.3 we show the 95% CL bounds on the fitted degrees of freedom obtained in the global analysis, and compare the results obtained using the baseline theory settings with those obtained either when only the linear O Λ −2 terms are included or when only LO QCD calculations are used for the SMEFT contribution. In Table 5.4 we show the corresponding comparison in the case of individual fits. Recall that, as mentioned above, some of the individual bounds reported in Table 5.4 have been evaluated from the analytical minimisation of the χ 2 , which for those cases is more robust than the numerical minimisation.
As can be seen from Table 5.4, the individual bounds that one obtains at O Λ −2 are very loose for most of the four-heavy-quark operators. This indicates that, using only the linear SMEFT contribution, one has very limited sensitivity to these degrees of freedom. For this reason, we do not attempt to quote any bounds for the four-heavy-quark operators in the global fit based on O Λ −2 theory in Table 5.3: this small sensitivity might hinder the reliability of numerical approaches such as the ones we adopt here. This problem goes away once we include the O Λ −4 contributions, due to the additional sensitivity provided by the quadratic terms. In this case, we can reliably quote 95% CL bounds for both global and individual fits.
In Fig. 5.9 we show the graphical representation of the bounds reported in Table 5.3 for the different theory settings. Similar considerations apply to those operators whose bounds in the global fit worsen when the NLO QCD corrections to the SMEFT contributions are missing. First, for these three operators, namely Otp, Ofq3, and Off, the bounds are relatively loose due to the limited fit sensitivity, so they are potentially affected by larger statistical fluctuations. Second, at the level of individual fits, one finds that including or not NLO QCD effects has essentially no impact on the resulting bounds. Therefore, the observed effect is most likely a consequence of the fact that adding NLO QCD corrections rearranges the weight of the different degrees of freedom in the global fit, leading to an overall modification of the bounds.
Concerning the impact of the quadratic O(Λ −4 ) terms, from the comparisons in Fig. 5.9, we find that for most degrees of freedom the bounds are similar regardless of whether or not these quadratic terms are included in the fit. This is the expected behaviour for those operators for which the dominant sensitivity in the fit arises already at O(Λ −2 ), as indicated in Table 3 As already mentioned several times, within a global fit it is in general not possible to precisely pinpoint how a variation of the theory settings translates into a difference in the resulting constraints on the fitted degrees of freedom, with obvious exceptions such as for those operators whose contributions vanish at O(Λ −2 ). For such an assessment, the results of the single-operator fits reported in Table 5.4 are more suitable. For example, from the results obtained in the single-operator fits we can confirm that the improvement in the bounds obtained for the O8qt and OtZ degrees of freedom upon the inclusion of the quadratic O(Λ −4 ) corrections is genuine, rather than an artefact of the global fit. The impact of including the quadratic terms is particularly manifest for the four-heavy-quark degrees of freedom, where one finds improvements of up to several orders of magnitude. For instance, while for Otb1 the linear bounds are almost non-existent, [−2 · 10 4 , −1.4 · 10 3 ], they are improved down to [−6.8, +6.8] once the O(Λ −4 ) contributions are taken into account.

Dataset dependence and high-energy behaviour
Within the SMEFiT framework it is straightforward to repeat the analysis with arbitrary variations of the input dataset. To investigate the dependence of our results with respect to this choice of input dataset, in Fig. 5.10 we show a similar comparison as that of Fig. 5.7, now assessing how the baseline fit results vary if a different input dataset is used. In the first case, instead of the m tt distributions indicated in Table 5.1, we use the corresponding y tt distributions for the inclusive tt production measurements. In the second case, the fit is performed only using inclusive tt production measurements as input, and excluding all other processes. Note that in the latter case the fit has sensitivity to only a subset of 15 degrees of freedom.
The rationale behind performing a fit replacing the m tt distributions in inclusive top-quark pair production with the corresponding y tt ones is to gauge the sensitivity of our results to the high-energy region, since the m tt distribution is the one more directly sensitive to it. This was also illustrated by the large values of the SMEFT-induced shifts δ th found in the comparisons with experimental data at large m tt in Fig. 5.2. Although high-energy measurements enhance the sensitivity to SMEFT effects, one should avoid being dominated by the highest energy bins since this could jeopardise the effective theory interpretation. Therefore, one would ideally like to see that the bounds do not become markedly worse once the m tt distributions are replaced by the y tt ones, since that would otherwise indicate that fit results are determined by high-energy events.
Concerning the fit based only on inclusive tt measurements, one would like to find that the bounds obtained from a SMEFT fit to a partial dataset are comparable to or looser than those from the baseline global dataset. Note that this is a non-trivial consistency check of the whole methodology; when additional experimental constraints are included in the analysis, then the bounds on the fitted coefficients must by necessity be either unchanged or smaller. If this were not the case, it would imply that fit results are driven not by the experimental data but by biased methodological choices.
From the comparison in Fig. 5.10 between the fits with either the baseline dataset or the tt-only dataset, we find that the constraints on OtG are unchanged. This result is not unexpected, since it is well-known that the information on the chromomagnetic operator is dominated by inclusive tt production. We also observe that the bounds for some of the 2-light-2-heavy degrees of freedom such as O83qq and O81qq worsen, presumably as a consequence of the missing constraints provided by other processes, such as tt production in association with W or Z bosons. Indeed, for all the degrees of freedom directly constrained by the inclusive tt measurements, the bounds found in the global fit are comparable or superior to those obtained in the tt-only fit.
The other comparison shown in Fig. 5.10 is that between the fit with the baseline dataset and with the same dataset where we have replaced the m tt distributions with the corresponding y tt ones. In this case, we find that the results are qualitatively stable, and do not display large differences. For a subset of the degrees of freedom, in particular those constrained by inclusive tt data, we find that somewhat more stringent bounds are obtained in the fits based on the m tt , rather than the y tt , distributions. For instance the bounds on the coefficient of OtG are found to be [−0.4, +0.4] when fitting m tt and [−0.8, +0.8] when fitting instead y tt . These results suggest that indeed the fit benefits from the high-energy reach of the m tt distributions, although only slightly.
Another way to study the impact that the SMEFT corrections have on the description of the experimental data at high energies is to focus on the constraints provided by the tail of the invariant mass distribution m tt in top quark pair production, where energy-growing effects enhance the sensitivity to SMEFT corrections [161]. In order to highlight the impact that these energy-growing effects have on the description of the m tt tails, it is useful to compute the shift induced by the SMEFT corrections to the SM calculation separated into the contributions from different degrees of freedom. For simplicity, in the following we restrict ourselves to the linear O(Λ −2 ) corrections. In this case, following the notation of Eq. (2.2), we want to compare the size of the individual corrections ∆ (smeft) i of Eq. (5.3), that is, the shift induced on the SM cross-section by each degree of freedom separately, for the different bins of the m tt distribution, and identify which degrees of freedom dominate at high energy. Note that, as discussed in Sect. 2.5, in general there are several reasons why a given operator might or might not lead to energy-growing effects. In Eq. (5.3), we will use as δc i the 95% CL bounds for the baseline fit reported in Table 5.2. In Fig. 5.11 we show the values of the SMEFT-induced shifts, Eq. (5.3), for the different bins of the m tt distribution from the CMS measurement at 13 TeV in the lepton+jets final state, based on an integrated luminosity of L = 36 fb −1 [99], which has the best coverage of the TeV region. To facilitate the visualisation, we restrict ourselves to the contributions associated to four representative degrees of freedom: OtG, O81qq, O8qt, and O8ut. For reference, we also show the corresponding total experimental uncertainty for each of the m tt bins.
We observe that several operators lead to effects that grow with the energy. The steepest growth is found for the O8ut degree of freedom, but other operators that lead to energy-growing effects are O81qq and O8qt. Other operators are less sensitive to the high-energy region. This is illustrated by the case of OtG, whose sensitivity is concentrated in the tt threshold production region. It is therefore clear that pushing the reach of the experimental measurements deep into the TeV region will further increase the sensitivity to these energy-growing degrees of freedom. In this respect, a major concern will be to appropriately disentangle potential SMEFT signatures from the information used to constrain the proton structure in global fits, in particular the large-x gluon.
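A bin-by-bin evaluation of such individual linear shifts can be sketched as follows. The example assumes that the shift for a given degree of freedom is its bound δc i multiplied by the linear SMEFT cross-section in the bin and divided by Λ 2 times the SM cross-section in the same bin. This explicit form, the function name, and all numerical inputs are assumptions for illustration and are not taken from the measurement.

```python
def linear_shifts(bound, sigma_lin_bins, sigma_sm_bins, lam=1.0):
    """Relative shift per bin induced by one degree of freedom at linear order.

    bound          : 95% CL bound on the coefficient (TeV^-2 for lam = 1 TeV)
    sigma_lin_bins : linear SMEFT cross-section per bin
    sigma_sm_bins  : SM cross-section per bin
    """
    return [
        bound * lin / (lam**2 * sm)
        for lin, sm in zip(sigma_lin_bins, sigma_sm_bins)
    ]


# Toy m_tt distribution: the SM cross-section falls faster with m_tt than the
# linear SMEFT term, so the relative shift grows towards the tail.
toy_shifts = linear_shifts(
    bound=0.5,
    sigma_lin_bins=[1.0, 0.8, 0.6],
    sigma_sm_bins=[100.0, 40.0, 10.0],
)
print([f"{100 * s:.1f}%" for s in toy_shifts])
```

Comparing each entry with the experimental uncertainty of the same bin indicates which bins carry the constraining power for that degree of freedom.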
It should be emphasised that the individual shifts in Fig. 5.11 cannot be directly combined to construct the actual shift to the SM prediction in each cross-section bin, due to the replica-by-replica correlations between the various degrees of freedom. With this caveat, it is clear that the SMEFT-induced shifts could not be much larger than the bounds derived in this analysis without degrading the agreement between theory predictions and experimental data, a conclusion similar to the one derived from the comparisons with experimental data shown in Figs. 5.1-5.4.

Figure 5.11. The shifts induced by representative SMEFT degrees of freedom to the SM cross-sections, Eq. (5.3), for the $m_{t\bar{t}}$ distribution in the top quark pair production measurements at $\sqrt{s} = 13$ TeV from CMS, based on $\mathcal{L} = 36$ fb$^{-1}$ and the lepton+jets channel [99]. We show the shifts arising from the linear (left) and from the purely quadratic (right) terms. The shifts $\Delta_i^{(\rm smeft)}$ have been computed assuming the 95% CL bounds $\delta c_i$ of the baseline fit reported in Table 5.2. For reference, we also indicate the size of the corresponding experimental uncertainties.

To conclude this discussion about high-energy effects, another of the input processes in the fit that is in principle sensitive to the high-energy region is $t\bar{t}t\bar{t}$ production, where the invariant mass of the 4-top final state, $m_{t\bar{t}t\bar{t}}$, can reach values of up to several TeV. In order to further assess the stability of our results with respect to the high-energy region, we have repeated the baseline fit imposing different cuts on the value of the 4-top invariant mass, from a loose cut requiring $m_{t\bar{t}t\bar{t}} \le 3$ TeV to a more stringent cut with $m_{t\bar{t}t\bar{t}} \le 1$ TeV. The results of these fits are displayed in Fig. 5.12, and do not show any sensitivity to the cut on $m_{t\bar{t}t\bar{t}}$ adopted in the theory calculation. We recall that in the current analysis a single $t\bar{t}t\bar{t}$ cross-section has been included; future measurements of this process, possibly including differential distributions, could then become more sensitive to the high-energy region.
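The caveat about combining the individual shifts of Fig. 5.11 can be illustrated with a toy example. The sketch below is not taken from the analysis: the two coefficients, their covariance, and the linear cross-section coefficients `sigma_lin` for a single bin are invented numbers, and only the linear SMEFT term is considered. It compares the naive sum of the individual 95% CL shifts with the shift obtained when the two contributions are added replica by replica.

```python
import numpy as np

# Toy replica sample for two strongly anti-correlated coefficients
# (hypothetical numbers, not the results of the fit).
rng = np.random.default_rng(2)
toy_cov = np.array([[1.0, -0.9],
                    [-0.9, 1.0]])
c = rng.multivariate_normal([0.0, 0.0], toy_cov, size=1000)

# Hypothetical linear SMEFT contributions to one cross-section bin,
# in units of the SM prediction.
sigma_lin = np.array([0.05, 0.04])

def half_width_95(x):
    """Half-width of the central 95% interval of a 1D sample."""
    lo, hi = np.percentile(x, [2.5, 97.5])
    return 0.5 * (hi - lo)

# Shift from each coefficient taken separately, then summed naively.
naive_sum = sum(half_width_95(c[:, i] * sigma_lin[i]) for i in range(2))

# Shift obtained by adding both contributions replica by replica,
# which keeps the correlation between the two coefficients.
combined = half_width_95(c @ sigma_lin)

print(f"naive sum: {naive_sum:.3f}, replica-by-replica: {combined:.3f}")
```

In this toy setup the anti-correlation leads to a combined shift that is well below the naive sum, which is why the shifts of the individual degrees of freedom should not simply be added.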

Summary and outlook
In this work we have presented a novel approach to carry out global analyses of the SMEFT. This new framework, which we have denoted by SMEFiT, is flexible, modular, robust upon enlarging the fitted parameter space, and resilient with respect to problems that arise frequently in SMEFT fits such as degeneracies and flat directions. Its main ingredients are the MC replica method, used to construct a representation of the probability distribution in the space of dimension-6 SMEFT degrees of freedom, and cross-validation, which prevents over-fitting. Our results are provided as a sample of $N_{\rm rep}$ MC replicas, which can be used to derive predictions for related cross-sections and can be combined with other constraints on the SMEFT parameter space.
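As a rough illustration of the MC replica idea, the following sketch generates pseudo-data replicas by sampling a multivariate Gaussian defined by an experimental covariance matrix, and fits a linear toy model to each of them. All inputs (`sm_pred`, `lin_coeffs`, `cov`) and the function names are invented for the example. The closed-form least-squares fit is a simplification: it does not reproduce the minimisation used in SMEFiT, and cross-validation is omitted because a closed-form fit has no iterative training to stop.

```python
import numpy as np

def generate_replicas(central, cov, n_rep, seed=0):
    """Sample n_rep pseudo-data sets around the central values."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(central, cov, size=n_rep)

def fit_replica(pseudo_data, sm_pred, lin_coeffs, cov):
    """Generalised least squares for theory = sm_pred + lin_coeffs @ c."""
    cov_inv = np.linalg.inv(cov)
    a = lin_coeffs.T @ cov_inv @ lin_coeffs
    b = lin_coeffs.T @ cov_inv @ (pseudo_data - sm_pred)
    return np.linalg.solve(a, b)

# Toy inputs: four data points, two coefficients (hypothetical numbers).
sm_pred = np.array([10.0, 8.0, 5.0, 2.0])
lin_coeffs = np.array([[1.0, 0.2],
                       [0.8, 0.5],
                       [0.5, 1.0],
                       [0.2, 1.5]])
cov = np.diag([0.5, 0.4, 0.3, 0.2]) ** 2
central = sm_pred.copy()  # toy data that agree with the SM

replicas = generate_replicas(central, cov, n_rep=1000)
coeffs = np.array([fit_replica(r, sm_pred, lin_coeffs, cov) for r in replicas])

# Each row of coeffs is one replica; the spread over rows estimates
# the uncertainty on the fitted coefficients.
print(coeffs.mean(axis=0), coeffs.std(axis=0, ddof=1))
```

The design choice worth noting is that the uncertainty is never propagated analytically: it is read off from the spread of the fitted coefficients across replicas.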
As a proof-of-concept of the SMEFiT framework, we have presented a detailed analysis of top quark production measurements at the LHC at 8 TeV and 13 TeV. We have included a wide range of top quark data, in terms of total rates and differential distributions. The theoretical SM and SMEFT cross-sections have been evaluated including NLO QCD corrections by default; in the SM case, we have also considered NNLO effects for the most accurately measured processes, namely differential distributions in $t\bar{t}$ and single top t-channel production. This combination of state-of-the-art calculations with precision LHC measurements has allowed us to provide constraints on $N_{\rm op} = 34$ independent operators from the dimension-six Lagrangian in the Warsaw basis.
Our results are in good agreement with the SM expectations: we find that all the $N_{\rm op} = 34$ fitted SMEFT degrees of freedom are consistent with the SM result within uncertainties at the 95% CL. We have compared our results with existing bounds on the same operators presented in the literature, and have provided individual constraints on the operators in the SMEFiT framework. We have also studied the robustness of our results with respect to the inclusion of NLO QCD corrections, of $\mathcal{O}(\Lambda^{-4})$ effects, and of variations of the input dataset. We have found that including either NLO QCD corrections to the SMEFT contributions or the quadratic $\mathcal{O}(\Lambda^{-4})$ terms leads to stronger bounds for most of the degrees of freedom in the fit. The results of this analysis are available upon request as a sample of $N_{\rm rep} = 1000$ MC replicas representing the probability distribution in the space of Wilson coefficients for the $N_{\rm op} = 34$ SMEFT operators considered here. These replicas can be used to compute statistical properties of the distribution such as variances, correlations, and higher moments, and can be combined with other processes that provide complementary information on the SMEFT parameter space.
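A minimal sketch of how such statistical properties could be extracted from a replica sample is given below. It assumes the replicas are stored as an array of shape ($N_{\rm rep}$, $N_{\rm op}$); the function name `replica_summary` and the two-coefficient toy sample are hypothetical, since the format in which the actual replicas are distributed is not specified here.

```python
import numpy as np

def replica_summary(replicas, cl=0.95):
    """Summarise a replica sample of shape (n_rep, n_op).

    Returns the mean, standard deviation, central confidence-level
    interval, and correlation matrix of the coefficients.
    """
    tail = 100.0 * (1.0 - cl) / 2.0
    lower, upper = np.percentile(replicas, [tail, 100.0 - tail], axis=0)
    return {
        "mean": replicas.mean(axis=0),
        "std": replicas.std(axis=0, ddof=1),
        "lower": lower,
        "upper": upper,
        "corr": np.corrcoef(replicas, rowvar=False),
    }

# Toy stand-in for the replica sample: two anti-correlated coefficients.
rng = np.random.default_rng(1)
toy_cov = np.array([[1.0, -0.6],
                    [-0.6, 0.5]])
toy = rng.multivariate_normal([0.0, 0.0], toy_cov, size=1000)

summary = replica_summary(toy)
for name, value in summary.items():
    print(name, value)
```

The interval is taken here from percentiles of the sample rather than from the mean and the standard deviation, so it remains meaningful for non-Gaussian distributions of the coefficients.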
The study presented in this work is the first proof-of-principle application of the SMEFiT framework, and several further studies and extensions can be envisioned. The next step will be to consider a larger basis of fitted SMEFT operators by including other types of LHC processes beyond top quark production in the input dataset. These new measurements should include total rates and differential distributions in Higgs production, single and pair production of electroweak vector bosons, and also other processes directly sensitive to the TeV region, such as di-jet and multi-jet production. Eventually, one might also need to account for measurements from previous colliders such as LEP and from lower-energy experiments. In this respect, our results pave the way towards a truly global fit of the SMEFT at dimension-six where direct constraints are simultaneously provided for the majority of the operators.
With these considerations, the dimension-6 SMEFT four-quark operators relevant for the interpretation of top quark measurements at the LHC are those listed in Eq. (A.1). Recall that these operators satisfy all the symmetries of the SM, in particular gauge symmetry before electroweak symmetry breaking.

Another class of relevant SMEFT operators are those that contain two quarks coupled to Higgs fields or gauge boson fields; the ones relevant for top quark measurements are given in Eq. (A.2), where $W^I_{\mu\nu}$ and $B_{\mu\nu}$ are the field-strength tensors of the electroweak interaction and $G^A_{\mu\nu}$ is the QCD one.
In Eqs. (A.1) and (A.2), non-Hermitian operators are indicated with a double dagger symbol. In the case of Hermitian operators involving vector Lorentz bilinears, complex conjugation is the same as the transposition of generation indices: $O^{(ij)*} = O^{(ji)}$ and, by extension, for four-fermion operators, $O^{(ijkl)*} = O^{(jilk)}$. In addition, it is understood in the notation above that the implicit sum over flavour indices only includes independent combinations.
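The index relations above can be checked numerically in a simplified setting. In the sketch below, which is only a toy and not part of the analysis, each fermion field is replaced by one commuting complex number per generation, so the spinor structure and the anticommuting nature of the fields are ignored. A vector bilinear is then modelled as $O^{(ij)} = \psi_i^* \psi_j$ and a four-fermion operator as the product of two such bilinears; the names `psi`, `chi`, `O2`, and `O4` are introduced for the example only.

```python
import numpy as np

rng = np.random.default_rng(3)
n_gen = 3

# Toy stand-ins for two fermion fields: one complex number per generation.
psi = rng.normal(size=n_gen) + 1j * rng.normal(size=n_gen)
chi = rng.normal(size=n_gen) + 1j * rng.normal(size=n_gen)

# Toy bilinears O^(ij) = conj(psi_i) * psi_j, and the same for chi.
O2 = np.einsum("i,j->ij", psi.conj(), psi)
O2_chi = np.einsum("i,j->ij", chi.conj(), chi)

# Toy four-fermion operator O^(ijkl) = O^(ij) * O_chi^(kl).
O4 = np.einsum("ij,kl->ijkl", O2, O2_chi)

# Complex conjugation equals transposition of generation indices:
# O^(ij)* = O^(ji) and O^(ijkl)* = O^(jilk).
bilinear_ok = np.allclose(O2.conj(), O2.T)
four_fermion_ok = np.allclose(O4.conj(), O4.transpose(1, 0, 3, 2))

print(bilinear_ok, four_fermion_ok)
```

The check makes explicit why only a subset of the flavour index combinations is independent: the entries with transposed indices are fixed by complex conjugation.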