1 Introduction

The lack of any clear signals of New Physics from the experiments at the Large Hadron Collider (LHC) suggests the need to move towards precision measurements, with the aim of using these as a means of detecting beyond-the-Standard-Model effects indirectly. Of the Standard Model (SM) parameters, the top-quark mass is of particular importance, due to the strength of its coupling to the Higgs boson, the rôle it plays in governing the stability of the electroweak vacuum, and the fact that it is a key input to calculations of several backgrounds for important LHC processes.

Approaches to top-quark mass extractions typically follow one of two paths: either a direct reconstruction of the top-quark decay products is attempted, or the dependence of kinematic distributions and total cross sections on the mass is exploited. The former approach generally relies on modelling by Monte Carlo event generators, resulting in the determination of a so-called ‘Monte Carlo mass’. There remains some debate as to the interpretation of this quantity and whether it can be connected to a well-posed definition. The reliability of the latter method, meanwhile, is dependent upon the perturbative accuracy of the theoretical calculation which is used to extract the mass. It is also dependent, to a certain extent, on Monte Carlo modelling, which is used to extrapolate the measurement to the full phase space from the fiducial region. A comprehensive review of these issues can be found in Ref. [1].

At the same time, the methodology of the fitting procedure must be carefully considered. It is important to bear in mind that, in addition to any dependence on SM parameters, any theoretical calculation at a hadron collider also implicitly relies on a number of parton distribution function (PDF) parameters which are normally extracted in separate fits. There may be significant correlations between the extracted SM parameters and the externally-fitted PDF parameters, which if not taken into account may bias the extracted result. This has been explicitly demonstrated in the case of extractions of the strong coupling, \(\alpha _s(M_Z)\) [2], and consequently extractions of the strong coupling are typically performed within PDF collaborations [3,4,5,6,7]. The same considerations apply more generally [8, 9], and hence also hold in the case of the top-quark mass.

Previous top-quark mass extractions have largely relied on measurements of the \(t\bar{t}\) total cross section [10,11,12], utilising next-to-next-to-leading order (NNLO) theory predictions obtained via the program top++ [13, 14]. In addition, attempts to incorporate differential information for the \(t\bar{t}\) process have been obtained using NLO theory predictions both alone [15,16,17,18,19,20] and alongside a PDF fit [21, 22]; indeed, the ability of the differential information to constrain \(m_t\) in a global fit context was pointed out in Ref. [23], albeit in the context of error updating rather than full refitting. Extensions to include NNLO theory predictions have been performed in a joint fit with \(\alpha _s(M_Z)\) (including the exact NNLO top-quark mass dependence) [24] or in a global fit (where the NNLO dependence for top-quark masses other than \(m_t=173.3~\textrm{GeV}\) is approximated through the use of NLO K-factors) [25]. Finally, the single-top production process has been used to assess top-quark mass bounds [26, 27], in some cases using NNLO theory predictions [28], although the process itself shows a reduced sensitivity. A summary of further top-quark mass measurements can be found in Ref. [29].

In this work, we combine calculations at the highest available order in the strong coupling for top-quark pair production (NNLO) with differential measurements from the ATLAS and CMS experiments taken at a centre-of-mass energy of 8 TeV [30, 31]. We utilise the MSHT global PDF fitting framework to simultaneously constrain the top-quark mass as well as the PDF parameters, thereby naturally incorporating any correlations between these quantities.

The paper is organised as follows. In Sect. 2, we detail the datasets which we use in this work as well as the theoretical inputs. We discuss the fitting procedure which we use to assess the top-quark mass sensitivity and to ultimately obtain best-fit values with associated uncertainties. In Sect. 3 we present the resulting global fit in the \(\alpha _s(M_Z),\,m_t\) plane and comment on the individual distributions. Section 4 examines the constraints on the mass obtained using the default MSHT setup, while Sect. 5 analyses the effects of different treatments of the available measured kinematic distributions. In Sect. 6 we discuss the extent to which the top-quark datasets can contribute to constraints on the strong coupling, and in Sect. 7 we illustrate the effect of the mass on the gluon PDF. We present a brief summary of our findings in Sect. 8 and finally conclude in Sect. 9.

2 Setup of the analysis

2.1 Experimental datasets

We consider the subset of top-quark measurements included in the MSHT20 PDF fit [32] for which the NNLO QCD predictions at a variety of top-quark masses are currently available. We will therefore primarily focus on the single differential data in the lepton+jet channel at 8 TeV from ATLAS [30] and CMS [31]. These data are each presented differentially in four distributions, namely the top-quark pair invariant mass, \(M_{t\bar{t}}\), the individual top-quark/antiquark transverse momentum, \(p^T_t\), and the individual and pairwise rapidities, \(y_{t}\) and \(y_{t\bar{t}}\). In MSHT, the absolute distributions are chosen in preference to the normalised, in order not to lose constraining information from the total cross-section integrated over bins.

In the ATLAS case, the data are provided in both absolute and normalised form, and so we choose the former. In addition, the full statistical correlations between distributions are provided [30, 33], in principle allowing all four distributions to be fit simultaneously. In practice, this has been found to be very difficult by several groups [4, 23, 32, 34,35,36,37,38,39,40,41], and problems have been encountered not only in fitting the different distributions together but even in fitting the individual rapidity distributions. As a result, several different approaches have been taken to the inclusion of these data [4, 32, 33, 35, 40, 41]: here we will follow the MSHT20 approach and decorrelate the parton shower systematic between all four distributions and additionally into two components within each of the distributions. This is a conservative approach and we will analyse the effects of different choices of decorrelation and distribution in Sect. 5, where we will also elaborate further on our procedure.

For the corresponding CMS data, the statistical correlations between the distributions are not provided – we therefore fit one distribution at a time, with our default choice being the top-antitop pair rapidity, as in MSHT20. We consider alternative choices of distribution in Sect. 5. All CMS measurements which we consider were originally presented as normalised distributions but have been converted to absolute distributions using the corresponding total cross section data [11]. Further details on the inclusion of these datasets within MSHT20 are provided in Refs. [32, 35, 42].

In this study we do not include data taken in the dilepton channel, e.g. the ATLAS single differential data at 7 and 8 TeV [43] or the CMS double differential data [44], even though they are included by default in MSHT20. This is because the NNLO theoretical predictions are only publicly available at a single top-quark mass.

Whilst we focus our attention largely on the more constraining single differential measurements in the lepton+jet channel at 8 TeV, a number of measurements of the total \(t\bar{t}\) cross section from ATLAS [45,46,47,48,49,50,51] and CMS [52,53,54,55,56,57,58,59] at 7 TeV and 8 TeV, and at the Tevatron [60] were included in the MSHT20 PDF fit. We retain this (non-exhaustive) set of data in our analysis. We leave an analysis of the full set of top-quark total cross section data to a future study.

2.2 Theoretical predictions

Throughout this work, we use theoretical predictions incorporating at least NNLO QCD corrections and work with the top-quark pole mass. For the total \(t\bar{t}\) cross section we use the NLO QCD prediction from APPLGrid-MCFM [61, 62] and the NNLO prediction as implemented in top++, evaluated at the scale \(\mu _R=\mu _F=m_t\) [13, 14, 63,64,65]. Central predictions are made at a pole mass of \(m_t= 172.5~\textrm{GeV}\), and a variation of \(1~\textrm{GeV}\) in the mass is translated into a 3% change in the cross section (see footnote 1). The differential cross section for this process is known up to NNLO QCD [68, 69] with NLO EW and resummation effects [70, 71]. In this work we use the NNLO QCD predictions [72,73,74] computed using the Stripper framework [75, 76]. These are implemented via fastNLO tables [77,78,79], which allow for rapid evaluations for a fixed set of observables and binnings. We use the same tables which were first made available in Ref. [24], with top-quark masses of \(m_t=\{169.0,171.0,172.5,173.3,175.0\}~\textrm{GeV}\) and which share the binnings of the experimental measurements. In addition, we supplement the NNLO QCD predictions with NLO electroweak (EW) K-factors [71].

The renormalisation and factorisation scales chosen are those found to be optimal in Ref. [74], namely,

$$\begin{aligned} \mu _R&=\mu _F=H_T/4, \;\; \text {for} \;\; M_{t\bar{t}},\, y_{t}, \, y_{t\bar{t}}\,, \end{aligned}$$
(1)
$$\begin{aligned} \mu _R&=\mu _F=M_T/2, \;\; \text {for} \;\; p^T_t\,. \end{aligned}$$
(2)
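As an illustration of Eqs. (1) and (2), the following minimal sketch evaluates the two dynamical scales for a single parton-level \(t\bar{t}\) configuration. It assumes the conventional definitions that usually accompany these scale choices, namely the transverse mass \(M_T=\sqrt{m_t^2+(p^T)^2}\) of the top quark (or antiquark) and \(H_T\) as the sum of the top and antitop transverse masses. The function names and the example kinematics are ours, chosen purely for illustration, and are not part of any fitting code.

```python
import math


def transverse_mass(m_t, pt):
    """Transverse mass sqrt(m_t^2 + pT^2) in GeV (assumed definition of M_T)."""
    return math.hypot(m_t, pt)


def scale_ht_over_4(m_t, pt_top, pt_antitop):
    """Scale of Eq. (1): mu_R = mu_F = H_T / 4.

    H_T is assumed to be the sum of the top and antitop transverse masses.
    """
    h_t = transverse_mass(m_t, pt_top) + transverse_mass(m_t, pt_antitop)
    return h_t / 4.0


def scale_mt_over_2(m_t, pt):
    """Scale of Eq. (2): mu_R = mu_F = M_T / 2, used for the pT distribution."""
    return transverse_mass(m_t, pt) / 2.0


# Example kinematics (arbitrary values, for illustration only).
m_t = 172.5  # GeV
pt = 100.0   # GeV, back-to-back top and antitop
print("H_T/4 =", scale_ht_over_4(m_t, pt, pt), "GeV")
print("M_T/2 =", scale_mt_over_2(m_t, pt), "GeV")
```

Under these assumed definitions the two scales coincide whenever the top quark and antiquark have equal transverse momenta, and both reduce to \(m_t/2\) at vanishing transverse momentum.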

2.3 The MSHT global fit

We use the MSHT global PDF fitting framework to combine the experimental measurements with the theoretical predictions. MSHT20 [32] is a global PDF fit utilising 4363 data points across 61 different datasets spanning older fixed target data, HERA deep inelastic scattering, neutrino dimuon, Tevatron and a wide range of recent LHC data to provide a state of the art determination of proton structure in terms of unpolarised proton PDFs. An extensive parameterisation of the PDFs at the input scale (\(Q_0^2=1~\textrm{GeV}^2\)) incorporating 52 parton parameters is used to define the central values of the PDFs. The PDF uncertainties are defined via the Hessian method, with a 32-member subset of the parton parameters used to define an eigenvector basis. Given the global nature of the PDF dataset and the finite order at which the theory is implemented (NNLO QCD + NLO EW), rather than implementing a \(\Delta \chi ^2=1\) criterion to define the PDF uncertainties we use the so-called “dynamic tolerance procedure”. This is more conservative and is motivated by a weaker hypothesis-testing criterion [80] (see below in the context of the \(m_t\) and \(\alpha _s(M_Z)\) bounds in Eq. 5); it ensures that the bounds on each eigenvector are such that each dataset sits within its 68% confidence level limit.

We have implemented the NNLO QCD theoretical predictions (with NLO EW corrections where specified in Sect. 2.2) within the MSHT20 global fit, and refit for each of the five available top-quark masses. In this way, we naturally account for correlations between the extracted PDFs, \(m_t\) and \(\alpha _s(M_Z)\). Performing this across a range of \(\alpha _s(M_Z)\) and \(m_t\) values we are then able to use the qualities of the different fits to analyse the sensitivity of the experimental data to the parameters of interest. After examining the two-dimensional dependence in the \(m_t\)-\(\alpha _s(M_Z)\) plane, we will follow the procedure typically used in \(\alpha _s(M_Z)\) extractions from PDFs (including in the most recent MSHT20 determination of the strong coupling [5]) to consider the extent to which bounds can be placed. We outline this procedure here.

We consider the variation in the \(\chi ^2_n\) for the n-th dataset with N degrees of freedom (data points) as we scan along a particular parameter or eigenvector direction, and assume that it follows a \(\chi ^2\) distribution, i.e.

$$\begin{aligned} P_N(\chi ^2) = \frac{(\chi ^2)^{N/2-1}\exp {(-\chi ^2/2)}}{2^{N/2}\Gamma (N/2)}\,. \end{aligned}$$
(3)

We can then obtain the m-th percentile, \(\xi _m\) by solving:

$$\begin{aligned} \int _0^{\xi _m} \textrm{d}\chi ^2 P_N (\chi ^2) = m/100 \end{aligned}$$
(4)

Then \(\xi _{50} \approx N\) is the most probable value, and \(\xi _{68}\) will be used in the definition of the 68% confidence level uncertainties. The ratio \(\xi _{68}/\xi _{50}\) will then reduce with the number of data points N.
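The percentiles defined by Eqs. (3) and (4) can be evaluated numerically. The sketch below is one possible way of doing so using only the Python standard library: the cumulative distribution is computed from the series expansion of the regularised incomplete gamma function, and Eq. (4) is solved by bisection. The function names and the choice of numerical method are ours and are not taken from the MSHT code.

```python
import math


def chi2_cdf(x, n_dof):
    """Integral of P_N(chi^2) in Eq. (3) from 0 to x, for N = n_dof."""
    if x <= 0.0:
        return 0.0
    a, z = 0.5 * n_dof, 0.5 * x
    # Series for the regularised lower incomplete gamma function P(a, z):
    # P(a, z) = z^a e^{-z} / Gamma(a) * sum_k z^k / (a (a+1) ... (a+k)).
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def percentile(m, n_dof):
    """Solve Eq. (4) for xi_m by bisection on the cumulative distribution."""
    target = m / 100.0
    lo, hi = 0.0, n_dof + 20.0 * math.sqrt(2.0 * n_dof) + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n_dof) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tolerance_ratio(n_dof):
    """Ratio xi_68 / xi_50 for a dataset with n_dof data points."""
    return percentile(68, n_dof) / percentile(50, n_dof)


for n in (5, 20, 100):
    print(n, percentile(50, n), percentile(68, n), tolerance_ratio(n))
```

Running the loop for increasing N shows the ratio \(\xi _{68}/\xi _{50}\) approaching unity from above, in line with the statement that it reduces with the number of data points.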

In order to define the 68% confidence limit on a quantity (\(m_t\) or \(\alpha _s(M_Z)\)), or indeed on a PDF eigenvector, from this particular dataset about the global minimum \(\chi ^2_{n,0}\), we then choose to set the bounds by the condition:

$$\begin{aligned} \chi _n^2 < \frac{\chi ^2_{n,0}}{\xi _{50}}\xi _{68}, \end{aligned}$$
(5)

where \(\chi ^2_{n,0}\) is the \(\chi ^2\) of the dataset at the global minimum.

This effectively sets the limit by scaling the \(\chi ^2\) of this dataset at the global minimum, \(\chi ^2_{n,0}\), up by a factor of \(\xi _{68}/\xi _{50}\), and accounts for the fact that \(\chi ^2_{n,0}\) is likely to differ from the most probable value \(\xi _{50}\), since the global minimum need not coincide with the minimum of the particular dataset. Equivalently, it can be understood as a rescaling of the 68th percentile \(\xi _{68}\) by \(\chi ^2_{n,0}/\xi _{50}\). As the ratio \(\xi _{68}/\xi _{50}\) reduces with the number of data points N, this also reflects the intuition that datasets with more data points N are likely to sit closer to the global minimum and so require a smaller rescaling of the uncertainty condition. Once this condition is violated in the scan along the parameter value or eigenvector direction, we interpret this as a bound. We interpolate between this point and the preceding one to obtain the precise value of the bound.

This procedure is repeated for all of the datasets in the fit, and in order to ensure that each dataset lies within its 68% confidence level (as defined by Eq. 5), the most stringent of the bounds is then taken. The dataset corresponding to said bound is then interpreted as constraining the quantity of interest.
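The procedure described above can be sketched as follows. The code scans outwards from the global minimum for each dataset, locates the first scan point at which the condition of Eq. (5) is violated, interpolates between that point and the preceding one, and finally takes the most stringent bound across datasets. Linear interpolation is an assumption made here for simplicity. All \(\chi ^2\) values and percentiles in the example are invented placeholders with no connection to the actual fit results.

```python
def tolerance_threshold(chi2_at_global_min, xi50, xi68):
    """Right-hand side of Eq. (5): chi2_{n,0} * xi_68 / xi_50."""
    return chi2_at_global_min * xi68 / xi50


def scan_bound(values, chi2, i_min, threshold, step):
    """Scan from index i_min in direction step (+1 up, -1 down).

    Returns the linearly interpolated parameter value at which chi2 first
    exceeds the threshold, or None if it never does within the scan.
    """
    i = i_min
    while 0 <= i + step < len(values):
        j = i + step
        if chi2[j] > threshold:
            frac = (threshold - chi2[i]) / (chi2[j] - chi2[i])
            return values[i] + frac * (values[j] - values[i])
        i = j
    return None


def dataset_bounds(values, chi2, i_min, xi50, xi68):
    """Lower and upper bounds from a single dataset."""
    threshold = tolerance_threshold(chi2[i_min], xi50, xi68)
    return (scan_bound(values, chi2, i_min, threshold, -1),
            scan_bound(values, chi2, i_min, threshold, +1))


def most_stringent(bounds):
    """Highest lower bound and lowest upper bound over all datasets."""
    lowers = [lo for lo, _ in bounds if lo is not None]
    uppers = [hi for _, hi in bounds if hi is not None]
    return (max(lowers) if lowers else None, min(uppers) if uppers else None)


# Placeholder inputs (hypothetical numbers, for illustration only).
masses = [169.0, 171.0, 172.5, 173.3, 175.0]
i_min = 3  # index of the scan point taken as the global minimum
toy = {
    "dataset A": ([30.0, 16.0, 11.0, 10.0, 15.0], 10.0, 12.0),
    "dataset B": ([12.0, 9.0, 8.2, 8.0, 8.1], 8.0, 10.0),
}
bounds = [dataset_bounds(masses, chi2, i_min, xi50, xi68)
          for chi2, xi50, xi68 in toy.values()]
lower, upper = most_stringent(bounds)
print("combined bounds:", lower, upper)
```

In this toy example the second dataset yields only a one-sided bound, so the combined upper bound is set entirely by the first dataset.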

This is the methodology used within MSHT20 to define the uncertainties on each PDF eigenvector, and as a consequence on the PDFs themselves. It can be extended to consider any further parameter by fitting it together with the PDFs (and so account for correlations). This has been used on several occasions to provide bounds on \(\alpha _s(M_Z)\) [5, 81], and we extend this here to consider \(m_t\).

3 Sensitivity of \(t\bar{t}\) distributions to \(m_t\) and \(\alpha _s(M_Z)\)

Before performing any parameter extraction, it is instructive to examine the two-dimensional dependence of the global fit quality on \(m_t\) and \(\alpha _s(M_Z)\). To that end, we perform fits for the five values of the top-quark mass available to us and with 9 equally-spaced \(\alpha _s(M_Z)\) values from 0.114 to 0.122. We present the results of the global fits in Fig. 1. We observe that we are able to constrain both parameters simultaneously, with the fit finding a global minimum for \(m_t\sim 173.3~\textrm{GeV},\,\alpha _s(M_Z)\sim 0.118\). We do not observe any clear signs of degeneracy, which would indicate significant correlation between the parameters. In Fig. 2 we examine the impact of the subset of the global fit dataset corresponding to top-quark measurements. We show the heatmap from the same fits as in Fig. 1, but with top-quark data removed, as well as the corresponding plot including only the top-quark data. We observe that the former plot is unable to constrain the top-quark mass, while the latter shows a similar pattern in \(m_t\) to that observed in Fig. 1, but displays a weaker dependence on \(\alpha _s(M_Z)\). This indicates the importance of the top-quark datasets in constraining \(m_t\), while the remainder of the global dataset offers stronger constraints on \(\alpha _s(M_Z)\).

Fig. 1 Heat map showing the minimum \(\chi ^2\) value obtained from fits with varying \(m_t\) and \(\alpha _s(M_Z)\). All datasets are included, as described in Sect. 2.1

Fig. 2 Heat map showing the minimum \(\chi ^2\) value obtained from fits with varying \(m_t\) and \(\alpha _s(M_Z)\). Left: top-quark datasets removed. Right: top-quark data only. Note that these are the same fits as those shown in Fig. 1

We turn to an examination of the individual top-quark datasets. In Fig. 3 we show again the same fits as in Figs. 1 and 2, but this time separating out the contributions from the total \(t\bar{t}\) cross section data, the ATLAS multi-differential data and the single-differential CMS data in \(y_{t\bar{t}}\) (the MSHT20 default), the latter two both measured in the lepton+jets channel at 8 TeV. We notice a clear degeneracy in the total cross section data, indicative of the well-known fact that this alone is unable to constrain both \(m_t\) and \(\alpha _s(M_Z)\) due to the compensatory nature of the joint dependence. The CMS data seem to offer slightly improved constraining power, although signs of degeneracy are still present. The ATLAS data, in contrast, provide very strong constraints on \(m_t\) while being much more weakly constraining in the \(\alpha _s(M_Z)\) direction. This is likely to be due to the fact that while the ATLAS data contain measurements of all four kinematic distributions, the included CMS data are taken from a rapidity distribution, which one expects on theoretical grounds to be naturally less strongly dependent on the top-quark mass.

Fig. 3 Heat map showing the minimum \(\chi ^2\) value obtained from fits with varying \(m_t\) and \(\alpha _s(M_Z)\). Top: \(t\bar{t}\) total cross section only. Lower left: CMS \(y_{t\bar{t}}\) only. Lower right: ATLAS \(p^T_t,\,M_{t\bar{t}},\,y_{t},\,y_{t\bar{t}}\). Note that these fits are identical to those shown in Fig. 1

Table 1 The quality of the fit as a function of the top-quark mass \(m_t\) with \(\alpha _s(M_Z)\) left free

Finally, in Table 1 we examine the best-fit \(\alpha _s(M_Z)\) value obtained in the fit for the five different \(m_t\) values, as well as the global and top-quark data total \(\chi ^2\) values. One can again see the preference for \(m_t\sim 173.3~\textrm{GeV}\), with a corresponding best fit \(\alpha _s(M_Z)=0.1175\). Moreover, it can be observed that the best fit \(\alpha _s(M_Z)\) does not vary significantly with \(m_t\) – indeed the whole range shown here is within the uncertainties of the result quoted in Ref. [5], whilst the best fit is very close to the best fit of 0.1174 obtained there. Whilst \(\alpha _s(M_Z)\) does increase slightly with \(m_t\), likely to counter-balance the effect of reducing the cross-section with increasing \(m_t\), the change is relatively small.

The results in this section therefore strongly suggest that the correlation between \(m_t\) and \(\alpha _s(M_Z)\), in the context of the global fit which we perform, is limited, at least in the vicinity of the best fit minimum. Indeed, both Fig. 1 and Table 1 imply that the \(\alpha _s(M_Z)\)-\(m_t\) dependence is quasi-one-dimensional in the fit as a whole. We will exploit this property in order to constrain individually the two parameters, following the method described in Sect. 2.3.

4 Constraining the top-quark mass within the MSHT default setup

Given the findings of Sect. 3, in this section we proceed with a one-dimensional extraction of \(m_t\) at a fixed value of \(\alpha _s(M_Z)=0.118\). Following the methodology described in Sect. 2.3, we interpolate the \(\chi ^2\) dependence of the fit as a function of \(m_t\) and assume that it follows a cubic dependence about its minimum. We have found a cubic function to be necessary to describe in particular the ATLAS dataset, i.e. we include the next term in the Taylor expansion of the \(\chi ^2\) function about its minimum. This allows us to include all data points in \(m_t\), even when some of these are relatively far from the minimum for this strongly constraining dataset. As discussed previously, we adopt a more conservative tolerance-based definition of the \(\Delta \chi ^2\) in order to set limits on the parameters, rather than using a simple \(\Delta \chi ^2=1\) criterion.
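To make the interpolation step concrete, the sketch below fits a cubic polynomial by least squares to five \((m_t,\chi ^2)\) points, locates its local minimum analytically, and then finds the values of \(m_t\) at which the profile rises by a given amount above that minimum. The \(\chi ^2\) values are generated from an invented toy function and the \(\Delta \chi ^2\) level of 2 is an arbitrary placeholder; neither is taken from the fit. The least-squares solver and the bisection are our choices and are not necessarily the numerical methods used in the analysis.

```python
import math


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns c[k] such that p(x) = sum c[k] x^k."""
    n = degree + 1
    a = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Solve the normal equations by Gaussian elimination with pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(a[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (b[r] - tail) / a[r][r]
    return coeffs


def evaluate(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def cubic_minimum(coeffs):
    """Stationary point of the cubic with positive curvature (assumed to exist)."""
    _, c1, c2, c3 = coeffs
    if abs(c3) < 1e-12:
        return -c1 / (2.0 * c2)
    disc = 4.0 * c2 * c2 - 12.0 * c3 * c1
    return (-2.0 * c2 + math.sqrt(disc)) / (6.0 * c3)


def crossing(coeffs, x_inside, x_outside, level):
    """Bisect for p(x) = level between a point below and a point above it."""
    for _ in range(100):
        mid = 0.5 * (x_inside + x_outside)
        if evaluate(coeffs, mid) < level:
            x_inside = mid
        else:
            x_outside = mid
    return 0.5 * (x_inside + x_outside)


def toy_chi2(m):
    """Invented chi^2 profile, used only to generate example inputs."""
    return 20.0 + 4.0 * (m - 173.0) ** 2 - 0.5 * (m - 173.0) ** 3


M_REF = 172.5  # masses are centred on this value for numerical stability
masses = [169.0, 171.0, 172.5, 173.3, 175.0]
xs = [m - M_REF for m in masses]
ys = [toy_chi2(m) for m in masses]

coeffs = fit_polynomial(xs, ys, 3)
x_min = cubic_minimum(coeffs)
level = evaluate(coeffs, x_min) + 2.0  # placeholder Delta chi^2
lower = M_REF + crossing(coeffs, x_min, xs[0], level)
upper = M_REF + crossing(coeffs, x_min, xs[-1], level)
print("minimum:", M_REF + x_min, "bounds:", lower, upper)
```

Because the cubic term makes the profile asymmetric, the lower and upper bounds generally sit at different distances from the minimum, which a purely quadratic form cannot reproduce.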

In Fig. 4 we show the dependence of the \(\chi ^2\) values for various top-quark datasets as a function of \(m_t\) at fixed \(\alpha _s(M_Z)=0.118\). In particular, we compare the constraining power of the total cross section data, the ATLAS multi-differential data and the CMS pair-rapidity distribution \(y_{t\bar{t}}\) in the same fit. We remind the reader that this combination of distributions defines the MSHT default setup, albeit omitting the ATLAS and CMS dilepton datasets. The horizontal lines represent the bounds on \(\Delta \chi ^2_n\) of the three top-quark datasets from the global minimum (cf. Eq. 5).

We observe that the CMS \(y_{t\bar{t}}\) distribution provides only a one-sided bound on \(m_t\) (favouring high values) over the region of \(m_t\) sampled (see footnote 2), reflecting the limited sensitivity of this distribution. This is in accordance with Fig. 3 and with observations in Ref. [24]. We find a lower bound of \(m_t\sim 171.3~\textrm{GeV}\). Next we consider the total cross section \(\sigma _{t\bar{t}}\). We note that this dataset is approximately locally quadratic about the global minimum, and hence is able to provide a two-sided constraint on the mass. Whilst we find a relatively weak upper bound (\(\sim 175.2~\textrm{GeV}\)), the lower bound (\(\sim 171.8~\textrm{GeV}\)) is slightly stronger than that from the CMS \(y_{t\bar{t}}\) distribution. Finally, the ATLAS multi-differential data show a greater sensitivity to \(m_t\), again providing a two-sided bound \(172.4~\textrm{GeV}< m_t< 173.6~\textrm{GeV}\). Taking the tightest bounds from all datasets considered, we find that the ATLAS dataset provides both the upper and lower values.

Fig. 4 \(\Delta \chi ^2\) of the included top-quark datasets in the MSHT default setup (see Sect. 2.1) as a function of \(m_t\) and with fixed \(\alpha _s(M_Z)=0.118\)

Before moving on, we assess how our choice of interpolation affects the bounds on \(m_t\) which we are able to set. In Fig. 5, we consider three alternative options which are all based on a quadratic \(m_t\) dependence: first, assigning all five points an equal weight; second, discarding the first two points (which are furthest from the global minimum); third, weighting the points to favour those closer to the global minimum. In the last case, we have considered several different possibilities for the weights – we show in the figure a representative case where the points have been assigned relative weights of \(\{1,2,3,5,1\}\).
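The three quadratic options can all be viewed as weighted least-squares fits that differ only in the weights assigned to the five \(m_t\) points, with dropped points receiving zero weight. The sketch below illustrates this under the assumption that each weight multiplies the squared residual of the corresponding point; the precise way in which the weights enter the actual analysis is not specified here, and the input \(\chi ^2\) values come from an invented toy function.

```python
def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def weighted_quadratic_fit(xs, ys, weights):
    """Minimise sum_i w_i (y_i - c0 - c1 x_i - c2 x_i^2)^2 via Cramer's rule."""
    s = [sum(w * x ** k for w, x in zip(weights, xs)) for k in range(5)]
    t = [sum(w * y * x ** k for w, x, y in zip(weights, xs, ys))
         for k in range(3)]
    a = [[s[r + c] for c in range(3)] for r in range(3)]
    d = det3(a)
    coeffs = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = t[r]
        coeffs.append(det3(m) / d)
    return coeffs


def quadratic_minimum(coeffs):
    return -coeffs[1] / (2.0 * coeffs[2])


M_REF = 172.5  # masses are centred on this value for numerical stability
masses = [169.0, 171.0, 172.5, 173.3, 175.0]
OPTIONS = {
    "equal weights": [1.0, 1.0, 1.0, 1.0, 1.0],
    "first two points dropped": [0.0, 0.0, 1.0, 1.0, 1.0],
    "up-weighted near minimum": [1.0, 2.0, 3.0, 5.0, 1.0],
}


def fitted_minima(chi2_values):
    """Position of the fitted minimum in m_t for each weighting option."""
    xs = [m - M_REF for m in masses]
    return {name: M_REF + quadratic_minimum(
                weighted_quadratic_fit(xs, chi2_values, w))
            for name, w in OPTIONS.items()}


def toy_chi2(m):
    """Invented asymmetric chi^2 profile, used only as example input."""
    return 20.0 + 4.0 * (m - 173.0) ** 2 - 0.5 * (m - 173.0) ** 3


for name, m_min in fitted_minima([toy_chi2(m) for m in masses]).items():
    print(name, m_min)
```

For an exactly quadratic profile all three weightings return the same minimum; any spread between them therefore signals that higher-order terms are relevant over the range of \(m_t\) sampled.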

Fig. 5 \(\Delta \chi ^2\) of our default included top-quark datasets (see Sect. 2.1), but with alternative interpolation procedures as a function of \(m_t\) and with fixed \(\alpha _s(M_Z)=0.118\). Upper left: quadratic fit, all points equally weighted. Upper right: quadratic fit, neglecting first two \(m_t\) points. Lower: quadratic fit, up-weighting points closest to the minimum

We begin with the case of providing an equal weight to all points with a quadratic polynomial in Fig. 5 (upper left). This is observed to work well for the total cross section and CMS datasets, causing the net lower and upper bounds across both datasets to change by \(\sim 0.2-0.3~\textrm{GeV}\) and \(\sim 0.5~\textrm{GeV}\) respectively. Nonetheless, as these data provide neither our most stringent upper nor lower bounds on \(m_t\), this has no global effect. The ATLAS data provide our most constraining limits and thus the effects on this dataset are more important for the extraction of the \(m_t\) bounds. However, assigning all points an equal weight in a quadratic fit clearly provides a very poor description of the ATLAS data – sufficiently poor, indeed, that the bounds would be meaningless.

On the other hand, considering the second case of dropping the first two points in \(m_t\) in Fig. 5 (upper right), the ATLAS data are much better described locally by a quadratic function. Indeed, the bounds obtained would then be substantially stronger due to the tighter nature of the profile around the minimum, with the lower bound increasing by \(\sim 0.2~\textrm{GeV}\). Nevertheless, following this procedure causes difficulty in extracting a bound for the CMS dataset, given its limited sensitivity. In this case, the total cross section data bounds are unaffected relative to our default cubic option.

Finally, the intermediate option of up-weighting points closest to the minimum shown in Fig. 5 (lower) improves the situation relative to the equally-weighted case, whilst also removing the need to drop information from the first two \(m_t\) values. In this case the bounds from the total cross section data shift by only \(\sim 0.1~\textrm{GeV}\) relative to the better cubic fit. Once more, however, issues with the more constraining ATLAS dataset in particular remain.

Overall, whilst we see slight changes in the bounds for the total cross section and CMS \(y_{t\bar{t}}\) data depending on the interpolation chosen, for the ATLAS dataset (which ultimately provides our most stringent bounds overall) only two forms produce reasonable fits in the vicinity of the \(m_t\) minimum: the default cubic and the quadratic using only the last three \(m_t\) values. The former has the advantage of using all the \(m_t\) information available and additionally provides the more conservative bound. Whilst there is therefore some uncertainty due to the exact interpolation performed, we adopt the cubic fit as our default on the basis that it provides the more conservative bounds. In fact, the tighter bound which could be obtained from the ATLAS data is encompassed within the looser default range from the cubic fit: our default bounds therefore also include to some extent the effects of changing the interpolation. The difficulties we encounter in using a quadratic form for the \(m_t\) dependence further motivate our cubic fit: since the ATLAS dataset is so strongly constraining, the extreme value \(m_t=169.0~\textrm{GeV}\) lies some distance from the minimum, and it is not surprising that higher terms in the Taylor expansion of the \(\chi ^2\) function are needed to properly describe this region.

Fig. 6 \(\Delta \chi ^2\) profiles for the baseline global fit as \(m_t\) is changed. \(\Delta \chi ^2\) for the global dataset and the top-quark data only are shown

Finally, in Fig. 6 we plot the \(\Delta \chi ^2\) as the top-quark mass is changed across the five different fixed values, again using a cubic interpolation. Both the change in \(\chi ^2\) for the total global fit and for the top-quark data are shown on the same scale, again demonstrating that the top-quark data contribute the overwhelming majority of the \(m_t\) dependence to the PDF fit. We can also use these profiles to determine the \(\Delta \chi ^2\) of the global fit corresponding to our \(m_t\) bounds: we find these correspond to \(\Delta \chi ^2 =\{3.2,\, 4.1\}\) for the lower and upper bounds respectively (see footnote 3). On the other hand, if we had taken the less conservative approach of using the \(\Delta \chi ^2=1\) criterion rather than the default MSHT dynamic tolerance then more stringent bounds would be obtained, viz. \(172.7~\textrm{GeV}< m_t< 173.3~\textrm{GeV}\). However, the usual issues of dataset tensions, methodological limitations and the finite-order nature of the theoretical predictions, amongst other effects, mean that the textbook scenario does not apply. Instead, in a global PDF fit the tolerance is used to account for these considerations, thus enlarging the uncertainties and providing our bounds.

5 Assessing alternative treatments of the CMS and ATLAS data

In this section, we consider alternative possibilities for the inclusion of the differential CMS and ATLAS top-quark data in the lepton+jets channel, which differ from the default MSHT20 treatment [35]. Specifically, in the case of CMS we assess the options for the included distribution, while in the ATLAS case this picture is somewhat complicated by the statistical correlations between distributions. We therefore investigate the effect of different decorrelation treatments, which are themselves tied to the choice of distributions.

5.1 Choices of CMS distribution

Our comparison of the different datasets in Sect. 4 revealed that the ATLAS multi-differential data were able to provide a significantly stronger constraint on \(m_t\) than the CMS \(y_{t\bar{t}}\) distribution or the total cross section data alone. This is to some extent expected, since the availability of the statistical correlations between kinematic distributions enables the simultaneous inclusion of the \(p^T_t,\,M_{t\bar{t}},\,y_{t}\) and \(y_{t\bar{t}}\) data in the ATLAS case. In contrast, this information is not publicly available in the CMS case and we are therefore only able to fit a single distribution at a time. In this section, we investigate different choices for this distribution compared to our default choice of \(y_{t\bar{t}}\). In all cases complete refits are done with the alternative CMS distribution.

Fig. 7 \(\Delta \chi ^2\) of the default ATLAS and total cross section datasets, but with alternative choices of CMS distribution relative to our default. Plots are again as a function of \(m_t\) and with fixed \(\alpha _s(M_Z)=0.118\). Upper left: CMS single rapidity \(y_{t}\). Upper right: CMS top-quark pair invariant mass \(M_{t\bar{t}}\). Lower: CMS top-quark transverse momentum \(p^T_t\)

Turning first to the case of the CMS \(y_{t}\) distribution (Fig. 7, upper left), we see that we can still only obtain a one-sided bound over the region of \(m_t\) sampled. In general, the behaviour is very similar to the \(y_{t\bar{t}}\) case, albeit showing a slightly more quadratic dependence. The \(M_{t\bar{t}}\) distribution (Fig. 7, upper right) instead shows a distinctly different behaviour, again providing only a reasonable one-sided bound but in this case at high, rather than low, \(m_t\). Finally, the \(p^T_t\) case (Fig. 7, lower panel) by contrast shows a quadratic behaviour in the vicinity of the global minimum and sets limits \(169.7~\textrm{GeV}< m_t< 174.0~\textrm{GeV}\). Of the four cases, this choice provides the greatest sensitivity but remains notably worse than the ATLAS data, as observed in other studies – for comparison, the corresponding ATLAS limits are \(172.4~\textrm{GeV}<m_t<173.6~\textrm{GeV}\) (which remain the same regardless of the CMS distribution included in the fit), while the total cross section provides \(171.9~\textrm{GeV}<m_t<175.4~\textrm{GeV}\) (which varies by at most \(0.2~\textrm{GeV}\) on the upper and lower bounds as the CMS distribution is altered). The fact that the CMS \(M_{t\bar{t}}\) and \(p^T_t\) distributions are generally better able to constrain \(m_t\) is expected based on theoretical grounds [34, 82, 83] and has also been demonstrated in e.g. Ref. [24] (see footnote 4). Nonetheless, in all cases the CMS bounds are weaker than the corresponding ATLAS bounds.

5.2 Choices of ATLAS kinematic distributions and decorrelation models

In order to fit multiple ATLAS distributions simultaneously, it is necessary to include information about both the statistical and systematic correlations within and between the different kinematic variables. As remarked in Sect. 2.1, using the correlation matrices as provided by the experimental collaboration leads to very poor fit qualities (large \(\chi ^2/N\)). The default MSHT20 procedure for dealing with this issue was detailed in Refs. [32, 35] and entails decorrelating the two-point parton shower systematic across the four distributions. In addition, we make the conservative choice of further separating this source of uncertainty within the individual distributions into two pieces according to a trigonometric decomposition. The evaluation of this two-point systematic involves using the difference between two Monte Carlo generators to define a systematic uncertainty, which is taken as fully correlated across all bins. In reality, the correlations on these systematic uncertainties are not well known and therefore different levels of correlation can in principle be used. This was first investigated in the context of inclusive jet data from ATLAS [85]. The assumption of full correlation across all bins is a strong one and in fact several studies have shown that by applying a small degree of decorrelation across the bins the fit quality can be improved significantly, see e.g. Refs. [32, 35].

It has been observed [23, 35, 37, 38, 41] that, while it is possible to fit the \(p^T_t\) and \(M_{t\bar{t}}\) distributions simultaneously by following the first part of the above prescription alone, as soon as rapidities are included the second part also becomes necessary. We therefore begin by simply fitting the \(p^T_t\) and \(M_{t\bar{t}}\) distributions, decorrelating only between the distributions and not within. Given the findings of Sect. 5.1, we do this using the CMS \(p^T_t\) distribution, which was found to be the most constraining of the four options, rather than using the MSHT20 default of the CMS \(y_{t\bar{t}}\). We remind the reader, however, that the precise choice of CMS distribution was found to have a negligible impact on the ATLAS bounds.

We show the results in Fig. 8 (left) where again complete refits are performed with the new ATLAS distributions – we note that the total cross section and CMS curves are largely unaffected by this change, while the ATLAS data become slightly less quadratic in the region near the minimum. With respect to the lower panel of Fig. 7, the upper bound is unchanged while the lower bound becomes more stringent. This further demonstrates the conservative nature of our final uncertainty estimate on \(m_t\) and the robustness of our procedure. We remark that this choice of treatment of the ATLAS data (i.e. including just \(p^T_t\) and \(M_{t\bar{t}}\) and only decorrelating between distributions) is similar to that followed by the CT18 global PDF fit [4]. We could instead take a choice similar to that made by the NNPDF4.0 global PDF fit [40] and include both the single and pair rapidity distributions \(y_{t}\) and \(y_{t\bar{t}}\), with the parton shower systematic then decorrelated between the two distributions but not within. Doing so, we observe poor fit qualities as expected (\(\sim 2-3\) per point, similar to but actually lower than in the NNPDF4.0 study), but for the sake of completeness we still examine the extent to which the top-quark mass can be constrained in this case. Figure 8 (right) shows the \(\chi ^2\) profiles observed for the top-quark data. The reduced sensitivity of the ATLAS data to the top-quark mass is immediately clear, with the shape now not quadratic and rather flat in the vicinity of the global minimum \(m_t\). This further attests to the fact that the constraints on \(m_t\) in the ATLAS data arise predominantly from the \(p^T_t\) and \(M_{t\bar{t}}\) distributions.

Fig. 8

\(\Delta \chi ^2\) of the ATLAS datasets (left) \(p^T_t\) and \(M_{t\bar{t}}\) and (right) \(y_{t}\) and \(y_{t\bar{t}}\), along with those of the CMS \(p^T_t\) data and the total cross section data as a function of \(m_t\) and with fixed \(\alpha _s(M_Z)=0.118\)

We now turn to fits that include a single ATLAS distribution, again refitting in every case. Having observed in Sect. 5.1 that CMS rapidity distributions show a reduced sensitivity to the top-quark mass relative to the \(p^T_t\) and \(M_{t\bar{t}}\) distributions (as expected theoretically), and verified this for the pairs of ATLAS distributions, we wish to gauge the extent to which this is the case for the individual ATLAS distributions. To that end, we repeat our fits choosing either the \(M_{t\bar{t}}\) or the \(y_{t}\) distribution. This also allows us to assess the effect of the ATLAS data in a ‘clean’ environment, without making any assumptions about the correlations (or otherwise) between distributions. We present our results in Fig. 9.

Fig. 9

\(\Delta \chi ^2\) of a single ATLAS dataset, the CMS \(p^T_t\) data and the total cross section data as a function of \(m_t\) and with fixed \(\alpha _s(M_Z)=0.118\). Left: ATLAS \(M_{t\bar{t}}\). Right: ATLAS \(y_{t}\)

We note that by including only the ATLAS \(M_{t\bar{t}}\) distribution we retain significant constraints on the top-quark mass – this single kinematic variable is sufficient to provide relatively tight bounds on \(m_t\) (the same holds for \(p^T_t\)). Using only the ATLAS \(y_{t}\) (or indeed \(y_{t\bar{t}}\)) instead has a very large and detrimental effect on the sensitivity, and all constraining power of this dataset is effectively lost. This confirms our naïve expectation and also verifies that our ability to place constraints on \(m_t\) using the ATLAS data is largely independent of the exact correlation model between distributions.

Table 2 Bounds on \(\alpha _s(M_Z)\), obtained via a one-dimensional fit with \(m_t=173.3~\textrm{GeV}\), for different datasets. In the case of the global fit where all datasets are considered, the single dataset giving rise to the tightest constraint is indicated. Results are shown for the original MSHT20 fit [32] in Ref. [5] and for the fit we consider in this work, with dileptonic \(t\bar{t}\) data removed. Results for \(m_t=172.5~\textrm{GeV}\) appear in Appendix A

6 Study of \(\alpha _s\) sensitivity

An advantage of using differential top-quark data is their ability to constrain both the top-quark mass and the strong coupling. Although extractions of the strong coupling using pair production cross section data alone have been performed [12, 86,87,88], only by supplementing this with differential information is it possible to extract also the top-quark mass [22, 24]. Our focus in this work so far has been on the top-quark mass, largely because the MSHT global PDF fit already contains several datasets able to place stringent bounds on \(\alpha _s(M_Z)\) [5] but which are less able to bound \(m_t\), as seen in Fig. 2 (right). In addition, the top-quark datasets also show somewhat greater sensitivity to \(m_t\) than \(\alpha _s(M_Z)\), see Fig. 2 (left). Nonetheless it is instructive to analyse their \(\alpha _s(M_Z)\) sensitivities and the extent to which they are able to provide bounds competitive with other datasets in the global fit.

First, we analyse which datasets provide the tightest bounds on \(\alpha _s(M_Z)\), both in the default setup of our analysis and also in the original MSHT20 \(\alpha _s(M_Z)\) extraction. We remind the reader that these are expected to differ slightly, as a result of the exclusion of the ATLAS and CMS dilepton data and other minor changes. In the MSHT20 analysis of Ref. [5] the top-quark mass dependence was not available for the differential top-quark datasets, and so while \(\alpha _s(M_Z)\) bounds were analysed, they were not used, given the potential for correlation between \(m_t\) and the extracted \(\alpha _s(M_Z)\) value. In this analysis, we now include the top-quark mass dependence for the single differential lepton+jets channel, which gives us the confidence to perform this study. We consider the \(\alpha _s(M_Z)\) sensitivity in the \(m_t=173.3~\textrm{GeV}\) slice (given the results of Sect. 3) which contains the best overall global fit in the two-dimensional \(\alpha _s(M_Z)-m_t\) plane and is closest to our best extracted mass value \(m_t=173.0~\textrm{GeV}\). We present our findings in Table 2. We have also verified that the results are similar at \(m_t=172.5~\textrm{GeV}\) – we provide these results for the interested reader in Appendix A.

In the MSHT20 analysis the best fit at NNLO in QCD was found to be \(\alpha _s(M_Z)=0.1174\), with the CMS 8 TeV W data [89] providing a bound 0.0012 lower and the BCDMS proton data providing a bound 0.0013 higher, as indicated in Table 2. It was observed that for the top-quark data, the \(t\bar{t}\) total cross section data [45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60] were able to provide reasonable lower and upper bounds, albeit not competitive with the best bounds across the global fit. The ATLAS 8 TeV multi-differential \(t\bar{t}\) lepton+jets data [30] did not provide competitive bounds. In contrast, the CMS 8 TeV single differential \(y_{t\bar{t}}\) data [31] provided an upper bound on \(\alpha _s(M_Z)\) almost as strong as that provided by the BCDMS proton data [90], which are known to favour lower \(\alpha _s(M_Z)\) values. We stress again, however, that due to the missing \(m_t\) dependence this was not considered in the final MSHT20 quoted values.

We now find that the best fit at \(m_t=173.3~\textrm{GeV}\) is at \(\alpha _s(M_Z)=0.1175\), in close agreement with Ref. [5]. The most constraining lower bound of 0.1165 is provided by the ATLAS 8 TeV double differential Z data [91]. This dataset also provided a lower bound in the original MSHT20 fit, although not the most constraining. The CMS 8 TeV single-differential \(y_{t\bar{t}}\) in the lepton+jets channel now provides the strongest upper bound on \(\alpha _s(M_Z)\) at 0.1189, essentially identical to the BCDMS proton data. The upper bound on \(\alpha _s(M_Z)\) provided by the CMS 8 TeV \(y_{t\bar{t}}\) data reflects its preference for slightly lower theory predictions, as is also seen in its preference for large \(m_t\) (i.e. it bounds \(m_t\) from below) in Sect. 4. The total cross section data is again able to provide significant lower and upper bounds, in this fit now closer to the overall \(\alpha _s(M_Z)\) bounds. Finally, the ATLAS \(t\bar{t}\) data provide almost identical bounds to the MSHT20 analysis, and are equally poor in their constraining power. The bounds placed on \(\alpha _s(M_Z)\) at NNLO by a selection of the most relevant datasets included in the MSHT20 global fit are shown in Fig. 10, and exhibit good consistency with previous analyses in Ref. [5]. In addition, the figure demonstrates the competitive bounds placed by the top-quark datasets, shown in blue. The global fit bounds on \(\alpha _s(M_Z)\) of \(-\)0.0010 and +0.0014 correspond to \(\Delta \chi ^2 = +10, +17\) respectively. The overall \(\alpha _s(M_Z)\) \(\chi ^2\) profile is given in Fig. 11 at \(m_t=173.3~\textrm{GeV}\).
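To give a feel for how the quoted shifts relate to the quoted \(\Delta \chi ^2\) values, the following sketch converts each bound into the shift that would correspond to \(\Delta \chi ^2=1\). It assumes a locally quadratic profile, \(\Delta \chi ^2=(\delta /\sigma )^2\), separately on each side of the minimum; this is an assumption made here for illustration, not a statement from the analysis, and the function name is hypothetical.

```python
import math

# Values quoted in the text for the m_t = 173.3 GeV slice.
best_fit = 0.1175
lower_bound, upper_bound = 0.1165, 0.1189
delta_chi2_lower, delta_chi2_upper = 10.0, 17.0


def one_sigma_equivalent(shift, delta_chi2):
    """Shift giving Delta chi^2 = 1, ASSUMING Delta chi^2 = (shift / sigma)^2."""
    return abs(shift) / math.sqrt(delta_chi2)


shift_down = lower_bound - best_fit  # about -0.0010
shift_up = upper_bound - best_fit    # about +0.0014

sigma_down = one_sigma_equivalent(shift_down, delta_chi2_lower)
sigma_up = one_sigma_equivalent(shift_up, delta_chi2_upper)
```

Under this quadratic assumption both sides give a \(\Delta \chi ^2=1\) shift of roughly 0.0003, so the asymmetry of the quoted bounds follows the different \(\Delta \chi ^2\) thresholds on the two sides rather than a strongly skewed profile.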

We note that the original MSHT20 analysis observed that the dileptonic \(t\bar{t}\) single differential data were able to provide an upper bound on \(\alpha _s(M_Z)\) similar to that given by the CMS 8 TeV \(y_{t\bar{t}}\). Since we do not have the theoretical predictions for these measurements for different top-quark masses, we do not consider this dataset in our analysis here (the corresponding entry is left vacant in Fig. 10) and leave its examination to a future study. In addition, given that the focus of this work is largely on the \(m_t\) bounds, we have not analysed here the extent to which different CMS distributions for the single differential lepton+jets data may bound \(\alpha _s(M_Z)\). We leave this to potential future work focused on \(\alpha _s(M_Z)\) bounds in global PDF fits.

Fig. 10

\(\alpha _s(M_Z)\) bounds placed by different datasets in the fits in this work at \(m_t=173.3~\textrm{GeV}\). This can be compared with fig. 6 (lower) from Ref. [5] or equivalently fig. 19 from Ref. [7]. The total \(t\bar{t}\) cross-section and single differential \(t\bar{t}\) lepton+jets datasets now include the \(m_t\) dependence and are shown in blue

7 Impact on the gluon PDF

In this section, we examine the effect of different top-quark masses on the gluon parton distribution function. Top-quark pair production data is important in the context of global PDF fits, due to its ability to constrain the high-x gluon PDF [92]. Indeed, a reduction in the high-x gluon uncertainty motivated the choices of kinematic distribution from the ATLAS and CMS lepton+jets data for NNPDF3.1 [34]. The small number of data points can, however, limit the usefulness of these datasets in comparison to jet data for this purpose [4, 37]. There is general agreement among collaborations that the rapidity distributions provide the strongest constraints [4, 32, 34, 35, 37], although the ATLAS rapidity distributions seem to be in some tension with CMS jet data [5].

Fig. 11

The total \(\chi ^2\) of the whole global fit data as a function of \(\alpha _s(M_Z)\), with \(m_t=173.3~ \textrm{GeV}\) taken for the top-quark datasets and at NNLO

In Fig. 12 we show the gluon PDF g(x) for various values of the top-quark mass. We show both the default MSHT20 choice using the CMS \(y_{t\bar{t}}\) distribution, as well as the alternative choice we investigate in Sect. 5 of using the \(p^T_t\) distribution. In both cases we normalise to the gluon PDF for \(m_t=173.3~\textrm{GeV}\). We find that the effect of changing \(m_t\) is limited to large x values and causes an increase of the PDF with increasing mass, as expected. Moreover, the effects are well within the PDF uncertainties, shown for the default value of \(m_t=173.3~\textrm{GeV}\). Comparing the two choices of CMS distribution, we find a slightly greater dependence on the mass in the \(y_{t\bar{t}}\) case than in the \(p^T_t\) case. We note that no data is available for \(x \gtrsim 0.5\), and so beyond this range the results have no clear physical interpretation.

In Fig. 13 we instead present the effects of changing the included CMS or ATLAS \(t\bar{t}\) lepton+jets distributions on the gluon PDF relative to the baseline case of the CMS \(p^T_t\) and all ATLAS distributions (with the standard MSHT decorrelations described previously). The different choices of CMS distribution (left) have notable effects on the gluon at \(x \gtrsim 0.1\), but are well within the large PDF uncertainties. Meanwhile the different choices of included ATLAS distributions (right) also lead to notable differences, reflecting both the changes of data included and the effects of decorrelations where multiple distributions are simultaneously included. This is consistent with Ref. [32] where it was observed that decorrelation in the ATLAS \(t\bar{t}\) data has a significant effect on the high-x gluon, albeit within the large PDF uncertainties in this region.

Fig. 12

Gluon PDF as a function of x for various values of the top-quark mass \(m_t\) and for \(\alpha _s(M_Z)=0.118\). The ratio to the case \(m_t=173.3~\textrm{GeV}\) is shown; PDF uncertainties are shown for this default \(m_t\) value. Left: CMS \(y_{t\bar{t}}\) distribution. Right: CMS \(p^T_t\) distribution

Fig. 13

Gluon PDF as a function of x for various choices of included kinematic distribution. Fixed values of \(m_t=173.3~\textrm{GeV}\) and \(\alpha _s(M_Z)=0.118\) are considered. Left: all ATLAS and total cross section data included, with different choices for the CMS distribution. Right: all total cross section data and CMS \(p^T_t\) distribution included, with various choices for the ATLAS distributions

8 Summary of findings for \(m_t\)

Our analysis in Sects. 4 and 5 has shown that the single-differential ATLAS data in the lepton+jets channel are able to place relatively strong constraints on the top-quark mass. In contrast, the CMS data in the same channel generally provide weaker constraints, particularly when the \(y_{t\bar{t}}\) distribution is chosen (as is the case in the MSHT20 default setup), though the \(p^T_t\) distribution provides the strongest constraint of the four choices available. With this in mind, we present our final results for \(m_t\) using all available ATLAS distributions, the total cross section data and the CMS \(p^T_t\) distribution (the bounds however are independent of the CMS distribution chosen, even after refitting). Maintaining a cubic parameterisation for the \(m_t\) dependence, the global fit returns

$$\begin{aligned} m_t^\textrm{pole}=173.0\pm 0.6~\textrm{GeV}\,. \end{aligned}$$
(6)

The bounds in this case arise solely from the ATLAS data – we note, however, that the lower bound obtained from the total cross section data is only \(0.5~\textrm{GeV}\) lower, while the upper bound from the CMS \(p^T_t\) distribution is only \(0.4~\textrm{GeV}\) higher. We stress that the dynamic tolerance approach we adopt returns a conservative estimate of the uncertainty compared to a \(\Delta \chi ^2=1\) criterion (which would result in \(m_t^\textrm{pole}=173.0\pm 0.3~\textrm{GeV}\)). We also emphasise that our choice of a cubic interpolation provided the most conservative estimate of the uncertainty, fully containing the range of values obtained when using other reasonable interpolations.
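The way an interval follows from a cubic parameterisation of a \(\chi ^2\) profile and a chosen \(\Delta \chi ^2\) threshold can be sketched as below. The profile is synthetic, constructed to have its minimum at 173.0 GeV, and is not output from the fit; the threshold is a plain input, so the dataset-dependent dynamic tolerance itself is not reproduced. The names `toy_chi2` and `profile_interval` are hypothetical.

```python
import numpy as np


def toy_chi2(m_t):
    """Synthetic profile (NOT real fit output): minimum at 173.0 GeV, mild cubic skew."""
    d = np.asarray(m_t, dtype=float) - 173.0
    return (d / 0.3) ** 2 + 0.2 * d ** 3


def profile_interval(m_values, chi2_values, tolerance, degree=3):
    """Fit a polynomial to chi^2(m_t) and return (minimum, lower, upper).

    A bound is None when the Delta chi^2 = tolerance crossing on that side
    lies outside the scanned range, i.e. the bound is one-sided.
    """
    m_values = np.asarray(m_values, dtype=float)
    centre = m_values.mean()  # centre the scan for numerical stability
    shifted = m_values - centre
    poly = np.poly1d(np.polyfit(shifted, chi2_values, degree))
    lo, hi = shifted.min(), shifted.max()

    def real_roots_in_range(p):
        return [r.real for r in p.roots if abs(r.imag) < 1e-9 and lo <= r.real <= hi]

    # Local minima: stationary points with positive curvature.
    minima = [c for c in real_roots_in_range(poly.deriv()) if poly.deriv(2)(c) > 0]
    if not minima:
        raise ValueError("no local minimum inside the scanned range")
    m_min = min(minima, key=poly)

    # Points where the fitted profile rises by `tolerance` above its minimum.
    crossings = real_roots_in_range(poly - (poly(m_min) + tolerance))
    below = [c for c in crossings if c < m_min]
    above = [c for c in crossings if c > m_min]
    lower = max(below) + centre if below else None
    upper = min(above) + centre if above else None
    return m_min + centre, lower, upper


scan = np.arange(169.0, 176.5, 0.5)  # hypothetical m_t grid in GeV
best, lower, upper = profile_interval(scan, toy_chi2(scan), tolerance=1.0)
```

By construction the toy profile gives an interval of about \(\pm 0.3~\textrm{GeV}\) for a threshold of 1. Raising `tolerance` widens the interval, and a scan that does not extend far enough on one side returns `None` there, mirroring the one-sided bounds seen for some distributions.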

Notably, the central value we obtain of \(173.0~\textrm{GeV}\) is the same as seen in Ref. [23] for the absolute data. We also remark that, considering the study on our \(\alpha _s(M_Z)\) sensitivity in Sect. 6, our best fit value of \(m_t\) lies between the values considered in Tables 2 (\(m_t=173.3~\textrm{GeV}\)) and 3 (\(m_t=172.5~\textrm{GeV}\)). Given that the limits on \(\alpha _s(M_Z)\) are very similar in those two cases, we might also expect the limits corresponding to our best fit value of \(m_t\) to be comparable (see footnote 5). The Particle Data Group quote a pole mass from cross section measurements of \(172.5\pm 0.7~\textrm{GeV}\) [93]. This is entirely compatible with our result, with a similar level of uncertainty. Furthermore, the authors of Ref. [25] find a central value of \(m_t=172.58~\textrm{GeV}\) in the context of their global fit, which uses NNLO K-factors on NLO predictions to approximate the full NNLO mass dependence of the theory predictions. This is also consistent with our analysis.
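The level of agreement with the Particle Data Group value can be quantified with a simple pull. The sketch below assumes Gaussian and uncorrelated uncertainties, which is unlikely to hold exactly since the determinations draw on overlapping measurements, so the number is indicative only; the function name `pull` is hypothetical.

```python
import math


def pull(value_a, sigma_a, value_b, sigma_b):
    """Difference in units of the combined uncertainty (ASSUMES no correlation)."""
    return (value_a - value_b) / math.hypot(sigma_a, sigma_b)


this_work = (173.0, 0.6)          # m_t^pole in GeV, Eq. (6)
pdg_cross_section = (172.5, 0.7)  # PDG pole mass from cross sections [93]

n_sigma = pull(*this_work, *pdg_cross_section)
```

Under these assumptions the two values differ by about half of the combined standard deviation.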

Table 3 Bounds on \(\alpha _s(M_Z)\), obtained via a one-dimensional fit with \(m_t=172.5~\textrm{GeV}\), for different datasets. In the case of the global fit where all datasets are considered, the single dataset giving rise to the tightest constraint is indicated. Results are shown for the original MSHT20 fit in Ref. [32] and for the fit we consider in this work, with dileptonic \(t\bar{t}\) data removed

9 Conclusions

In this work, we have examined the ability of differential measurements of the \(t\bar{t}\) process to constrain the top-quark mass \(m_t\). Specifically, we compared NNLO theory predictions for the top-quark transverse momentum \(p^T_t\), the pair invariant mass \(M_{t\bar{t}}\) and the single and pair rapidities \(y_{t},\,y_{t\bar{t}}\) with measurements taken by the ATLAS and CMS collaborations at a centre-of-mass energy of 8 TeV and in the lepton+jets channel. We performed our study in the context of the global MSHT20 PDF fit, thus fully accounting for any correlations between our parameters of interest and those of the parton distribution functions.

We find that the combined ATLAS data provide stronger constraints on the top-quark mass than those from CMS. We have explored different choices of distribution and treatments of the experimental correlation matrices, finding that certain options, e.g. the ATLAS \(M_{t\bar{t}}\) distribution, are much better than others, e.g. rapidity distributions. This confirms theoretical expectations about the mass dependence of kinematic variables. In addition to the differential information, total cross section data are able to provide some constraining ability.

We studied the sensitivity of the top-quark datasets to the strong coupling \(\alpha _s(M_Z)\) and performed fits for a fixed value of \(m_t=173.3~\textrm{GeV}\) using the same lepton+jets data as present in the original MSHT20 extraction. We found that the CMS \(y_{t\bar{t}}\) distribution in particular was able to provide an upper bound on \(\alpha _s(M_Z)\), competitive with the BCDMS dataset which proved most tightly constraining in the original fit. While the total cross section data was also able to provide reasonable limits on \(\alpha _s(M_Z)\), the ATLAS data was significantly less useful for this purpose.

Finally, we examined the effect of different top-quark masses on the gluon PDF. We observed that higher values of \(m_t\) result in an upwards shift of the high-x gluon, as expected, but remain well within PDF uncertainties. We also noted that for fixed values of the mass, the choice of distribution included in the fit had a significant impact, albeit within the large uncertainties.

In further studies, it would be interesting to examine the impact of including both the single differential ATLAS data taken in the dilepton channel [43] and the double-differential CMS data [44], since both of these datasets are included in the MSHT20 fit. Our ability to do so relies on the availability of NNLO computations for these distributions at different values of the top-quark mass. Similarly, inclusion of the 13 TeV datasets would be a natural extension. The difference in constraining power of absolute and normalised data could also be investigated, as could fits using theory calculations based on resummed predictions, see e.g. Refs. [94,95,96,97]. Additionally, it may be interesting to investigate the effect of different scale choices on the fit, again dependent on the availability of theoretical predictions. Finally, an examination of the impact of the different choices of distribution on the gluon uncertainty could be undertaken. We leave these topics to potential future work.