1 Introduction

Collimated streams of particles, conventionally called jets, are abundantly produced in highly energetic proton–proton collisions at the LHC. At high transverse momenta \(p_{\mathrm {T}}\) these collisions are described by quantum chromodynamics (QCD) using perturbative techniques (pQCD). Indispensable ingredients for QCD predictions of cross sections in \(\mathrm {p}\mathrm {p}\) collisions are the proton structure, expressed in terms of parton distribution functions (PDFs), and the strong coupling constant \(\alpha _S\), which is a fundamental parameter of QCD. The PDFs and \(\alpha _S\) both depend on the relevant energy scale Q of the scattering process, which is identified with the jet \(p_{\mathrm {T}}\) for the reactions considered in this report. In addition, the PDFs, defined for each type of parton, depend on the fractional momentum x of the proton carried by the parton.

The large cross section for jet production at the LHC and the unprecedented experimental precision of the jet measurements allow stringent tests of QCD. In this study, the theory is confronted with data in previously inaccessible phase space regions of Q and x. When jet production cross sections are combined with inclusive data from deep-inelastic scattering (DIS), the gluon PDF for \(x \gtrsim 0.01\) can be constrained and \(\alpha _S(M_{\mathrm{Z}})\) can be determined. In the present analysis, this is demonstrated by means of the CMS measurement of inclusive jet production [1]. The data, collected in 2011 and corresponding to an integrated luminosity of 5.0\(\,\text {fb}^{-1}\), extend the accessible phase space in jet \(p_{\mathrm {T}}\) up to 2\(\,\text {TeV}\), and range up to \(|y | = 2.5\) in absolute jet rapidity. A PDF study using inclusive jet measurements by the ATLAS Collaboration is described in Ref. [2].

This paper is divided into six parts. Sect. 2 presents an overview of the CMS detector and of the measurement, published in Ref. [1], and proposes a modified treatment of correlations in the experimental uncertainties. Theoretical ingredients are introduced in Sect. 3. Section 4 is dedicated to the determination of \(\alpha _S\) at the scale of the \({\mathrm{Z}}\)-boson mass \(M_{\mathrm{Z}}\), and in Sect. 5 the influence of the jet data on the PDFs is discussed. A summary is presented in Sect. 6.

2 The inclusive jet cross section

2.1 Overview of the CMS detector and of the measurement

The central feature of the CMS detector is a superconducting solenoid of 6\(\,\text {m}\) internal diameter, providing a magnetic field of 3.8\(\,\text {T}\). Within the superconducting solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass/scintillator hadron calorimeter, each composed of a barrel and two endcap sections. Muons are measured in gas-ionisation detectors embedded in the steel flux-return yoke outside the solenoid. Extensive forward calorimetry (HF) complements the coverage provided by the barrel and endcap detectors. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [3].

Jets are reconstructed with a size parameter of \(R=0.7\) using the collinear- and infrared-safe anti-\(k_{\mathrm {T}}\) clustering algorithm [4] as implemented in the FastJet package [5]. The published measurements of the cross sections were corrected for detector effects, and include statistical and systematic experimental uncertainties as well as bin-to-bin correlations for each type of uncertainty. A complete description of the measurement can be found in Ref. [1].

The double-differential inclusive jet cross section investigated in the following is derived from observed inclusive jet yields via

$$\begin{aligned} \frac{{\mathrm{d}}^2\sigma }{{\mathrm{d}}p_{\mathrm {T}} \,{\mathrm{d}}y} = \frac{1}{\epsilon \cdot \mathcal {L}_{\text {int}}} \frac{N_\text {jets}}{\Delta p_{\mathrm {T}} \,\left( 2\cdot \Delta |y | \right) }, \end{aligned}$$
(1)

where \(N_\text {jets}\) is the number of jets in the specific kinematic range (bin), \(\mathcal {L}_{\text {int}}\) is the integrated luminosity, \(\epsilon \) is the product of trigger and event selection efficiencies, and \(\Delta p_{\mathrm {T}} \) and \(\Delta |y | \) are the bin widths in \(p_{\mathrm {T}}\) and \(|y |\). The factor of two reflects the folding of the distributions around \(y=0\).
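As an illustration of Eq. (1), the following minimal Python sketch converts a jet yield in one \((p_{\mathrm {T}}, |y |)\) bin into a double-differential cross section. The function name and all input numbers are hypothetical and serve only to show the arithmetic, including the factor of two from folding around \(y=0\).

```python
def double_differential_xsec(n_jets, lumi_int, efficiency,
                             pt_low, pt_high, abs_y_low, abs_y_high):
    """Eq. (1): d^2sigma/(dpT dy) for one bin in pT and |y|.

    lumi_int in pb^-1 and pT in GeV give a result in pb/GeV.
    The factor 2 * delta|y| accounts for folding the rapidity
    distribution around y = 0 (each |y| bin covers +y and -y).
    """
    delta_pt = pt_high - pt_low
    delta_abs_y = abs_y_high - abs_y_low
    return n_jets / (efficiency * lumi_int * delta_pt * 2.0 * delta_abs_y)


# Hypothetical yield: 50000 jets with 500 <= pT < 550 GeV and |y| < 0.5,
# 5.0 fb^-1 = 5000 pb^-1, and a 99 % combined trigger and selection efficiency.
xsec = double_differential_xsec(50000, 5000.0, 0.99, 500.0, 550.0, 0.0, 0.5)
print(f"d2sigma/dpT dy = {xsec:.4f} pb/GeV")
```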

2.2 Experimental uncertainties

The inclusive jet cross section is measured in five equally sized bins of \(\Delta |y | = 0.5\) up to an absolute rapidity of \(|y | = 2.5\). The inner three regions roughly correspond to the barrel part of the detector, the outer two to the endcaps. Tracker coverage extends up to \(|\eta | = 2.5\). The minimum \(p_{\mathrm {T}}\) imposed on any jet is 114\(\,\text {GeV}\). The binning in jet \(p_{\mathrm {T}}\) follows the jet \(p_{\mathrm {T}}\) resolution of the central detector and changes with \(p_{\mathrm {T}}\). The upper reach in \(p_{\mathrm {T}}\) is given by the available data and decreases with \(|y |\).

Four categories [1] of experimental uncertainties are defined: the jet energy scale (JES), the luminosity, the corrections for detector response and resolution, and all remaining uncorrelated effects.

The JES is the dominant source of systematic uncertainty, because a small shift in the measured \(p_{\mathrm {T}}\) translates into a large uncertainty in the steeply falling jet \(p_{\mathrm {T}}\) spectrum and hence in the cross section for any given value of \(p_{\mathrm {T}}\). The JES uncertainty is parameterized in terms of jet \(p_{\mathrm {T}}\) and pseudorapidity \(\eta = -\ln \tan (\theta /2)\) and amounts to 1–2 % [6], which translates into a 5–25 % uncertainty in the cross section. Because of its particular importance for this analysis, more details are given in Sect. 2.3.
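Why a 1–2 % energy-scale shift becomes a much larger cross section shift can be seen from a pure power law \({\mathrm{d}}\sigma /{\mathrm{d}}p_{\mathrm {T}} \propto p_{\mathrm {T}}^{-n}\): scaling every jet \(p_{\mathrm {T}}\) by \((1+\delta )\) multiplies the spectrum at fixed \(p_{\mathrm {T}}\) by \((1+\delta )^{n-1}\), where one power is removed by the Jacobian. The Python sketch below evaluates this; the local slopes n are illustrative assumptions, not values extracted from the measurement, and the real spectrum is not a pure power law.

```python
def jes_induced_shift(n, delta):
    """Relative change of a spectrum dsigma/dpT ~ pT**(-n), evaluated at fixed pT,
    when all jet transverse momenta are scaled by (1 + delta).

    dN/dpT_meas = dN/dpT_true * dpT_true/dpT_meas = A * (pT/(1+delta))**(-n) / (1+delta),
    so the spectrum is multiplied by (1 + delta)**(n - 1).
    """
    return (1.0 + delta) ** (n - 1) - 1.0


# Assumed local slopes n and JES shifts within the quoted 1-2 % range.
for n, delta in [(6, 0.01), (12, 0.02)]:
    print(f"n = {n:2d}, JES shift {delta:.0%}: "
          f"cross section changes by {jes_induced_shift(n, delta):.1%}")
```

With these assumed slopes the shifts come out near 5 % and 24 %, i.e. of the same order as the range quoted above; in the data the effective slope changes with \(p_{\mathrm {T}}\) and \(|y |\).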

The uncertainty in the integrated luminosity is 2.2 % [7] and translates into a normalisation uncertainty that is fully correlated across \(|y |\) and \(p_{\mathrm {T}}\).

The effect of the jet energy resolution (JER) is corrected for using the D’Agostini method [8] as implemented in the RooUnfold package [9]. The uncertainty due to the unfolding comprises the effects of an imprecise knowledge of the JER, of residual differences between data and the Monte Carlo (MC) modelling of detector response, and of the unfolding technique applied. The total unfolding uncertainty, which is fully correlated across \(\eta \) and \(p_{\mathrm {T}}\), is 3–4 %. Additionally, the statistical uncertainties are propagated through the unfolding procedure, thereby providing the correlations between the statistical uncertainties of the unfolded measurement. A statistical covariance matrix must be used to take this into account.

Remaining effects are collected into an uncorrelated uncertainty of \(\approx \)1 %.

2.3 Uncertainties in JES

The procedure to calibrate jet energies in CMS and ways to estimate JES uncertainties are described in Ref. [10]. To use CMS data in fits of PDFs or \(\alpha _S(M_{\mathrm{Z}})\), it is essential to account for the correlations in these uncertainties among different regions of the detector. The treatment of correlations uses 16 mutually uncorrelated sources as in Ref. [1]. Within each source, the uncertainties are fully correlated in \(p_{\mathrm {T}}\) and \(\eta \). Any change in the jet energy calibration (JEC) is described through a linear combination of sources, where each source is assumed to have a Gaussian probability density with a zero mean and a root-mean-square of unity. In this way, the uncertainty correlations are encoded in a fashion similar to that provided for PDF uncertainties using the Hessian method [11]. The total uncertainty is defined through the quadratic sum of all uncertainties. The full list of sources together with their brief descriptions can be found in Appendix A.
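The source model described above can be illustrated with a short Python sketch: each source k contributes a fully correlated shift \(\beta _k s_k\), with \(\beta _k\) drawn from a unit Gaussian, so the induced covariance is \(\sum _k s_k s_k^{\mathrm {T}}\) and the total uncertainty in each bin is the quadratic sum over sources. The three sources and their sizes in four bins are hypothetical numbers, not the CMS JES sources; the toy sampling only cross-checks the analytic covariance.

```python
import math
import random

# Hypothetical relative uncertainties of three JES-like sources in four bins.
# Each row is one source; within a source the shifts are fully correlated.
SOURCES = [
    [0.02, 0.03, 0.05, 0.08],
    [0.01, 0.01, 0.02, 0.02],
    [0.00, 0.01, 0.01, 0.04],
]


def analytic_covariance(sources):
    """Cov_ij = sum_k s_k[i] * s_k[j] for mutually uncorrelated unit-Gaussian sources."""
    n = len(sources[0])
    return [[sum(s[i] * s[j] for s in sources) for j in range(n)] for i in range(n)]


def total_uncertainty(sources):
    """Quadratic sum over all sources, bin by bin."""
    return [math.sqrt(sum(s[i] ** 2 for s in sources)) for i in range(len(sources[0]))]


def toy_covariance(sources, n_toys=20000, seed=1):
    """Sample beta_k ~ N(0, 1) per source and estimate the covariance of the shifts."""
    rng = random.Random(seed)
    n = len(sources[0])
    toys = []
    for _ in range(n_toys):
        betas = [rng.gauss(0.0, 1.0) for _ in sources]
        toys.append([sum(b * s[i] for b, s in zip(betas, sources)) for i in range(n)])
    means = [sum(t[i] for t in toys) / n_toys for i in range(n)]
    return [[sum((t[i] - means[i]) * (t[j] - means[j]) for t in toys) / (n_toys - 1)
             for j in range(n)] for i in range(n)]


exact = analytic_covariance(SOURCES)
toys = toy_covariance(SOURCES)
print("total uncertainty per bin:", [round(u, 4) for u in total_uncertainty(SOURCES)])
print("max |toy - exact| covariance difference:",
      max(abs(toys[i][j] - exact[i][j]) for i in range(4) for j in range(4)))
```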

The JES uncertainties can be classified into four broad categories: absolute energy scale as a function of \(p_{\mathrm {T}}\), jet flavour dependent differences, relative calibration of JES as a function of \(\eta \), and the effects of multiple proton interactions in the same or adjacent beam crossings (pileup). The absolute scale is a single fixed number such that the corresponding uncertainty is fully correlated across \(p_{\mathrm {T}}\) and \(\eta \). Using photon \(+\) jet and Z \(+\) jet data, the JES can be constrained directly in the jet \(p_{\mathrm {T}}\) range 30–600\(\,\text {GeV}\). The response at larger and smaller \(p_{\mathrm {T}}\) is extrapolated through MC simulation. Extra uncertainties are assigned to this extrapolation based on the differences between MC event generators and the single-particle response of the detector. The absolute calibration is the most relevant uncertainty in jet analyses at large \(p_{\mathrm {T}}\).

The categories involving jet flavour dependence and pileup effects are important mainly at small \(p_{\mathrm {T}}\) and have relatively little impact for the phase space considered in this report.

The third category parameterizes \(\eta \)-dependent changes in relative JES. The measurement uncertainties within different detector regions are strongly correlated, and thus the \(\eta \)-dependent sources are only provided for wide regions: barrel, endcap with upstream tracking, endcap without upstream tracking, and the HF calorimeter. In principle, the \(\eta \)-dependent effects can also have a \(p_{\mathrm {T}}\) dependence. Based on systematic studies of data and simulated events, which indicate that the \(p_{\mathrm {T}}\) and \(\eta \) dependences of the uncertainties factorise to a good approximation, this dependence is omitted from the initial calibration procedure. However, experience with the calibration of data collected in 2012 and with the fits of \(\alpha _S(M_{\mathrm{Z}})\) reported in Sect. 4 shows that this is too strong an assumption. Applying the uncertainties and correlations in a fit of \(\alpha _S(M_{\mathrm{Z}})\) to the inclusive jet data separately for each bin in \(|y |\) leads to values of \(\alpha _S(M_{\mathrm{Z}})\) that scatter around a central value. Performing the same fit with all \(|y |\) bins together and assuming 100 % correlation in \(|y |\) within the JES uncertainty sources results in poor fit quality (high \(\chi ^2\) per number of degrees of freedom \(n_\mathrm {dof}\)) and a value of \(\alpha _S(M_{\mathrm{Z}})\) that is significantly higher than any value observed for an individual bin in \(|y |\). Changing the correlation in the JES uncertainty from 0 to 100 % produces a steep rise in \(\chi ^2/n_\mathrm {dof}\) and influences the fitted value of \(\alpha _S(M_{\mathrm{Z}})\) for correlations near 90 %, indicating that the assumed correlation in \(|y |\) is too strong. The technique of nuisance parameters, as described in Sect. 5.2.2, helped in the analysis of this issue.

To implement the additional \(\eta \) decorrelation induced by the \(p_{\mathrm {T}}\) dependence of the \(\eta \)-dependent JEC introduced for the calibration of 2012 data, the single-particle response source JEC2, which accounts for extrapolation uncertainties at large \(p_{\mathrm {T}}\) as discussed in Appendix A, is decorrelated versus \(\eta \) as follows:

  1. in the barrel region (\(|y | < 1.5\)), the correlation of the single-particle response source among the three bins in \(|y |\) is set to 50 %,

  2. in the endcap region (\(1.5 \le |y | < 2.5\)), the correlation of the single-particle response source between the two bins in \(|y |\) is kept at 100 %,

  3. there is no correlation of the single-particle response source between the two detector regions of \(|y | < 1.5\) and \(1.5 \le |y | < 2.5\).

The additional freedom of \(p_{\mathrm {T}}\)-dependent corrections versus \(\eta \) hence reduces the previously assumed full correlation between all \(\eta \) regions to an estimated 50 % correlation of JEC2 within the barrel region, which always contains the tag jet of the dijet balance method [10]. In addition, the JEC2 corrections are estimated to be uncorrelated between the barrel and endcap regions of the detector, because these corrections have separate \(p_{\mathrm {T}}\) dependences in the two regions.

Table 1 The PDF sets used in comparisons to the data together with the evolution order (Evol.), the corresponding number of active flavours \(N_f\), the assumed masses \(M_{\mathrm{t}}\) and \(M_{\mathrm{Z}}\) of the top quark and the \({\mathrm{Z}}\) boson, respectively, the default values of \(\alpha _S(M_{\mathrm{Z}})\), and the range in \(\alpha _S(M_{\mathrm{Z}})\) variation available for fits. For CT10, the updated 2012 versions are used.

Technically, this can be achieved by splitting the single-particle response source into five parts (JEC2a–e), as shown in Table 8. Each of these sources is a duplicate of the original single-particle response source that is set to zero outside the respective ranges of \(|y | < 1.5\), \(1.5 \le |y | < 2.5\), \(|y | < 0.5\), \(0.5 \le |y | < 1.0\), and \(1.0 \le |y | < 1.5\), such that the original full correlation of

$$\begin{aligned} \mathrm {corr}_\mathrm {JEC2,old} = \begin{pmatrix} 1 &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1\\ 1 &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1\\ 1 &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1\\ 1 &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1\\ 1 &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1\\ \end{pmatrix} \end{aligned}$$
(2)

is replaced by the partially uncorrelated version of

$$\begin{aligned} \mathrm {corr}_\mathrm {JEC2,new} = \begin{pmatrix} 1 &{}\quad 0.5 &{}\quad 0.5 &{}\quad 0 &{}\quad 0\\ 0.5 &{}\quad 1 &{}\quad 0.5 &{}\quad 0 &{}\quad 0\\ 0.5 &{}\quad 0.5 &{}\quad 1 &{}\quad 0 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 1\\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 1\\ \end{pmatrix}, \end{aligned}$$
(3)

which, as justified by studies based on 2012 data, is more accurate. For the proper normalisation of the five new correlated sources, normalisation factors of \(1/\sqrt{2}\) (JEC2a, JEC2c–JEC2e) and 1 (JEC2b) must be applied. With these factors, the quadratic sum of the five sources reproduces the original uncertainty for each \(|y |\) bin, while the additional freedom gives the estimated level of correlation among the \(|y |\) regions.
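The splitting can be verified numerically. The Python sketch below builds JEC2a–e from a single fully correlated source with the normalisation factors given above, then checks that the quadratic sum per \(|y |\) bin reproduces the original source and that the resulting correlation matrix equals Eq. (3). The per-bin source sizes are hypothetical values for a single \(p_{\mathrm {T}}\) bin; the helper names are illustrative.

```python
import math

# Five |y| bins: [0,0.5), [0.5,1.0), [1.0,1.5) (barrel) and [1.5,2.0), [2.0,2.5) (endcap).
N_BINS = 5
BARREL = (0, 1, 2)
ENDCAP = (3, 4)


def split_jec2(sigma):
    """Split a fully correlated source sigma[k] (one value per |y| bin) into JEC2a-e.

    JEC2a covers the whole barrel and JEC2c-e one barrel bin each, all scaled by
    1/sqrt(2); JEC2b covers the endcap with a factor of 1.
    """
    r = 1.0 / math.sqrt(2.0)

    def restricted(bins, factor):
        # Duplicate of the original source, set to zero outside the given bins.
        return [factor * sigma[k] if k in bins else 0.0 for k in range(N_BINS)]

    return [restricted(BARREL, r),    # JEC2a
            restricted(ENDCAP, 1.0),  # JEC2b
            restricted((0,), r),      # JEC2c
            restricted((1,), r),      # JEC2d
            restricted((2,), r)]      # JEC2e


def correlation(sources):
    """Correlation matrix induced by mutually uncorrelated, fully correlated sources."""
    cov = [[sum(s[i] * s[j] for s in sources) for j in range(N_BINS)] for i in range(N_BINS)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(N_BINS)]
            for i in range(N_BINS)]


sigma = [0.010, 0.012, 0.015, 0.020, 0.030]  # hypothetical JEC2 sizes in one pT bin
parts = split_jec2(sigma)
quad_sum = [math.sqrt(sum(p[k] ** 2 for p in parts)) for k in range(N_BINS)]
for row in correlation(parts):
    print(" ".join(f"{c:4.2f}" for c in row))
```

The printed matrix does not depend on the chosen sizes, because the barrel correlation arises only through the shared JEC2a term, whose weight squared is one half.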

All results presented in this paper are based on this improved treatment of the correlation of JES uncertainties. While some decorrelation of these uncertainties versus \(\eta \) is important for the fits of \(\alpha _S(M_{\mathrm{Z}})\) described in Sect. 4, the exact size of the estimated decorrelation is not. Varying the assumptions according to Eq. (3) from 50 % to 20 or 80 % in the barrel region, from 100 to 80 % in the endcap region, or from 0 to 20 % between the barrel and endcap region leads to changes in the fitted value of \(\alpha _S(M_{\mathrm{Z}})\) that are negligible with respect to other experimental uncertainties.

3 Theoretical ingredients

The theoretical predictions for the inclusive jet cross section comprise a next-to-leading order (NLO) pQCD calculation with electroweak corrections (EW) [12, 13]. They are complemented by a nonperturbative (NP) factor that corrects for multiple-parton interactions (MPI) and hadronization (HAD) effects. Parton shower (PS) corrections, derived from NLO predictions with matched parton showers, are tested in an additional study in Sect. 4.3, but are not applied to the main result.

3.1 Fixed-order prediction in perturbative QCD

The same NLO prediction as in Ref. [1] is used, i.e. the calculations are based on the parton-level program NLOJet++ version 4.1.3 [14, 15] and are performed within the fastNLO framework version 2.1 [16]. The renormalization and factorisation scales, \(\mu _r\) and \(\mu _f\) respectively, are identified with the individual jet \(p_{\mathrm {T}}\). The number of active (massless) flavours \(N_f\) in NLOJet++ has been set to five.

Five sets of PDFs are available for a series of values of \(\alpha _S(M_{\mathrm{Z}})\), which is a requisite for a determination of \(\alpha _S(M_{\mathrm{Z}})\) from data. For an overview, these PDF sets are listed in Table 1 together with the respective references. The ABM11 PDF set employs a fixed-flavour number scheme with five active flavours, while the other PDF sets use a variable-flavour number scheme with a maximum of five flavours, \(N_{f,\mathrm {max}} = 5\), except for NNPDF2.1 which has \(N_{f,\mathrm {max}} = 6\). All sets exist at next-to-leading and next-to-next-to-leading evolution order. The PDF uncertainties are provided at 68.3 % confidence level (CL) except for CT10, which provides uncertainties at \(90\,\%\) CL. For a uniform treatment of all PDFs, the CT10 uncertainties are downscaled by a factor of \(\sqrt{2}{{\mathrm{erf}}}^{-1}{(0.9)} \approx 1.645\).
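The rescaling factor quoted above follows from Gaussian statistics: a central interval with coverage p spans \(\pm \sqrt{2}{{\mathrm{erf}}}^{-1}(p)\) standard deviations. The Python sketch below reproduces the number with the standard library, assuming the PDF uncertainties behave as Gaussian (only then is the rescaling exact); the function names are illustrative.

```python
import math
from statistics import NormalDist


def central_interval_z(cl):
    """Half-width, in Gaussian standard deviations, of a central interval with coverage cl.

    Equivalent to sqrt(2) * erfinv(cl), expressed through the normal quantile function.
    """
    return NormalDist().inv_cdf(0.5 * (1.0 + cl))


Z90 = central_interval_z(0.90)   # scale factor applied to the CT10 uncertainties
Z68 = central_interval_z(0.683)  # close to one standard deviation


def ct10_to_68cl(unc_90cl):
    """Rescale a 90 % CL uncertainty to 68.3 % CL, assuming Gaussian behaviour."""
    return unc_90cl / Z90


print(f"z(90%) = {Z90:.4f}, z(68.3%) = {Z68:.4f}")
print(f"erf(z90/sqrt(2)) = {math.erf(Z90 / math.sqrt(2.0)):.4f}")  # recovers the coverage 0.9
```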

The electroweak corrections to the hard-scattering cross section have been computed with the CT10-NLO PDF set for a fixed number of five flavours and with the \(p_{\mathrm {T}}\) of the leading jet, \(p_{\mathrm {T,max}}\), as scale choice for \(\mu _r\) and \(\mu _f\) instead of the \(p_{\mathrm {T}}\) of each jet. At high jet \(p_{\mathrm {T}}\) and central rapidity, where the electroweak effects become sizeable, NLO calculations with either of the two scale settings differ by less than one percent. Given the small impact of the electroweak corrections on the final results in Sects. 4 and 5, no uncertainty on their size has been assigned.

3.2 Theoretical prediction from MC simulations including parton showers and nonperturbative effects

The most precise theoretical predictions for jet measurements are usually achieved in fixed-order pQCD, but are available at parton level only. Data that have been corrected for detector effects, however, refer to measurable particles, i.e. to colour-neutral particles with mean decay lengths such that \(c\tau >10\,\text {mm} \). Two complications arise when comparing fixed-order perturbation theory to these measurements: emissions of additional partons close in phase space, which are not sufficiently accounted for in low-order approximations, and effects that cannot be treated by perturbative methods. The first problem is addressed by the parton shower concept [23–25] within pQCD, where multiple parton radiation close in phase space is taken into account through an all-orders approximation of the dominant terms including coherence effects. In MC event generators such as pythia [26] and herwig++ [27], these parton showers are combined with leading-order (LO) calculations in a way that avoids double counting.

The second issue concerns NP corrections, which comprise supplementary parton-parton scatters within the same colliding protons, i.e. MPI, and the hadronization process including particle decays. The MPI [28, 29] model for additional soft-particle production, which is detected as part of the underlying event, is implemented in pythia as well as herwig++. Hadronization describes the transition phase from coloured partons to colour-neutral particles, where perturbative methods are no longer applicable. Two models for hadronization are in common use: the Lund string fragmentation [30–32] that is used in pythia, and the cluster fragmentation [33] that has been adopted by herwig++.

Beyond LO, combining fixed-order predictions with parton showers, MPI, and hadronization models is much more complicated, because potential double counting of terms in the perturbative expansion and the PS has to be avoided. In recent years, programs have become available for dijet production at NLO that can be matched to PS MC event generators. In the following, one such program, the powheg package [34, 35], applied to dijet events [36], is used for comparisons to the LO MC event generators.

3.3 NP corrections from pythia6 and herwig++

For the comparison of theoretical predictions to the measurement reported in Ref. [1], the NP correction was derived as usual [37] from the average prediction of two LO MC event generators and more specifically from pythia version 6.4.22 tune Z2 and herwig++ version 2.4.2 with the default tune of version 2.3. Tune Z2 is identical to tune Z1 described in [38] except that Z2 employs the CTEQ6L1 [39] PDF set, while Z1 uses the CTEQ5L [40] PDF set. The NP correction factor can be defined for each bin in \(p_{\mathrm {T}}\) and \(|y |\) as

$$\begin{aligned} C _\mathrm {LO}^{\text {NP}} = \frac{\sigma _{\mathrm {LO+PS+HAD+MPI}}}{\sigma _{\mathrm {LO+PS}}}\, \end{aligned}$$
(4)

where \(\sigma \) represents the inclusive jet cross section and the subscripts “LO+PS+HAD+MPI” and “LO+PS” indicate which steps of a general MC event generation procedure have been run, see also Refs. [37, 41]. The central value is calculated by taking the average of the two predictions from pythia6 and herwig++.

In applying these factors as corrections for NP effects to NLO theory predictions, it is assumed that the NP corrections are universal, i.e. they are similar for LO and NLO.

3.4 NP and PS corrections from powheg \(+\) pythia6

Alternative corrections are derived, which use the powheg box revision 197 with the CT10-NLO PDF set for the hard subprocess at NLO plus the leading emission [42] complemented with the matched showering, MPI, and hadronization from pythia6 version 6.4.26. The NLO event generation within the powheg framework, and the showering and hadronization process performed by pythia6 are done in independent steps.

For illustration, Fig. 1 shows the comparison of the inclusive jet data with the powheg \(+\) pythia6 tune Z2* particle-level prediction complemented with electroweak corrections. The tune Z2* is derived from the earlier tune Z2, where the pythia6 parameters PARP(82) and PARP(90) that control the energy dependence of the MPI are retuned, yielding 1.921 and 0.227, respectively. The error boxes indicate statistical uncertainties. Ratio plots of this comparison for each separate region in \(|y |\) can be found in Appendix B.

The corrections to NLO parton-level calculations that are derived this way consist of truly nonperturbative contributions, which are optionally complemented with parton shower effects. They are investigated separately in the following two sections. A previous investigation can be found in Ref. [43].

Fig. 1

Measured inclusive jet cross section from Ref. [1] compared to the prediction by powheg \(+\) pythia6 tune Z2* at particle level complemented with electroweak corrections. The boxes indicate the statistical uncertainty of the calculation

Fig. 2

NP corrections for the five regions in \(|y |\) as derived in Ref. [1], using pythia6 tune Z2 and herwig++ with the default tune of version 2.3, in comparison to corrections obtained from powheg using pythia6 for showering with the two underlying event tunes P11 and Z2*

Fig. 3

PS corrections for the five regions in \(|y |\) obtained from powheg using pythia6 for showering for different upper scale limits of the parton shower evolution in pythia6 tune Z2*. The curves parameterize the correction factors as a function of the jet \(p_{\mathrm {T}}\)

3.4.1 NP corrections from powheg \(+\) pythia6

The NP corrections using an NLO prediction with a matched PS event generator can be defined analogously to Eq. (4):

$$\begin{aligned} C_\mathrm {NLO}^{\text {NP}} = \frac{\sigma _{\text {NLO+PS+HAD+MPI}}}{\sigma _{\text {NLO+PS}}}, \end{aligned}$$
(5)

i.e. the numerator of this NP correction is defined by the inclusive cross section where parton showers, hadronization, and multiparton interactions are turned on, while the inclusive cross section in the denominator does not include hadronization and multiparton interactions. An NLO calculation can then be corrected for NP effects as

$$\begin{aligned} \frac{{\mathrm{d}}^2 \sigma _\mathrm {theo}}{{\mathrm{d}}p_{\mathrm {T}} \, {\mathrm{d}}{}y} = \frac{{\mathrm{d}}^2 \sigma _\mathrm {NLO}}{{\mathrm{d}}p_{\mathrm {T}} \, {\mathrm{d}}{}y} \cdot C_\mathrm {NLO}^\mathrm {NP}. \end{aligned}$$
(6)

In contrast to the LO MC event generation with pythia6, however, the parameters of the NP and PS models have not been retuned to data for use with NLO \(+\) PS predictions by powheg. Therefore, two different underlying event tunes of pythia6 for LO \(+\) PS predictions, P11 [44] and Z2*, are used. In both cases a parameterization using a functional form of \(a_0 + a_1 / p_{\mathrm {T}} ^{a_2}\) is employed to smooth statistical fluctuations. For \(p_{\mathrm {T}} > 100\,\text {GeV} \) the difference in the NP correction factor between the two tunes is very small, so their average is taken as \(C_\mathrm {NLO}^{\text {NP}}\).
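The smoothing parameterization \(a_0 + a_1 / p_{\mathrm {T}} ^{a_2}\) can be fitted in several ways. The Python sketch below uses one simple approach, assumed here and not taken from the analysis: for each trial exponent \(a_2\) on a grid the model is linear in \(a_0\) and \(a_1\), which are obtained by weighted least squares in closed form, and the \(a_2\) with the lowest \(\chi ^2\) is kept. The "true" parameters, noise level, and \(p_{\mathrm {T}}\) points are hypothetical.

```python
import random


def fit_offset_power_law(pt, y, sigma, a2_grid):
    """Weighted least-squares fit of y = a0 + a1 / pt**a2.

    For fixed a2 the model is linear in (a0, a1), which are solved in closed
    form; a2 is scanned over a2_grid. Returns (chi2, a0, a1, a2) of the best point.
    """
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    best = None
    for a2 in a2_grid:
        x = [p ** -a2 for p in pt]
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        a1 = sxy / sxx
        a0 = ym - a1 * xm
        chi2 = sum(wi * (yi - a0 - a1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
        if best is None or chi2 < best[0]:
            best = (chi2, a0, a1, a2)
    return best


def model(pt, a0, a1, a2):
    return a0 + a1 / pt ** a2


# Hypothetical NP factors: true curve 1 + 5 / pT with 0.2 % statistical scatter.
pts = [114.0 * (2000.0 / 114.0) ** (k / 19) for k in range(20)]
rng = random.Random(7)
noisy = [model(p, 1.0, 5.0, 1.0) + rng.gauss(0.0, 0.002) for p in pts]
grid = [0.05 + 0.01 * k for k in range(296)]  # a2 from 0.05 to 3.00
chi2, a0, a1, a2 = fit_offset_power_law(pts, noisy, [0.002] * len(pts), grid)
print(f"chi2 = {chi2:.1f}, a0 = {a0:.4f}, a1 = {a1:.2f}, a2 = {a2:.2f}")
```

The three parameters are strongly correlated, so the smoothed curve within the fitted \(p_{\mathrm {T}}\) range is better determined than any individual parameter.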

Since procedures to estimate uncertainties inherent to the NLO \(+\) PS matching procedure are not yet well established and proper tunes to data for powheg \(+\) pythia6 are lacking, the centre of the envelope given by the three curves from pythia6, herwig++, and the powheg \(+\) pythia6 average of tunes Z2* and P11 is adopted as the final NP correction for the central results in Sects. 4 and 5. Half the spread among these three predictions defines the uncertainty.
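The combination prescription just described can be written per \((p_{\mathrm {T}}, |y |)\) bin as in the Python sketch below: the two powheg \(+\) pythia6 tunes are averaged first, and the centre and half-width of the envelope of the three resulting predictions give the central NP factor and its uncertainty. The input factors are hypothetical values for a single bin.

```python
def envelope(predictions):
    """Centre of the envelope and half its spread for a list of predictions."""
    lo, hi = min(predictions), max(predictions)
    return 0.5 * (lo + hi), 0.5 * (hi - lo)


def np_correction_bin(c_pythia6, c_herwig, c_powheg_z2star, c_powheg_p11):
    """Central NP factor and uncertainty for one bin.

    The two powheg + pythia6 tunes are averaged first; the envelope of pythia6,
    herwig++, and that average then gives the central value and half-spread.
    """
    c_powheg = 0.5 * (c_powheg_z2star + c_powheg_p11)
    return envelope([c_pythia6, c_herwig, c_powheg])


# Hypothetical NP factors for a single bin.
central, unc = np_correction_bin(1.030, 1.010, 1.020, 1.016)
print(f"C_NP = {central:.3f} +- {unc:.3f}")
```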

The NP correction, as defined for powheg \(+\) pythia6, is shown in Fig. 2 together with the original factors from pythia6 and herwig++ as a function of the jet \(p_{\mathrm {T}}\) for five ranges in absolute rapidity \(|y |\) of size 0.5 up to \(|y | = 2.5\). The factors derived from both LO \(+\) PS and NLO \(+\) PS MC event generators are observed to decrease with increasing jet \(p_{\mathrm {T}}\) and to approach unity at large \(p_{\mathrm {T}}\). Within modelling uncertainties, the assumption of universal NP corrections that are similar for LO \(+\) PS and NLO \(+\) PS MC event generation holds approximately above a jet \(p_{\mathrm {T}}\) of a few hundred \(\,\text {GeV}\).

3.4.2 PS corrections from powheg \(+\) pythia6

Similarly to the NP correction of Eq. (5), a PS correction factor can be defined as the ratio of the differential cross section including PS effects to the NLO prediction as given by powheg, i.e. including the leading emission:

$$\begin{aligned} C _\mathrm {NLO}^{\text {PS}} = \frac{\sigma _{\text {NLO+PS}}}{\sigma _{\text {NLO}}}. \end{aligned}$$
(7)

The combined correction for NP and PS effects can then be written as

$$\begin{aligned} \frac{{\mathrm{d}}^2 \sigma _\mathrm {theo}}{{\mathrm{d}}p_{\mathrm {T}} \, {\mathrm{d}}{}y} = \frac{{\mathrm{d}}^2 \sigma _\mathrm {NLO}}{{\mathrm{d}}p_{\mathrm {T}} \, {\mathrm{d}}{}y} \cdot C_\mathrm {NLO}^\mathrm {NP} \cdot C_\mathrm {NLO}^\mathrm {PS}. \end{aligned}$$
(8)
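Applying Eqs. (6) and (8) bin by bin is a simple multiplication once the smoothed correction factors are available. The Python sketch below assumes both factors are given in the parameterized form \(a_0 + a_1 / p_{\mathrm {T}} ^{a_2}\) used in this section; the parameter values and function names are hypothetical and chosen only to show the bookkeeping.

```python
def correction_factor(pt, a0, a1, a2):
    """Smooth correction factor a0 + a1 / pT**a2, as used for C_NP and C_PS."""
    return a0 + a1 / pt ** a2


def theory_xsec(sigma_nlo, pt, np_params, ps_params=None):
    """Eq. (6) when ps_params is None, Eq. (8) otherwise.

    sigma_nlo is the fixed-order NLO cross section in one bin; np_params and
    ps_params are (a0, a1, a2) triples of the respective parameterizations.
    """
    factor = correction_factor(pt, *np_params)
    if ps_params is not None:
        factor *= correction_factor(pt, *ps_params)
    return sigma_nlo * factor


# Hypothetical parameters: C_NP is about +4 % at 114 GeV and tends to 1 at high pT.
NP = (1.0, 5.0, 1.0)
PS = (0.95, 0.3, 0.5)
for pt in (200.0, 1000.0):
    print(pt, theory_xsec(1.0, pt, NP), theory_xsec(1.0, pt, NP, PS))
```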

The PS corrections derived with powheg \(+\) pythia6 are presented in Fig. 3. They are significant at large \(p_{\mathrm {T}}\), particularly at high rapidity, where the corrections approach \(-20\) %. However, the combination of powheg \(+\) pythia6 has never been tuned to data, and, strictly, the Z2* tune is valid only for LO \(+\) PS predictions with pythia6, not for showers matched to powheg. Moreover, powheg employs the CT10-NLO PDF, while the Z2* tune requires the CTEQ6L1-LO PDF to be used for the showering part. Therefore, such PS corrections can be considered only as an illustrative test, as reported in Sect. 4.3.

The maximum parton virtuality allowed in the parton shower evolution, \(\mu _\mathrm {PS}^2\), is varied by factors of 0.5 and 1.5 by changing the corresponding parameter PARP(67) in pythia6 from its default value of 4 to 2 and 6, respectively. The resulting changes in the PS factors are shown in Fig. 3. The powheg \(+\) pythia6 PS factors employed later in an illustrative test are determined as the average of the predictions from the two extreme scale limits. Again, a parameterization using a functional form of \(a_0 + a_1 / p_{\mathrm {T}} ^{a_2}\) is employed to smooth statistical fluctuations.

Finally, Fig. 4 presents an overview of the NP, PS, and combined corrections for all five ranges in \(|y |\).

Fig. 4

NP correction (top) obtained from the envelope of the predictions of pythia6 tune Z2, herwig++ tune 2.3, and powheg \(+\) pythia6 with the tunes P11 and Z2*, PS correction (middle) obtained from the average of the predictions of powheg \(+\) pythia6 tune Z2* with scale factor variation, and combined correction (bottom), defined as the product of the NP and PS correction, for the five regions in \(|y |\)

4 Determination of the strong coupling constant

The measurement of the inclusive jet cross section [1], as described in Sect. 2, can be used to determine \(\alpha _S(M_{\mathrm{Z}})\), where the proton structure in the form of PDFs is taken as a prerequisite. The necessary theoretical ingredients are specified in Sect. 3. The choice of PDF sets is restricted to global sets that fit data from different experiments, so that only the most precisely known gluon distributions are employed. Combined fits of \(\alpha _S(M_{\mathrm{Z}})\) and the gluon content of the proton are investigated in Sect. 5.5.

In the following, the sensitivity of the inclusive jet cross section to \(\alpha _S(M_{\mathrm{Z}})\) is demonstrated. Subsequently, the fitting procedure is given in detail before presenting the outcome of the various fits of \(\alpha _S(M_{\mathrm{Z}})\).

4.1 Sensitivity of the inclusive jet cross section to \(\alpha _S(M_{\mathrm{Z}})\)

Figures 5, 6, 7 and 8 present the ratio of data to the theoretical predictions for all variations in \(\alpha _S(M_{\mathrm{Z}})\) available for the PDF sets ABM11, CT10, MSTW2008, and NNPDF2.1 at next-to-leading evolution order, as specified in Table 1. Except for the ABM11 PDF set, which leads to QCD predictions significantly different in shape from the measurement, all PDF sets give satisfactory theoretical descriptions of the data, and a strong sensitivity to \(\alpha _S(M_{\mathrm{Z}})\) is demonstrated. Because of these discrepancies, ABM11 is excluded from further investigations. The CT10-NLO PDF set is chosen for the main result on \(\alpha _S(M_{\mathrm{Z}})\), because the value of \(\alpha _S(M_{\mathrm{Z}})\) preferred by the CMS jet data is rather close to the default value of this PDF set. As cross-checks, fits are performed with the NNPDF2.1-NLO and MSTW2008-NLO sets. The CT10-NNLO, NNPDF2.1-NNLO, and MSTW2008-NNLO PDF sets are employed for comparison.

Fig. 5

Ratio of the inclusive jet cross section to theoretical predictions using the ABM11-NLO PDF set for the five rapidity bins, where the \(\alpha _S(M_{\mathrm{Z}})\) value is varied in the range 0.110–0.130 in steps of 0.001. The error bars correspond to the total uncertainty

Fig. 6

Ratio of the inclusive jet cross section to theoretical predictions using the CT10-NLO PDF set for the five rapidity bins, where the \(\alpha _S(M_{\mathrm{Z}})\) value is varied in the range 0.112–0.126 in steps of 0.001. The error bars correspond to the total uncertainty

Fig. 7

Ratio of the inclusive jet cross section to theoretical predictions using the MSTW2008-NLO PDF set for the five rapidity bins, where the \(\alpha _S(M_{\mathrm{Z}})\) value is varied in the range 0.110–0.130 in steps of 0.001. The error bars correspond to the total uncertainty

Fig. 8

Ratio of the inclusive jet cross section to theoretical predictions using the NNPDF2.1-NLO PDF set for the five rapidity bins, where the \(\alpha _S(M_{\mathrm{Z}})\) value is varied in the range 0.116–0.122 in steps of 0.001. The error bars correspond to the total uncertainty

4.2 The fitting procedure

The value of \(\alpha _S(M_{\mathrm{Z}})\) is determined by minimising the \(\chi ^2\) between the N measurements \(D_i\) and the theoretical predictions \(T_i\). The \(\chi ^2\) is defined as

$$\begin{aligned} \chi ^2 = \sum _{ij}^N (D_i - T_i) \mathrm {C}_{ij}^{-1} (D_j - T_j), \end{aligned}$$
(9)

where the covariance matrix \(C_{ij}\) is composed of the following terms:

$$\begin{aligned} C= & {} {{\mathrm{cov}}}_\text {stat} + {{\mathrm{cov}}}_\text {uncor} + \left( \sum _\text {sources}{{\mathrm{cov}}}_\mathrm {JES}\right) + {{\mathrm{cov}}}_\text {unfolding} \nonumber \\&+ {{\mathrm{cov}}}_\text {lumi} + {{\mathrm{cov}}}_\mathrm {PDF}, \end{aligned}$$
(10)

and the terms in the sum represent

  1. \({{\mathrm{cov}}}_\text {stat}\): statistical uncertainty including correlations induced through unfolding;
  2. \({{\mathrm{cov}}}_\text {uncor}\): uncorrelated systematic uncertainty summing up small residual effects such as trigger and identification inefficiencies, time dependence of the jet \(p_{\mathrm {T}}\) resolution, or the uncertainty on the trigger prescale factor;
  3. \({{\mathrm{cov}}}_\mathrm {JES\,sources}\): systematic uncertainty for each JES uncertainty source;
  4. \({{\mathrm{cov}}}_\text {unfolding}\): systematic uncertainty of the unfolding;
  5. \({{\mathrm{cov}}}_\text {lumi}\): luminosity uncertainty; and
  6. \({{\mathrm{cov}}}_\mathrm {PDF}\): PDF uncertainty.

All JES, unfolding, and luminosity uncertainties are treated as 100 % correlated across the \(p_{\mathrm {T}}\) and \(|y |\) bins, with the exception of the single-particle response JES source as described in Sect. 2.3. The JES, unfolding, and luminosity uncertainties are treated as multiplicative to avoid the statistical bias that arises when estimating uncertainties from data [45–47].
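The covariance construction of Eq. (10) and the \(\chi ^2\) of Eq. (9) can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the analysis code: the function names are hypothetical, the uncorrelated systematic uncertainty is assumed to be given as absolute per-bin values, and "multiplicative" is assumed to mean that each fully correlated relative uncertainty is scaled by the theoretical prediction \(T_i\) rather than by the measured value \(D_i\).

```python
import numpy as np


def build_covariance(theory, stat_cov, uncor_abs, corr_rel_sources, pdf_cov):
    """Assemble C of Eq. (10) from its components (hypothetical interface).

    theory           : (N,) theoretical predictions T_i
    stat_cov         : (N, N) statistical covariance, incl. unfolding correlations
    uncor_abs        : (N,) absolute uncorrelated systematic uncertainties
    corr_rel_sources : list of (N,) relative uncertainties, one per fully
                       correlated source (JES sources, unfolding, luminosity)
    pdf_cov          : (N, N) PDF covariance
    """
    theory = np.asarray(theory, dtype=float)
    cov = np.array(stat_cov, dtype=float) + np.diag(np.asarray(uncor_abs, dtype=float) ** 2)
    for rel in corr_rel_sources:
        shift = np.asarray(rel, dtype=float) * theory  # multiplicative: scales with T_i
        cov += np.outer(shift, shift)                  # 100% correlated across all bins
    return cov + np.asarray(pdf_cov, dtype=float)


def chi2(data, theory, cov):
    """Eq. (9): (D - T)^T C^{-1} (D - T), solving a linear system instead of inverting C."""
    resid = np.asarray(data, dtype=float) - np.asarray(theory, dtype=float)
    return float(resid @ np.linalg.solve(cov, resid))
```

Solving \(C\,z = D - T\) with `np.linalg.solve` avoids forming \(C^{-1}\) explicitly, which is the numerically safer choice when strongly correlated sources make \(C\) nearly singular.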

Table 2 Determination of \(\alpha _S(M_{\mathrm{Z}})\) in bins of rapidity using the CT10-NLO PDF set. The last row presents the result of a simultaneous fit in all rapidity bins

The derivation of PDF uncertainties follows the prescription for each individual PDF set. The CT10 and MSTW PDF sets both employ the eigenvector method with upward and downward variations for each eigenvector. As required by the use of covariance matrices, symmetric PDF uncertainties are computed following Ref. [39]. The NNPDF2.1 PDF set uses MC pseudo-experiments instead of the eigenvector method to provide PDF uncertainties. A hundred so-called replicas, whose averaged predictions give the central result, are evaluated following the prescription in Ref. [48] to derive the PDF uncertainty for NNPDF.
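The two ways of quantifying PDF uncertainties described above can be illustrated with NumPy. The sketch assumes the widely used symmetric Hessian formula, \(\Delta X = \tfrac{1}{2}\sqrt{\sum _k (X_k^+ - X_k^-)^2}\), together with its covariance generalisation for the eigenvector sets; that this matches Ref. [39] in every detail is an assumption. For replica sets it returns the replica mean and the standard deviation with an \(N/(N-1)\) normalisation, as is customary for NNPDF. All function names are hypothetical.

```python
import numpy as np


def hessian_symmetric_uncertainty(x_plus, x_minus):
    """Symmetric Hessian uncertainty 0.5 * sqrt(sum_k (X_k^+ - X_k^-)^2) per bin.

    x_plus, x_minus: (n_eigenvectors, n_bins) predictions for the upward and
    downward variation of each eigenvector.
    """
    diff = np.asarray(x_plus, dtype=float) - np.asarray(x_minus, dtype=float)
    return 0.5 * np.sqrt(np.sum(diff**2, axis=0))


def hessian_covariance(x_plus, x_minus):
    """Bin-to-bin PDF covariance; its diagonal reproduces the symmetric uncertainty."""
    d = 0.5 * (np.asarray(x_plus, dtype=float) - np.asarray(x_minus, dtype=float))
    return d.T @ d


def replica_uncertainty(replicas):
    """MC-replica central value (mean) and uncertainty (std with ddof=1) per bin."""
    r = np.asarray(replicas, dtype=float)
    return r.mean(axis=0), r.std(axis=0, ddof=1)
```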

As described in Sect. 3.4.1, the NP correction is defined as the centre of the envelope given by pythia6, herwig++, and the powheg \(+\) pythia6 average of tunes Z2* and P11. Half the spread among these three numbers is taken as the uncertainty. This is the default NP correction used in this analysis. As an illustrative test complementing the main results, the PS correction factor defined in Sect. 3.4.2 is additionally applied.
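The envelope prescription can be written as a short NumPy function. The correction factors below are hypothetical numbers for two bins, with one row per model (pythia6, herwig++, and the powheg \(+\) pythia6 tune average); the function name is also hypothetical.

```python
import numpy as np


def np_correction_envelope(corrections):
    """Centre and half-width of the envelope of NP correction factors.

    corrections: (n_models, n_bins) array, one row per model prediction.
    Returns (NP, Delta NP) per bin.
    """
    c = np.asarray(corrections, dtype=float)
    upper, lower = c.max(axis=0), c.min(axis=0)
    return 0.5 * (upper + lower), 0.5 * (upper - lower)


# Hypothetical correction factors for two bins (rows: the three model predictions)
np_corr, np_unc = np_correction_envelope([[1.10, 1.02], [1.04, 1.00], [1.06, 1.01]])
```

The returned half-width is the \(\Delta \mathrm {NP}\) used in the variation \(T\cdot \mathrm {NP} \rightarrow T\cdot (\mathrm {NP} \pm \Delta \mathrm {NP})\).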

The uncertainty in \(\alpha _S(M_{\mathrm{Z}})\) due to the NP uncertainties is evaluated by looking for maximal offsets from a default fit. The theoretical prediction T is varied by the NP uncertainty \(\Delta \mathrm {NP}\) as \(T\cdot \mathrm {NP} \rightarrow T\cdot \left( \mathrm {NP} \pm \Delta \mathrm {NP}\right) \). The fitting procedure is repeated for these variations, and the deviation from the central \(\alpha _S(M_{\mathrm{Z}})\) values is considered as the uncertainty in \(\alpha _S(M_{\mathrm{Z}})\).

Finally, the uncertainty due to the renormalization and factorisation scales is evaluated by applying the same method as for the NP corrections: \(\mu _r\) and \(\mu _f\) are varied from the default choice of \(\mu _r =\mu _f =p_{\mathrm {T}} \) between \(p_{\mathrm {T}}/2\) and \(2p_{\mathrm {T}} \) in the following six combinations: \((\mu _r/p_{\mathrm {T}},\mu _f/p_{\mathrm {T}}) = (1/2,1/2)\), \((1/2,1)\), \((1,1/2)\), \((1,2)\), \((2,1)\), and \((2,2)\). The \(\chi ^2\) minimisation with respect to \(\alpha _S(M_{\mathrm{Z}})\) is repeated in each case. The contribution from the \(\mu _r\) and \(\mu _f\) scale variations to the uncertainty is evaluated by considering the maximal upward and downward deviations of \(\alpha _S(M_{\mathrm{Z}})\) from the central result.
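The offset method used for both the NP and the scale uncertainties reduces to taking the largest upward and downward shift of the refitted \(\alpha _S(M_{\mathrm{Z}})\) values. A minimal Python sketch follows; the refitted values in the check are hypothetical, and the function and constant names are assumptions of the sketch.

```python
# (mu_r / pT, mu_f / pT) combinations around the default choice (1, 1);
# the extreme combinations (1/2, 2) and (2, 1/2) are not used.
SCALE_VARIATIONS = [(0.5, 0.5), (0.5, 1.0), (1.0, 0.5), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]


def offset_uncertainty(central, refitted):
    """Asymmetric uncertainty from the maximal upward and downward deviations of
    refitted values (one per variation) from the central fit result."""
    shifts = [value - central for value in refitted]
    up = max(0.0, max(shifts))
    down = max(0.0, -min(shifts))
    return up, down
```

For the NP uncertainty the refitted list would contain the two results obtained with \(\mathrm {NP} \pm \Delta \mathrm {NP}\); for the scale uncertainty it contains the six results, one per entry of the combination list.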

4.3 The results on \(\alpha _S(M_{\mathrm{Z}})\)

The values of \(\alpha _S(M_{\mathrm{Z}})\) obtained with the CT10-NLO PDF set are listed in Table 2 together with the experimental, PDF, NP, and scale uncertainties for each bin in rapidity and for a simultaneous fit of all rapidity bins. To disentangle the uncertainties of experimental origin from those of the PDFs, additional fits without the latter uncertainty source are performed. An example of the evaluation of the uncertainties in a \(\chi ^{2}\) fit is shown in Fig. 9. The NP and scale uncertainties are determined via separate fits, as explained above.
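As in Fig. 9, the experimental uncertainty can be read off where \(\chi ^2\) rises by one unit above its minimum, using a second-degree polynomial through the \(\chi ^2\) points of a PDF series. The following NumPy sketch of that step assumes the points are close to parabolic around the minimum; the function name is hypothetical. Writing \(\chi ^2 = a\alpha ^2 + b\alpha + c\), the minimum lies at \(-b/(2a)\) and \(\chi ^2\) increases by one at a distance \(1/\sqrt{a}\) from it.

```python
import numpy as np


def parabola_minimum(alphas, chi2_values):
    """Fit chi2(alpha) with a second-degree polynomial.

    Returns the position of the minimum, the minimum chi2, and the symmetric
    uncertainty defined by chi2 = chi2_min + 1.
    """
    a, b, c = np.polyfit(alphas, chi2_values, 2)  # chi2 ~ a*x^2 + b*x + c
    if a <= 0:
        raise ValueError("chi2 points do not form an upward-opening parabola")
    x_min = -b / (2.0 * a)
    chi2_min = c - b**2 / (4.0 * a)
    sigma = 1.0 / np.sqrt(a)  # chi2(x_min +/- sigma) = chi2_min + 1
    return x_min, chi2_min, sigma
```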

For the two outer rapidity bins (\(1.5<|y | <2.0\) and \(2.0<|y | <2.5\)), the series of \(\alpha _S(M_{\mathrm{Z}})\) values provided with the CT10-NLO PDF set does not extend to sufficiently low values of \(\alpha _S(M_{\mathrm{Z}})\). As a consequence, the shape of the \(\chi ^2\) curve around the minimum, up to an increase of one unit in \(\chi ^2\), cannot be determined completely. To avoid extrapolations based on a polynomial fit to the available points, the alternative \(\alpha _S\) evolution code of the HOPPET package [49] is employed. This is the same evolution code as chosen for the creation of the CT10 PDF set. Replacing the original \(\alpha _S\) evolution in CT10 by HOPPET allows \(\alpha _S(M_{\mathrm{Z}})\) to be set freely, in particular to values different from the default of a PDF set, but at the expense of losing the correlation between the value of \(\alpha _S(M_{\mathrm{Z}})\) and the fitted PDFs. Downward or upward deviations from the lowest and highest values of \(\alpha _S(M_{\mathrm{Z}})\), respectively, provided in a PDF series are accepted for uncertainty evaluations up to a limit of \(|\Delta \alpha _S(M_{\mathrm{Z}}) | = 0.003\). Applying this method for comparison within the available range of \(\alpha _S(M_{\mathrm{Z}})\) values, the additional uncertainty is estimated to be negligible.

For comparison the CT10-NNLO PDF set is used for the determination of \(\alpha _S(M_{\mathrm{Z}})\). These results are presented in Table 3.

Fig. 9

The \(\chi ^2\) minimisation with respect to \(\alpha _S(M_{\mathrm{Z}})\) using the CT10-NLO PDF set and data from all rapidity bins. The experimental uncertainty is obtained from the \(\alpha _S(M_{\mathrm{Z}})\) values for which \(\chi ^2\) is increased by one with respect to the minimum value, indicated by the dashed line. The curve corresponds to a second-degree polynomial fit through the available \(\chi ^2\) points

Table 3 Determination of \(\alpha _S(M_{\mathrm{Z}})\) in bins of rapidity using the CT10-NNLO PDF set. The last row presents the result of a simultaneous fit in all rapidity bins
Table 4 Determination of \(\alpha _S(M_{\mathrm{Z}})\) using the CT10 and MSTW2008 PDF sets at NLO and the CT10, NNPDF2.1, and MSTW2008 PDF sets at NNLO. The results are obtained by a simultaneous fit to all rapidity bins

The final result using all rapidity bins and the CT10-NLO PDF set is (last row of Table 2)

$$\begin{aligned} \alpha _S(M_{\mathrm{Z}})= & {} 0.1185 \pm 0.0019\,\text {(exp)} \nonumber \\&\pm 0.0028\,(\mathrm {PDF}) \pm 0.0004\,(\mathrm {NP})^{+0.0053}_{-0.0024}\,(\text {scale}) \nonumber \\= & {} 0.1185 \pm 0.0034\,\text {(all except scale)}^{+0.0053}_{-0.0024}\,(\text {scale}) \nonumber \\= & {} 0.1185^{+0.0063}_{-0.0042}, \end{aligned}$$
(11)

where experimental, PDF, NP, and scale uncertainties have been added quadratically to give the total uncertainty. The result is in agreement with the world average value of \(\alpha _S(M_{\mathrm{Z}}) = 0.1185 \pm 0.0006\) [50], with the Tevatron results [51–53], and with recent results obtained with LHC data [54–56]. The determination of \(\alpha _S(M_{\mathrm{Z}})\) based on the CT10-NLO PDF set is also in agreement with the results obtained using the NNPDF2.1-NLO and MSTW2008-NLO sets, as shown in Table 4. For comparison, this table also shows the results using the CT10, MSTW2008, and NNPDF2.1 PDF sets at NNLO. The \(\alpha _S(M_{\mathrm{Z}})\) values obtained with the different NLO PDF sets agree within the uncertainties.
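The quadratic combination in Eq. (11) can be checked with a few lines of Python; the asymmetric scale uncertainty is combined separately for the upward and downward directions. The numbers are those of Eq. (11); the helper name is hypothetical.

```python
import math


def quad_sum(*components):
    """Add uncertainty components in quadrature."""
    return math.sqrt(sum(c * c for c in components))


exp_unc, pdf_unc, np_unc = 0.0019, 0.0028, 0.0004
scale_up, scale_down = 0.0053, 0.0024

all_except_scale = quad_sum(exp_unc, pdf_unc, np_unc)  # ≈ 0.0034
total_up = quad_sum(all_except_scale, scale_up)        # ≈ 0.0063
total_down = quad_sum(all_except_scale, scale_down)    # ≈ 0.0042
```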

Applying the PS correction factor to the NLO theory prediction in addition to the NP correction as discussed in Sect. 3.4.2, the fit using all rapidity bins and the CT10-NLO PDF set yields \(\alpha _S(M_{\mathrm{Z}}) = 0.1204 \pm 0.0018\,(\text {exp})\). This value is in agreement with our main result of Eq. (11), which is obtained using only the NP correction factor.

To investigate the running of the strong coupling, the fitted region is split into six bins of \(p_{\mathrm {T}}\), and the fitting procedure is repeated in each of these bins. The six extractions of \(\alpha _S(M_{\mathrm{Z}})\) are reported in Table 5. The \(\alpha _S(M_{\mathrm{Z}})\) values are evolved to the corresponding energy scale Q using the two-loop solution to the renormalization group equation (RGE) within HOPPET. The value of Q is calculated as a cross section weighted average in each fit region. These average scale values Q, derived again with the fastNLO framework, are identical within about 1\(\,\text {GeV}\) for different PDFs. To emphasise that theoretical uncertainties limit the achievable precision, Tables 6 and 7 present, for each of the six \(p_{\mathrm {T}}\) bins, the total uncertainty as well as its experimental, PDF, NP, and scale components, where the experimental uncertainties of the six bins are all correlated.
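The evolution step and the cross-section-weighted scale can be sketched in Python using only the standard library. The analysis uses HOPPET; this sketch instead integrates the two-loop RGE, \(\mathrm{d}\alpha _S/\mathrm{d}\ln \mu ^2 = -b_0\alpha _S^2 - b_1\alpha _S^3\) with \(b_0 = (33-2n_f)/(12\pi )\) and \(b_1 = (153-19n_f)/(24\pi ^2)\), numerically with a fourth-order Runge–Kutta scheme. It assumes a fixed number of flavours \(n_f = 5\) without flavour thresholds; the function names are hypothetical.

```python
import math

M_Z = 91.1876  # Z-boson mass in GeV


def beta_coefficients(nf):
    """Two-loop coefficients for d alpha / d ln(mu^2) = -b0 alpha^2 - b1 alpha^3."""
    b0 = (33.0 - 2.0 * nf) / (12.0 * math.pi)
    b1 = (153.0 - 19.0 * nf) / (24.0 * math.pi**2)
    return b0, b1


def evolve_alpha_s(alpha_ref, q_ref, q, nf=5, steps=1000):
    """Evolve alpha_s from scale q_ref to scale q (GeV) with RK4 in t = ln(mu^2),
    assuming a fixed number of active flavours nf."""
    b0, b1 = beta_coefficients(nf)

    def beta(a):
        return -b0 * a * a - b1 * a**3

    h = (math.log(q**2) - math.log(q_ref**2)) / steps
    a = alpha_ref
    for _ in range(steps):
        k1 = beta(a)
        k2 = beta(a + 0.5 * h * k1)
        k3 = beta(a + 0.5 * h * k2)
        k4 = beta(a + h * k3)
        a += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return a


def weighted_mean_scale(scales, cross_sections):
    """Cross-section-weighted average scale Q of a fit region."""
    total = sum(cross_sections)
    return sum(q * s for q, s in zip(scales, cross_sections)) / total
```

Because the same function evolves in either direction, an \(\alpha _S(M_{\mathrm{Z}})\) value can be translated to \(\alpha _S(Q)\) and back.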

Table 5 Determination of \(\alpha _S\) in separate bins of jet \(p_{\mathrm {T}}\) using the CT10-NLO PDF set
Table 6 Uncertainty composition for \(\alpha _S(M_{\mathrm{Z}})\) from the determination of \(\alpha _S(Q)\) in bins of \(p_{\mathrm {T}}\) using the CT10-NLO PDF set
Table 7 Uncertainty composition for \(\alpha _S(Q)\) in bins of \(p_{\mathrm {T}}\) using the CT10-NLO PDF set

Figure 10 presents the running of the strong coupling \(\alpha _S(Q)\) and its total uncertainty as determined in this analysis. The extractions of \(\alpha _S(Q)\) in six separate ranges of Q, as presented in Table 5, are also shown. In the same figure the values of \(\alpha _S\) at lower scales determined by the H1 [57–59], ZEUS [60], and D0 [52, 53] collaborations are shown for comparison. Recent CMS measurements [55, 56], which are in agreement with the \(\alpha _S(M_{\mathrm{Z}})\) determination of this study, are displayed as well. The results on \(\alpha _S\) reported here are consistent with the energy dependence predicted by the RGE.

Fig. 10

The strong coupling \(\alpha _S(Q)\) (full line) and its total uncertainty (band) as determined in this analysis using a two-loop solution to the RGE as a function of the momentum transfer \(Q=p_{\mathrm {T}} \). The extractions of \(\alpha _S(Q)\) in six separate ranges of Q as presented in Table 5 are shown together with results from the H1 [58, 59], ZEUS [60], and D0 [52, 53] experiments at the HERA and Tevatron colliders. Other recent CMS measurements [55, 56] are displayed as well. The uncertainties represented by error bars are subject to correlations

5 Study of PDF constraints with HERAFitter

The PDFs of the proton are an essential ingredient for precision studies in hadron-induced reactions. They are derived from experimental data involving collider and fixed-target experiments. The DIS data from the HERA-I \(\mathrm {e}\mathrm {p}\) collider cover most of the kinematic phase space needed for a reliable PDF extraction. The \(\mathrm {p}\mathrm {p}\) inclusive jet cross section contains additional information that can constrain the PDFs, in particular the gluon, in the region of high fractions x of the proton momentum.

The HERAFitter project [61, 62] is an open-source framework designed, among other things, to fit PDFs to data. It has a modular structure, encompassing a variety of theoretical predictions for different processes and phenomenological approaches for determining the parameters of the PDFs. In this study, the recently updated HERAFitter version 1.1.1 is employed to estimate the impact of the CMS inclusive jet data on the PDFs and their uncertainties. Theory is used at NLO for both processes, i.e. up to order \(\alpha _S ^2\) for DIS and up to order \(\alpha _S ^3\) for inclusive jet production in \(\mathrm {p}\mathrm {p}\) collisions.

5.1 Correlation between inclusive jet production and the PDFs

The potential impact of the CMS inclusive jet data can be illustrated by the correlation between the inclusive jet cross section \(\sigma _{\text {jet}}(Q)\) and the PDF \(xf(x,Q^2)\) for any parton flavour f. The NNPDF Collaboration [63] provides PDF sets in the form of an ensemble of replicas i, which sample variations in the PDF parameter space within allowed uncertainties. The correlation coefficient \(\varrho _f(x,Q)\) between a cross section and the PDF for flavour f at a point \((x, Q)\) can be computed by evaluating means and standard deviations from an ensemble of N replicas as

$$\begin{aligned}&\varrho _f (x,Q) = \frac{N}{(N-1)}\nonumber \\&\quad \frac{ \langle \sigma _{\text {jet}}(Q)_i \cdot xf(x,Q^2)_i \rangle - \langle \sigma _{\text {jet}}(Q)_i \rangle \cdot \langle xf(x,Q^2)_i \rangle }{\Delta _{\sigma _{\text {jet}}(Q)} \Delta _{xf(x,Q^2)}}.\qquad \end{aligned}$$
(12)

Here, the angular brackets denote the averaging over the replica index i, and \(\Delta \) represents the evaluation of the corresponding standard deviation for either the jet cross section, \(\Delta _{\sigma _{\text {jet}}(Q)}\), or a PDF, \(\Delta _{xf(x,Q^2)}\). Figure 11 presents the correlation coefficient between the inclusive jet cross section and the gluon, u valence quark, and d valence quark PDFs in the proton.
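Equation (12) translates directly into NumPy. The sketch assumes that the standard deviations \(\Delta \) are the sample standard deviations (normalised with \(N-1\)); with that choice the expression coincides with the usual Pearson correlation coefficient, which the check compares against. The function name is hypothetical.

```python
import numpy as np


def replica_correlation(sigma_jet, xf):
    """Correlation coefficient of Eq. (12) between a jet cross section and a PDF
    value at a fixed point (x, Q), evaluated over an ensemble of N replicas."""
    s = np.asarray(sigma_jet, dtype=float)
    f = np.asarray(xf, dtype=float)
    n = s.size
    cov = n / (n - 1) * (np.mean(s * f) - np.mean(s) * np.mean(f))
    return cov / (np.std(s, ddof=1) * np.std(f, ddof=1))
```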

Fig. 11

The correlation coefficient between the inclusive jet cross section and the gluon (top row), the u valence quark (middle row), and the d valence quark PDFs (bottom row), as a function of the momentum fraction x of the proton and the energy scale Q of the hard process. The correlation is shown for the central rapidity region \(|y | <0.5\) (left) and for \(2.0<|y | <2.5\) (right)

The correlation between the gluon PDF and the inclusive jet cross section is largest at central rapidity for most jet \(p_{\mathrm {T}}\) values. In contrast, the correlation between the valence quark distributions and the jet cross section is rather small except at very high \(p_{\mathrm {T}}\), so that some impact at high x can be expected from including these jet data in PDF fits. In the forward region, the correlation between the valence quark distributions and the jet cross section is more pronounced at high x and smaller jet \(p_{\mathrm {T}}\). Therefore, a significant reduction of the PDF uncertainties is expected from including the CMS inclusive jet cross section in fits of the proton structure.

5.2 The fitting framework

5.2.1 The HERAFitter setup

The impact of the CMS inclusive jet data on proton PDFs is investigated by including the jet cross section measurement in a combined fit at NLO with the HERA-I inclusive DIS cross sections [19], which were the basis for the determination of the HERAPDF1.0 PDF set. The analysis is performed within the HERAFitter framework using the Dokshitzer–Gribov–Lipatov–Altarelli–Parisi [64–66] evolution scheme at NLO as implemented in the QCDNUM package [67] and the generalised-mass variable-flavour number Thorne–Roberts scheme [68, 69].

In contrast to the original HERAPDF fit, the results presented here require the DIS data to fulfill \(Q^2 > Q_\text {min}^2 = 7.5 \,\text {GeV} ^2 \) instead of \(3.5\,\text {GeV} ^2 \). The amount of DIS data removed by the increased \(Q_\text {min}^2\) threshold is rather small and concerns a region of phase space where a perturbative description is less reliable. A similar, higher cutoff has been applied by the ATLAS Collaboration [70, 71]. As a cross-check, all fits have also been performed with a cutoff of \(Q^2 > Q_\text {min}^2 = 3.5 \,\text {GeV} ^2 \), and the results are consistent with those obtained using the more stringent cutoff. No differences beyond the expected reduction of uncertainties at low x have been observed.

The following PDFs are independent in the fit procedure: \(xu_v(x)\), \(xd_v(x)\), xg(x), \(x\overline{U}(x)\), and \(x\overline{D}(x)\), where \(x\overline{U}(x) = x\overline{u}(x)\) and \(x\overline{D}(x) = x\overline{d}(x) + x\overline{s}(x)\). Similar to Ref. [72], a parameterization with 13 free parameters is used. At the starting scale \(Q_0\) of the QCD evolution, chosen to be \(Q_0^2 = 1.9 \,\text {GeV} ^2 \), the PDFs are parameterized as follows:

$$\begin{aligned} xg(x)&= A_g x^{B_g} (1-x)^{C_g} - A'_g x^{B'_g} (1-x)^{C'_g}, \nonumber \\ xu_v(x)&= A_{u_{v}} x^{B_{u_{v}}} (1-x)^{C_{u_{v}}} (1 + E_{u_{v}}x^2), \nonumber \\ xd_v(x)&= A_{d_v} x^{B_{d_v}} (1-x)^{C_{d_{v}}}, \\ x\overline{U}(x)&= A_{\overline{U}} x^{B_{\overline{U}}} (1-x)^{C_{\overline{U}}}, \text {and} \nonumber \\ x\overline{D}(x)&= A_{\overline{D}} x^{B_{\overline{D}}} (1-x)^{C_{\overline{D}}}. \nonumber \end{aligned}$$
(13)

The normalisation parameters \(A_g\), \(A_{u_{v}}\), and \(A_{d_{v}}\) are constrained by QCD sum rules. Additional constraints \(B_{\overline{U}}=B_{\overline{D}}\) and \(A_{\overline{U}} = A_{\overline{D}}(1-f_s)\) are applied to ensure the same normalisation for the \(\overline{u}\) and \(\overline{d}\) densities for \(x \rightarrow 0\). The strangeness fraction is set to \(f_s = 0.31\), as obtained from neutrino-induced dimuon production [73]. The parameter \(C'_g\) is fixed to 25 [20, 69] and the strong coupling constant to \(\alpha _S(M_{\mathrm{Z}}) = 0.1176\).
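How the QCD sum rules fix the valence normalisations of Eq. (13) can be illustrated in Python with the standard library. The number sum rules \(\int _0^1 u_v\,\mathrm{d}x = 2\) and \(\int _0^1 d_v\,\mathrm{d}x = 1\) have closed forms in terms of Euler beta functions, since \(\int _0^1 x^{B-1}(1-x)^C(1+Ex^2)\,\mathrm{d}x = \mathrm{B}(B, C+1) + E\,\mathrm{B}(B+2, C+1)\). The shape parameters in the check are hypothetical, as are the function names; the momentum sum rule for \(A_g\) needs all flavours and is not shown.

```python
import math


def beta_function(a, b):
    """Euler beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def x_valence(x, A, B, C, E=0.0):
    """x q_v(x) = A x^B (1 - x)^C (1 + E x^2), the valence form of Eq. (13)."""
    return A * x**B * (1.0 - x) ** C * (1.0 + E * x**2)


def valence_normalisation(n_valence, B, C, E=0.0):
    """A fixed by the number sum rule int_0^1 q_v(x) dx = n_valence
    (2 for u_v, 1 for d_v)."""
    return n_valence / (beta_function(B, C + 1.0) + E * beta_function(B + 2.0, C + 1.0))


def sea_normalisations(A_Dbar, f_s=0.31):
    """A_Ubar = A_Dbar (1 - f_s), so that ubar = dbar as x -> 0 (with B_Ubar = B_Dbar)."""
    return A_Dbar * (1.0 - f_s), A_Dbar
```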

Table 8 The 19 independent sources of systematic uncertainty considered in the CMS inclusive jet measurement. Out of these, 16 are related to the JES and are listed first. In order to implement the improved correlation treatment as described in Sect. 2.3, the single-particle response source JEC2, see also Appendix A, has been split up into five sources: JEC2a–JEC2e. The shift from the default value in each source of systematic uncertainty is determined by nuisance parameters in the fit and is presented in units of standard deviations

5.2.2 Definition of the goodness-of-fit estimator

The agreement between the N data points \(D_i\) and the theoretical predictions \(T_i\) is quantified via a least-squares method, where

$$\begin{aligned} \chi ^2&= \sum _{ij}^N \left( D_i - T_i - \sum _k^K r_k \beta _{ik}\right) \nonumber \\&\quad \times \mathrm {C}_{ij}^{-1} \left( D_j - T_j - \sum _k^K r_k \beta _{jk} \right) + \sum _k^K r_k^2. \end{aligned}$$
(14)

For fully correlated sources of uncertainty following a Gaussian distribution with zero mean and a root-mean-square of unity, as assumed here, this definition is equivalent to Eq. (9) [74]. In addition, the fit determines the systematic shift of each source, expressed by its nuisance parameter \(r_k\). Numerous large shifts in either direction indicate a problem, as was observed, for example, when fitting \(\alpha _S(M_{\mathrm{Z}})\) with this technique and the original prescription for the uncertainty correlations.
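The stated equivalence can be checked numerically. For fixed (additive) shifts \(\beta _{ik}\), minimising Eq. (14) over the \(r_k\) gives \(r = (I + \beta ^T C^{-1}\beta )^{-1}\beta ^T C^{-1}(D - T)\), and the resulting minimum equals the \(\chi ^2\) of Eq. (9) with the covariance \(C + \beta \beta ^T\). The NumPy sketch below assumes additive shifts (the multiplicative treatment used in the fits rescales \(\beta \) iteratively); the function names are hypothetical.

```python
import numpy as np


def chi2_nuisance_profiled(resid, cov, beta):
    """Minimise Eq. (14) analytically over the nuisance parameters r_k.

    resid: (N,) D - T;  cov: (N, N) uncorrelated part C;  beta: (N, K) shifts beta_ik.
    Returns the minimum chi2 and the fitted shifts r_k in units of standard deviations.
    """
    cinv_beta = np.linalg.solve(cov, beta)              # C^{-1} beta
    lhs = np.eye(beta.shape[1]) + beta.T @ cinv_beta    # I + beta^T C^{-1} beta
    r = np.linalg.solve(lhs, cinv_beta.T @ resid)       # optimal nuisance parameters
    shifted = resid - beta @ r
    chi2 = shifted @ np.linalg.solve(cov, shifted) + r @ r
    return float(chi2), r


def chi2_covariance(resid, cov, beta):
    """Eq. (9)-style chi2 with the correlated sources folded into the covariance."""
    full = cov + beta @ beta.T
    return float(resid @ np.linalg.solve(full, resid))
```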

In the following, the covariance matrix is defined as \(\mathrm {C} = {{\mathrm{cov}}}_{\text {stat}} + {{\mathrm{cov}}}_{\text {uncor}}\), while the JES, unfolding, and luminosity uncertainties are treated as fully correlated systematic uncertainties \(\beta _{ik}\) with nuisance parameters \(r_k\). Together with the NP uncertainties, which were treated via the offset method in Sect. 4 but are included here as a single nuisance parameter, a total of K such sources is defined. In contrast to the fits of \(\alpha _S(M_{\mathrm{Z}})\) presented in Sect. 4, where the PDF uncertainties serve as inputs, the PDF uncertainties here emerge as results of the fits.

Table 9 Partial \(\chi ^2\) values, \(\chi ^2_\mathrm {p}\), for each data set in the HERA-I DIS (middle section) or in the combined fit including CMS inclusive jet data (right section). Here, \(n_{\mathrm {data}}\) is the number of data points available for the determination of the 13 parameters. The bottom two lines show the total \(\chi ^2\) and \(\chi ^2/n_\mathrm {dof}\). The difference between the sum of all \(\chi ^2_\mathrm {p}\) and the total \(\chi ^2\) for the combined fit is attributed to the nuisance parameters

All the fully correlated sources are assumed to be multiplicative to avoid the statistical bias that arises from uncertainty estimates taken from data [45–47]. As a consequence, the covariance matrix of the remaining sources has to be re-evaluated in each iteration step. To prevent large systematic shifts from being compensated by a simultaneous increase of the theoretical prediction and of the statistical uncertainties, the systematic shifts of the theory are taken into account before the statistical uncertainty is rescaled. Otherwise, alternative minima in \(\chi ^2\) can appear that are associated with large theoretical predictions and correspondingly large shifts in the nuisance parameters. These alternative minima are clearly undesirable [62].

Fig. 12

The gluon (top) and sea quark (bottom) PDFs as a function of x as derived from HERA-I inclusive DIS data alone (left) and in combination with CMS inclusive jet data (right). The PDFs are shown at the starting scale \(Q^2 = 1.9 \,\text {GeV} ^2 \). The experimental (inner band), model (middle band), and parameterization uncertainties (outer band) are successively added quadratically to give the total uncertainty

5.2.3 Treatment of CMS data uncertainties

The JES is the dominant source of experimental systematic uncertainty in jet cross sections. As described in Sect. 2.3, the \(p_{\mathrm {T}}\)- and \(\eta \)-dependent JES uncertainties are split into 16 uncorrelated sources that are fully correlated in \(p_{\mathrm {T}}\) and \(\eta \). Following the modified recommendation for the correlations versus rapidity of the single-particle response source as given in Sect. 2.3, it is necessary to split this source into five parts for the purpose of using the uncertainties published in Ref. [1] within the \(\chi ^2\) fits. The complete set of uncertainty sources is shown in Table 8.

By employing the technique of nuisance parameters, the impact of each systematic source of uncertainty on the fit result can be examined separately. For an adequate estimation of the size and the correlations of all uncertainties, the majority of all systematic sources should be shifted by less than one standard deviation from the default in the fitting procedure. Table 8 demonstrates that this is the case for the CMS inclusive jet data.

Fig. 13

The u valence quark (top) and d valence quark (bottom) PDFs as a function of x as derived from HERA-I inclusive DIS data alone (left) and in combination with CMS inclusive jet data (right). The PDFs are shown at the starting scale \(Q^2 = 1.9 \,\text {GeV} ^2 \). The experimental (inner band), model (middle band), and parameterization uncertainties (outer band) are successively added quadratically to give the total uncertainty

In contrast, with the original assumption of full correlation within the 16 JES systematic sources across all \(|y |\) bins, shifts beyond two standard deviations were apparent and led to a re-examination of this issue and the improved correlation treatment of the JES uncertainties as described previously in Sect. 2.3.

5.3 Determination of PDF uncertainties according to the HERAPDF prescription

The uncertainty in the PDFs is subdivided into experimental, model, and parameterization uncertainties that are studied separately. In the default setup of the HERAFitter framework, experimental uncertainties are evaluated following a Hessian method [74], and result from the propagated statistical and systematic uncertainties of the input data.

Fig. 14

The gluon (top left), sea quark (top right), u valence quark (bottom left), and d valence quark (bottom right) PDFs as a function of x as derived from HERA-I inclusive DIS data alone (dashed line) and in combination with CMS inclusive jet data (full line). The PDFs are determined employing the HERAPDF method with a \(Q^2_\mathrm {min} = 7.5\,\text {GeV} ^2 \) selection criterion. The PDFs are shown at the starting scale \(Q^2 = 1.9\,\text {GeV} ^2 \). Only the total uncertainty in the PDFs is shown (hatched and solid bands)

For the model uncertainties, the offset method [75] is applied considering the following variations of model assumptions:

  1. The strangeness fraction \(f_s\), by default equal to 0.31, is varied between 0.23 and 0.38.
  2. The b-quark mass is varied by \(\pm 0.25\,\text {GeV} \) around the central value of \(4.75\,\text {GeV} \).
  3. The c-quark mass, with the central value of \(1.4\,\text {GeV} \), is varied to 1.35 and \(1.65\,\text {GeV} \). For the downwards variation the charm production threshold is avoided by changing the starting scale to \(Q_0^2=1.8\,\text {GeV} ^2 \) in this case.
  4. The minimum \(Q^2\) value for data used in the fit, \(Q^2_\mathrm {min}\), is varied from 7.5 to 5.0 and \(10\,\text {GeV} ^2 \).

The PDF parameterization uncertainty is estimated as described in Ref. [19]. By employing the more general form of parameterizations

$$\begin{aligned} xg(x)&= A_g x^{B_g} (1-x)^{C_g} (1 + D_g x + E_g x^2) \nonumber \\&\quad - A'_g x^{B'_g} (1-x)^{C'_g},\\ xf(x)&= A_{f} x^{B_{f}} (1-x)^{C_{f}} (1 + D_{f}x + E_{f}x^2) \nonumber \end{aligned}$$
(15)

for gluons and the nongluon flavours, respectively, it is tested whether the successive inclusion of additional fit parameters leads to a variation in the shape of the fitted results. Furthermore, the starting scale \(Q_0\) is changed to \(Q^2_0 = 1.5\) and \(2.5\,\text {GeV} ^2 \). The maximal deviations of the resulting PDFs from those obtained in the central fit define the parameterization uncertainty. The experimental, model, and parameterization uncertainties are added in quadrature to give the final PDF uncertainty according to the HERAPDF prescription [19].
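The combination of the three uncertainty components can be sketched with NumPy, pointwise in x. The sketch rests on stated assumptions: model variations are treated with the offset method, with positive and negative deviations added separately in quadrature (usual for HERAPDF, but assumed here); the parameterization uncertainty is the envelope of maximal deviations of the variant fits; and the components are then added in quadrature. Function names are hypothetical.

```python
import numpy as np


def model_uncertainty(central, variations):
    """Offset method: positive and negative deviations of the model variations
    are added in quadrature separately (pointwise in x)."""
    dev = np.asarray(variations, dtype=float) - np.asarray(central, dtype=float)
    up = np.sqrt(np.sum(np.clip(dev, 0.0, None) ** 2, axis=0))
    down = np.sqrt(np.sum(np.clip(dev, None, 0.0) ** 2, axis=0))
    return up, down


def parameterization_uncertainty(central, variants):
    """Envelope: maximal upward and downward deviations over all variant fits."""
    dev = np.asarray(variants, dtype=float) - np.asarray(central, dtype=float)
    return np.clip(dev.max(axis=0), 0.0, None), np.clip(-dev.min(axis=0), 0.0, None)


def total_uncertainty(*components):
    """Quadratic sum of experimental, model, and parameterization components."""
    return np.sqrt(sum(np.asarray(c, dtype=float) ** 2 for c in components))
```

Upward and downward totals are obtained by passing the corresponding one-sided components to the quadratic sum.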

Using this fitting setup, the partial \(\chi ^2\) values per number of data points, \(n_{\mathrm {data}}\), are reported in Table 9 for each of the neutral current (NC) and charged current (CC) data sets in the HERA-I DIS fit and for the combined fit including the CMS inclusive jet data. The achieved fit qualities demonstrate the compatibility of all data within the presented PDF fitting framework. The resulting PDFs with breakdown of the uncertainties for the gluon, the sea, u valence, and d valence quarks with and without CMS inclusive jet data are arranged next to each other in Figs. 12 and 13. Figure 14 provides direct comparisons of the two fit results with total uncertainties. The parameterization and model uncertainties of the gluon distribution are significantly reduced for almost the whole x range from \(10^{-4}\) up to 0.5. When DIS data below \(Q^2_\mathrm {min} = 7.5 \,\text {GeV} ^2 \) are included in the fit, the effect is much reduced for the low x region \(x < 0.01\), but remains important for medium to high x. Also, for the u valence, d valence, and sea quark distributions some reduction in their uncertainty is visible at high x (\(x \gtrsim 0.1\)).

At the same time, some structure can be seen, particularly in the parameterization uncertainties, which might point to insufficient flexibility of the parameterizations. Therefore, a comparison with the MC method with data-based regularisation, which is also implemented within the HERAFitter framework, is presented in Sect. 5.4.

Fig. 15

The gluon (top left), sea quark (top right), u valence quark (bottom left), and d valence quark (bottom right) PDFs as a function of x as derived from HERA-I inclusive DIS data alone (dashed line) and in combination with CMS inclusive jet data (full line). The PDFs are determined employing the MC method with data-derived regularisation. The PDFs are shown at the starting scale \(Q^2 = 1.9\,\text {GeV} ^2 \). Only the total uncertainty in the PDFs is shown (hatched and solid bands)

Fig. 16

The gluon (top left), sea quark (top right), u valence quark (bottom left), and d valence quark (bottom right) PDFs as a function of x as derived from HERA-I inclusive DIS data alone (dashed line) and in combination with CMS inclusive jet data (full line). The PDFs are determined employing the MC method with data-derived regularisation. The PDFs are evolved to \(Q^2 = 10^4 \,\text {GeV} ^2 \). Only the total uncertainty in the PDFs is shown (hatched and solid bands)

Fig. 17

Overview of the gluon, sea, u valence, and d valence PDFs before (dashed line) and after (full line) including the CMS inclusive jet data into the fit. The plots show the PDF fit outcome from the HERAPDF method (top) and from the MC method with data-derived regularisation (bottom). The PDFs are shown at the starting scale \(Q^2 = 1.9 \,\text {GeV} ^2 \). The total uncertainty including the CMS inclusive jet data is shown as a band around the central fit result

5.4 Determination of PDF uncertainties using the MC method with regularisation

To study more flexible PDF parameterizations, an MC method based on varying the input data within their correlated uncertainties is employed in combination with a data-based regularisation technique. This method was first used by the NNPDF Collaboration and uses a more flexible parameterization to describe the x dependence of the PDFs [63]. To avoid fitting the statistical fluctuations present in the input data (over-fitting), a data-based stopping criterion is introduced. The data set is split randomly into a “fit” and a “control” sample. The \(\chi ^2\) minimisation is performed with the “fit” sample while simultaneously the \(\chi ^2\) of the “control” sample is calculated using the current PDF parameters. The \(\chi ^2\) of the “control” sample is observed to decrease at first and then to increase again because of over-fitting. At this point, the fit is stopped. This regularisation technique is used in combination with an MC method to estimate the central value and the uncertainties of the fitted PDFs. Before a fit, several hundred replica sets are created by allowing the central values of the measured cross sections to fluctuate within their statistical and systematic uncertainties while taking into account all correlations. For each replica, a fit to NLO QCD is performed, which yields an optimum value and uncertainty for each parameter. The collection of all replica fits then provides an ensemble average and root-mean-square. Moreover, the variations used to derive the model dependence in the HERAPDF prescription do not lead to any further increase of the uncertainty.
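A toy version of the replica method with data-based stopping can be sketched in Python with NumPy. It is only an illustration under explicit assumptions: a low-degree polynomial in x (rescaled to [-1, 1]) stands in for the PDF parameterization and the NLO QCD prediction, the fit is plain gradient descent on the \(\chi ^2\) of the “fit” points, and, rather than stopping at the very first increase of the “control” \(\chi ^2\), the parameters at the lowest control \(\chi ^2\) are kept and the iteration ends after `patience` non-improving steps, a common practical variant of the stopping rule. All names are hypothetical.

```python
import numpy as np


def design_matrix(x, degree):
    """Monomials in x rescaled to [-1, 1], which keeps the toy fit well conditioned."""
    t = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    return np.vander(t, degree + 1, increasing=True)


def fit_with_stopping(design, y, sigma, fit_idx, control_idx, max_iter=3000, patience=200):
    """Gradient-descent chi2 fit on the 'fit' points; returns the parameters at which
    the chi2 of the 'control' points was lowest (data-based stopping)."""
    w = 1.0 / sigma**2
    d_fit, d_ctl = design[fit_idx], design[control_idx]
    hessian = 2.0 * d_fit.T @ (w[fit_idx][:, None] * d_fit)
    step = 1.0 / np.linalg.eigvalsh(hessian).max()  # step size guaranteeing descent
    params = np.zeros(design.shape[1])
    best, best_chi2_ctl, stale = params.copy(), np.inf, 0
    for _ in range(max_iter):
        grad = 2.0 * d_fit.T @ (w[fit_idx] * (d_fit @ params - y[fit_idx]))
        params = params - step * grad
        chi2_ctl = np.sum(w[control_idx] * (d_ctl @ params - y[control_idx]) ** 2)
        if chi2_ctl < best_chi2_ctl:
            best, best_chi2_ctl, stale = params.copy(), chi2_ctl, 0
        else:
            stale += 1
            if stale >= patience:  # control chi2 no longer improves: over-fitting
                break
    return best


def mc_replica_fit(x, data, cov, degree=4, n_replicas=100, control_fraction=0.3, seed=1):
    """Ensemble average and RMS spread of regularised fits to MC replicas of the data."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    sigma = np.sqrt(np.diag(cov))
    design = design_matrix(x, degree)
    n_ctl = max(1, int(round(control_fraction * x.size)))
    curves = []
    # Each replica fluctuates the data within the full (correlated) covariance
    for replica in rng.multivariate_normal(data, cov, size=n_replicas):
        perm = rng.permutation(x.size)  # new random fit/control split per replica
        params = fit_with_stopping(design, replica, sigma, perm[n_ctl:], perm[:n_ctl])
        curves.append(design @ params)
    curves = np.array(curves)
    return curves.mean(axis=0), curves.std(axis=0)
```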

Similarly to Fig. 14 for the HERAPDF method, a direct comparison of the two fit results with total uncertainties is shown in Fig. 15 for the MC method. The total uncertainty derived with the MC method is almost always larger than with the HERAPDF technique and, in the case of the gluon at low x, much larger. In both cases a significant reduction of the uncertainty in the gluon PDF is observed, notably in the x range from \(10^{-2}\) up to 0.5. Both methods also lead to a decrease in the gluon PDF between \(10^{-2}\) and \(10^{-1}\) and an increase for larger x. Although this change is more pronounced when applying the MC method, both results are compatible within the respective uncertainties. For the sea quark only small differences in shape are observed; in contrast to the HERAPDF method, which exhibits reduced uncertainties for \(x > 0.2\), no such reduction is visible when using the MC method. Both methods agree on a very modest reduction in uncertainty at high x (\(x > 0.05\)) in the u valence quark PDF and a somewhat larger improvement for the d valence quark PDF. This is expected from the correlations studied in Fig. 11, where the quark distributions are constrained via the \({\mathrm{q}}{\mathrm{q}}\) contribution to jet production at high \(|y |\) and \(p_{\mathrm {T}}\). Changes in shape of the d valence quark PDF go in opposite directions for the two methods, but are compatible within uncertainties.

All preceding figures presented the PDFs at the starting scale of the evolution of \(Q^2 = 1.9 \,\text {GeV} ^2 \). For illustration, Fig. 16 displays the PDFs derived with the regularised MC method after evolution to a scale of \(Q^2 = 10^4 \,\text {GeV} ^2 \). Finally, Fig. 17 shows an overview of the gluon, sea, u valence, and d valence distributions at the starting scale of \(Q^2 = 1.9 \,\text {GeV} ^2 \) for both techniques, the HERAPDF and the regularised MC method.

5.5 Combined fit of PDFs and the strong coupling constant

Inclusive DIS data alone are not sufficient to disentangle the effects of changes in the gluon distribution and in \(\alpha _S(M_{\mathrm{Z}})\) on the cross section predictions. Therefore \(\alpha _S(M_{\mathrm{Z}})\) was always fixed to 0.1176 in the original HERAPDF1.0 derivation. When the CMS inclusive jet data are added, this constraint can be dropped, and \(\alpha _S(M_{\mathrm{Z}})\) and its uncertainty (without scale variations) are determined to be \(\alpha _S(M_{\mathrm{Z}}) = 0.1192^{+0.0023}_{-0.0019}\,\text {(all except scale)}\). Repeating the fit with the regularised MC method gives \(\alpha _S(M_{\mathrm{Z}}) = 0.1188\pm 0.0041\,\text {(all except scale)}\).

Since a direct correspondence among the different components of the uncertainty cannot easily be established, only the quadratic sum of the experimental, PDF, and NP uncertainties is presented, which is equivalent to the total uncertainty without the scale uncertainty. For example, the HERA-I DIS data contribute to the experimental uncertainty in the combined fits, but only to the PDF uncertainty in separate \(\alpha _S(M_{\mathrm{Z}})\) fits. The HERAPDF prescription for PDF fits tends to yield small uncertainties, while the uncertainties of the MC method with data-derived regularisation are about twice as large. For comparison, the corresponding determination of \(\alpha _S(M_{\mathrm{Z}})\) using the more precisely determined PDFs from global fits, as in Sect. 4, yields an uncertainty between the two: \(\alpha _S(M_{\mathrm{Z}}) = 0.1185\pm 0.0034\,\text {(all except scale)}\).
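As a quick arithmetic check, the "all except scale" uncertainty of the global-PDF result is the quadratic sum of the experimental, PDF, and NP components quoted in the Summary (0.0019, 0.0028, 0.0004). The sketch below also compares the MC and HERAPDF uncertainties; symmetrising the asymmetric HERAPDF uncertainty by averaging its upward and downward parts is an assumption made only for this comparison.

```python
import math


def quad_sum(*components):
    """Quadratic sum of independent (uncorrelated) uncertainty components."""
    return math.sqrt(sum(c * c for c in components))


# exp, PDF and NP components of the global-PDF result quoted in the Summary
total = quad_sum(0.0019, 0.0028, 0.0004)
print(f"{total:.4f}")  # 0.0034

# Symmetrised HERAPDF uncertainty (average of +0.0023 and -0.0019, an assumption)
herapdf_sym = 0.5 * (0.0023 + 0.0019)
print(f"{0.0041 / herapdf_sym:.2f}")  # 1.95
```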

The evaluation of scale uncertainties is an open issue, which is ignored in all global PDF fits given in Table 1. Its impact is investigated in Refs. [20, 76–78], where scale definitions and K-factors are varied. Lacking a recommended procedure for the scale uncertainties in combined fits of PDFs and \(\alpha _S(M_{\mathrm{Z}})\), two evaluations are reported here for the HERAPDF method. In the first one, the combined fit of PDFs and \(\alpha _S(M_{\mathrm{Z}})\) is repeated for each variation of the scale factors away from the default choice \(\mu _r =\mu _f =p_{\mathrm {T}} \), using the same six combinations as in Sect. 4.2. The scale for the HERA DIS data is not changed. The maximal observed upward and downward changes of \(\alpha _S(M_{\mathrm{Z}})\) with respect to the default scale factors are then taken as the scale uncertainty, irrespective of changes in the PDFs: \(\Delta \alpha _S(M_{\mathrm{Z}}) =\,^{+0.0022}_{-0.0009}\,\mathrm {(scale)}\).
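The envelope prescription above, taking the largest upward and downward shifts of \(\alpha _S(M_{\mathrm{Z}})\) over the scale-factor variations, can be sketched as follows. The six \((\mu _r/p_{\mathrm {T}}, \mu _f/p_{\mathrm {T}})\) pairs listed in `SCALE_FACTORS` are an assumption about the combinations of Sect. 4.2 (the conventional choice of factors 1/2 and 2, excluding opposite variations); the function name and the toy values in the usage lines are hypothetical and are not results of this analysis.

```python
# Assumed six (mu_r/pT, mu_f/pT) variations around the default (1, 1).
SCALE_FACTORS = [(0.5, 0.5), (2.0, 2.0), (0.5, 1.0), (1.0, 0.5), (2.0, 1.0), (1.0, 2.0)]


def scale_envelope(alpha_default, alpha_by_factors):
    """Maximal upward and downward shifts of alpha_S(M_Z) over the scale variations.

    alpha_by_factors maps each (mu_r/pT, mu_f/pT) pair to the refitted alpha_S(M_Z).
    Returns (up, down) as non-negative numbers, i.e. Delta = +up / -down.
    """
    missing = [f for f in SCALE_FACTORS if f not in alpha_by_factors]
    if missing:
        raise ValueError(f"missing scale variations: {missing}")
    shifts = [alpha_by_factors[f] - alpha_default for f in SCALE_FACTORS]
    up = max([0.0] + shifts)                 # largest increase, or 0 if none
    down = max([0.0] + [-d for d in shifts])  # largest decrease, or 0 if none
    return up, down


# Toy refitted values, for illustration only
toy = dict(zip(SCALE_FACTORS, [0.1200, 0.1185, 0.1195, 0.1188, 0.1183, 0.1191]))
print(scale_envelope(0.1190, toy))
```

The PDFs are refitted along with \(\alpha _S(M_{\mathrm{Z}})\) for every variation, but only the shift in \(\alpha _S(M_{\mathrm{Z}})\) enters the envelope, matching "irrespective of changes in the PDFs".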

The second procedure is analogous to the method employed to determine \(\alpha _S(M_{\mathrm{Z}})\) in Sect. 4. The best PDFs are derived for a series of fixed values of \(\alpha _S(M_{\mathrm{Z}})\), as done for the global PDF sets. Using this series of PDFs, the combination of PDF and \(\alpha _S(M_{\mathrm{Z}})\) value that best fits the HERA-I DIS and CMS inclusive jet data is found. The \(\alpha _S(M_{\mathrm{Z}})\) values determined in the two ways are consistent with each other. The fits are then repeated for the same scale factor variations, and the maximal observed upward and downward changes of \(\alpha _S(M_{\mathrm{Z}})\) with respect to the default scale factors are taken as the scale uncertainty: \(\Delta \alpha _S(M_{\mathrm{Z}}) =\,^{+0.0024}_{-0.0039}\,\mathrm {(scale)}\).
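Finding the best \(\alpha _S(M_{\mathrm{Z}})\) from a series of PDFs with fixed \(\alpha _S(M_{\mathrm{Z}})\) values amounts to a \(\chi ^2\) scan. A common way to evaluate such a scan, assumed here for illustration, is to fit a parabola to \(\chi ^2\) as a function of \(\alpha _S(M_{\mathrm{Z}})\), take its minimum as the result, and take the half-width at \(\Delta \chi ^2 = 1\) as the uncertainty. The sketch below uses only the standard library; the function names are hypothetical.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def chi2_scan(alphas, chi2s):
    """Parabola fit to chi2(alpha); return (alpha at minimum, Delta chi2 = 1 half-width)."""
    a0 = sum(alphas) / len(alphas)
    s = math.sqrt(sum((a - a0) ** 2 for a in alphas) / len(alphas))  # rescale for conditioning
    us = [(a - a0) / s for a in alphas]
    # Least-squares normal equations for chi2 = c0 + c1 u + c2 u^2
    mat = [[sum(u ** (j + k) for u in us) for k in range(3)] for j in range(3)]
    rhs = [sum(u ** j * c for u, c in zip(us, chi2s)) for j in range(3)]
    _, c1, c2 = solve3(mat, rhs)
    if c2 <= 0:
        raise ValueError("chi2 curve has no minimum")
    alpha_best = a0 + s * (-c1 / (2.0 * c2))
    # chi2 rises by 1 when u moves by 1/sqrt(c2), i.e. alpha moves by s/sqrt(c2)
    return alpha_best, s / math.sqrt(c2)
```

Repeating the scan for each scale-factor variation and applying the envelope of the first procedure to the resulting minima gives the scale uncertainty of the second procedure.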

Compared with the first procedure, the second leaves less freedom for compensating effects between different gluon distributions and \(\alpha _S(M_{\mathrm{Z}})\) values and, as expected, leads to a larger scale uncertainty. In overall size, this uncertainty is similar to that of the final \(\alpha _S(M_{\mathrm{Z}})\) result reported in Sect. 4: \(\Delta \alpha _S(M_{\mathrm{Z}}) =\,^{+0.0053}_{-0.0024}\,\mathrm {(scale)}\).

6 Summary

An extensive QCD study has been performed based on the CMS inclusive jet data in Ref. [1]. Fits dedicated to determining \(\alpha _S(M_{\mathrm{Z}})\) have been performed involving QCD predictions at NLO complemented with electroweak and NP corrections. Employing global PDFs, where the gluon is constrained through data from various experiments, the strong coupling constant has been determined to be

$$\begin{aligned} \alpha _S(M_{\mathrm{Z}})&= 0.1185 \pm 0.0019\,(\text {exp}) \pm 0.0028\,(\mathrm {PDF})\\&\quad \pm 0.0004\,(\mathrm {NP})\;{}^{+0.0053}_{-0.0024}\,(\text {scale}), \end{aligned}$$

which is consistent with previous results.

The published correlations of the experimental uncertainties were found to reflect the detector characteristics adequately, and reliable fits of standard model parameters could be performed within each rapidity region. However, when several rapidity regions were combined, the assumption of full correlation in rapidity y had to be revised for one source of uncertainty in the JES. This led to the modified correlation treatment that is described and applied in this work.

To check the running of the strong coupling, all fits have also been carried out separately for six bins in inclusive jet \(p_{\mathrm {T}}\), where the scale Q of \(\alpha _S(Q)\) is identified with \(p_{\mathrm {T}}\). The observed behaviour of \(\alpha _S(Q)\) is consistent with the energy scale dependence predicted by the renormalization group equation of QCD, and extends the H1, ZEUS, and D0 results to the \(\,\text {TeV}\) region.
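For orientation, the qualitative running being tested can be illustrated with the leading-order (one-loop) solution of the renormalization group equation, \(\alpha _S(Q) = \alpha _S(M_{\mathrm{Z}}) / [1 + \alpha _S(M_{\mathrm{Z}})\, b_0 \ln (Q^2/M_{\mathrm{Z}}^2)]\) with \(b_0 = (33 - 2n_f)/(12\pi )\). This sketch assumes a fixed number of flavours \(n_f = 5\) and one-loop accuracy; it is an approximation for illustration, not the evolution used in the fits. The default \(\alpha _S(M_{\mathrm{Z}}) = 0.1185\) is the central value quoted above.

```python
import math

M_Z = 91.1876  # Z-boson mass in GeV


def alpha_s_one_loop(q, alpha_mz=0.1185, n_f=5):
    """One-loop running coupling at scale q (GeV), fixed n_f (approximation)."""
    b0 = (33.0 - 2.0 * n_f) / (12.0 * math.pi)
    return alpha_mz / (1.0 + alpha_mz * b0 * math.log(q * q / (M_Z * M_Z)))


for q in (100.0, 500.0, 1000.0, 2000.0):
    print(f"Q = {q:6.0f} GeV: alpha_S = {alpha_s_one_loop(q):.4f}")
```

The coupling decreases logarithmically with Q, which is the behaviour probed by the fits in separate bins of jet \(p_{\mathrm {T}}\) up to the \(\,\text {TeV}\) region.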

The impact of the inclusive jet measurement on the PDFs of the proton is investigated in detail using the HERAFitter tool. When the CMS inclusive jet data are used together with the HERA-I DIS measurements, the uncertainty in the gluon distribution is significantly reduced for fractional parton momenta \(x \gtrsim 0.01\). Also, a modest improvement in uncertainty in the u and d valence quark distributions is observed.

The inclusion of the CMS inclusive jet data also allows a combined fit of \(\alpha _S(M_{\mathrm{Z}})\) and of the PDFs, which is not possible with the HERA-I inclusive DIS data alone. The result is consistent with the reported values of \(\alpha _S(M_{\mathrm{Z}})\) obtained from fits employing global PDFs.