The impact of the LHC Z-boson transverse momentum data on PDF determinations

The LHC has recently released precise measurements of the transverse momentum distribution of the Z-boson that provide a unique constraint on the structure of the proton. Theoretical developments now allow the prediction of these observables through next-to-next-to-leading order (NNLO) in perturbative QCD. In this work we study the impact of incorporating these latest advances into a determination of parton distribution functions (PDFs) through NNLO including the recent ATLAS and CMS 7 TeV and 8 TeV pTZ data. We investigate the consistency of these measurements in a global fit to the available data and quantify the impact of including the pTZ distributions on the PDFs. The inclusion of these new data sets significantly reduces the uncertainties on select parton distributions and the corresponding parton-parton luminosities. In particular, we find that the pTZ data ultimately leads to a reduction of the PDF uncertainty on the gluon-fusion and vector-boson fusion Higgs production cross sections by about 30%, while keeping the central values nearly unchanged.


Introduction
The production of a Z-boson that subsequently decays into a pair of leptons is a benchmark Standard Model (SM) process at the Large Hadron Collider (LHC). Thanks to its large production rate and clean experimental signature, it can be measured very accurately by the LHC experiments. It can also be calculated to high accuracy within the Standard Model, with the first prediction to next-to-next-to-leading order (NNLO) in the strong coupling constant appearing more than two decades ago [1], and predictions for differential cross sections appearing over one decade ago [2][3][4][5]. This combination of precise experimental data and highly-developed theory allows this process to be used to determine quantities of fundamental importance to our understanding of high-energy phenomena, such as parton distribution functions (PDFs).
Among the many distributions in Z-boson production that have been measured, the transverse momentum (p T ) distribution stands out as an especially interesting one. First of all, the Z-boson p T spectrum is sensitive to the gluon and the light-quark PDFs in the not-so-well constrained intermediate Bjorken-x region, which makes it a promising observable for constraining these distributions. The fact that the Higgs production cross section at the LHC is also sensitive to the same PDF combinations in the same region of Bjorken-x, makes the measurement of this process of direct importance to the search for beyond-the-SM phenomena in the Higgs sector. Second, the transverse momentum JHEP07(2017)130 spectrum of the Z-boson is sensitive to both soft QCD radiation (at small p T ) and to large electroweak (EW) Sudakov logarithms (at large p T ). Given that PDF fits typically rely on fixed-order perturbative QCD, it is interesting to test how well fixed-order QCD predictions can describe this data. This has direct impact on which range of data can be included into PDF fits.
The potential for p Z T measurements to provide valuable constraints on PDF determinations has been considered previously, both on general grounds [6,7], and when considering a recent measurement performed by the CMS collaboration [8]. Both of these studies, which are based on NLO QCD, show the potential of these measurements. At the same time, they also stress the importance of including the full NNLO QCD corrections to the Z-boson transverse momentum distribution in order to fully exploit the constraining power of the data.
In present global PDF determinations, the gluon distribution at medium and large x is primarily constrained by the inclusive-jet p T spectrum measurements. The full NNLO prediction for this observable has been recently calculated in the leading-color approximation [9], but results have not yet been made available for all jet data sets included in PDF fits. This deficiency motivates the study of other cross sections known to NNLO for this purpose, such as the Z-boson p T spectrum, or top-pair production. For the latter, studies have appeared that explored in great detail the possibility of making use of the total cross section [10,11] and more recently of the differential distribution [12] measurements. In particular, it was shown that differential distributions from top-pair production provide significant constraints on the large-x gluon that are comparable to those obtained from inclusive jet production data.
The importance of including NNLO corrections is especially clear in the case of the Z-boson transverse momentum distribution given the recent experimental progress in measuring this observable. The data sets from the 7 and 8 TeV LHC runs from both ATLAS and CMS feature percent-level experimental errors, clearly requiring predictions beyond NLO in order to achieve a comparable theoretical precision.
It is our intent in this manuscript to investigate the inclusion of the p Z T data from the LHC into a global PDF fit. We perform this study in a framework based on the NNPDF3.0 global analysis [13]. The data sets we consider in our work are the 7 TeV measurement of the Z-boson p T by the ATLAS collaboration [14], and the 8 TeV measurements from both ATLAS and CMS [15,16]. These data sets include doubly-differential distributions in both the rapidity and invariant mass of the lepton pair coming from the Z-boson decay. Our theoretical predictions are based on the NNLO QCD calculation of ref. [17]. We also study the impact of including approximate NLO electroweak corrections, as described later in the text. The major findings of our study are summarized below.
• The inclusion of the NNLO QCD corrections generally improves the agreement of theory with the experimental data. This conclusion is consistent with previous observations [18,19]. The simultaneous inclusion of the NLO electroweak contributions together with NNLO QCD, done here for the first time, further improves the data/theory agreement at high p T .

JHEP07(2017)130
• The experimental errors, particularly in the higher-luminosity 8 TeV measurements from ATLAS and CMS, have dropped to the percent level. With the data becoming so precise, a very careful accounting of both experimental and theoretical errors is needed. We observe difficulties in fitting the data without the introduction of an additional uncorrelated error in the fit. This can come from a combination of Monte Carlo integration errors on the theory calculation, residual theoretical uncertainties in the prediction, or from underestimated experimental errors. We expect this issue to become increasingly prevalent in future PDF fits as data becomes more precise.
• We observe difficulties when attempting to simultaneously fit the 7 TeV and 8 TeV LHC data. The ATLAS 7 TeV data is provided only in terms of normalized distributions, while the 8 TeV measurements are also provided as absolute, unnormalized distributions. The normalization to the fiducial cross section performed for the AT-LAS 7 TeV data introduces correlations between the low-p Z T bins and the p Z T > 30 GeV region to which we must restrict our fit due to the appearance of large logarithms in the low-p Z T region that require resummation. The covariance matrix provided for the whole data set then turns out to be incorrect when used for fitting a subset of the data. This prevents us from consistently including the ATLAS 7 TeV data in the fit. To validate this hypothesis, in section 5.3 we perform a fit including the normalized ATLAS 8 TeV data rather than the unnormalized ones but, in analogy to what is done for the 7 TeV data, using the covariance matrix provided for the whole data set, and explore the differences in the fit results. It would be interesting to revisit this issue if the unnormalized data for the 7 TeV measurement were released or if the experimental covariance matrix for the p Z T > 30 GeV region was available. Attempting to include resummed predictions for the low-p Z T region is also possible, although this would introduce additional theoretical uncertainties.
• When adding the 8 TeV LHC Z-boson p T data to the global NNPDF3.0-like fit, we observe a significant decrease of the gluon PDF uncertainty in the Bjorken-x region 10 −3 to 10 −1 as well as a reduction of the uncertainty for light quarks. This leads to a reduction of the PDF uncertainty on the gluon-fusion and Vector Boson Fusion (VBF) Higgs boson cross section of 30%, while the central value prediction for both processes increases by roughly 1%.
Our manuscript is organized as follows. In section 2 we describe the experimental measurements of p Z T that we include in our fit. We also present the baseline fits that do not include these data that we use to assess their impact. In section 3 we discuss the details of the theoretical calculation and settings that we use in the fit. We give a comparison of theory with the p Z T data in section 4. We discuss the agreement observed upon using NLO QCD, NNLO QCD or a combined NNLO QCD + NLO EW prediction, and also consider several different global PDF sets. Our fit to the p Z T data and several baseline fits is described in section 5. We briefly discuss the phenomenological impact of the new fits on the Higgs cross section in section 6. Finally, we conclude in section 7.

JHEP07(2017)130
2 Description of the experimental data and fit settings In this section we first discuss the features of the available experimental measurements. We then describe the methodology and settings of our fit to the parton distribution functions including these data.

p Z
T measurements from the LHC In this work we consider the most recent differential cross section measurements of the Zboson transverse momentum spectrum from ATLAS [14,15] and CMS [16], both with √ s = 7 TeV and √ s = 8 TeV . The ATLAS measurement of the Z-boson transverse momentum spectrum at the centre-of-mass energy of √ s=7 TeV [14] is performed in the Z → e + e − and Z → µ + µ − channels, using data based on an integrated luminosity of 4.7 fb −1 . The results from each channel are combined for transverse momenta up to 800 GeV. The measurement is provided both inclusive in the Z-boson rapidity up to 2.4, and separated into three rapidity bins: 0.0 < |y Z | < 1.0, 1.0 < |y Z | < 2.0 and 2.0 < |y Z | < 2.4. In order to maximize the constraints on PDFs, we include the data in the three exclusive rapidity bins in our analysis. In the experimental paper only the normalized distributions are provided. The measurement is very accurate, with statistical and systematical uncertainties below 1% in all p Z T bins up to 150 GeV and for central rapidities (|y Z | < 2.0), and about 3% for the largest rapidity bin.
In the ATLAS measurement at √ s = 8 TeV [15], the transverse momentum distribution is based on the full 8 TeV data set, with 20.3 fb −1 of integrated luminosity. Measurements are performed in the electron-pair and muon-pair channels and then combined. Compared to the 7 TeV measurement [14], this measurement has higher statistics and an improved control of experimental systematics. Measurements are performed in six invariant mass bins: four bins at low invariant mass below the Z-peak, one on-peak invariant mass bin, and one bin at high invariant mass above the Z-peak, reaching up to M ll = 150 GeV. Results for the off-peak bins are provided in one inclusive rapidity bin (0.0 < |y Z | < 2.4), while the Z-peak measurement results are given both inclusive over the whole rapidity range 0.0 < |y Z | < 2.4 and separated in six rapidity bins 0.0 < |y Z | < 0.4, 0.4 < |y Z | < 0.8, 0.8 < |y Z | < 1.2, 1.2 < |y Z | < 1.6, 1.6 < |y Z | < 2.0 and 2.0 < |y Z | < 2.4. Again, in order to maximize the constraints on PDF, we include the on-peak exclusive rapidity bins in our analysis.
The measurement by the CMS collaboration at the center-of-mass energy √ s = 8 TeV [16] is performed differentially in five rapidity bins: 0.0 < |y Z | < 0.4, 0.4 < |y Z | < 0.8, 0.8 < |y Z | < 1.2, 1.2 < |y Z | < 1.6 and 1.6 < |y Z | < 2.0. The analysis uses the data sample of pp collisions collected with the CMS detector at the LHC in 2012, which corresponds to an integrated luminosity of 19.7 fb −1 . The Z-boson is identified via its decay to a pair of muons. We only include the measurements exclusive in the muon rapidities up to |y Z | = 1.6, given that the data in the highest rapidity bin display a significant incompatibility with respect to the corresponding ATLAS measurement. We leave this issue to further investigation by the experimental collaborations.

Settings for the PDF analysis
The PDF fits presented in this work are based on the NNPDF3.0 global analysis [13] framework. As in the NNPDF3.0 fit, both PDF evolution and DIS structure functions are evaluated in the fit using the public APFEL library [20][21][22], with heavy-quark structure functions computed in the FONLL-C general-mass variable-flavor-number scheme [23] with pole masses and with up to n f =5 active flavors. The DGLAP evolution equations are solved up to NNLO using a truncated solution, and the input parametrization scale is taken to be Q 0 = 1 GeV. The strong coupling α s is set to α s (M Z ) = 0.118, in accordance with the PDG average [24]. The charm and bottom PDFs are generated perturbatively from light quarks and gluons and the value of the heavy-quark masses are set to m c = 1.51 GeV and m b = 4.92 GeV, corresponding to the values recommended by the Higgs Cross section Working Group [25]. Note that these values are different from the ones used in NNPDF3.0, which were instead set to the PDG value of the MS masses. These values will be used in the forthcoming NNPDF3.1 release [26]. The dependence of the fit on the values of the heavy quark masses is moderate, and in particular is negligible for the observables under consideration.
In the analysis performed in this work, we consider two baseline data sets. One consists of all available HERA deep inelastic scattering (DIS) data. An important difference with respect to the NNPDF3.0 HERA-only baseline is that the HERA inclusive structure function data, which in NNPDF3.0 were separated into the HERA-II measurements from H1 and ZEUS [27][28][29], have been replaced by the HERA legacy combination [30] that has become available recently. This data is supplemented by the combined measurements of the charm production cross section σ red cc [31], and the H1 and ZEUS measurement of the bottom structure function F b 2 (x, Q 2 ) [32,33]. The other baseline, a global one, contains all data mentioned in the paragraph above along with the other data analyzed in the NNPDF3.0 global fit: fixed-target neutral-current DIS structure functions from NMC [34,35], BCDMS [36,37], and SLAC [38]; charged-current structure functions from CHORUS inclusive neutrino DIS [39] and from NuTeV dimuon production data [40,41]; fixed-target E605 [42] and E866 [43][44][45] DY production data; Tevatron collider data including the CDF [46] and D0 [47] Z rapidity distributions; and LHC collider data including ATLAS [48][49][50], CMS [51][52][53][54] and LHCb [55, 56] vector boson production measurements, adding up to a total of N dat = 3530 data points. A further difference from the global baseline (on top of the use of the HERA combined measurements) is that in order to ensure a consistent treatment of NNLO corrections, we exclude jet production measurements [57-60] from the global baseline data set. Only the leading color approximation has been made available at NNLO for this process [9] and K-factors are not yet available for all data sets included in global PDF determinations.

Description of the theoretical calculation
For our study we have calculated the Z-boson transverse momentum distribution through next-to-next-to-leading order in perturbative QCD. This computation uses a recent result for the related process of Z-boson in association with a jet [17,61] based on the N -jettiness JHEP07(2017)130 subtraction scheme for NNLO calculations [62][63][64]. As the Z-boson obtains its transverse momentum through recoil against jets, these two processes are identical in perturbation theory as long as the cuts on the final-state jets are relaxed sufficiently so that the entire hadronic phase space is integrated over for the Z-boson p T values under consideration. Since at most three jets can recoil against the Z-boson at NNLO, we take the lower cut on the leading-jet p T to be less than 1/3 times the lowest Z-boson p T included in our study. We have confirmed that our predictions are not sensitive to the exact choice of this jet cut. We furthermore remove completely any constraints on the pseudorapidities of final-state jets. We note that the low transverse momentum region of Z-boson production requires the resummation of large logarithmic corrections of the form (α s ln 2 (M Z /p Z T )) n to all orders in perturbation theory for a proper theoretical description. This resummation is not present in our fixed-order calculation. We consequently restrict our attention to the region p Z T > 30 GeV when comparing our predictions to the experimental data. In section 5.3 we study the effect of raising the cut on p Z T to 50 GeV and observe that results are stable upon the choice of the p Z T cut. We compare the theoretical predictions against both the unnormalized p T spectra provided by the 8 TeV ATLAS and CMS measurements, and also to the distributions normalized to the fiducial Z-boson production cross section provided by the 7 TeV ATLAS measurement. For the normalized distributions we compute the fiducial Z-boson production cross section using the N -jettiness subtraction scheme as implemented in MCFM v8.0 [65]. We cross-check this result against FEWZ [3,5]. For the normalized distributions we do not expand the ratio in the strong coupling constant; i.e., we compute both the numerator and denominator through relative O(α 2 s ). We make the following choices for the electroweak input parameters in our calculation: We use the G µ electroweak renormalization scheme. All other couplings are therefore derived using the parameters above, including the electromagnetic couplings and the weak mixing angle. We choose the following dynamical scale choices for both the renormalization and factorization scales: Here, M ll denotes the invariant mass of the final-state lepton pair. We note that our calculation includes both the Z-boson production and decay to lepton pairs, the contribution from virtual photons, as well as all interferences. The residual theoretical uncertainty on the prediction as estimated by independently varying µ R and µ F around this central value is at the few-percent level. As we will see later it is also important when describing the high-p T data to include the effect of electroweak perturbative corrections. The exact NLO electroweak corrections to the Z-boson transverse momentum spectrum, including the leptonic decay of the Z boson, are known in the literature [66][67][68]. However, they are not publicly available in the form of a numerical code. To account for their effect in our calculation we instead utilize JHEP07(2017)130 the approximate expressions presented in refs. [69,70]. These include all one-loop weak corrections up to terms power-suppressed by the ratio M 2 Z /((p Z T ) 2 + M 2 Z ), and additionally the leading two-loop electroweak Sudakov logarithms. These expressions are strictly valid only after inclusive integration over the final-state lepton phase space; we apply them also to the cross sections with fiducial cuts on the leptons. For the Z-boson peak region in 8 TeV collisions we have checked that these approximations reproduce the numerical magnitude of the exact electroweak corrections to 2% or better in the high-p Z T range where the EW effects become relevant. Since the electroweak corrections themselves do not exceed 10% for the entire region studied, this furnishes an approximation to the distributions we study that is good to the few-per-mille level or better, which is sufficient for our purposes 1 When we study normalized distributions, the NLO electroweak corrections to the fiducial Zboson cross section are obtained from FEWZ [71]. To combine the electroweak and QCD corrections we assume that the two effects factorize, leading to a multiplicative combination. Denoting the differential cross sections at the m-th order in the strong coupling constant relative to the LO result and the n-th order in the QED coupling constant relative to the LO result as dσ (m,n) , we assume that This factorization of the electroweak and QCD corrections is supported by a calculation of the dominant mixed O(αα s ) corrections in the resonance region [72]. The experimental errors in the Z-peak region have reached an unprecedented level for a high-energy collider experiment, approaching the per-mille level over two orders of magnitude in transverse momentum. Numerous effects that were previously not relevant may now come into play, and it is worthwhile to briefly discuss the theoretical issues that arise when attempting to reach this precision. While we can not currently address these issues, they should be kept in mind when considering these data sets.
• The uncalculated N 3 LO perturbative QCD corrections may be needed to further improve the agreement between theory and experimental data. As we will see in a later section the theoretical predictions are generally below the experimental measurements. The inclusion of the NNLO corrections greatly improves the agreement between theory and experiment, but one may expect a further increase from the N 3 LO corrections.
• The electroweak corrections become important for p Z T ∼ 100 GeV, reaching the percent level at this point and continuing to grow as p Z T is increased. While we assume that the electroweak and QCD corrections factorize, this assumption should be addressed, particularly in the high-p Z T region. Non-factorizing O(αα s ) effects could conceivably affect the cross section at the percent level.

JHEP07(2017)130
• Finally, at this level of precision non-perturbative QCD effects that shift the p Z T distribution must be considered. 2 Since the Z-boson transverse momentum distribution is generated by recoil against a final-state jet, there may be linear non-perturbative power correction of the form Λ QCD /p Z T . Simple Monte Carlo estimates indicate that this could reach the half-per-cent level [73].
We note that a previous study of the Z-boson p T spectrum at NNLO was performed in the literature [18]. A comparison of the underlying Z+jet prediction of this calculation with the one used here was performed in ref. [17], where per-mille agreement between the results was found. In addition we have compared the K-factors from the plots of ref. [18] with our results. We have found agreement to the percent-level or better for the studied p Z T bins.

Comparison of theory with LHC data
In this section we compare the theoretical predictions for the p Z T spectrum to the experimental measurements described in section 2. We assess the impact of NNLO QCD and NLO electroweak corrections and quantify the agreement between data and theory by computing the fully-correlated χ 2 for each of the experiments that we include in our analysis using as input the most recent public releases of four PDF determinations: ABMP16 [74] CT14 [75], MMHT2014 [76] and NNPDF3.0 [13].
In figure 1 we compare the NLO and NNLO predictions to the experimental measurements performed by the 7 TeV ATLAS measurements, described in ref. [14], after imposing the additional cut of p Z T > 30 GeV discussed earlier. We also include the NLO EW corrections as described in section 3. All three rapidity bins measured by ATLAS are shown.
We observe that the NNLO corrections significantly increase the NLO predictions, bringing them closer to the measured values of the distribution. The NNLO corrections are approximately constant as a function of p Z T . The EW corrections become significant only for the last three p Z T bins. The quantitative agreement with the theory is summarized in table 1, in which the fully-correlated χ 2 is provided, for each bin separately and for the three bins together.
For MMHT2014, CT14 and NNPDF3.0 the agreement is improved for central rapidities after the inclusion of NNLO QCD corrections, with a further improvement observed upon including NLO electroweak corrections. For ABMP16 only the NNLO fit is available, so in this case we can only test that the agreement is improved upon adding electroweak corrections. In the highest rapidity bin this improvement is only observed for NNPDF3.0. The CT14 χ 2 d.o.f. remains unchanged after including NNLO QCD+NLO electroweak, while the result for MMHT2014 becomes slightly worse. For all PDF sets the χ 2 d.o.f. is much larger than one, indicating a poor agreement between theory and data (before the fit) even after including higher-order corrections.
In figures 2 and 3 a similar comparison is performed for the off Z-peak bins of the 8 TeV ATLAS measurement [15]. the NLO result that is approximately independent of p Z T , with NLO electroweak corrections causing a approximatively constant upwards (downwards) shift for the bins below (above) the Z-peak. While the NNLO predictions are in better agreement with the data than the NLO ones, the data are again higher than the theoretical predictions. The quantitative comparison of the NNPDF3.0, MMHT2014, CT14 and ABMP16 PDF sets using the χ 2 defined previously is shown in table 2. In all cases an improvement is seen upon inclusion of the NNLO QCD corrections, while the incorporation of the NLO electroweak corrections as well further improves the agreement in all individual bins below the Z peak.
We next consider the 8 TeV ATLAS data on the Z-peak divided into rapidity bins.

JHEP07(2017)130
Bin Order upon NLO for five of the six bins, while for MMHT the improvement is only observed for two bins. One reason that the inclusion of the NNLO corrections does not improve the theory/data agreement as significantly as in the other data sets is because the experimental error in this case is very small, and is dominated by the correlated systematic error. Even if NNLO reduces the normalization difference between theory and experiment, remaining shape differences between the predictions and data prevent a large improvement in χ 2 d.o.f. from being obtained. This issue will arise again when we attempt to add this data set to the PDF fit.
Finally, in figures 6 and 7, we show the comparison of the various theoretical predictions with the CMS 8 TeV data divided into rapidity bins [16]. The χ 2 d.o.f. is shown in table 4. As discussed when describing the data in section 2, we focus on the region |y Z | < 1.6. Including NNLO corrections improves the agreement between theory and data in all four rapidity bins, while adding NLO EW corrections further improves the comparison in all but the highest rapidity bins. We note that the CMS relative errors are larger than those found by ATLAS, and the issues seen in the χ 2 d.o.f. comparison are not as pronounced as for the ATLAS 8 TeV data set. Interestingly, even though each individual rapidity bin is improved upon including NNLO, the χ 2 d.o.f. combining all bins is slightly worsened at NNLO, again showing the impact of the correlated uncertainties when attempting to describe these very precise data sets. Fitting the data modifies the PDF shape, thus significantly improving the data description.

JHEP07(2017)130
Bin Order    figure 1 for the ATLAS 8 TeV on-peak data divided into rapidity bins [15]. The three lowest rapidity bins are displayed.    Figure 6. Same as figure 1 for the CMS on-peak 8 TeV data divided into rapidity bins [16]. Only the total uncertainty of data points is displayed, given that separate statistical and uncorrelated uncertainties are not available.

JHEP07(2017)130
Bin Order  Table 3. Same as table 1 for the ATLAS 8 TeV p Z T on-peak distributions in the separate rapidity bins before their inclusion in the fit.

Inclusion of the p Z T distribution in PDF fits
In this section we first look at the correlation between the measured distributions and the various PDF combinations, which provides a first intuition for what parton distributions and at what value of x we should expect to observe the largest impact when including these data in the fit. We then add each data set separately to a DIS HERA-only fit to determine basic compatibility of different data sets and to assess the impact of including EW corrections. Finally, we perform a fit adding p Z T data to a global data set to estimate the impact of including these data in a realistic PDF determination.

Correlations between PDFs and p Z T measurements
To determine the specific PDFs and regions in x for which the Z-boson transverse momentum distribution measurements from ATLAS and CMS provide the most stringent constraints we study the correlation coefficient as a function of x (ρ(x)), between PDFs   at a given scale Q and each bin of the measurements included in the present analysis. In figure 8 we plot the correlations, computed using the SMPDF code [77], of the gluon, up-quark and down-quark distributions with the lowest invariant mass bin of the ATLAS 8 TeV measurement, and with the on-peak 8 TeV measurement of ATLAS, for the lowest rapidity bin. Each line corresponds to one p Z T bin. These are representative examples, the pattern of correlations found for the other measurements is similar. We observe a strong correlation between the gluon distribution in the region x ≈ 10 −3 − 10 −2 with the p Z T measurements, with the correlation coefficient reaching nearly 90%. Slightly weaker correlations of approximately 60% are found for the up-quark and down-quark distributions. These plots make it clear that these data sets have a strong potential to improve our knowledge of PDFs in the 10 −3 − 10 −2 region. The largest p Z T bins are correlated to the 10 −2 − 10 −1 region, thus an increase in the experimental statistics in that region would provide a stronger constraint also in the large-x region.

Impact of the p Z
T data on a DIS HERA-only fit We begin by assessing the quality of a fit to the HERA DIS data upon inclusion of the available p Z T data at 8 TeV. The inclusion of the normalized ATLAS 7 TeV data is problematic and we discuss it separately in section 5.3. We perform several fits that add the individual ATLAS and CMS data sets to HERA separately and in various combinations. As discussed in previous sections we impose the following cuts on the p Z T data:
These constraints leave us with 60 data points for the ATLAS 8 TeV doubly-differential distributions in rapidity and p T on the Z-peak, 44 data points for the ATLAS 8 TeV doubly-differential distributions in the dilepton invariant mass and p T , and 36 data points for the CMS 8 TeV doubly-differential distributions in rapidity and p T on the Z-peak. Additionally, we consider fits using pure NNLO QCD theory and fits with NNLO QCD and NLO EW corrections combined. In the pure NNLO fits we remove the p Z T bins for which the EW corrections are larger than the sum quadrature of the statistical and uncorrelated systematic uncertainty of that data point to avoid fitting EW effects. 3 This imposes the additional constraints These cuts reduce the number of data points to 48 for the ATLAS 8 TeV doubly-differential distributions in rapidity and p T on the Z-peak, 44 data points for the ATLAS 8 TeV doublydifferential distributions in invariant mass and p T , and 28 data points for the CMS 8 TeV doubly-differential distributions in rapidity and p T in the Z-peak region.
Since we have considered numerous combinations of the available data and several different settings, we begin by summarizing the fits in table 5. These are labelled (a)-(j). Our baseline fit with only HERA data is labelled (a). Fits (b) and (c) add individually the ATLAS 8 TeV data and CMS 8 TeV data sets. Fit (d) adds all 8 TeV data sets. A new feature we find necessary in our analysis is the inclusion of an additional uncorrelated uncertainty. This uncertainty is due primarily to the Monte-Carlo integration uncertainties on the computationally expensive NNLO theoretical calculation. Adopting the methodology of ref. [78], we have compared the NNLO/NLO K-factors to a smooth interpolation of them. We find deviations from the smooth interpolation that reach 0.5%. A conservative estimate of this additional uncertainty, which is uncorrelated between bins, is 1%. The need and approximate size of this contribution to the uncertainty can be inferred from an JHEP07(2017)130 HERA y y y y y y y y y y ATLAS8TEV n y(1%) n y(1%) y(0.5%) n y(0.5%) y n y CMS8TEV n n y(1%) y(1%) n y(0.5%) y(0.5%) n y y Table 5. Overview of fits run with HERA-only as a baseline. For each fit, we indicate which measurements from ATLAS and CMS has been included, whether an uncorrelated uncertainty has been added to the χ 2 (in brackets unless it is set to 0). analysis based on modelling the NLO and NNLO theoretical predictions and their fluctuations along the lines of the one described for inclusive jet production in [78]. The addition of this new effect is needed to obtain a good χ 2 in our fit, as shown later in this section.
To study the stability of our fit with respect to this uncertainty we consider the values 0%, 0.5%, and 1%. Fits (b)-(d) use a 1% uncorrelated uncertainty, while fits (e)-(g) use 0.5%. This uncertainty is removed in fits (h)-(j). We will see later that the fitted PDFs are insensitive to the value of this parameter. • The addition of ∆ improves the description of the ATLAS 8 TeV on-peak and CMS 8 TeV data. The χ 2 d.o.f. decreases from 1.66 to 0.77 for the ATLAS 8 TeV set and from 2.51 to 1.21 for the CMS 8 TeV set as ∆ is changed from 0% to 1% in the baseline fit. This effect is less noticeable for the invariant-mass binned ATLAS data due to the slightly larger errors for this set.
• Comparing fit (b) (where only the ATLAS 8 TeV data is fit along with HERA) to fit (c) (where only the CMS 8 TeV data is fit together with HERA) shows that the ATLAS 8 TeV data is slightly more consistent with HERA than CMS. The χ 2 d.o.f. is below one for the ATLAS sets in fit (b) after including them in the fit, while it is at 1.21 in (c) when CMS is combined with HERA.
• Fit (d) shows that it is possible to obtain a reasonably good fit of ATLAS 8 TeV data, CMS 8 TeV data, and HERA with the inclusion of a ∆ = 1% additional uncorrelated uncertainty. Reducing this uncertainty to 0.5% in fit (g) leads to a noticeably worse description of the CMS data. Both the CMS and on-peak ATLAS 8 TeV data sets get a worse χ 2 d.o.f. if ∆ is removed completely, as in fit (j).
• It is clear from the We now study the implications of these fits for the PDF sets. All plots have been done by using the on-line interface of APFEL [20]. We consider the gluon and the singlet-quark combination. To avoid too large a proliferation of plots we focus on the ∆ = 1% and ∆ = 0% cases. In figure 9 we display the impact of the inclusion of these data on the gluon and singlet-quark PDFs by adding them with an additional uncertainty ∆ = 1%. As can be seen from the upper left panel of figure 9, including either the ATLAS 8 TeV and CMS 8 TeV data sets leads to a gluon consistent with the HERA result but with a slightly smaller uncertainty. The upper right panel shows that HERA+8 TeV gives a gluon similar to HERA-only but with a significantly smaller uncertainty for x > 10 −3 .
The situation for the singlet-quark distribution is similar. However the ATLAS and CMS data seem to pull in slightly different directions, the former preferring a harder singlet in the x = 10 −1 region, as it can be observed in the lower-left panel. The lower-right panel shows that the ATLAS data have a stronger pull in the fit and that the simultaneous inclusion of the ATLAS and CMS data at 8 TeV leads to a significantly reduced uncertainty.
The effects of the ∆ = 1% fits on the down-quark and up-quark distributions is similar to the effect on the singlet and thus is not shown here: the PDF errors when HERA and the 8 TeV data sets are simultaneously fit decreases significantly for both the up and down distributions.
In figure 10 we show the results for the PDFs assuming no additional uncertainty, ∆ = 0%. The observed patterns of PDF shifts when 8 TeV data sets are included is very similar to those seen for ∆ = 1%, with only small differences in the estimated PDF errors in certain x regions.           Figure 10. Impact of the inclusion of 8 TeV Z pT data with ∆ = 0% error on the gluon in a HERA-only fit.

JHEP07(2017)130
HERA y y y y y y y ATLAS7TEV n y(1%) y(1%) y(0.5%) y(0.5%) y y ATLAS8TEV n n y(1%) n y(0.5%) n y CMS8TEV n n y(1%) n y(0.5%) n y Table 7. Overview of the fits run with HERA-only as a baseline including the normalized ATLAS 7 TeV along with the other data sets. For each fit, we indicate which measurements from ATLAS and CMS has been included, whether an uncorrelated uncertainty has been added to the χ 2 (in brackets unless it is set to 0).

Normalized versus unnormalized distributions
In this section we focus on the inclusion of the normalized ATLAS 7 TeV data and give details on the tension we observe with the 8 TeV data. We consider a NNLO fit, applying the following cuts where the latter is motivated by the fact that in the last p Z T bin the EW corrections are larger than the sum in quadrature of the statistical and uncorrelated systematic uncertainties of the data. We are left then with 39 data points for the ATLAS 7 TeV distribution.
We summarize the fits in table 7. These are labelled (k)-(p). The baseline is the same as the one presented in the previous section. Fits (k), (m) and (o) add individually the ATLAS 7 TeV data by adding an uncorrelated uncertainty of 1%, 0.5% and none respectively. Fits (l), (n) and (p) add them along with the unnormalized ATLAS and CMS data at 8 TeV with an extra uncorrelated uncertainty of 1%, 0.5% and none respectively.
The results of fits (k)-(p) are summarized in table 8. For each fit the χ 2 per degree of freedom (χ 2 d.o.f. ) of the experiments included in the fit, and of the prediction for the observables not included in the fit (in brackets), are displayed. The additional uncorrelated uncertainty added to the fit is denoted by ∆. Again, we have repeated the baseline HERAonly fit (a) at the beginning of each table section for ease of comparison. A few things are apparent from the table.
• The ATLAS 7 TeV data is inconsistent with the HERA-only fit, with a χ 2 d.o.f. over 20 regardless of the ∆ chosen. A primary reason for this is that the ATLAS 7 TeV data is normalized to the fiducial cross section in each rapidity bin, while the 8 TeV data sets are unnormalized. The normalization performed for the ATLAS 7 TeV data introduces correlations between the low-p Z T bins and the p Z T > 30 GeV region to which we must restrict our fit due to the theoretical considerations discussed earlier. Due to this cut on the data the covariance matrix provided by the experiments for the whole data set cannot be used to consistently include the 7 TeV data in the fit. It would be interesting to revisit this issue if the unnormalized data became available. • Studying fits (l), (n) and (p) shows that it is hard to simultaneously fit the ATLAS 7 TeV data with the 8 TeV data sets. In We now study the implications of these fits for the PDF sets. We consider the gluon, up-quark and down-quark distributions and focus on the ∆ = 1% case only, as we have seen that PDFs remain basically unchanged upon a reduction of ∆. In figure 11 we display the impact of the inclusion of these data on the gluon, up and down quark PDFs by adding them with an additional uncertainty ∆ = 1%. An important feature of these plots is the difference between the impact of the ATLAS 7 TeV data on the gluon, compared to the impact of the 8 TeV data sets. As can be seen from the upper left panel of figure 11, including either the ATLAS 8 TeV and CMS 8 TeV data sets leads to a gluon consistent with the HERA result but with a slightly smaller uncertainty. Adding the ATLAS 7 TeV data leads to an increased gluon distribution for x > 5 · 10 −3 . The upper right panel shows that HERA+8 TeV gives a gluon similar to HERA-only but with a significantly smaller uncertainty for x > 10 −2 . Attempting to fit both 7 TeV and ATLAS 8 TeV data leads to an increased uncertainty, which is barely visible. The tension present between the ATLAS 7 TeV data, and the combined HERA+8 TeV data observed for the gluon PDF is also observed for the up and down distributions. The middle right panel shows that the error on the up-quark PDF is greatly increased for x ≈ 10 −3 when we attempt to simultaneously fit all data. The reason for this can be seen from the left middle panel. The ATLAS 7 TeV data prefers a peak in the up-quark distribution at this value. In contrast, the upper right panel shows a decrease in the PDF error when HERA and the 8 TeV data sets are simultaneously fit. A similar pattern is observed for the down-quark distribution, as is shown in the lower two panels of figure 11. In order to confirm that the anomalous behaviour of PDFs upon the inclusion of the 7 TeV data is due to the fact that they are normalized, we notice that for the ATLAS 8 TeV data in the Z-peak region both an absolute and a normalized measurement are available, with the respective properly determined covariance matrices made available. We can therefore perform an additional fit including the normalized data in a HERA-only fit, treating them in the same way we did with the 7 TeV data, and compare the results with the ones of the fit including the 7 TeV normalized data and the 8 TeV absolute ones.
As far as the quality of the fit is concerned, we observe that these data are harder to fit than both the 7 TeV normalized and more significantly the corresponding 8 TeV absolute ones, with the obtained χ 2 d.o.f. after fitting ranging from 9 (for a fit with ∆ = 0%) to 2.1 (for a fit with ∆ = 1%). While the worst fit quality with respects to the 7 TeV data could be attributed to the higher precision of the 8 TeV ones, it is difficult to find an JHEP07(2017)130 explanation for the differences between the fits including the absolute and the normalized data if one assumes that experimental uncertainties are properly treated in both of them. As far as PDFs are concerned, in figure 12 it is apparent that, while the inclusion of the on-peak ATLAS 8 TeV unnormalized data reduces the uncertainty of the gluon and upquark distributions, the inclusion of the on-peak ATLAS 8 TeV normalized data inflates their uncertainties, thus pointing to their inconsistency with respect to the baseline.
In the case of normalized distributions imposing cuts to remove the low p Z T bins that are affected by large higher order corrections, due to the normalization of the data to the corresponding fiducial cross-section, induces a non trivial change in the covariance matrix is not taken into account by simply dropping the corresponding lines and columns and would need information that is not available outside the experimental collaborations. We, therefore, conclude that we cannot properly include in the fit the normalized data in their present format.

Impact of NLO EW corrections
Another interesting aspect that we can investigate is the impact of electroweak corrections on the obtained PDFs. To probe this we perform fits to the HERA and 8 TeV data sets, with NNLO QCD corrections and both with and without EW corrections. We recall that in the pure NNLO QCD fit we remove bins where the EW corrections are larger than the combined uncorrelated uncertainty, as explained previously. We first display the gluon, singlet, down-quark and up-quark distributions with and without EW corrections in figure 13. The EW corrections have a small but noticeable effect on the PDFs, lowering both the gluon and singlet distributions in the intermediate-x regions. The χ 2 d.o.f. is shown in table 9. The quality of the fit deteriorates slightly upon including EW corrections. This results primarily not because EW corrections worsen the agreement between theory and data, but because with EW corrections included we are able to include additional high-p Z T bins in the fit that were excluded in the pure NNLO QCD fit, and these bins are slightly more discrepant than the lower-p Z T ones. The agreement with the 7 TeV data is marginally improved upon including EW corrections, although it is still inconsistent with the HERA+8 TeV combined fit.  Table 9. Fully correlated χ 2 for the experiments in the HERA + p Z T 8 TeV fit.  Figure 13. Impact of the inclusion of 8 TeV p Z T data with ∆ = 1% PDFs using NNLO or NNLO+EW theory.

Impact of the p Z T data on a global fit
Having investigated the impact of the LHC p Z T data in a fit consisting of only HERA data, which allowed us to consider several aspects of this new data in detail, we turn to their inclusion in a global fit of the available measurements. We follow the NNPDF3.0 analysis with the modifications explained in section 2.2. We set the additional uncorrelated error to ∆ = 1%, and, having established that we cannot consistently include the normalized 7 TeV data in a PDF fit, we only add the unnormalized 8 TeV data to the global baseline. The results for the χ 2 per degree of freedom of each fit is shown in In figure 14 we display the agreement of the NNLO predictions and the data before and after the fit. We observe that the agreement improves and uncertainties shrink.  Table 10. Fully-correlated χ 2 per degree of freedom when the p Z T data is added to the global fits. The numbers in brackets correspond to the χ 2 for experiments which are not fitted. The total χ 2 is computed over all data in the baseline fit and the included p Z T distributions. We have labeled our slightly-modified NNPDF3.0 global baseline as NN30red in the table below.  Figure 14. p Z T observables computed at NNLO with input PDFs before and after the addition of the p Z T data in the global baseline.
In figures 15 and 16 we show the impact of the precise 8 TeV p Z T data on the various PDFs determined from the global fit of the available data. The observed shifts of the PDFs are similar to those seen in the HERA-only fit. The reduction of the uncertainty is milder but still significant. The new PDFs obtained after including the 8 TeV p Z T data are consistent with those found in the baseline.
It is interesting to compare our results with those presented in [12], in which a similar baseline was used and the impact of including top-pair production differential distributions in PDF fits was studied in detail for the first time. The gluon is pulled in the same direction by both data sets, thus displaying a perfect compatibility between these two complementary measurements. The inclusion of the p Z T data decreases the uncertainties on the gluon PDF more than the top-pair data in the intermediate-x region between 10 −3 and 10 −2 . The impact of the top-pair data is much stronger for x > 10 −2 . This result follows the correlation patterns presented in section 5.1 for p Z T and in [12] for top-quark differential distributions, from which it is clear that the latter are strongly correlated with the gluon in the large-x region, while the former are mostly correlated with the gluon (and slightly less with the light-quark distributions) in the intermediate-x region. Given that these two observables provide such strong and complementary constraints, we expect that their JHEP07(2017)130   Figure 15. Impact of the inclusion of the 8 TeV p Z T data on the global gluon and singlet-quark distributions. impact in a joint fit will be stronger than the impact of the jet data, which were traditionally thought to be the best probe of the gluon in the intermediate and large-x regions.
To conclude, we explore the stability of our results upon increasing the p Z T cut from 30 GeV to 50 GeV. As shown in figure 17, both the gluon and singlet central values are very stable, with uncertainties that are larger when a larger p Z T cut is used. We note that the number of p Z T data points in the fit decreases from 48 to 40 for the ATLAS 8 TeV on-peak data, from 44 to 36 for the ATLAS 8 TeV off-peak data and from 28 to 24 for the CMS 8 TeV on-peak data. Thus an increase in the PDF uncertainty when the cut is raised is expected. Everything else is consistent with expectations.

Phenomenological implications
Having derived a new global fit of PDFs with the 8 TeV p Z T data included, it is interesting to investigate the impact of these new measurements on quantities of phenomenological interest.
Parton luminosities directly show the impact of the inclusion of a given data set on the computation of processes. A comparison of the 13 TeV parton-parton luminosities before the p Z T data, and after including the unnormalized 8 TeV data, is presented in figure 18.   Figure 17. Impact of the choice of p Z T cut on the gluon and singlet-quark distributions.   Figure 18. Impact of the inclusion of p Z T data taken at 8 TeV on various parton-parton luminosities at LHC 13 TeV.
The uncertainties significantly decrease in all three luminosities, while their central values remain nearly the same as before.
Furthermore, we present below the 13 TeV predictions for both the gluon-fusion Higgs production cross section and the VBF Higgs production cross section before and after the inclusion of the p Z T data in our global baseline fit. For the gluon-fusion production cross section we set m H = 125 GeV and µ R = µ F = m H /2 and use the code ggHiggs v3.5 [79] to compute the result through N 3 LO in QCD perturbation theory [80]. The result below includes no charm or bottom quarks running in the loop, and no quark mass effects beyond leading order. As shown in table 11, the impact on the Higgs production cross section uncertainties is significant. The error on the gluon-fusion production cross section is reduced by 30%, following the corresponding improvement in the gluon-gluon luminosity observed in figure 18. The central value is increased by only 1%, indicating consistency with the cross section obtained using the previous global fit. For Higgs production in Vector Boson Fusion we compute the total cross section to N 3 LO in QCD using the proVBFH-inclusive code [81] based on the computation presented in [82,83].

Conclusions
In this manuscript we have included for the first time the precision p Z T measurements from the LHC into a global fit of parton distribution functions to next-to-next-to-leading order in QCD. This result is made possible by the recent theoretical predictions of this process 3.92 ± 0.06 (1.5%) 3.96 ± 0.04 (1.0%) Table 11. Predictions for the Higgs cross sections in 13 TeV pp collisions before and after inclusion of the p Z T data in the global fits. The indicated errors are the PDF errors computed according to the NNPDF prescription.
to the necessary order. We have performed a detailed study of the impact of various perturbative corrections, including higher-order QCD and electroweak corrections, on the agreement between theory and data. To asses in detail the impact of these new data we have tested the effect of adding them to several baseline fits, including a DIS HERA-only PDF determination and a global fit with settings closely following those of NNPDF3.0.
The major findings of our study are summarized below. In their current form the normalized ATLAS 7 TeV data cannot be fit simultaneously with the 8 TeV p Z T data. It also cannot be fit together with HERA data, nor in a global fit. The normalization performed on the 7 TeV data ties together the low and high p Z T regions. When we perform the fit on the high−p Z T region needed for a stable fixed-order QCD prediction, thus on a region in p Z T which is different from the one used to normalise the data, the correlations between the bins are lost. The inclusion of this data requires either the experimental covariance matrix for the p Z T > 30 GeV range only, the unnormalized data, or the inclusion of low-p Z T resummation in the theoretical prediction. This last option would introduce an additional theoretical uncertainty into the fit. The extreme precision of the 8 TeV p Z T data binned in rapidity, with uncertainties at the few-per-mille level for the majority of bins, necessitates the introduction of an additional uncorrelated uncertainty for a fit with a low χ 2 per degree of freedom. This additional parameter is meant to cover the residual theoretical uncertainty and the Monte-Carlo integration uncertainty on the theoretical prediction, as well as possible under-reported experimental errors. While the introduction of this extra uncertainty improves the χ 2 per degree of freedom of the fit, we have varied the chosen value of this parameter to check that it has little impact on the actual PDFs obtained from the fit.
Including the 8 TeV p Z T data into a global fit based on the NNPDF3.0 settings results in a significant reduction of the 13 TeV gluon-gluon, quark-gluon and quark-antiquark luminosity errors. To quantify this we have computed the gluon-fusion Higgs production using our NNPDF3.0 baseline, before and after including the p Z T data in the fit. We find that the PDF uncertainty on the Higgs cross section decreases by 30%, while the central value of the prediction increases by 1%, within the previously-estimated uncertainty. We caution that this quantitative estimate of uncertainty reduction holds upon including only the p Z T data into the NNPDF3.0 baseline fit. If additional data sets are included as well, these numbers will change. However, given the power of the p Z T data found in our study, we expect that future global fits using this data will observe similar results.