Parton Distribution Benchmarking with LHC Data

We present a detailed comparison of the most recent sets of NNLO PDFs from the ABM, CT, HERAPDF, MSTW and NNPDF collaborations. We compare parton distributions at low and high scales and parton luminosities relevant for LHC phenomenology. We study the PDF dependence of LHC benchmark inclusive cross sections and differential distributions for electroweak boson and jet production in the cases in which the experimental covariance matrix is available. We quantify the agreement between data and theory by computing the chi2 for each data set with all the various PDFs. PDF comparisons are performed consistently for common values of the strong coupling. We also present a benchmark comparison of jet production at the LHC, comparing the results from various available codes and scale settings. Finally, we discuss the implications of the updated NNLO PDF sets for the combined PDF+alphaS uncertainty in the gluon fusion Higgs production cross section.


Introduction
Parton distribution functions (PDFs) are one of the dominant sources of systematic uncertainty in many of the LHC cross sections relevant for Standard Model precision physics, Higgs boson characterization and new physics searches. The dependence of benchmark total cross sections on PDFs at the 7 TeV LHC was discussed in Refs. [1,2]. The purpose of the present paper is on the one hand to update these benchmark comparisons by including the most recent PDF sets from the various collaborations, and on the other hand to perform quantitative comparisons with 7 TeV data for differential distributions, and with 8 TeV data for inclusive cross sections.
There have been several new NNLO PDF releases since the previous benchmark studies [1]. The ABM collaboration have released ABM11 [3], which supersedes ABKM09 [4]. It uses the combined HERA-I data and $\overline{\rm MS}$ running heavy quark masses for DIS structure functions [5], and provides PDF sets for a range of values of α_s in a fixed flavor number scheme with N_f = 5. The CT collaboration have recently released a CT10 NNLO PDF set [?, 6], based on the same global dataset as CT10 NLO [7] and using an NNLO implementation of the S-ACOT-χ variable flavor number scheme for heavy quark structure functions [8]. The HERAPDF collaboration have released the HERAPDF1.5 NNLO PDF set [9,10], which in addition to the combined HERA-I dataset uses the inclusive HERA-II data from H1 [11] and ZEUS [12].¹ The latest release from NNPDF is the NNPDF2.3 set [13]. Like the previous NNPDF2.1 release, it uses the FONLL VFNS at NNLO [14], and it now also includes the relevant LHC data for which the experimental correlation matrix is available. It is currently the only set that includes LHC data in the fit.
As in previous benchmarks, we also use the MSTW08 NNLO PDFs [15]. Although no new public release has been provided, several partial updates have been presented, discussing the impact on the MSTW08 PDFs of the combined HERA-I data and the Tevatron W lepton asymmetry [16], of the LHC W lepton asymmetry data [17], and additionally of the ATLAS W, Z and inclusive jet data [18]. We do not include the JR09 PDF set [19] in this benchmark study because it is available only for a single value of α_s(M_Z). PDF sets will be compared consistently for a common value of α_s. All the PDF sets included in this benchmark comparison provide α_s(M_Z) variations in a relatively wide range, as summarized in Table 1. Unless otherwise specified, in the rest of the paper we always quote α_s at the scale Q = M_Z. We show results for PDFs, parton luminosities, physical cross sections and χ² values for α_s(M_Z) = 0.118 as a baseline, and whenever we want to study the effect of varying α_s we provide results for two values, α_s(M_Z) = 0.117 and 0.119. The motivation for this choice is that these values approximately bracket the current 2012 PDG best-fit value [20], α_s(M_Z) = 0.1184 ± 0.0007. They also include the preferred or best-fit α_s values of CT, MSTW and NNPDF at NNLO [6,21-23]. When error sets are only provided at a single value of α_s, we determine uncertainties at other values of α_s by computing the percentage uncertainty at the value of α_s at which error sets are provided, and then applying the same percentage uncertainty to the central value computed for the other α_s values.
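The percentage-uncertainty prescription described above can be illustrated with a minimal sketch. The function name and the numerical inputs are hypothetical and are not taken from the paper; only the rule itself (same fractional uncertainty, applied to a different central value) comes from the text.

```python
def rescaled_uncertainty(central_ref, sigma_ref, central_other):
    """Carry the fractional PDF uncertainty from the reference alpha_s
    (where error sets exist) over to the central value at another alpha_s."""
    if central_ref == 0:
        raise ValueError("relative uncertainty is undefined for a vanishing central value")
    return abs(central_other) * (sigma_ref / abs(central_ref))


# Hypothetical inputs: 20.0 +- 0.5 pb at the reference alpha_s, and a central
# value of 19.5 pb at a neighbouring alpha_s for which no error sets exist.
sigma_other = rescaled_uncertainty(20.0, 0.5, 19.5)
print(f"uncertainty at the other alpha_s: {sigma_other:.4f} pb")
```

The guard against a vanishing central value reflects the same concern that motivates treating the PDF plots differently: a fractional uncertainty loses its meaning when the quantity itself is close to zero.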
For the PDF plots of Sect. 2 only (but not for luminosities), the uncertainty shown for values of α_s for which error sets are not available is taken to be the absolute PDF uncertainty computed at the α_s value at which error sets are provided, because relative uncertainties on PDFs become meaningless in regions where the PDF is very close to zero.
Table 1: PDF sets used in this paper. We quote the value α_s^(0)(M_Z) for which PDF uncertainties are provided, and the range in α_s(M_Z) in which PDF central values are available (in steps of 0.001). For ABM11 the α_s(M_Z)-varying PDF sets are only available for the N_f = 5 PDF set.
The structure of this paper is the following. In Sect. 2 we compare the various sets of NNLO PDFs and the associated parton luminosities, and discuss the similarities and differences between the sets. In Sect. 3 we compute predictions for LHC inclusive cross sections at 8 TeV, including Higgs cross sections. In Sect. 4 we compare PDF predictions with all available LHC data at 7 TeV that come with an experimental covariance matrix, and quantify the agreement between data and theory for each of the PDF sets. We then discuss in more detail the case of the ATLAS inclusive jet data in Sect. 5, where we compare different codes and theory scale settings for jet production. In Sect. 6 we discuss the implications of this benchmarking for the particular case of the Higgs cross section in gluon fusion and examine possible extensions of the current (2010) PDF4LHC recommendation. Finally, we conclude and discuss the prospects for future benchmarking studies in Sect. 7. A more technical appendix summarizes the dependence on the χ² definition.
All the above groups provide versions of the respective PDF sets both at NLO and at NNLO. In this paper we will show only the NNLO PDFs, for the particular values of α s mentioned above. We have however produced the results presented here also at NLO and for a wider range of α s values. The complete catalog of plots can be obtained online from HepForge: http://nnpdf.hepforge.org/html/pdfbench/catalog.

Parton distributions and parton luminosities
In this section we compare PDFs and then parton luminosities between the various groups. For definiteness we show here comparisons only between PDFs and luminosities at NNLO for α s = 0.118. Results for several other values of α s and at NLO can be obtained from the catalog of plots on the HepForge website.

Parton distributions
We compare parton distributions at Q² = 25 GeV², above the b quark threshold, since ABM11 only provide their N_f = 5 PDFs for a range of values of α_s.² For each PDF we compare first NNPDF2.3, CT10 and MSTW08, and then NNPDF2.3, ABM11 and HERAPDF1.5 (with NNPDF2.3 thus being used as a common reference). We consider PDF uncertainties only and not the α_s uncertainty, except for the ABM11 PDFs, where the α_s uncertainty is treated on an equal footing with the PDF parameters in the covariance matrix. The ABM11 and HERAPDF results also include an uncertainty on quark masses, whereas other groups provide sets with a variety of masses.
In Fig. 1 we show the total quark singlet PDF $\Sigma(x, Q^2) = \sum_{i=1}^{5} \left[ q_i(x, Q^2) + \bar{q}_i(x, Q^2) \right]$, both on a linear and on a logarithmic scale, while in Fig. 2 we show the various gluon PDFs g(x, Q²), also on linear and logarithmic scales. There is good agreement between all the sets for the quark singlet, though the uncertainty band at small x is rather wider for NNPDF and HERAPDF. The gluons of CT10, MSTW and NNPDF are also in reasonable agreement: the PDF one-sigma uncertainty bands overlap over the whole range of x. Differences are larger for ABM11. At small x the ABM11 gluon has much smaller uncertainties than those of the other groups, even for x values where there is little constraint from the data, reflecting perhaps the more restrictive underlying PDF parametrization. At high x the ABM11 gluon is smaller than that of CT, MSTW and NNPDF, though the uncertainty band overlaps that of HERAPDF in most places. For HERAPDF1.5 the gluon at large x has larger uncertainties due to the lack of collider data, while at small x it is close to the other PDF sets as expected, since in this region it is only the precise HERA-I data that provide any handle on the gluon.
The total strangeness $s^+(x, Q^2) = s(x, Q^2) + \bar{s}(x, Q^2)$ is shown on a logarithmic scale in Fig. 3; HERAPDF1.5 is not included because it does not have an independent strangeness parametrization, as HERA data alone do not allow the strange contribution to be disentangled. The CT10 strange distribution is somewhat higher than that of other groups. The origin of this difference is under study; it is likely due to different non-perturbative parametrizations of the PDFs and to differences in the heavy quark treatment of neutrino dimuon data. Both theoretical studies and LHC data, from electroweak vector boson production and from exclusive W + c production, should shed light on this issue in the future. First ATLAS data did give some indication on strangeness [24] at small x, but they are still not accurate enough [13] to lead to definite conclusions.
Finally we compare non-singlet distributions: the nonsinglet triplet and the total valence PDFs, respectively defined as $T_3(x, Q^2) = u(x, Q^2) + \bar{u}(x, Q^2) - d(x, Q^2) - \bar{d}(x, Q^2)$ and $V(x, Q^2) = \sum_{i=1}^{5} \left[ q_i(x, Q^2) - \bar{q}_i(x, Q^2) \right]$, in Fig. 4, and the quark sea asymmetry $\Delta_S = \bar{d} - \bar{u}$ and the strangeness asymmetry $s^- = s - \bar{s}$ in Fig. 5. There is reasonable agreement for T_3 and V, except for ABM11, for which T_3 at large x is significantly higher than in the other sets. This is due to a larger u distribution in this region. The HERAPDF1.5 PDF uncertainties in T_3 are rather larger, reflecting the fact that HERA data do not provide much information on quark flavor separation. All sets are in broad agreement on the light sea asymmetry, apart from HERAPDF1.5, which does not include the Drell-Yan and electroweak boson production data and cannot separate the ū and d̄ flavors. Only MSTW08 and NNPDF2.3 provide independent parametrizations of the strange asymmetry PDF, and they are in reasonable agreement within uncertainties.

Parton luminosities
Now we compare parton luminosities. At a hadron collider, all factorizable observables for the production of a final state with mass M_X depend on parton distributions through a parton luminosity, which, following Ref. [25], we define as $\Phi_{ij}(M_X^2) = \frac{1}{s} \int_\tau^1 \frac{dx_1}{x_1} f_i(x_1, M_X^2) f_j(\tau/x_1, M_X^2)$, where $f_i(x, M^2)$ is a PDF at a scale $M^2$, and $\tau \equiv M_X^2/s$. As for the PDFs, all parton luminosities are compared for a common value of the strong coupling, α_s = 0.118. The parton luminosities are displayed as ratios to the NNPDF2.3 set. We assume a center-of-mass energy of 8 TeV.
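As an illustration of how such a luminosity can be evaluated numerically, the sketch below assumes the standard form Φ_ij(M_X²) = (1/s) ∫_τ^1 (dx/x) f_i(x, M_X²) f_j(τ/x, M_X²) with τ = M_X²/s. The toy PDF shape is purely hypothetical and stands in for a real PDF set; in practice the functions would be supplied by a PDF library.

```python
import math


def parton_luminosity(f_i, f_j, m_x, sqrt_s, n_steps=2000):
    """Phi_ij(M_X^2) = (1/s) * int_tau^1 dx/x f_i(x, M_X^2) f_j(tau/x, M_X^2).

    f_i and f_j are callables f(x, Q2) returning the PDF itself (not x*f).
    Masses and energies are in GeV, so the result is in GeV^-2.
    """
    s = sqrt_s ** 2
    m2 = m_x ** 2
    tau = m2 / s
    if not 0.0 < tau < 1.0:
        raise ValueError("need 0 < M_X < sqrt(s)")
    # Substituting x = exp(u) turns dx/x into du, with u running from ln(tau) to 0.
    u_min = math.log(tau)
    h = -u_min / n_steps
    total = 0.0
    for k in range(n_steps):
        x = math.exp(u_min + (k + 0.5) * h)  # midpoint rule in u
        total += f_i(x, m2) * f_j(tau / x, m2)
    return total * h / s


def toy_gluon(x, q2):
    """Hypothetical gluon-like shape, NOT a fitted PDF; q2 is ignored."""
    return x ** -1.3 * (1.0 - x) ** 5


# Toy gluon-gluon luminosity at M_X = 125 GeV for an 8 TeV collider.
print(parton_luminosity(toy_gluon, toy_gluon, m_x=125.0, sqrt_s=8000.0))
```

The logarithmic change of variable is a design choice: PDFs vary steeply at small x, so sampling uniformly in ln x tends to be more stable than sampling uniformly in x.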
The gluon-gluon and quark-gluon luminosities are shown in Fig. 6, and the quark-quark and quark-antiquark luminosities are shown in Fig. 7. There is reasonably good agreement between the NNPDF2.3, MSTW08 and CT10 PDF sets over the full range of invariant masses. However, the PDF uncertainties increase dramatically for M_X > 1 TeV, the region relevant for searches and characterization of heavy particles. Future LHC data on high-E_T jet production and on the high-mass Drell-Yan process should be able to provide constraints in this region.³ Differences are more pronounced for the ABM11 and HERAPDF1.5 PDF sets. For HERAPDF1.5, the central values generally agree, but the uncertainty is rather larger in some x ranges, particularly for the gluon luminosity but also to some extent for the quark-antiquark one. For ABM11 instead, the quark-quark and quark-antiquark luminosities are systematically higher by over 5% below 1 TeV, and above this the quark-antiquark luminosity becomes much softer than that of either NNPDF2.3 or MSTW08. The ABM11 gluon-gluon luminosity becomes smaller than that of all the other PDF sets at high invariant masses, overlapping only with the very large HERAPDF1.5 uncertainty.
It is also useful to compare the relative PDF uncertainties in the parton luminosities. In Fig. 8 we show this relative PDF uncertainty for the quark-antiquark and gluon-gluon luminosities. Here we see clearly the much larger HERAPDF1.5 uncertainty. At high invariant mass, the uncertainty in the ABM11 gluon-gluon luminosity becomes smaller, despite the fact that this is an extrapolation region due to the scarcity of experimental data.
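The text does not spell out how each group's relative uncertainty is obtained. As a hedged sketch, the two most common prescriptions are shown below: the standard deviation over replicas for a Monte Carlo ensemble (the NNPDF representation) and the symmetric formula for a Hessian set with paired eigenvector members. The function names are hypothetical, and details such as confidence-level rescaling or single-member eigenvector sets are deliberately left out.

```python
import math


def mc_relative_uncertainty(replica_values):
    """Monte Carlo ensemble: sample standard deviation divided by |mean|."""
    n = len(replica_values)
    mean = sum(replica_values) / n
    var = sum((v - mean) ** 2 for v in replica_values) / (n - 1)
    return math.sqrt(var) / abs(mean)


def hessian_relative_uncertainty(central, plus_values, minus_values):
    """Symmetric Hessian formula: 0.5 * sqrt(sum_k (F_k^+ - F_k^-)^2) / |F_0|."""
    quad = sum((p - m) ** 2 for p, m in zip(plus_values, minus_values))
    return 0.5 * math.sqrt(quad) / abs(central)


# Hypothetical luminosity values, for illustration only.
print(mc_relative_uncertainty([1.0, 1.1, 0.9]))
print(hessian_relative_uncertainty(2.0, [2.1, 2.0], [1.9, 2.0]))
```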
The larger quark-antiquark luminosity from ABM11 as compared to the other PDF sets could be inferred from the PDF comparison plots at lower Q 2 : the ABM gluon is a little larger than the central value of the other groups below about x = 0.05, and this drives more quark and antiquark evolution at small x values. It has been recently suggested [26], based on results of a NLO fit to DIS data only, that some of these features could be at least in part the consequence of the ABM treatment of heavy quark contributions (see also [27]). Indeed, while CT, MSTW and NNPDF use a variable flavour number scheme [8,14,28], ABM11 uses a fixed flavour number scheme for heavy-quark PDFs. This may explain the increase in the medium-x and small-x light quarks and gluons, and the corresponding softer large-x gluon required by the momentum sum rule, found in the ABM fits [26], though more studies would be required in order to conclusively establish this.
As an alternative explanation, a higher twist contribution has been invoked to explain part of the differences between ABM11 and the other PDF groups. While ABM fit a higher twist contribution, all groups minimize the impact of higher twists by suitable kinematic cuts in Q² and W² = Q²(1/x − 1). The HERAPDF fit includes no data at low W², so that no cut is required. In addition, NNPDF2.3 includes kinematical target mass corrections [29] exactly; these are known to be a substantial part of the higher twist corrections.
The kinematical cuts Q 2 min and W 2 min applied to the fitted DIS data sets are summarized for each group in Table 2 (the value of the scale Q 2 0 where the PDFs are parametrized is also shown for completeness). It should be observed that the ABM11 fit also imposes an upper cut Q 2 max = 10 3 GeV 2 on the HERA data. Stability under variation of the default MSTW08 kinematical cuts was studied in Ref. [30]. The inclusion of higher twists in MRST fits has previously been shown to lead to only a small effect on high-Q 2 PDFs [31,32], and an ongoing extension of the study in [26] suggests this is qualitatively the same with more up-to-date PDFs. This conclusion has been confirmed in similar studies by NNPDF.
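A minimal sketch of how such kinematic cuts act on DIS data points is given below, using W² = Q²(1/x − 1) as defined in the text. The cut values and data points are hypothetical placeholders, not the entries of Table 2, and whether the boundaries are inclusive is an assumption of this sketch.

```python
def w2(q2, x):
    """W^2 = Q^2 (1/x - 1), as defined in the text (GeV^2)."""
    return q2 * (1.0 / x - 1.0)


def passes_cuts(q2, x, q2_min, w2_min, q2_max=None):
    """Keep a point if Q^2 >= Q2_min and W^2 >= W2_min (and Q^2 <= Q2_max if given)."""
    if q2 < q2_min or w2(q2, x) < w2_min:
        return False
    return q2_max is None or q2 <= q2_max


# Hypothetical (Q^2 [GeV^2], x) points and hypothetical cut values.
points = [(4.0, 0.5), (20.0, 0.1), (2.0, 0.01), (5000.0, 0.2)]
kept = [p for p in points if passes_cuts(*p, q2_min=3.0, w2_min=12.0, q2_max=1000.0)]
print(kept)
```

The optional upper cut mirrors the kind of Q²_max restriction mentioned above for the HERA data in one of the fits.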
Figure 5 (caption, continued): For the strangeness asymmetry we show only the results for MSTW08 and NNPDF2.3, the only PDF sets that introduce an independent parametrization of it.

LHC inclusive cross sections
In this section we compute inclusive cross sections at 8 TeV for various benchmark processes and compare the results for all NNLO PDF sets. We consider electroweak gauge boson production, top quark pair production and Higgs boson production in various channels. We provide results for α_s = 0.117 and α_s = 0.119. The Higgs case is discussed in more detail in Sect. 6, together with the interplay between the PDF and α_s uncertainties. The comparison of data and theory predictions for 7 TeV inclusive cross sections has been discussed in detail in previous benchmark studies [1,2]. Similar comparisons for various differential distributions are discussed in the next section.
For these inclusive benchmark cross sections, we use the following codes and settings:
• Higgs boson production cross sections in the gluon fusion channel have been computed at NNLO with the iHixs code [33]. The central renormalization and factorization scales have been taken to be µ_F = µ_R = m_H. This is the same choice as in the default predictions for Higgs production in gluon fusion adopted by the Higgs Cross Section Working Group [34]. In all the Higgs production cross sections, we take m_H = 125 GeV.
• Higgs production in the Vector Boson Fusion (VBF) channel has been computed at NNLO with the VBF@NNLO code [35], with the scale choice µ F = µ R = m H .
• Higgs production in association with W and Z bosons has been computed at NNLO with the VH@NNLO program [36,37]. Also in this case the scale choice is µ F = µ R = m H .
• Higgs production in association with a top quark pair, ttH, has been computed at LO with the MCFM program [38]. Here the scale choice is µ F = µ R = 2m t + m H .
• Electroweak gauge boson production has been computed at NNLO using the Vrap code [39]. The central scale choice is µ R = µ F = M V .
• Top quark pair production has been computed at NNLO_approx+NNLL with the top++ code [40], including the latest development of the calculation of the complete NNLO corrections to q q̄ → t t̄ production, documented in [41], as implemented in v1.3. The factorization and renormalization scales have been set to µ_R = µ_F = m_t. The settings of the theoretical calculations are the default ones in Ref. [42]. In all calculations we use m_t = 173.2 GeV.
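For quick reference, the central scale choices listed above for the Higgs and top quark processes can be collected in a small lookup table. The dictionary name and keys are hypothetical; the masses and scale prescriptions are those quoted in the list. Electroweak boson production, which uses µ_R = µ_F = M_V, is omitted because the boson mass values are not quoted in this section.

```python
M_H = 125.0  # GeV, Higgs mass used for all Higgs cross sections
M_T = 173.2  # GeV, top quark mass used in all calculations

# Central scale mu_F = mu_R for each process, in GeV.
central_scales = {
    "ggH (iHixs)": M_H,
    "VBF (VBF@NNLO)": M_H,
    "VH (VH@NNLO)": M_H,
    "ttH (MCFM)": 2.0 * M_T + M_H,
    "ttbar (top++)": M_T,
}

for process, mu in central_scales.items():
    print(f"{process:>16s}: mu_F = mu_R = {mu:.1f} GeV")
```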
Let us emphasize that in this work we consider only PDF uncertainties, and it is beyond the scope of this paper to provide a careful assessment of all the relevant theoretical uncertainties for each of the studied processes. Before any strong statements can be made about the power of various experimental data to discriminate between PDF sets, the relevant theoretical uncertainties should be properly included.
We begin with the Higgs production cross sections. Results at 8 TeV for all relevant production channels, different PDF sets and different α_s(M_Z) values are collected in Table 3. In all cases the same value of α_s is used consistently in the PDFs and in the matrix element calculation. Results are also represented graphically in Fig. 9. Note that the error bands shown correspond to the PDF uncertainty only, with the exception of ABM11 and, to a lesser extent, HERAPDF. The main features which emerge from the plots are the following:
• The relative sizes of the cross sections obtained using different PDF sets are almost independent of α_s: when α_s is varied all cross sections get rescaled by a comparable amount.
• The ABM11 and HERAPDF1.5 central predictions for gluon fusion are contained within the envelope of the NNPDF2.3, CT10 and MSTW results. However, the HERAPDF1.5 uncertainty is bigger than this envelope. The agreement with ABM11 would be spoiled if their default value of α s (M Z ) = 0.1134 were used.
• For VBF, W H and ttH production, there is a reasonable agreement between CT10, MSTW and NNPDF2.3 both in central values and in the size of PDF uncertainties. ABM11 instead leads to rather different results, even when a common value of α s is used. For quark-initiated processes, like VBF and W H, the ABM11 cross section is higher than that of the other sets, especially for W H production. For ttH, which receives the largest contribution from gluon-initiated diagrams, the ABM11 cross section is smaller.
• The HERAPDF1.5 PDF uncertainties are distinctly larger, especially for ggH and ttH, mostly due to the fact that HERA data do not constrain the large-x gluon well.
A more detailed discussion of the interplay of PDF and α_s uncertainties for Higgs production, focused on the gluon fusion channel, will be presented in Sect. 6 below.
An interesting result from Table 3 is that the CT10 and NNPDF2.3 predictions for Higgs production via gluon fusion do not agree within their respective 1-sigma errors, with MSTW lying in between. The origin of this discrepancy is not clear to the authors. It could be related to differences in the gluon parametrization, to different datasets, or to differences in the statistical methodology. On purely statistical grounds, some discrepancy at the one or two sigma level is not surprising, given the different data sets and methodologies, despite its unfortunate location at the phenomenologically important mass m_H = 125 GeV.
Next we consider inclusive top quark pair production. Theoretical progress towards the full NNLO result has been made recently [41-46], including the recent calculation of the full NNLO qg-initiated contribution [43] (which amounts to a small O(1%) correction, contrary to previous approximate estimates [46]). The approximate NNLO top quark pair production cross sections at 8 TeV for different PDF sets and for different values of α_s(M_Z) have been collected in Table 4. In all cases the same value of α_s is used consistently in the PDFs and in the matrix element calculation. Results are also shown in Fig. 10, and compared to the recent CMS measurements [47].⁴ The variation in the cross sections with α_s shows that the tt total cross section has some sensitivity to the value of α_s. This sensitivity has been recently used by CMS to provide the first ever determination of α_s from top cross sections [48]. For the tt cross section, we see reasonable agreement between NNPDF2.3, CT10 and MSTW, while ABM11 is somewhat lower. Using the default value of α_s = 0.1134 in ABM11 would make the difference even more marked. The HERAPDF1.5 central value is in good agreement with the global fits but, as usual, the PDF uncertainties are larger.
Finally, we discuss the inclusive electroweak gauge boson production at 8 TeV. Here we can also compare with the recent CMS measurements [49]. The cross section results for α s = 0.117 and 0.119 are collected in Table 5, where from top to bottom we show the results for the W + , W − and Z total cross sections and then for the W + /W − and W/Z cross section ratios. Results are collected graphically and compared to the recent CMS data in Fig. 11. In the figure we show results only for α S (M Z ) = 0.118, since the strong coupling dependence of these cross sections is rather mild, particularly for the cross section ratios.
We find good agreement between MSTW, CT10, NNPDF2.3 and HERAPDF1.5: this is to be expected, since from Fig. 7 we know that the respective q q̄ parton luminosities are similar in the relevant regions. On the other hand, ABM11 leads to systematically higher cross sections (particularly for the u-quark dominated cross sections), consistent with the larger luminosities seen in Fig. 7. The available LHC 8 TeV data are in good agreement with the theory predictions, perhaps disfavoring the higher ABM11 cross sections, although the accuracy is not sufficient for full discrimination. Future data for lepton differential distributions at 8 TeV will be an important ingredient for the next generation of PDF determinations.
Table 6: Benchmark cross sections at 8 TeV, using the same settings as for the cross sections in Tables 3, 4 [...].
The only case where the difference between the common α_s(M_Z) values used above and the default value of a PDF set is significant is ABM11, since its default value α_s(M_Z) = 0.1134 is not close to the values explored above. Therefore, in Table 6 we collect some of the ABM11 NNLO benchmark cross sections, but this time with the default α_s(M_Z) value. As is clear by comparing with the results in Tables 3, 4 and 5, using this default value increases the difference between ABM11 and the other PDF sets for Higgs production via gluon fusion and for top quark production (predominantly via gluon fusion at the LHC), whose cross sections are also sensitive to the value of α_s, while it brings ABM11 closer to the other PDF sets (and to the CMS data) for the electroweak boson production cross sections.

PDF dependence of LHC differential distributions
We now study the PDF dependence of LHC differential distributions. Since we want to quantify the agreement between data and theory, we consider only the LHC data sets for which the full experimental covariance matrix is available. These were all taken at 7 TeV centre of mass energy: the 8 TeV data on differential distributions have yet to be released. We will provide a comparison of theory and data for electroweak vector boson and inclusive jet production, and examine whether these data can discriminate between the PDFs.⁵ In the next section we will present a more detailed study of jet production, including a comparison between different codes, a discussion of scale dependence, and a study of systematic shifts for each PDF set in the description of ATLAS data. We will also provide comparisons for the Tevatron Run II jet production experiments, thereby updating the analysis of Ref. [30], which was based on previous PDF sets.
Specifically, the experimental data that we consider in this section are:
• The ATLAS measurement of the W lepton and Z rapidity distributions from the 2010 dataset (36 pb⁻¹) [52].
• The LHCb measurements of the W + and W − lepton level rapidity distributions in the forward region from the 2010 data set [54].
• The ATLAS measurement of inclusive jet production from the 2010 dataset (36 pb⁻¹) [55]. We consider the R = 0.4 dataset only; very similar results are obtained if the R = 0.6 radius is also used.⁶
• The Tevatron Run II inclusive jet production data from the CDF and D0 collaborations, based on the k_t and cone jet reconstruction algorithms respectively [?, ?].
Theoretical predictions have been obtained as follows:
• For electroweak vector boson production, we have computed differential distributions at NLO with the MCFM code [58] interfaced to the APPLgrid software [59], which allows a fast computation of the observable when PDFs are varied, and cross-checked against the DYNNLO code [60]. For the ATLAS W, Z data we have also cross-checked against the APPLgrid implementation used in the ATLAS strangeness determination [24]. NNLO predictions have been obtained using local K-factors determined with DYNNLO.
• For inclusive jet production at the LHC, we have used the NLOjet++ program interfaced to the APPLgrid software. The scale is chosen to be the p_T of the hardest jet in the event within each rapidity bin. Comparisons with FastNLO [61] and MEKS [62] are presented in the next section. Note that, even though NNLO PDFs are used, the accuracy of the calculation is NLO, as NNLO partonic cross sections are not yet available.
• For inclusive jet production at the Tevatron, we have used the FastNLO [61] computation with the default scale choice.
⁵ In addition to these sets, ATLAS data on differential top quark pair production have been recently presented [50]. They include the experimental covariance matrix, hence they could be included in global PDF fits to constrain the gluon PDF. We do not consider inclusive photon production, since the covariance matrix is not available. The impact of the photon data on the PDF analysis was studied in Ref. [51].
⁶ Recently, the ratios of these jet cross sections to the 2.76 TeV ones were also presented [56], although in preliminary form. These cross section ratios [57] have the potential to improve the PDF constraints as compared to the 7 TeV data alone, thanks to the cancellation of systematic uncertainties.
For inclusive jet production, the approximate NNLO coefficient functions, derived from threshold resummation in FastNLO, are used for the Tevatron predictions but not for the LHC. In the latter case they are found to be unnaturally high, much larger than NLO corrections in a region far from kinematical threshold. An improved understanding of threshold corrections at the LHC would be required before they can be used reliably for phenomenology.
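The local K-factor procedure mentioned above for the electroweak NNLO predictions can be sketched as follows: a bin-by-bin ratio of NNLO to NLO results from a slower reference calculation is applied to the fast NLO prediction. The function names and all numbers are hypothetical, and the sketch does not address which PDF set is used to determine the K-factors.

```python
def local_k_factors(nnlo_ref, nlo_ref):
    """Bin-by-bin K = sigma_NNLO / sigma_NLO from a reference calculation."""
    if len(nnlo_ref) != len(nlo_ref):
        raise ValueError("reference predictions must share the same binning")
    return [nnlo / nlo for nnlo, nlo in zip(nnlo_ref, nlo_ref)]


def apply_k_factors(nlo_fast, k_factors):
    """Approximate NNLO prediction: K-factor times the fast NLO result, per bin."""
    if len(nlo_fast) != len(k_factors):
        raise ValueError("prediction and K-factors must share the same binning")
    return [k * sigma for k, sigma in zip(k_factors, nlo_fast)]


# Hypothetical bin contents (arbitrary units), for illustration only.
k = local_k_factors(nnlo_ref=[102.0, 98.0], nlo_ref=[100.0, 100.0])
print(apply_k_factors(nlo_fast=[50.0, 40.0], k_factors=k))
```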
In order to provide quantitative comparisons we compute the χ 2 using different PDF sets. Note that, unlike other sets, NNPDF2.3 already includes these data in their fit, so it necessarily provides a good description of all of them. For consistency of comparison, we use the same definition Eq. (7) of the χ 2 with the experimental covariance matrix Eq. (8), even though this is not in general the quantity which has been minimized when determining PDFs. Results at NLO and at NNLO are summarized in Tables 7 and 8, where common values of α s (M Z ) = 0.117 and α s (M Z ) = 0.119 respectively have been used.
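Since Eqs. (7) and (8) are given in the appendix and not reproduced here, the sketch below assumes the usual form χ² = Σ_ij (d_i − t_i) [C⁻¹]_ij (d_j − t_j), with C the experimental covariance matrix. The function name and inputs are hypothetical. A Cholesky factorisation is used instead of an explicit matrix inverse, which is a common choice for numerical stability.

```python
import math


def chi2(data, theory, cov):
    """chi^2 = r^T C^-1 r with r = data - theory, for a symmetric positive-definite C."""
    n = len(data)
    r = [d - t for d, t in zip(data, theory)]
    # Cholesky factorisation C = L L^T (L lower triangular).
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = cov[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                low[i][j] = math.sqrt(acc)
            else:
                low[i][j] = acc / low[j][j]
    # Forward substitution solves L y = r; then chi^2 = y . y.
    y = [0.0] * n
    for i in range(n):
        y[i] = (r[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    return sum(v * v for v in y)


# Hypothetical two-bin example with correlated uncertainties.
print(chi2(data=[10.0, 5.0], theory=[9.0, 4.0], cov=[[2.0, 1.0], [1.0, 2.0]]))
```

Dividing the result by the number of data points gives the χ² per point that is typically quoted when comparing PDF sets.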
As in the previous section, it is useful to provide as well the χ² values for ABM11 NNLO with the default value α_s(M_Z) = 0.1134, since this value is far from the range explored in this paper and is used in many phenomenological comparisons. Therefore, in Table 9 we collect the χ² for the ABM11 NNLO LHC and Tevatron distributions, but this time with the default α_s(M_Z) value. By comparing with the results in Tables 7 and 8, we see that the use of α_s(M_Z) = 0.1134 somewhat improves the description of the LHC electroweak production data, but at the price of worsening the description of jet production, especially of the precise CDF Run II k_T inclusive jet distributions.
The main conclusions which can be drawn from these comparisons are the following:
• All PDF sets lead to predictions in reasonable agreement with the ATLAS jet data. In general, the description improves when NNLO PDFs are used as compared to NLO PDFs. While the ATLAS jet data appear to have only moderate constraining power, a larger impact is expected when the full 7 TeV 5 fb⁻¹ data from CMS and ATLAS become available.
• The ATLAS and CMS electroweak data appear to have considerable discriminating power, and thus are likely to constrain significantly quarks and anti-quarks at medium and small-x, and specifically strangeness [24]. The worst description of the electroweak data is provided by MSTW08: this will be discussed in more detail below.
• The LHCb data also appear to have discriminating power. These data are sensitive to flavor separation at the smallest values of x, and to fairly high-x quarks, thanks to the forward coverage of the LHCb detector. Predictions obtained using all PDF sets describe the data quite well, with the exception of ABM11. It should be noted that while at NNLO HERAPDF1.5 agrees with the data, at NLO it provides a poor description, due to the large antiquark PDF at high x.
The main reason why MSTW08 provides a rather poor description of the ATLAS W, Z data, and especially of the CMS W data, is understood [17,18] to be a consequence of the behavior of the u_v − d_v distribution around x ∼ 0.03. Indeed, in Ref. [17] it is shown that once the LHC W asymmetry data are included in MSTW08 using PDF reweighting [63,64], the fit quality improves substantially. In Ref. [18] it is shown that an extended parameterisation for quarks (and to a lesser extent a consideration of deuteron corrections) automatically alters the form of u_v − d_v for the standard MSTW08 fit in the relevant region without including new data, and the predictions for the asymmetry improve enormously: the χ² for the prediction for the asymmetry data decreases to about one per point. It is also demonstrated explicitly that this is a very local discrepancy which has a very small effect on more inclusive cross sections, much less than the PDF uncertainties.
We can also compare the agreement of the different PDF sets with the data by examining plots, although of course this is less quantitative than the χ² comparison. Note in particular that the correlated systematic error (shown as a band at the bottom of each plot) is quite large, and typically dominates over the uncorrelated statistical uncertainty. As a consequence, it is difficult to judge the fit quality by simple inspection of the plots. The main motivation for showing the plots is to provide a link between the quantitative χ² numbers and the visual data versus theory comparisons that are frequently used, and to make clear that quantitative information can be provided only by a quantitative estimator. These plots therefore serve only to give a rough indication of the trend of the data versus theory comparison: for example, one can see from them whether there are systematic differences between predictions and data or just fine differences in shape.
As before, we show on the one hand a comparison of NNPDF2.3, CT10 and MSTW, and on the other of NNPDF2.3, ABM11 and HERAPDF1.5. The comparison for the ATLAS electroweak boson production data is shown in Fig. 12, [...] production in Fig. 13, and for ATLAS inclusive jet data in Fig. 14.
Table 9: Same as Table 7, but for α s (M Z ) = 0.1134, for the ABM11 predictions at NNLO.

ATLAS inclusive jet production at NLO
As outlined in the last section, jet production is one of the cornerstone processes of the physics program at the LHC. It has reached unprecedented statistical precision and can serve both for detailed tests of perturbative QCD and searches for hypothetical new interactions. Inclusive jet production measurements impose direct constraints on the gluon PDF, and the LHC data can in principle be sensitive to the gluon PDF in a very wide range of momentum fractions x [65]. Inclusive jet production at the Tevatron and LHC can be used to reduce the gluon uncertainty, and thus improve the predictions for important processes like Higgs production in gluon fusion. The last section gave a brief outline of the current comparison of the QCD predictions with various PDF sets to the current ATLAS data, but here we point out some more detailed features of the analysis, which will become more important as the precision of the data collected improves.
There exist two independent computer programs for computing single-inclusive jet and dijet production at NLO at the parton level, EKS [66] and NLOjet++ [67,68]. The EKS code was written in the early 1990's and was used to tabulate point-by-point NLO/LO K factors for jet production in previous CTEQ global fits. As the precision of the jet data increased, it became necessary to develop a new version of EKS with enhanced numerical stability and percent-level accuracy. It also became clear that the PDFs that are constrained by the jet cross sections may depend on the theoretical assumptions made in the computation of NLO theoretical cross sections. To address this issue, a deeply revised version of the EKS code, designated as MEKS [62], was recently released and compared against the other independent code, NLOjet++ [67,68]. This study documented specific settings in the two codes that bring them into agreement to within 1-2% at both the Tevatron and LHC.
The MEKS and NLOjet++ calculations are relatively slow and require significant CPU time to reach acceptable accuracy, so that their direct use in the PDF fits is impractical. Instead, the global PDF analyses reproduce the NLO cross sections by fast numerical approximations. Besides the interpolation of the tabulated NLO/LO K factors that was utilized until recently by CTEQ, a more flexible approach is provided by the programs FastNLO [61,69,70] and APPLgrid [59]. They quickly and accurately interpolate the tables of NLO jet cross sections initially computed in NLOjet++. The threshold corrections to inclusive jet production of O(α 2 s ) [71] are also available as an estimate of the unknown NNLO terms. 7 Besides fixed-order QCD calculations, NLO event generators such as POWHEG [72] and SHERPA [73] combine the NLO hard cross section for inclusive jet production with leading-log showering evaluated by HERWIG or PYTHIA. POWHEG predictions for ATLAS jet production are different from the fixed-order predictions [55] and also show quite a strong dependence on the parton showering, even at the highest p T , while the SHERPA results are in general closer to NLO. The reasons for the differences between SHERPA and POWHEG are still not well understood, and until this is settled by the Monte Carlo authors, it is more reliable to rely on fixed-order NLO QCD calculations. Thus only fixed-order calculations will be considered in the rest of this section. Electroweak corrections to dijet production have also been studied in Refs. [74,75].
In their most recent PDF sets, FastNLO is used by the CTEQ and MSTW groups, while APPLgrid is used by NNPDF. 8 Predictions from either program depend significantly on the choices for the QCD renormalization and factorization scales (µ R and µ F ), recombination scheme, and realization of the jet algorithm [62]. In the case of inclusive jet production, the default hard scale specifying the µ F and µ R values in each event can be taken to be equal to "p T of each individual jet" (FastNLO version 2), "p T of the hardest jet", "p T of the hardest jet in each rapidity bin" (APPLgrid), or "average p T in each p T bin" (FastNLO version 1). Differences between these choices are relevant in modern comparisons, as will be shown below. Similar ambiguities are present in computations for dijet production. We will explicitly distinguish between these various scale prescriptions to avoid the common inaccuracy of referring to all of them as "the scales that are equal to jet p T ".
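To make the distinction between these prescriptions concrete, the following minimal Python sketch assigns a scale to every jet of a single event under three of the per-event choices listed above. The jet representation, the function names and the rapidity bin edges are assumptions introduced purely for illustration; this is not code taken from FastNLO, APPLgrid or MEKS.

```python
from typing import Dict, List, Tuple

Jet = Tuple[float, float]  # (pT in GeV, rapidity y); hypothetical representation

# Illustrative |y| bin edges (an assumption, not read from any grid file).
Y_EDGES = [0.0, 0.3, 0.8, 1.2, 2.1, 2.8, 3.6, 4.4]


def rapidity_bin(y: float) -> int:
    """Return the index of the |y| bin containing y, or -1 if outside all bins."""
    ay = abs(y)
    for i in range(len(Y_EDGES) - 1):
        if Y_EDGES[i] <= ay < Y_EDGES[i + 1]:
            return i
    return -1


def scale_individual_jet(jets: List[Jet]) -> List[float]:
    """Each jet is evaluated with its own pT as the scale."""
    return [pt for pt, _ in jets]


def scale_hardest_jet(jets: List[Jet]) -> List[float]:
    """Every jet in the event uses the pT of the hardest jet."""
    hardest = max(pt for pt, _ in jets)
    return [hardest for _ in jets]


def scale_hardest_in_rapidity_bin(jets: List[Jet]) -> List[float]:
    """Each jet uses the largest pT found among jets in its own rapidity bin."""
    best: Dict[int, float] = {}
    for pt, y in jets:
        b = rapidity_bin(y)
        best[b] = max(best.get(b, 0.0), pt)
    return [best[rapidity_bin(y)] for _, y in jets]
```

In this sketch the per-bin choice reduces to the individual-jet choice whenever all jets of the event fall into different rapidity bins, and picks the larger p T when two jets share a bin.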

Comparison of computer programs and scale dependence
In this section, we compare predictions of APPLgrid, FastNLO, and MEKS for inclusive jet production in ATLAS at 7 TeV [55]. In Fig. 15 T ) into all three bins. The scale choice in APPLgrid sets µ R and µ F equal to the p T of the hardest jet in each rapidity bin. It coincides with the MEKS1 convention if all p T values fall into different rapidity bins, but will select the larger of the two p T values as the scale if two jets are in the same rapidity bin.
In Fig. 15, we can see that, at the largest p T values, all four predictions agree to within 1%. FastNLO and MEKS1 agree to about 1% even at low p T , apart from minor fluctuations caused by Monte-Carlo integration errors. Their agreement is not surprising, since FastNLO and MEKS1 follow the same scale choice.
At low p T , the APPLgrid event rate shows a systematic deficit of up to 4% compared to FastNLO, while the MEKS2 rate is even smaller in this region. This is the consequence of using the QCD scale that is equal or close to the hardest jet p T , which suppresses the cross section compared to other scale choices. The MEKS2 curve lies, for the most part, within the scale uncertainty band of the FastNLO prediction, with the exception of the p T < 200 GeV region. We conclude that the most up-to-date versions of the parton-level NLO programs show very good agreement for the same scale choice. However, the scale dependence of the NLO cross section is an important systematic uncertainty: its magnitude is of the same order as the experimental correlated systematic errors. In Fig. 15, which shows the experimental data without the correlated systematic errors, the difference between the theoretical predictions T k and the unshifted central data values D k provides a crude estimate of the size of the correlated systematic error. As seen in the last section, the quality of the fit is very good, so the data and theory predictions can be brought into line using shifts of data corresponding to the size of the correlated errors, or less. In fact, it can be checked from the results in [55] that this is a reasonable approximation, especially at the highest p T values and the highest rapidity bins, where the systematic uncertainty is larger than the difference between T k and D k in Fig. 15. The scale uncertainty, defined as above, varies from about 15% of T k − D k in the bins with small rapidity to 40% at the largest |y|. Hence, the contribution of the scale uncertainty is significant compared to the experimental systematic uncertainty, and reduces the sensitivity of LHC inclusive (di)jet production to different PDF models, particularly at the highest rapidities. 9

PDF dependence
As already seen in the previous section, all available PDF sets fit the current ATLAS jet data well, which therefore does not provide much discrimination. However, there are still interesting features to pick out which will become more important for future data. Fig. 16 compares the corresponding NLO predictions made using APPLgrid and various NNLO PDFs: ABM11, CT10, HERA1.5, MSTW08, and NNPDF2.3. We take α s (M Z ) = 0.119 both in the hard cross sections and PDFs for all PDF sets. All the predictions are normalized to the central prediction based on the CT10 NNLO PDF set (with α s = 0.119). For the NNPDF2.3 and CT10 sets, we show the 68% C.L. PDF uncertainties by the hatched bands. The CT10 central predictions are larger than NNPDF2.3 or MSTW2008, mainly due to the harder gluon distribution in the CT10 set. In general, predictions from different PDFs agree with each other within the range of PDF uncertainties, apart from ABM11, particularly at low rapidities. It is also instructive to compare the scale uncertainties shown in Fig. 15 with the PDF uncertainties shown in Fig. 16. In the low p T region, i.e. less than p T ∼ 200 GeV, the scale uncertainty of NLO predictions is comparable to, or even larger than, the PDF uncertainties from CT10. This is another indication that the scale uncertainty presents a limiting factor in the discrimination between the PDF sets, especially for PDFs which are already well-constrained, in this case by HERA data.
Table 10: [...] (11) using FastNLO (version 2) and various NNLO PDF sets and the CT10 NLO set. The χ 2 D and χ 2 λ contributions to χ 2 from the data residuals and penalties for systematic shifts defined in Eqs. (12) and (13) are shown. The last column contains the best-fit luminosity parameter shift λ 0,lum for each PDF set. We have used α s = 0.119 for all sets.

Systematic shifts in a fit to the ATLAS jet data
When the NLO theoretical predictions are compared to the ATLAS inclusive jet data without including the systematic errors, as in Fig. 15, one generally finds a very poor agreement for any PDF set. In this case, the χ 2 value can reach several thousand units for a total of N pt = 90 data points. The agreement is improved dramatically after the correlated systematic errors are considered. This can be done, e.g., by including a term with a correlation matrix β kα into the log-likelihood function χ 2 [77], as described in the appendix. We will use the definition of χ 2 provided by Eq. (11), which introduces a normally distributed nuisance parameter λ α (with the central value of zero and standard deviation of one) to characterize each of N λ correlated errors.
The ATLAS measurement provides 88 sources of correlated systematic errors, including the luminosity error and the uncertainty in the nonperturbative correction. Each of these errors can cause variations (shifts) of the experimental points from their central values. In addition, each data point is affected by an uncorrelated systematic error, which is significant compared to the statistical error. When both uncorrelated and correlated systematic uncertainties are included into χ 2 , the resulting χ 2 /N pt values are less than 1 for all considered NNLO PDFs, as shown in Table 10. In this comparison, χ 2 is computed according to the procedure summarized in Sec. A.2 and numerically equivalent to Eq. (8). None of the PDF sets is preferred by these χ 2 values. As one can see the χ 2 values are extremely similar to those in the previous section, Tables 7 and 8, even though they are computed with a different code (FastNLO).
For each set of theoretical predictions {T k }, we can also determine the value λ 0α of each nuisance parameter that gives the best description of the data. It is found according to Eq. (14) once the {T k } values are known. In Eq. (11) for the total χ 2 , we can identify two parts: χ 2 D , containing contributions from the data residuals $d_k = (D^{\rm shifted}_k - T_k)/s_k$, where $D^{\rm shifted}_k = D_k - \sum_\alpha \beta_{k\alpha}\lambda_{0\alpha}$; and χ 2 λ , which is a quadrature sum $\sum_\alpha \lambda_\alpha^2$ of the shifted nuisance parameters. We list χ 2 D and χ 2 λ separately in Table 10 and include histograms of the data residuals d k and best-fit parameters λ 0α in Figs. 17 and 18. In the histograms (which are shown here for CT10 NNLO and NNPDF2.3 NNLO PDFs, but are also representative of the histograms for the other NNLO PDF sets), the observed d k and λ 0α distributions are narrower than the standard normal distributions shown by the dotted curves. In other words, the fit to the 2010 ATLAS data is too good and cannot distinguish between the PDF sets. Most of the 88 best-fit parameters λ 0α are close to zero, i.e., they do not contribute much to the improvement of χ 2 . None of the best-fit parameters included in Fig. 18 has changed by more than 2.5 standard deviations.
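The decomposition into a residual part and a penalty part can be sketched in a few lines of Python. The sketch assumes the standard quadratic form of χ 2 in the nuisance parameters, so that the best-fit shifts follow from a linear system; the function names and the small Gaussian-elimination helper are hypothetical and stand in for whatever linear-algebra library an actual analysis would use.

```python
from typing import List, Sequence, Tuple


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def best_fit_shifts(D, T, s, beta) -> List[float]:
    """Best-fit nuisance parameters; beta[k][a] is the absolute correlated
    uncertainty of data point k from systematic source a."""
    n_pt, n_lam = len(D), len(beta[0])
    A = [[(1.0 if a == b else 0.0)
          + sum(beta[k][a] * beta[k][b] / s[k] ** 2 for k in range(n_pt))
          for b in range(n_lam)] for a in range(n_lam)]
    B = [sum(beta[k][a] * (D[k] - T[k]) / s[k] ** 2 for k in range(n_pt))
         for a in range(n_lam)]
    return solve(A, B)


def chi2_parts(D, T, s, beta) -> Tuple[float, float, List[float], List[float]]:
    """Return (chi2_D, chi2_lambda, best-fit shifts, residuals d_k)."""
    lam0 = best_fit_shifts(D, T, s, beta)
    resid = [(D[k] - sum(beta[k][a] * lam0[a] for a in range(len(lam0))) - T[k]) / s[k]
             for k in range(len(D))]
    return sum(d * d for d in resid), sum(l * l for l in lam0), lam0, resid
```

As a toy check with invented numbers, D = [10, 20], T = [9, 18], s = [1, 2] and a single correlated source with β = [1, 2] give a best-fit shift of 2/3, with χ 2 D = 2/9 and χ 2 λ = 4/9, compared with χ 2 = 2 when the correlated source is ignored.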
At the Tevatron, some PDF sets required a shift in the data downwards due to the luminosity uncertainty by as much as 3-4 standard deviations in order to agree with the single-inclusive jet production data, cf. the appendix in Ref. [30]. In that paper, it was argued that such shifts are not strictly allowed. The luminosity is common to the data on the Z and W total cross sections and the Z rapidity distribution, which are rather constraining, and for which the PDF predictions are consistent with the nominal luminosity, or even a shift in the data upwards due to the luminosity uncertainty. It should be a mandatory test of PDFs that they fit the Tevatron and LHC jet and vector boson production data simultaneously, while the luminosity uncertainty is treated as completely correlated between the two types of measurement coming from the same experiment and the same data taking period. This has not been checked for all PDF sets and could help explain how some inconsistencies may arise. Note that Fig. 17 in Ref. [30] is of the same form as Fig. 18, but for Tevatron jet data. For the Tevatron inclusive jet data the distribution of the λ 0α is as expected, or even wider for poorly fitting PDFs, in contrast to those for ATLAS data.
The last column of Table 10 lists the best-fit values of the luminosity shift parameter in the ATLAS measurement, computed with the FastNLO code. Only one PDF set (HERA1.5 NNLO) requires a 2.4σ shift in the ATLAS luminosity. However, none of the PDF sets requires a luminosity shift by more than 3σ, suggesting that they are all compatible with the 2010 ATLAS jet data. This is despite the wide variety of predictions exhibited in Fig. 16. Clearly the improvement of the correlated systematic errors will be a priority for future data, since at present the shifts in data can accommodate quite dramatic differences in predictions without a large penalty in χ 2 .

Combined uncertainties in Higgs production
In this section we discuss in somewhat greater detail PDF and α s uncertainties for Higgs production via gluon fusion at the LHC, and specifically how PDF updates affect results obtained using the PDF4LHC recommendation [78] for the determination of PDF+α s uncertainties. At NLO this prescription entails finding the envelope of CT, MSTW and NNPDF PDF+α s uncertainty bands, each obtained with a different choice for the central value of α s . The outer bands of the envelope are taken as the upper and lower limits of uncertainty, and the midpoint value as the best prediction. When the prescription was published, of the three PDF sets included in the prescription, only MSTW was available at NNLO. The NNLO prescription recommended taking the MSTW08 prediction as the central value, while rescaling the MSTW08 uncertainty by a factor determined by comparing, at NLO, the MSTW08 uncertainty to the envelope uncertainty.
The HXSWG cross section numbers have been computed with the current (2010) PDF4LHC prescription, m H = 125 GeV, and the de Florian-Grazzini code [79], which incorporates soft-gluon effects up to next-to-next-to-leading logarithmic accuracy on top of the exact NNLO calculation. Since in this work we use fixed order NNLO calculations as implemented in iHixs, the central values that we will quote cannot be compared directly to the HXSWG numbers. However, this should have a minimal effect on the percentage PDF+α s uncertainty. We can thus investigate how the combined PDF+α s uncertainties would change if computed using an envelope prescription based on the most updated NNLO PDFs from the three global sets: NNPDF2.3, MSTW08 and CT10. Instead of the exact implementation of the PDF4LHC envelope, see e.g., Refs. [1,80], for simplicity we use the following definition: we compute the combined PDF+α s uncertainties for the three PDF sets for α s = 0.117 and α s = 0.119 and let the maximum and minimum values of the cross section in this range define the envelope. Combined PDF and α s uncertainties are obtained by adding the two uncertainties in quadrature. The uncertainty on α s is taken to be δα s = 0.0012 at the 68% confidence level. The central value is taken as the midpoint of the envelope defined in this way.
This differs from the 2010 PDF4LHC prescription because in the latter the prediction from each of the three sets is obtained using a different value of α s (α s = 0.118 for CTEQ, α s = 0.119 for NNPDF and α s = 0.120 for MSTW), and also because α s and PDF uncertainties are added in quadrature instead of being determined exactly in the Hessian or Monte Carlo method (though in the Hessian method the two procedures are equivalent [?]). The change in the α s range moves the central value a little; however, because the width of the α s range is unchanged, the uncertainty is not affected significantly. Adding the PDF and α s uncertainties in quadrature somewhat reduces the MSTW08 uncertainty. Note also that the addition in quadrature was a simplification in the original PDF4LHC prescription. We used it because we think it is more suitable for benchmarking (which is the goal of this paper), while asymmetric α s uncertainties may be more accurate and thus better for phenomenology.
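The simplified envelope definition used here can be sketched as follows. The function names and the input layout are assumptions for illustration: each entry is taken to hold a central cross section together with its PDF and α s uncertainties, already evaluated for one PDF set at one of the two α s values. How those inputs are obtained from a given PDF set is outside the scope of the sketch.

```python
import math
from typing import Dict, Tuple

# Hypothetical layout: (PDF set name, alpha_s value) -> (central, PDF unc., alpha_s unc.),
# with all three numbers in the same units (e.g. pb).
Predictions = Dict[Tuple[str, float], Tuple[float, float, float]]


def combined_uncertainty(pdf_unc: float, alphas_unc: float) -> float:
    """Symmetric PDF and alpha_s uncertainties added in quadrature."""
    return math.hypot(pdf_unc, alphas_unc)


def envelope(predictions: Predictions) -> Tuple[float, float]:
    """Return (midpoint, half-width) of the envelope of all uncertainty bands."""
    upper = max(c + combined_uncertainty(p, a) for c, p, a in predictions.values())
    lower = min(c - combined_uncertainty(p, a) for c, p, a in predictions.values())
    return 0.5 * (upper + lower), 0.5 * (upper - lower)


def relative_uncertainty_percent(predictions: Predictions) -> float:
    """Half-width of the envelope as a percentage of its midpoint."""
    mid, half = envelope(predictions)
    return 100.0 * half / mid
```

With six entries (three sets at two α s values) the midpoint plays the role of the central value and the half-width that of the quoted PDF+α s uncertainty.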
As in Sect. 3, the cross sections are computed at NNLO with the iHixs code [33]. The central scale has been taken to be Q = m H , which is the same choice used for the default predictions for Higgs production adopted by the Higgs Cross Section Working Group [34].
We begin by computing the envelope defined as above at NLO with the same NLO PDF sets as in the 2010 PDF4LHC prescription: CTEQ6.6, MSTW08, and NNPDF2.0. The corresponding results for α s = 0.117 and 0.119 are summarized in Table 11. The envelope is σ NLO H = 13.98 ± 0.85 pb (±6.1% "PDF + α s "), so the uncertainty is a bit smaller than the current HXSWG result. Next, we repeat the computation of the NLO envelope, but now with the most up-to-date PDF sets: CT10, MSTW08, and NNPDF2.3. Results are also summarized in Table 11, and lead to the envelope σ NLO H = 14.05 ± 0.86 pb (±6.1% "PDF + α s "), so neither the central value nor the uncertainty changes significantly. Note that the increase in the Higgs cross section using NNPDF2.3, as compared to NNPDF2.0, does not lead to an increase of the combined PDF+α s error since the CT10 prediction also increases by a similar amount. Finally, using the NNLO cross sections from the most updated NNLO PDF sets, but otherwise using the same prescription as at NLO, we obtain σ NNLO H = 18.75 ± 1.24 pb (±6.6% "PDF + α s ").
The combined PDF+α s error is thus essentially unchanged when going from NLO to NNLO, while the central value is within 2% from the MSTW2008 NNLO value of 18.45 pb, which in the 2010 PDF4LHC prescription was taken as the central value.
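The quoted percentages follow directly from the envelope values. As a quick arithmetic cross-check (the dictionary labels are ours, the numbers are those quoted in the text):

```python
# Envelope results quoted in the text: (central value, uncertainty) in pb.
ENVELOPES = {
    "NLO, 2010 PDF sets": (13.98, 0.85),
    "NLO, 2012 PDF sets": (14.05, 0.86),
    "NNLO, 2012 PDF sets": (18.75, 1.24),
}

MSTW08_NNLO_CENTRAL = 18.45  # pb, central value in the 2010 prescription


def relative_percent(central: float, uncertainty: float) -> float:
    """Uncertainty as a percentage of the central value."""
    return 100.0 * uncertainty / central


def shift_percent(new: float, reference: float) -> float:
    """Relative shift of `new` with respect to `reference`, in percent."""
    return 100.0 * (new - reference) / reference


for label, (central, unc) in ENVELOPES.items():
    print(f"{label}: {relative_percent(central, unc):.1f}%")
print(f"NNLO envelope vs MSTW08: {shift_percent(18.75, MSTW08_NNLO_CENTRAL):.1f}%")
```

The three relative uncertainties round to 6.1%, 6.1% and 6.6%, and the NNLO midpoint lies about 1.6% above the MSTW2008 NNLO value, consistent with the statement that it is within 2%.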
These cross sections are plotted in Fig. 19 and Fig. 20, showing both the cross sections from each individual PDF set and the envelope.
In summary, neither the central value nor the uncertainty on the NLO prediction is significantly affected when replacing 2010 PDFs with 2012 PDFs, and if the NLO PDF4LHC prescription is also used at NNLO, the combined PDF+α s uncertainty for the Higgs cross section rises moderately, from 6.1% to 6.6%, when going from NLO to NNLO.
In this respect, the gluon fusion channel with m H = 125 GeV is an unusually unlucky case: for most standard candle processes, as well as for other Higgs production modes, and even for gluon fusion but with other values of the Higgs mass, the uncertainties decrease when going from 2010 NLO PDFs to 2012 NNLO PDFs, as is clear from comparing the luminosity plots of Section 2 with analogous plots from previous benchmarks [1,2].
To illustrate this explicitly, we compare in Fig. 21 predictions for W + boson production based on NLO PDFs, both from 2010 and from 2012, and 2012 NNLO PDFs from CT, MSTW and NNPDF. The improved agreement of the PDF sets when going from 2010 to 2012 PDFs is clear: the relative PDF+α s uncertainty, defined with the same prescription as for the Higgs cross section, goes down from ∆ PDF+αs = 5.3% to ∆ PDF+αs = 3.3%, i.e. from more than twice the MSTW2008 uncertainty (sometimes used as a simple approximation to the full envelope) to about 1.5 [...] contribute to this improvement, which include for instance the adoption of a GM-VFN scheme in NNPDF2.1 and a more similar choice of data sets in the different fits. Similar improvements are expected in all quark-initiated cross sections.

Conclusions and outlook
In this paper we have presented an updated benchmark comparison of the most recent NNLO PDF sets from the ABM, CT, HERAPDF, MSTW and NNPDF collaborations. We have compared PDFs, parton luminosities, LHC inclusive cross sections and differential distributions, always consistently for a common value of α s . Our main result is that the agreement between the most recent CT, MSTW and NNPDF NNLO parton distributions is at least as good as it was at NLO, and in many cases there is a clear improvement, in that the spread of predictions from different groups is reduced significantly. The HERAPDF1.5 NNLO central values are generally in good agreement with those of CT, MSTW and NNPDF, but with rather larger uncertainties due to the smaller dataset that HERAPDF uses. We find no evidence for tension between the HERA-only PDF sets and the PDF sets based on global data sets. It is interesting to observe that at NLO the HERAPDF1.5 set has smaller uncertainty and a more significant disagreement with other sets. The improvement in methodology in the HERAPDF1.5 NNLO analysis seems not only to enlarge the uncertainty, but also to bring the central values more in line with the other sets.
We find that in several cases ABM11 disagrees with CT, MSTW and NNPDF both for PDFs and LHC cross sections, even when a common value of α s is used. For the ABM11 default α s (M Z ) = 0.1134 value, many of these differences with other sets would further increase (though the vector boson production predictions would become more similar). We have discussed some of the possible explanations of these differences. A plausible explanation seems to be the use of the FFN scheme instead of the GM-VFN scheme used by the other groups, together with the absence of collider data in the ABM11 fit [30]. Other, perhaps less likely, explanations include the presence of higher twist contributions in the ABM PDF determination. We have also shown (cf. the end of Sect. 3) that the 8 TeV LHC data on total inclusive cross sections tend to disfavor ABM11, especially in top quark pair production for the default ABM11 α s value, though experimental uncertainties are not yet precise enough to allow for a decisive discrimination.
For Higgs production via gluon-gluon fusion, we have shown that the combined PDF+α s uncertainties obtained from the envelope of CT, MSTW and NNPDF sets at NNLO are very similar to those obtained at NLO, which in turn are unchanged if 2012 instead of 2010 PDFs are used. For several other LHC processes (in particular quark-initiated processes) the NNLO combined PDF+α s uncertainty is smaller than the 2010 NLO result.
We would like to emphasize that we are not advocating here any new prescription to combine PDF sets, but only exploring the robustness of the original (2010) recommendation with respect to the update of its PDF sets. It is the task of the PDF4LHC Steering Committee to provide official updated recommendations for the use of PDFs in the comparisons with LHC data.
Available LHC data is already providing important information on PDFs, and future LHC data will provide even more stringent constraints. Such constraints will come from more precise measurements of already available processes (such as vector boson production and jet production), measurements of new PDF sensitive differential distributions (such as low-mass Drell-Yan pair, W +charm, tt, or single-top production), as well as new ways of combining the existing data (such as ratios of LHC cross sections at different center-of-mass energies [57]).
Here we have presented only a small subset of all the available plots. The complete set of plots can be found at http://nnpdf.hepforge.org/html/pdfbench/catalog , where in particular we provide
• Comparisons of PDFs and parton luminosities at NLO and NNLO, for α s (M Z ) = 0.117 and 0.119.
• Comparisons of PDFs at a low scale of 2 GeV 2 , and as ratios with respect to a reference set for an LHC scale of 10 4 GeV 2 .
• Comparison of PDFs to all the relevant LHC data from ATLAS, CMS and LHCb at NNLO, for α s (M Z ) = 0.117 and 0.119.
• PDF dependence of benchmark cross sections.

A Definitions of χ 2
The value of the χ 2 estimator depends on the assumed functional form for χ 2 in the presence of experimental correlated systematic uncertainties. In this appendix, we document the various definitions of the χ 2 function adopted in this paper and the numerical inputs that were used to obtain our results. Statistical experimental errors are usually reported in the form of a list containing their absolute values, while for systematic errors the list gives relative values expressed as percentages of the central value. Often the systematic errors are asymmetric, i.e. they have different positive and negative deviations. The covariance matrix (cov) ij is calculated from this published information by following one of the methods described below. Needless to say it is important, when benchmarking the various PDF predictions, to state precisely how the covariance matrix was computed. On the other hand some experiments directly provide the covariance matrix rather than the list of systematic errors, and in this case no ambiguity is possible.

A.1 Definitions of χ 2 with the covariance matrix
We can define the χ 2 for a specific experiment with N pt data points by
$$\chi^2 = \sum_{i,j=1}^{N_{\rm pt}} (D_i - T_i)\,(\mathrm{cov}^{-1})_{ij}\,(D_j - T_j) \qquad (7)$$
and use it as a figure of merit to judge the agreement between theory and data. The covariance matrix (cov) ij used in this definition may be written as
$$(\mathrm{cov})_{ij} = \delta_{ij}\, s_i^2 + \left(\sum_{\alpha=1}^{N_c} \sigma^{(c)}_{i,\alpha}\,\sigma^{(c)}_{j,\alpha} + \sum_{\alpha=1}^{N_L} \sigma^{(L)}_{i,\alpha}\,\sigma^{(L)}_{j,\alpha}\right) D_i D_j \qquad (8)$$
where i and j run over the experimental points (i, j = 1, ..., N pt ), D i are the measured central values, and T i the corresponding theoretical predictions computed with a given set of PDFs. This covariance matrix depends on uncorrelated uncertainties s i , constructed by adding the statistical and uncorrelated systematic uncertainties in quadrature; N L multiplicative normalization uncertainties, $\sigma^{(L)}_{i,\alpha}$; and N c other correlated systematic uncertainties, expressed for convenience in the above equation in terms of their relative values $\sigma^{(c)}_{i,\alpha}$. The total number of correlated uncertainties is thus N λ = N L + N c . Asymmetric systematic uncertainties provided by the experiments must be symmetrized to use this expression. We symmetrize them by averaging the positive and negative deviations. Note that it is important when fitting to distinguish between additive uncertainties (where the experimentalists have determined an absolute shift in the observable due to a systematic uncertainty) and multiplicative uncertainties (where the experimentalists have determined a relative shift, as a fraction of the measured observable). In particular it is important not to mistake an additive uncertainty for a multiplicative one just because it is presented multiplicatively (as are the correlated systematics in Eq. (8), where the absolute shift in data point i from systematic uncertainty α is written as $\sigma^{(c)}_{i,\alpha} D_i$). Correlated systematics which are truly multiplicative should of course be treated in the same way as the normalization uncertainty.
This distinction is important because if Eq. (8) were used as a figure of merit in an actual PDF fit, it would result in a D'Agostini bias of the multiplicative uncertainties [81]. However it is a suitable objective criterion for comparing a posteriori the various predictions from the different PDF sets that are discussed here, and we have used it as such throughout the body of this paper.
An alternative definition of the covariance matrix is the t 0 -prescription [81], where a fixed theory prediction $T^{(0)}_i$ (e.g., the final theory prediction from a previous fit) is used to define the normalization contribution to the χ 2 . In the t 0 -prescription the covariance matrix is thus
$$(\mathrm{cov}_{t_0})_{ij} = \delta_{ij}\, s_i^2 + \sum_{\alpha=1}^{N_c} \sigma^{(c)}_{i,\alpha}\,\sigma^{(c)}_{j,\alpha}\, D_i D_j + \sum_{\alpha=1}^{N_L} \sigma^{(L)}_{i,\alpha}\,\sigma^{(L)}_{j,\alpha}\, T^{(0)}_i T^{(0)}_j \qquad (9)$$
This definition has the advantage of avoiding the D'Agostini bias from multiplicative normalization uncertainties when performing a PDF fit. When the breakdown into additive and multiplicative uncertainties is not provided by the experiment, one may use $T^{(0)}_i$ to compute all systematic uncertainties, to give an 'extended-t 0 ' prescription:
$$(\mathrm{cov}_{{\rm ext}\text{-}t_0})_{ij} = \delta_{ij}\, s_i^2 + \left(\sum_{\alpha=1}^{N_c} \sigma^{(c)}_{i,\alpha}\,\sigma^{(c)}_{j,\alpha} + \sum_{\alpha=1}^{N_L} \sigma^{(L)}_{i,\alpha}\,\sigma^{(L)}_{j,\alpha}\right) T^{(0)}_i T^{(0)}_j \qquad (10)$$
This prescription rescales by $T^{(0)}_i$ all multiplicative uncertainties (associated with the normalization or not), but also modifies the additive uncertainties given by the experiment in a mild way consistent with their overall uncertainty. We will see below that the t 0 covariance matrix Eq. (9) and the extended-t 0 covariance matrix Eq. (10) generally produce lower χ 2 values than the experimental definition in Eq. (8) for datasets with substantial systematic uncertainties.
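The three covariance-matrix definitions differ only in which quantity (the data D i or the fixed prediction T (0) i ) multiplies the relative correlated uncertainties. The following Python sketch builds all three from lists of relative uncertainties and evaluates the χ 2 of Eq. (7). The function names, the `kind` labels and the argument layout are assumptions made for illustration, and a plain Gaussian elimination is used in place of a linear-algebra library.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def covariance(kind: str, D: Sequence[float], s: Sequence[float],
               sig_c: Sequence[Sequence[float]], sig_L: Sequence[Sequence[float]],
               T0: Optional[Sequence[float]] = None) -> Matrix:
    """Covariance matrix for kind in {"exp", "t0", "ext_t0"}.

    sig_c[i][a], sig_L[i][a]: relative (fractional) correlated and normalization
    uncertainties of point i; s[i]: absolute uncorrelated uncertainty.
    """
    if kind not in ("exp", "t0", "ext_t0"):
        raise ValueError("unknown covariance definition: " + kind)
    if kind != "exp" and T0 is None:
        raise ValueError("T0 is required for the t0 and extended-t0 definitions")
    ref_c = T0 if kind == "ext_t0" else D   # scale for the other correlated errors
    ref_L = D if kind == "exp" else T0      # scale for the normalization errors
    n = len(D)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c = sum(x * y for x, y in zip(sig_c[i], sig_c[j]))
            l = sum(x * y for x, y in zip(sig_L[i], sig_L[j]))
            cov[i][j] = c * ref_c[i] * ref_c[j] + l * ref_L[i] * ref_L[j]
        cov[i][i] += s[i] ** 2
    return cov


def chi2(D: Sequence[float], T: Sequence[float], cov: Matrix) -> float:
    """chi2 = (D - T)^T cov^{-1} (D - T), evaluated without forming the inverse."""
    r = [d - t for d, t in zip(D, T)]
    return sum(ri * xi for ri, xi in zip(r, solve(cov, r)))
```

Which definition yields the smaller χ 2 for a given data set depends on the relative size of D i and T (0) i , so the sketch makes no statement about the ordering.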
In summary, we consider in this appendix three possible definitions of the covariance matrix: the experimental definition, Eq. (8); the t 0 definition, Eq. (9); and the extended-t 0 definition, Eq. (10).

A.2 Definitions of χ 2 with shift parameters
An alternative, yet numerically equivalent, representation for the χ 2 function has been used in the jet benchmarking exercise of Sec. 5, following the method traditionally adopted in the CTEQ and MSTW PDF fits for jet and some other data sets. In this representation, the χ 2 figure of merit for goodness-of-fit to an experiment with correlated systematic uncertainties is expressed as [77]
$$\chi^2(\{\lambda\}) = \chi^2_D + \chi^2_\lambda \qquad (11)$$
where
$$\chi^2_D \equiv \sum_{k=1}^{N_{\rm pt}} \frac{1}{s_k^2}\left(D_k - T_k - \sum_{\alpha=1}^{N_\lambda} \beta_{k,\alpha}\lambda_\alpha\right)^2 \qquad (12)$$
and
$$\chi^2_\lambda \equiv \sum_{\alpha=1}^{N_\lambda} \lambda_\alpha^2 \qquad (13)$$
using the same notation as in the previous section, where the β k,α are the absolute correlated uncertainties. Systematic uncertainties associated with N λ sources may now induce correlated variations (shifts) in the experimental data points. Their effect is approximated by including a sum $\sum_\alpha \beta_{k,\alpha}\lambda_\alpha$ dependent on the correlation matrix β k,α (k = 1, ..., N pt ; α = 1, ..., N λ ) and stochastic nuisance parameters λ α , with one nuisance parameter assigned to every source of the systematic uncertainty. By a common assumption, each λ α follows the standard normal distribution. Its deviation from λ α = 0 incurs a penalty contribution λ 2 α to χ 2 . Under this assumption the minimum of χ 2 with respect to λ α can be found algebraically, since the dependence on λ α is quadratic [77].
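We can solve for the best-fit values λ 0α of the nuisance parameters to find
$$\lambda_{0\alpha} = \sum_{\gamma=1}^{N_\lambda} (A^{-1})_{\alpha\gamma} \sum_{k=1}^{N_{\rm pt}} \frac{\beta_{k,\gamma}\,(D_k - T_k)}{s_k^2} \qquad (14)$$
with
$$A_{\alpha\gamma} = \delta_{\alpha\gamma} + \sum_{k=1}^{N_{\rm pt}} \frac{\beta_{k,\alpha}\,\beta_{k,\gamma}}{s_k^2}$$
When these λ 0α values are substituted into Eq. (13), one obtains the usual expression Eq. (7) for the χ 2 , with the inverse of
$$(\mathrm{cov})_{ij} \equiv s_i^2\,\delta_{ij} + \sum_{\alpha=1}^{N_\lambda} \beta_{i,\alpha}\,\beta_{j,\alpha} \qquad (17)$$
If we set
$$\beta_{i,\alpha} = \sigma_{i,\alpha}\, D_i \qquad (18)$$
the expression in Eq. (17) coincides with the covariance matrix introduced earlier in Eq. (8). It is equivalent to the usual definition Eq. (8), but also contains explicit information about the values of the systematic parameters λ 0α at the best fit. If instead of Eq. (18) we set
$$\beta_{i,\alpha} = \sigma_{i,\alpha}\, T^{(0)}_i \qquad (19)$$
we recover the extended-t 0 χ 2 in Eq. (10). Finally, using Eq. (18) to find σ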

A.3 Impact on LHC cross sections
Numerical comparisons of the different χ 2 prescriptions will depend on the exact procedure used to determine s i and σ i,α . For example, in the comparisons to the ATLAS jet data in Sec. 5, we compute β k,α using Eq. (18) (equivalent to Eq. (8)), averaging any asymmetric errors. Given the large number of independent systematic parameters (N λ = 88), the asymmetry of some nuisance parameters is not expected to significantly bias the resulting PDFs, which has been confirmed by computing the χ 2 tables using the same χ 2 definition, but following alternative error symmetrization procedures. In all cases examined, the choice of the symmetrization procedure had a smaller effect on χ 2 for the ATLAS jet data than the choice of the χ 2 definition.
We have also verified that the covariance matrix definitions described in Sec. A.1 and the corresponding shift definitions described in Sec. A.2 give the same results when implemented numerically, as they should. Thus for the remainder of this section we will focus on the difference between the three definitions of the covariance matrix described in Sec. A.1.
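A minimal sketch of such a numerical cross-check is given below, assuming the shift form of the $\chi^2$ with a quadratic penalty for each nuisance parameter and a covariance matrix built as the uncorrelated variances plus the outer product of the correlated uncertainties. The function names and the pseudo-data are hypothetical and serve only to illustrate the equivalence.

```python
import numpy as np

def chi2_covariance(D, T, s, beta):
    """chi2 = r^T cov^{-1} r with cov_ij = s_i^2 delta_ij + sum_a beta_ia beta_ja."""
    cov = np.diag(s**2) + beta @ beta.T
    r = D - T
    return float(r @ np.linalg.solve(cov, r))

def chi2_shifts(D, T, s, beta):
    """chi2 with nuisance parameters, minimized analytically over lambda."""
    r = D - T
    w = 1.0 / s**2
    # Normal equations A lambda = B from d(chi2)/d(lambda) = 0.
    A = np.eye(beta.shape[1]) + beta.T @ (w[:, None] * beta)
    B = beta.T @ (w * r)
    lam0 = np.linalg.solve(A, B)
    shifted = r - beta @ lam0
    chi2 = float(np.sum(w * shifted**2) + np.sum(lam0**2))
    return chi2, lam0

# Made-up pseudo-data: 12 points, 5 correlated systematic sources.
rng = np.random.default_rng(0)
n_pt, n_lam = 12, 5
T = rng.uniform(10.0, 100.0, n_pt)                    # toy theory values
s = 0.05 * T                                          # uncorrelated errors
beta = 0.03 * T[:, None] * rng.normal(size=(n_pt, n_lam))
D = T + s * rng.normal(size=n_pt) + beta @ rng.normal(size=n_lam)

chi2_cov = chi2_covariance(D, T, s, beta)
chi2_shift, lam0 = chi2_shifts(D, T, s, beta)
print(chi2_cov, chi2_shift)  # the two numbers agree to rounding error
```

The shift form additionally returns the best-fit nuisance parameters `lam0`, which the covariance form does not expose.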
In Table 12, we compare the default 'experimental' definition of the covariance matrix used in the paper (cf. Eq. (8)) with the $t_0$ definition of Eq. (9). In this case, recent LHC measurements of $W$, $Z$, and jet production are compared to NLO predictions with five PDF sets and $\alpha_s = 0.119$. Results at NNLO and for other values of the strong coupling are qualitatively similar. The $t_0$ definition leads to smaller numerical values of $\chi^2$ for all PDF sets considered, especially for experiments with sizable normalization contributions. It is also clear, however, that the qualitative comparison between PDF sets in Sec. 4 is not affected by this alternative definition.
Similarly, the experimental definition is compared with the extended-$t_0$ definition in the case of ATLAS jet production with $R = 0.4$ in Table 13. The comparisons are made for the NLO PDF sets, $\alpha_s$ values, and computer codes specified in the table. Three columns of $\chi^2/N_{pt}$ are shown, corresponding to the 'experimental' definition realized according to Eqs. (17) and (18) in column 1, and to the extended-$t_0$ definition based on Eqs. (17) and (19) in columns 2 and 3, which differ in the PDF set used to compute $T^{(0)}$ (NNPDF2.3 NLO PDFs in column 3). In this case, the $\chi^2/N_{pt}$ values in columns 2 and 3 are noticeably lower than in column 1. They are not exactly the same in columns 2 and 3, indicating that $\chi^2$ also depends to some extent on the PDF that was used to compute $T^{(0)}$. However, this difference is much smaller than the difference between results obtained with different codes or different scale choices.

Table 13: The $\chi^2/N_{pt}$ values for the ATLAS inclusive jet production data obtained with the experimental and extended-$t_0$ definitions of the $\chi^2$ function. The cross sections are computed at NLO using the specified NLO PDFs, $\alpha_s$ values, and the following codes: FastNLO, MEKS with $\mu_{F,R}$ equal to the individual jet $p_T$ (MEKS1) or to the $p_T$ of the hardest jet (MEKS2), and APPLgrid.

The comparisons of the three covariance matrix definitions in the two tables indicate that, for the ATLAS jet data, the difference in the corresponding $\chi^2$ values is quite large. Note that in this comparison the $t_0$ covariance matrix treats only the normalization of these data as multiplicative, whereas the extended-$t_0$ definition treats all systematic uncertainties as multiplicative. Hence, when performing a fit it is always important to know whether a correlated error as determined by the experimentalists is multiplicative (and hence susceptible to the d'Agostini bias) or additive, since this affects the impact of the data on the fit.
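The sensitivity to how a multiplicative error is scaled can be illustrated with a toy two-point fit. The sketch below assumes that the absolute correlated uncertainty is the fractional error times the data in the 'experimental' case and times a fixed reference prediction in the $t_0$-like case. These scalings, the function names, and the numbers are assumptions made for illustration and are not taken from the ATLAS analysis.

```python
import numpy as np

def covariance(s, sigma, ref):
    """cov_ij = s_i^2 delta_ij + sum_a beta_ia beta_ja, with beta = sigma * ref."""
    beta = sigma * ref[:, None]
    return np.diag(s**2) + beta @ beta.T

def fit_constant(D, cov):
    """Best-fit constant k minimizing (D - k)^T cov^{-1} (D - k)."""
    ones = np.ones_like(D)
    w = np.linalg.solve(cov, ones)
    return float(w @ D / (w @ ones))

# Two measurements of the same quantity (made-up numbers).
D = np.array([8.0, 8.5])
s = 0.02 * D                    # 2% uncorrelated errors
sigma = np.full((2, 1), 0.10)   # one 10% fully correlated normalization error

# 'Experimental' scaling: the multiplicative error is scaled by the data.
k_exp = fit_constant(D, covariance(s, sigma, D))

# t0-like scaling: the multiplicative error is scaled by a fixed prediction.
T0 = np.full(2, D.mean())
k_t0 = fit_constant(D, covariance(s, sigma, T0))

print(k_exp, k_t0)
```

With the data-scaled covariance the fitted value falls below both measurements, which is the bias referred to above, whereas with the fixed-prediction scaling it lies between them.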