A critical appraisal and evaluation of modern PDFs

We review the present status of the determination of parton distribution functions (PDFs) in the light of the precision requirements for the LHC in Run 2 and other future hadron colliders. We provide brief reviews of all currently available PDF sets and use them to compute cross sections for a number of benchmark processes, including Higgs boson production in gluon–gluon fusion at the LHC. We show that the differences in the predictions obtained with the various PDFs are due to particular theory assumptions made in the fits of those PDFs. We discuss PDF uncertainties in the kinematic region covered by the LHC and on averaging procedures for PDFs, such as advocated by the PDF4LHC15 sets, and provide recommendations for the usage of PDF sets for theory predictions at the LHC.


Introduction
In Run 2 of the Large Hadron Collider (LHC), the very details of the Standard Model (SM), including cross sections of different processes and Higgs bosons properties, are being mea-sured with very high precision. At the same time, the new data at the highest center-of-mass collision energies ever achieved ( √ s = 13 TeV) are used to search for physics phenomena beyond the SM (BSM). The experimental data used to perform those measurements are generally expected to have percent-level accuracy, depending on details such as the final states and the acceptance and efficiency of the detectors in particular kinematics ranges.
To further test the SM and to identify signals for new physics, measurements need to be compared to precise theoretical predictions, which need to incorporate higher order radiative corrections in quantum chromodynamics (QCD) and, possibly, the electroweak sector of the SM. In order to reach the benchmark precision set by the accuracy of the experimental data, next-to-next-to-leading order (NNLO) corrections in QCD are often required. At next-to-leading order (NLO) in QCD, the residual theoretical uncertainty from truncating the perturbative expansion commonly estimated by variations of the renormalization and factorization scales μ r and μ f are often too large compared to the experimental accuracy. Nonetheless, for observables with complex final states, and indeed for many BSM signals, one must still contend with NLO calculations, which will continue to require corresponding NLO fits.
Parton distribution functions (PDFs) in the proton serve as an essential input for any cross section prediction at hadron colliders and have been measured with increasing precision over the last three decades. Likewise, the strong coupling constant α s (M Z ) at the Z boson mass scale M Z and the masses m h of the heavy quarks h = c, b, t are well constrained by existing data and their determination is accurate at least to NNLO. However, despite steady improvements in the accuracy of PDF determinations over the years, the uncertainties associated with PDFs, the strong coupling α s (M Z ), and quark masses still dominate many calculations of cross sections for SM processes at the LHC. A particularly prominent example is the cross section for the production of a SM Higgs boson in the gluon-gluon fusion channel.
The currently available PDF sets are CJ15 [1], accurate to NLO in QCD, as well as ABM12 [2], CT14 [3], HERA-PDF2.0 [4], JR14 [5], MMHT14 [6], and NNPDF3.0 [7] to NNLO in QCD. These provide a detailed description of the parton content of the proton, which depends on the chosen sets of experimental data as well as on the theory assumptions and the underlying physics models used in the analyses. Both theoretical and experimental inputs have direct impact on the obtained nonperturbative parameters, namely, the fitted PDFs, the value of α s (M Z ) and the quark masses. Moreover, they can lead to large systematic shifts compared to the uncertainties of the experimental data used in the fit. For precision predictions in Run 2 of the LHC it is therefore very important to quantify those effects in detailed validations of the individual PDF sets in order to reduce the uncertainties in those nonperturbative input parameters. Moreover, this will allow one to pinpoint problems with the determination of certain PDFs. Any approach to determine the parton luminosities at the LHC which implies mixing or averaging of various PDFs or of their respective uncertainties, such as that advocated in the recent PDF4LHC recommendations [8], is therefore potentially dangerous in the context of precision measurements, in particular, or when studying processes at kinematic edges such as at large values of Bjorken x or small scales Q 2 . The precision measurements of the LHC experiments themselves help to constrain the different sets of PDFs and may even indicate deviations from SM, cf. [9] for an example. It is thus of central importance that comparisons for all available PDF sets are performed in a quantitative manner and with the best available accuracy.
In this paper we briefly discuss the available world data used to constrain PDFs in Sect. 2 and stress the need to include only compatible data sets in any analysis. The data analysis relies on comparison with precise theoretical predictions, with many of these implemented in software tools. In this respect, we underline in Sect. 3 the importance of opensource code to provide benchmarks and to facilitate theory improvements through indication and reduction of possible errors. In addition, Sect. 3 is devoted to a discussion of a number of crucial theory aspects in PDF fits. These include the treatment of heavy quarks and their masses, QCD corrections for W ± -and Z -boson production applied in the fit of lightflavor PDFs, and the importance of nuclear corrections in scattering data off nuclei. The strong coupling constant is correlated with the PDFs and is therefore an important parameter to be determined simultaneously with the PDFs. The state of the art is reviewed in Sect. 4. The need to address PDF uncertainties for cross section predictions is illustrated in Sect. 5, with the Higgs boson cross section in the gluon-gluon fusion channel being the most prominent case. Other examples include the production of heavy quarks at the LHC in different kinematic regimes. Our observations illustrate important shortcomings of the recent PDF4LHC recommendations [8] which are addressed in Sect. 6, where alternative recommendations for the usage of sets of PDFs for theory predictions at the LHC are provided. Finally, we conclude in Sect. 7.

Data sets and results for PDF fits
We begin with an overview of the currently available data which can be used to determine PDFs and present the fit results of the various groups.

Data sets used in PDF fits
The data used in the various PDF fits overlap to a large extent, as indicated in Table 1. However, there are also substantial differences which are related to the accuracy required in the Table 1 Summary of major hard processes used in the various PDF analyses and the confidence level criteria employed. Detailed references to the different specific data sets used by the various groups are given in Refs. [1][2][3][4][5][6][7] and also the specific statistical analysis applied is described in these papers. Note that different analyses use partly different data sets for some processes PDF sets χ 2 criterion Data sets used in analysis ABM12 [ The CJ14 PDFs sets are provided with 90 % c.l. uncertainties. In addition, a two-tier tolerance test has been applied in case of some data sets c A Monte Carlo method is used to estimate the errors of the PDFs. This method has an interpretation with respect to a level of tolerance only in the range in which the corresponding uncertainties are Gaussian, which applies to wide kinematic regions studied. In these regions the error bands correspond to the 1 σ error obtained using the χ 2 method [10] analysis, the feasibility of efficiently implementing the corresponding theoretical computations, or the subjective evaluation of the data quality, to name a few. The core of all PDF fits comprises the deep-inelastic scattering (DIS) data obtained at the HERA electron-proton (ep) collider and in fixed-target experiments. While the former has used only a proton target, the latter have collected large amounts of data for the deuteron and heavier targets as well. The analysis of nuclear-target data requires an accurate account of nuclear effects. This is challenging already in the case of the loosely-bound deuteron (cf. Sect. 3), and even more so for heavier targets. Therefore, in general, data sets for DIS on targets heavier than deuteron are not used. Nonetheless, different combinations of data sets for the neutrino-induced DIS off iron and lead targets obtained by the CCFR/NuTeV, CDHSW, and CHORUS experiments are included in the CT14, MMHT14, and NNPDF3.0 analyses, but are not used by other groups to avoid any influence of nuclear correction uncertainties. One can also point out the abnormal dependence of the DIS structure functions on the beam energy in the NuTeV experiment [11] and the poor agreement of the CDHSW data with the QCD predictions on the Q 2 slope of structure functions [12][13][14] as an additional motivation to exclude these data sets.
The kinematic cuts applied to the commonly used DIS data also differ in various analyses in order to minimize the influence of higher twist contributions. Another important feature of the DIS data analyses in PDF fits concerns the use of data for the DIS structure function F 2 instead of the data for the measured cross sections. These aspects will be discussed in Sect. 3.
The inclusive DIS data are often supplemented by the semi-inclusive data on the neutral-current and chargedcurrent DIS charm-quark production. The neutral-current sample collected by the HERA experiments provides a valu-able tool to study the heavy-quark production mechanism. This is vital for pinning down PDFs, in particular the gluon PDF at small x, relevant for important phenomenological applications at the LHC (cf. Sect. 5). The charged-current charm production data help to constrain the strange sea PDF, which is strongly mixed with contributions from non-strange PDFs in other observables (cf. Sect. 3).
The Drell-Yan (DY) data are also a necessary ingredient of any PDF analysis since DIS data alone do not allow for a comprehensive disentangling of the quark and anti-quark distributions. Historically, for a long time only fixed-target DY data were available for PDF fits. In particular, this did not allow for a model-independent separation of the valence and sea quarks at small-x. The high precision DY data obtained in proton-proton ( pp) and proton-anti-proton ( pp) collisions from the LHC and the Tevatron open new possibilities to study the PDFs at small and large x. The LHC experiments are quickly accumulating statistics and are currently providing data samples at √ s = 7 and 8 TeV for W -and Zboson production with typical luminosities of over 20 fb −1 per experiment. The rapid progress in experimental measurements causes a greatly non-uniform coverage of the recent DY data in various PDF fits (cf. Tables 2, 3) and leads to corresponding differences in the accuracy of the extracted PDFs. Another issue here is the theoretical accuracy achieved for the description of the DY data. This varies substantially and will be discussed in Sect. 3.
Often, jet production in pp and pp collisions is used as an additional process to constrain the large-x gluon PDF. Here, the QCD corrections are known to NLO and the calculation of the NNLO ones is in progress [15]. The incomplete knowledge of the latter is problematic in view of a consistent PDF analysis at NNLO when including those jet data. This will be discussed in Sect. 4 in connection with the determination of the value of the strong coupling constant α s . that the respective data sets have not been used in the analysis. Other data sets of lower accuracy, which have become obsolete and data sets superseded are listed in Table 3 Experiment Final states  The ABM12 [2] analysis has used older data sets from CMS and LHCb listed in Table 3 b Statistically less significant data with the cut of P   [ 37] [ 38] [ 39] [ 40] [41] [ 42] [43] [ 44] N D P  In addition to these major categories of data commonly used to constrain PDFs, some complementary processes are also employed in some cases, as indicated in Table 1. These comprise the hadro-production of top-quark pairs from pp and pp collisions and the associated production of W bosons with charm quarks in pp collisions. Sometimes, also jet production in ep collisions and prompt photon (γ +jet) production from pp and pp collisions is considered. Except for tt production, the necessary QCD corrections are known to NLO only, so that the same arguments as in the case of jet hadro-production data apply, if those data are included in a fit at NNLO accuracy. For tt production, only the inclusive cross section is considered at the moment in the available PDFs and there is a significant correlation with PDFs, especially of the gluon PDF with the top-quark mass.
Taken together, the set of these data has a number of data points (NDP) of the order of few thousand, and provides sufficient information to describe the PDFs with an ansatz of about O(30) free parameters. The parameters can include the strong coupling constant α s (M Z ) and the heavy-quark masses m c , m b and m t , which are correlated with the PDFs, as will be discussed in Sects. 3 and 4. This provides sufficient flexibility for all PDF groups and it is routinely checked that no additional terms are required to improve the quality of the fit. The exception is the NNPDF group, which typically uses O (250) free parameters in the neural network.
Apart from those considerations there is the general problem of the quality of the experimental data, that is to say whether or not the PDFs are extracted from a consistent data set. The various groups have different approaches, which roughly fall into two classes according to the different confidence level (c.l.) criteria for the value of χ 2 in the goodnessof-fit test. One approach is to fit to a very wide (or even the widest possible) set of data, while the other one rejects inconsistent data sets. In the former case, a tolerance criterion for χ 2 is introduced (e.g. χ 2 = 100), while the latter approach maintains that χ 2 = 1. For the various PDF groups this information is listed in Table 1.
For further reference, we quote here the definition of χ 2 used in data comparisons (Tables 4, 10 , 11, 12, 14, 15, 16). It follows the definition described in Refs. [16][17][18] and is expressed as follows: where μ i represents the measurement at the point i, m i is the corresponding theoretical prediction and δ i,stat , δ i,unc are the relative statistical and uncorrelated systematic uncertainties, respectively. γ i j denotes the sensitivity of the measurement to the correlated systematic source j and b j their shifts, with a penalty term j b 2 j added. In addition, a logarithmic term is introduced arising from the likelihood transition to χ 2 when scaling of the errors is applied [16].
It is important to note that the χ 2 values obtained with Eq. (1) will not necessarily correspond to numbers quoted by PDF groups due to different χ 2 definitions, data treatment and other parameters, see also Table 1.

Results for PDFs
Before we start a detailed discussion of the theoretical aspects of the PDF determinations we would like to illustrate the present status of PDF sets at NNLO in QCD and discuss briefly some differences, which are clearly visible. The currently available sets at NNLO in QCD are shown in Figs. 1, 2, 3, 4, 5 and 6. The light-quark (u, d) valence PDFs together with the gluon and the quark sea distributions (x = 2x(ū +c +d +s) for four active flavors) with the respective uncertainty bands are displayed in Figs. 1, 3 and 5 at the scales Q 2 = 4 GeV 2 , 100 GeV 2 and M 2 Z in the range 10 −4 ≤ x ≤ 1 for the sets ABM12 [2], HERAPDF2.0 [4] and JR14 [5]. Likewise, Figs. 2, 4 and 6 show the sets CT14 [3], MMHT14 [6] and NNPDF3.0 [7].
The main features of the present NNLO PDFs in Figs. 1, 2, 3, 4, 5 and 6 in the main kinematic region of x and Q 2 relevant for hard scattering events at Tevatron and the LHC can be characterized as follows. The agreement in the distributions xu v , and to a slightly lesser extent , is very good for ABM12, JR14 and HERAPDF2.0, as shown in Fig. 1. For the valence PDF xd v there is also an overall reasonable agreement, but the distribution deviates by more than 1σ at x 0.1 in the case of HERAPDF2.0. One should note that xd v is more difficult to measure in e ± p DIS at HERA than xu v and additional constraints from deuteron data are important to fix the details of this PDF, as discussed in Sect. 3 below.
The results on the gluon momentum distribution xg are clearly different at low values of x. Here, JR14 obtains the largest values, followed by ABM12 and HERAPDF2.0, with the latter displaying a valence-like shape below x = 10 −3 . For CT14, MMHT14 and NNPDF3.0 there is very good agreement for xu v , cf. Fig. 2. Some differences are visible in case of xd v , where CT14 reports larger values than NNPDF3.0 at x 5 · 10 −3 and vice versa for smaller x. The spread in for the sets in Fig. 2 is much greater than those by ABM12, JR14 and HERAPDF2.0. This is true as well for the gluon PDF xg with the CT14 uncertainty band for the gluon PDF also covering the predictions for the distributions by ABM12, and HERAPDF2.0. Note that the error bands for CT14 in Figs. 2, 4 and 6 correspond to the c.l. of 68 %.
The disagreement in xd v between HERAPDF2.0 and ABM12 or JR14 persists through the evolution from Q 2 = 4 GeV 2 to Q 2 = M 2 Z , cf. Fig. 3 and 5. Likewise, the spread in xd v between CT14, MMHT14 and NNPDF3.0 becomes more pronounced, as shown in Fig. 4 and 6. On the other hand, differences in the singlet PDFs and xg, while still somewhat visible at Q 2 = 100 GeV 2 , largely wash out at scales Q 2 = M 2 Z which govern the physics of central rapidity events at the LHC. Those remaining differences persist at large scales (as in the case of the gluon PDFs at large x > 0.1) and will have a significant impact. The crucial test for all PDF sets comes through a detailed comparison of cross section predictions to data. This will be discussed in the remainder of the paper, in particular in Sects. 3 and 5.

Theory for PDF fits
In the following we describe the basic theoretical issues for a consistent determination of the twist-two PDFs from DIS and other hard scattering data, on the basis of perturbative QCD at NNLO using the MS scheme for renormalization and factorization.

Theory for analyses of DIS data
The world DIS data are provided in terms of reduced cross sections by the different experiments. QED and electroweak radiative corrections [45,46] are applied, which requires careful study of different kinematic variables [46][47][48][49]. In this way also the contributions from the exchange of more than one gauge boson to the partonic twist-2 terms are taken care of. In part, also the very small QED corrections to the hadronic tensor are already accounted for. These have a flat kinematic behavior and amount to O(1 %) or less [50][51][52][53].
The reduced cross sections are differential in either two of the kinematic variables in the set {x, y, Q 2 }. The virtuality Q 2 = −q 2 of the process is given by the 4-momentum transfer q to the hadronic system. The Bjorken variable is defined as x = Q 2 /(sy), with y = 2 p · q/s, and s = ( p + l) 2 the squared center-of-mass energy, where p and l denote the 4momenta of the nucleon and the lepton. At energies much greater than the nucleon mass M, in the nucleon rest frame y is the fractional energy of the lepton transferred to the nucleon. The double differential cross sections used in the QCD analyses are given by [46,54,55] Fig. 1 The u-valence, d-valence, gluon and sea quark (x = 2x(ū +c +d +s)) PDFs with their 1 σ uncertainty bands of ABM12 [2], HERAPDF2.0 [4] and JR14 (set JR14NNLO08VF) [5] at NNLO at the scale Q 2 = 4 GeV 2 ; absolute results (left) and ratio with respect to ABM12 (right) where α and G F denote the fine-structure and Fermi constants, Y ± = 1 ± (1 − y) 2 and we keep the dependence on the masses of the nucleon (M), the W and Z boson (M W , M Z ) and the lepton (m l ). The structure functions F NC i and W i are nonperturbative quantities defining the hadronic tensor. They can be measured by varying y at fixed Q 2 and x and form the input to the subsequent analysis. Note that in some previous experiments, assumptions were made about the longitudinal structure functions F NC L and W L , where (in the massless limit) since at the time of the data analysis the corresponding QCD corrections were still missing. Therefore, it is important to use the differential cross sections in Eqs.
(2)-(4) and to add the correct longitudinal structure functions [56,57], cf. also [58,59]. The structure functions are measured for DIS off massive proton and deuteron targets and are, therefore, subject to target mass corrections, which play an important role in the region of lower values of Q 2 and larger values of x. They are available in Refs. [54,60,61]. The neutral-and charged-current structure functions F NC i , W NC i and W CC i consist of a sum of several terms, each weighted by powers of the QED and electroweak couplings, and F NC i also include the γ − Z mixing, which has to be accounted for, cf. [46,54,55]. Then, considering one specific gauge boson exchange, one arrives at a representation for the individual structure functions F i , which are only subject to QCD corrections. For example, for pure photon exchange, they are given by where F τ =2 i denotes the leading-twist term and the coefficients C τ i parametrize the higher twist contributions. The latter terms are of relevance for many DIS data sets, see Sect. 2.
Present day QCD analyses are aimed at determining the leading-twist contributions to the structure functions. There are two ways to account for the higher twist terms: (i) One is fitting the higher twist terms in F i . A rigorous approach requires the knowledge of their scaling violations (term by term) and of the various Wilson coeffi-cients to higher orders in α s , see e.g. Sect. 16 in Ref. [54]. Since at present this is practically out of reach, such fits remain rather phenomenological. Moreover, the size of the (non-singlet) higher twist contributions to the structure function F 2 vary strongly with the correction applied to the leading-twist term up to next-to-next-to-next-toleading order (N 3 LO), as shown in Ref. [58,62]. Also, the non-singlet and singlet higher twist contributions are different [63,64]. (ii) One has to find appropriate cuts to sufficiently reduce the higher twist terms. For instance, in the flavor nonsinglet analysis of Ref. [58] the cuts are taken to be Q 2 ≥ 4 GeV 2 , In the combined singlet and non-singlet analysis of Ref. [64], Q 2 ≥ 10 GeV 2 , W 2 ≥ 12.5 GeV 2 have been used. These bounds are found empirically by cutting on W 2 and/or Q 2 starting from larger values. Applying these cuts severely limits the amount of large-x DIS data to be fitted, and usually leads to an increase of the errors of α s (M Z ) and other fitted fundamental parameters and distributions.
Both methods (i) and (ii) allow to access the leading-twist contributions to the DIS structure functions, with some qualifications, however. The cuts suggested in (ii) remove the large-x region potentially sensitive to the higher twist terms. However, they do not affect the data at x 0.1, where higher twist terms still play an important role [64,65]. To some extent, the influence of higher twist can be dampened by using the DIS data for the structure function F 2 instead of the cross section, since in this case the contribution to the structure function F L need not be considered. It should be kept in mind, though, that the experimental separation of the structure functions F 2 and F L in the full phase space of common DIS experiments is very difficult without dedicated longitudinal-transverse cross section separations. Therefore, the data on F 2 and F 3 are typically extracted from the cross section once a certain model for the structure function F L is taken. This approach is justified only at large x, however, where the contribution of F L is small and even large uncertainties in the modeling of F L cannot affect the extracted values of F 2 and F 3 . The procedure is not applicable for HERA kinematics, on the other hand, and introduces a bias into the analysis of the data taken by the New Muon Collaboration (NMC), in particular, a shift in the value of α s preferred by the fit [59,66], cf. Sect. 4. Nonetheless, the MMHT14 analysis [6] is still based on the DIS structure function data, as are the CJ15 and CT14 analyses [1,3]. The latter two use cross section data for HERA, and for HERA and NMC, respectively, and structure function data elsewhere. While CT14 performed this important change for the HERA and NMC data, the authors of Ref. [67] report that the change has little impact. Refs. [59,64], on the other hand, disagree with this claim.
The deep-inelastic structure functions are inclusive quantities and contain massless parton and heavy-quark contributions, Here the massless terms are given by where C i, j denote the massless Wilson coefficients, f j the massless PDFs and μ 2 is the factorization scale. The Mellin convolution is abbreviated by ⊗ and the sum over j is over all contributing partons. The renormalization group equation for F massless i allows one to eliminate the dependence on μ 2 order-by-order in perturbation theory. This also applies to there is a dependence on the heavy-quark masses m c and m b in the present world DIS data. Note that F massive i is not the structure function of a tagged heavy-flavor sample, which would be infrared sensitive [68]. Rather, F massive i is just given as the difference of the complete structure function F τ =2 i and the massless one in Eq. (8).

Massless PDFs
For all QCD calculations we use perturbation theory. The factorized representation in terms of Wilson coefficients and PDFs is obtained using the light-cone expansion [69][70][71][72]. For a proper definition of the Wilson coefficients and the PDFs one has to use the LSZ formalism and refer to asymptotic states at large times t → ±∞, given by massless partons. We first describe the massless contributions in Eqs. (7) and (8), and then discuss the contribution of heavy quarks. The Wilson coefficients in Eq. (8) have a perturbative expansion in the strong coupling constant. At one- [73], two- [74][75][76][77][78][79][80][81], and three-loop order [56,57,[82][83][84] they have been calculated for the neutral-current structure functions F i , with i = 1, 2, 3, except for the γ − Z mixing contribution at three loops.
The structure functions in general depend on the following three non-singlet and singlet combinations of parton densities: with the light-quark distributions f i of flavor i and n f the number of massless flavors. These combinations evolve in μ 2 from an initial scale μ 2 0 by the QCD evolution equations, where the singlet distribution q s (x, μ 2 ) mixes with the gluon distribution g(x, μ 2 ), The non-singlet splitting functions are given by while the anomalous dimensions γ i j corresponding to the splitting functions P i j are obtained by a Mellin transform, where we suppress for brevity the dependence of P i j and γ i j on the strong coupling a s (μ 2 ) = α s (μ 2 )/(4π). The P i j are known as well at one- [85][86][87][88][89][90], two- [81,[91][92][93][94][95][96][97][98][99][100][101][102] and at three-loop order [103,104] (see also [105,106] for checks of P ps and P gg at that order). The scale evolution of the strong coupling constant in the MS scheme is given by where β k denote the expansion coefficients of the QCD βfunction [107][108][109][110][111][112][113][114][115][116]. The evolution equations (10), (11) can be either solved in x-or Mellin (or moment) N -space. In Mellin-space, defined by the transform Eq. (14), an analytic solution is possible [117][118][119][120] by arranging the solution systematically in powers of the coupling constants a s (μ 2 ) and a s (μ 2 0 ), and even forming factorization-scheme invariant expressions. In case of the x-space solutions this is usually not done due to the necessary iterative solution. In the small-x region the iterative solution usually leads to a pile-up of a few per cent [121]. This can be corrected for in x-space solutions by applying the method given in [122]. Likewise, the iterated solution can be obtained in Mellin N -space and is a standard option of the evolution program QCD-Pegasus [123].

Heavy-quark structure functions
Disregarding contributions from charm at the input scale ("intrinsic charm"), cf. [124][125][126], the heavy-flavor corrections to the DIS functions are described by Wilson coefficients. The leading order results are of O(a s ). Higher order corrections in the perturbative expansions are, therefore, of O(a 2 s ) at NLO, and of O(a 3 s ) at NNLO, similar to the case of the longitudinal structure function [56,57]. The corrections in the neutral-and charged-current cases are available in one- [127][128][129][130][131][132][133][134] and two-loop order [135][136][137][138], where the latter corrections were given in semi-analytic form.
For the neutral-current exchange the heavy-flavor contributions to the structure functions F i with i = 2, L are [139,140]: , where the electroweak current couples either to a massless (L i,k ) or the massive (H i,k ) quark line. From three-loop order onwards there are contributions containing both heavy flavors c and b in a non-separable form, denoted by F massive,{c,b} 2 , in Eq. (16). The PDFs and the coupling constant in Eq. (16) are defined in the MS scheme, while the heavy quark masses are taken either in the on-shell or MS schemes [140,141]. The relations of the heavy quark masses between the pole mass (on-shell scheme) and the MS scheme are available to four-loop order [142]. Due to its better perturbative stability, the MS scheme for the definition of the heavy-quark mass is preferred.
For Q 2 m 2 i the asymptotic corrections to F L are available at three-loop order [139,143]. For F 2 , four out of the five massive Wilson coefficients, L ns 2,q , L ps 2,q , L s 2,g and H ps 2,q are known as well [105,139,[144][145][146] at large scales Q 2 . For the remaining coefficient, H s 2,g , an estimate has been made in Ref. [147] based on the anticipated small-x behavior [148], a series of moments calculated in [140], and two-loop operator matrix elements from Refs. [149,150]. This provides a good approximation of the NNLO corrections.

Heavy-flavor PDFs
An important issue in PDF fits concerns the number of active quark flavors and the theoretical description of heavy quarks such as charm and bottom. Due to the large range of hard scales Q for the scattering processes considered, different effective theories may be applied. At low scales, when Q O(few) GeV, one typically works with n f = 3 massless quark flavors, setting n f = 3 in the hard scattering cross section, the evolution kernels and the anomalous dimensions. In this case, only the light-quark PDFs for up, down and strange are taken into account. At higher scales, e.g., for hadro-production of jets at high transverse momentum p t or top quarks, additional dynamical degrees of freedom lead to theories with n f > 3. By means of the renormalization group and matching these are related to the case with n f = 3 massless quarks. Technically, one has to apply decoupling relations [151] at some matching scale μ, for instance in the transition of α This introduces some logarithmic dependence on the masses of the heavy quarks m c , m b and m t for charm, bottom and top. One should also note that the matching of the effective theories for n f → n f + 1 does not need to be smooth. In fact, it introduces discontinuities, such as for the running coupling as a solution of the QCD β-function at higher order in the perturbative expansion, where α in the MS scheme at the matching scale μ, see e.g., [152].
In a similar manner, PDFs in theories with a fixed number n f > 3 of quark flavors are related to those for n f = 3 with the help of heavy-quark operator matrix elements (OMEs) A i j at a chosen matching scale μ. Potential non-universal non-logarithmic heavy-flavor effects are taken care of by the Wilson coefficients. Starting with the PDFs in a so-called fixed-flavor number scheme (FFNS) with n f fixed, one has for the gluon and the singlet quark distributions with operator mixing in the singlet sector. In particular, one has [140,153] PDFs for charm and bottom (h = c, b) are then constructed as at the matching scale μ from the quark singlet and gluon PDFs with h =h.
The matching conditions are typically imposed at the scale μ = m h , and f h+h = 0 is assumed for scales μ ≤ m h . The necessary heavy-quark OMEs A i j depend logarithmically on the heavy-quark masses as α l s ln k (μ 2 /m 2 h ) with 0 ≤ k ≤ l in the perturbative expansion. As discussed above, the OMEs are known to NLO analytically [149,154] and at NNLO either exactly or to a good approximation [105,140,144,147,155]. Thus, charm and bottom PDFs can be consistently extracted in QCD with a fixed number n f = 3, 4 or 5.
It should be stressed, however, that the decoupling relations for PDFs in Eqs. (17)-(20) assume the presence of one heavy quark at a time upon moving from lower scales to higher ones. Beginning at three-loop order, however, there are graphs containing both charm-and bottom-quark lines, and charm quarks cannot be treated as massless at the scale of the bottom-quark due to (m c /m b ) 2 ≈ 1/10. Such terms cannot be attributed to either the charm-or bottom-quark PDFs, but rather one has to decouple charm and bottom quarks together at some large scale. The simultaneous decoupling of bottom and charm quarks in the strong coupling constant α s is discussed, for instance, in Ref. [156].

Variable-flavor number schemes
The hard scattering cross sections also depend on the number of flavors n f and additional parton channels may open up, which have to be included as well. In addition, processes involving massive quarks depend logarithmically on the ratio Q 2 /m 2 h , where Q is some hard scale associated with the scattering. For the heavy-flavor Wilson coefficients in Eq. (16) these logarithms are of the type α l s ln k (Q 2 /m 2 h ) with 1 ≤ k ≤ l in perturbation theory. These originate from collinear singularities screened by the heavy-quark mass due to the constrained phase space for gluon emission from massive quark lines, and as a prefactor of these logarithms one has the standard splitting functions. In addition to logarithmic terms, there are also power corrections (m 2 h /Q 2 ) l in the heavy-flavor Wilson coefficients, usually appearing in form of higher transcendental functions. In the asymptotic regime of Q 2 m 2 h the logarithms dominate and the kinematic dependence is measured experimentally, for instance in the tagged flavor case for charmquark pairs in the structure function F cc 2 . Logarithms of a similar kind are also experimentally observed in differential distributions, e.g. due to the QED corrections proportional to ln k (Q 2 /m 2 l ) with m l being the charged lepton mass, cf. [45,46].
The resummation of the logarithms α l s ln k (Q 2 /m 2 h ) to all orders in perturbation theory is effectively carried out by the transition n f → n f + 1 along with the introduction of new heavy-quark PDFs as described in Eqs. (17)-(20). Whether such a transition is appropriate or not depends, of course, on the detailed kinematics. If the hard scale is closer to threshold, Q 2 m 2 h , a description with n f light flavors is more suitable, while for Q 2 m 2 h one switches to a theory with n f + 1 massless flavors. In order to achieve a unified description for hard scattering cross sections both at low scales Q 2 m 2 h and asymptotically for Q 2 m 2 h , so-called variable-flavor number schemes (VFNS) have been constructed. Effectively, these aim at an interpolation between the asymptotic limits of the quarks being very light or very heavy relative to the other hard scales of the process. At the LHC such considerations apply to processes with bottom quarks in the initial state such as single top-quark production as well as bottomquark initiated Higgs boson production (see Ref. [157] for more recent studies and Ref. [158] for the so-called Santander matching scheme for Higgs boson production in bb annihilation).
Of particular interest for PDF fits is the reduced cross section for the pair-production of heavy quarks in DIS, which is parametrized in terms of the DIS heavy-quark structure functions F h i for i = 2, L in Eq. (16) and with heavy-flavor Wilson coefficients which are known exactly at NLO [135], and to a good approximation at NNLO [147] in QCD. For the interpolation n f → n f + 1 of the heavy-quark structure functions F h i a number of so-called general-mass VFNS (GM-VFNS) have been discussed in the literature, such as ACOT [159][160][161], BMSN [153], FONLL [162] or RT [163]. These keep m h = 0 and are to be distinguished from the zero-mass VFNS (ZM-VFNS), which describes essentially the massless case. Note that presently the GM-VFNS are applied only to one single heavy flavor at the time. That is the sequential transition n f → n f + 1, so that the charm or bottom quarks are not considered simultaneously and charmquark mass effects in the bottom-quark structure function F b i are neglected as discussed above.
The various GM-VFNS contain a number of additional assumptions, and some come in more than one variety. The GM-VFNS differ, for instance, in the way the low-Q 2 region is modeled. This modeling is a necessary undertaking to provide a reasonable behavior of the VFNS in the kinematical regime of present DIS data. Additional assumptions in the GM-VFNS are related to the matching scale μ for the transition n f → n f + 1 as the adopted choice μ = m h is not unique, see [164] for an in-depth discussion.
Briefly, the problem can be illustrated with the heavyquark velocity, the leading order formula [131] being The transition n f → n f + 1 when the corresponding flavor is considered as nearly massless requires light-like velocities v 1. That implies the absence of all power corrections (m 2 h /Q 2 ) l in the heavy-flavor Wilson coefficients at the matching scale μ 2 . In practice, the matching is often applied at the scale μ 2 = m 2 h and for kinematics Q 2 m 2 h , where this condition is not fulfilled, which implies restrictions on the range in x in Eq. (21).
Finally, the logarithmic accuracy of the resummation for large scales Q 2 m 2 h , or the order of perturbation theory in current implementations of GM-VFNS, is often not consistent. For example, NNLO evolution [103,104] of the massless PDFs is sometimes combined with the heavy-quark OMEs at NLO [149,154], omitting NNLO results [105,140,144,147].
Altogether, these facts introduce a significant model dependence in any GM-VFNS implementation. A sensitive parameter to test this model dependence is the extraction of the charm-or bottom-quark mass used in different versions of GM-VFNS and subsequent comparison with the Particle Data Group (PDG) results [55]. In addition, the quality of the various GM-VFNS can be quantified with the goodnessof-fit for the description of HERA data on DIS charm-quark production obtained from the combination of individual H1 and ZEUS results [165].

Validation with DIS charm-quark production
The H1 and ZEUS combined data for the DIS charm production cross section are unique for tests of GM-VFNS and span the region of 2.5 ≤ Q 2 ≤ 2000 GeV 2 and 3 × 10 −5 ≤ x ≤ 0.05. Values for the charm-quark mass and χ 2 /NDP for the individual PDF sets ABM12, CJ15, CT14, HERAPDF2.0, JR14, MMHT14, NNPDF3.0 as well as the averaged set PDF4LHC15 are given in Table 4, along with the information on the scheme choice for the heavy-quark struc-ture functions and on the theoretical accuracy for the massive quark DIS Wilson coefficients. For reference, Table 4 also list the χ 2 /NDP values for the HERA inclusive cross section data [4]. Comparisons to data for the DIS charm production cross section are shown in Figs. 7, 8, 9 and 10. Note that Table 4 adopt the standard definition of perturbative orders for the heavy-quark structure functions. This is not shared by CT14, MMHT14 and NNPDF3.0 in their GM-VFNS. There the Born contribution to the heavy-quark Wilson coefficients for ep → cc, which is proportional to O(α s ), is referred to as being "NLO". Analogously, the one-loop corrections of order O(α 2 s ) are denoted by "NNLO". Table 4 and Fig. 7 show that the ABM12 [2] and JR14 [5] PDFs at NNLO, using charm-quark masses in the MS scheme, provide a good description of the data. Both ABM12 and JR14 use the approximate massive threeloop Wilson coefficients as obtained in [147] by interpolating between existing O(α 3 s ) soft-gluon threshold resummation results and the O(α 3 s ) asymptotic (Q 2 m 2 c ) coefficients [140,144]. This is referred to as O(α 3 s ) approx in Table 4. The HERAPDF2.0 fit [4] also obtains a good description of the data, cf. Fig. 8. This is the only set which has fitted also to the HERA inclusive cross section data of Ref. [4]. On the other hand, the SACOT [160] GM-VFNS at NLO used by CJ15 [1] does not describe the data too well, although we should note that the HERA charm data were not included in the CJ15 fit itself.
The remarkable fact in Table 4 and Fig. 9 is, however, that the GM-VFNS SACOT(χ ) [161] of CT14 [3] and RT optimal [163] of MMHT14 [6] have difficulties in describing the DIS charm production data. Note that MMHT14 models the heavy-quark Wilson coefficient functions at O(α 3 s ) for low Q 2 as described in [163] using known leading threshold logarithms [168] and ln(1/x) terms [148], which have been shown not to be leading. This is indicated as O(α 2 s ) in Table 4. Note that CT14 has applied a universal cut of Q 2 ≥ 4 GeV 2 on all DIS data, excluding the bin at Q 2 = 2.5 GeV 2 in the HERA data [165] from the fit (cf. the upper left plot in Fig. 9). We have checked that including the low Q 2 bin leads to a dramatic deterioration of the fit quality.
In addition, the schemes SACOT(χ ) and RT optimal as well as FONLL-C [162] of NNPDF3.0 [7] do not improve the fit quality when comparing NLO and NNLO fits. We note in this context that those fits do not include the exact asymptotic [105,140,144,146] and approximate [147] O(α 3 s ) results for the heavy-quark Wilson coefficients in their theory predictions. The averaged set PDF4LHC15 [8], shown in Fig. 10, mixes PDFs derived with different mass schemes (ACOT, FONNL and RT) and does not describe the data very well for virtualities up to Q 2 20 GeV 2 . Table 4 Values of the charm-quark mass and renormalization scheme used in the PDF fits together with a summary of schemes chosen for the description of the charm-quark structure function F c 2 and the theoretical accuracy for the massive quark DIS Wilson coefficients. The values of χ 2 /NDP for the DIS charm production cross section data [165] and HERA inclusive cross section data [4] are given in two columns with the account of PDF uncertainties (with unc., where CT14 PDF errors scaled from 90 % c.l. to 68 % c.l., i.e., reduced by a factor 1.645) and for the central prediction of each PDF set (nominal). In xFitter [166,167], the values of electroweak parameters like the Fermi constant and Wboson mass are taken from Ref. [55]. The values for CT14 and for PDF4LHC with the SACOT(χ) scheme have been determined with a cut on Q 2 ≥ 5 GeV 2 on the HERA data [165] PDF sets m c (GeV) m c renorm. scheme

Fig. 7
Comparison of HERA data for the DIS pair-production of charm quarks [165] to the QCD predictions at NNLO in the FFNS using ABM12 [2] and JR14 [5] PDFs with a running charm-quark mass

Charm-quark mass
Dedicated studies of the charm-quark mass dependence have been performed by several groups. In the MS scheme, the value of m c (m c ) = 1.24 + 0.04 − 0.08 GeV has been obtained in [169] together with χ 2 /NDP=61/52 for the description of the HERA data [165] as a variant of the ABM11 fit [64]. Other groups, which keep a fixed value of m c in the analyses, cf. Table 4, have studied the effects of varying m c in predefined ranges. This has been done, for example, in the older NNPDF2.1 [170] and MSTW analyses [171] as well as for the MMHT PDFs [172]. The latter yields a pole mass of m  Table 4 as the best fit. In this context, it is worth to point out that the running mass m c (μ) in the MS scheme is free from renormalon ambiguities and can therefore be determined with high precision. The PDG [55] quotes m c (m c ) = 1.275 ± 0.025 GeV based on the averaging different mass determination in various kinematics. DIS charm-quark production analyzed in the FFNS (n f = 3) leads to m c (m c ) = 1.24 ± 0.03 + 0.03 − 0.03 GeV at NNLO [169], while measurements of the MS mass in e + e − annihilation give, for instance, m c (m c ) = 1.279 ± 0.013 GeV [175] and m c (m c ) = 1.288 ± 0.020 GeV [176]. The determination from quarkonium 1S energy levels yields m c (m c ) = 1.246 ± 0.023 GeV [177]. All these values are consistent with each other within the uncertainties.
In contrast, the accuracy of the pole mass m The low values for the pole mass of the charm quark assumed or obtained in some PDF fits as shown in Table 4 are thus not compatible with other determinations and with the world average. The rigorous determination of the charmquark mass discussed, for instance, in [169] provides a more controlled way of determining m c from the world DIS data, taking also into account its correlation with α s (M Z ).

Up-and down-quark distributions
The total quark contribution to nucleon matrix elements is known fairly well due to constraints from the available DIS data obtained in the fixed-target and collider experiments in the x-range 10 −4 x 0.8. However, a thorough disentangling of the quark flavor structure is still a challenging task in any PDF analysis. At moderate and large x values this has been routinely achieved by using a combination of the DIS data obtained on proton and deuteron targets. However, uncertainties in the modeling of nuclear corrections in the deuteron introduce a controllable source of theoretical uncertainty on the d-quark PDF obtained in this way, especially at large x, as discussed below.
An alternative way to resolve the u-and d-quark contributions is to use data on W -and Z -boson production obtained in pp and pp collisions at the LHC and Tevatron, respectively. Those experiments probe the W and Z rapidity distributions up to rapidities of y = 3 − 4, depending on details of the experiments, with an integrated luminosity of O(1) fb −1 achieved in each run. Such data samples are quite competitive in accuracy with the ones obtained in fixed-target DIS experiments, and provide simultaneously constraints on the quark and anti-quark PDFs at large and small x. Furthermore, the d-quark PDF extracted from a combination of the existing data on DIS off protons and W/Z -boson production in pp( pp) collisions is not sensitive to nuclear corrections. Moreover, if DIS data with small hadronic invariant masses W 2 are not used in the analyses in order to reduce the sensitivity to higher twist contributions, the statistical potential of the deuteron data is reduced and they become less competitive as compared to the collider data, cf. Fig. 11. As mentioned above, one can further constrain the uand d-quark flavor separated distributions by utilizing fixedtarget deuteron DIS data. However, nuclear effects need to be accounted for in cross sections and structure functions in order to access the underlying PDFs. The theoretical uncertainty inherent in this nuclear correction procedure should be added to the statistical PDF uncertainties. Nonetheless, the reduction of the uncertainties due to the increased number of fitted data points is even greater, leading to an overall smaller d-quark PDF uncertainty than in fits performed without deuterium data [30,191,192]. Furthermore, as shown in [1], and discussed in more detail below, it is possible to significantly reduce the nuclear correction uncertainty by exploiting the interplay of the deuteron DIS data and the recent high-statistics DØ data on the reconstructed W ± boson charge asymmetry at large rapidity, which is equally sensitive to the d/u ratio but is not affected by nuclear corrections.
The W/Z -boson collider data also provide a valuable constraint on the small-x quark PDFs. In particular, the charge asymmetry of leptons originating from the W decays is sen-sitive to the SU(2) flavor asymmetry of the non-strange sea, also referred to as the "isospin" asymmetry I (x) = [d(x) −ū(x)] at small x. This asymmetry is constrained by the DY data from fixed-targets with protons and deuterons collected by the Fermilab experiment E866 [193]. However, the E866 data are not sensitive to the value of I (x) at small x (x 0.2). Therefore, I (x) is sometimes parametrized in a Regge-like form as I (x) ∼ x 0.5 such that it vanishes at x = 0 (cf. the MMHT results in Fig. 12).
The large-rapidity tail of the W/Z -boson production data allows for a model-independent check of I (x) at small x. The asymmetry preferred by the combination of the currently available LHC and Tevatron data turns out to be negative at x < 0.01, while the Regge-like limit with a vanishing I (x) can still be recovered at x 10 −5 , cf. the ABMP15 results in Fig. 12. The CT14 analysis only includes the Tevatron forward DY data, but also confirms the negative trend in I (x) at small x, with errors in I (x) being substantially larger than those from ABMP15.
Finally, an important issue is the theoretical accuracy which is employed in the description of the DY data. There are significant differences as shown in Table 5 and these cause  Fig. 7 with QCD predictions at NLO using the PDF4LHC_100 PDF set and various VFNS: FONLL-B [162], RT optimal [163] and SACOT-χ [161] an additional spread in the fit quality and the results for the PDFs when comparing different NNLO PDF sets.

Strange-quark distribution
The main information on the strange sea distribution comes from charm-quark production in neutrino-induced chargedcurrent DIS experiments. The publication of data from CHO-RUS and NOMAD has recently enlarged the statistics available for those experiments. As a net result, the uncertainty in the strange PDF is now reduced down to a few percent at x ∼ 0.1 (cf. Fig. 13). However, at small x the strange sea distribution is still poorly known in view of the restricted kinematics of the production of charm quarks from fixed targets. Furthermore, since neutrino DIS experiments usually involve nuclear targets, care needs to be taken when extracting free-nucleon PDFs from the nuclear cross sections. Nuclear effects in neutrino DIS and possible differences between those in charged-lepton DIS have been discussed recently in Refs. [192,194,195], for instance. Supplementary information on the strange sea at small x, independent of nuclear effects, can be obtained from the associated production of charm quarks and W bosons in the pp collisions at the LHC. A constraint from collider data on W + c is potentially less sensitive to the c-quark fragmentation model compared to the one from semi-leptonic decays of charm, which plays major role in the existing fixed-target DIS experiments. The W + c data collected by ATLAS and CMS prefer a somewhat enhanced strange sea as compared to the fixed-target determination, cf. Fig. 13. However, the NNLO QCD corrections to this process are still unknown. They are not taken into account in the analysis of W + c data so far and may have a substantial influence on the fit. The strange sea extracted by ATLAS from an analysis of the combined inclusive data on the W -and Z -boson production is even further enhanced, which suggests a restoration of SU(3) flavor symmetry in the sea distributions. However, the accuracy of this determination is poor due to a limited potential of the inclusive data in disentangling the quark flavors. Therefore, the ATLAS result is in fact comparable with other determinations within uncertainties. In general, the existing experimental constraints on the strange PDF are relatively poor. Therefore, the results of various determinations demonstrate a significant spread, which is mainly driven by the data selection. An additional spread between results of earlier PDF analyses appears due to imple- mentation issues. In particular, the strong strange-sea suppression observed in the NNPDF2.1 analysis [170] was related to an error in the DIS charm-quark production cross section being off by a factor of two for low scales due to an additional factor of (1 + m 2 c /Q 2 ) in Eq. (34) of Ref. [170]. This is now correct in NNPDF3.0 [7]. The CT10 analysis [66], which reported an enhanced strange sea, may be flawed due to a wrong sign of the photon-Z interference for massive quarks in the structure function x F 3 [3]. This error also concerns the earlier results on the strange-anti-strange asymmetry [196] and has now been corrected in CT14 [3]. Finally, the MSTW [197] analysis suffered from an error in the NLO QCD correction for the charged-current DIS charmquark production as it had omitted a part of the gluon Wilson coefficient at NLO, which was corrected in MMHT14 [6].

Nuclear corrections
Many global PDF analyses make use of data with deuterium targets, such as lepton-deuteron DIS and proton-deuteron DY, as a way of obtaining stronger constraints on the flavor dependence of PDFs that are not possible with proton data alone. The use of deuterium data requires that one takes into account differences between PDFs in the deuteron and those in the free nucleon, which arise from effects such as nuclear Fermi motion and binding of the nucleons in the nucleus, as well as nucleon off-shell corrections and nuclear shadowing. While some analyses assume that nuclear corrections in the deuteron are negligible, a number of recent global PDF studies have incorporated nuclear effects into their analyses [1,5,64,191,[198][199][200].
Generally, the nuclear effects become increasingly important at large values of x (x 0.4), as Fig. 14 illustrates for the ratio of the deuteron to isoscalar nucleon structure functions. In this region the nuclear PDFs can be computed through convolutions of the bound nucleon PDFs and nuclear smearing functions describing the momentum distributions of nucleons in the deuteron. The latter can be expressed in terms of deuteron wave functions, calculated from modern potentials based on high-precision fits to nucleon-nucleon scattering data. These potentials differ primarily in their treatment of the short range N N interaction, and the different strengths of the high-momentum tails of the wave functions translate directly to the magnitude of the nuclear corrections at large x [199,201].
The nucleon off-shell corrections, on the other hand, are somewhat more model dependent, and several model studies have been performed to estimate their effect on nuclear PDFs [202][203][204][205][206]. Some earlier PDF analyses [191,199,200] used specific physics-motivated models for the off-shell corrections, while more recent approaches have fitted the offshell parameters directly to data [1,205]. Other analyses [6,207] have attempted to parametrize the entire nuclear correction in terms of a universal, Q 2 -independent function, without appealing to physical constraints. In this approach, to account for the effects of Fermi smearing a functional form must be used that produces the steep rise in the F d 2 /F N 2 ratio at high x, such as with a logarithm raised to a high power [207].
The effects of the nuclear corrections are most directly visible in the extraction of the d-quark PDF at large x, see [1]. Figure 14 shows that omitting nuclear smearing effects in the deuteron leads to an overestimated d/u ratio at x 0.6. In fact, omitting nuclear corrections induces a strong tension between the SLAC deuteron DIS data (see, e.g., [208]) and the recent high precision W -boson asymmetry data from the DØ collaboration at the Tevatron [26], which are sensitive to the d-quark PDF in a similar largex range as the SLAC data, but are not affected by nuclear corrections. It also causes an artificial deformation of the d-quark distribution, leading to essentially uncontrolled systematic errors when quark distributions are needed beyond the x range constrained by the data. This illustrates not only the theoretical but also the phenomenological need for such corrections when considering data at large x. Of  [184] based on the combination of the data by NuTeV/CCFR [185], CHORUS [186] and NOMAD [187] (shaded area), and CHORUS [186], CMS [188] and ATLAS [189] (dashed lines), compared with the results obtained by the CMS analysis [21] (hatched area) and by the ATLAS epW Z-fit [189,190] at different values of x (full circles). All quantities refer to the factorization scale μ 2 = 1.9 GeV 2 course, one can choose to avoid nuclear effects altogether by using only proton data; however, doing so increases the uncertainty on the d-quark PDF at both small and large values of x, as Fig. 14 illustrates. Additional details concerning the role of nuclear corrections when using deuterium target data in global fits can be found in Ref. [1]. The extrapolation of nuclear effects from the deuteron to heavy nuclei is unclear, especially in view of the differences between the off-shell quark deformation fitted using deuteron targets [1] and using the ratio of heavy nuclei to deuteron structure functions [205]. As mentioned in the previous section, in general care should also be exercised when using neutrino-nucleus scattering data to obtain, for example, constraints on strange-quark PDFs, due to the currently poor understanding of the interaction dynamics of the final state heavy quark propagating through the target nucleus [192].

Software and tools
Data used in the PDF fits cover a wide range of kinematics and stem from a large number of different scattering processes. In order to achieve an accurate theoretical description of both the PDF evolution and the hard scattering cross sections, well-tested software and tools are necessary. Benchmark numbers for the PDF evolution have long been established, see e.g., the Les Houches report [209], and open-source evolution codes such as QCDNUM [210,211] and Hoppet [212] are available in Bjorken x-space and QCD-Pegasus [123] in Mellin N -space. This is an important development as it allows to expose the software used in the PDF fits to systematic validation, the need of which can be illustrated with recent theory improvements published by various groups. For example, MSTW [197] has tested its NNLO evolution code against QCD-Pegasus [123] and corrected the implementation of one of the heavy-quark OMEs.
For the hard scattering cross sections of the various processes, fast fitting methods like fastNLO [213,214] and APPLGrid [181] have been developed. In addition, some groups have also published open-source code for the theory predictions of all physical cross sections employed in their analyses. The ABM11 and ABM12 fits [2,64] use OPENQCDRAD [215] code, which is publicly available. The HERAPDF2.0 fit [4] relies on the QCD fit platform xFitter (formerly known as HERAFitter) [166,167], which is an open-source package that provides a framework for the determination of PDFs and enables the choice of theoretical options for obtaining PDF-dependent cross section predictions. In particular, xFitter allows for a choice of different available schemes for treatment of heavy quarks in DIS. In Mellin N -space, an efficient method exists [216,217] which improves on that by [218] and which has been widely used in analyses, e.g. [217]. However, no code has been made publicly available so far.
Given the increasing precision of PDF analyses, which is driven by the accuracy of the experimental data, there is ongoing demand to provide theoretical predictions that are as precise as possible. This has stimulated recent checks of the analysis software used by various groups and has resulted in a number of documented improvements. The list includes, for example, the corrections to the different parts of the DIS cross section calculations in the NNPDF2.1, MSTW and CT10 PDF analyses as mentioned in the discussion of the PDFs for strange sea above.
This illustrates that there is a continued need for benchmarking the hard scattering cross sections of relevance for PDF determinations in order to consolidate the accuracy of theory predictions for those observables. In this respect, open-source software may facilitate future theory improvements and may help to establish standards for precision theory predictions.

Strong coupling constant
The value of the strong coupling constant α s (M Z ) has a direct impact on the size of a number of cross sections at the LHC, such as Higgs boson production, see Sect. 5, and is therefore an important parameter. Due to QCD factorization, α s exhibits a significant correlation with the gluon PDF and also with the charm-quark mass, as documented in the published correlation matrices, see for instance [64]. Therefore, the strong coupling constant has come to require particular attention in the context of global PDF analyses.
Current precision determinations of α s (M Z ) require NNLO accuracy in QCD because of the small uncertainties in the experimental data analyzed and the significantly reduced dependence from the variation of the renormalization scale indicating the uncertainty due to the truncation of the perturbative series. Extractions of α s at NLO typically yield α s (M Z ) 0.118, however, the NLO scale uncertainty is large, giving sizable variations α s (M Z ) = 0.005 for μ r ∈ [Q/2, 2Q] in DIS analyses. Determinations of α s to NNLO accuracy benefit from a significantly reduced renormalization scale dependence, but generally result in smaller central values for α s (M Z ), with shifts downwards from NLO to NNLO of a few percent in DIS analyses. Beyond NNLO, the perturbative expansion converges, as illustrated in DIS in a valence analysis [58] at N 3 LO which yields α s (M Z ) = 0.1141 + 0.0020 − 0.0022 , in agreement with the NNLO values listed in Table 7.
Of course, measurements of α s (M Z ) are not limited to global fits of PDFs, but stem from a large number of different processes and methods at different scales, see, e.g., [219][220][221] for discussions and comparisons. Here we restrict ourselves to issues of α s arising in PDF fits. In Table 6 we give an overview of the α s values currently used in the PDF analyses. There, two aspects are important. Firstly, some PDF analyses leave α s as a free parameter in their fits, which obviously allows one to control its correlation with other PDF parameters and avoids potential biases. Secondly   While the potential agreement or disagreement with the PDG average is beyond the scope of this study, it is instructive to focus on α s (M Z ) measurements from PDF analyses as listed in Table 7 which have been performed since the NNLO QCD corrections in DIS first became available. This series of measurements has led to α s (M Z ) values which are not only mostly lower than the PDG average, but also exhibit a large spread in the range α s (M Z ) = 0.1120 − 0.1175. This spread is significant given the small size of the experimental uncertainties in the data. As it turns out, the differences in the values of α s (M Z ) can be traced back to different data sets used or to different theory assumptions applied, as indicated in Table 7.
For instance, the inclusion of data for the hadro-production of jets, e.g., from the LHC, does have an impact on the value of α s (M Z ) and can therefore provide valuable constraints. However, it is important to note that the perturbative QCD corrections to the hard scattering cross section are only known completely to NLO, while the exact NNLO result for the gg channel [15] and approximations based on soft gluon enhancement [237][238][239][240] indicate corrections as large as 15-20 %. Those corrections and their magnitude depend, of course, on the details of the kinematics, the choice of the scale and on the jet parameters (e.g., jet radius R). For  [236] high p T they are dominated by threshold logarithms ln( p T ) accompanied by logarithms ln(R) for small jet radii [240]. The α s (M Z ) values in PDF analyses currently determined with the help of jet data (cf. Table 7) are, strictly speaking, valid to NLO accuracy only and therefore subject to signifi- Table 8 The jet data sets and the theory approximations used in the NNLO PDF fits. The threshold corrections of Ref. [237] neglect the dependence on the jet radius R. Ref. [238] has determined the regime of validity ("safety cuts") of the threshold approximation of Ref. [240] by comparing to the exact NNLO result for the gg channel [15] PDF set Theory accuracy Data sets used CT14 [3] NLO with the scale set to the individual p T of jet Tevatron + LHC MMHT14 [172] NLO+O(α 4 s ) approx threshold corrections [237] Tevatron NNPDF3.0 [7] NLO+O(α 4 s ) approx threshold corrections [238] Tevatron + LHC ("safety cuts") cantly larger theory uncertainties due to the variation of the renormalization scale. The various groups employ different approaches in their NNLO analyses to cope with this inconsistency, such as using dynamical scales or applying some variant of threshold corrections, as detailed in Table 8. As a result of these efforts, the gluon PDF and α s obtained, for example, in the MMHT14 and NNPDF3.0 analyses are in a good agreement. Different modeling of important theory aspects, such as whether or not to include target mass corrections, higher twist contributions and nuclear corrections in the description of DIS data, or whether or not to use a VFNS in the description of DIS heavy-quark data, can account for the range of α s (M Z ) values in Table 7. With largely similar model assumptions, NNPDF2.1 [234,235], MSTW [231] and MMHT [236] obtained the range α s (M Z ) = 0.1171 − 0.1174. All these choices can lead to systematic shifts of the value of α s (M Z ). Let us briefly mention some of the issues in detail.
Higher twist contributions do have a big impact, because these terms are fitted within a combined analysis. Alternatively, the part of the DIS data significantly affected by these terms has to be removed by suitable kinematical cuts on the scale Q 2 and center-of-mass energies W 2 . In a variant of the ABM11 analysis [64], higher twist terms have been omitted and the cuts W 2 > 12.5 GeV 2 and Q 2 > 2.5 GeV 2 as used by MSTW [231] have been applied. This resulted in a sizable shift upwards to α s (M 2 Z ) = 0.1191 ± 0.0016 in line with earlier studies in [241]. Yet more conservative cuts of W 2 > 12.5 GeV 2 and Q 2 > 10 GeV 2 in the ABM11 variant with higher twist terms set to zero led to α s (M 2 Z ) = 0.1134 ± 0.0008, well in agreement with the nominal value in the ABM11 analysis, cf. Table 7. Thus, in PDF analyses without account of higher twist contributions to DIS data such tight cuts are essential. In this regard we disagree with Refs. [67,232,243] which claim higher twist effects to be negligible in the framework of MSTW [197] and NNPDF2.3 [250]. We also note that NNPDF3.0 [7] uses a cut of Q 2 > 3.5 GeV 2 which is too low to remove the higher twist contributions.
Higher order constraints from fixed-target DIS data can also lead to shifts in α s (M Z ) [59]. For instance, NMC has measured the DIS differential cross sections and extracted the DIS structure functions F NMC 2 [242]. At the time of the NMC analysis, however, the relevant DIS corrections to O(α 3 s ) [57] were not available (see discussion after Eq. (5) above). This information is, however, important and has to be taken into account now. In case of fitting F NMC 2 and not describing F L (x, Q 2 ) at NNLO, much larger values of α s (M 2 Z ) are obtained [67]. It is therefore strongly recommended to fit the published differential scattering cross sections using F L (x, Q 2 ) at O(α 3 s ). Presently, the MMHT [236] analysis uses F L (x, Q 2 ) only at NLO. One should note, however, that the values of F L (x, Q 2 ) at NNLO are significantly different in the small-x region (see [67]).
Finally, great care needs to be exercised when DIS data off nuclei are included in global fits, see Sect. 3. Details of modeling of nuclear corrections can in fact also cause systematic shifts in the value of α s (M Z ). Therefore, Table 7 indicates if scattering data on heavy nuclei have been included in the determination. For example, MMHT [236] has reported a comparatively high value of α s (M Z ) as a consequence of fitting the NuTeV νFe DIS data [185]. In general, determinations of α s (M Z ) should be based upon, or at least crosschecked with, fits using proton and deuteron DIS data only.

Higgs boson production
The dominant production mechanism for the SM Higgs boson at the LHC is the gluon-gluon fusion process. The large size of the QCD radiative corrections to the inclusive cross section at NLO, see, e.g. Ref. [244], together with the sizable scale uncertainty have motivated systematic theory improvements. In the effective theory based on the limit of a large top-quark mass (m t → ∞, integrating out the topquark loop, but using the full m t dependence in the Born cross section), this has led to the computation of the corresponding corrections at NNLO [245][246][247] and even to N 3 LO in QCD [106,248]. This shows an apparent, if slow, convergence of the perturbative expansion, along with greatly reduced sensitivity to the choice for the renormalization and factorization scales μ r and μ f . At N 3 LO the total scale variation amounts to 3 % and estimates of the four-loop corrections support these findings [249].
This leaves, as the largest remaining source of uncertainties in the predictions of the physical cross section, the input for the strong coupling constant α s and the PDFs. Despite the impressive progress in theory and experiment, the situation resembles that after the completion of the NLO QCD corrections, when it was pointed out in Ref. [244] that one of the main residual uncertainties in the predictions was due to the gluon PDF.
In Table 9 we summarize the PDF dependence of the inclusive cross section σ (H ) NNLO  118. This is done to illustrate the fact that in some PDFs the value of α s (M Z ) is not obtained from a fit to data (including faithful uncertainties) but fixed beforehand, e.g., to the world average [55]. Often the same fixed value of α s (M Z ) is chosen at NLO and at NNLO independent of the order of perturbation theory, see also Sect. 4. Table 9 shows a large spread for predictions from different PDFs with a range σ (H ) NNLO = 38.0 − 42.6 pb using the nominal value of α s (M Z ). Specifically, the PDF and α s differences between different sets are up to 11 % and are significantly larger than the residual scale uncertainty due to N 3 LO QCD corrections. In addition, the cross sections shift in the range σ (H ) NNLO = 39.0 − 44.7 pb if a fixed value of α s (M Z ) in the range α s (M Z ) = 0.115 − 0.118 is used. This amounts to a relative difference of more than 13 % and contradicts the most recent estimates of the combined PDF and α s uncertainties in the inclusive cross section [106], which quote 3.2 %.
In general, the findings underpin the importance of controlling the accuracy and the correlation of the strong coupling constant with the PDF parameters in fits.
Of particular interest is the impact of additional parameters in the PDF fits, such as the charm-quark mass, on the Higgs cross section. The differences in the treatment of heavy quarks and the consequences for the quality of the description of charm-quark DIS data have already been discussed in Sect. 3. ABM12 [2] fits the value of m c (m c ) in the MS scheme and the uncertainties in the charm-quark mass are included in the uncertainties quoted in Table 9. Other groups keep a fixed value of the charm-quark mass in the onshell scheme, cf. Table 4, and vary the value of m pole c within some range. Such studies have been performed in the past by NNPDF2.1 [170] and MSTW [171] and more recently by MMHT [172].
In Tables 10, 11 and 12 we display the results of these fits together with the values of χ 2 /NDP for the DIS charmquark data [165], mostly computed with xFitter [166, 167], as well as the corresponding cross section for Higgs boson production to NNLO accuracy. The MSTW analysis in Table 10 Table 11 the Table 10 The values of the charm-quark mass (on-shell scheme m pole ) and the strong coupling α s (M Z ) in the MSTW analysis [171] using the set MSTW2008nnlo_mcrange together with the value for χ 2 /NDP for the HERA data [165] Table 11 Same as Table 10 for the MMHT14 analysis [172] using the set MMHT2014nnlo_mcrange_nf5 and setting α s (M Z ) to the best fit value. The numbers of Ref. [182] keep full account of the correlation between the PDFs and α s . The values of χ 2 /NDP for the HERA data [165] are those quoted in [172] for   NNPDF has performed a study of the m c dependence in [170], which shows the same trend as for MSTW and MMHT, i.e., the smaller the chosen value of m pole c , the bet-ter the goodness-of-fit for the HERA data [165]. In addition, Table 12 Tables 10, 11 and 12 for the charm-quark data from HERA [165], are not compatible with the world average of the PDG [55]. Thus, in some PDF fits, the numerical value of the charm-quark mass effectively takes over the role of a "tuning" parameter for the Higgs cross section. Note that the three analyses are based on partly different data sets, theory and methodology.

Top-quark hadro-production: inclusive cross section
The cross section for the hadro-production of top-quark pairs has been measured with unprecedented accuracy at the LHC in Run 1 with √ s = 7 TeV and 8 TeV. The inclusive cross section is known to NNLO in QCD [251][252][253][254], featuring good convergence of the perturbation series and reduced sensitivity to the renormalization and factorization scales μ r and μ f . These theory predictions adopt the on-shell renormalization scheme for the heavy-quark mass. The conversion to the MS scheme for the heavy-quark mass has been discussed in Refs. [255][256][257]. For observables such as the inclusive cross section which are dominated by hard scales μ r μ f m t , the theory predictions in terms of the MS mass for the top quark show an even better scale stability and perturbative convergence.
In a similar study as for Higgs boson production in Table 9 we illustrate in Table 13 the PDF dependence of the inclusive cross section σ (tt) NNLO for various sets with uncertainties σ (PDF+α s ). The computation is performed in the theoretical framework as implemented in the HATHOR code [256]. In Table 13 Table 13 display a spread in a range σ (tt) NNLO = 715 − 834 pb using the nominal value of α s (M Z ) for each PDF set, which amounts to a relative range of more than 15 %. This decreases to about 6 %, if the values of α s (M Z ) are fixed to 0.115 or 0.118.
The theoretical predictions at leading order depend parametrically on the strong coupling constant and the top-quark mass to second power, as well as on the convolution of the gluon PDFs, σ (tt) LO ∝ (α 2 s /m 2 t ) (g ⊗ g). Therefore, it is necessary to fully account for the correlations between the top-quark mass, the gluon PDF and the strong coupling when comparing to experimental data. A number of analyses have considered tt hadro-production data. ABM12 [2] has included data for top-quark pair-production in a variant of the fit to determine the MS mass m t (m t ), keeping the full correlation with α s (M Z ) and the gluon PDF. On the other hand, CMS has determined the top-quark pole mass as well as the strong coupling constant in a fit which kept all other parameters mutually fixed [258], while Ref. [259] has explored constraints on the gluon PDF from tt hadroproduction data using fixed values for α s (M Z ) and the pole mass m pole t . In the global analyses by MMHT14 [6] and NNPDF3.0 [7] those data were also used to fit α s (M Z ) and the gluon PDF. These analyses employ a fixed value for the pole mass m pole t , which is motivated by precisely measured top-quark masses from kinematic reconstructions, i.e., Monte Carlo masses, but does not account for the above mentioned correlation with α s (M Z ) and the gluon PDF. Moreover, the Monte Carlo mass requires additional calibration [260].
For the inclusive top-quark cross section we explore in Tables 14 and 15 the implicit dependence of the cross section on the charm-quark mass m c used in the GM-VFNS of the PDF fits and list the corresponding values of χ 2 /NDP for the DIS charm-quark data [165]. This is analogous to the study for the Higgs cross section in Tables 11 and 12. For MMHT [172] the best fit with m pole c = 1.25 GeV and α s (M Z ) = 0.1167 leads to an inclusive cross section of σ (tt) NNLO = 814 pb, which is 2 % lower than the value obtained for the nominal MMHT fit, cf. Table 13. Likewise, the changes in the NNPDF fits from v2.1 [170] and v2.3 [250] to v3.0 [7] are documented in Table 15. The effects amount to almost 2 % when comparing σ (tt) NNLO Tables 14 and  15 there is a correlation showing decreasing cross sections with decreasing values of m pole c , although less pronounced than in the case of the Higgs production cross section. The potential bias in the prediction of the inclusive top-quark pair production cross section due to a particular "tuning" of the value of the charm-quark mass for some PDFs is, however, of the same order of magnitude or larger than the quoted PDF uncertainties. Therefore, this needs to be accounted for as an additional modeling uncertainty.

Top-quark hadro-production: differential distributions
The differential cross section of the top-quark pair production is also known to NNLO in QCD [261]. Publicly avail- able codes such as Difftop [262] provide differential distributions to approximate NNLO accuracy based on softgluon threshold resummation results. We use Difftop to calculate the distribution in the top-quark rapidity y t for proton-proton collisions at √ s = 13 TeV at NNLO approx accuracy using the ABM12, CT14, MMHT14, NNPDF3.0, and the PDF4LHC15 PDF sets at NNLO with their respective α s values. Here, we take the top-quark pole mass to be m pole t = 172.5 GeV, following the preferences in the LHC analyses. The renormalization and factorization scales are set to m pole t and the choice of a dynamical scale does not change the following discussions. By using differential cross sections, not only the sensitivity of top-quark pair production to the PDFs can be estimated, but also possible effects on the experimental acceptance by changing the PDF choice. In the experimental analysis, the PDF dependent acceptance corrections arise mostly from the PDF dependent normalization of the production cross section and originate from the phase space regions uncovered by the detector. Usually, the acceptances are determined by using Monte Carlo simulations as a ratio of the number of reconstructed events in the fiducial volume of the detector (visible phase space) to the number of events generated in the full phase space. In the case of top-quark pair production, the visible (full) phase space would correspond to the top-quark rapidity range of |y t | < 2.5 (|y t | < 3). Here, an acceptance estimator and a related extrapolation factor are calculated by using Difftop predictions for the respective cross section ratios σ vis /σ tot and σ unmeasured /σ tot . Such estimators are not expected to describe the true experimental efficiency, but are helpful for drawing conclusions about PDF related effects.
The predictions of the top-quark rapidity and the acceptance estimates obtained by using Difftop with different PDFs are shown in Fig. 15. The largest difference in the global normalization of the predicted cross sections is observed if the ABM12 PDFs are used instead of the CT14, NNPDF3.0 or MMHT14 sets. The origin of this effect is again the smaller nominal value of α s in ABM12 in combination with a smaller gluon PDF in the x range relevant to top-quark pair production at √ s = 13 TeV. The corresponding acceptance estimators and their uncertainties, obtained from the error propagation of the corresponding PDF uncertainties at 68 % c.l., however, demonstrate significant differences also in the expected acceptance corrections, obtained by using ABM12 alternative to other PDFs.
The recent PDF4LHC recommendation [8] for calculation of the acceptance corrections for precision observables, such as the top-quark pair-production cross section in the LHC Run 2 data taking period, is to use the set PDF4LHC15_100, which is obtained by averaging the CT14, MMHT14 and NNPDF3.0 PDFs. While the central prediction obtained by using PDF4LHC15 is indeed very close to those obtained with the CT14, MMHT14 or NNPDF3.0 PDFs, the error on the corresponding acceptance estimator somewhat underestimates the acceptance spread of the individual PDFs with their uncertainties. Furthermore, it does not cover the difference in the acceptances to the one using the ABM12 PDF. Therefore, for the conservative estimate of the acceptance correction and its uncertainty, as demanded in the measurement of SM precision observables, the use of the PDF4LHC15_100 set would lead to a significant underestimation of the uncertainty on the resulting cross section measurement. A further conclusion from Fig. 15 is that in the case of top-quark pair production, once calculational speed is needed, it seems to be sufficient to consider a reduced choice of PDF sets. For instance, instead of using the averaged set PDF4LHC15_100 one can take just one of the three PDFs, CT14, MMHT14 or NNPDF3.0. Alternative PDF choices can then always be studied to some approximation with a reweighting method. In spite of the valiant effort in Ref. [8] to provide a uniform solution, the PDF choice for measurements of precision observables must be decided on a case-by-case basis for each particular process.

Bottom-quark hadro-production
Bottom-quark production in proton-proton collisions at the LHC is also dominated by the gluon-gluon fusion process. Therefore, the LHCb measurements of B-meson production in the forward region [263] with rapidities 2.0 < y < 4.5 at √ s = 7 TeV probe the gluon distributions simultaneously at small x up to x ∼ 2 × 10 −5 and at large x 1. The small-x region is not accessible with HERA DIS data, for example. The potential improvements of PDFs near the edges of the currently covered kinematical region, namely, at small x and low scales, was first illustrated in [264,265] using differential LHCb data on hadro-production of cc and bb pairs.
In the present comparison in Table 16, the normalized cross sections, (dσ/dy)/(dσ/dy 0 ), for bottom-quark production are calculated from the absolute measurements published by LHCb, with dσ/dy 0 being the cross section in the central bin, 3 < y 0 < 3.5, of the measured rapidity range in each p T bin [264]. In the absence of NNLO QCD corrections, the theoretical predictions are obtained at NLO in QCD [266][267][268] using a fixed number of flavors, n f = 3, for the hard scattering cross sections. Since data for the hadroproduction of heavy quarks other than top have not been considered for publicly available PDF fits thus far, issues of any model dependence such as in [158] due to the use of GM-VFNS cannot be quantified. In the calculation of the normalized cross sections, the theoretical uncertainty is strongly reduced, since variations of the renormalization and factorization scales as well as of the fragmentation parameters do not significantly affect the shape of the y distributions for heavy-flavor production, while this shape remains sensitive to PDFs. The values for χ 2 /NDP given in Table 16 are computed with the QCD fit platform xFitter for the individual PDF sets obtained at NLO, namely, ABM11 [64], CJ15 [1], CT14 [3], HERAPDF2.0 [4], JR14 [5], MMHT14 [6], NNPDF3.0 [7], as well as the averaged set PDF4LHC15 [8]. All PDFs provide a good description of the data, despite the fact that none of the groups use any data sensitive to the gluons at very low x, in the region directly probed by the LHCb B-meson measurement. Remarkably, one finds that χ 2 /NDP < 1 for the vast majority of the groups (left column in Table 16), suggesting that the derived PDF uncertainties at the edges of the so far measured regions might be inflated.

Charm-quark hadro-production
Charm-quark hadro-production offers another possibility to illustrate the consistency of the theory predictions for the various PDF sets. The exclusive production of charmed mesons in the forward region at LHCb probes the gluon distribution down to small-x values of x ∼ 5 × 10 −6 at √ s = 7 TeV, and data can be confronted with QCD predictions at NLO accuracy, see, e.g., [269,270].
For the inclusive cross section of the reaction pp → cc the QCD predictions are known up to NNLO in the MS scheme for the charm-quark mass and display good convergence of the perturbative expansion and stability under variation of the renormalization and factorization scales [269]. In Figs. 16 √ s to available experimental data. These data span a large range in √ s, which starts with fixed target experiments at energies up to √ s = 50 GeV summarized in [271] and HERA-B data [272] (purple points in Figs. 16, 17). At higher energies RHIC data from PHENIX and STAR [273,274] [275], ATLAS [276] and LHCb [277], and at the highest available energy √ s = 13 TeV from LHCb [278] (blue points in Figs. 16, 17). The total cross sections of LHCb have been obtained from charmed hadron production measurements in a limited phase space region [277,278] using extrapolations based on NLO QCD predictions matched with parton shower Monte Carlo generators.
The theory predictions for the PDF sets ABM12, CJ15, CT14 and JR14 at NLO and NNLO are shown in Fig. 16, together with the respective PDF uncertainties. For all these PDF sets the perturbative expansion is stable, the theory computations agree well with the data and predictions, e.g., for a future collider with √ s 100 TeV, yield positive cross sections. The PDF uncertainties obtained for CT14, however, do increase significantly above energies of √ s 1 TeV. The same information for the sets HERAPDF2.0, MMHT14, NNPDF3.0 and PDF4LHC15 is displayed in Fig. 17. These predictions all agree with data at low energies but start to behave very differently for HERA-PDF2.0, MMHT14 or NNPDF3.0 at energies above √ s O(10) TeV and for PDF4LHC15 above √ s O(100) TeV. At the same time, the associated PDF uncertainties in this region of phase space become very large, thereby limiting the predictive power. Typically, the PDF uncertainties of the NNLO sets are even larger than at NLO. In the case of MMHT14 the consistency of the NNLO predictions with LHC data from ALICE [275], ATLAS [276] and LHCb [277,278]  This results in an instability of the perturbative expansion of the σ pp→cc cross section at high energies when the contribution from the quark-gluon channel dominates. The reason for a negative gluon PDF in the NNLO set of PDF4LHC15 (being some average of the CT14, MMHT14 and NNPDF3.0 sets) is unclear. In contrast, other PDFs shown in Fig. 16 demonstrate stability of the perturbative expansion through NNLO up to very high energies and good consistency of the predictions with the experimental data.

W /Z production
Cross sections sensitive to large-x parton distributions typically fall rapidly with increasing x values, leading to limitations in the quantity and precision of experimental data and the kinematic range over which they can be obtained. Consequently, the precision to which one can constrain largex PDFs decreases with x, and systematic uncertainties due to extrapolations into unmeasured regions of x (or those excluded by cuts) increase. Similarly, the theoretical uncertainties due to various approximations in the treatment of nuclear corrections for deuterium data, or target mass and higher twist effects, also become larger.
To illustrate this, consider the production of a heavy W boson as a function of the W rapidity y W [279]. Assum-ing Standard Model couplings, the parton luminosity for a produced negatively charged W − boson is given by where G F is the Fermi constant and θ C the Cabibbo angle. The uncertainty δL W − in the luminosity is shown in Fig. 18 for various PDF sets as a function of y W , for several fixed values of the boson mass from the SM W up to M W = 7 TeV. Note that as the rapidity or mass of the produced boson increases, so does the momentum fraction x 1,2 = (M W / √ s) e ±y W of one or both partons, in which case the luminosity behaves as L − ∼ū(x 2 )d(x 1 ). Except for the highest M W values, the PDF uncertainty typically remains small up to large values of y W , corresponding to x 1 ≈ 0.65, beyond which it rises dramatically for all M W . This is precisely the region where data constraining the d-quark PDF are scarce, and theoretical assumptions play an important role [1]. This is particularly pronounced for fits that exclude DIS data at low invariant masses, such as the three fits included in the PDF4LHC combination [8]. For large W masses, theū PDF is evaluated at x 2 ∼ 0.2 − 0.5, where data are either nonexistent or have large errors, giving rise to the increased uncertainties in some of the PDF sets at y W ∼ 0. The relative uncertainties in the luminosities in Fig. 18 have been scaled to a common 68 % c.l., as in the tables in the previous sections. One observes a very large range of uncertainties for the various PDF sets, which stems from different tolerance criteria used and different methodologies employed for the treatment of data at high values of x. The smallest uncertainty is obtained for the CJ15 PDF set, which makes use of low invariant mass data to constrain the high-x region, and does not employ additional tolerance factors inflating the uncertainties. The MMHT and CT14 PDF sets have larger errors, due to stronger cuts on low-mass DIS data and larger tolerances, and consequently the averaged PDF4LHC15 set gives similarly large uncertainties.
This example illustrates the problematic nature of statistically combining PDF sets that have been determined using very different theoretical treatments of the high-x region, leading to an overestimate of the uncertainties at these kinematics. Using the PDF4LHC15 set as the sole basis for background estimates, for example, one could potentially miss signals of new physics in regions such as at high rapidity y W . A more meaningful PDF uncertainty would be obtained when combining PDF sets obtained under similar conditions and inputs; if large differences are found, these should be investigated further rather than simply averaged over. This is also illustrated in Fig. 19, where the central values for the W − luminosity for several PDF sets are compared relative to the luminosity computed from the central PDF4LHC15 distributions. The different theoretical assumptions utilized in the fits produce systematic differences in the large-x PDFs, which give rise to ratios of central values that are of the same order as the overall PDF4LHC15 68 % c.l. The fact that the uncertainty bands of the individual sets overlap with that of the PDF4LHC15 set is not, however, an indication that the latter is a good estimate of the PDF uncertainties in this extrapolation region. Rather, the PDF4LHC15 band effectively represents a statistical envelope of the systematic theoretical differences between the sets included in the combination. A comparison with the luminosity computed using the CJ15 PDF set, which is not included in the PDF4LHC15 combination, is instructive in this respect. The two main theoretical assumptions affecting the W − luminosity are the nuclear corrections in deuterium (applied or fitted in the CJ15 and MMHT14 analyses, as well as in JR14 and ABM12), and the parametrization of the d-quark PDF.
For the latter, the traditional choice has been to assume a behavior ∝ (1 − x) β as x → 1 for both the d-and u-quark PDFs (as, e.g., in the MMHT14 and NNPDF3.0 analyses), in which case the d/u ratio either vanishes or becomes infinite in the x → 1 limit depending on whether the exponent β is larger for d or u. Alternatively, including an additive term in the d-quark PDF proportional to u(x) (as in CJ15) or constraining β u = β d (as in CT14) allows the d/u ratio to reach a finite, nonzero limiting value at x → 1. Furthermore, the CJ15 distributions were also fitted to low invariant mass (3.5 GeV 2 < W 2 < 12.5 GeV 2 ) DIS data, which were excluded by kinematic cuts in the MMHT14, CT14 and NNPDF3.0 analyses. Consequently, the following features can be observed in Fig. 19: • The MMHT14 curve follows CJ15 closely until y W ≈ 1(x ≈ 0.65), after which the d-quark PDF turns upwards relative to CJ15, in the region not constrained by the large-x and low-W 2 SLAC data utilized in CJ15. • The CT14 curve is lower than CJ15 at y W 0.6 (x 0.45), and higher at larger y W , because of the neglect of nuclear corrections. At y W > 1 the d-quark PDF is essentially unconstrained since neither the low-W 2 SLAC data nor the reconstructed Tevatron W -boson production data are included in the fit.
• The NNPDF3.0 fit, which excludes low-W 2 DIS data and does not utilize nuclear or hadronic corrections, consistently deviates from all others. It is, however, compatible with those within its own uncertainties, which at large x are about four times larger than that of the other fits.
In summary, in extreme kinematic regions, such as at large rapidity or for large-mass observables, caution must be exercised when utilizing PDF error bands and nominal confidence levels provided by the various PDF groups for precision calculations and statistically meaningful comparisons to data. Utilizing the PDF4LHC15 band at face value likely overestimates the current uncertainty on large-x PDFs, and could lead to signals of new physics being missed. Calculations performed with the combination set should always be crosschecked with as many individual PDF sets as possible, taking into account the amount and kind of data included in each fit, as well as the different theoretical inputs. The latter explore different physics issues and can vary considerably from one PDF set to another. When differences arise, further scrutiny of the PDF fit results themselves may be needed before drawing any definitive conclusions.

Recommendations for PDF usage
Recommendations for the usage of PDFs generally aim in providing guidance for estimates of the magnitude and the uncertainties of cross sections in a reliable but also efficient way. First recommendations have been provided by the PDF4LHC Working Group in the Interim Recommendations [280]. There, the MSTW [197] PDF was used as a central set for predictions at NNLO in QCD and the procedure for calculation of the PDF uncertainties, based on an envelope of several PDF sets, was proposed. This approach has been criticized for being impractical. The 2015 PDF4LHC recommendations [8] have evolved from related discussions and aim in improving the efficiency of cross section computations by averaging several PDFs along with their respective uncertainties. Here, we briefly recall these suggestions and put them into context of the findings of the previous sections. We comment on several shortcomings of the recommendations [8] and propose an alternative for the PDF usage at the LHC. For the case (i), the recommendation is to use the individual PDF sets ABM12 [2], CJ12 [191], CT14 [3], JR14 [5], HERAPDF2.0 [4], MMHT14 [6], and NNPDF3.0 [7]. It is not clear, why the full account of the PDF dependence should be limited to SM processes only. Deviations observed in the theory predictions obtained with the various PDFs can often be traced back to the differences in the underlying theoretical assumptions and models in the PDF fits. With more LHC data available, tests of the compatibility of those data sets in the individual PDF fits will become more stringent. Studies to quantify the constraining power of processes like hadro-production of tt pairs, jets or W ± and Z bosons become possible at high precision.
For the case (ii), it is recommended to employ the PDF4LHC15 sets [8], which represent the combination of the CT14 [3], MMHT14 [6], and NNPDF3.0 [7]. The combination is performed using the Monte Carlo approach at different levels of precision, leading to the recommended sets PDF4LHC15_30 and PDF4LHC15_100. The restriction to CT14, MMHT14 and NNPDF3.0 implies a bias both for the central value and for the PDF uncertainties of BSM cross section predictions. For example, a bias is introduced by fixing the central value of α s (M Z ) to an agreed common value, currently chosen to be α s (M Z ) = 0.118 at both NLO and NNLO. This choice is in contradiction with the precision determinations of α s (M Z ) at different orders in perturbation theory, as summarized in Sect. 4. Further, for searches at the highest energies, the PDFs are probed close to the hadronic threshold near x 1, where nuclear corrections and other hadronic effects, considered for instance in the CJ15 [1] and JR14 [5] analyses, are important.
For the case (iii), the PDF4LHC15_30 sets are recommended to use. We would like to note, that here the balance between the computational speed and the precision of the result (in e.g. MC simulation) has to be determined by the analysers. The problem rises from the large deviations between data and theory predictions at low scales and also at the edges of the kinematical ranges of data currently used in PDF fits as illustrated in Sects. 3 and 5. The average of various GM-VFNS for heavy quark production, such as ACOT [159], FONLL [162] and RT [163], leaves a large degree of arbitrariness in the theory predictions, cf. Fig. 10. Note that the PDF4LHC15_30 sets were updated in December 2015 [281] to account for an extension of their validity range below the original Q > 8 GeV as only discussed in the later publication [282].
For the case (iv), the set PDF4LHC15_100 is recommended. Recalling that this case concerns measurements of the precision observables, it is unclear why PDFs should be treated differently than in the case (i). The differences between individual PDF sets propagate the cross section measurements directly through the acceptance corrections or extrapolation factors, as illustrated in Figs. 15, 17 and 19. Use of the PDF4LHC15_100 is worrysome, since these differences are smeared out in the combination, which, in addition, is limited to only three PDF sets. The SM parameters, determined using the precision observables obtained in this way, may be biased.
In summary, the recent PDF4LHC recommendations [8] cannot be viewed as definitive in the case of precision theory predictions, as the advocated averaging procedure introduces bias, artificially inflates the uncertainties, and makes it difficult to quantify potential discrepancies between the individual PDF sets.
6.2 New recommendations for the PDF usage at the LHC Based on the considerations above, we propose modifications to the recommendations for PDF usage at the LHC in order to retain the predictive capability of the individual PDF sets. Two cases can be distinguished: 1. Precise theory predictions, addressing a class of predictions, within or beyond the SM, which encompasses any type of cross section prediction including radiative corrections of any kind, whether at fixed-order or via resummation to some logarithmic accuracy. This class also includes the MC simulations used for the calculation of the acceptance corrections for precision observables, e.g. cross sections which might be used further for determination of SM parameters.
• Recommendation: Use the individual recent PDF sets, currently ABM12 [2], CJ15 [1], CT14 [3], JR14 [5], HERAPDF2.0 [4], MMHT14 [6], and NNPDF3.0 [7] (or as many as possible), together with the respective uncertainties for the chosen PDF set, the strong coupling α s (M Z ) and the heavy quark masses m c , m b and m t . Once a PDF set is updated, the most recent version should be used. • Rationale: Precise theory predictions as needed for any comparisons between theory and data for processes in the SM or beyond (such as hadro-production of jets, W ± -or Z -boson production, either singly or in pairs, heavy-quark hadro-production, or generally the production of new massive particles at the TeV scale) often depend on details of the PDF fits and the underlying theory assumptions and schemes used. Differences in the theory predictions based on the individual sets can give an indication of residual systematic uncertainties or shed light on drawbacks and need for potential improvements in the physics models used in the extraction of those PDFs. This applies in particular to measurements used for the determination of SM parameters such as the strong coupling α s (M Z ), heavy quark masses m c , m b and m t or the W -boson mass, because these parameters are directly correlated to the PDFs used in their extraction from the experimental observables.
2. Theory predictions for feasibility studies, the complementary class containing all other cross section predictions where high precision is not required, such as those based on Born approximations and/or order of magnitude estimates, or in cases where precision may be sacrificed in favor of computational speed. Here, also studies of novel accelerators and detectors are addressed.
• Recommendation: Use any of the recent PDF sets (listed in LHAPDFv6 or later versions). • Rationale: Often in phenomenological applications for the modern and future facilities one is interested in a quick order of magnitude estimate for the particular cross sections. These are directly proportional to the parton luminosity and to the value of α s (M Z ).
In these cases, one may be willing to sacrifice precision in favor of computational speed. Here, the usage of the sets PDF4LHC15_30 and PDF4LHC15_100 may provide an efficient estimate of PDF uncertainties, although care must be taken in their interpretation depending on the observable and covered kinematic range. Restricting the recommendation to PDFs listed in the LHAPDF(v6) [283] interface excludes parton luminosities with lesser precision in the interpolation of the underlying grids (e.g., in LHAPDF(v5) [284]) or "partonometers" [285] with outdated calibration.
In the Monte Carlo generators, for example, MadGraph5_aMC@NLO [286], POWHEG-BOX (v2) [287,288] and SHERPA (v2) [289,290], or other recently developed generators, like Geneva [291], different PDF sets can be efficiently studied with reweighting methods. This allows to generate weighted events for a given setup, and to reweight a-posteriori each event in a fast and efficient way, by generating new weights associated with different choices of renor-malization and factorization scales and/or PDFs. Please note, that at present, PDF reweighting is performed by assuming the linear PDF weight dependence, which is not correct, since PDFs are also present in the Sudakov form-factor. Efforts to extend the reweighting to the entire Sudakov form-factor and to the full parton shower are ongoing. The reweighting technique turns out to be particularly useful to compute in a fast (although at the moment approximate) way PDF uncertainties affecting the predictions.

Conclusion
In this report we have reviewed recent developments in the determination of PDFs in global QCD analyses. Thanks to high precision experimental measurements and continuous theoretical improvements, the parton content of the proton is generally well constrained and PDFs, along with the strong coupling constant α s (M Z ) and the heavy-quark masses m c , m b and m t , have been determined with good accuracy, at least at NNLO in QCD. This forms the foundation for precise cross section predictions at the LHC in Run 2.
We have briefly discussed the available data used in PDF extractions and the kinematic range covered, and emphasized the importance of selecting mutually consistent sets of data in PDF fits in order to achieve acceptable χ 2 values for the goodness-of-fit estimate. The main thrust of the study has been the computation of benchmark cross sections for a variety of processes at hadron colliders, including Higgs boson production in gluon-gluon fusion. We have illustrated how different choices for the theoretical description of the hard scattering process and choices of parameters have an impact on the predicted cross sections, and lead to systematic shifts that are often significantly larger than the associated PDF and α s (M Z ) uncertainties. A particular example has been the treatment of heavy quarks in DIS, where the quality of the various scheme choices has been quantified in terms of χ 2 /NDP values when comparing predicted cross sections to data. We have also pointed out the inconsistently low values for the pole mass of the charm quark used in some fits, and have stressed the correlation of the strong coupling constant α s (M Z ) with the PDF parameters. Ideally, α s (M Z ) should be determined simultaneously with the PDFs, and we have summarized here the state of the art in the context of PDF analyses.
Our findings expose a number of shortcomings in the recent PDF4LHC recommendations [8]. We have shown that these do not provide sufficient control over some theoretical uncertainties, and may therefore be problematic for precision predictions in Run 2 of the LHC. Instead, we suggest new recommendations for the usage of PDFs based on a theoretically consistent procedure necessary to meet the precision requirements of the LHC era.