Study of Monte Carlo approach to experimental uncertainty propagation with MSTW 2008 PDFs

We investigate the Monte Carlo approach to propagation of experimental uncertainties within the context of the established"MSTW 2008"global analysis of parton distribution functions (PDFs) of the proton at next-to-leading order in the strong coupling. We show that the Monte Carlo approach using replicas of the original data gives PDF uncertainties in good agreement with the usual Hessian approach using the standard Delta(chi^2) = 1 criterion, then we explore potential parameterisation bias by increasing the number of free parameters, concluding that any parameterisation bias is likely to be small, with the exception of the valence-quark distributions at low momentum fractions x. We motivate the need for a larger tolerance, Delta(chi^2)>1, by making fits to restricted data sets and idealised consistent or inconsistent pseudodata. Instead of using data replicas, we alternatively produce PDF sets randomly distributed according to the covariance matrix of fit parameters including appropriate tolerance values, then we demonstrate a simpler method to produce an arbitrary number of random predictions on-the-fly from the existing eigenvector PDF sets. Finally, as a simple example application, we use Bayesian reweighting to study the effect of recent LHC data on the lepton charge asymmetry from W boson decays.


Introduction
The parton distribution functions (PDFs) of the proton are best determined from global analysis of a wide variety of deep-inelastic scattering (DIS) and related hard-scattering data taken from both fixed-target experiments and colliders (HERA, the Tevatron, and most recently the LHC). Propagation of the experimental errors on the fitted data points to the uncertainties on the PDFs is a non-trivial task. The traditional Hessian method requires effective error inflation by a tolerance parameter to accommodate minor inconsistencies between the fitted data sets. This means that the PDF uncertainties cannot be considered to be statistically rigorous, despite the rôle of PDF uncertainties as an important (and sometimes dominant) source of theoretical uncertainty on predicted quantities, such as the cross sections for Drell-Yan processes or Higgs boson production at the Tevatron and LHC [1,2]. Moreover, the number of fitted parameters for error propagation in the Hessian method must be kept sufficiently small to avoid large correlations, often requiring several parameters to be held fixed and thereby introducing a potential parameterisation bias. Some insight into these problems may be gained using Monte Carlo techniques [3,4], recently used in conjunction with a neural-network parameterisation by the NNPDF Collaboration ( [5], and references therein), where a large number N rep ∼ O(10-1000) of fits are performed, each to a sample of replica pseudodata generated by shifting the original data points by random amounts dependent on the data errors. Then the PDF uncertainties can be calculated by simply taking the standard deviation of the resulting N rep PDF sets.
In this paper we make a first study of the Monte Carlo approach to experimental error propagation within the context of the established "MSTW 2008" PDF determination [6]. We retain the usual functional-form parameterisation and least-squares χ 2 -minimisation (using the Levenberg-Marquardt algorithm) rather than moving to the neural-network parameterisation and genetic-algorithm χ 2 -minimisation of the NNPDF approach [5]. We focus on the most widely-used PDF determination at next-to-leading order (NLO) in the strong coupling α S , although the results would be expected to be similar at leading-order (LO) and at next-to-next-to-leading order (NNLO). Moreover, to avoid complications associated with simultaneously fitting α S with the PDFs, throughout this paper we keep the value of α S (M 2 Z ) held fixed at the MSTW 2008 NLO best-fit value. First in section 2 we describe the Monte Carlo approach using data replicas and compare results to the usual Hessian method, then in section 3 we explore potential parameterisation bias by increasing the number of free parameters. We then motivate the need for a tolerance parameter by fitting restricted data sets in section 4 and by fitting idealised pseudodata in section 5. In section 6 we explain how to produce PDF sets randomly distributed in the space of parameters rather than in the space of data, which allows the inclusion of a suitable tolerance. As an example application of these random PDFs, in section 7 we demonstrate the use of Bayesian reweighting to study the effect of recent LHC data on the W → ℓν charge asymmetry [7,8]. Finally, we conclude in section 8.

Recap of the Hessian method
The basic procedure for propagating experimental uncertainties in global PDF analyses using the Hessian method is discussed in detail in refs. [6,[9][10][11]. Here, we briefly review the salient points. We assume that the global goodness-of-fit quantity, χ 2 global , is quadratic about the global minimum, which has n best-fit parameters {a 0 1 , . . . , a 0 n }. In this case we can write H ij (a i − a 0 i )(a j − a 0 j ), (2.1) where the Hessian matrix H has components It is convenient to diagonalise the covariance (inverse Hessian) matrix, C ≡ H −1 , also known as the error matrix, and work in terms of the eigenvectors and eigenvalues. Since the covariance matrix is symmetric it has a set of orthonormal eigenvectors v k defined by where λ k is the kth eigenvalue and v ik is the ith component of the kth orthonormal eigenvector (k = 1, . . . , n). The parameter displacements from the global minimum can be expanded in a basis of rescaled eigenvectors e ik ≡ √ λ k v ik , that is, e ik z k . (2.4) Then it can be shown, using the orthonormality of v k , that eq. (2.1) reduces to that is, n k=1 z 2 k ≤ T 2 is the interior of a hypersphere of radius T . Pairs of eigenvector PDF sets S ± k can then be produced to span this hypersphere, with parameters given by a i (S ± k ) = a 0 i ± t e ik .
(2. 6) In the quadratic approximation, t = T ≡ (∆χ 2 global ) 1/2 , but particularly for the larger eigenvalues λ k there are significant deviations from the ideal quadratic behaviour, so in general t is adjusted iteratively to give the desired value of T . Then asymmetric PDF uncertainties on a quantity F , which may be an individual PDF at particular values of x and Q 2 , or a derived quantity such as a cross section, can be calculated with the following "master equations": where S 0 is the central PDF set. Symmetric PDF uncertainties can be calculated with (2.9) Ideally, with the standard "parameter-fitting" criterion [12], we would expect the errors to be given by the choice of tolerance T = 1 for the 68% (one-sigma) confidence-level (C.L.) limit or T = 1.64 for the 90% C.L. limit [13]. This criterion is appropriate if fitting consistent data sets with ideal Gaussian errors to a well-defined theory. However, in practice, there are some inconsistencies between the independent fitted data sets, and unknown experimental and theoretical uncertainties, so the parameter-fitting criterion is not appropriate for global PDF analyses. Historically, the CTEQ [10] and MRST [11] groups defined 90% C.L. uncertainties using T = √ 100 and T = √ 50, respectively. Instead, the "MSTW 2008" analysis [6] introduced a new "dynamic" determination of the tolerance, chosen separately for each eigenvector direction according to a "hypothesis-testing" criterion [12] to maintain an adequate description of each individual data set in the global fit. Therefore, the distance t in eq. (2.6) was replaced by t ± k , adjusted to give the desired T ± k , with an average value of t ± k ≈ T ± k ≈ 3 for 68% C.L. uncertainties, and t ± k ≈ T ± k ≈ 6 for 90% C.L. uncertainties; see figure 10 of ref. [6] for the individual T ± k values in the MSTW 2008 NLO fit.

Generation of Monte Carlo replica sets
We generate replica data sets with the central values shifted according to m,k σ corr. m,k,i · 1 + R N m σ N m . (2.10) Here, "m" labels a particular data set, or a combination of data sets, with a common (fitted) normalisation N m , "i" labels the individual data points in that data set, and "k" labels the individual correlated systematic errors for a particular data set. The individual data points D m,i have uncorrelated (statistical and systematic) errors σ uncorr.
m,i and correlated systematic errors σ corr. m,k,i . Treating the correlated errors as uncorrelated leads to the alternative form used for most of the data sets in the MSTW 2008 fit: We shift the data points in a way to be as consistent as possible with the χ 2 definition used in the MSTW 2008 fit [6]. The random numbers R uncorr.
m,i or R corr. m,k are obtained from a Gaussian distribution of mean zero and variance one. A complication arises with the treatment of normalisation uncertainties in the MSTW 2008 analysis, where a quartic penalty term was used in the χ 2 definition instead of the usual quadratic penalty term, cf. eqs. (35) and (37) of ref. [6]. This modification was made to discourage large normalisation shifts in the fit. It was partly motivated by claims (see section 6.7.4 on "Normalizations", pg. 170 in ref. [14]) that, for many experiments, quoted normalisation uncertainties represent the limits of a box-shaped distribution rather than the standard deviation of a Gaussian distribution; see further discussion in section 5.2.1 of ref. [6]. The quartic χ 2 penalty term is small if the fitted normalisation N m ∈ [1 − σ N m , 1 + σ N m ], then it rises rapidly outside this range, with the effect that the normalisation uncertainty is perhaps closer to being described by a box-shaped distribution than by a Gaussian distribution (which would correspond to a quadratic χ 2 penalty term). Therefore, by default we take R N m in eqs. (2.10) and (2.11) to be uniformly distributed in the interval (−1, 1), so that the normalisation N m is uniformly distributed in the interval (1 − σ N m , 1 + σ N m ). However, we have also looked at the effect of obtaining R N m from a Gaussian distribution or alternatively simply fixing R N m = 0, i.e. the case of fixed data set normalisations. As expected, fixing normalisations in the data replicas generally gives slightly smaller PDF uncertainties, while assuming normalisation uncertainties to be Gaussian gives larger PDF uncertainties, particularly for the up-valence distribution. However, it is perhaps inconsistent to assume Gaussian uncertainties in the replica generation with a quartic penalty term in the χ 2 : changing to a quadratic penalty term would allow more freedom in the fitted normalisations and so the PDF parameters would move less, likely reducing the PDF uncertainty compared to the case of a quartic penalty term. The default treatment of uniform R N m ∈ (−1, 1) is probably reasonable and is closer to the treatment of normalisation uncertainties in the χ 2 definition than a Gaussian R N m . The Hessian error propagation via eigenvector PDF sets includes theoretical uncertainties on the hadronisation corrections for the CDF jet data (treated as a correlated systematic) and the small modification for the nuclear corrections (r 1 , r 2 , r 3 ) [6]. It is currently not obvious how to treat these theoretical uncertainties in the replica generation, so the effect on PDF uncertainties will be assumed to be small.
We perform a separate PDF fit to each replica data set, then we can take the average F and standard deviation ∆F of an observable F calculated with each PDF replica set, S k (k = 1, . . . , N rep ), that is, 14) The number of replicas N rep is arbitrary, but in all cases we choose to generate N rep = 40 replica PDF sets, where this number is chosen to be equal to the number of eigenvector PDF sets mostly for practical reasons, i.e. to demonstrate that the implementation of the Monte Carlo (MC) method does not necessarily require more computer resources than the Hessian method. It could easily be increased in further studies, but first indications are that N rep = 40 is sufficiently large to avoid significant fluctuations. To allow a fair comparison with the existing Hessian eigenvector PDF sets, we take n = 20 free PDF parameters, i.e. 8 PDF parameters are held fixed at their global best-fit values, and we do not apply a tolerance, i.e. we use the Hessian eigenvector PDF sets corresponding to T = 1 (see section 6.6 of ref. [6]). In figure 1 we show the input gluon distribution and strange asymmetry for the N rep = 40 MC replica PDF sets (thin dotted curves), and their average and standard deviation (thick dashed curves), and we compare to the best-fit and Hessian uncertainty (solid curves and shaded region). We find good agreement of the Hessian and MC results at all x and Q 2 values, and for all parton flavours, as will be demonstrated more explicitly in the next section. Similar comparisons between Hessian and MC results were performed in a fit only to the H1 data from HERA I on neutral-and charged-current e ± p cross sections [15], but it is still reassuring that we find a similar good agreement in the context of a more complicated global fit. On the other hand, in section 6.6 of ref. [6] we also performed a fit to a reduced data set consisting of a limited number of inclusive DIS data sets (BCDMS, NMC, H1, ZEUS) with fairly conservative cuts of Q 2 ≥ 9 GeV 2 and W 2 ≥ 15 GeV 2 , where eigenvector PDF sets were produced with n = 16 free PDF parameters for both a dynamic tolerance and with T = 1. We find that there are some differences between the MC results with n = 16 free PDF parameters and the Hessian results with T = 1. The approximate equivalence between the Hessian and MC methods may break down, therefore, when fitting a limited selection of discrepant data sets that are insufficient to unambiguously constrain all fitted parameters.

Investigation of potential parameterisation bias
Recall the MSTW 2008 NLO PDF parameterisation at the input scale Q 2 0 = 1 GeV 2 [6]: The parameters A u , A d , A g and x 0 were fixed by enforcing number-and momentum-sum rule constraints, while the other parameters were allowed to go free to determine the overall best fit. The 20 highlighted (red) parameters were those allowed to go free when producing the eigenvector PDF sets, where the other 8 (blue) parameters were held fixed, as for the MC results in the previous section. However, this is not in fact necessary in the MC approach where it is only needed to find the best fit for each replica data set, and the Hessian matrix is not used for error propagation. Therefore, we can perform MC replica fits with all 28 free parameters to examine the effect on PDF uncertainties of the greater freedom in parameterisation, and to explore the extent that the Hessian uncertainties are limited by the restricted parameterisation.
Recall [6] that the reason to freeze several parameters before applying the Hessian method was to reduce the large correlations between some parameters, which would lead to severe breaking of the quadratic behaviour of ∆χ 2 , meaning that linear error propagation would not be applicable. (A similar procedure was used in the CTEQ global fits; see, for example, section 5 of ref. [16].) We observed some departure from the ideal quadratic behaviour of ∆χ 2 even with only 20 parameters; see figures 5 and 6 of ref. [6]. However, even with all 28 parameters free, the Hessian matrix is generally still positive-definite (has positive eigenvalues) and therefore we can still be relatively confident that the best fit is correctly determined. Note that we use the Levenberg-Marquardt algorithm for χ 2 -minimisation, which combines the advantages of the inverse-Hessian method and the steepest-descent method, and therefore simply finding the best fit is less reliant on accurate knowledge of the Hessian matrix compared to subsequent error propagation using the Hessian method.
In figure 2 we show percentage uncertainties at the input scale Q 2 0 = 1 GeV 2 , and in figure 3 we show percentage uncertainties after evolving to Q 2 = (100 GeV) 2 . We show only the uncertainties since the MC average is very close to the Hessian best-fit, with residual differences likely explained by statistical fluctuations. Again the MC uncertainties with n = 20 input PDF parameters are in good agreement with the Hessian uncertainties with ∆χ 2 = 1, and both are much smaller than the 68% C.L. uncertainties including the dynamic tolerance. We show the effect of moving to n = 28 input PDF parameters, which gives significantly larger u v and d v uncertainties mainly at low x values (removing some unusual shapes in the x dependence) and slightly larger gluon uncertainties around x ∼ 0.05 in figure 2(f) and around x ∼ 0.01 in figure 3(f), but in all cases the MC uncertainties are still much smaller than the Hessian uncertainties at 68% C.L. One can see from the equations above that in going from a total of 20 → 28 input PDF parameters, the number of free parameters for both xu v and xd v goes from 3 → 4, for xS (≡ 2xū + 2xd + xs + xs) goes from 3 → 5, and for xg goes from 4 → 7. While there is perhaps some degree of parameterisation bias in the valence-quark distributions, the insensitivity of the sea-quark and gluon distributions to the relatively large increase in the number of free parameters suggests that parameterisation bias is likely to be small in those cases. Of course, an exception is the strange-quark and -antiquark distributions which are certainly constrained by the choice of parameterisation outside the limited data region (0.01 x 0.2) of the CCFR/NuTeV dimuon cross sections. For example, the low-x behaviour of s ands is assumed to be the same as forū andd, as suggested by arguments based on both Regge theory and perturbative QCD (see discussion in section 6.5.5 of ref. [6]).
The study of potential parameterisation bias presented here is indicative rather than exhaustive. It could be followed up by a more involved study, for example, using Chebyshev polynomials along the lines of refs. [17,18]. However, switching to an extremely flexible parameterisation brings the danger of fitting the statistical fluctuations of the data unless some method is used to enforce smoothness. We note that the limiting power-law behaviour as x → 0 and x → 1 is well-motivated by Regge theory and counting rules, respectively, and it is difficult to perceive a sensible alternative. More discussion and justification for the MSTW 2008 input parameterisation was given in section 6.5 of ref. [6].

Fits to restricted data sets using data replicas
Although we see little evidence for significant parameterisation bias in the MSTW 2008 global fit, this might not be true for some "non-global" fits which tend to be constrained by parameterisation choices in the absence of relevant data. For example, the input parameterisation at Q 2 0 = 1.9 GeV 2 in the HERAPDF1.0 [19] or HERAPDF1.5 NLO [20] analyses takes the form: There are only 10 parameters used to obtain the central fit and "experimental" uncer- tainties, although the more recent HERAPDF1.5 NNLO [21] analysis introduces 4 more parameters (2 for g and 1 each for u v , d v ). The HERAPDF analyses additionally include "model" and "parameterisation" uncertainties that can be much larger than the "experimental" uncertainties. For example, quantities sensitive to the high-x gluon distribution have a very large "model" uncertainty in the HERAPDF1.5 NNLO analysis due to variation of the minimum Q 2 cut [22]. Nevertheless, it is interesting to investigate the potentially more realistic constraint arising only from HERA data with a flexible parameterisation; see also similar studies by the NNPDF Collaboration [23]. This would be difficult to achieve in the Hessian method where several parameters would need to be held fixed to use the covariance matrix for error propagation, but it is straightforward using the MC method. We fit subsets of the global data included in the MSTW 2008 NLO analysis [6], specifically (i) excluding all HERA data (neutral-current e ± p and charged-current e + p cross sections, F charm 2 , and inclusive jet production in DIS), (ii) including only HERA data, (iii) performing a "collider" fit meaning data from HERA and the Tevatron (inclusive jet production, the W → ℓν charge asymmetry, and the Z rapidity distribution) with no fixed-target data. The HERA-only fit uses the older separate H1 and ZEUS inclusive cross sections compared to the more precise combined HERA I data [19] used in the HERAPDF fits. On the other hand, the public HERAPDF fits [19][20][21] do not use data on F charm 2 or jet production. In all cases we use the MC method with n = 28 free parameters wherever possible. However, for the HERA-only and HERA+Tevatron fits, there is no constraint at all on the strange asymmetry since the CCFR/NuTeV dimuon cross sections are missing, so we fix s −s at the global best-fit value, leaving n = 26 free parameters. The percentage uncertainties on the PDFs at Q 2 = (100 GeV) 2 from the various fits are shown in figure 4. The results reflect what might naïvely be expected. For example, removing HERA data gives a huge increase in the small-x uncertainties for the sea-quarks and gluon, but the valence-quark uncertainties are almost unchanged. With only HERA data, the gluon and antiquarks are still well-constrained at small x, but not at large x, and there are huge uncertainties in the valence-and strange-quark distributions. Adding the Tevatron data helps, but the collider-only uncertainty is still much larger than in the global fit, so really we need data from HERA, the Tevatron and the fixed-target experiments to get a meaningful result. The corresponding ratios to the global fit are shown in figure 5. Here, we see that the uncertainty bands from fits to subsets of the global data do not always overlap with those from the global fit, implying some tension between the different data sets, and suggesting that some kind of error inflation (or tolerance) is necessary. A similar exercise was performed in the MSTW 2008 paper [6] to a "reduced" data set, with a slightly more constrained parameterisation, and we find similar results if fitting the same "reduced" data set using the MC method.

Fits to idealised consistent and inconsistent pseudodata
As a further exercise to examine potential data set inconsistency within the global fit, we generate idealised pseudodata from the best-fit theory predictions, i.e. we replace D m,i by T m,i on the right-hand side of eqs. (2.10) and (2.11), where T m,i are the theory predictions evaluated using the global best-fit parameters. The pseudodata are then simply given by the best-fit theory predictions with appropriate Gaussian noise added, and with uncertainties given by the genuine data uncertainties. We can then introduce deliberate inconsistencies into this idealised pseudodata and investigate the effect on the fitted PDFs. We choose the following deliberate inconsistencies, intended to simulate realistic, if somewhat large, incompatibilities that could potentially be present in the genuine data: • We introduce a Q 2 -dependent offset for the H1 and ZEUS inclusive neutral-current reduced cross sections, such that the pseudodata are multiplied by a factor of {1 ± 0.005 log[Q 2 /(10 GeV 2 )]}, with the "+" sign for H1 and the "−" sign for ZEUS.
• We generate the pseudodata for the CDF and DØ inclusive jet cross sections with a scale choice µ R = µ F = p T /2, but fit it with µ R = µ F = p T .
• We introduce a rapidity-dependent offset for the CDF Z rapidity distribution, such that the pseudodata are multiplied by a factor of (1 + 0.03 y Z ).
• We introduce an x-dependent offset for the BCDMS/NMC/SLAC/E665 deuteron structure functions, intended to mimic a possible deuteron correction, such that the F d 2 data are multiplied by a factor of • We introduce a Q 2 -dependent offset for the BCDMS F p 2 and F d 2 structure functions, such that the pseudodata are multiplied by a factor of {1 + 0.01 log[Q 2 /(1 GeV 2 )]}.
In figures 6 and 7 we show the effect of fitting the genuine data, then the consistent or inconsistent idealised pseudodata, in each case using MC error propagation with N rep = 40 replica data sets and n = 20 input PDF parameters, and we compare to the standard MSTW 2008 NLO fit with dynamic tolerance. Despite the central values of the PDFs from the inconsistent fit shifting by significant amounts, the percentage uncertainties in figure 7 are remarkably almost identical whether fitting either the genuine data, the consistent pseudodata or the inconsistent pseudodata. The MC fit to perfectly consistent pseudodata gives χ 2 global /N pts. = 0.98 ± 0.03, which by construction is exactly unity up to the statistical fluctuation, and similarly for the individual data sets included in the global fit; see table 1. On the other hand, the MC fit to the inconsistent pseudodata gives χ 2 global /N pts. = 1.07 ± 0.03, so the fit quality has only deteriorated slightly, despite the central values of some PDFs shifting well outside their original uncertainty band; see figure 6. This result is in contradiction to what seems to be a widely held view that a fit to inconsistent data should lead to a χ 2 /N pts. ≫ 1. The values of the χ 2 /N pts. in table 1 deviate further from unity for a few individual data sets such as BCDMS F d 2 , the NMC F d 2 /F p 2 ratio, NuTeV xF 3 and the CDF Z rapidity distribution, but not by such large amounts that the inconsistent fit would not be judged to be an "acceptable" fit. Despite the fairly significant Q 2 -dependent offset of the H1 and ZEUS inclusive cross sections, amounting to almost 4% at Q 2 = 500 GeV 2 , there is only a slight increase in the χ 2 values in going from the consistent to the inconsistent fit. Similarly, by looking at the MSTW08 fit to the genuine data in table 1, there are only a few individual data sets with values of χ 2 /N pts. significantly above unity, perhaps giving the false impression that there is not a large degree of incompatibility between individual data sets.
In figures 8 and 9 we show the result of another study using the same consistent or inconsistent idealised pseudodata. First we show the PDFs obtained from fitting only the collider (HERA+Tevatron) subset of the pseudodata, then we show the effect of adding the remaining fixed-target pseudodata. In the "theory" case in figure 8, the fixed-target pseudodata are perfectly consistent with the collider pseudodata (by construction), so the global fit gives PDFs consistent with the collider fit, but with much smaller uncertainties. This is not true for the "inconsistent" case in figure 9, where the global fit gives PDFs often lying outside the uncertainty band for the collider fit. The latter situation arises when fitting the genuine data in figure 5, implying that the real collider data are inconsistent with the real fixed-target data. Note that the peculiar behaviour at large x in figures 8(c,d) and 9(c,d) is due to the antiquark distributions going negative in the collider fit at high x where there is no data constraint. The conclusion of these studies is that defining experimental uncertainties via ∆χ 2 global = 1 is overly optimistic for global PDF analysis and that the more conservative "dynamic" tolerance [6] based on a "hypothesis-testing" criterion [12] is much more appropriate. 1 As a final example of a situation where we believe it would make sense to introduce a tolerance to account for a potential discrepancy between data sets, consider the recent ATLAS determination [25] of the ratio of the strange-to-down sea-quark distributions, r s (x, Q 2 ) ≡ 0.5(s +s)/d, from a fit to inclusive W ± and Z differential cross sections at the LHC, combined with inclusive DIS data from HERA. This ratio took the surprising values  Table 1. Values of χ 2 /N pts. for the data sets in various NLO global fits. The "MSTW08" column shows the best-fit numbers [6]. The pseudodata numbers in the other two columns are the average and standard deviation of the χ 2 /N pts. over N rep = 40 replica fits. See ref. [6] for data references. Rescaling the experimental PDF uncertainty of the ATLAS determination [25] by a tolerance of ≈ 3, corresponding to ∆χ 2 ≈ 9, would be enough to bring it into agreement with the MSTW08 result. The conclusion that the uncertainty on r s in the ATLAS determination [25] has been underestimated was also reached by the NNPDF Collaboration [26].

Random PDFs generated in space of fit parameters
Given that we have now established that we need an appropriate tolerance, the question arises of how to include this into the MC method. We can introduce a tolerance in the generation of the data replicas simply by rescaling all experimental errors in eqs. (2.10) and (2.11) by t ≈ T ≈ 3, corresponding to the average tolerance for 68% C.L. uncertainties. We find that this simple approach, using n = 20 input PDF parameters, reproduces the Hessian uncertainties with a dynamic tolerance surprisingly well for most parton flavours and kinematic regions. However, it is not possible to implement exactly the "dynamic" tolerance (different for each eigenvector direction) in the MC method, since no reference is being made to the eigenvectors of the covariance matrix. Instead of sampling the probability density by working in the space of data, we can produce random PDFs directly in the space of fit parameters. 2 In fact, this was done in the original work of Giele and Keller [3] using the covariance matrix of parameters from Alekhin's fit [27]. A convenient way to generate the random PDFs is to use the eigenvectors of the covariance matrix. Recall from eq. (2.4) that the parameter displacements from the global minimum can be expanded in a basis of the rescaled eigenvectors e ik ≡ √ λ k v ik , that is, with n = 20 the number of input PDF parameters. Usually the ±kth eigenvector PDF set is defined by taking z j = ±t ± j δ jk in eq. (6.1), that is, the usual eigenvector PDF sets are generated with input parameters: with t ± k adjusted to give the desired T ± k = (∆χ 2 global ) 1/2 . However, we can instead randomly sample the parameter space such that the kth random PDF set is generated with input parameters obtained by taking z j = ±t ± j |R jk | in eq. (6.1), that is, where R jk is a Gaussian-distributed random number of mean zero and variance one, and either +t + j or −t − j is chosen depending on the sign of R jk . There are therefore n = 20 random numbers R jk (j = 1, . . . , n) associated with the kth random PDF set (k = 1, . . . , N pdf ). The number of random PDF sets N pdf is arbitrary, but again we choose N pdf = 40 mostly for practical reasons. Each random PDF set has equal probability defined by the covariance matrix of fit parameters, and therefore statistical quantities such as the mean and standard deviation can easily be calculated using formulae such as eqs. (2.13) and (2.14) with the obvious replacement N rep → N pdf . A comparison of the average and standard deviation of N pdf = 40 PDFs constructed with eq. (6.3) to the best-fit and Hessian uncertainty is made in figure 10. There is generally good agreement, with some shift of the average compared to the best-fit that can be attributed mostly to asymmetric tolerance values (t + j = t − j ). We have verified this explanation by repeating the same studies without a tolerance (T ± j = 1). Alternative ad hoc treatments of the asymmetric tolerance values are possible. For example, if t + j > t − j proportionally more random PDF sets could be produced for a "−" sign than for a "+" sign in eq. (6.3) so that the average would be closer to the best-fit, or one could simply symmetrise with the replacement t ± j → (t + j + t − j )/2 in eq. (6.3). However, since the degree of asymmetry is generally small, we will not explore these possibilities in practice. As some measure of the amount of statistical fluctuation, we produce another N pdf = 40 PDFs constructed with eq. (6.3) using different random numbers R jk . The results are shown in figures 11 and 12 and we conclude that N pdf = 40 is enough to avoid significant fluctuations, although there is some moderate variation due to the limited statistics (for example, in d v at x ∼ 0.1).
In principle, there is some amount of non-linearity in going from the input PDF parameters a i to the input PDFs f (x, Q 2 0 ), then to the evolved PDFs f (x, Q 2 ) and to physical observables F calculated using these evolved PDFs (for example, hadronic cross sections with a quadratic PDF dependence). However, we find that, in practice, the apparent degree of non-linearity is small, an assumption that is inherent in the Hessian method for propagating uncertainties. Making this assumption of linearity, an alternative and simpler way to generate random PDFs is to work with the existing eigenvector PDF sets directly at the level of the quantity of interest F such as the evolved PDF or the hadronic cross section. Then we can build random values of F according to 3 . . , N pdf ), (6.4) where S + j or S − j is chosen depending on the sign of R jk . Note that for the case F = a i in eq. (6.4), then a i (S 0 ) ≡ a 0 i and inserting a i (S ± j ) from eq. (6.2) then we recover eq. (6.3). This construction of a random F (S k ) using eq. (6.4) can be done "on the fly" for an almost arbitrarily large value of N pdf , after the initial computation of F (S 0 ) and F (S ± j ) (j = 1, . . . , n) requiring only 2n + 1 (= 41 for the MSTW 2008 PDFs) evaluations of F . We choose N pdf = 1000 for the results shown in figures 11 and 12, although the results are similar with a much smaller value. Here we take "F " in eq. (6.4) to be the evolved PDF at Q = 100 GeV for the particular parton flavour shown in each plot, then we construct N pdf = 1000 values of F (S k ) and take the average and standard deviation, finding good agreement with the best-fit and Hessian uncertainty. Again, the slight shift of the average compared to the best-fit can be attributed mostly to asymmetric tolerance values, which we confirm by repeating the same exercise starting from eigenvector PDF sets generated with ∆χ 2 global = 1. As already mentioned, ad hoc modifications to the procedure could be adopted to better account for asymmetric tolerance values, but we choose not to explore these possibilities in this work given the relatively small size of the effect.  Figure 10. N pdf = 40 random sets generated with eq. (6.3) as a ratio to the best-fit PDF set. symmetrised version of eq. (6.4) could be obtained using . . , N pdf ), (6.5) analogous to the symmetric formula for PDF uncertainties given in eq. (2.9). We note that an unsuccessful attempt to generate random PDFs directly in the space of fit parameters was made in section 6.5 of ref. [30]. This attempt was flawed in that all random PDF sets were constructed with the unnecessary constraint of a fixed ∆χ 2 = 100,  Figure 11. Comparison of best-fit and Hessian uncertainty to the average and standard deviation of two sets of N pdf = 40 PDFs generated with different random parameters given by eq. (6.3) and one set of N pdf = 1000 random PDFs generated with eq. (6.4).
with the n parameters distributed on the surface of an n-dimensional hypersphere using the eigenvectors as basis vectors, leading to an envelope of the random PDF sets covering a much smaller range than the usual Hessian uncertainty. By contrast, if we generate random PDF sets according to eq. (6.3), then the ∆χ 2 , or equivalently t ± j , is only used to define the distance along a particular eigenvector direction. At a general point in parameter space, given by stepping along all eigenvector directions by a random amount, the ∆χ 2 is irrelevant and it can be very large. It is not necessary or desirable that each random PDF set should have ∆χ 2 below a certain value. A fixed ∆χ 2 will only be recovered in the specific (and very unlikely) case that |R jk | = δ jk , then eq. (6.3) reduces to eq. (6.2).
Another argument that a Monte Carlo approach in the space of fit parameters involves exploring a space too wide to be sampled efficiently with a small number of random PDFs was made in section 3.2.1 of ref. [31]. There it was argued that if the probability distribution for each parameter is given as a histogram with three bins, say the one-sigma region around the central value and the two outer regions, then naïvely one might expect the  Figure 13. Convergence of average and standard deviation of N pdf random predictions as a function of N pdf , each time adding one more random prediction to the N pdf − 1 previous random predictions, normalised to the best-fit prediction and compared to the Hessian uncertainty. need to randomly sample 3 n 3 × 10 9 PDF sets for n = 20 free parameters. However, the n parameters are certainly not independent, and the complete correlation information is provided by the covariance matrix obtained from the global fit. Working in the basis of eigenvectors then provides an optimally efficient way to sample the parameter space randomly along each eigenvector direction.
Nevertheless, it is instructive to perform a numerical exercise in order to explicitly demonstrate roughly how many random predictions are necessary to adequately sample the parameter space. We consider the 7 TeV LHC total cross sections for four typical processes corresponding to inclusive production of (a) Z 0 bosons, (b) W + relative to W − bosons, (c) top-pairs and (d) Standard Model Higgs bosons with M H = 120 GeV from gluongluon fusion. These four processes are chosen to sample a variety of parton flavours and momentum fractions x. We use the existing NLO calculations from ref. [1] with the MSTW 2008 NLO best-fit and Hessian eigenvector PDF sets at 68% C.L. For each of the four processes, we generate the minimal N pdf = 2 random predictions computed using eq. (6.5) for F = {σ Z 0 , σ W + /σ W − , σ tt , σ H } and calculate the average and standard deviation. Then the number of random predictions, N pdf , is incremented by one, and the average and standard deviation recomputed, until N pdf = 1000. The results are shown in figure 13 (a) normalised to the best-fit prediction and compared with the symmetric Hessian uncertainty of eq. (2.9). We use the symmetrised formulae of eqs. (2.9) and (6.5) to allow a direct comparison between the best-fit prediction and the average over the random predictions, without the complications arising from asymmetric tolerance values discussed elsewhere. We show a similar set of plots in figure 14 where each value of N pdf now corresponds to the average and standard deviation over N pdf independent random predictions. The results for adjacent N pdf values therefore indicate the size of the statistical fluctuations, which decrease going to larger N pdf values, but are still not completely negligible even for N pdf ∼ 1000. However, although there is little computational overhead in taking N pdf to be very large when the random predictions are generated "on the fly", one would not expect to see noticeable differences when N rep is much larger than around 1000. In fact, the statistical fluctuations are very small compared to the PDF uncertainty for N pdf 100 and even N pdf = 40 may be sufficiently accurate for many practical purposes.
7 Reweighting to describe the LHC W → ℓν charge asymmetry data Updating a PDF set with new data using a Bayesian reweighting method based on statistical inference was originally proposed by Giele and Keller [3] and later developed further by the NNPDF Collaboration [32,33]. Suppose we have a set of N pdf random PDFs {S k } with equal probability. It is irrelevant whether they are generated in the space of data (section 2.2) or in the space of parameters (section 6). We can then apply the Bayesian reweighting technique exactly as for the NNPDF sets. The key formulae are summarised below, but we refer to refs. [32,33] for the derivation and more details of the method. We first compute the χ 2 k for the new data set (comprising N pts. data points) using each S k , then we can calculate the mean value of any PDF-dependent quantity F (S k ) as: where the weights are given by with the denominator of w k (χ 2 k ) ensuring the normalisation condition: Note that the expression for the weights in eq. (7.2) differs from the original formula in ref. [3] due to subtle arguments explained in ref. [32]. The standard deviation ∆F after reweighting can be calculated using eq. (2.14) with the trivial replacement N rep → N pdf and using the weighted averages F 2 new and F new . The effective number of random PDF sets left after reweighting, referred to as the "Shannon entropy" [32], is given by As a simple application of this reweighting technique, we will consider the 7 TeV LHC data from the 2010 running period on the W → ℓν charge asymmetry from CMS [7] and ATLAS [8]. The W → ℓν charge asymmetry is defined differentially as a function of the pseudorapidity η ℓ of the charged-lepton from the W -boson decay, i.e.
We will consider the CMS data [7] with charged-lepton transverse momentum cut of p ℓ T > 25 GeV in both the electron (ℓ = e) and muon (ℓ = µ) channels. The ATLAS data [8] combine the electron and muon channels with cuts of p ℓ T > 20 GeV, missing transverse we define a K-factor by taking the ratio of the dynnlo histograms, then we fit to quartic polynomials in |η ℓ | to provide a convenient parameterisation and to smooth statistical fluctuations from the vegas multidimensional integration. A fast calculation of the W → ℓν charge asymmetry can then be obtained using a simple LO calculation with zero W width (denoted "LEPTON"), including the parameterised K-factors for dσ(ℓ ± )/dη ℓ , making the assumption that the K-factors are independent of the PDF choice. In figure 15(e,f) we compare the LEPTON calculation, without and with the inclusion of K-factors, with the dynnlo histograms, finding good agreement (by construction). It can be seen that the NLO corrections and finite-width effects are very small over most of the |η ℓ | range. The effect on the W → ℓν charge asymmetry of neglecting the PDF dependence of the K-factors should then be completely negligible. We have also computed the NNLO corrections using the dynnlo code and find them to be much smaller than the NLO corrections, but we will consider only NLO QCD in making comparisons to data, as done elsewhere in this paper. We will focus on demonstrating the reweighting technique rather than aiming to make an exhaustive study of the impact of the LHC data. With this aim in mind, we will not consider in this work the 2010 CMS data with p ℓ T > 30 GeV [7], the preliminary CMS measurements using 2011 data with p µ T > 25 GeV [35] or p e T > 35 GeV [36], or the recent LHCb measurements using 2010 data with p µ T > {20, 25, 30} GeV [37]. The ATLAS Collaboration [8] provide the differential cross sections, dσ(ℓ ± )/dη ℓ , separately for W + → ℓ + ν and W − → ℓ −ν with the complete information on correlated systematic uncertainties, which is potentially more useful for PDF fits than simply the asymmetry A ℓ (η ℓ ). A future study could perhaps investigate the use of reweighting with the ATLAS W ± (and Z/γ * ) differential cross sections rather than the asymmetry A ℓ (η ℓ ). In this study, we simply calculate the χ 2 k values with all experimental uncertainties added in quadrature. In figure 16(a,b) we compare the (a) CMS and (b) ATLAS data on the W → ℓν charge asymmetry to predictions using the MSTW 2008 NLO PDFs, firstly with the usual best-fit and Hessian uncertainty. We then generate N pdf = 1000 random predictions for the asymmetry by taking F = A ℓ (η ℓ ) in eq. (6.4), and take the average and standard deviation, giving results slightly different from the best-fit and Hessian uncertainty (mainly due to the asymmetric tolerance values). The χ 2 values of the average A ℓ (η ℓ ), displayed in the plot legends, are then slightly larger than the χ 2 of the best-fit predictions. Next we compute weights for each of the N pdf predictions according to eq. (7.2), then finally we plot the weighted average and standard deviation in figure 16(a,b). The χ 2 of the weighted average A ℓ (η ℓ ) improves significantly compared to the unweighted average. The effective number of random predictions N eff after reweighting, computed according to eq. (7.4), is about half the original number for CMS and almost a quarter the original number for ATLAS. The most significant change in the predictions after reweighting is for η ℓ ≈ 0 where A ℓ (η ℓ ) depends on the combination u v − d v at momentum fractions x slightly above x ∼ M W / √ s ∼ 0.01.
In figure 16(c,d) we plot this combination for Q 2 = (100 GeV) 2 for the same three sets of predictions shown in figure 16(a,b). We compare the best-fit and Hessian uncertainty with the unweighted/weighted average and standard deviation of N pdf = 1000 random PDFs constructed by taking F = x(u v − d v )(x, Q 2 ) in eq. (6.4), with the same random numbers R jk and weights w k used in figure 16(a,b). As expected from figure 16(a,b), the shift in u v − d v is largest at x ∼ 0.01, and the average value after reweighting using the ATLAS data even lies outside the original uncertainty band. There is also a distinct reduction in the size of the uncertainty band after reweighting.
The procedure just described is not completely unambiguous. Alternative prescriptions could be formulated which are equivalent in a linear approximation, but which might differ due to some degree of non-linearity. For example, rather than starting by generating random predictions for the asymmetry by taking F = A ℓ (η ℓ ) in eq. (6.4), we could instead generate N pdf = 1000 random PDF sets by taking F = xf (x, M 2 W ) in eq. (6.4), where f = {g, d, u, s, c, b,d,ū,s,c,b}, then calculate A ℓ (η ℓ ) for each of these N pdf random PDF sets, before calculating weights according to eq. (7.2) as before. This alternative method will give slightly different results since A ℓ (η ℓ ) depends on xf (x, M 2 W ) in a non-linear manner. In figure 17(a,b) we compare the distribution of weights w k computed using the two different methods, using the same random numbers R jk to allow a direct comparison of individual weights with the same label k. The distribution of weights is very similar using the two methods. The individual weights typically agree to within a few percent and differ by only a few tens of percent in the worst cases. The values of N eff agree to the nearest integer and the values of χ 2 /N pts. agree to two decimal places. The plots of figure 16 produced using the alternative method are indistinguishable. We conclude that the degree of non-linearity is small and both techniques may be useful in practice. For example, it might be useful to first generate N pdf = 1000 random PDF sets as grid files by taking F = xf (x, Q 2 ) in eq. (6.4), then these grid files can be processed in exactly the same way as the NNPDF grid files. On the other hand, that method would require substantial disk storage and would require the observable A ℓ (η ℓ ) to be evaluated N pdf times, which is potentially timeconsuming. With the first method described above, it is unnecessary to store intermediate grid files, and only 2n + 1 (= 41 for the MSTW 2008 PDFs) evaluations of A ℓ (η ℓ ) are needed for the best-fit and 2n eigenvector PDF sets, exactly as for the usual computation of Hessian uncertainties. The first method will therefore be used for subsequent results. The χ 2 distribution of the new data set after reweighting can easily be histogrammed: where the χ 2 distribution before reweighting is trivially obtained by setting all weights w k equal to unity. Both these distributions are shown in figure 17(c,d). The plot legends indicate the mean χ 2 and the standard deviation. The reweighting procedure shifts the χ 2 distribution so that larger weights are given to the random predictions with χ 2 k /N pts. ∼ 1. If we rescale the data uncertainties by a factor α, then the probability density for the rescaling parameter α is given by [32] that is, the sum of the unnormalised weights given by eq. (7.2) with the replacement χ 2 k → χ 2 k /α 2 . The overall normalisation of eq. (7.7) can be determined from the condition that the integral of P(α) over α gives unity. The probability distribution P(α) is shown in figure 17(e,f). These plots suggest that the LHC data on A ℓ (η ℓ ) are somewhat inconsistent with the data in the MSTW 2008 NLO fit and that the uncertainties on the LHC A ℓ (η ℓ ) data should be rescaled by a factor 1.37 for CMS and 1.68 for ATLAS to achieve consistency, where these are the most probable values of α. Conversely, a most probable value of α much less than 1 would suggest that the experimental uncertainties are overestimated to some extent. In that case, it might be desirable to repeat the reweighting procedure with the replacement χ 2 k → χ 2 k /α 2 in eq. (7.2), where α is the most probable value. It is clear (see, for example, the discussion in ref. [1]) that there is some considerable tension between the LHC W → ℓν charge asymmetry data and some of the data already included in the MSTW 2008 fit, such as the Tevatron W → ℓν asymmetry, the NMC F d 2 /F p 2 ratio, and the E866/NuSea Drell-Yan σ pd /σ pp ratio. Other tensions have been observed with the more recent and precise Tevatron data on the W → ℓν charge asymmetry, and partially resolved by more flexible nuclear corrections for deuteron structure functions [38] and extended parameterisation choices for the functional form of the input PDFs. Indeed, we note that the LHC asymmetry A ℓ (η ℓ ) depends on valence-quark parameterisations near x ∼ 0.01, and the studies in section 3 suggested that this is the single place where the MSTW 2008 parameterisation is likely to be inadequate. Further attempts to resolve these tensions will be necessary for any future update of the MSTW 2008 fit. Therefore, the reweighting technique is instructive, but does not indicate the ultimate impact of including the new data in a global PDF fit after closer scrutiny of potential sources of tension. Nevertheless, we hope that the new method presented in this section of generating random predictions on-the-fly from the existing eigenvector PDF sets, followed by subsequent Bayesian reweighting, will be useful for a wide range of potential studies by third parties from both the experimental and theoretical communities.

Conclusions
We have made a first study of the Monte Carlo approach to experimental uncertainty propagation in the context of the MSTW 2008 NLO PDF fit [6], either using data replicas or alternatively working directly in parameter space. The main findings of this study are as follows: • The Hessian method and the Monte Carlo method using data replicas are approximately equivalent in a global fit when using the same parameterisation and (lack of) tolerance, i.e. ∆χ 2 = 1. Similar findings have previously been observed in a fit only to H1 data [15].
• The Monte Carlo approach using data replicas is better suited to exploring parameterisation bias due to the potentially restrictive input functional form. Increasing the number of parameters from 20 → 28 has only a small effect on PDF uncertainties, with the exception of the valence-quark distributions at low x values where there is a moderate increase in PDF uncertainties. This gives some confidence that, in general, PDF uncertainties in the MSTW 2008 fit are not significantly underestimated due to parameterisation bias, with the possible exception of the strange-quark andantiquark distributions where the imposed parameterisation constraint is more severe due to the lack of available data constraints.
• The previous findings raise the question why the MSTW/CTEQ uncertainties (with a tolerance) are similar to the NNPDF uncertainties (without a tolerance) [1], if the tolerance in the former is not compensating for the more restricted functionalform parameterisation rather than the more flexible neural-network parameterisation. One possibility is that the procedural uncertainties for NNPDF associated with splitting data into training and validation sets mimic the effect of a tolerance for MSTW/CTEQ (see discussion in section 3.2 of ref. [39]). Further investigation would be needed by the NNPDF Collaboration to clarify this possible explanation.
• The Monte Carlo approach using data replicas is also better suited when making fits to limited data sets without the need to restrict the input parameterisation. We compared the global-fit PDFs to those extracted using a similar flexible parameterisation from more limited data sets either excluding HERA data, including only HERA data, or including only collider (HERA and Tevatron) data. The fits to limited data sets gave much larger PDF uncertainties for some parton combinations, implying that we need data from HERA, the Tevatron and the fixed-target experiments to get a meaningful result. The PDF uncertainty bands from the fits to the limited data sets are not close to overlapping in many cases, implying that some kind of tolerance is needed to accommodate inconsistencies between the various data subsets.
• As a further exercise to examine the effect of data set inconsistency, we generated idealised pseudodata from the best-fit theory predictions, then we introduced deliberate inconsistencies. The fractional PDF uncertainties were very similar whether fitting the real data, the consistent pseudodata or the inconsistent pseudodata. On the other hand, the central values obtained when fitting the inconsistent pseudodata were incompatible accounting for the uncertainty bands, even though the χ 2 global only increased by around 10% and the χ 2 /N pts. for individual data sets did not deviate far above unity. Given that a good fit should have χ 2 /N pts. approximately in the range 1 ± 2/N pts. [12], giving 1.0 ± 0.1 for N pts. ∼ 200, it is far from obvious to spot genuine inconsistencies in the real data of the size we introduced into the idealised pseudodata. It is definitely not the case that the PDF uncertainties will automatically expand to accommodate any inconsistencies. Again, this suggests the need for a tolerance to accommodate potential data set inconsistencies in the real data.
• Having established the need for an appropriate tolerance, we pointed out that it could be introduced by rescaling all experimental uncertainties by a common factor (say, 3) in the generation of data replicas. However, the introduction of a "dynamic" tolerance for each eigenvector direction is not possible, since no use is made of the covariance matrix of fit parameters in the Monte Carlo error propagation.
• Instead, we proposed sampling the covariance matrix of fit parameters by stepping along each eigenvector direction by a random amount, including the appropriate tolerance values. This method of generating random PDF sets is close to the usual generation of eigenvector PDF sets in the Hessian method where one steps along each eigenvector direction in turn by a fixed amount.
• In fact, assuming linearity between the input PDF parameters and derived quantities such as evolved PDFs or hadronic cross sections, an assumption that is inherent in the Hessian method, then it is more convenient to generate random predictions on-the-fly from the existing eigenvector PDF sets.
• As a simple example application to demonstrate the benefits of having randomlydistributed theory predictions, we used Bayesian reweighting to investigate the effect on the PDFs of recent LHC data on the W → ℓν charge asymmetry. Similar studies can now easily be performed by third parties using any PDF determination where eigenvector PDF sets are provided. The reweighting technique is therefore no longer limited only to using the PDF sets provided by the NNPDF Collaboration.
We conclude that the Monte Carlo method using data replicas is, on balance, not superior to the Hessian method in a global fit when using a conventional functional-form parameterisation of the input PDFs. In particular, one of the key benefits of the Monte Carlo approach, namely the use of Bayesian reweighting, can even be accomplished more efficiently using the existing eigenvector PDF sets. Therefore, any future update of the "MSTW 2008" analysis will continue to use the Hessian method with a "dynamic" tolerance.