Parton Distributions with Theory Uncertainties: General Formalism and First Phenomenological Studies

We formulate a general approach to the inclusion of theoretical uncertainties, specifically those related to the missing higher order uncertainty (MHOU), in the determination of parton distribution functions (PDFs). We demonstrate how, under quite generic assumptions, theory uncertainties can be included as an extra contribution to the covariance matrix when determining PDFs from data. We then review, clarify, and systematize the use of renormalization and factorization scale variations as a means to estimate MHOUs consistently in deep inelastic and hadronic processes. We define a set of prescriptions for constructing a theory covariance matrix using scale variations, which can be used in global fits of data from a wide range of different processes, based on choosing a set of independent scale variations suitably correlated within and across processes. We set up an algebraic framework for the choice and validation of an optimal prescription by comparing the estimate of MHOU encoded in the next-to-leading order (NLO) theory covariance matrix to the observed shifts between NLO and NNLO predictions. We perform an NLO PDF determination which includes the MHOU, assess the impact of the inclusion of MHOUs on the PDF central values and uncertainties, and validate the results by comparison to the known shift between NLO and NNLO PDFs. We finally study the impact of the inclusion of MHOUs in a global PDF determination on LHC cross-sections, and provide guidelines for their use in precision phenomenology. In addition, we compare the results based on the theory covariance matrix formalism to those obtained by performing PDF determinations based on different scale choices.


Introduction
An accurate estimate of the uncertainty in Standard Model (SM) predictions is a crucial ingredient for precision phenomenology at the Large Hadron Collider (LHC). Now, and for several years to come [1,2], theoretical uncertainties for hadron collider processes are dominated by the missing higher order uncertainty (MHOU) in perturbative QCD calculations, usually estimated by scale variation, and by parton distribution function (PDF) uncertainties. Of course, PDFs summarize the information on the nucleon structure extracted from other SM processes [3]: effectively, PDFs provide a way of obtaining a prediction for a given process in terms of other processes. This way of thinking about PDFs immediately shows that MHOUs are present not only in the perturbative prediction for a particular process, but also in the underlying processes used for the PDF determination.
Current PDF uncertainties essentially only include the propagated uncertainty arising from statistical and systematic uncertainties in the experimental data used in their determination. Methodological uncertainties, related for example to the choice of functional form for the PDFs or to the fitting methodology employed, can be kept under control using closure tests [4], and with care can be made negligible in the data region. Parametric uncertainties, such as those related to the value of the strong coupling α_s(m_Z) or the charm mass m_c, can be included by performing fits for a range of parameter values. However, up until now MHOUs have never been included in a PDF fit: what is usually called the "PDF uncertainty" does not include the MHOU in the theoretical calculations used for PDF determination, and, more generally, does not typically include any source of theory uncertainty.
Historically, this is related to the fact that MHOUs have always been considered as likely to be small in comparison to other PDF uncertainties, especially since NNLO PDFs have become the default standard. However, it is clear that as PDF uncertainties become smaller and smaller, at some point MHOUs will become significant. In the most recent NNPDF set, NNPDF3.1 [5], PDF uncertainties at the electroweak scale can be as low as 1%. Given that the typical size of MHOU on NNLO QCD processes is at the percent level (see e.g. [6]) their neglect seems difficult to justify a priori.
Besides contributing to the overall size of PDF uncertainty, more subtly the MHOU might affect the relative weights of different datasets included in the fit: a dataset which is accurately described by NNLO theory because it has small MHOU should in principle carry more weight than one which is poorly described because it has large MHOU. The neglect of MHOUs might thus be biasing current global PDF fits.
It is the purpose of this paper to set up a general formalism for the inclusion of theoretical uncertainties, specifically MHOUs, in PDF determinations, and then to perform a first phenomenological exploration of their impact on LHC phenomenology. The development of this treatment of MHOUs will involve three main ingredients. The first is the formulation of a general theory for the inclusion in PDF fits of generic theoretical uncertainties, of which MHOUs are a particular case. The second is the choice of a specific method for estimating the MHOU in each of the cross-sections that enter the PDF fit. The third is the construction of a set of tools for the validation of this methodology, to check that the MHOU is being correctly estimated.
The first ingredient in our approach is common to any kind of theory uncertainty: theory uncertainties include not only MHOUs, but also any other aspect in which the theory used in order to obtain predictions for the physical processes that enter the PDF fit is incompletely known. These include higher twists [7] and other power-suppressed corrections, nuclear corrections when nuclear targets are involved [8], final state corrections for non-inclusive processes, and so forth. All of these uncertainties are only meaningful in a Bayesian sense: there is only one correct value of the next-order perturbative correction, not a distribution of values. They thus necessarily involve a process of informed estimation or guesswork: the only way to actually know the size of, say, a missing higher order correction, is to calculate it.
We will show that, by adopting a Bayesian point of view and assigning a Gaussian probability distribution to the expected true value of the theory calculation, the impact of any missing theoretical contribution can be encoded as an additive contribution to the experimental covariance matrix used in the PDF fit [9]. The combination is additive because experimental and theoretical uncertainties are by their nature independent, and are thus combined in quadrature.
In a global fit, theoretical uncertainties can be strongly correlated not only across data points within a given experiment, but also between different experiments, and even different processes, so we need a theoretical covariance matrix which includes all these correlations across all the datasets included in the fit.
This then immediately raises the issue of choosing a meaningful way to estimate the MHOU, which in particular incorporates these correlations. The standard way of estimating MHOUs in perturbative QCD calculations is to perform a variation of the renormalization and factorization scales, denoted µ_r and µ_f respectively, with various choices in use for the range and combination of the variations. While the shortcomings of this method are well known, and various alternatives have been discussed [10][11][12], it remains the default and most widely used option. In the present context, its main advantage is its universality (it can be applied in the same way to any of the processes used in the fit), and the way in which it implicitly incorporates correlations (for example predictions for data points in the same process which are kinematically close will be automatically correlated), even across different processes (through the PDFs, which are the same in every process). Thus while in principle our covariance matrix formalism allows for the inclusion of any method for estimating MHOUs in a PDF determination, here we will specifically use scale variation.
In order to do this, we need to examine systematically the underpinnings of scale variation as a means to estimate theory uncertainties, since different definitions of scale variation have been used in different contexts. Indeed, the standard definitions of renormalization and factorization scale typically used for deep-inelastic scattering and hadronic collisions are not the same. Because PDF fits include both types of processes, it is important to understand in detail how these definitions relate to each other, in order to be able to correlate the scale variations in a meaningful way. Specifically, we will show that one may estimate the MHOU for any process by combining two independent scale variations: one to estimate the MHOU in the perturbative evolution of the PDFs (missing higher orders in the DGLAP splitting functions), and the other to estimate the MHOU in the perturbative calculation of the partonic cross-sections (missing higher orders in the hard-scattering matrix elements).
Once the scales to be varied are understood, the remaining task is to choose a particular prescription to be used to construct the theoretical covariance matrix. In estimating MHOUs for a given process, the most commonly adopted option is the so-called seven-point envelope prescription, in which µ_r and µ_f are independently varied by a factor of two about the central choice while ensuring that 1/2 ≤ µ_r/µ_f ≤ 2, and the MHOU is then taken as the envelope of the results. For our purposes this is insufficient: rather than taking an envelope, we wish to construct a covariance matrix out of the scale variations. In particular, because theoretical uncertainties are correlated across processes (through the evolution of the PDFs), we need a prescription for determining the entries of the covariance matrix both within a single process and across pairs of processes.
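As a concrete illustration of the point-counting, the seven-point set can be enumerated programmatically. This is a minimal sketch for illustration only: the scale factors {1/2, 1, 2} and the ratio constraint are as stated above, while the function name is hypothetical.

```python
from itertools import product

def seven_point_scales():
    """Enumerate the 7-point set: all (mu_r/Q, mu_f/Q) in {1/2, 1, 2}^2
    subject to the constraint 1/2 <= mu_r/mu_f <= 2, which removes the
    two 'opposite corner' points (2, 1/2) and (1/2, 2)."""
    return [(kr, kf) for kr, kf in product([0.5, 1.0, 2.0], repeat=2)
            if 0.5 <= kr / kf <= 2.0]

points = seven_point_scales()
# 9 combinations minus the 2 opposite corners = 7 points (central included)
```

The two excluded points, (2, 1/2) and (1/2, 2), are precisely those for which µ_r and µ_f are varied in opposite directions.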
We will discuss in detail a variety of options to achieve this, based on a general "n-point prescription". These options will differ from each other in the number of independent variations, the directions of the variations in the (µ_r, µ_f) plane, and the way the variations are correlated (or not) across different processes.
The validation of these point prescriptions, and the choice of the optimal one to be used for PDF determinations is a nontrivial problem, which however admits an elegant solution. The validation can be performed at NLO, by comparing the estimate of the MHOU encoded in the theory covariance matrix to the known next (NNLO) order correction. The problem is then to compare the probability distribution of expected higher-order results to the unique answer given by the NNLO calculation. The solution to this problem is to view the set of shifts between the NLO and NNLO computations for all the processes under consideration as a vector, with one component for each of the data points. The theory covariance matrix corresponding to each prescription then defines a one-sigma ellipsoid in a subspace of this space. The validation is performed by projecting the shift vector into the ellipsoid: if the theory covariance matrix gives a sensible estimate of the MHOU at NLO, the shift vector will lie almost entirely within the ellipsoid. Using this strategy, we will validate a variety of scale variation prescriptions on a similar dataset to that of the global NNPDF3.1 analysis. Since the dimension of the space of datapoints is typically two orders of magnitude higher than the dimension of the subspace of the ellipsoid, this is a highly nontrivial test.
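The projection test just described can be sketched numerically as follows. The matrices below are toys: in the actual validation `S` is the theory covariance matrix for a given prescription and `delta` is the vector of NLO-NNLO shifts over all data points.

```python
import numpy as np

def fraction_in_subspace(S, delta, tol=1e-10):
    """Fraction of the squared norm of the shift vector `delta` that lies
    in the subspace spanned by eigenvectors of S with eigenvalue > tol."""
    w, v = np.linalg.eigh(S)            # S is symmetric: eigh is appropriate
    P = v[:, w > tol]                   # basis of the nontrivial subspace
    delta_proj = P @ (P.T @ delta)      # orthogonal projection of delta
    return np.dot(delta_proj, delta_proj) / np.dot(delta, delta)

# Toy example: a rank-2 covariance in a 5-dimensional "data" space
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 2))
S = B @ B.T                             # rank-2, positive semi-definite
delta_inside = B @ np.array([1.0, -0.5])  # lies entirely in the subspace
```

A successful validation corresponds to this fraction being close to one for the real shift vector, despite the subspace dimension being far smaller than the number of data points.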
Once a prescription has been selected and used to construct the theory covariance matrix, it is possible to perform a PDF fit based on it. Within the NNPDF methodology, an ensemble of PDF replicas is fitted to data replicas. Data replicas are generated in a way which reflects the uncertainties and correlations of the underlying data, as encoded in their covariance matrix. The best-fit PDF replica for each data replica is then determined by minimizing a figure of merit (χ 2 ) which is computed using the covariance matrix. As mentioned, and as we shall show in Sect. 2, the theory contribution appears as an independent contribution to the total covariance matrix, uncorrelated with the experimental one and simply added to it. Therefore, once the covariance matrix is supplemented by an extra theory contribution coming from MHOUs, this should be treated on the same footing as any other contribution, and it will thus affect both the data replica generation, and the fitting of PDF replicas to data replicas.
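Schematically, treating the theory contribution on the same footing as the experimental one amounts to using C + S both when sampling data replicas and in the figure of merit. The following is a minimal sketch with toy numbers, not the actual NNPDF code:

```python
import numpy as np

def generate_replicas(data, C, S, n_rep, seed=0):
    """Sample pseudodata replicas from a Gaussian centred on the data,
    with total covariance C + S (experimental plus theory)."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(data, C + S, size=n_rep)

def chi2(theory, data, C, S):
    """Figure of merit computed with the total covariance matrix."""
    r = theory - data
    return r @ np.linalg.solve(C + S, r)

# Toy numbers: 3 data points, diagonal C, fully correlated theory part S
data = np.array([1.0, 2.0, 3.0])
C = np.diag([0.1, 0.1, 0.1])
S = 0.05 * np.ones((3, 3))       # correlated theory uncertainty
reps = generate_replicas(data, C, S, n_rep=1000)
```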
Qualitatively, one may expect the inclusion of the MHOU in the data replica generation to increase the spread of the data replicas, and thus lead in itself to an increase in overall PDF uncertainties. On the other hand the inclusion of the MHOU in the fitting might also reduce tensions within the fit due to the imperfection of the theory and, since these are highly correlated, result in significant shifts in central values, and overall a better fit with reduced uncertainties. The combined effect of including the MHOU in both the data generation and the fitting is thus not at all obvious.
We will investigate these effects by performing PDF determinations in which MHOUs are included in either, or both, the replica generation and the PDF replica fitting. Once again, results can be validated at NLO by comparing NLO PDFs determined with the theory covariance matrix to NNLO PDFs. A successful validation should show that the best-fit NLO PDF moves towards the central NNLO result upon inclusion of the theory covariance matrix in both replica generation and fitting, due to a relaxation of tensions in the NLO fit, and that the NNLO PDF differs from the NLO PDF by an amount which is correctly estimated by the NLO uncertainty band. As we shall see, this is indeed the case, and in fact it will turn out that often the uncertainty band does not increase or even decreases upon inclusion of the theory covariance matrix.
Having determined PDFs which now account for the MHOU associated with the processes that enter the fit, the natural questions which then arise are what their impact is and, more generally, how they should be used for precision LHC phenomenology. In order to address the first question, we will compute predictions with MHOUs for typical LHC standard candle processes, both with and without including the MHOU in the PDF, and provide a first phenomenological exploration and assessment of the impact of these uncertainties.
The second question is not entirely trivial and we will address it in detail. Indeed, scale variation is routinely performed in order to estimate the MHOU in theoretical predictions for hadron collider processes. Clearly, when obtaining a prediction, we should avoid double counting a MHOU which has already been included in the PDF. Instances in which this might happen include not only the trivial situation in which a prediction is obtained for a process which has already been used for PDF determination, but also the somewhat more subtle situation in which the MHOU in the PDF and the observable which is being predicted are correlated through perturbative evolution [13]. We will discuss this situation, and provide guidelines for the usage of PDFs with MHOUs.

This paper is broadly divided into two main parts. In the first part, we construct a general formalism for the inclusion of theory uncertainties, and specifically MHOUs, in PDF determination, and show how to construct and validate a theory covariance matrix. In the second part, we perform a first investigation of the phenomenological implications of these theory uncertainties. The structure of the paper is the following: in Sect. 2 we show, using a Bayesian approach, that under certain assumptions any type of theory uncertainty can be included as a contribution to the covariance matrix. In Sect. 3 we summarize the theory of scale variation and use it to review, compare and systematize different definitions which have been used in the literature. In Sect. 4 we then formulate a number of "point prescriptions" for the theory covariance matrix, both for a single process, and also to account for correlations between pairs of processes. In Sect. 5 we compute the theory covariance matrix for a variety of prescriptions, test them against known higher order corrections, and use this comparison to select an optimal prescription. We then move to the second, more phenomenological, part of the paper.
The centerpiece of this second part is the determination of NLO PDF sets with MHOUs, presented in Sect. 6. We first consider a fit to deep-inelastic scattering data only (DIS-only fit), and then a fit to a global dataset; in both cases the resulting PDFs are compared to PDFs without MHOUs, and validated against NNLO PDFs. For the DIS-only case, we also determine a NNLO PDF set including MHOUs. In Sect. 7 we present initial studies of the phenomenological impact of the inclusion of MHOUs in PDFs for representative LHC processes. Finally, in Sect. 8 we provide guidelines for the usage of PDFs with MHOUs, in particular concerning the combination of the PDF uncertainties with the MHOU on the hard matrix element, and present the delivery of the PDF sets produced in this work.
Two appendices contain further studies and technical details. In Appendix A we provide additional details concerning the procedure adopted to diagonalise the theory covariance matrix. In Appendix B we then study another possible validation of the results of Sect. 6, by comparing PDFs with MHOUs to PDFs obtained by adopting different choices of renormalization and factorization scales in the PDF determination. Families of fits which differ only in the choices of scale have never been carried out before, and are presented here for the first time. While they do not necessarily give a fair estimate of the MHOU on PDFs, they do provide an indication of the expected impact of scale variation on PDFs, and of the pattern of MHOU correlations.
A concise discussion of the main results of this work was presented in Ref. [14], of which this paper represents the extended companion.

A theoretical covariance matrix
Parton distribution functions are determined from a set of $N_{\rm dat}$ experimental data points, which we represent by an $N_{\rm dat}$-dimensional vector $D_i$, $i = 1, \ldots, N_{\rm dat}$. These data points have experimental uncertainties that may be correlated with each other, and this information is encoded in an experimental covariance matrix $C_{ij}$. This covariance matrix may be block-diagonal if some sets of data are uncorrelated. Each experimental data point has associated with it a "true" value $T_i$ (the value given by Nature) whose determination is the goal of the experiment. Since the experimental measurements are imperfect, they cannot determine $T$ exactly, but they can be used to estimate the Bayesian probability of a given hypothesis for $T$. Assuming that the experimental results are Gaussianly distributed about this hypothetical true value, the conditional probability for the true values $T$ given the measured cross-sections $D$ is
$$P(T|D) \propto \exp\Big(-\tfrac{1}{2} \textstyle\sum_{ij} (T_i - D_i)\,(C^{-1})_{ij}\,(T_j - D_j)\Big), \qquad (2.1)$$
up to an overall normalization constant. Note that this tacitly assumes equal priors for both $D$ and $T$. Of course the true values $T_i$ are unknown. However, we can calculate theoretical predictions for each data point, which we denote by $\bar T_i$ to distinguish them from the truth $T_i$. These predictions are computed using a theory framework which is generally incomplete: for example because it is based on the fixed-order truncation of a perturbative expansion, or because it excludes higher-twist effects, or nuclear effects, or some other effect that is difficult to calculate precisely. Furthermore, the theory predictions $\bar T_i$ depend on PDFs, evolved to a suitable scale also using incomplete theory. While the theory predictions may correspond to a variety of different observables and processes, they all depend on the same underlying (universal) PDFs.
We now assume, in the same spirit as when estimating experimental systematics, that the true values $T_i$ are Gaussianly distributed about the theory predictions $\bar T_i$, with which they would coincide if the theory were exact and the PDFs were known with certainty. The conditional probability for the true values $T$ given the theoretical predictions $\bar T$ is then
$$P(T|\bar T) \propto \exp\Big(-\tfrac{1}{2} \textstyle\sum_{ij} (T_i - \bar T_i)\,(S^{-1})_{ij}\,(T_j - \bar T_j)\Big), \qquad (2.2)$$
again up to a normalization constant, where $S_{ij}$ is a "theory covariance matrix", to be estimated in due course.
PDFs are determined by maximizing the probability of the theory given the data, $P(\bar T|D)$, marginalised over the true values $T$ which of course remain unknown. Now using Bayes' theorem,
$$P(\bar T|D)\, P(D) = P(D|\bar T)\, P(\bar T). \qquad (2.3)$$
Moreover, since the experimental data do not depend on the theorists' calculations $\bar T$, but only on the 'truth' $T$,
$$P(D|T\,\bar T) = P(D|T). \qquad (2.4)$$
Then, because by construction $\int d^N T\, P(T|\bar T\, D) = 1$,
$$P(D|\bar T) = \int d^N T\, P(D|T\,\bar T)\, P(T|\bar T) = \int d^N T\, P(D|T)\, P(T|\bar T), \qquad (2.5)$$
where the $N$-dimensional integral is over all of the possible values of $T_i$. The probability of the experimental data $D$ is now conditional on the theory $\bar T$ because we have marginalised over the underlying 'truth' $T$, which is common to both. Writing the difference between the true values $T_i$ and the theory predictions $\bar T_i$ as
$$\Delta_i = T_i - \bar T_i, \qquad (2.6)$$
the Gaussian integrals can now be performed explicitly. Adopting a vector notation in order to make the algebra more transparent, we rewrite the exponent of the integrand in Eq. (2.5) as
$$(\bar T + \Delta - D)^T C^{-1} (\bar T + \Delta - D) + \Delta^T S^{-1} \Delta = \Delta^T (C^{-1} + S^{-1}) \Delta + 2\, \Delta^T C^{-1} (\bar T - D) + (\bar T - D)^T C^{-1} (\bar T - D) \qquad (2.7)$$
$$= (\Delta - \Delta_0)^T (C^{-1} + S^{-1}) (\Delta - \Delta_0) + (\bar T - D)^T (C + S)^{-1} (\bar T - D), \qquad (2.8)$$
with
$$\Delta_0 = -(C^{-1} + S^{-1})^{-1} C^{-1} (\bar T - D), \qquad (2.9)$$
where we used the fact that both $C$ and $S$ are symmetric matrices, and in the last line we completed the square using the identity $(C+S)^{-1} = C^{-1} - C^{-1}(C^{-1}+S^{-1})^{-1}C^{-1}$. Integrating over $\Delta$, ignoring the normalization, Eq. (2.5) then becomes
$$P(D|\bar T) \propto \exp\Big(-\tfrac{1}{2} (\bar T - D)^T (C + S)^{-1} (\bar T - D)\Big), \qquad (2.10)$$
so that
$$\chi^2 = (\bar T - D)^T (C + S)^{-1} (\bar T - D). \qquad (2.11)$$
Restoring the indices, we thus find the simple result
$$\chi^2 = \textstyle\sum_{ij} (\bar T_i - D_i)\, [(C+S)^{-1}]_{ij}\, (\bar T_j - D_j). \qquad (2.12)$$
Comparison of Eq. (2.12) with Eq. (2.1) indicates that when replacing the true values $T_i$ by the theoretical predictions $\bar T_i$ in the expression of the $\chi^2$ of the data, the theoretical covariance matrix $S_{ij}$ should simply be added to the experimental covariance matrix $C_{ij}$ [9]. In effect this implies that, at least within this Gaussian approximation, when determining PDFs theoretical uncertainties can be treated simply as another form of experimental systematic: an additional uncertainty to be taken into account when trying to find the truth from the data on the basis of a specific theoretical prediction. The experimental and theoretical uncertainties are added in quadrature because they are in principle uncorrelated.
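The completion of the square relies on the matrix identity $(C+S)^{-1} = C^{-1} - C^{-1}(C^{-1}+S^{-1})^{-1}C^{-1}$, valid for symmetric positive-definite $C$ and $S$. A quick numerical sanity check with random positive-definite matrices (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_spd(n):
    """Random symmetric positive-definite matrix (plays the role of C or S)."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

C = random_spd(4)   # "experimental" covariance
S = random_spd(4)   # "theory" covariance

Ci = np.linalg.inv(C)
Si = np.linalg.inv(S)
# Completing the square in the Gaussian integral over the shift Delta yields:
lhs = np.linalg.inv(C + S)
rhs = Ci - Ci @ np.linalg.inv(Ci + Si) @ Ci
```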
In the case for which theoretical uncertainties can be neglected, i.e. if $S_{ij} \to 0$, $P(T|\bar T)$ in Eq. (2.2) becomes proportional to $\delta^N(T_i - \bar T_i)$. As a result, in this case Eq. (2.12) reduces to Eq. (2.1) with the true values $T_i$ replaced by the predictions $\bar T_i$. This shows that Eq. (2.12) remains true even if $S_{ij}$ has zero eigenvalues and is thus not invertible. Note however that by construction $C_{ij}$ is positive definite, since any experimental measurement always has uncorrelated statistical uncertainties due to the finite number of events, so $(C+S)_{ij}$ will always be invertible.
The question remains of how to estimate the theory covariance matrix $S_{ij}$. The Gaussian hypothesis Eq. (2.2) implies that
$$S_{ij} = \langle \Delta_i \Delta_j \rangle, \qquad (2.13)$$
where the average is taken over the true theory values $T$ using the probability distribution $P(T|\bar T)$, and $\langle \Delta_i \rangle = 0$, consistent with the assumption that the probability distribution of the truth $T$ is centred on the theoretical calculation $\bar T$. In practice, however, the formal definition Eq. (2.13) is not very helpful: we need some way to estimate the shifts $\Delta_i$ ('nuisance parameters', in the language of systematic error determination) in a way that takes into account the theoretical correlations between different kinematic points within the same dataset, between different datasets measuring the same physical process, and between datasets corresponding to different processes (with initial state hadrons). Note that theory correlations will always be present even for entirely different processes, through the universal parton distributions: the only processes with truly independent theoretical uncertainties are those with only leptons in the initial state, which are of course irrelevant for PDF determination. The most commonly used method of estimating the theory corrections due to MHOUs, which can naturally incorporate all these theoretical correlations, is scale variation. This method is reviewed in Sect. 3 in general terms and then used in Sect. 4 in order to formulate specific prescriptions for constructing the theory covariance matrix $S_{ij}$. Other approaches which have been discussed in the literature involve estimating MHOUs based on the behaviour of the known perturbative orders [10][11][12]; however, at least at present, these do not appear to provide a formalism which is sufficiently well-established and of appropriately general applicability. We emphasize, however, that the formalism presented in this section is independent of the specific method adopted to estimate the correlated theory shifts $\Delta_i$ that enter Eq. (2.13).
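Once a set of correlated shift vectors has been estimated (for instance from scale variations), the theory covariance matrix is obtained from the average of their outer products. A minimal sketch, with a hypothetical uniform normalization 1/N; the actual normalization depends on the chosen prescription:

```python
import numpy as np

def theory_covmat(shifts):
    """Build S_ij = (1/N) sum_k Delta_i^(k) Delta_j^(k) from a list of
    shift vectors; the 1/N normalization is illustrative only."""
    shifts = np.asarray(shifts, dtype=float)  # shape (N_variations, N_dat)
    N = shifts.shape[0]
    return shifts.T @ shifts / N

# Two toy shift vectors over 3 data points
shifts = [[0.1, 0.2, 0.0],
          [0.0, -0.1, 0.1]]
S = theory_covmat(shifts)
```

By construction such a matrix is symmetric and positive semi-definite, with rank at most equal to the number of independent variations, which is why (C + S) rather than S alone must be inverted in the figure of merit.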

MHOUs from scale variations
The variation of the renormalization and factorization scales is the most popular approach for estimating missing higher order uncertainties (MHOUs) in QCD perturbative calculations. It has a number of advantages: it naturally incorporates renormalization group (RG) invariance, thereby ensuring that estimates of the MHOU decrease as the perturbative order increases; the same procedure can be used for any perturbative process, since the scale dependence of the strong coupling α_s(µ²) and of the PDFs is universal; the estimates of the MHOU it produces are smooth functions of the kinematics, and thereby correctly incorporate the strong correlations in nearby regions of phase space; and correlations between different processes due to universal ingredients such as PDFs can be easily incorporated. Its drawbacks are also well known: there is no unique principle to determine the specific range of the scale variation (nor even the precise central scale to be adopted), and it misses uncertainties associated with new singularities or color structures which are present at higher orders but absent at lower orders. The former problem may be dealt with, at least qualitatively, by validating a given range in situations where the next-order corrections are known; we will attempt such a validation in this paper. The latter problem is more challenging, requiring resummation in the case of unresummed logarithms, or other methods of estimating new types of corrections, and it is unclear whether it admits a general solution.
While scale variation has been discussed many times in a variety of contexts, there is no standard, commonly accepted formulation of it, and specifically none that can be applied to both electroproduction and hadroproduction processes, as we need to do if we wish to use scale variation in the context of global PDF analyses. In fact, it turns out that the most commonly adopted approaches to scale variation differ, typically according to the nature of the process which is being considered, though also as a function of time, with different prescriptions being favored in the past than those in common use at the present. Moreover, even the terminology is not uniform: it has evolved over time, resulting in the same names being used for what are essentially different scale variations.
To formulate prescriptions for the general use of scale variation for MHOU estimation which can be applied to any process included in present or future PDF determinations, it is thus necessary to first review the underpinnings of scale variation, and to then use them in order to set up a generally applicable formalism. This will be done in the current section, by specifically discussing the cases of electroproduction and hadroproduction. In particular, we will show that for factorized processes MHOUs on the partonic cross-sections and on perturbative evolution are independent and can be estimated through independent scale variations. We will then discuss how they can be combined, first with a single process and then for several processes, both correlated and uncorrelated.

Renormalization group invariance
The basic principle of scale variation is the observation that the scale-dependent contributions to a perturbative prediction are fixed by RG invariance. Scale variation can therefore be used to generate higher order contributions, which are then taken as a proxy for the full set of missing higher order terms.
More explicitly, consider a generic theoretical prediction (typically a perturbative cross-section) of the form $T(\alpha_s(\mu^2), \mu^2/Q^2)$, where $\mu^2$ is the renormalization scale and $Q^2$ is some physical scale of the process. Thus $T$ indicates the theory prediction evaluated at the renormalization scale $\mu^2$ instead of the physical scale $Q^2$; if we instead set $\mu^2 = Q^2$ we obtain the central prediction
$$\bar T(Q^2) = T(\alpha_s(Q^2), 1). \qquad (3.1)$$
The QCD running coupling $\alpha_s(\mu^2)$ satisfies the RG equation
$$\mu^2 \frac{d}{d\mu^2}\, \alpha_s(\mu^2) = \beta(\alpha_s(\mu^2)), \qquad (3.2)$$
where the QCD beta function has the perturbative expansion
$$\beta(\alpha_s) = -\beta_0\, \alpha_s^2 \left(1 + O(\alpha_s)\right). \qquad (3.3)$$
RG invariance is the statement that the all-order prediction is independent of the renormalization scale:
$$\mu^2 \frac{d}{d\mu^2}\, T(\alpha_s(\mu^2), \mu^2/Q^2) = 0. \qquad (3.4)$$
It will be useful in what follows to define the variables
$$t = \ln(Q^2/\Lambda^2), \qquad \kappa = \ln(\mu^2/Q^2), \qquad (3.5)$$
so that $\alpha_s(\mu^2)$ is a function of $\ln(\mu^2/\Lambda^2) = t + \kappa$. We can then write the RG equation (3.4) as
$$0 = \frac{d}{d\kappa}\, T(\alpha_s(t+\kappa), \kappa) = \left(\beta(\alpha_s)\frac{\partial}{\partial\alpha_s} + \frac{\partial}{\partial\kappa}\right) T(\alpha_s, \kappa) = \left(\frac{\partial}{\partial t} + \frac{\partial}{\partial\kappa}\right) T(\alpha_s, \kappa), \qquad (3.6)$$
where in the second step we assume that $T$ is analytic in $\alpha_s$ and $\kappa$, and in the third we use the fact that the entire $t$ dependence is through $\alpha_s(t+\kappa)$, so that $\frac{\partial}{\partial t} T = \beta(\alpha_s)\frac{\partial}{\partial\alpha_s} T$. Taylor expanding $T(\alpha_s, \kappa)$ in $\kappa$ about $\kappa = 0$ (i.e. $\mu^2 = Q^2$) at fixed coupling $\alpha_s$,
$$T(\alpha_s, \kappa) = T(\alpha_s, 0) + \kappa\,\frac{\partial T}{\partial\kappa}\Big|_{\kappa=0} + \tfrac{1}{2}\kappa^2\,\frac{\partial^2 T}{\partial\kappa^2}\Big|_{\kappa=0} + \ldots = T(\alpha_s, 0) - \kappa\,\frac{\partial T}{\partial t}\Big|_{\kappa=0} + \tfrac{1}{2}\kappa^2\,\frac{\partial^2 T}{\partial t^2}\Big|_{\kappa=0} - \ldots, \qquad (3.8)$$
where in the second expression we use the RG invariance condition, Eq. (3.6), to replace $\frac{\partial}{\partial\kappa}$ with $-\frac{\partial}{\partial t}$. We can thus determine the $\kappa$ dependence of $T(\alpha_s, \kappa)$ using the dependence of $\bar T(t) = T(\alpha_s(t), 0)$ on $t$:
$$T(t, \kappa) = \bar T(t+\kappa) - \kappa\,\bar T'(t+\kappa) + \tfrac{1}{2}\kappa^2\,\bar T''(t+\kappa) - \ldots, \qquad (3.9)$$
where primes denote derivatives with respect to $t$. Since $\frac{d}{dt}\alpha_s(t) = \beta(\alpha_s)$ and $\beta(\alpha_s) = O(\alpha_s^2)$, derivatives with respect to $t$ always add one power of $\alpha_s$. It follows that in Eq. (3.9) the $O(\kappa)$ term is $O(\alpha_s)$ with respect to the leading term, the $O(\kappa^2)$ term is $O(\alpha_s^2)$ with respect to the leading term, and so on. We thus see explicitly that the scale-dependent terms (those that depend on $\kappa$), at a given order in perturbation theory, are determined by derivatives of the cross-section lower down the perturbation series.
This implies that if we know the cross-section $\bar T(t)$ as a function of the central scale $Q^2$ to a given order in perturbation theory, we can then use Eq. (3.9) to determine the scale-dependent $\kappa$ terms directly from $\bar T(t)$ at any given order, by differentiating terms lower down the perturbative expansion. Denoting by $\bar T^{(k)}(t)$ the perturbative expansion of $\bar T(t)$ truncated at LO ($k=0$), NLO ($k=1$) or NNLO ($k=2$), one has
$$T^{(0)}(t,\kappa) = \bar T^{(0)}(t+\kappa),$$
$$T^{(1)}(t,\kappa) = \bar T^{(1)}(t+\kappa) - \kappa\,\frac{d}{dt}\bar T^{(0)}(t+\kappa),$$
$$T^{(2)}(t,\kappa) = \bar T^{(2)}(t+\kappa) - \kappa\,\frac{d}{dt}\bar T^{(1)}(t+\kappa) + \tfrac{1}{2}\kappa^2\,\frac{d^2}{dt^2}\bar T^{(0)}(t+\kappa). \qquad (3.11)$$
The differentiation may be performed analytically, which is trivial for a fixed-order expansion, or numerically, which can be useful in a resummed expression where the dependence on $\alpha_s(t)$ can be nontrivial [15]. Note that when the renormalization scale coincides with the physical scale of the process, $\mu^2 = Q^2$, then $\kappa = 0$ and $T = \bar T$ at every order in the perturbative expansion. The MHOU can now be estimated as the difference between the scale-varied cross-section and the cross-section evaluated at the central scale, namely
$$\Delta(t,\kappa) = T(t,\kappa) - \bar T(t). \qquad (3.12)$$
Thus at LO, NLO and NNLO we have, using Eq. (3.11), that the theory nuisance parameters are given by
$$\Delta^{(0)}(t,\kappa) = \bar T^{(0)}(t+\kappa) - \bar T^{(0)}(t),$$
$$\Delta^{(1)}(t,\kappa) = \bar T^{(1)}(t+\kappa) - \bar T^{(1)}(t) - \kappa\,\frac{d}{dt}\bar T^{(0)}(t+\kappa),$$
$$\Delta^{(2)}(t,\kappa) = \bar T^{(2)}(t+\kappa) - \bar T^{(2)}(t) - \kappa\,\frac{d}{dt}\bar T^{(1)}(t+\kappa) + \tfrac{1}{2}\kappa^2\,\frac{d^2}{dt^2}\bar T^{(0)}(t+\kappa). \qquad (3.13)$$
One finds that while at LO the theory uncertainty is entirely due to the scale chosen for $\alpha_s$, at NLO the dependence on scale is milder, since the leading dependence is subtracted off by the $O(\kappa)$ term. At NNLO it is milder still, since the $O(\kappa)$ term subtracts the leading dependence in the first term, and the $O(\kappa^2)$ term removes the subleading dependence in the first two terms. RG invariance then guarantees that the terms generated by scale variation are always subleading, so if the perturbation series is well behaved, the theory shifts $\Delta$ become smaller and smaller as the order of the expansion is increased.
Clearly the size of the MHOU, estimated in this way, will depend on the size of the scale variation, and thus on the value chosen for κ. Typically one varies the renormalization scale by a factor of two in each direction, i.e. κ ∈ [− ln 4, ln 4], since this range is empirically found to yield sensible results for many processes. However, in principle, one should treat κ as a free parameter, whose magnitude needs to be validated whenever possible by comparing to known higher order results.
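The decrease of the shifts $\Delta$ with perturbative order can be made concrete in a toy model with a one-loop running coupling $a(t) = 1/(\beta_0 t)$; the values $\beta_0 = 1$ and the NLO coefficient $c_1 = 1$ below are purely illustrative:

```python
import math

beta0, c1 = 1.0, 1.0
a = lambda t: 1.0 / (beta0 * t)          # one-loop running coupling
da = lambda t: -beta0 * a(t) ** 2        # da/dt = beta(a) at one loop

def delta_LO(t, kappa):
    # LO shift: entirely due to the scale at which alpha_s is evaluated
    return a(t + kappa) - a(t)

def delta_NLO(t, kappa):
    T = lambda s: a(s) + c1 * a(s) ** 2  # toy NLO "cross-section"
    # the O(kappa) subtraction removes the leading scale dependence
    return T(t + kappa) - T(t) - kappa * da(t + kappa)

t, kappa = 30.0, math.log(4.0)           # i.e. mu = 2Q at small coupling
```

With these (arbitrary) settings the NLO shift is roughly an order of magnitude smaller than the LO one, illustrating the suppression by one power of the coupling.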
In the present work, we are specifically interested in the application of this method to processes with one or more hadrons in the initial state, i.e. to cross-sections factorized into a hard cross-section convoluted with a PDF or a parton luminosity. There are then two independent sources of MHOU: the perturbative expansion of the hard partonic cross-section, and the perturbative expansion of the anomalous dimensions that determine the perturbative evolution of the parton distributions. It is convenient to obtain each of these from an independent scale variation, and this can be done by writing separate RG equations for the hard cross-section and for the PDF, as we will demonstrate below. This approach is completely equivalent to the perhaps more familiar point of view in which MHOUs on perturbative evolution are instead obtained by varying the scale at which the PDF is evaluated in the factorized expression, as we will also show.
We will begin by considering the MHOU in the hard-scattering partonic cross-sections; we will then turn to a discussion of MHOUs in the PDF evolution, and show that the latter can be obtained by several equivalent procedures. We will then discuss how both scale variations can be obtained from double scale variation of the hard cross-section, and how this fact also offers the possibility of performing scale variation in alternative ways whereby these two sources of MHOU are mixed. We will discuss these for completeness, since in the past scale variations were often performed in this way. Finally, we will address scale variations and their correlations when several processes are considered at once.

Scale variation for partonic cross-sections
We start by considering scale variation in hard-scattering partonic cross-sections, first in the case of electroproduction (that is, for lepton-proton deep-inelastic scattering, DIS), and then for the case of hadroproduction (proton-proton or proton-antiproton collisions).

Electroproduction
Consider first an electroproduction process, such as DIS, with an associated structure function given by F (Q 2 ) = C(α s (Q 2 )) ⊗ f (Q 2 ) , (3.14) where ⊗ is the convolution in the momentum fraction x between the perturbative coefficient function C(x, α s ) and the PDF f (x, Q 2 ), and where the sum over parton flavors is left implicit. In Eq. (3.14) both α s and the PDF are evaluated at the physical scale of the process, so nothing depends on unphysical renormalization or factorization scales. We can determine the MHOU associated with the structure function F due to the truncation of the perturbative expansion of the coefficient function by fixing the factorization scheme and keeping fixed the scale at which the PDF is evaluated (usually referred to as the factorization scale), while varying the renormalization scale used in the computation of the coefficient function itself. The scale-dependent structure function F will then be given by F (Q 2 , µ 2 ) = C(α s (µ 2 ), µ 2 /Q 2 ) ⊗ f (Q 2 ) , (3.15) where µ 2 is the renormalization scale used in the computation of the coefficient function, or equivalently by F (t, κ) = C(α s (t + κ), κ) ⊗ f (t) , (3.16) where as in Eq. (3.5) we are using the notation t = ln Q 2 /Λ 2 and κ = ln µ 2 /Q 2 . Note that in Eq. (3.15) the structure function is written as a function of µ 2 in the sense of the RG equation (3.4): the dependence on µ 2 cancels order by order, and the residual dependence can be used to estimate the MHOU.
In phenomenological applications, it is more customary to write F (Q 2 ), i.e. not to indicate the dependence of F on µ 2 , thereby emphasizing the renormalization scale independence of the physical observable, and to indicate only the scale dependence of the hard coefficient function C(α s (µ 2 ), µ 2 /Q 2 ). Here and in the sequel we will stick to the notation used in RG equations, since we wish to emphasize that, as the scale is varied, we are dealing with a one-parameter family of theory predictions for the physical (RG invariant) observable, all of which coincide to the accuracy at which they are calculated but differ by higher order terms. Now, the RG invariance of physical cross-sections, and therefore of the structure function F , requires RG invariance of the coefficient function: because we are not varying the factorization scheme, the PDF is independent of the renormalization scale µ. It follows that the scale-varied coefficient function satisfies the analogue of Eq. (3.11), with C(t) = C(α s (t), 0) the coefficient function evaluated at µ 2 = Q 2 , i.e. at κ = 0. Then, given the perturbative expansion of the coefficient function, its derivatives can be easily evaluated using the beta function expansion Eq. (3.3),
and we find that the renormalization scale variation of the coefficient function is given by Eq. (3.20). Again, note that in the case where µ 2 = Q 2 , and so κ = 0, one recovers the standard perturbative expansion Eq. (3.18). We can now find the scale-dependent structure function, Eq. (3.21). Note that evaluating these expressions is numerically very straightforward, in that the scale-varied expression Eq. (3.21) has the same form, involving the same convolutions of c i with f , as the convolution with the PDFs to the given order at the central scale, Eqs. (3.14) and (3.18), only with rescaled coefficients. This means there is no need to recompute NNLO corrections, K-factors, etc.: all that is necessary is to change the coefficients in the perturbative expansion at the central scale according to Eq. (3.21).
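In code, this observation means that a scale-varied prediction can be assembled from the central-scale convolutions alone. The sketch below is our own illustration (the numerical values of the convolutions K k = c k ⊗ f are hypothetical, and one-loop running is assumed) of the NNLO rescaling for a DIS-like series starting at O(α s 0 ):

```python
import math

BETA0 = 0.61  # one-loop beta coefficient (toy value)

def alpha_s(t):
    """One-loop running coupling, t = ln(Q^2/Lambda^2)."""
    return 1.0 / (BETA0 * t)

def f_varied(K, t, kappa):
    """NNLO structure function with the renormalization scale varied in
    the coefficient function only, in the spirit of Eq. (3.21).
    K = [K0, K1, K2] are precomputed central-scale convolutions
    K_k = c_k (x) f: no new convolution (or K-factor) is ever needed
    when kappa changes, only the coefficients are rescaled."""
    a = alpha_s(t + kappa)
    return K[0] + a * K[1] + a**2 * (K[2] + kappa * BETA0 * K[1])

K = [1.0, 0.5, 0.25]           # hypothetical convolutions for one data point
t = 1.0 / (BETA0 * 0.118)      # alpha_s(t) = 0.118

central = f_varied(K, t, 0.0)
up      = f_varied(K, t,  math.log(4))
down    = f_varied(K, t, -math.log(4))
# kappa = 0 recovers the central prediction; the up/down spread is the MHOU
assert up != central != down
```

At κ = 0 the central prediction is recovered exactly; varying κ reuses the same three numbers K k, so no convolution is recomputed.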

Hadronic processes
MHOUs in the partonic hard cross-sections of hadronic processes can be computed in the same way as for DIS. The only additional complication is that the physical observable (typically, a cross-section Σ) now depends on the convolution of two PDFs: Σ(t) = H(t) ⊗ L (t) , (3.22) where again the physical scale is t = ln(Q 2 /Λ 2 ), H(t) is the partonic hard-scattering cross-section, the PDFs are convoluted together into a parton luminosity L = f ⊗ f , and the sum over parton flavors is left implicit. Then, varying the renormalization scale κ = ln µ 2 /Q 2 in the hard cross-section, we have Σ(t, κ) = H(t + κ, κ) ⊗ L (t) . (3.23) Here, just as for electroproduction, since the PDFs are evaluated at a fixed scale, RG invariance tells us that the scale-varied hard cross-section is given in terms of H(t) by Eq. (3.9). If the partonic process begins at O(α n s ), with n = 0, 1, 2, . . ., then one can expand the hard cross-section as in Eq. (3.25). Then, as in the case of electroproduction, using Eq. (3.3) we can readily evaluate the derivatives, so that, putting everything together, the scale-varied partonic cross-section to be used to evaluate the scale-varied hadronic cross-section Σ, Eq. (3.23), is given by Eq. (3.27). This is rather more involved than Eq. (3.21), but shares the same advantages: the convolutions to be evaluated in Eq. (3.23) have the same structure as those in Eq. (3.22), so all that is required to vary the renormalization scale is to modify their coefficients.
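The rescaled coefficients for a process starting at O(α s n ) can be written down once and for all. The helper below is our own sketch of this bookkeeping (two-loop truncation, with the β 1 term included; for n = 0 it reduces to the electroproduction case above):

```python
def rescaled_coeffs(h, n, kappa, beta0, beta1=0.0):
    """Scale-varied coefficients hbar_k for a hard cross-section
       H = alpha_s^n * (h0 + alpha_s * h1 + alpha_s^2 * h2),
    obtained by re-expanding alpha_s(t) in powers of alpha_s(t + kappa)
    using the two-loop beta function.  This is our own transcription of
    the standard result; for n = 0 it reduces to the DIS case."""
    h0, h1, h2 = h
    hbar0 = h0
    hbar1 = h1 + n * kappa * beta0 * h0
    hbar2 = (h2
             + (n + 1) * kappa * beta0 * h1
             + (n * kappa * beta1
                + 0.5 * n * (n + 1) * kappa**2 * beta0**2) * h0)
    return [hbar0, hbar1, hbar2]

# n = 0 (DIS-like): only the NNLO coefficient shifts, by kappa*beta0*h1
assert rescaled_coeffs([1.0, 1.0, 1.0], 0, 0.5, 0.7) == [1.0, 1.0, 1.0 + 0.5 * 0.7]
# kappa = 0: the central coefficients are recovered for any n
assert rescaled_coeffs([1.0, 2.0, 3.0], 2, 0.0, 0.7, 0.4) == [1.0, 2.0, 3.0]
```

The hadronic convolutions of Eq. (3.23) then reuse the central-scale ingredients with these coefficients, exactly as in the electroproduction case.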

Scale variation for PDF evolution
The renormalization scale variation described in the previous section can be used to estimate the MHOU in any partonic cross-section of an electroproduction or hadroproduction process evaluated to a fixed order in perturbation theory. However, when computing factorized observables of the form Eqs. (3.14, 3.22), an entirely independent source of MHOU arises from the truncation of the perturbative expansion of the splitting functions (or anomalous dimensions in Mellin space) that govern the PDF evolution equations. We now show that this MHOU can again be estimated by scale variation; we will also show that this scale variation can be performed in different ways: either at the level of the anomalous dimension; or at the level of the PDFs themselves; or finally at the level of the hard-scattering partonic coefficient functions, by exploiting the fact that physical results cannot depend on the scale at which the PDF is evaluated, and so one may trade the effect of scale variation between the PDF and the hard coefficient function.
Consider a PDF f (µ 2 ), where µ is the scale at which the PDF is evaluated. For simplicity, in this section the whole argument is presented assuming a Mellin space formalism, so that convolutions are replaced by ordinary products. Also, indices labeling different PDFs are left implicit, so our argument applies directly to the nonsinglet case but can be straightforwardly generalized to singlet evolution and to other flavor combinations.
The scale dependence of f (µ 2 ) is fixed by the evolution equation, Eq. (3.28), which applies also to the general singlet case if a sum over parton flavors is left implicit. The anomalous dimension admits a perturbative expansion of the form of Eq. (3.29). Eq. (3.28) can be integrated to give Eq. (3.30), where f 0 indicates the PDF at the initial scale µ 0 . Of course, the left-hand side of the equation is independent of this initial scale µ 0 , so the dependence can be left implicit also on the right-hand side, by not specifying the lower limit on the integral. In practice, if the PDF f 0 were extracted from data, any change in this scale would be entirely reabsorbed by the fitting procedure. We now observe the well-known fact that the anomalous dimension in Eq. (3.28) is an RG invariant quantity, and therefore the scale on which it depends is physical. However, this physical scale can in general be different from the renormalization scale used to determine the anomalous dimension itself (e.g. if it were determined through the renormalization of a twist-two operator). We let µ 2 = kQ 2 , where as in the general argument of Sect. 3.1, µ 2 is an arbitrary renormalization scale and Q 2 is a physical scale. We can make γ independent of the renormalization scale order by order in perturbation theory if we define its scale-varied counterpart in the same way as before, with κ given by Eq. (3.5) and γ(t) = γ(α s (t), 0); given the perturbative expansion Eq. (3.29), one then obtains Eq. (3.32), which is independent of κ up to higher order terms, order by order. Note that Eq. (3.32) has the same form as Eqs. (3.25-3.27) (with n = 1). We have shown that variation of the scale on which the anomalous dimension depends can be used, in the usual way, to generate higher order terms which estimate MHOUs in the expansion of the anomalous dimension itself. We now show how the same result can be obtained by scale variation at the PDF level. Inserting the result Eq.
(3.32) in the solution of the evolution equations for the PDFs, Eq. (3.30), one finds that the evolution factor can be expressed as in Eq. (3.33), where in the first line we changed integration variable (ignoring any change in the lower limit of integration), in the second we used Eq. (3.31), and in the third we expanded the exponential perturbatively. We can now use this result to determine renormalization scale variation in the evolution directly from the scale dependence of the PDF, as in Ref. [15]. Defining a scale-varied PDF as in Eq. (3.34), that is, as the PDF obtained by varying the renormalization scale in the anomalous dimension, so that f (t) = f (α s (t), 0), and using Eq. (3.33), we find Eq. (3.35), provided only that any variation of the initial scale µ 0 due to changes in κ has been reabsorbed into the initial PDF f 0 . Eq. (3.35) is the same as the result obtained from varying the scale µ 2 at which the PDF is evaluated about the physical scale Q 2 : just as in the derivation of Eq. (3.24), this gives Eq. (3.36), and the result is precisely the same. This is essentially because the PDF f (t) depends on only a single scale. Equation (3.35) indicates that the κ dependence can be factorized out of the PDF. We can use this property to factor it into the hard-scattering coefficient function. Consider for example electroproduction, whose factorized structure function is given by Eq. (3.14): the result is Eq. (3.37), where in the second line we used the expansion Eq. (3.35), and the third line should be viewed as the definition of the scale-varied coefficient function C(t + κ, κ). Moreover, given the relation between the two, and using the perturbative expansions of the beta function β, the anomalous dimension γ, and the coefficient function C, one arrives at the scale-varied coefficient function C(t, κ) of Eq. (3.39).
Note that this result for C(t, κ) is not the same as C(t + κ, κ), Eq. (3.20). The reason is that C(t + κ, κ) is obtained from the variation of the renormalization scale of the hard coefficient function, and can be used to estimate the MHOU in the perturbative expansion of the coefficient function, while C(t, κ) is obtained from the variation of the renormalization scale of the anomalous dimension, and can be used to estimate the MHOU in the perturbative evolution of the PDF. We have obtained the former from RG invariance of the hard cross-section, and the latter from RG invariance of the anomalous dimension. However, Eq. (3.37) can be equivalently viewed as expressing the fact that the physically observable structure function cannot depend on the scale at which the PDF is evaluated in the factorized expression, usually referred to as factorization scale: provided we absorb changes in the initial scale in the initial PDF, varying the scale of the anomalous dimension is identical to varying the scale of the PDF.
It is customary to refer to the scale variation which estimates the MHOU in the coefficient function as renormalization scale variation: this corresponds to evaluating C(t + κ, κ) in Eq. (3.20). The scale variation which estimates the MHOU in the anomalous dimension, and corresponds to C(t, κ) in Eq. (3.39), is instead usually called factorization scale variation. This terminology is used for example by the Higgs Cross-Section Working Group [16] and more generally within the context of LHC physics; in the older DIS literature the same terminology has a somewhat different meaning, as we shall discuss in Sect. 3.4 below.
The previous discussion entails that in practice there are (at least) three different ways of estimating the MHOU associated with PDF evolution, in terms of the anomalous dimension at fixed order in perturbation theory, by means of scale variations: (A) The renormalization scale of the anomalous dimension can be varied directly, using Eq. (3.32).
This approach works well provided that the initial PDF f 0 is refitted; if it is held fixed, care must be taken to absorb scale variations of the initial scale into the initial PDF. This method was used for DIS renormalization scale variations in many older papers, see e.g. Refs. [17-19]. It has the disadvantage that it requires refitting the PDF as the scale is varied, which is cumbersome for most applications.
(B) The scale at which the PDF is evaluated can be varied, either analytically or numerically, using Eq. (3.36). This is in many ways the simplest method: the initial PDF remains unchanged, and since only the PDF is involved, the result is manifestly universal. Furthermore, it is easily adapted to a variable flavor number scheme (VFNS), since the MHOUs in the PDFs with different numbers of active flavors can each be estimated separately. The numerical method was employed in Ref. [15], in the context of small-x resummation. It has the disadvantage that if one wishes to estimate the impact on a given physical observable, one needs to first generate the scale-varied PDF before combining it with the hard coefficient function.
(C) The scale at which the PDF is evaluated is varied, but the compensating scale-dependent terms are factorized into the coefficient function using for example Eq. (3.39). This factorization scale variation is most commonly used when evaluating a new process using an established PDF set, e.g. in studies of LHC processes (as in Ref. [16]) since it has the advantage that it can be implemented directly using an external interpolated PDF set (such as provided by LHAPDF [20]). It has the conceptual disadvantage that the universality of the variation is obscured, since the scale dependent terms are mixed in the expansion of the coefficient function (this is particularly complicated in a VFNS, where the coefficient functions also depend on heavy quark masses), and the practical disadvantage that it requires the evaluation of new contributions to the coefficient function involving additional convolutions. Also, it can be impractical in situations where higher order corrections are difficult to evaluate precisely due to numerical issues.
Note that whereas these methods are in principle completely equivalent, they can differ by subleading terms according to the convention used to truncate the perturbation expansion. Indeed, in method (A) the expansion of the anomalous dimension is truncated, but higher order terms in the exponentiation may be retained depending on the form of the solution to the evolution equations adopted; in method (B) the exponential has been expanded (see Eq. (3.33)) so the result is the same as would be obtained with a linearized solution of the evolution equation; while in method (C) cross-terms between the expansion of linearized evolution and coefficient function expansion have also been dropped (compare Eq. (3.37) with Eq. (3.39)). However, since the differences always involve higher order terms, each method can be regarded as giving an equally valid estimate of the MHOU in the perturbative evolution: differences between methods should be viewed as the uncertainty on the MHOU itself when estimated by scale variation.
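The claim that the compensated variation is genuinely subleading can be checked numerically along the lines of method (B). In the sketch below (a toy one-loop nonsinglet Mellin moment with assumed values γ 0 = −0.5 and β 0 = 0.61; not the actual implementation used in the fit), the naive shift f (t + κ) − f (t) is O(α s ), while subtracting the known O(α s ) term leaves an O(α s 2 ) remainder:

```python
import math

BETA0, GAMMA0 = 0.61, -0.5   # toy one-loop values for a single Mellin moment

def alpha_s(t):
    """One-loop running coupling, t = ln(Q^2/Lambda^2)."""
    return 1.0 / (BETA0 * t)

def pdf(t):
    """Exact one-loop solution of d ln f / dt = GAMMA0 * alpha_s(t):
       f(t) = t ** (GAMMA0 / BETA0), normalization dropped."""
    return t ** (GAMMA0 / BETA0)

t, kappa = 14.0, math.log(4)

# naive variation of the scale at which the PDF is evaluated: O(alpha_s)
naive = pdf(t + kappa) - pdf(t)
# compensated variation: subtract the known O(alpha_s) evolution term,
# leaving a genuinely subleading O(alpha_s^2) shift
compensated = pdf(t + kappa) * (1.0 - kappa * GAMMA0 * alpha_s(t + kappa)) - pdf(t)

assert abs(compensated) < abs(naive)
```

At t ≈ 14 (α s ≈ 0.12) the compensated shift comes out roughly an order of magnitude smaller than the naive one, as expected for a term suppressed by one extra power of α s .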

Double scale variations
We now discuss the combination of the two independent scale variations of Sects. 3.2 and 3.3, respectively estimating MHOUs in the hard cross-section and in perturbative evolution, thereby deriving master formulae for scale variation up to NNLO which will then be used in the subsequent sections. For completeness, we will also discuss different options for scale variation which have been considered in the literature, and clarify some terminological mismatches, especially between the older studies of DIS and the more recent applications to LHC processes.

Electroproduction
Consider first the more general factorization of an electroproduction cross-section, such as a DIS structure function, F (Q 2 , µ 2 f , µ 2 r ), Eq. (3.40), where here and in the following we adopt the (standard) terminology introduced in Sect. 3.3, and the viewpoint which corresponds to option (B) of that section: µ r denotes the renormalization scale, whose dependence is entirely contained in the hard coefficient function C (as in Eq. (3.15)), and whose variation estimates MHOUs in its expansion; while µ f denotes the factorization scale, whose dependence is entirely contained in the PDF (as in Eq. (3.34)), and whose variation estimates MHOUs in the expansion of the anomalous dimension (or equivalently of the splitting functions). In the following, as in Sect. 3.3, we will omit the convolution as well as the parton indices. Note that again, as in Eq. (3.15), and then in Eqs. (3.23), (3.31), and (3.36), the dependence on the scales µ f and µ r should be understood in the sense of the RG equation: the structure function does not depend on them, but as the scales are varied there remains a subleading dependence which estimates the MHOU. As already mentioned, this notation, while standard in the context of RG equations, is somewhat unusual in the context of factorization, where it is more customary to omit the scale dependence of the physical observable.
Given that the structure function F (Q 2 , µ 2 f , µ 2 r ) factorizes into the hard coefficient function and the PDF, the factorization and renormalization scales µ f and µ r can be chosen completely independently; the scale dependence will also factorize. Explicitly, we define k f = µ 2 f /Q 2 and k r = µ 2 r /Q 2 , and then κ f = ln k f , κ r = ln k r . In terms of these variables, the factorized structure function is given by the product of the scale-varied coefficient function and the scale-varied PDF, where, as in Sects. 3.2 and 3.3, these are defined as before, while f (t f ) ≡ f (t f , 0) and C(t r ) ≡ C(t r , 0) stand for the PDF and the coefficient function evaluated at the central scales, µ 2 f = Q 2 and µ 2 r = Q 2 , respectively. Recalling that ∂/∂t ∼ O(α s ), the structure function can then be expanded in powers of the coupling. From this expression, it follows that scale variations with respect to κ f can be determined by taking derivatives with respect to t f while holding t r fixed, and vice versa: in other words, we can think of the two variations as being generated by κ f ∂/∂t f and κ r ∂/∂t r respectively. We can equivalently treat the factorization scale variation using method (C) of the previous subsection, and thus factorize both scale variations into the coefficient function, as done in Eq. (3.39). In the case of electroproduction, inserting the expansions of Eq. (3.18), one finds an expression with all dependence on κ r and κ f encoded into a redefined coefficient function, up to terms of O(α 3 s (t r )), given that one can change the scale that enters the coupling by re-expanding it as in Sect. 3.2. Note that in the expression for C the coupling constant is always evaluated at the renormalization scale µ r , and that for κ r = κ f = 0 one gets back the original perturbative expansion Eq. (3.18). However, especially in the context of PDF determinations, as opposed to the situation in which a pre-computed PDF set is being used, it is rather more convenient to use either of methods (A) or (B) from Sect.
3.3 when estimating the MHOU in the scale dependence of the PDF, since this can be done without reference to any particular process. We can then determine the universal µ f variation by varying the scale in the PDF evolution, as done for instance in Eq. (3.32) or Eq. (3.36), while instead the process-dependent µ r variation is estimated by varying the renormalization scale in the coefficient function, as done in Eq. (3.20), or Eq. (3.27) in the case of hadronic processes.

  Scale | MHOU estimated   | 'Traditional' name [17,18,21-23] | 'Modern' name [24, PDG]
  µ r   | in hard xsec     | --                               | renormalization scale
  µ f   | in PDF evolution | renormalization scale            | factorization scale
  µ̃     | in physical xsec | factorization scale              | scale of the process

Table 3.1. Nomenclatures for the different scale variations used in some of the earlier papers (mainly in the context of DIS), and in more recent work (mainly in the context of hadronic processes), as discussed in detail in the text. The 'modern' terminology is adopted throughout this paper.
Note that since all scale-varied terms ultimately derive from the scale dependence of the universal QCD coupling α s (µ 2 ), it is reasonable to treat the independent scale variations of µ f and µ r symmetrically, e.g. by varying in the range |κ f |, |κ r | ≤ ln 4. Indeed, this symmetry is an advantage of the method: we use the same variation for estimating all MHOUs. Since µ f and µ r can each be varied independently, a simple option is to perform the double scale variations by considering the five scale choices (κ f , κ r ) = (0, 0), (± ln 4, 0), (0, ± ln 4). We will refer to this as 5-point scale variation; alternative schemes will be considered in the next section.
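The 5-point variation can be sketched in a few lines; the observable below is a hypothetical smooth function of the two scale logs, standing in for C(κ r ) ⊗ f (κ f ):

```python
import math

W = math.log(4)   # vary each of mu_f^2, mu_r^2 by a factor of 4 (mu by 2)

# the 5-point set: the central point plus one-at-a-time variations
FIVE_POINT = [(0.0, 0.0), (W, 0.0), (-W, 0.0), (0.0, W), (0.0, -W)]

def prediction(kf, kr):
    """Hypothetical factorized observable: any smooth function of the
    two scale logs stands in here for the scale-varied theory."""
    return 1.0 + 0.05 * kr + 0.03 * kf + 0.01 * kf * kr

# shifts with respect to the central scales, one per non-trivial point
shifts = [prediction(kf, kr) - prediction(0.0, 0.0)
          for kf, kr in FIVE_POINT if (kf, kr) != (0.0, 0.0)]

assert len(shifts) == 4              # four non-trivial shift vectors
assert all(abs(s) > 0 for s in shifts)
```

Note that the 5-point set never probes κ f and κ r simultaneously, so cross terms in the scale dependence are not sampled; the schemes considered in the next section add the diagonal combinations as well.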
Note finally that if we set the renormalization and factorization scales in Eq. (3.40) equal to each other, µ 2 f = µ 2 r = µ̃ 2 , we obtain a factorization in terms of the single scale µ̃ 2 , Eq. (3.49). In most of the earlier papers, mainly concerned with DIS structure functions, e.g. [17,18,21-23], the scale µ̃ 2 was termed the factorization scale: this originates in the earliest papers on the OPE. However, in our current terminology it corresponds to both renormalization and factorization scales taken equal to each other. Likewise, in the earlier papers what we here call the factorization scale µ f was referred to as the renormalization scale. Here, to avoid confusion, we will call µ̃ 2 in Eq. (3.49) the scale of the process. For clarity, the different nomenclatures for the various scales used in the earlier papers and in more modern work (and in this paper) are summarized in Table 3.1.
Consider now the effect on the structure function of varying the scale of the process. As before, we define κ̃ = ln µ̃ 2 /Q 2 and write F (t + κ̃, κ̃) = C(α s (t + κ̃), κ̃) f (t + κ̃) .
where the first line is the same as Eq. (5.8) in Ref. [15], while in the second line we used Eq. (3.36) for the scale variation of the PDF. Then, expanding in the usual way, we find the corresponding scale-varied expansion. Clearly, variations of µ̃ 2 are not independent of the variations of µ 2 f or µ 2 r : rather, they are generated by κ̃ (∂/∂t f + ∂/∂t r ), so they correspond to directions along the diagonal in the space of κ f and κ r , see Fig. 3.1. In the earlier literature, MHOUs were estimated by combining renormalization scale variation with this latter variation, namely by varying µ̃ 2 and µ 2 f : see e.g. Refs. [17,18]. This however has the disadvantage of generating large scale ratios: performing variations of µ̃ 2 and µ 2 f sequentially, one can obtain κ f = 2 ln 4, because the two variations of the factorization scale add up. A way of avoiding these large ratios was constructed in Ref. [24]: first perform the scale variation of Eq. (3.52), but then substitute for the PDF its expansion in terms of the scale-varied PDF, where care must be taken to use the correct argument of α s in each term. Indeed, this procedure then agrees with Eq. (3.46).

Hadronic processes
Consider now the case of hadronic processes, as in Eq. (3.22). For these processes the factorization has the same general form, and the hard coefficient function has the same expansion as Eq. (3.27). Just as for electroproduction, it is possible to factorize variations of κ f into the hard coefficient functions: working as above in Mellin space, to avoid the convolutions, one finds expressions which are even more cumbersome than in the case of electroproduction, thereby demonstrating the greater clarity of methods (A) or (B) in determining the dependence on the scale µ f . By adopting one of these two methods, we can determine the MHOU in a hadronic process through independent variations of the factorization scale µ f and the renormalization scale µ r , in just the same way as we estimated the MHOU in the deep inelastic structure function in the previous section.

Multiple scale variations
We finally consider simultaneous scale variation in a pair of processes: for instance the electroproduction process of Sect. 3.4.1 and a hadronic process as in Sect. 3.4.2. Clearly, the PDF is universal, but the coefficient functions are process-dependent. It follows that scale variation κ f of the PDF will be totally correlated between these two processes, while the scale variations of κ r in the two coefficient functions will be totally independent. Now, considering both processes together, we have three independent scales to vary, µ f , µ r 1 , and µ r 2 , where µ r 1 is the renormalization scale for the deep inelastic process, and µ r 2 is the renormalization scale for the hadronic process. The relation of the factorization scale µ f to the physical scale of each process (whatever that is) is the same for both processes, since the PDFs are universal. Thus if we vary all scales independently by a factor two about their central value we end up with seven scale choices. We can think of the additional renormalization scale as an extra dimension in the space of possible scale variations.
By trivial generalization, for p independent processes π a , a = 1, . . . , p, we will have p + 1 independent scale parameters µ f , µ r 1 , . . . , µ rp , corresponding to a total of 3 + 2p scale variations. Writing κ ra = ln µ 2 ra /Q 2 with a = 1, . . . , p, the traditional range of variation of κ f , κ r 1 , . . . , κ rp would then be defined by |κ f | ≤ ln 4 and |κ ra | ≤ ln 4 for a = 1, . . . , p. Clearly all prescriptions constructed in this way will be symmetrical in the different scales.
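Counting these points is mechanical; a short sketch (our own) enumerating the one-at-a-time variations of the p + 1 scales reproduces the 3 + 2p total:

```python
import math

W = math.log(4)

def scale_points(p, w=W):
    """One-at-a-time variations of the p + 1 independent scales
    (kappa_f, kappa_r1, ..., kappa_rp): the central point plus two
    variations per scale, i.e. 3 + 2p points in total."""
    central = (0.0,) * (p + 1)
    points = [central]
    for scale in range(p + 1):
        for sign in (+1, -1):
            pt = list(central)
            pt[scale] = sign * w
            points.append(tuple(pt))
    return points

# two processes (p = 2): seven scale choices, as stated in the text
assert len(scale_points(2)) == 3 + 2 * 2
# a single process (p = 1) recovers the five choices of 5-point variation
assert len(scale_points(1)) == 5
```

Each additional process simply adds one renormalization-scale axis, i.e. two more points, to the variation.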
We now see why, for the determination of MHOUs in PDFs, it is advantageous to work with the independent scales κ f , κ ra , a = 1, . . . , p, rather than with the traditional factorization scales κ̃ used in the older treatments of scale variation: while the scale κ f used to estimate MHOUs in the PDF evolution is universal, the scales κ ra used to estimate MHOUs in the hard cross-sections are process-dependent. We can therefore only define process scales κ̃ by either introducing artificial correlations between the scales of the hard cross-sections for different processes (which would result in an underestimated MHOU in the hard cross-sections), or else by sacrificing the universality of the PDFs, with uncorrelated evolution uncertainties for different processes (which would result in an overestimated MHOU from PDF evolution). Neither of these options is very satisfactory, though we consider the latter briefly in Sect. 4.3 below, where it gives rise to asymmetric scale-variation prescriptions.

Scale variation prescriptions for the theory covariance matrix
Having set out a general formalism for the inclusion of MHOUs through a theory covariance matrix, based on assuming a distribution of shifts between a theory calculation at finite perturbative order and the true all-order value (Sect. 2), and having discussed how scale variation can be used to produce estimates for such shifts (Sect. 3), we now provide an explicit prescription for the construction of a theory covariance matrix from scale variation. Because of the intrinsic arbitrariness involved in the procedure, we actually propose several alternative prescriptions, which will be then validated in the next section by studying cases in which the next perturbative order is in fact known. We will also assess the impact at the PDF fit level of varying the prescription used for constructing the theory covariance matrix.
We consider a situation in which we have p different types of processes π a = {i a }, where i a labels the data points belonging to the a-th process and a = 1, . . . , p. Each of the p processes is characterized by the factorization scale µ f (associated with the PDFs) and by a renormalization scale µ ra (associated with the hard coefficient functions), to be understood in the sense of the 'modern' terminology in Table 3.1. We will perform scale variation of both scales following Sect. 3.4, taking them as independent, as discussed in that section. When considering a pair of different processes, as explained in Sect. 3.5, we assume the variations of µ ra to be uncorrelated among themselves, while those of µ f are taken to be fully correlated.
The theory covariance matrix is then constructed by averaging, over points in the space of scales, outer products of the shifts with respect to the central scales, given for the a-th process as ∆ ia (κ f , κ ra ) = T ia (κ f , κ ra ) − T ia (0, 0) . (4.1) Here, as before, we have defined κ ra = ln k ra = ln µ 2 ra /Q 2 and κ f = ln k f = ln µ 2 f /Q 2 . In Eq. (4.1), T ia (κ f , κ ra ) indicates the theoretical prediction evaluated at these scales, with T ia (0, 0) being the central theory prediction, and the index i a running over all data points corresponding to process a.
We assume here that all scale variations correspond to the same range κ ∈ [−w, w] for some w (typically w = ln 4, as in Eq. (3.5)). In practice, in each prescription the three points κ = 0, ±w are sampled for each scale. The theory covariance matrix is then S ia jb = n m Σ Vm ∆ ia (κ f , κ ra ) ∆ jb (κ f , κ rb ) , (4.2) where i a ∈ π a and i b ∈ π b indicate two data points, possibly corresponding to different processes π a and π b , m labels the prescription, V m is the set of scale points to be summed over in the given prescription, and n m is a normalization factor, both to be determined. Different prescriptions for constructing the theory covariance matrix S ij differ in the set of combinations of scales which are summed over in Eq. (4.2), as we will discuss below. Because Eq. (4.2) is a sum of outer products of shifts, the theory covariance matrix S ij is positive semi-definite by construction: for any real vector e i , one has e i S ij e j = n m Σ Vm (e i ∆ i ) 2 ≥ 0. Note however that, because the number of elements of V m is finite, S ij will generally be singular: for any vector z j orthogonal to the space S spanned by the set of shift vectors, one has S ij z j = 0. This property will be important when we come to validate the covariance matrix in the following section. We now consider various prescriptions. Because S ij will in general span the full set of data points, we must consider both the case in which points i, j in Eq. (4.2) belong to the same process ("single process prescription") and the case in which they belong to two different processes ("multiple process prescription"). We first discuss the case of symmetric scale variation, in which the two scales are varied independently, and then the case in which the two scales are varied in a correlated way, the latter scenario being equivalent to varying the "scale of the process" (in the sense of Table 3.1), thereby leading to asymmetric prescriptions as already mentioned in Sect. 3.5.
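The positive semi-definiteness argument can be made concrete in a few lines. The sketch below (toy shift vectors, not real data) builds S from outer products and checks that a quadratic form is non-negative; since only four shift vectors span six data points, S is also rank-deficient, as noted above:

```python
def theory_cov(deltas, norm):
    """S_ij = norm * sum over variation points of Delta_i * Delta_j,
    i.e. a normalized sum of outer products of shift vectors."""
    n = len(deltas[0])
    return [[norm * sum(d[i] * d[j] for d in deltas) for j in range(n)]
            for i in range(n)]

# four hypothetical shift vectors over six data points
deltas = [
    [0.3, 0.2, 0.1, 0.4, 0.2, 0.1],
    [-0.3, -0.2, -0.1, -0.4, -0.2, -0.1],
    [0.1, 0.0, 0.2, 0.1, 0.3, 0.0],
    [-0.1, 0.0, -0.2, -0.1, -0.3, 0.0],
]
S = theory_cov(deltas, 0.5)

# e^T S e = norm * sum_k (e . Delta_k)^2 >= 0 for any real vector e;
# with 4 shift vectors and 6 data points the rank is at most 4, so S
# is singular even though it is positive semi-definite
e = [1.0, -2.0, 0.5, 0.0, 1.5, -1.0]
quad = sum(e[i] * S[i][j] * e[j] for i in range(6) for j in range(6))
assert quad >= 0.0
```

Any vector orthogonal to all four shift vectors is annihilated by S, which is the singularity property exploited in the validation of the next section.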

Symmetric prescriptions for individual processes
We consider first the prescriptions for when there is just a single process, that is, p = 1. In this case, there are at most two independent scales, the factorization and renormalization scales κ_f and κ_r. The theory covariance matrix is then constructed as
S_ij = n_m Σ_{(κ_f, κ_r) ∈ v_m} Δ_i(κ_f, κ_r) Δ_j(κ_f, κ_r), (4.4)
where again v_m represents the set of points to be summed over in the given prescription, limited here to points in the space of the two scales κ_f and κ_r, and n_m is the normalization factor. Let s be the number of independent scales being varied (so s = 1 or s = 2), and m be the number of points in the variation (so m is the number of elements of v_m): a given scheme is then usually described as an '(m + 1)-point scheme', the extra point being the central one. Note that we do not include in v_m trivial points for which Δ_i vanishes (which in practice means the single point κ_f = κ_r = 0), since these do not contribute to the sum. The normalization factor n_m in Eq. (4.4) is determined by averaging over the number of points associated with the variation of each scale, and adding the contributions from the variation of independent scales. This means that
n_m = s/m. (4.5)
We consider three different prescriptions, represented schematically in Fig. 4.1.
• 5-point: here v_4 = {(±, 0), (0, ±)}, and the theory covariance matrix is given by Eq. (4.6), where we have adopted the abbreviated notation Δ_i^{±0} ≡ Δ_i(±w, 0) and Δ_i^{0±} ≡ Δ_i(0, ±w). Note that the variations of κ_f and κ_r are added in quadrature since they are independent: this is why it is important to make sure that the variations are indeed independent, as is the case for the renormalization and factorization scales, as discussed in Sect. 3.4.
• 5̄-point: here v̄_4 = {(±, ±)}, i.e. the two scales are varied simultaneously over all four combinations. As before, the two scales are varied in a manifestly independent way.
• 9-point: here we vary κ_f and κ_r completely independently, giving the union of the 5-point and 5̄-point prescriptions: v_8 = v_4 ⊕ v̄_4. Now s = 2, m = 8 and n_8 = 1/4, and the theory covariance matrix is given by Eq. (4.8).
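The three symmetric single-process point sets, and the counting of Eq. (4.5), can be made concrete in a short sketch (the encoding of (κ_f, κ_r) ∈ {−w, 0, +w} as (−1, 0, +1) is our own convention, not from the paper):

```python
# Points (kf, kr) with entries -1, 0, +1 standing for kappa = -w, 0, +w.
pts = (-1, 0, 1)

five_bar = [(kf, kr) for kf in (-1, 1) for kr in (-1, 1)]            # 5bar: (±, ±)
five = [(kf, 0) for kf in (-1, 1)] + [(0, kr) for kr in (-1, 1)]     # 5-point
nine = [(kf, kr) for kf in pts for kr in pts if (kf, kr) != (0, 0)]  # 9-point

def n_m(v, s=2):
    """Normalization of Eq. (4.5): n_m = s/m, with m the number of points."""
    return s / len(v)

# 9-point is the (disjoint) union of the 5-point and 5bar-point sets
assert sorted(nine) == sorted(five + five_bar)
assert n_m(five) == 0.5 and n_m(five_bar) == 0.5 and n_m(nine) == 0.25
```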

Symmetric prescriptions for multiple processes
Now we consider multiple processes, i.e. p > 1, with scale variations either uncorrelated or partially correlated. In Eq. (4.2), the set V_m now involves possible variations of the p + 1 scales κ_f, κ_{r_1}, …, κ_{r_p}, where κ_{r_a} indicates the renormalization scale for process a = 1, …, p. This implies that V_m is now a much bigger set than v_m. However, any given element of S_ij in Eq. (4.2) can involve at most two different processes, π_a and π_b, so to compute this element we can simply ignore the other processes. Consequently, it is sufficient to consider p = 2, since the generalization to p > 2 will then be straightforward.

For a given pair of processes, say π_1 and π_2, the covariance matrix has diagonal blocks S_{i_1 j_1}, S_{i_2 j_2} and off-diagonal blocks S_{i_1 j_2} = S_{j_2 i_1}, where as above the extra subscript indicates the process: i_1, j_1 ∈ π_1 and i_2, j_2 ∈ π_2.

Consider first the diagonal blocks S_{i_1 j_1} and S_{i_2 j_2}. Adding process π_2 cannot change the theoretical uncertainty in process π_1, although the two uncertainties may be correlated. Consequently, S_{i_1 j_1} and S_{i_2 j_2} must each be given by the same expression as in the single-process case, Eq. (4.4). This can only be true if the set of points v_m in Eq. (4.4) is a subset of the set V_m in Eq. (4.2): when computing for example S_{i_1 j_1}, the shifts Δ_{i_1} and Δ_{j_1} depend only on the scales κ_f and κ_{r_1} associated with π_1, and are independent of the scale κ_{r_2} associated with π_2. Consequently, when we sum over V_m in Eq. (4.2), performing the trivial sum over κ_{r_2} must reduce V_m to its subset v_m, up to a degeneracy factor d_m which counts the number of copies of elements of v_m contained in V_m. This fixes the overall normalization factor N_m:
N_m = n_m / d_m. (4.11)
It remains to determine V_m for a given (m + 1)-point prescription.
It is easy to see that in each case we obtain a unique result, which is in a sense a direct product of p copies of v_m, taking into account the common scale κ_f. The points in the (κ_f, κ_{r_1}, κ_{r_2}) space that are sampled in each prescription when there are two processes are shown in Fig. 4.2 (corresponding to the single-process prescriptions shown in Fig. 4.1).
To show how this works, we consider each prescription in turn, starting with the 5̄-point prescription, which is easier to construct than 5-point.
• 5̄-point: for two processes there are three scales, and V̄_4 = {(±; ±, ±)}, where the first entry denotes κ_f and the remaining two the renormalization scales (κ_{r_1}, κ_{r_2}). The set V̄_4 thus has eight points in total. For each element of v̄_4, there are two elements of V̄_4, so d_4 = 2, and since n_4 = 1/2, N_4 = 1/4. The result for the off-diagonal blocks of the theory covariance matrix in this prescription is thus
S_{i_1 j_2} = (1/4) Σ_{κ_f = ±} (Δ_{i_1}^{κ_f +} + Δ_{i_1}^{κ_f −})(Δ_{j_2}^{κ_f +} + Δ_{j_2}^{κ_f −}). (4.12)
From this expression it is clear that while the scale κ_f is varied coherently between the two processes, the scales κ_{r_1} and κ_{r_2} are varied incoherently, as required.
It is straightforward to generalize this procedure to three processes: then V̄_4 = {(±; ±, ±, ±)}, so d_4 = 4, and since n_4 = 1/2, N_4 = 1/8. However Eq. (4.12) remains unchanged, in the sense that it can be used to evaluate all three off-diagonal blocks S_{i_1 j_2}, S_{i_2 j_3}, S_{i_3 j_1}: this must always be the case, since each term in the sum Eq. (4.2) involves at most three scales. For p processes, it is easy to see that the number of distinct elements of V̄_4 is 2^{p+1}.
• 5-point: again for two processes we have three scales, but now one varies each holding the others fixed to their central values: v_4 = {(±; 0), (0; ±)}, so V_4 = {2(±; 0, 0), (0; ±, ±)}, where the two in front of the first element indicates that there are two copies of it, so V_4 has eight elements in total. Then for each element of v_4 there are precisely two elements of V_4, so d_4 = 2, and since n_4 = 1/2, N_4 = 1/4. The result for the off-diagonal entries of the theory covariance matrix in this prescription is thus
S_{i_1 j_2} = (1/4) [2 Δ_{i_1}^{+0} Δ_{j_2}^{+0} + 2 Δ_{i_1}^{−0} Δ_{j_2}^{−0} + (Δ_{i_1}^{0+} + Δ_{i_1}^{0−})(Δ_{j_2}^{0+} + Δ_{j_2}^{0−})]. (4.13)
Note that also in this expression the variations of κ_f are manifestly correlated between the two processes, whereas the variations of κ_{r_1}, κ_{r_2} are not.
When there are three processes, it is easy to see that V_4 = {4(±; 0, 0, 0), (0; ±, ±, ±)}, i.e. it has 16 elements, though only 10 are distinct: the other six are simply copies, necessary to obtain the correct coefficients in Eq. (4.6) and Eq. (4.13). There are now four elements of V_4 for each element of v_4, so d_4 = 4 and N_4 = 1/8. Again Eq. (4.13) can be used to calculate all three off-diagonal blocks. For p processes, it is easy to see that V_4 has 2^{p+1} elements, but that only 2 + 2^p of these are actually distinct.
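The degeneracy counting described above is easy to verify mechanically. In this sketch (our own ±1/0 encoding) we build the two-process 5-point set V_4, including the copies, and check that marginalizing over κ_{r_2} reproduces v_4 with a uniform degeneracy d_4, which fixes N_4 via Eq. (4.11):

```python
from collections import Counter

# V4 for the 5-point prescription with two processes, including copies:
# {2(±; 0, 0), (0; ±, ±)} -- eight elements in total.
V4 = ([(kf, 0, 0) for kf in (-1, 1) for _ in range(2)] +
      [(0, r1, r2) for r1 in (-1, 1) for r2 in (-1, 1)])
assert len(V4) == 8

# Marginalizing over kr2 must give back v4 = {(±; 0), (0; ±)},
# each element appearing with the same degeneracy d4.
marginal = Counter((kf, r1) for kf, r1, _ in V4)
assert set(marginal) == {(-1, 0), (1, 0), (0, -1), (0, 1)}
d4 = set(marginal.values()).pop()

n4 = 0.5       # single-process normalization n_4 = s/m = 2/4
N4 = n4 / d4   # Eq. (4.11)
```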
• 9-point: here we vary the three scales completely independently: v_8 = v_4 ⊕ v̄_4. Constructing V_8 is now somewhat more involved: while terms with κ_f = 0 have degeneracy 2, terms in which κ_f is varied have degeneracy 3, so we need three copies of the former and two of the latter to bring the overall degeneracy to 6. The solution is thus V_8 = {3(0; ±, ±), 2(±; ±′, ±′)}, where ±′ means either +, − or 0. Thus V_8 has 48 elements, of which only 22 are actually distinct. Since the first term of V_8 has a degeneracy of 2, while the last has a degeneracy of 3, the overall degeneracy is d_8 = 6, and since n_8 = 1/4, N_8 = 1/24. It follows that the off-diagonal blocks of the theory covariance matrix in this prescription are given by Eq. (4.14). The pattern of correlations in the variation of the three scales in this expression should be clear from the way it is written.

Asymmetric prescriptions
It is sometimes argued that since only the cross-section is actually physical, a single process has only one scale, namely the scale of the process in the sense of Table 3.1 and Eq. (3.49). Therefore, in order to estimate the MHOUs, only this single scale should be varied. Alternatively, one may consider the variation of the scale of the process on top of the variation of the renormalization and factorization scales considered previously. The logic of the first alternative (variation of the scale of the process only) is that after all there is only one scale in the factorised expressions, for example those given by the Wilson expansion applied to DIS. The logic of the second alternative (variation of the scale of the process, the renormalization scale, and the factorization scale) is that each of these estimates a different source of MHOU: varying the scale of the process generates terms related to missing higher order contributions to the hard cross-section which are proportional to collinear logarithms, the renormalization scale generates terms proportional to the beta function, and the factorization scale generates terms related to missing higher order contributions to the anomalous dimensions. On the other hand, both alternatives might be criticized on the grounds that they suppress correlations between the uncertainties in PDF evolution across different processes, and thus seriously overestimate uncertainties (the first more severely than the second). Ultimately, however, they can be regarded as possible options to be tested in a situation in which the true answer is known. Such a validation will be performed in the next section.
We now consider these two options in turn, both for the single-process case, which is represented schematically in Fig. 4.3, and for multiple processes.
• 3-point: For a single process, we set κ_f = κ_r and only vary the single resulting scale. Then v_2 = {±} in an obvious notation, and s = 1, m = 2 and n_2 = 1/2, i.e. we simply average over the two nontrivial values of the single scale. For a single process we thus find
S_ij = (1/2)(Δ_i^+ Δ_j^+ + Δ_i^− Δ_j^−),
with Δ_i^± ≡ Δ_i(±w, ±w). Likewise, for two different processes π_1 and π_2, we set κ_f = κ_{r_1} for π_1, set κ_f = κ_{r_2} for π_2, and then vary κ_{r_1} and κ_{r_2} independently. This procedure necessarily ignores the correlations in the variation of κ_f between π_1 and π_2.
The set of points to be summed over is then V_2 = {(±, ±)}, where the ordered pairs denote the two independent scales (κ_{r_1}, κ_{r_2}). Clearly, for each element of v_2 there are two elements of V_2, so d_2 = 2, Eq. (4.11) gives N_2 = 1/4, and the off-diagonal entries of the theory covariance matrix are
S_{i_1 j_2} = (1/4)(Δ_{i_1}^+ + Δ_{i_1}^−)(Δ_{j_2}^+ + Δ_{j_2}^−).
It can be seen from this factorised expression that the variations for each process are entirely uncorrelated. The generalization to more than two processes is straightforward: for p processes V_2 has 2^p elements, all of them distinct.
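The factorised structure of this off-diagonal block is easy to verify numerically; in the sketch below the Δ± vectors are random stand-ins for the actual shifts of the two processes:

```python
import numpy as np

rng = np.random.default_rng(1)
dp1, dm1 = rng.normal(size=3), rng.normal(size=3)  # Delta± for process 1
dp2, dm2 = rng.normal(size=4), rng.normal(size=4)  # Delta± for process 2

N2 = 0.25
# Direct sum over V2 = {(±, ±)}: kr1 and kr2 varied independently
S12 = N2 * sum(np.outer(d1, d2) for d1 in (dp1, dm1) for d2 in (dp2, dm2))

# Factorised form, exhibiting complete decorrelation between the processes
S12_factorized = N2 * np.outer(dp1 + dm1, dp2 + dm2)
assert np.allclose(S12, S12_factorized)
```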
Because in this prescription we ignore correlations in the PDF evolution uncertainties, we expect it to significantly overestimate the MHOUs. Note that a fully correlated 3-point prescription in which we set κ_f = κ_{r_1} = κ_{r_2} would instead significantly underestimate the MHOUs, which is why we do not consider it.

• 7-point: here the scale of the process is varied on top of the renormalization and factorization scales. We then have v_6 = {(±; 0), (0; ±), (±; ±)}, with (±; ±) denoting the correlated pairs (+; +) and (−; −), s = 2 (note we still have only two independent scales), m = 6 and n_6 = 1/3, and thus for a single process
S_ij = (1/3) Σ_{(κ_f, κ_r) ∈ v_6} Δ_i(κ_f, κ_r) Δ_j(κ_f, κ_r).
When there is more than one process, we have to remember that variations of the scale of the process are uncorrelated between different processes, so they can decorrelate the allowed variations of µ_f. This means the allowed variations for two processes live in a space of four dimensions rather than three: call these (κ_{f_1}, κ_{r_1}; κ_{f_2}, κ_{r_2}). The extension of v_6 is then V_6 = V_4 ⊕ V_2. This prescription gives smaller correlations than the symmetric prescriptions, since the variation of the two factorisation scales µ_{f_1} and µ_{f_2} is now entirely uncorrelated.
Generalization to p processes is again straightforward: since the variations of the scale of the process are in effect independent of the separate variations of µ_f and µ_r, V_6 = V_4 ⊕ V_2 for any number of processes, so there are in total 2 + 2^{p+1} distinct elements.
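For completeness, the single-process 7-point bookkeeping can be sketched in the same toy encoding as before (the pairing of (+, +) and (−, −) for the scale-of-the-process variation is as described above; the names are ours):

```python
# v6 for the asymmetric 7-point prescription: the 5-point variations plus
# the correlated pairs kf = kr = ±w from varying the scale of the process.
five = [(1, 0), (-1, 0), (0, 1), (0, -1)]
process_scale = [(1, 1), (-1, -1)]
v6 = five + process_scale

m = len(v6)   # m = 6, i.e. a 7-point scheme including the central point
s = 2         # still only two independent scales
n6 = s / m    # normalization n_6 = 1/3
```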

Validation of the theory covariance matrix
In this section we determine the theory covariance matrix S_ij at NLO using the different prescriptions formulated in Sect. 4, introduce a method for validating the theory covariance matrix when the next-order result is known, and use it to validate the theory covariance matrices that we computed against the known NNLO results. This validation is performed on a global dataset based on the same processes as those used in the NNPDF3.1 PDF determination. This dataset will then be used to produce fits incorporating MHOUs using the theory covariance matrix (Sect. 6), and also, for comparison, fits using scale-varied theories (Appendix B).
This input dataset differs in several small respects from that used in the NNPDF3.1 baseline. Firstly, the fixed-target Drell-Yan (DY) cross-sections [61-64] are excluded from the fit, since APFEL currently does not allow the calculation of scale-varied fixed-target DY cross-sections. Secondly, the value of the lower kinematic cut has been increased from Q²_min = 2.69 GeV² to 13.96 GeV², in order to ensure the validity of the perturbative QCD expansion when scales are varied downwards. Thirdly, we include only jet data for which exact NNLO calculations are available, as discussed in [65], namely the ATLAS and CMS inclusive jet cross-sections at 7 TeV from the 2011 dataset. Finally, we exclude the bottom structure function F_2^b measurements, for which the implementation of scale variations is complicated by the crossing of the heavy quark thresholds.
Also, in the original NNPDF3.1 determination somewhat different cuts were applied to data at NLO and NNLO (essentially in order to remove from the NLO fit data which are subject to large NNLO corrections). Here we wish to have exactly the same dataset at NLO and NNLO, in order to make sure that the differences between NLO and NNLO are due purely to differences in the theoretical calculations, and not in the input datasets. Therefore, the baseline kinematic cuts of NNPDF3.1 have been slightly modified so that the data points excluded at NLO are also excluded at NNLO and vice-versa. Taking into account all these modifications, in total the input dataset includes N_dat = 2819 datapoints.

Table 5.1: The categorization of the input datasets into different processes adopted in this work. Each dataset is assigned to one of five categories: neutral-current DIS (DIS NC), charged-current DIS (DIS CC), Drell-Yan (DY), jet production (JET) and top quark pair production (TOP). For each dataset, we also provide the corresponding publication reference and the number of data points after cuts. We also show the total number of points in each of the five categories of process.
Because the prescriptions in Sect. 4 assume that renormalization scale variation is fully correlated within a given process, but uncorrelated between different processes, it is necessary to define what is meant by "process", i.e., to classify datasets into processes. This requires an educated guess as to which theory computations share the same higher order corrections. For example, it is necessary to decide whether charged-current (CC) and neutral-current (NC) DIS are the same process or not, and whether the transverse momentum and rapidity distributions for one observable (such as, say, Z production) should be grouped together. Our categorization is summarized in Table 5.1. Specifically, we group the data into five distinct categories: DIS NC, DIS CC, Drell-Yan (DY), inclusive jet production (JET), and top quark pair production (TOP). More refined categorizations will be considered elsewhere, but we consider this to be sufficient for a first study.
All calculations are performed using the same settings as in [5]: PDF evolution and the calculation of DIS structure functions up to NNLO are carried out using the APFEL [66] program; heavy quark mass effects are included by means of the FONLL general-mass variable flavor number scheme [67][68][69]; the charm PDF is fitted alongside the light quark PDFs [70], rather than being generated from perturbative evolution of light partons; the charm quark pole mass is taken to be m c = 1.51 GeV, and the strong coupling constant is fixed to be α s (m Z ) = 0.118, consistent with the latest PDG average [71].
In order to evaluate the theory covariance matrix S_ij, it is necessary to be able to evaluate both DIS structure functions and hadronic cross-sections for a range of values of the factorization and renormalization scales, i.e., in the notation of Eq. (3.41), for κ_f ≠ 0 and κ_r ≠ 0. The entries of the NLO theory covariance matrix have been constructed by means of the ReportEngine software [72], taking the scale-varied NLO theory cross-sections T_i(k_f, k_r) as input. These are provided by APFEL [66] for the DIS structure functions and by APFELgrid [73] combined with APPLgrid [74] for the hadronic cross-sections. The evaluation of these scale-varied cross-sections has been validated by means of independent programs, in particular HOPPET [75] and OpenQCDrad [76] for the DIS structure functions, and the built-in scale variation functionalities of APPLgrid. For consistency, all these NLO cross-sections are evaluated using the central NLO PDF obtained by performing a NLO fit to the same dataset.

The theory covariance matrices at NLO
We now present results for the theory covariance matrices, constructed using NLO calculations and evaluated according to the prescriptions introduced in Sect. 4, and discuss some of their qualitative features.
In Fig. 5.1 we show the diagonal elements of the experimental and theory covariance matrices, or more specifically the experimental uncertainty normalised to the data, (C_ii)^{1/2}/D_i, and the MHOU normalised to the data, (S_ii)^{1/2}/D_i, for i = 1, …, N_dat, where D_i is the i-th datapoint. The datapoints are grouped by process and, within a process, by experiment, following Table 5.1. The theory covariance matrix S_ij is computed using the 9-point prescription (the one with the largest number of independent variations; recall Sect. 4). Broadly speaking, the estimated NLO MHOU is roughly comparable to the experimental uncertainties, as expected. However, for some datapoints the experimental uncertainty is dominant (and thus the theory uncertainty will have only a small effect), while for others the MHOU is dominant. These latter datapoints will have relatively little weight in a PDF fit with MHOU included, since the large theory uncertainty makes them less useful for the extraction of PDFs. Some datasets have datapoints in both these categories: the HERA NC DIS data are particularly striking, since at high Q² (where statistics are low) the dominant uncertainty is experimental, while at low Q² (and thus small x, where perturbation theory is less reliable) the dominant uncertainty is the MHOU.

Figure 5.2: Comparison of the experimental C_ij (left) and the theoretical S_ij (right) covariance matrices, the latter evaluated using the 9-point prescription. All entries are normalized to the central experimental value. The data are grouped by process and, within a process, by experiment, following Table 5.1.

In Fig. 5.2 we compare the complete experimental covariance matrix C_ij to the theory covariance matrix S_ij, again computed using the 9-point prescription. Both covariance matrices are displayed as heat maps, with each entry expressed as a fraction of the corresponding experimental central value, i.e. C_ij/(D_i D_j) and S_ij/(D_i D_j). It is clear from Fig.
5.2 that the theory covariance matrix has, as expected, a richer structure of correlations than its experimental counterpart: for example data from the same process (such as DIS) are correlated even when the corresponding experimental measurements are completely uncorrelated (such as HERA and fixed target). Furthermore, correlation of the factorization scale variation between disparate processes, such as DIS processes and hadronic processes, results in nonzero entries in the theory covariance matrix even in these regions.
The precise structure of these theory-induced correlations depends on the choice of prescription adopted. To illustrate this, Fig. 5.3 compares the experimental correlation matrix, given by
ρ^(C)_ij = C_ij / (C_ii C_jj)^{1/2}, (5.1)
with the corresponding combined experimental and theoretical correlation matrices, defined by
ρ^(C+S)_ij = (C_ij + S_ij) / ((C_ii + S_ii)(C_jj + S_jj))^{1/2}, (5.2)
for all the prescriptions defined in Sect. 4. Specifically, from top left to bottom right we show the experimental correlations ρ^(C), followed by ρ^(C+S) for the symmetric 5-, 5̄- and 9-point prescriptions, and for the asymmetric 3- and 7-point prescriptions. As in Fig. 5.2, the cross-sections are grouped by process type and, within that, by experiment. Some qualitative features of the theory-induced correlations are apparent. There are clearly large positive correlations within individual experiments along the diagonal blocks, this being particularly evident for the DIS NC and DY data, which have large numbers of points which are relatively close kinematically. Off the diagonal, but still within the same process, there are large correlations between experiments for the DY, jets and top data points, and large anticorrelations for the DIS NC data points (these mostly between fixed target and HERA). Correlations and anticorrelations between different processes are also often present, but are somewhat weaker: for example the DY data points (from the LHC) are quite correlated with the HERA NC DIS data points, but anticorrelated with the fixed-target NC DIS data points.
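The experimental and combined correlation matrices just described reduce to a few lines of linear algebra; a toy sketch (random matrices standing in for the actual C and S, with names of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10
B = rng.normal(size=(n, n))
C = B @ B.T + n * np.eye(n)   # toy experimental covariance (positive definite)
A = rng.normal(size=(4, n))
S = A.T @ A                   # toy theory covariance (PSD, rank <= 4)

def correlation(cov):
    """rho_ij = cov_ij / sqrt(cov_ii * cov_jj)."""
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)

rho_C = correlation(C)
rho_CS = correlation(C + S)
assert np.allclose(np.diag(rho_CS), 1.0)
assert np.all(np.abs(rho_CS) <= 1.0 + 1e-12)
```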
When comparing different prescriptions, it is clear that the 3-point prescription leads to especially small correlations between processes, which is expected because with this prescription the factorization scale and renormalization scale variations are uncorrelated between processes. The correlations between processes are also weaker in 7-point than in 5-point, due to the fact that (as discussed in Sect. 4.3) the correlated variation of the factorization scale is combined with the uncorrelated variations of the scale of the process for the pair of processes involved. It is worth noting, however, that the pattern of correlations is similar for all the symmetric prescriptions.
In order to decide which prescriptions are best, and more generally whether or not they produce a reliable estimate of MHOUs, we must proceed to their validation.

Construction of validation tests
We wish to construct a validation test for the NLO theory covariance matrix, by comparing it to the known NNLO theoretical result. We do so by viewing the set of experimental data as a vector with components D_i, where i = 1, …, N_dat. This vector lives in an N_dat-dimensional "data" space D, on which the theory covariance matrix S_ij acts as a linear operator. S_ij is symmetric and positive semi-definite, meaning that all of its non-zero eigenvalues are positive. In a PDF fit, S_ij always enters as an additive contribution to the experimental covariance matrix C_ij, and their sum is always invertible, owing to the non-zero statistical uncertainties on the data, which bound the eigenvalues from below.
S_ij defines ellipsoids E corresponding to a given confidence level in the data space, centered on the NLO theoretical prediction T_i^NLO ≡ T_i^NLO(0, 0), evaluated using the central scale choice. In the context of MHOUs, we can take T_i^NLO to be the predictions at NLO, with the one-sigma ellipsoid E_{1σ} estimating a 68% confidence level for the MHO correction.
We can validate whether S_ij correctly predicts both the size and the correlation pattern of the MHO terms by testing the extent to which the shift vector δ_i ∼ T_i^NNLO − T_i^NLO, i.e. the difference between the NNLO and NLO predictions for T_i, falls within a given ellipsoid E. Note that the dimensionality of the subspace spanned by the ellipsoid E is much smaller than that of the data space D: in a global fit the data space has dimension O(3000) (Table 5.1), while even the most complex prescriptions in Sect. 4 have only O(30) independent variations, not all of which correspond to independent eigenvectors, as we will see shortly. So E actually lives in a subspace S of dimension N_sub of the full space D: E ⊂ S ⊂ D. For a single process we expect N_sub to be of order a dozen or so at most. In fact, even for a single process (see Table 5.1) we always have N_sub ≪ N_dat. Hence, a nontrivial validation of the theory covariance matrix is obtained if the component of the shift vector δ_i lying outside E is small, i.e. if the angle between δ_i and the projection of δ_i onto S is small.
Furthermore we expect the component of δ i along each axis of the ellipsoid E to be of the same order as the typical one-sigma variation. The physical interpretation of such a successful validation is that the eigenvectors of S ij correctly estimate the independent directions of uncertainty in theory space, with the size of the shift estimated by the corresponding eigenvalue. The null subspace of E, i.e. the directions of vanishing eigenvalues, would then correspond to directions in D for which the theory uncertainty is so small that it cannot be reliably estimated and so can be safely neglected. These are highly nontrivial tests, given the huge discrepancy between the dimensionality of the space D, and the dimensionality of S.
Let us now see how this works in detail. First, we need to identify the spaces E and S. To do this, we normalize the NLO theory covariance matrix S_ij to the central NLO theory prediction T_i, so that all its elements are dimensionless, allowing a meaningful comparison: we define
Ŝ_ij = S_ij / (T_i^NLO T_j^NLO). (5.3)
Likewise, we define a normalized shift vector with components
δ_i = (T_i^NNLO − T_i^NLO) / T_i^NLO. (5.4)
The NNLO prediction T_i^NNLO is computed using NNLO matrix elements and parton evolution, but with the same NLO PDF set used in the computation of T_i^NLO and S_ij. In this way the shift δ_i only accounts for the perturbative effects of the NNLO corrections, which are estimated by S_ij, and not for the additional effect of refitting.
We then diagonalize S_ij, obtaining eigenvectors e_i^α (chosen to be orthonormal, i.e. Σ_i e_i^α e_i^β = δ^{αβ}) with corresponding non-zero eigenvalues λ^α = (s^α)², α = 1, …, N_sub. All these eigenvalues are real and positive, see Eq. (4.3). The eigenvectors span the subspace S. There are also N_dat − N_sub zero eigenvalues. These are degenerate, and their eigenvectors span the space D/S. Because of the zero eigenvalues, the diagonalization of S_ij is in practice rather delicate: the procedure we use to identify the subspace S and its dimensionality N_sub, and then diagonalize the projection of S_ij into S, is described in some detail in Appendix A.
Next we project the shift vector δ_i onto the eigenvectors,
δ^α = Σ_i e_i^α δ_i. (5.5)
These projections δ^α should be of the same order as the size of the ellipsoid in each direction, i.e. as the s^α: more specifically, in an ideal world 68% of the ratios δ^α/s^α would be less than one. This is all the meaningful statistical information contained in S_ij. Finally, we can resolve the shift vector δ_i into its component lying within S,
δ_i^S = Σ_α δ^α e_i^α, (5.6)
and check whether the angle θ between the shift δ_i and its component δ_i^S in the subspace S,
cos θ = |δ^S| / |δ|, (5.7)
is reasonably small. As mentioned above, for a global PDF fit the typical situation is that N_sub ≪ N_dat (here N_dat = 2819, while N_sub is at most O(30)). So this validation test is highly nontrivial, since finding the relatively small subspace S in the huge space D is rather hard: for a random symmetric matrix S_ij, components of δ_i in D/S would generally be as large as those in S, and thus |δ_i^S| ≪ |δ_i|, and θ would be very close to a right angle.
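The projection of the shift onto the eigenvector subspace and the angle θ of Eq. (5.7) amount to a few lines of linear algebra. A minimal sketch with a synthetic rank-deficient covariance matrix (all inputs are toy stand-ins, not the actual fit quantities):

```python
import numpy as np

rng = np.random.default_rng(3)
n_dat, n_sub = 50, 8

# Synthetic PSD "theory covariance matrix" of rank n_sub << n_dat
A = rng.normal(size=(n_sub, n_dat))
S = A.T @ A / n_sub

eigval, eigvec = np.linalg.eigh(S)
keep = eigval > 1e-10 * eigval.max()   # nonzero eigenvalues span the subspace S
E = eigvec[:, keep]                    # orthonormal eigenvectors e^alpha

delta = rng.normal(size=n_dat)         # stand-in for the normalized shift vector
proj = E.T @ delta                     # projections delta^alpha
delta_S = E @ proj                     # component of delta inside S
delta_miss = delta - delta_S           # component orthogonal to S

cos_theta = np.linalg.norm(delta_S) / np.linalg.norm(delta)
theta = np.degrees(np.arccos(cos_theta))   # angle of Eq. (5.7)

# For a random direction in a 50-dimensional space and an 8-dimensional S,
# theta comes out close to a right angle, as noted in the text.
assert np.allclose(delta_S + delta_miss, delta)
```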

Results of validation tests
We now explicitly perform the validation tests discussed in Sect. 5.3, with the NLO theory covariance matrices S_ij (normalised to NLO theory, as in Eq. (5.3)) constructed from scale variations for all data points in Table 5.1, and for each prescription of Sect. 4. These are then validated against the shift vector δ_i, constructed as the difference of NNLO and NLO theory, normalised to the latter, as in Eq. (5.4). A first comparison can be made at the level of the diagonal elements σ_i, where S_ii = (σ_i)², by comparing them directly to the normalised shifts δ_i of Eq. (5.4). This already tells us whether the overall size of the scale variation is of the right order of magnitude: one expects the shifts δ_i and the uncertainties σ_i to be of roughly the same order.
These comparisons are shown in Figs. 5.5-5.6. In each plot the data points are presented sequentially on the horizontal axis, organised by process as in Table 5.1. The shape of the estimated MHOU follows the shape of the true shift rather faithfully, for each of the five processes and for each prescription. This shows that the theory covariance matrix gives a qualitatively reliable estimate of the true MHOU, in the sense that the estimate is small when the MHOU is small, large when it is large, and moreover correctly incorporates the correlations in the MHOU between nearby kinematic regions, which are responsible for the shape. There is little discernible difference between the various point prescriptions, except in the overall size of the estimates: for example, comparing the symmetric prescriptions, we see that 5-point is the least conservative and 5̄-point is the most conservative, whilst 9-point lies somewhere between the two. This is particularly noticeable in the DY data.
It is clear from these plots that the overall size of the estimated uncertainties, given by varying the renormalization and factorization scales by a factor of two in either direction (i.e. with w = ln 4, as in Eq. (3.5)), is roughly correct: if the range were significantly smaller, some of the uncertainties would be underestimated, whereas if it were larger all uncertainties would be overestimated. This said, for some data points the MHOU at NLO is clearly overestimated by scale variation: this is particularly true of the small-x NC DIS data from HERA in the centre of the plot.

Overall, these plots demonstrate that since there are only small differences in the diagonal elements between prescriptions, it is in the detailed correlations between data points that the differences in performance between the prescriptions lie. To expose this, we need to diagonalise the theory covariance matrix (using the procedure in Appendix A), so that we can see in detail which components of the shift vector are correctly estimated, and which are missed, as explained in Sect. 5.3.

Table 5.3: Same as Table 5.2, for each process of Table 5.1. The number of data points in each process is given directly below the name of the process.
As discussed in Sect. 5.3, once we have the eigenvectors corresponding to the nonzero eigenvalues of the theory covariance matrix, the first validation test consists of checking how much of the shift vector δ_i lies within the space S spanned by these eigenvectors, and has thus been included in the estimate of the MHOU provided by the theory covariance matrix. The results of this test for the global dataset, described in Sect. 5.1, are shown in Table 5.2: for each prescription we give the dimension N_sub of S, i.e. the number of linearly independent eigenvectors e_i^α of S_ij, and then the value of the angle θ, defined in Eq. (5.7), between the shift δ_i and its component δ_i^S, defined in Eq. (5.6), lying within the subspace S spanned by the e_i^α. We note that all the angles are reasonably small, despite the fact that N_sub is so much smaller than the dimension 2819 of the data space.
The 9-point prescription performs best, with an angle of θ = 26° between the shift δ_i and its projection δ_i^S onto the subspace S: clearly the more complicated pattern of scale variations (compared to the other two symmetric prescriptions) improves the estimate of the MHOU. The 3-point prescription performs worst, suggesting that the lack of correlation in the factorization scale between processes in this prescription means that much of the correlation in the MHOU due to universal PDF evolution has been missed. The 7-point prescription is however only a little worse than 9-point, presumably due to the dilution of the correlation in factorization scale variation which is a feature of this prescription. Note that since these results for θ are geometrical, they are largely independent of the range w of the scale variation.
It is interesting to ask whether all processes are equally well described, and whether there are significant differences in correlations between processes or within a process. To this purpose, in Table 5.3 we list the angle θ computed for each individual process using the various prescriptions. Three conclusions emerge from inspection of this table. First, when each process is taken individually, the results seen in Table 5.2 for the relative merits of each prescription are replicated process by process: again 3-point is worst, and 9-point is best. Secondly, processes with large numbers of data points are much harder to describe than those with only a few data points (i.e. θ is smallest for smaller datasets): this is hardly surprising, since the larger datasets cover a wider kinematic range and thus have more structure to predict. Finally, the quality of the description of the global dataset for each prescription is in each case dominated by the process (DIS NC) which is described worst; however, the global dataset is actually described a little better (for each prescription) than the dataset for this process, particularly for 9-point, less so for 3-point. This suggests that correlations across processes are actually described reasonably well, and are anyway less critical than correlations within processes.

We next look in more detail at the part of δ_i which falls outside the subspace S, δ_i^miss = δ_i − δ_i^S. This is shown for the 9-point prescription in Fig. 5.7. While it is generally uniformly small, of order a few percent, across the full range of processes, it also has nonzero components in all datasets and all processes. This may suggest that much of it is due to poor estimation of the MHOU in PDF evolution, rather than poor estimation of the MHOU in the hard cross-sections, which can vary substantially between different processes (and indeed different kinematics). A more sophisticated treatment of the factorization scale variation may help improve this.
Having established that most of the NNLO-NLO shift δ_i lies within S, we now proceed to examine what fraction of δ^S_i lies within the error ellipse E specified by the theory covariance matrix. To that end, the eigenvalues λ_α = (s_α)² of the theory covariance matrix of the global dataset are compared to the corresponding projected shifts δ_α. Inspection of these comparisons confirms that all the prescriptions perform reasonably well, with the eigenvalues of comparable size to the shifts, the size of the eigenvalues generally falling as the projected shifts get smaller. As expected, the 3-point prescription clearly overestimates uncertainties, since δ_α < s_α for all the eigenvalues. The same is true, but to a lesser extent, for both the 5-point and 5̄-point prescriptions. For the more complicated 7-point and 9-point prescriptions the largest projections (corresponding to the first seven or eight eigenvalues) are estimated rather well, though still perhaps a little conservatively, but for the smaller projections the scatter increases significantly, with some projected shifts hardly predicted at all. This is perhaps not surprising: when varying just six independent scales, we can only expect to obtain a limited amount of information on the MHO terms. However, the correct estimation of the largest projected shifts shows that the theory covariance matrix gives a reasonable estimate of the MHOU, especially when implemented through the more complicated prescriptions.
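The linear algebra behind these validation tests (the angle θ and the eigenvalue comparison) can be sketched in a few lines. The following is a minimal illustration using hypothetical random vectors, not the actual NNPDF data or code: `Delta` stands in for the scale-variation shift vectors that build the theory covariance matrix, and `delta` for the NNLO-NLO shift.

```python
import numpy as np

rng = np.random.default_rng(0)
ndat, nvar = 50, 8   # hypothetical dataset size and number of scale-variation vectors

# Hypothetical scale-variation shift vectors; the theory covariance matrix is
# (up to a prescription-dependent normalization) the sum of their outer products.
Delta = rng.normal(size=(nvar, ndat))
S = Delta.T @ Delta / nvar

# Orthonormal basis for the subspace S spanned by the scale variations
Q, _ = np.linalg.qr(Delta.T)   # columns of Q span the subspace

# Hypothetical NNLO-NLO shift delta, its projection delta_S, and the angle theta
delta = rng.normal(size=ndat)
delta_S = Q @ (Q.T @ delta)
cos_theta = min(1.0, np.linalg.norm(delta_S) / np.linalg.norm(delta))
theta = np.degrees(np.arccos(cos_theta))   # small theta = shift mostly inside S

# Eigenvalues (s_alpha)^2 of S and projections delta_alpha of the shift onto
# the eigenvectors: the validation compares |delta_alpha| with s_alpha.
lam, e = np.linalg.eigh(S)
s_alpha = np.sqrt(np.clip(lam, 0.0, None))
delta_alpha = e.T @ delta
```

Since S is built from `nvar` vectors it has rank at most `nvar`, which is why only the few largest eigenvalues carry information, mirroring the discussion above.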
On each of these plots, we also show the length of the component δ^miss_i that is orthogonal to S, and thus completely outside E. For the symmetric prescriptions, |δ^miss_i| is always less than the largest component of δ_i in S, while for the asymmetric prescriptions it is greater, very significantly so for the 3-point prescription. This is another indication that the symmetric prescriptions give a better account of the correlations in the theoretical uncertainties.
A more detailed understanding of the physical meaning of each eigenvector can be acquired by inspecting its components e^α_i in the data space. These are shown in Fig. 5.10 for the eigenvectors corresponding to the five largest eigenvalues in the 9-point prescription; the shift vector δ_i is also shown for comparison. It is clear that there is a close correspondence between eigenvectors and MHO contributions to individual processes. For instance, the first eigenvector contributes mostly to DIS NC, the second to both DIS NC and DIS CC, the third to DY, the fourth mainly to DIS CC, and the fifth mainly to JETS. Clearly the ordering of these larger eigenvalues is related to the number of data points for the respective processes: the more data points, the larger the eigenvalue of the (correlated) uncertainty estimate. Even relatively small eigenvalues can give an important contribution, though to processes with fewer data points: for example, the ninth eigenvector (not shown) clearly dominates TOP.
In summary, from these validation tests it is apparent that the 9-point prescription gives a reasonable estimate of most of the MHOU, both for individual processes and for the global dataset, with 7-point being only slightly worse. Based on this, we will therefore adopt 9-point as the default prescription for the theory covariance matrix in the PDF determination to be discussed in the next section.

PDFs with missing higher order uncertainties
We can now present the main results of this work: the first determination of the parton distributions of the proton which systematically accounts for the MHOUs affecting the theory calculations of the input processes for the fit. First we present the results for PDFs obtained by fitting only DIS data. This provides us with an initial test case, which we will study both at NLO and NNLO by comparing PDFs obtained including the combined experimental and theoretical covariance matrix to the corresponding baseline fit in which only experimental uncertainties are included.
We then turn to the global PDF determination, which offers a nontrivial validation of our methodology, specifically by comparing NLO PDFs, with and without MHOUs, to NNLO PDFs. For global fits, we also study the stability of the results with respect to the prescription used for the computation of the theory covariance matrix: specifically, we compare PDFs obtained with the 9-point prescription (our default) to those based on the 7- and 3-point ones. We also study PDFs determined by only partially including the theory covariance matrix, either only in the data generation or only in the fitting. As discussed in the introduction, this provides us with a way of disentangling the impact of the theory covariance matrix on the central values of the PDFs from that on the PDF uncertainties.
As discussed in Sect. 2, the theory uncertainties are included by simply replacing the experimental covariance matrix C_ij with the sum (C + S)_ij of the experimental and theory covariance matrices in the expression for the likelihood of the true value given the data. The NNPDF methodology, as used specifically in the determination of the most recent NNPDF3.1 PDF set [5], is otherwise unchanged. Within this methodology, the covariance matrix is used to generate N_rep pseudodata replicas D^(k)_i for each data point i, with k = 1, ..., N_rep, whose distribution must reproduce the covariance of any two data points. This means that, with theory uncertainties included,

⟨(D^(k)_i − D_i)(D^(k)_j − D_j)⟩_rep = (C + S)_ij ,   (6.1)

where ⟨·⟩_rep denotes the average over Monte Carlo replicas.
A PDF replica is then fitted to each pseudodata replica D^(k)_i by minimizing the figure of merit

χ²(k) = Σ_ij (T_i − D^(k)_i) [(C + S)⁻¹]_ij (T_j − D^(k)_j) ,   (6.2)

where T_i ≡ T_i(0, 0) is the theory prediction evaluated with the central scale choice κ_f = κ_r = 0, and the theory covariance matrix S_ij is computed using one of the prescriptions presented in Sect. 4. It is thus clear that the inclusion of a theory-induced contribution in the covariance matrix affects only two steps of the procedure: the pseudodata generation, and the minimization. Everything else is unchanged, and is identical to the default NNPDF methodology. Note in particular that the experimental covariance matrix C used in the fitting is determined, as in NNPDF3.1 and previous NNPDF releases, using the so-called t_0 method for the treatment of multiplicative uncertainties, in order to avoid d'Agostini bias (see Refs. [77,78] for a detailed discussion). As in previous NNPDF releases, minimization is thus performed using the t_0 definition of the χ², but all χ² values shown are computed using the covariance matrix as published by the respective experiments.
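The two modified steps, replica generation with covariance C + S and minimization of a figure of merit built from (C + S)⁻¹, can be sketched as follows. This is a toy illustration with hypothetical random matrices standing in for the experimental and theory covariances, not the NNPDF fitting code:

```python
import numpy as np

rng = np.random.default_rng(42)
ndat, nrep = 20, 500   # hypothetical dataset size and number of replicas

# Hypothetical central data D, experimental covariance C, theory covariance S
D = rng.normal(10.0, 1.0, size=ndat)
A = rng.normal(size=(ndat, ndat))
C = A @ A.T / ndat + np.eye(ndat)          # positive definite
B = rng.normal(size=(3, ndat))
S = B.T @ B / 3                            # low rank, as from a few scale variations

# Step 1: pseudodata replicas sampled with the combined covariance C + S
L = np.linalg.cholesky(C + S)
replicas = D + (L @ rng.normal(size=(ndat, nrep))).T   # shape (nrep, ndat)

# Step 2: figure of merit for a theory prediction T against a replica,
# built from the inverse of the combined covariance matrix
def chi2(T, Dk, cov):
    r = T - Dk
    return r @ np.linalg.solve(cov, r)

chi2_val = chi2(D, replicas[0], C + S)   # distributed as chi^2 with ndat d.o.f.
```

By construction the sample covariance of the replicas reproduces C + S as N_rep grows, which is the content of Eq. (6.1).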
All the PDF sets which have been produced and which will be discussed in this section are listed in Table 6.1. For each of the fits, we indicate its label, the input dataset, the perturbative order and the covariance matrix used. For the fits that include a theory covariance matrix, we also indicate the prescription with which it has been constructed. In the remainder of this section we discuss the main features of these PDF sets.

DIS-only PDFs
We first discuss PDF sets based on DIS data only. Fit quality indicators are collected in Table 6.2; the theory covariance matrix is always constructed using the 9-point prescription. We show the values of χ²/N_dat and of the φ estimator defined in Ref. [4]. The latter is a measure of the size of the uncertainty on the prediction: for an uncorrelated covariance matrix, it reduces to the ratio of the uncertainty in the predictions using the output PDFs to that in the original data, which is then generalized to the correlated case. Results are shown both for the total dataset and for the individual DIS experiments of Table 5.1. Note that the total χ² is no longer just the weighted sum of the individual χ²s, because it now also includes correlations between experiments.
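As a concrete illustration of the φ estimator, the sketch below implements a form consistent with the description above and with its definition in the NNPDF literature, φ² = ⟨χ²[T^(k)]⟩ − χ²[⟨T⟩] with the χ² normalized per data point; the data, covariance, and replica predictions are all hypothetical. For a diagonal covariance this reduces exactly to the root-mean-square ratio of prediction to data uncertainties:

```python
import numpy as np

rng = np.random.default_rng(1)
ndat, nrep = 30, 200   # hypothetical dataset size and replica number

# Hypothetical data, diagonal covariance, and replica predictions T^(k)
D = rng.normal(5.0, 1.0, size=ndat)
C = np.diag(rng.uniform(0.5, 2.0, size=ndat))   # uncorrelated, for illustration
T = D + rng.normal(scale=0.4, size=(nrep, ndat))

Cinv = np.linalg.inv(C)
def chi2_per_point(pred):
    r = pred - D
    return (r @ Cinv @ r) / ndat

# phi^2 = <chi2[T(k)]>_rep - chi2[<T>_rep], chi2 normalized per data point
phi = np.sqrt(np.mean([chi2_per_point(t) for t in T])
              - chi2_per_point(T.mean(axis=0)))

# For a diagonal C this equals the root-mean-square ratio of the
# prediction uncertainty to the data uncertainty:
ratio = np.sqrt(np.mean(T.var(axis=0) / np.diag(C)))
```

The algebraic identity φ² = Tr[C⁻¹ cov(T)]/N_dat makes the equivalence of the two expressions exact for population (ddof = 0) variances.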
It is apparent from Table 6.2 that in all cases the χ² improves when including the theory covariance matrix, both for individual experiments and for the total dataset. Specifically, the χ² decreases by about 2-3% for the NLO and NNLO fits when including a theory covariance matrix S^(9pt) evaluated with the 9-point prescription. Furthermore, the value of φ also slightly decreases upon inclusion of the theory covariance matrix, both for the total dataset and experiment by experiment.
A decrease in the χ² value when including an extra contribution to the covariance matrix is to be expected, so this is essentially a consistency check. However, if the χ² reduction were simply a consequence of having increased uncertainties, one would also expect a corresponding increase in the uncertainty in the theory predictions, i.e. an increase in φ. The fact that both χ² and φ decrease suggests that the inclusion of MHOUs resolves some tensions in the fit which are present when only the experimental covariance matrix is included. It is interesting that for DIS data these theoretical tensions are greater in the NNLO fit than at NLO.

Next we compare PDFs: in Fig. 6.1 we compare the gluon and the total quark singlet PDF at Q = 10 GeV at NLO and NNLO, with and without MHOUs in the covariance matrix, determined using the 9-point prescription. The NLO results are also compared with the central value of the NNLO fit based on the experimental covariance matrix only. Note that in these comparison plots the PDF uncertainty band is always computed using standard NNPDF methodology, i.e. as the standard deviation over the PDF replica sample. Therefore, this uncertainty band has a different meaning depending on whether or not the theory covariance matrix is included: when it is not included, the band represents the conventional "PDF uncertainty", reflecting the uncertainties from the data (and methodology), while when it is included, the band provides the combined "PDF" and MHO uncertainty.
The comparison shows that for PDFs which are strongly constrained by data, such as the quark singlet PDF for x ≳ 10⁻³, the uncertainty does not increase upon inclusion of the theory covariance matrix; in fact it even decreases. This is consistent with the previous observation that the uncertainty on the theory predictions themselves decreases somewhat. In the case of the gluon PDF, which is only loosely constrained by the DIS data, the uncertainty increases substantially with MHOUs, particularly in the NNLO fit. This is of course consistent with the fact that, in the absence of stringent experimental constraints, an extra contribution to the covariance matrix will lead to increased uncertainties in the best fit. So this indirectly supports the conclusion that in the regions where data are abundant the decrease in uncertainty is due to the fact that the theory contribution to the covariance matrix helps to resolve tensions in the fit.
Comparison to the NNLO best fit shows that its central value is clearly compatible with the total uncertainty band of the fit with MHOUs.

Global PDFs
We now discuss PDFs determined from the global dataset presented in Sect. 5.1. Only NLO PDFs will be discussed here, with global NNLO PDFs left for future work. The χ² and φ values are shown in Tables 6.3 and 6.4 respectively, both for the total dataset and for the individual processes of Table 5.1. In comparison to the DIS-only case of Table 6.2, we now also show results obtained using the 7-point and 3-point prescriptions, as well as results for the default 9-point prescription obtained by including the theory covariance matrix either only in the χ² definition Eq. (6.2) or only in the data generation Eq. (6.1), in order to better understand the two distinct effects.
As in the case of the DIS-only fit, on adding the MHOU we find a reduction of χ² both for the global fit and for individual datasets, while the value of φ is roughly unchanged. We again conclude that the inclusion of the theory covariance matrix improves fit quality by removing some tension which would otherwise be present between datasets. Indeed, the total χ² for the NLO fit with MHOU (in the 9-point prescription) is now comparable to that of the NNLO fit. Comparing different prescriptions, results are reasonably stable, even for the 3-point prescription which, as discussed in Sects. 4.3-5.3, spans a much smaller subspace of theory variations. However, the 9-point prescription appears to perform best in terms of χ² quality with very little difference in φ, in agreement with the results of Sect. 5.4.
We finally turn to fits in which the theory covariance matrix is included either in the χ² definition Eq. (6.2) but not in the data generation Eq. (6.1), or in the data generation Eq. (6.1) but not in the χ² definition Eq. (6.2). In the former case, we expect the MHOUs to affect mostly the central values (since the relative weighting of different data points is altered during the fitting according to the relative size of their MHOUs), and to a lesser extent the uncertainties (since the data replicas only fluctuate according to the experimental uncertainties). The results show that indeed including the MHOU in the χ² definition alone leads to a χ² value which is very close to that found when the MHOU is fully included, consistent with the expectation that it is the inclusion of the theory covariance matrix in the χ² which mostly drives the best fit. On the other hand, the φ value is somewhat reduced, due to the relaxation of some of the tensions in the fit, now uncompensated by the greater fluctuation of the replicas. In the latter case, we expect to obtain increased uncertainties but a worse fit, since the data replica fluctuations are wider due to the MHOU, and this is not accounted for in the χ². The results indeed show a significant deterioration of fit quality, as expected for an inconsistent fit: both the χ² and the φ value go up, reflecting the increase in uncertainty due to the inclusion of the MHOU in the sampling, now uncompensated by a rebalancing of the datasets through the inclusion of the MHOU in the fit.
We now move on to discuss the corresponding results at the PDF level, in analogy with the comparisons presented for the DIS-only fits in Fig. 6.1. Specifically, in Fig. 6.2. we show the results of the NLO fits based on C and C + S (9pt) , as well as the central value of the NNLO fit based on C, for the gluon, the total quark singlet, the anti-down quark, and the total strangeness PDFs.
We find that in the data region the PDF uncertainty is not substantially increased by the inclusion of the theory covariance matrix, while central values can shift significantly, by up to one sigma. This is consistent with the observation that the φ values in Table 6.4 do not increase upon inclusion of the theory covariance matrix, and provides evidence that in the data region the inclusion of the theory covariance matrix resolves tensions which are otherwise present in the global dataset. In contrast, for PDFs which are only loosely constrained by the data, and in particular in the extrapolation regions, the PDF uncertainty can increase significantly.
An especially interesting comparison is with the central NNLO value: not only is this quite compatible with the uncertainty band, but there is now clear evidence that upon inclusion of the NLO MHOU the central best fit moves towards the correct NNLO answer. This is further evidence that the theory covariance matrix has indeed resolved tensions due to MHOs. This improved agreement of the central value of the NLO C + S^(9pt) fit with the NNLO C fit is non-trivial: for instance, inclusion of the theory covariance matrix leads to a suppression of the gluon at large x and an enhancement of strangeness, both of which are indeed also observed at NNLO.

Table 6.3 collects the values of χ²/N_dat in NLO global fits with the theory covariance matrix S, compared to the results based on including only the experimental covariance matrix C. Results are shown for the 9-, 7-, and 3-point prescriptions. For the 9-point prescription we also show results obtained including the theory covariance matrix in the χ² definition Eq. (6.2) but not in the data generation Eq. (6.1) (marked S^(9pt)_fit), and in the data generation Eq. (6.1) but not in the χ² definition Eq. (6.2) (marked S^(9pt)_sampl). Values corresponding to the NNLO fit with experimental covariance matrix C only are also shown.

Next, in Fig. 6.3 we compare PDFs obtained using different prescriptions. The corresponding relative PDF uncertainties are compared in Fig. 6.4. In agreement with what we saw for the χ² and φ values in Tables 6.3 and 6.4, results are quite stable with respect to the choice of prescription, though in the most extreme case of the 3-point prescription, where factorization scale variations are entirely uncorrelated between different processes, we observe somewhat smaller uncertainties, and a central value which is closer to that obtained when the MHOU is not included.
Finally, in Fig. 6.5 we compare PDFs obtained including the theory covariance matrix only in the χ² definition Eq. (6.2) but not in the data generation Eq. (6.1), and conversely. We see that when the theory covariance matrix is included in the replica generation but not in the χ², uncertainties increase very significantly. This agrees with the observation from Table 6.3 that in this case the fit quality significantly deteriorates, because the fit becomes inconsistent: the χ² does not account for the wider fluctuations in the data. The effect is particularly visible for the quark distributions. On the other hand, including the theory covariance matrix only in the χ² singles out its effect on central values, due to the rebalancing of data points in the fit according to their relative MHOU. Indeed, in this case the central value is very close to that obtained when the MHOU is included in both the data generation and the fit. We also see that the change in uncertainties in the data region is now very small, consistent with Table 6.3. These results confirm our expectation that in the full fit, while the MHOU results in a substantial increase in the fluctuations of the data replicas, this is compensated by a relaxation of tensions due to the inclusion of the MHOU in the fit, with the net result that while central values shift, overall uncertainties remain relatively unchanged.

Implications for phenomenology
Whereas a full assessment of the impact of the inclusion of MHOU in PDFs will be possible only once global NNLO sets with MHOU are available, it is worth performing a first phenomenological investigation by computing reference LHC standard candles with the NLO PDF sets with MHOUs presented in Sect. 6, and comparing to results obtained with the corresponding NLO PDF sets in which no MHOU is included.
In this section we will specifically consider Higgs boson production in gluon fusion and in vector boson fusion, top quark pair production, and Z and W electroweak gauge boson production. Note that the latter processes are among those which have been used for PDF determination, see Table 5.1. This raises the issue of possible double counting of uncertainties between the MHOU in the PDF and in the hard matrix element, which will be addressed in Sect. 8.1 below.
As discussed in Sect. 6, once the MHOU is included in the covariance matrix, the standard NNPDF methodology can be used, but with the PDF uncertainties now also including a theory-induced contribution. Specifically, PDF uncertainties (which now include the MHOU) are obtained as standard deviations over the replica sample. The total uncertainty on a physical prediction is then obtained by combining this uncertainty with that on the hard cross-section for the given process. The latter is conventionally obtained as the envelope of a 7-point scale variation, see e.g. Ref. [16]. An alternative possibility is to compute the theory uncertainty on the hard cross-section in exactly the same way as in the PDF determination, i.e. using the theory covariance matrix: the MHOU on any prediction is then found as the diagonal element of the covariance matrix evaluated for the given measurement. Here we will compute the theory uncertainty both ways, using the theory covariance matrix (with the 9-point prescription, Eq. (4.8)) and as a 7-point envelope. The MHOU on the hard cross-section can then be combined in quadrature with the total uncertainty on the PDF (which includes both MHOU and data uncertainties). A more detailed discussion of prescriptions for the computation of the total uncertainty on a physical observable, including explicit formulae, will be given in Sect. 8.1 below.

Table 7.1 collects the total cross-sections for Higgs production in gluon fusion at 13 TeV (in pb), obtained using the NLO global PDFs based on either C or C + S^(9pt), see Table 6.1. We quote the central prediction, the total PDF uncertainty (first) and the MHOU on the hard cross-section (second), expressed as percentages of the central value. The latter is evaluated both using the theory covariance matrix (9-point prescription) and, in parenthesis, as a (symmetrised) envelope of the 7-point scale variations (see Sect. 8.1), obtained by taking the maximum of the lower and upper uncertainties.

The current state of the art for precision phenomenology is NNLO, and thus NNLO PDFs would be needed for accurate predictions. However, as discussed in Sect. 6, at present only NLO global PDFs with MHOU are available. In principle, NNLO PDFs from a DIS-only fit are also available; however, as also discussed in Sect. 6, some of these PDFs (specifically the gluon) are affected by large uncertainties due to the lack of experimental constraints, so that a comparison of PDFs with and without MHOU for such sets would be rather misleading. Therefore, in this section we will focus on NLO PDFs. It should of course be kept in mind that NNLO PDFs with MHOU are likely to have smaller uncertainties.
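The quadrature combination just described can be sketched numerically. All numbers below are hypothetical placeholders, not results from the paper; the symmetrized 7-point envelope stands in for the MHOU on the hard cross-section:

```python
import numpy as np

# Hypothetical replica predictions for some cross-section (pb); the standard
# deviation over replicas gives the PDF (+ MHOU) uncertainty.
sigma_rep = np.array([48.2, 47.9, 48.6, 48.1, 47.7, 48.4])
sigma0 = sigma_rep.mean()
sigma_pdf = sigma_rep.std()

# Hypothetical 7-point scale-variation results for the hard cross-section;
# the symmetrized envelope takes the larger of the up and down excursions.
scale_vars = np.array([47.5, 49.0, 48.3, 47.8, 48.9, 47.6, 48.2])
sigma_th = max(scale_vars.max() - sigma0, sigma0 - scale_vars.min())

# Total uncertainty: PDF (+ MHOU) uncertainty and hard-cross-section MHOU
# combined in quadrature
sigma_tot = np.hypot(sigma_pdf, sigma_th)
```

In the theory-covariance-matrix alternative, `sigma_th` would instead be the square root of the diagonal element of S evaluated for this prediction.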

Higgs production
We first discuss Higgs production in gluon fusion (ggF) and in vector boson fusion (VBF). These two processes are of direct relevance for the characterization of the Higgs sector, and both are currently known at N³LO accuracy [79-82]. Note that the perturbative behavior and leading partonic channels of these processes are quite different: Higgs production in gluon fusion is driven by the gluon-gluon luminosity and its perturbative expansion converges slowly, with manifest convergence reached only at N³LO, while vector boson fusion is driven by the quark-antiquark luminosity and exhibits fast perturbative convergence.
In Table 7.1 we present predictions for Higgs production in gluon fusion at the LHC for √s = 13 TeV. We perform the calculation at NLO, NNLO and N³LO in the rescaled effective theory approximation using ggHiggs [83-88] with µ_f = µ_r = m_H/2 as the central scale, using the NLO global sets obtained in this paper, with and without MHOUs, as input PDFs at all orders. The results are displayed graphically in Fig. 7.1.
We find that at all perturbative orders the central values obtained with PDFs with and without MHOU are very similar, while the PDF uncertainty is about 50% larger when MHOUs are included in the PDF fit. This can be understood by noticing that at the intermediate values of the momentum fraction, x ∼ 10⁻², relevant for Higgs production in gluon fusion, the PDF uncertainty of the gluon is increased in the C + S^(9pt) fit as compared to the C-only fit. From Table 7.1 one can also observe that the MHOU on the hard matrix element, σ^th_F, evaluated using the 9-point theory covariance matrix, Eq. (4.8), is compatible with the canonical 7-point envelope if the latter is symmetrised by taking the maximum of the lower and upper uncertainties. In particular, the theory covariance matrix estimate is slightly larger than the envelope prescription at NLO and at NNLO, while it becomes a little smaller at N³LO. Even so, the NLO uncertainty band does not contain the NNLO central value, which lies just above the edge of the band.
We conclude that, using NLO PDFs in the N³LO calculation, the inclusion of MHOU in the PDFs translates into a few per-mille increase of the PDF uncertainty at the cross-section level. In Ref. [80] NNLO PDFs were used with the N³LO calculation in order to provide a state-of-the-art result, and a MHOU on the NNLO PDF was estimated based on the difference between results obtained using NLO and NNLO PDFs. Once NNLO PDFs with MHOUs determined within our approach are available, it will be interesting to compare our results with this estimate.

We now turn to Higgs production in vector boson fusion. We perform the calculation at N³LO accuracy using proVBFH-inclusive [82,89], with the central factorization and renormalization scales set equal to the four-momentum transfer of the vector boson. Results are collected in Table 7.2 and shown in Fig. 7.1. The MHOU corrections to the PDFs are very small, so PDF uncertainties with or without the theory covariance matrix are very similar. Also in this case, as for gluon fusion, the uncertainty on the hard matrix element computed with the 9-point theory covariance matrix is similar to that obtained by symmetrizing the 7-point envelope.
The smallness of the MHOU in the PDF follows from the fact that VBF Higgs production is driven by the quark-antiquark luminosity, which in turn is dominated by the quark PDFs in the data region, whose uncertainties, as we have seen in Sect. 6.2, are almost unaffected by the inclusion of MHOU. On the other hand, in Sect. 6.2 we have also seen that MHOUs have the effect of moving the central values of the PDFs in the data region towards the NNLO result. Indeed, the shift in the central value of the VBF cross-section due to the MHOU turns out to be significant: it is comparable to the MHOU σ^th_F on the NLO matrix element, and to the shift when going from NLO to NNLO matrix elements, and thus much larger than the corresponding N³LO correction.
We conclude that for VBF the main effect of including the MHOU in the PDF is a significant shift in the central value of the prediction. Also in this case estimates of the MHOU on the NNLO PDF were presented in Ref. [82], and it will be interesting to compare them to our approach once NNLO PDFs with MHOU determined within our approach are available.

Top quark pair production
We now study the impact of the PDF-related MHOU on the total top-quark pair production cross-section at the LHC for different center-of-mass energies. In Table 7.3 we collect, using the same format as Table 7.1, the predictions for the top-quark pair-production cross-sections at √s = 7, 8 and 13 TeV, obtained using the top++ code [90] and setting the central scales to µ_f = µ_r = m_t = 172.5 GeV. The results for 8 and 13 TeV are also displayed graphically.

Just as in the case of Higgs production via gluon fusion, we find that for top-quark pair production the central values obtained with PDFs with and without MHOU are rather similar, and well within the one-σ PDF uncertainty. We also observe that the PDF uncertainty at √s = 7 and 8 TeV (13 TeV) is about 50% (20%) larger once MHOUs are included in the determination of the PDFs. This is again compatible with the corresponding behavior of the gluon PDF shown in Fig. 6.2, where it can be observed that, for x ∼ 0.1, relevant for top pair production at √s = 7 and 8 TeV, the PDF uncertainty is increased in the C + S^(9pt) fit as compared to the C-only fit, while this increase is less marked for x ∼ 0.3, relevant for top pair production at √s = 13 TeV. In addition, we note once again that the uncertainty on the hard cross-section σ^th_F evaluated using the 9-point covariance matrix is rather similar to that obtained from the symmetrised 7-point envelope: in particular, the 9-point result is slightly larger (smaller) than the 7-point envelope at NNLO (NLO). Finally, from Fig. 7.2 we notice that for this process the MHOU on the hard cross-section dominates over the PDF uncertainty (with or without MHOU included), even with NLO PDFs.

Z and W gauge boson production
We finally turn to gauge boson production, for which we obtain predictions using the computational framework Matrix [91]. In this formalism, all tree-level and one-loop amplitudes are obtained from OpenLoops [92-94]. For these theoretical predictions for inclusive W and Z production cross-sections at √s = 13 TeV, we adopt realistic kinematic cuts similar to those applied by ATLAS and CMS. The fiducial phase space for the W± cross-section is defined by requiring p_T^l ≥ 25 GeV and |η_l| ≤ 2.5 for the charged lepton transverse momentum and pseudorapidity, and a missing energy from the neutrino of p_T^ν ≥ 25 GeV. In the case of Z production, we require p_T^l ≥ 25 GeV and |η_l| ≤ 2.5 for the charged lepton transverse momenta and pseudorapidities, and 66 ≤ m_ll ≤ 116 GeV for the di-lepton invariant mass.
In Table 7.4 we display a similar comparison as in Table 7.1 now for W and Z gauge boson production at √ s = 13 TeV. The corresponding graphical representation of the results is provided in Fig. 7.3, again using the same conventions as in Fig. 7.1.
We find that when including the MHOU the PDF uncertainty is increased by 70%, 30% and 75% for Z, W⁺, and W⁻ production respectively. Given that W and Z production at ATLAS and CMS at √s = 13 TeV is sensitive to the light sea quarks down to x ∼ 10⁻³, this increase in the PDF uncertainty once MHOUs are accounted for is consistent with the corresponding increase reported for the singlet PDF in Fig. 6.4.
Similarly to Higgs production in vector-boson-fusion, we find that the inclusion of MHOU in the PDF shifts the central value of the prediction, by an amount which is comparable to or larger than the data-driven PDF uncertainty. We conclude that for weak gauge boson production at the LHC the impact of the MHOU associated to the PDFs is twofold: on the one hand an overall increase in the PDF uncertainties that ranges between 30% and 70% depending on the process, and on the other hand a shift in the central values which is comparable to that of the PDF uncertainties of the fit without MHOU.

Usage and delivery
As mentioned previously, the PDF sets with MHOU presented in Sect. 6 can be used in essentially the same way as the standard NNPDF sets. In this section we discuss how MHOUs included in PDF sets should be combined with those in hard matrix elements, specifically addressing some conceptual issues, and we then provide detailed instructions for their use. We then discuss the delivery of the PDF sets presented in this work, and provide a list of the sets which are being made publicly available by means of the LHAPDF interface.

Combining MHOUs in PDFs and hard matrix elements
As discussed in the introduction, the MHOU on PDFs discussed in this paper arises because PDFs are determined using perturbative computations performed at finite order in the perturbative expansion, and it manifests itself in the fact that PDFs change when the order at which they are determined is varied: in other words, NLO and NNLO PDFs differ. We have further seen in Sect. 3 that there exist two distinct sources of MHOU in the PDFs: that related to MHOs in the computation of the hard cross-sections for the processes used for PDF determination, and that coming from MHOs in the anomalous dimensions. These two sources of MHOU in the PDFs are respectively associated with renormalization and factorization scale variation, and can be treated as independent of each other, at least with the definition given here and summarized in Table 3.1.

Thus when considering a factorised prediction for a PDF-dependent hard process not used in the determination of the PDFs, but rather predicted using a given PDF set, there are two independent sources of MHOU: those specific to the hard process itself (in the hard cross-section for the process and in the evolution of the PDF to the scale of the process), and those in the PDFs (due to the MHOUs in the hard cross-sections and evolution kernels used to determine the PDFs). Clearly, each of these two sources receives contributions from both renormalization and factorization scale variation. This immediately raises the question of whether some of these uncertainties are correlated, and if so, whether and how this correlation should be taken into account.
A first obvious source of correlation arises when producing a prediction for a process which is among those included in the PDF determination. Examples in this category are top quark pair and gauge boson production, discussed in Sect. 7 and already included among the processes of Table 5.1. The MHOU coming from renormalization scale variation is then in principle correlated between the prediction and the PDF. Indeed, we know from Fig. 5.3 that any two predictions for the same physical process are highly correlated. This correlation is not taken into account if the MHOU in the PDF and in the hard matrix element are simply combined in quadrature, as will be recommended in Sect. 8.2. The correlation is mostly positive, e.g. between predictions for the same process at points which are kinematically close, so neglecting it yields a conservative estimate of the uncertainty; but it may also be negative, for points in different regions of phase space, or perhaps for different observables related to the same process, such as a rapidity and a transverse momentum distribution.
Given that we have demonstrated that the MHOU on PDFs is generally small, and that most individual processes give a relatively small contribution to a global fit based on a wide dataset (see e.g. Ref. [5]), neglecting this correlation is likely to have a very small or even negligible effect. However, if a particular process should be predicted in a way that is entirely free of this problem, the only completely safe procedure is to remove that process from the dataset used for the PDF determination. The situation is akin to what happens when one wishes to avoid the prediction for a given process being biased by the inclusion of that same process in the PDF determination. Indeed, several PDF sets in which specific datasets are removed were presented and discussed in Ref. [5], and additional custom-made sets excluding particular datasets have been provided for specific applications.
A potentially more serious problem was recently raised in Ref. [13], where it was pointed out that the factorization scale dependence of the PDF and of the hard matrix element are strongly correlated, and in fact fully correlated for a process (such as a nonsinglet deep-inelastic structure function) which depends on a single PDF. This is easily understood by recalling that factorization scale variation estimates the MHOU in the anomalous dimensions, and thus in perturbative evolution. Consider then a PDF set in which all PDFs are parametrised at a certain scale Q0, and assume for the sake of argument that one wishes to predict a hard process whose scale Q coincides with Q0. Clearly there should then be no evolution uncertainty, and thus one should not vary the factorization scale in the hard matrix element. This is the limiting case of the situations discussed in Ref. [13]: the MHOU related to factorization scale variation, being associated with perturbative evolution, is in principle correlated between the processes used for PDF determination and the process being predicted, in a way that depends only on the evolution length.
However, further reflection shows that this correlation is always lost in realistic PDF determinations, because PDF sets are parametrised at a (typically low) scale Q0 which does not coincide with the scale of typical hard processes, and which in most cases actually lies below the minimum kinematic cut Qmin. This implies that the information on the correlation of the evolution uncertainty when evolving back to the scale of the hard processes is lost when the PDF set is constructed at Q0. In principle, one could produce various PDF sets, each parametrised at the scale of an individual process, such as "Higgs PDFs" with Q0 = mH/2 (the typical scale of Higgs production), "Drell-Yan PDFs" with Q0 = mZ, and so on. Predictions obtained with these PDFs for the corresponding processes would then have all factorization scale uncertainties (due to evolving from the scales of the data used in the PDF fit to Q0) already incorporated in the PDF uncertainty, so that the only additional uncertainty in the prediction would be that from renormalization scale variation of the hard cross-section. The price to pay, however, would be the loss of PDF universality: the "Higgs PDFs" and "Drell-Yan PDFs" would differ, in the sense that one could no longer be obtained from the other entirely through fixed-order evolution. Note that the correlation of the factorization scale between all processes used for PDF determination is instead fully accounted for by our formalism, and is included in the theory covariance matrix in any prescription (such as our default 9-point prescription) in which factorization scale variation is fully correlated. Therefore, in practice, by neglecting the correlation in factorization scale between the evolution to Q0 in the PDF determination and the evolution from Q0 when making a prediction, we at worst overestimate the MHOU. Given the moderate effect of MHOUs on PDFs, this is likely to be a small effect.
We conclude that the uncorrelated combination of the MHOU on the PDFs with the MHOU on the hard matrix element of the predicted process is both pragmatic and realistic, especially given the well-known uncertainties intrinsic to any estimation of MHOUs.

Computation of the total uncertainty
Having concluded that uncorrelated combination of the MHOU on the PDF and on the hard matrix element is justified, we summarize our procedure for computing uncertainties in practice.
To begin with, the PDF uncertainty σ PDF F associated with a given cross-section F is evaluated as usual in the NNPDF methodology as the standard deviation over the replica set,

σ^PDF_F = [ 1/(N_rep − 1) Σ_{k=1}^{N_rep} ( F[f^(k)] − ⟨F⟩ )² ]^{1/2} ,  (8.1)

where ⟨F⟩ denotes the mean over the N_rep replicas. If this prescription is applied to a PDF set with "standard" PDF uncertainty (such as the published NNPDF3.1 set [5]), the resulting uncertainty only includes the correlated statistical and systematic uncertainties from the data, together with the methodological uncertainty intrinsic to any PDF fit. If instead the PDF sets including MHOU presented in Sect. 6 of this paper are used, the uncertainty obtained from Eq. (8.1) accounts for both the data-driven uncertainty and the MHOU on the PDF, with all correlations taken into account. Because the MHOU on the hard matrix element is treated as uncorrelated with the PDF uncertainty, it can in principle be computed with any prescription preferred by the end-user; a commonly used one is 7-point scale variation [16]. Our preferred prescription is instead to use the same methodology as for the computation of the theory covariance matrix. In this case, the MHOU on the cross-section F is simply the square root of the corresponding diagonal entry of the covariance matrix,

σ^th_F = [ S^(9pt)_FF ]^{1/2} ,  (8.2)

where S^(9pt)_FF is evaluated using our default 9-point prescription, Eq. (4.8), with ∆ ij computed for i = j = F, i.e. for the theory prediction of the given observable. We showed in Sect. 7 that for various standard candles the 9-point theory covariance matrix prescription and the 7-point envelope prescription give very similar results, provided the envelope is symmetrized.
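As an illustration, the replica standard deviation and a diagonal theory-covariance entry can be sketched as follows. This is a minimal sketch with toy numbers: the helper names are ours, and the theory estimate uses a plain root-mean-square of the scale-variation shifts rather than the exact normalization of the 9-point prescription of Eq. (4.8).

```python
import numpy as np

def sigma_pdf(replica_predictions):
    """PDF uncertainty on a cross-section F: standard deviation over
    the Monte Carlo replica set (unbiased, N_rep - 1 normalization)."""
    return float(np.std(replica_predictions, ddof=1))

def sigma_theory(shifts):
    """Schematic MHOU on F from scale-variation shifts
    Delta_F = F(k_f, k_r) - F(1, 1); a plain root-mean-square,
    NOT the exact per-point weights of the 9-point prescription."""
    shifts = np.asarray(shifts, dtype=float)
    return float(np.sqrt(np.mean(shifts ** 2)))

# Toy numbers for a single cross-section (illustrative only).
replicas = [10.1, 9.8, 10.3, 10.0, 9.9]
shifts = [0.1, -0.1, 0.2, -0.2]
print(sigma_pdf(replicas), sigma_theory(shifts))
```

In a real application the replica predictions would come from evaluating the observable on each member of the LHAPDF set, and the shifts from recomputing it with the scale-varied theories.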
The PDF uncertainty, Eq. (8.1), and the MHOU on the hard matrix element, Eq. (8.2), can then be combined as uncorrelated uncertainties, e.g. in quadrature, so that the total uncertainty on the cross-section F is simply

σ^tot_F = [ (σ^PDF_F)² + (σ^th_F)² ]^{1/2} .  (8.3)

Note that when using a χ² to assess the agreement between experimental data and the corresponding theory predictions for a PDF set which includes MHOUs, the MHOU must always be included in the definition of the χ² estimator, ideally (though not necessarily) by means of the theory covariance matrix. This is because, as seen in Sect. 6.2, the inclusion of MHOU modifies the best-fit central value: if the MHOU were not included in the χ², these PDFs would not provide the best fit, and the results might be misleading. In this sense, the theory covariance matrix should be regarded as an additional systematic uncertainty, specific to the determination of PDFs from data, to be added in quadrature to the usual experimental systematics.
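The quadrature combination and the corresponding χ² estimator can be sketched as follows; this is a minimal illustration with hypothetical helper names and toy numbers, not the NNPDF implementation.

```python
import numpy as np

def total_uncertainty(sigma_pdf, sigma_th):
    """Uncorrelated combination in quadrature of the PDF uncertainty
    and the MHOU on the hard matrix element."""
    return float(np.hypot(sigma_pdf, sigma_th))

def chi2_with_mhou(data, theory, C_exp, S):
    """chi^2 in which the theory covariance matrix S is added to the
    experimental covariance matrix C_exp before inversion."""
    d = data - theory
    return float(d @ np.linalg.solve(C_exp + S, d))

# Toy two-point example (numbers are illustrative only).
data = np.array([1.0, 2.0])
theory = np.array([0.0, 0.0])
C_exp = np.eye(2)
S = np.eye(2)  # pretend theory covariance
print(total_uncertainty(3.0, 4.0))             # 5.0
print(chi2_with_mhou(data, theory, C_exp, S))  # 2.5
```

Note that adding S inside the inverse, rather than inflating the diagonal a posteriori, is what preserves the correlations between data points induced by the MHOU.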

Delivery
The variants of the NNPDF3.1 NLO global sets presented in this work are publicly available in the LHAPDF format [20] from the NNPDF website: http://nnpdf.mi.infn.it/nnpdf3-1th/ In the following we list the PDF sets that are made available. The NLO sets based on the theory covariance matrix are:

NNPDF31_nlo_as_0118_scalecov_9pt
NNPDF31_nlo_as_0118_scalecov_7pt
NNPDF31_nlo_as_0118_scalecov_3pt

which correspond to fits based on Eq. (6.2), with the theory covariance matrix S ij evaluated with the 9-, 7-, and 3-point prescriptions, respectively.
We have also constructed NLO PDF sets based on scale-varied theories, discussed in Appendix B below. These are determined using Eq. (B.1), and are:

NNPDF31_nlo_as_0118_kF_1_kR_1
NNPDF31_nlo_as_0118_kF_2_kR_2
NNPDF31_nlo_as_0118_kF_0p5_kR_0p5
NNPDF31_nlo_as_0118_kF_2_kR_1
NNPDF31_nlo_as_0118_kF_1_kR_2
NNPDF31_nlo_as_0118_kF_0p5_kR_1
NNPDF31_nlo_as_0118_kF_1_kR_0p5

where the naming convention indicates the values of the scale ratios k f and k r. Note that the NNPDF31_nlo_as_0118_kF_1_kR_1 set is also the baseline (central scales and experimental covariance matrix only) to be used in comparisons with the fits based on the theory covariance matrix listed above. Finally, we also provide the set

NNPDF31_nnlo_as_0118_kF_1_kR_1

which corresponds to the NNLO fit with central scales and experimental covariance matrix only, produced for validation purposes.
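For illustration, the (k f, k r) values encoded in these names, where "p" stands for the decimal point and the separators are underscores as in the LHAPDF names, can be decoded with a small helper of our own devising:

```python
def parse_scales(setname):
    """Decode the (k_f, k_r) scale ratios from a scale-varied set name,
    e.g. 'NNPDF31_nlo_as_0118_kF_0p5_kR_2'. This helper is ours, not
    part of LHAPDF or the NNPDF code."""
    tokens = setname.split("_")
    def ratio(tag):
        # The value follows its tag; 'p' encodes the decimal point.
        return float(tokens[tokens.index(tag) + 1].replace("p", "."))
    return ratio("kF"), ratio("kR")

print(parse_scales("NNPDF31_nlo_as_0118_kF_0p5_kR_2"))  # (0.5, 2.0)
```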
It is important to bear in mind that the variants of the NNPDF3.1 fits presented in this work are based on a somewhat different dataset from that used in the default NNPDF3.1 analysis. When using these sets it is therefore important to be consistent, for example by comparing fits with and without MHOU that are based on a common input dataset.
In addition to the sets listed above, the other PDF sets presented in this paper, such as the DIS-only fits based on scale-varied calculations and on the theory covariance matrix, are available from the authors upon request.

Summary and outlook
In this work we have presented the first PDF determination that includes MHOU as part of the PDF uncertainty. This is in principle required for consistency, given that MHOU are routinely part of the theoretical predictions for hadron collider processes, and likely to become a requirement for precision collider phenomenology as other sources of uncertainties decrease.
The bulk of our work amounted to establishing a general language and formalism for the inclusion of MHOU when multiple processes are considered at once in a global PDF fit, constructing prescriptions for estimating these MHOU by means of scale variation, and validating them in cases in which the higher order corrections are known. The formalism presented here is sufficiently flexible that it can also be applied to other sources of theoretical uncertainty, such as nuclear corrections or higher twists, and could be used in conjunction with alternative ways of estimating the MHOU, such as the Cacciari-Houdeau method.
The validation studies presented here suggest, however, that the conventional scale variation method for estimating the MHOU works remarkably well. Indeed, when coupled to the theory covariance matrix formalism that we have introduced, this method turns out to be free of the instabilities that plague envelope techniques, and it leads to results which appear to be reasonably stable, and thus insensitive to the arbitrary choices that are inherent to its implementation. The reason for these properties is essentially that, within a covariance matrix approach, possible directions which do not correspond to actual MHOs have no impact on the fitting.
Our results, however, also suggest that even more realistic estimates of MHOU might be obtained through more complex patterns of scale variation than those considered here. For example, it might be advantageous to vary independently the renormalization scales in different partonic sub-channels, or the factorization scales for the singlet, nonsinglet and valence combinations that evolve independently. Indeed, we have observed that the validation of our estimate of MHOU, while always reasonably successful for the datasets considered here, deteriorates as the size of the dataset increases, which suggests that more complex structures might be advantageous. Here we have performed a first investigation; the exploration of these more complex patterns of scale variation is left for future work.
On the phenomenological side, our results show that, at least at NLO, the main effect of the inclusion of MHOU in a PDF determination is to improve the accuracy of the result without significantly reducing its precision. Indeed, whenever the experimental information is abundant, and in particular for a global dataset, we have found that the total PDF uncertainty is only moderately affected by the inclusion of MHOU (in fact, for the datapoints included in the PDF determination it even decreases), while the central value moves closer to the true result. Moreover, the fit quality improves, showing that the main effect of the inclusion of MHOU is to reduce tensions between datasets due to imperfections in their theoretical description.
The most interesting future phenomenological development will be of course the extension of our methodology to the determination of MHOU in a state-of-the-art global NNLO PDF set. It will be interesting to assess to what extent the behaviour observed at NLO persists there. More generally, the inclusion of MHOU at NNLO is expected to lead to the most precise and accurate PDF sets that can be determined with currently available theoretical and experimental information.

B PDF sets based on scale-varied theories

[Table B.1: the PDF sets based on scale-varied theories: for each set, the input dataset, the perturbative order, and the scale ratios k f = µ f /Q and k r = µ r /Q.]

As we shall see, whereas this approach provides an independent way of assessing the dependence of PDFs on the scale choice, it does not provide a stable way of estimating MHOUs. Nevertheless, even though these PDF sets do not appear to be advantageous for estimating MHOUs, they are interesting in their own right, and they are presented here, especially in view of the fact that PDF sets based on systematic scale variation of the underlying theory have never been presented before.
Based on the experimental and theoretical settings described in Sect. 5.1, we have produced a number of PDF sets with different choices for k r and k f , input dataset, and perturbative order, which are summarized in Table B.1. The PDF sets corresponding to the central scale choices are the same as discussed in Sect. 6. In the same way as in Sect. 6, here we determine PDFs at NLO and NNLO from a DIS-only dataset, and NLO PDFs from a global dataset.
In all these PDF determinations, the factorization and renormalization scales are varied in a fully correlated way across all datasets: for example, if k r = 2, then the renormalization scale is taken to be twice its default value for all processes. This immediately exposes a defect of the method: in principle, the MHOUs in the hard cross-sections of different processes are uncorrelated. However, uncorrelated variations of k r across the five processes we consider, combined with the variations of k f, would require 3^6 fits of each type, or O(70,000) replicas, which is of course impractical.

[Table B.2: the values of χ²/N dat for the PDF sets based on scale-varied theories, for the nine combinations of scale variations (k f, k r) listed in Table B.1; for each dataset, the number of data points after cuts is also indicated.]
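The counting can be checked explicitly; the figure of roughly 100 replicas per fit is our assumption, chosen only to reproduce the quoted order of magnitude.

```python
# Three values for each of the five process-specific renormalization
# scales, plus the factorization scale: 3**6 fits of each type.
# Assuming ~100 replicas per fit (our assumption):
n_scales = 6            # five k_r variations plus one k_f
n_fits = 3 ** n_scales
n_replicas = 100 * n_fits
print(n_fits, n_replicas)  # 729 72900
```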

B.1 DIS-only PDFs
In Table B.2 we collect the fit quality of the DIS-only fits. The scale choice (k f, k r) = (2, 1/2) leads to a much larger χ² than any other choice. This choice, which involves a large ratio between the two scales, is typically excluded when estimating scale uncertainties from an envelope prescription. Note however that the reciprocal choice (1/2, 2), which is also usually discarded for the same reason, leads to a χ² which is not particularly large. Variations of the χ² values as the scales are varied are more marked for the HERA experiments than for the fixed-target ones, consistent with the observation that scale variations are larger in the small-x region covered by the HERA data.
We next assess the impact of scale variation on the PDFs. The impact on PDF uncertainties turns out to be moderate, and thus we concentrate on central values. In Fig. B.1 we compare the central values of the NLO and NNLO DIS-only PDFs obtained with the values of (k f , k r ) listed in Table B.1, normalized to the (k f , k r ) = (1, 1) baseline. The gluon, total quark singlet and down antiquark are shown at Q = 10 GeV. The behaviour for other quark flavors is similar.
Scale variation for the gluon turns out to be rather asymmetric, with all scale choices leading to a gluon which lies below the central one for x ≲ 10⁻². The spread is similar at NLO and NNLO, with only a slight reduction in the latter case. The scale choice (k f, k r) = (2, 1/2), which as already noted leads to a much worse χ² value, appears to lead to a rather unstable PDF. In the singlet case the spread of scale variations about the central choice is rather more symmetric, with a clear reduction when going from NLO to NNLO. Whereas for the gluon the spread is considerable for all x values, for the quark singlet it becomes quite small at large x ≳ 0.1. The behaviour of the down antiquark is similar to that of the singlet. In general, the sensitivity of the results to scale variation appears to be directly linked to the data-driven PDF uncertainty, with a much wider spread observed whenever the information coming from the data is limited and PDF uncertainties are large.

B.2 Global PDFs
We now turn to PDF fits based on the global dataset. As we will show, the use of a global dataset not only reduces the PDF uncertainties, but also the relative impact of varying the scales as compared to the DIS-only fits, consistent with previous results [95] showing that the perturbative stability of PDFs improves as the size of the dataset used for their determination grows.
In Table B.3 we collect the corresponding fit quality for the global fits. Also in this case, the best fit is found for the central scale choice, with all other scale choices leading to a somewhat worse χ² value, and the scale choice (2, 1/2) leading to a much worse fit quality. All scale choices at NLO give a worse fit than the NNLO fit, due to the fact that NNLO corrections are needed for a good description of several high-precision LHC datasets [5].
The corresponding PDFs are shown in Fig

B.3 The envelope prescription for MHOU
We would now like to assess the possibility of using PDF sets obtained with different choices of the renormalization and factorization scales as a means to estimate MHOUs. We first compare systematically the fit quality as a function of the scale choice. In order to facilitate this comparison we evaluate

Δ⁺χ² ≡ χ²_max − χ²_central ,  Δ⁻χ² ≡ χ²_min − χ²_central ,

where χ²_max (χ²_min) denotes the largest (smallest) value of the χ², for either the total dataset or for individual experiments, among all the fits based on scale-varied theories listed in Tables B.2 and B.3. Results are collected in Table B.4, both for individual experiments and for the total dataset. From this comparison we see that for DIS-only PDFs the HERA data drive the differences in χ² values, with rather smaller contributions from the fixed-target experiments. In the case of the NNLO DIS-only fit, the spread in the χ² values due to the scale-varied theories for the HERA inclusive data is markedly reduced compared to the NLO case, consistent with the reduction of MHOUs when increasing the perturbative order of the calculation.
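Schematically, with toy χ² values (illustrative numbers only; the helper name is ours):

```python
def chi2_spread(chi2_by_scale, central=(1.0, 1.0)):
    """Return (max - central, min - central) of the chi^2 values of the
    scale-varied fits relative to the central-scale fit."""
    c0 = chi2_by_scale[central]
    values = chi2_by_scale.values()
    return max(values) - c0, min(values) - c0

# Illustrative chi^2/N_dat values for four of the nine scale choices.
chi2 = {(1.0, 1.0): 1.10, (2.0, 2.0): 1.25,
        (0.5, 0.5): 1.18, (2.0, 0.5): 1.90}
dplus, dminus = chi2_spread(chi2)
```

With these toy inputs the spread is dominated by the (2, 1/2) entry, mirroring the pattern seen in the actual fits.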
The NLO global fits behave in a similar way, with the HERA data still dominating χ 2 differences. However, a marked χ 2 spread is now also seen for the LHC experiments. This shows that the precise collider (HERA and LHC) data are most sensitive to higher order corrections. As in the case of DIS-only fits, for almost all experiments the central scale choice (k f , k r ) = (1, 1) provides the best overall description of the various datasets.
The fact that, with the exception of the combination (k f, k r) = (2, 1/2), all scale choices lead to PDFs in reasonable agreement with the data suggests that an estimate of the MHOU might be obtained by taking an envelope of PDFs determined with different scale choices. We consider in particular three such envelopes: the 9-point envelope, in which all the scale choices of Table B.1 are included; the 7-point envelope, in which the two choices (k f, k r) = (2, 1/2) and (k f, k r) = (1/2, 2) are removed from the 9-point envelope; and the 3-point envelope, in which only the two choices (k f, k r) = (2, 2) and (k f, k r) = (1/2, 1/2) are considered together with the central scale choice. These envelopes are shown in Fig. B.2. It is clear that the size of the envelope is extremely sensitive to the choice of scales to be included. Of course, by construction, an envelope including more scale choices always leads to a wider band than one with fewer choices. Specifically, because of the instability of the results found with (k f, k r) = (2, 1/2), the 9-point envelope leads to a very wide uncertainty band.
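The sensitivity of the envelope to the inclusion of the (2, 1/2) and (1/2, 2) points can be illustrated with toy numbers (all values hypothetical):

```python
import numpy as np

# Toy central values of a PDF at two x points, one entry per (k_f, k_r)
# scale choice (all numbers hypothetical; (2, 0.5) made deliberately
# unstable, as observed in the fits).
preds = {
    (1.0, 1.0): [1.00, 2.00], (2.0, 2.0): [1.05, 1.90],
    (0.5, 0.5): [0.97, 2.10], (2.0, 1.0): [1.02, 1.95],
    (1.0, 2.0): [1.01, 2.05], (0.5, 1.0): [0.99, 2.02],
    (1.0, 0.5): [1.00, 1.98], (2.0, 0.5): [1.30, 1.60],
    (0.5, 2.0): [0.95, 2.20],
}

def envelope(keys):
    """Pointwise min/max over the selected scale choices."""
    arr = np.array([preds[k] for k in keys])
    return arr.min(axis=0), arr.max(axis=0)

nine = list(preds)                                              # 9-point
seven = [k for k in nine if k not in {(2.0, 0.5), (0.5, 2.0)}]  # 7-point
three = [(1.0, 1.0), (2.0, 2.0), (0.5, 0.5)]                    # 3-point

lo9, hi9 = envelope(nine)
lo7, hi7 = envelope(seven)
```

Here the 9-point band is driven entirely by the unstable (2, 1/2) point, while the 7-point band stays close to the bulk of the scale choices.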
Further insight on the envelope method can be obtained by comparing the envelope of scale variations, taken as a candidate MHOU, for DIS-only and global PDF sets. This is done in Fig. B.3, where the standard, data-driven PDF uncertainties are also shown for reference. It is clear that the size of the envelope significantly decreases when going from DIS-only to global PDFs, consistent with what was observed in Sect. 6. However, the envelope estimates of MHOUs appear to be quite large in comparison to PDF uncertainties, and unstable upon changes in dataset.
In the case of DIS-only PDFs we can also study the behaviour of the envelope as a function of the perturbative order. This is done in Fig. B.4. The size of the envelope is reduced when going from NLO to NNLO, as one would expect. On the other hand, the results again appear to be quite unstable, with large instabilities observed for the d̄(x, Q²) PDF at large x.
We can finally ask how MHOU estimated from an envelope could be combined with the data-driven PDF uncertainties. Unlike in the case of the covariance matrix approach discussed in Sect. 6, in which results are found by using default NNPDF methodology, but with an extra contribution added to the covariance matrix, here we need a prescription for the combination of MHOU (obtained from the envelope of scale variations) and data-driven PDF uncertainties, obtained using standard NNPDF methodology with a purely experimental covariance matrix.
A possible prescription is to take as the total uncertainty on the PDFs the sum in quadrature of the envelope MHOU and the standard PDF uncertainty. Taking into account that these envelopes are asymmetric, the prescription is then

σ^{tot,±} = [ (σ^{mho,±})² + (σ^{PDF})² ]^{1/2} ,  (B.3)

where σ^{mho,+} (σ^{mho,−}) indicates the upper (lower) limit of the envelope, and σ^{PDF} is the standard PDF uncertainty.
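A minimal sketch of this asymmetric combination (the helper name and the numbers are ours):

```python
def total_envelope_uncertainty(sigma_mho_plus, sigma_mho_minus, sigma_pdf):
    """Asymmetric total uncertainty in the spirit of Eq. (B.3):
    quadrature sum of the upper/lower envelope MHOU with the
    (symmetric) data-driven PDF uncertainty."""
    up = (sigma_mho_plus ** 2 + sigma_pdf ** 2) ** 0.5
    down = (sigma_mho_minus ** 2 + sigma_pdf ** 2) ** 0.5
    return up, down

# Toy relative uncertainties (illustrative only).
up, down = total_envelope_uncertainty(0.03, 0.04, 0.04)
```

Note that, unlike the covariance matrix approach, this combination has to be postulated: the envelope carries no statistical interpretation that fixes it.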
In Fig. B.5 we show the uncertainties σ^{tot,±}, σ^{mho,±} and σ^{PDF} entering Eq. (B.3), using the 7-point envelope, for the gluon, the quark singlet, the down antiquark, and the charm PDFs, all normalized to the central value. In Fig. B.6 we further compare, for the same PDF combinations, the total uncertainties obtained with the envelope method and shown in Fig. B.5 to the total uncertainties obtained with our theory covariance matrix methodology and shown in Fig. 6.2, all normalized to the central curve of the envelope method. The NNLO central curve (with experimental covariance matrix only) is also shown. Results are obtained using the baseline settings: the 7-point prescription for the envelope method and the 9-point prescription for the theory covariance matrix.
It is clear that some qualitative features are common to both uncertainty estimates: in particular, the asymmetry of the envelope prescription favours variations in the direction of the true NNLO result. However, it is also clear that the envelope prescription has a number of shortcomings. It leads to discontinuous and asymmetric uncertainties, which are difficult to accommodate in a Gaussian framework. It is very unstable, and strongly dependent on the arbitrary choice of the set of scale variations over which the envelope is taken. It is rather cumbersome, and again arbitrary, in that it requires one to postulate a specific way of combining the MHOU with the data-driven PDF uncertainty. And it leads to very large MHOUs, which appear to be overestimated in comparison to the known shift to the NNLO result if a 7-point prescription is used. The reason for the much greater stability of the MHOU estimated using the covariance matrix prescription should be clear. In an envelope prescription, any large deviation in a given direction leads to large uncertainties in that direction, regardless of whether there actually are large MHOs or not. In a covariance matrix approach, a large eigenvalue in a given direction will allow the fit to move in that direction; however, at least in the presence of abundant experimental information, this will actually happen only if the data pull in that direction because of MHO effects, and otherwise it will simply have little effect. Note also that, by construction, in an envelope approach the best fit is the same as that in which MHOUs are not included: one can thus only obtain a more conservative estimate of the overall uncertainty, never a more accurate result.
We conclude that, whereas results for the MHOU based on scale-varied fits and an envelope prescription are by and large consistent with those obtained with the covariance matrix approach, they are less stable, less reliable, and less accurate.