1 Introduction

An accurate estimate of the uncertainty in Standard Model (SM) predictions is a crucial ingredient for precision phenomenology at the Large Hadron Collider (LHC). Now, and for several years to come [1, 2], theoretical uncertainties for hadron collider processes are dominated by the missing higher order uncertainty (MHOU) in perturbative QCD calculations, usually estimated by scale variation, and by parton distribution function (PDF) uncertainties. Of course, PDFs summarize the information on the nucleon structure extracted from other SM processes [3]: effectively, PDFs provide a way of obtaining a prediction for a given process in terms of other processes. This way of thinking about PDFs immediately shows that MHOUs are present not only in the perturbative prediction for a particular process, but also in the underlying processes used for the PDF determination.

Current PDF uncertainties essentially only include the propagated uncertainty arising from statistical and systematic uncertainties in the experimental data used in their determination. Methodological uncertainties, related for example to the choice of functional form for the PDFs or to the fitting methodology employed, can be kept under control using closure tests [4], and with care can be made negligible in the data region. Parametric uncertainties, such as those related to the value of the strong coupling \(\alpha _s(m_Z)\) or the charm mass \(m_c\), can be included by performing fits for a range of parameters. However, up until now MHOUs have never been included in a PDF fit: what is usually called the “PDF uncertainty” does not include the MHOU in the theoretical calculations used for PDF determination, and, more generally, does not typically include any source of theory uncertainty.

Historically, this is related to the fact that MHOUs have always been considered likely to be small in comparison to other PDF uncertainties, especially since NNLO PDFs have become the default standard. However, it is clear that as PDF uncertainties become smaller and smaller, at some point MHOUs will become significant. In the most recent NNPDF set, NNPDF3.1 [5], PDF uncertainties at the electroweak scale can be as low as 1%. Given that the typical size of MHOU on NNLO QCD processes is at the percent level (see e.g. [6]), their neglect seems difficult to justify a priori.

Besides contributing to the overall size of PDF uncertainty, more subtly the MHOU might affect the relative weights of different datasets included in the fit: a dataset which is accurately described by NNLO theory because it has small MHOU should in principle carry more weight than one which is poorly described because it has large MHOU. The neglect of MHOUs might thus be biasing current global PDF fits.

It is the purpose of this paper to set up a general formalism for the inclusion of theoretical uncertainties, specifically MHOUs, in PDF determinations, and then to perform a first exploration of their impact on LHC phenomenology. The development of this treatment of MHOUs will involve three main ingredients. The first is the formulation of a general theory for the inclusion in PDF fits of generic theoretical uncertainties, of which MHOUs are a particular case. The second is the choice of a specific method for estimating the MHOU in each of the cross-sections that enter the PDF fit. The third is the construction of a set of tools for the validation of this methodology, to check that the MHOU is being correctly estimated.

The first ingredient in our approach is common to any kind of theory uncertainty: theory uncertainties include not only MHOUs, but also any other aspect in which the theory used to obtain predictions for the physical processes that enter the PDF fit is incompletely known. These include higher twists (see Refs. [7, 8] and references therein) and other power-suppressed corrections, nuclear corrections when nuclear targets are involved (see Refs. [9, 10] and references therein), final state corrections for non-inclusive processes, and so forth. All of these uncertainties are only meaningful in a Bayesian sense: there is only one correct value of the next-order perturbative correction, not a distribution of values. They thus necessarily involve a process of informed estimation or guesswork: the only way to actually know the size of, say, a missing higher order correction, is to calculate it.

We will show by adopting a Bayesian point of view, and assigning a Gaussian probability distribution to the expected true value of the theory calculation, that the impact of any missing theoretical contribution can be encoded as an additive contribution to the experimental covariance matrix used in the PDF fit [11]. The combination is additive because experimental and theoretical uncertainties are by their nature independent, and are thus combined in quadrature. In a global fit, theoretical uncertainties can be strongly correlated not only across data points within a given experiment, but also between different experiments, and even different processes, so we need a theoretical covariance matrix which includes all these correlations across all the datasets included in the fit.

This then immediately raises the issue of choosing a meaningful way to estimate the MHOU, which in particular incorporates these correlations. The standard way of estimating MHOUs in perturbative QCD calculations is to perform a variation of the renormalization and factorization scales, denoted as \(\mu _r\) and \(\mu _f\) respectively; various choices exist for the range and combination of these variations. While the shortcomings of this method are well known, and various alternatives have been discussed [12,13,14], this remains the default and most widely used option. In the present context, its main advantage is its universality (it can be applied in the same way to any of the processes used in the fit), and the way in which it implicitly incorporates correlations (for example predictions for data points in the same process which are kinematically close will be automatically correlated), even across different processes (through the PDFs, which are the same in every process). Thus while in principle our covariance matrix formalism allows for the inclusion of any method for estimating MHOUs in a PDF determination, here we will specifically use scale variation.

In order to do this, we need to examine systematically the underpinnings of scale variation as a means to estimate theory uncertainties, since different definitions of scale variation have been used in different contexts. Indeed, the standard definitions of renormalization and factorization scale typically used for deep-inelastic scattering and hadronic collisions are not the same. Because PDF fits include both types of processes, it is important to understand in detail how these definitions relate to each other, in order to be able to correlate the scale variations in a meaningful way. Specifically, we will show that one may estimate the MHOU for any process by combining two independent scale variations: one to estimate the MHOU in the perturbative evolution of the PDFs (missing higher orders in the DGLAP splitting functions), and the other to estimate the MHOU in the perturbative calculation of the partonic cross-sections (missing higher orders in the hard-scattering matrix elements).

Once the scales to be varied are understood, the remaining task is to choose a particular prescription to be used to construct the theoretical covariance matrix. In estimating MHOUs for a given process, the most commonly adopted option is the so-called seven-point envelope prescription, in which \(\mu _r\) and \(\mu _f\) are independently varied by a factor of two about the central choice while ensuring that \(1/2 \le \mu _r/\mu _f\le 2\), and the MHOU is then taken as the envelope of the results. For our purposes this is insufficient: rather than taking an envelope, we wish to construct a covariance matrix out of the scale variations. In particular, because theoretical uncertainties are correlated across processes (through the evolution of the PDFs), we need a prescription for determining the entries of the covariance matrix both within a single process and across pairs of processes.
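As a minimal sketch of the seven-point envelope prescription described above, the following Python snippet enumerates the scale multipliers \(k_r=\mu _r/\mu _r^{(0)}\), \(k_f=\mu _f/\mu _f^{(0)}\) and takes the envelope of a toy prediction. The function names, the toy cross-section `toy_sigma` and its coefficients are illustrative assumptions and do not correspond to any actual fitting code.

```python
import math
from itertools import product


def seven_point_variations(factor=2.0):
    """Return the (k_r, k_f) multipliers allowed by 1/factor <= k_r/k_f <= factor."""
    ks = (1.0 / factor, 1.0, factor)
    return [
        (kr, kf)
        for kr, kf in product(ks, repeat=2)
        if 1.0 / factor <= kr / kf <= factor
    ]


def envelope(predictions):
    """Envelope of the scale-varied predictions, returned as (min, max)."""
    return min(predictions), max(predictions)


def toy_sigma(kr, kf):
    """Hypothetical cross-section with a mild logarithmic scale dependence."""
    return 100.0 * (1.0 - 0.05 * math.log(kr) + 0.03 * math.log(kf))


points = seven_point_variations()  # central point plus six variations
lo, hi = envelope([toy_sigma(kr, kf) for kr, kf in points])
```

The two combinations removed by the constraint are those with \(\mu _r/\mu _f\) equal to 4 or 1/4. An envelope of this kind yields a single band per observable and carries no information on the correlation between different data points, which is why a covariance matrix is needed instead.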

We will discuss in detail a variety of options to achieve this, based on a general “n-point prescription”. These options will differ from each other in the choice of the number of independent variations, the directions of such variations in the \((\mu _r,\mu _f)\) plane, and the way the variations are correlated (or not) across different processes.

The validation of these point prescriptions, and the choice of the optimal one to be used for PDF determinations is a nontrivial problem, which however admits an elegant solution. The validation can be performed at NLO, by comparing the estimate of the MHOU encoded in the theory covariance matrix to the known next (NNLO) order correction. The problem is then to compare the probability distribution of expected higher-order results to the unique answer given by the NNLO calculation. The solution to this problem is to view the set of shifts between the NLO and NNLO computations for all the processes under consideration as a vector, with one component for each of the data points. The theory covariance matrix corresponding to each prescription then defines a one-sigma ellipsoid in a subspace of this space. The validation is performed by projecting the shift vector into the ellipsoid: if the theory covariance matrix gives a sensible estimate of the MHOU at NLO, the shift vector will lie almost entirely within the ellipsoid. Using this strategy, we will validate a variety of scale variation prescriptions on a similar dataset to that of the global NNPDF3.1 analysis. Since the dimension of the space of datapoints is typically two orders of magnitude higher than the dimension of the subspace of the ellipsoid, this is a highly nontrivial test.
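The geometric idea behind this validation can be illustrated with a short numerical sketch. It assumes that a theory covariance matrix `S` and a shift vector `delta` (playing the role of the NLO to NNLO shift) are available as NumPy arrays; the function name `validate_shift`, the tolerance, and the toy inputs are hypothetical choices made for illustration only.

```python
import numpy as np


def validate_shift(S, delta, tol=1e-10):
    """Project delta onto the subspace spanned by the eigenvectors of S
    with non-negligible eigenvalue.

    Returns the fraction of |delta| contained in that subspace and the
    components of delta along each eigenvector in units of sqrt(eigenvalue).
    """
    evals, evecs = np.linalg.eigh(S)       # S is symmetric
    keep = evals > tol * evals.max()       # drop numerically null directions
    basis = evecs[:, keep]                 # orthonormal basis of the subspace
    coeffs = basis.T @ delta
    frac = np.linalg.norm(basis @ coeffs) / np.linalg.norm(delta)
    pulls = coeffs / np.sqrt(evals[keep])
    return frac, pulls


# Toy inputs: S built from two shift vectors over four data points.
shifts = np.array([[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]])
S = shifts.T @ shifts / len(shifts)

delta_span = 0.5 * shifts[0] - 0.3 * shifts[1]   # lies inside the subspace
delta_orth = np.array([-1.0, 1.0, 1.0, -1.0])    # orthogonal to both shifts

frac_span, pulls_span = validate_shift(S, delta_span)
frac_orth, _ = validate_shift(S, delta_orth)
```

In this toy case `frac_span` is equal to one and `frac_orth` vanishes, up to rounding. In a realistic setting the fraction lies between these extremes, and the components in `pulls` indicate whether the part of the shift inside the subspace is of the size expected from the eigenvalues.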

Once a prescription has been selected and used to construct the theory covariance matrix, it is possible to perform a PDF fit based on it. Within the NNPDF methodology, an ensemble of PDF replicas is fitted to data replicas. Data replicas are generated in a way which reflects the uncertainties and correlations of the underlying data, as encoded in their covariance matrix. The best-fit PDF replica for each data replica is then determined by minimizing a figure of merit (\(\chi ^2\)) which is computed using the covariance matrix. As mentioned, and as we shall show in Sect. 2, the theory contribution appears as an independent contribution to the total covariance matrix, uncorrelated with the experimental one and simply added to it. Therefore, once the covariance matrix is supplemented by an extra theory contribution coming from MHOUs, this should be treated on the same footing as any other contribution, and it will thus affect both the data replica generation, and the fitting of PDF replicas to data replicas.
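A minimal sketch of how the theory contribution enters data replica generation is given below. It assumes Gaussian pseudo-data drawn around the central values with the total covariance \(C+S\); the helper `make_data_replicas` and all numerical values are toy assumptions, not the actual NNPDF implementation.

```python
import numpy as np


def make_data_replicas(D, C, S, n_rep, rng):
    """Draw Gaussian pseudo-data around D with total covariance C + S."""
    return rng.multivariate_normal(D, C + S, size=n_rep)


rng = np.random.default_rng(1)
D = np.array([10.0, 12.0, 9.0])        # toy central values
C = np.diag([0.04, 0.09, 0.04])        # toy experimental covariance
shift = np.array([0.3, 0.3, 0.2])      # toy fully correlated theory shift
S = np.outer(shift, shift)             # toy theory covariance

reps_exp = make_data_replicas(D, C, np.zeros_like(C), 20000, rng)
reps_tot = make_data_replicas(D, C, S, 20000, rng)
spread_exp = reps_exp.std(axis=0)      # spread from C alone
spread_tot = reps_tot.std(axis=0)      # spread from C + S
```

With these toy numbers the spread of the replicas increases for every data point once `S` is included, and the replicas for different points become correlated through the off-diagonal entries of `S`.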

Qualitatively, one may expect the inclusion of the MHOU in the data replica generation to increase the spread of the data replicas, and thus lead in itself to an increase in overall PDF uncertainties. On the other hand the inclusion of the MHOU in the fitting might also reduce tensions within the fit due to the imperfection of the theory and, since these are highly correlated, result in significant shifts in central values, and overall a better fit with reduced uncertainties. The combined effect of including the MHOU in both the data generation and the fitting is thus not at all obvious.

We will investigate these effects by performing PDF determinations in which MHOUs are included in either, or both, the replica generation and the PDF replica fitting. Once again, results can be validated at NLO by comparing NLO PDFs determined with the theory covariance matrix to NNLO PDFs. A successful validation should show that the best-fit NLO PDF moves towards the central NNLO result upon inclusion of the theory covariance matrix in both replica generation and fitting, due to a relaxation of tensions in the NLO fit, and that the NNLO PDF differs from the NLO PDF by an amount which is correctly estimated by the NLO uncertainty band. As we shall see, this is indeed the case, and in fact it will turn out that often the uncertainty band does not increase or even decreases upon inclusion of the theory covariance matrix.

Having determined PDFs which now account for the MHOU associated to the processes that enter the fit, the natural questions which then arise are what their impact is, and more generally how they should be used for precision LHC phenomenology. In order to address the first question, we will compute predictions with MHOUs for typical LHC standard candle processes, both with and without including the MHOU in the PDF, and provide a first phenomenological exploration and assessment of the impact of these uncertainties.

The second question is not entirely trivial and we will address it in detail. Indeed, scale variation is routinely performed in order to estimate the MHOU in theoretical predictions for hadron collider processes. Clearly, when obtaining a prediction, we should avoid double counting a MHOU which has already been included in the PDF. Instances in which this might happen include not only the trivial situation in which a prediction is obtained for a process which has already been used for PDF determination, but also the somewhat more subtle situation in which the MHOU in the PDF and the observable which is being predicted are correlated through perturbative evolution [15]. We will discuss this situation, and provide guidelines for the usage of PDFs with MHOUs.

This paper is broadly divided into two main parts. In the first part, we construct a general formalism for the inclusion of theory uncertainties and specifically MHOUs in PDF determination, and show how to construct and validate a theory covariance matrix. In the second part, we perform a first investigation of the phenomenological implications of these theory uncertainties. The structure of the paper is the following: in Sect. 2 we show, using a Bayesian approach, that under certain assumptions any type of theory uncertainty can be included as a contribution to the covariance matrix. In Sect. 3 we summarize the theory of scale variation and use it to review, compare and systematize different definitions which have been used in the literature. In Sect. 4 we then formulate a number of “point prescriptions” for the theory covariance matrix, both for a single process, and also to account for correlations between a pair of processes. In Sect. 5 we compute the theory covariance matrix for a variety of prescriptions, we test them against known higher order corrections, and use this comparison to select an optimal prescription.

We then move to the second, more phenomenological, part of the paper. The centerpiece of this part is the determination of NLO PDF sets with MHOU, presented in Sect. 6. We first include only deep-inelastic scattering data (DIS-only fit), and then adopt a global data set; the resulting PDFs are compared to PDFs without MHOU, and validated against NNLO PDFs. In Sect. 7 we present initial studies of the phenomenological impact of the inclusion of MHOUs in PDFs for representative LHC processes. Finally in Sect. 8 we provide guidelines for the usage of PDFs with MHOU, in particular concerning the combination of the PDF uncertainties with the MHOU on the hard matrix element, and present the delivery of the PDF sets produced in this work.

Two appendices contain further studies and technical details. In Appendix A we provide additional details concerning the procedure adopted to diagonalise the theory covariance matrix. Then in Appendix B we study another possible validation of the results of Sect. 6, by comparing PDFs with MHOUs to the PDFs obtained by adopting different choices of renormalization and factorization scales in the PDF determination. Families of fits which only differ in choices of scale have never been carried out before and will be presented here for the first time. While they do not necessarily give a fair estimate of the MHOU on PDFs, they do provide an indication of the expected impact of scale variation on PDFs, and of the pattern of MHOU correlations.

A concise discussion of the main results of this work was presented in Ref. [16], of which this paper represents the extended companion.

2 A theoretical covariance matrix

Parton distribution functions are determined from a set of \(N_\mathrm{dat}\) experimental data points, which we represent by an \(N_\mathrm{dat}\)-dimensional vector \(D_i\), \(i=1,\ldots ,N_{\mathrm{dat}}\). These data points have experimental uncertainties that may be correlated with each other, and this information is encoded in an experimental covariance matrix \(C_{ij}\). This covariance matrix may be block-diagonal if some sets of data are uncorrelated. Each experimental data point has associated with it a “true” value \({{\mathcal {T}}}_i\) – the value given by Nature – whose determination is the goal of the experiment. Since the experimental measurements are imperfect, they cannot determine \({{\mathcal {T}}}\) exactly, but they can be used to estimate the Bayesian probability of a given hypothesis for \({{\mathcal {T}}}\). Assuming that the experimental results are Gaussianly distributed about this hypothetical true value, the conditional probability for the true values \({{\mathcal {T}}}\) given the measured cross-sections D is

$$\begin{aligned} P({{\mathcal {T}}}|D)=P(D|{{\mathcal {T}}}) \propto \exp \left( -{{1}\over {2}}({{\mathcal {T}}}_i-D_i)C_{ij}^{-1}({{\mathcal {T}}}_j-D_j)\right) , \end{aligned}$$
(2.1)

up to an overall normalization constant. Note that this tacitly assumes equal priors for both D and \({{\mathcal {T}}}\).

Of course the true values \({{\mathcal {T}}}_i\) are unknown. However we can calculate theoretical predictions for each data point \(D_i\), which we denote by \(T_i\). These predictions are computed using a theory framework which is generally incomplete: for example because it is based on the fixed-order truncation of a perturbative expansion, or because it excludes higher-twist effects, or nuclear effects, or some other effect that is difficult to calculate precisely. Furthermore, these theory predictions \(T_i\) depend on PDFs, evolved to a suitable scale also using incomplete theory. While the theory predictions may correspond to a variety of different observables and processes, they all depend on the same underlying (universal) PDFs.

We now assume, in the same spirit as when estimating experimental systematics, that the true values \({{\mathcal {T}}}_i\) are centered on the theory predictions \(T_i\), and Gaussianly distributed about the theory predictions, with which they would coincide if the theory were exact and the PDFs were known with certainty. The conditional probability for the true values \({{\mathcal {T}}}\) given theoretical predictions T is then

$$\begin{aligned} P({{\mathcal {T}}}|T)=P(T|{{\mathcal {T}}}) \propto \exp \left( -{{1}\over {2}}({{\mathcal {T}}}_i-T_i)S_{ij}^{-1}({{\mathcal {T}}}_j-T_j)\right) , \end{aligned}$$
(2.2)

again up to a normalization constant, where \(S_{ij}\) is a “theory covariance matrix”, to be estimated in due course.

PDFs are determined by maximizing the probability of the theory given the data P(T|D), marginalised over the true values \({{\mathcal {T}}}\), which of course remain unknown. Bayes’ theorem gives

$$\begin{aligned} P({{\mathcal {T}}}|DT)P(D|T)=P(D|{{\mathcal {T}}}T)P({{\mathcal {T}}}|T) \, . \end{aligned}$$
(2.3)

Moreover, since the experimental data do not depend on the theorists’ calculations T, but only on the ‘truth’ \({{\mathcal {T}}}\),

$$\begin{aligned} P(D|{{\mathcal {T}}}T)=P(D|{{\mathcal {T}}}). \end{aligned}$$
(2.4)

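Then, integrating Eq. (2.3) over \({{\mathcal {T}}}\) and using Eq. (2.4) together with \(P(D|{{\mathcal {T}}})=P({{\mathcal {T}}}|D)\) from Eq. (2.1), since by construction \(\int \!D^{N}{{\mathcal {T}}}\, P({{\mathcal {T}}}|DT)=1\),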

$$\begin{aligned} P(D|T) = \int \!D^{N}{{\mathcal {T}}}\, P({{\mathcal {T}}}|D)P({{\mathcal {T}}}|T) \, , \end{aligned}$$
(2.5)

where the N-dimensional integral is over all of the possible values of \({{\mathcal {T}}}_i\). The probability of the experimental data D is now conditional on the theory T because we have marginalised over the underlying ‘truth’ \({{\mathcal {T}}}\), which is common to both.

Writing the difference between the true \({{\mathcal {T}}}_i\) and the actual \(T_i\) values of the theory prediction as

$$\begin{aligned} \Delta _i \equiv {{\mathcal {T}}}_i-T_i \, , \end{aligned}$$
(2.6)

we can change variables of integration to convert the integral over \({{\mathcal {T}}}_i\) into an integral over the shifts \(\Delta _i\): using the Gaussian hypotheses Eqs. (2.1) and (2.2), Eq. (2.5) becomes

$$\begin{aligned} P(D|T)&\propto \int \!D^{N}\Delta \, \exp \Bigg (-{{1}\over {2}}\left( D_i-T_i-\Delta _i\right) \nonumber \\&\quad \times C_{ij}^{-1} \left( D_j-T_j-\Delta _j\right) -{{1}\over {2}}\Delta _i S_{ij}^{-1}\Delta _j\Bigg ). \end{aligned}$$
(2.7)

The Gaussian integrals can now be performed explicitly. Adopting a vector notation in order to make the algebra more transparent, we rewrite the exponent as

$$\begin{aligned}&(D - T - \Delta )^T C^{-1} (D - T - \Delta ) + \Delta ^T S^{-1} \Delta \nonumber \\&\quad = \Delta ^T (C^{-1} + S^{-1})\Delta - \Delta ^T C^{-1} (D - T) \nonumber \\&\qquad - (D - T)^T C^{-1} \Delta + (D - T)^T C^{-1} (D - T) \nonumber \\&\quad = (\Delta - (C^{-1} + S^{-1})^{-1} C^{-1} (D - T))^T (C^{-1} + S^{-1}) \nonumber \\&\qquad \times (\Delta - (C^{-1} + S^{-1})^{-1} C^{-1} (D - T)) \nonumber \\&\qquad - (D - T)^T C^{-1} (C^{-1} + S^{-1})^{-1} C^{-1} (D - T)\nonumber \\&\qquad + (D - T)^T C^{-1} (D - T), \end{aligned}$$
(2.8)

where we used the fact that both C and S are symmetric matrices, and in the last line we completed the square. Integrating over \(\Delta \), ignoring the normalization, Eq. (2.7) then becomes

$$\begin{aligned} P(T|D)&=P(D|T) \propto \exp \Bigg (- {{1}\over {2}}(D - T)^T\nonumber \\&\quad \times (C^{-1} - C^{-1} (C^{-1} + S^{-1})^{-1} C^{-1}) (D - T)\Bigg ) \end{aligned}$$
(2.9)

However

$$\begin{aligned} (C^{-1} + S^{-1})^{-1} = (C^{-1} (C + S) S^{-1})^{-1} = S (C + S)^{-1} C, \end{aligned}$$
(2.10)

so that

$$\begin{aligned} \begin{aligned}&C^{-1} - C^{-1} (C^{-1} + S^{-1})^{-1} C^{-1} = C^{-1} - C^{-1} S (C + S)^{-1} \\&\quad = (C^{-1} (C + S) - C^{-1} S) (C + S)^{-1} = (C + S)^{-1}. \end{aligned} \end{aligned}$$
(2.11)

Restoring the indices, we thus find the simple result

$$\begin{aligned} P(T|D) \propto \exp \left( - {{1}\over {2}}(D_i - T_i) (C + S)_{ij}^{-1} (D_j - T_j)\right) . \end{aligned}$$
(2.12)

Comparison of Eq. (2.12) with Eq. (2.1) indicates that when replacing the true \({{\mathcal {T}}}_i\) by the theoretical predictions \(T_i\) in the expression of the \(\chi ^2\) of the data, the theoretical covariance matrix \(S_{ij}\) should simply be added to the experimental covariance matrix \(C_{ij}\) [11]. In effect this implies that, at least within this Gaussian approximation, when determining PDFs theoretical uncertainties can be treated simply as another form of experimental systematic: it is an additional uncertainty to be taken into account when trying to find the truth from the data on the basis of a specific theoretical prediction. The experimental and theoretical uncertainties are added in quadrature because they are in principle uncorrelated.
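The matrix identity in Eq. (2.11) and its consequence for the \(\chi ^2\) can be checked numerically. The sketch below uses randomly generated symmetric positive-definite matrices as stand-ins for \(C\) and \(S\); the helpers `random_spd` and `chi2` and all inputs are illustrative assumptions.

```python
import numpy as np


def random_spd(n, rng):
    """Random symmetric positive-definite matrix (toy covariance)."""
    a = rng.normal(size=(n, n))
    return a @ a.T + n * np.eye(n)


def chi2(D, T, C, S):
    """chi^2 of Eq. (2.12): (D - T)^T (C + S)^{-1} (D - T)."""
    r = D - T
    return float(r @ np.linalg.solve(C + S, r))


rng = np.random.default_rng(0)
n = 5
C = random_spd(n, rng)
S = random_spd(n, rng)

# Left- and right-hand sides of Eq. (2.11)
Cinv = np.linalg.inv(C)
Sinv = np.linalg.inv(S)
lhs = Cinv - Cinv @ np.linalg.inv(Cinv + Sinv) @ Cinv
rhs = np.linalg.inv(C + S)
identity_holds = np.allclose(lhs, rhs)

# Effect of the theory covariance on the chi^2 for the same residuals
T = rng.normal(size=n)
D = T + rng.normal(size=n)
chi2_without = chi2(D, T, C, np.zeros_like(C))
chi2_with = chi2(D, T, C, S)
```

For fixed residuals \(D-T\), adding a positive semi-definite \(S\) can only lower the \(\chi ^2\), since \((C+S)^{-1}\le C^{-1}\) in the sense of quadratic forms. In an actual fit the predictions \(T\) also change, so the net effect on the fit quality is not fixed by this inequality alone.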

In the case for which theoretical uncertainties can be neglected, i.e. if \(S_{ij}\rightarrow 0\), then \(P({{\mathcal {T}}}|T)\) in Eq. (2.2) becomes proportional to \(\delta ^N({{\mathcal {T}}}_i-T_i)\). As a result, in this case Eq. (2.12) reduces to Eq. (2.1) with \({{\mathcal {T}}}_i\) replaced by the predictions \(T_i\). This shows that Eq. (2.12) remains true even if \(S_{ij}\) has zero eigenvalues and is thus not invertible. Note however that by construction \(C_{ij}\) is positive definite, since any experimental measurement always has uncorrelated statistical uncertainties due to the finite number of events, so \((C+S)_{ij}\) will always be invertible.

The question remains of how to estimate the theory covariance matrix, \(S_{ij}\). The Gaussian hypothesis Eq. (2.2) implies that

$$\begin{aligned} S_{ij} = \big \langle ({{\mathcal {T}}}_i - T_i)({{\mathcal {T}}}_j - T_j)\big \rangle =\big \langle \Delta _i\Delta _j\big \rangle , \end{aligned}$$
(2.13)

where the average is taken over the true theory values \({{\mathcal {T}}}\) using the probability distribution \(P({{\mathcal {T}}}|T)\), and \(\langle \Delta _i\rangle =0\) consistent with the assumption that the probability distribution of the truth \({{\mathcal {T}}}\) is centred on the theoretical calculation T. In practice however the formal definition Eq. (2.13) is not very helpful: we need some way to estimate the shifts \(\Delta _i\) – ‘nuisance parameters’, in the language of systematic error determination – in a way that takes into account the theoretical correlations between different kinematic points within the same dataset, between different datasets measuring the same physical process, and between datasets corresponding to different processes (with initial state hadrons). Note that theory correlations will always be present even for entirely different processes, through the universal parton distributions: the only processes with truly independent theoretical uncertainties are those with only leptons in the initial state, which are of course irrelevant for PDF determination.
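To make Eq. (2.13) concrete, suppose that a finite set of candidate shift vectors \(\Delta ^{(a)}\) is available, for instance from a set of scale variations. One simple estimator, assumed here purely for illustration, is the average of the outer products \(\Delta ^{(a)}_i\Delta ^{(a)}_j\). The normalization by the number of shift vectors and the helper name `theory_covmat` are assumptions; the specific prescriptions are the subject of Sect. 4.

```python
import numpy as np


def theory_covmat(shifts):
    """Estimate S_ij = <Delta_i Delta_j> from shift vectors.

    shifts has shape (n_variations, n_data). The shifts are taken relative
    to the central prediction, so no mean is subtracted.
    """
    shifts = np.asarray(shifts, dtype=float)
    return shifts.T @ shifts / shifts.shape[0]


# Two toy shift vectors over four data points
shifts = [[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]]
S = theory_covmat(shifts)

rank_S = np.linalg.matrix_rank(S)           # at most the number of shift vectors
C = np.diag([0.5, 0.5, 1.0, 1.0])           # toy positive-definite experimental covariance
total_cholesky = np.linalg.cholesky(C + S)  # succeeds only if C + S is positive definite
```

A matrix built in this way has rank no larger than the number of shift vectors, so it generally has zero eigenvalues when there are many more data points than variations. The sum \(C+S\) nevertheless remains invertible because \(C\) is positive definite.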

The most commonly used method of estimating the theory corrections due to MHOUs, which can naturally incorporate all these theoretical correlations, is scale variation. This method is reviewed in Sect. 3 in general terms and then used in Sect. 4 in order to formulate specific prescriptions for constructing the theory covariance matrix \(S_{ij}\). Other approaches which have been discussed in the literature involve estimating MHOUs based on the behaviour of the known perturbative orders [12,13,14]; however, at least at present, these do not appear to provide a formalism which is sufficiently well-established, and of appropriately general applicability. We emphasize however that the formalism presented in this section is independent of the specific method adopted to estimate the correlated theory shifts \(\Delta _i\) that enter Eq. (2.13).

3 MHOUs from scale variations

The variation of the renormalization and factorization scales is the most popular approach for estimating missing higher order uncertainties (MHOUs) in QCD perturbative calculations. It has a number of advantages: it naturally incorporates renormalization group (RG) invariance, thereby ensuring that as the perturbative order increases, estimates of MHOU decrease; the same procedure can be used for any perturbative process, since the scale dependence of the strong coupling \(\alpha _s(\mu ^2)\) and of PDFs is universal; the estimates of MHOU it produces are smooth functions of the kinematics, and thereby correctly incorporate the strong correlations in nearby regions of phase space; and correlations between different processes due to universal ingredients such as PDFs can be easily incorporated. Its drawbacks are also well known: there is no unique principle to determine the specific range of the scale variation (nor even the precise central scale to be adopted); and it misses uncertainties associated with new singularities or color structures present at higher orders but missing at lower orders. The former problem may be dealt with, at least qualitatively, by validating a given range in situations where the next order corrections are known. We will attempt such a validation in this paper. The latter problem is more challenging, requiring resummation in the case of unresummed logarithms, or other methods of estimating new types of corrections, and it is unclear whether or not it admits a general solution.

While scale variation has been discussed many times in a variety of contexts, there is no standard, commonly accepted formulation of it, and specifically none that can be applied to both electroproduction and hadroproduction processes, as we need to do if we wish to use scale variation in the context of global PDF analyses. In fact, it turns out that the most commonly adopted approaches to scale variation differ, typically according to the nature of the process which is being considered, but also over time: the prescriptions favored in the past are not those in common use at present. Moreover, even the terminology is not uniform: it has evolved over time, resulting in the same names being used for what are essentially different scale variations.

To formulate prescriptions for the general use of scale variation for MHOU estimation which can be applied to any process included in present or future PDF determinations, it is thus necessary to first review the underpinnings of scale variation, and to then use them in order to set up a generally applicable formalism. This will be done in the current section, by specifically discussing the cases of electroproduction and hadroproduction. In particular, we will show that for factorized processes MHOUs on the partonic cross-sections and on perturbative evolution are independent and can be estimated through independent scale variations. We will then discuss how they can be combined, first with a single process and then for several processes, both correlated and uncorrelated.

3.1 Renormalization group invariance

Scale variation rests on the observation that scale-dependent contributions to a perturbative prediction are fixed by RG invariance, and therefore scale variation can be used to generate higher order contributions, which are then taken as a proxy for the whole missing higher orders.

More explicitly, consider a generic theoretical prediction (typically a perturbative cross-section) of the form \({\overline{T}}(\alpha _s(\mu ^2), \mu ^2/Q^2)\), where \(\mu ^2\) is the renormalization scale and \(Q^2\) is some physical scale in the process. Thus \({\overline{T}}\) indicates the theory prediction T when it is evaluated at some renormalization scale \(\mu ^2\) instead of being evaluated at the physical scale \(Q^2\): if we instead set \(\mu ^2=Q^2\), then

$$\begin{aligned} T(Q^2) \equiv {\overline{T}} \left( \alpha _s(Q^2), 1 \right) \, . \end{aligned}$$
(3.1)

The QCD running coupling \(\alpha _s(\mu ^2)\) satisfies the RG equation

$$\begin{aligned} \mu ^2 {{d}\over {d \mu ^2}} \alpha _s(\mu ^2) = \beta (\alpha _s(\mu ^2)) \,, \end{aligned}$$
(3.2)

where the QCD beta function has the following perturbative expansion:

$$\begin{aligned} \beta (\alpha _s) = \beta _0 \alpha _s^2 + \beta _1 \alpha _s^3 + \beta _2 \alpha _s^4 + \cdots . \end{aligned}$$
(3.3)

RG invariance is the statement that the all-order prediction is independent of the renormalization scale:

$$\begin{aligned} \mu ^2 {{d}\over {d \mu ^2}} {\overline{T}} \left( \alpha _s(\mu ^2), \mu ^2/Q^2\right) = 0 . \end{aligned}$$
(3.4)

It will be useful in what follows to define the variables

$$\begin{aligned} \mu ^2 = k Q^2,\quad t = \ln (Q^2 / \Lambda ^2), \quad \kappa = \ln k = \ln \mu ^2/Q^2, \end{aligned}$$
(3.5)

so \(\alpha _s(\mu ^2)\) is a function of \(\ln \mu ^2/\Lambda ^2 = t + \kappa \). We can then write the RG equation (3.4) as

$$\begin{aligned} 0= & {} {{d}\over {d \kappa }} {\overline{T}}(\alpha _s(t + \kappa ), \kappa ) \nonumber \\= & {} {{d}\over {d \kappa }} \alpha _s(t + \kappa ) {{\partial }\over {\partial \alpha _s}} {\overline{T}}(\alpha _s(t + \kappa ), \kappa ) \bigg |_\kappa \nonumber \\&+ {{\partial }\over {\partial \kappa }} {\overline{T}}(\alpha _s(t + \kappa ), \kappa ) \bigg |_{\alpha _s}\nonumber \\= & {} {{\partial }\over {\partial t}} {\overline{T}}(\alpha _s(t + \kappa ), \kappa ) \bigg |_\kappa + {{\partial }\over {\partial \kappa }} {\overline{T}}(\alpha _s(t + \kappa ), \kappa ) \bigg |_{\alpha _s} \, , \end{aligned}$$
(3.6)

where in the second line we assume that \({\overline{T}}\) is analytic in \(\alpha _s\) and \(\kappa \), and in the third we use

$$\begin{aligned} {{d}\over {d \kappa }} \alpha _s(t + \kappa ) = {{d}\over {dt}} \alpha _s(t + \kappa ) = \beta (\alpha _s(t + \kappa ) ) \, . \end{aligned}$$
(3.7)

Taylor expanding \({\overline{T}}(\alpha _s, \kappa )\) in \(\kappa \) about \(\kappa =0\) (i.e. \(k=1\), \(\mu ^2=Q^2\)) at fixed coupling \(\alpha _s\),

$$\begin{aligned} {\overline{T}}(\alpha _s(t + \kappa ), \kappa )= & {} {\overline{T}}(\alpha _s(t + \kappa ), 0)\nonumber \\&+\kappa {{\partial }\over {\partial \kappa }} {\overline{T}}(\alpha _s(t + \kappa ), 0) \bigg |_{\alpha _s} \nonumber \\&+ {{1}\over {2}}\kappa ^2 {{\partial ^2}\over {\partial \kappa ^2}} {\overline{T}}(\alpha _s(t + \kappa ), 0)\bigg |_{\alpha _s} + \cdots \qquad \nonumber \\= & {} {\overline{T}}(\alpha _s(t + \kappa ), 0) - \kappa {{\partial }\over {\partial t}} {\overline{T}}(\alpha _s(t + \kappa ), 0) \bigg |_\kappa \nonumber \\&+ {{1}\over {2}}\kappa ^2 {{\partial ^2}\over {\partial t^2}} {\overline{T}}(\alpha _s(t + \kappa ), 0)\bigg |_\kappa + \cdots ,\nonumber \\ \end{aligned}$$
(3.8)

where in the second line we use the RG invariance condition, Eq. (3.6), to replace \({{\partial }\over {\partial \kappa }}\) with \(-{{\partial }\over {\partial t}}\). We can thus determine the \(\kappa \) dependence of \({\overline{T}}(\alpha _s, \kappa )\) using the dependence of \(T(t)={\overline{T}}(\alpha _s(t), 0)\) on t:

$$\begin{aligned} {\overline{T}}(\alpha _s(t + \kappa ), \kappa )&= T(t + \kappa ) - \kappa {{d}\over {dt}} T(t + \kappa ) \nonumber \\&\quad + {{1}\over {2}}\kappa ^2 {{d^2}\over {dt^2}} T(t + \kappa )+\cdots . \end{aligned}$$
(3.9)

Now since

$$\begin{aligned} {{d}\over {dt}} T(t) {=} {{d \alpha _s(t)}\over {dt}} {{\partial }\over {\partial \alpha _s}} {\overline{T}}(\alpha _s(t), 0) {=} \beta (\alpha _s(t)) {{\partial }\over {\partial \alpha _s}} {\overline{T}}(\alpha _s(t), 0), \end{aligned}$$
(3.10)

and \(\beta (\alpha _s) = \mathcal {O}(\alpha _s^2)\), we see that \({{1}\over {T}} {{dT}\over {dt}} = \mathcal {O}(\alpha _s)\), while \({{1}\over {T}} {{d^2T}\over {dt^2}} = \mathcal {O}(\alpha _s^2)\) etc.: derivatives with respect to t always add one power of \(\alpha _s\). It follows that in Eq. (3.9), the term \(\mathcal {O}(\kappa )\) is \(\mathcal {O}(\alpha _s)\) with respect to the leading term, and the term \(\mathcal {O}(\kappa ^2)\) is \(\mathcal {O}(\alpha _s^2)\) with respect to the leading term, and so on. We thus see explicitly that the scale-dependent terms (those that depend on \(\kappa \)), at a given order in perturbation theory, are determined by derivatives of the cross-section lower down the perturbation series.

This implies that if we know the cross-section T(t) as a function of the central scale \(Q^2\) to a given order in perturbation theory, we can then use Eq. (3.9) to determine the scale-dependent \(\kappa \) terms directly from T(t) at any given order, by differentiating terms lower down the perturbative expansion. For instance, truncating at LO, NLO, or NNLO, one has

$$\begin{aligned} {\overline{T}}_{\text {LO}}(\alpha _s(t + \kappa ), \kappa )&= T_{\text {LO}}(t + \kappa ), \nonumber \\ {\overline{T}}_{\text {NLO}}(\alpha _s(t + \kappa ), \kappa )&= T_{\text {NLO}}(t + \kappa ) - \kappa {{d}\over {dt}}{T}_{\text {LO}}(t + \kappa ), \nonumber \\ {\overline{T}}_{\text {NNLO}}(\alpha _s(t + \kappa ), \kappa )&= T_{\text {NNLO}}(t + \kappa ) - \kappa {{d}\over {dt}}{T}_{\text {NLO}}(t + \kappa )\nonumber \\&\quad + {{1}\over {2}}\kappa ^2 {{d^2}\over {dt^2}}{T}_{\text {LO}}(t + \kappa ). \end{aligned}$$
(3.11)

The differentiation may be performed analytically, which is trivial for a fixed order expansion, or numerically, which can be useful in a resummed expression where the dependence on \(\alpha _s(t)\) can be nontrivial [17]. Note that when the renormalization scale coincides with the physical scale of the process, \(\mu ^2=Q^2\), then \(\kappa =0\) and \({\overline{T}}=T\) at every order in the perturbative expansion.

The MHOU can now be estimated as the difference between the scale varied cross-section and the cross-section evaluated at the central scale, namely

$$\begin{aligned} \Delta (t,\kappa ) = {\overline{T}}(\alpha _s(t + \kappa ), \kappa ) - T(t) \, . \end{aligned}$$
(3.12)

Thus at LO, NLO and NNLO we have, using Eq. (3.11), that the theory nuisance parameters are given by

$$\begin{aligned} \Delta _{\text {LO}}(t,\kappa )&= T_{\text {LO}}(t + \kappa )-T_{\text {LO}}(t), \nonumber \\ {\Delta }_{\text {NLO}}(t,\kappa )&= \left( T_{\text {NLO}}(t {+} \kappa ) {-} \kappa {{d}\over {dt}}{T}_{\text {LO}}(t {+} \kappa )\right) {-}T_{\text {NLO}}(t), \nonumber \\ {\Delta }_{\text {NNLO}}(t, \kappa )&= \left( T_{\text {NNLO}}(t + \kappa ) - \kappa {{d}\over {dt}}{T}_{\text {NLO}}(t + \kappa )\right. \nonumber \\&\quad \left. + {{1}\over {2}}\kappa ^2 {{{d^2}\over {dt^2}}}{T}_{\text {LO}}(t + \kappa )\right) -T_{\text {NNLO}}(t). \end{aligned}$$
(3.13)

One finds that while at LO the theory uncertainty is entirely due to the scale chosen for \(\alpha _s\), at NLO the dependence on scale is milder since the leading dependence is subtracted off by the \(O(\kappa )\) term. At NNLO it is milder still, since the \(O(\kappa )\) term subtracts the leading dependence in the first term, and the \(O(\kappa ^2)\) removes the subleading dependence in the first two terms. RG invariance then guarantees that the terms generated by scale variation are always subleading, so if the perturbation series is well behaved, the theory shifts \(\Delta \) become smaller and smaller as the order of the expansion is increased.
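
The reduction of the shifts with increasing order can be seen in a toy example. The sketch below assumes a hypothetical observable \(T=\alpha _s h_0+\alpha _s^2 h_1\) with \(h_0=h_1=1\), one-loop running with \(n_f=5\), and \(\beta _0=-B_0<0\); none of these choices come from the text, and the numbers are purely illustrative.

```python
import math

B0 = (33.0 - 2.0 * 5) / (12.0 * math.pi)  # |beta0| for nf = 5 (assumed)

def alpha_s(t):
    """One-loop coupling as a function of t = ln(Q^2 / Lambda^2)."""
    return 1.0 / (B0 * t)

def delta_lo(t, kappa, h0=1.0):
    """Delta_LO of Eq. (3.13) for the toy observable T_LO = alpha_s * h0."""
    return h0 * (alpha_s(t + kappa) - alpha_s(t))

def delta_nlo(t, kappa, h0=1.0, h1=1.0):
    """Delta_NLO of Eq. (3.13) for T_NLO = alpha_s * h0 + alpha_s^2 * h1."""
    a_var, a_cen = alpha_s(t + kappa), alpha_s(t)
    t_nlo_var = a_var * h0 + a_var ** 2 * h1
    t_nlo_cen = a_cen * h0 + a_cen ** 2 * h1
    # d/dt T_LO = beta0 * alpha_s^2 * h0, with beta0 = -B0 at one loop
    dt_lo_var = -B0 * a_var ** 2 * h0
    return (t_nlo_var - kappa * dt_lo_var) - t_nlo_cen
```

For example, with \(Q=100\) GeV and \(\Lambda =0.2\) GeV (illustrative values), `t = 2 * math.log(100.0 / 0.2)` and \(\kappa =\pm \ln 4\), the NLO shift is smaller in magnitude than the LO shift. With \(h_1=0\) the NLO shift reduces to \(-\kappa ^2 B_0^2\alpha _s(t)\alpha _s^2(t+\kappa )h_0\), which makes explicit that the \(O(\kappa )\) term has removed the leading scale dependence.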

Clearly the size of the MHOU, estimated in this way, will depend on the size of the scale variation, and thus on the value chosen for \(\kappa \). Typically one varies the renormalization scale by a factor of two in each direction, i.e. \(\kappa \in [-\ln 4,\ln 4]\), since this range is empirically found to yield sensible results for many processes. However, in principle, one should treat \(\kappa \) as a free parameter, whose magnitude needs to be validated whenever possible by comparing to known higher order results.
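
Because \(\kappa \) is defined in terms of \(\mu ^2\) rather than \(\mu \), a factor of two in \(\mu \) corresponds to \(\ln 4\) in \(\kappa \). The helper below is a small sketch of this conversion; the function names and the default factor are illustrative assumptions.

```python
import math

def kappa_from_mu_factor(xi: float) -> float:
    """kappa = ln(mu^2 / Q^2) for mu = xi * Q, following Eq. (3.5)."""
    return math.log(xi ** 2)

def conventional_kappas(xi: float = 2.0):
    """Lower, central and upper kappa for mu in {Q / xi, Q, xi * Q}."""
    return (kappa_from_mu_factor(1.0 / xi), 0.0, kappa_from_mu_factor(xi))
```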

In the present work, we are specifically interested in the application of this method to processes with one or more hadrons in the initial state, i.e. to cross-sections factorized into a hard cross-section convoluted with a PDF or a parton luminosity. There are then two independent sources of MHOU: the perturbative expansion of the hard partonic cross-section, and the perturbative expansion of the anomalous dimensions that determine the perturbative evolution of the parton distributions. It is convenient to obtain each of these from an independent scale variation, and this can be done by writing separate RG equations for the hard cross-section and for the PDF, as we will demonstrate below. This approach is completely equivalent to the perhaps more familiar point of view in which MHOUs on perturbative evolution are instead obtained by varying the scale at which the PDF is evaluated in the factorized expression, as we will also show.

We will begin by considering the MHOU in the hard-scattering partonic cross-sections; we will then turn to a discussion of MHOUs in the PDF evolution, and show that the latter can be obtained by several equivalent procedures. We will then discuss how both scale variations can be obtained from double scale variation of the hard cross-section, and how this fact also offers the possibility of performing scale variation in alternative ways whereby these two sources of MHOU are mixed. We will discuss these for completeness, since in the past scale variations were often performed in this way. Finally, we will address scale variations and their correlations when several processes are considered at once.

3.2 Scale variation for partonic cross-sections

We start by considering scale variation in hard-scattering partonic cross-sections, first in the case of electroproduction (that is, for lepton-proton deep-inelastic scattering, DIS), and then for the case of hadroproduction (proton-proton or proton-antiproton collisions).

3.2.1 Electroproduction

Consider first an electroproduction process, such as DIS, with an associated structure function given by

$$\begin{aligned} {F}(Q^2) = {C}(\alpha _s(Q^2))\otimes f(Q^2) \, , \end{aligned}$$
(3.14)

where \(\otimes \) is the convolution in the momentum fraction x between the perturbative coefficient function \(C(x,\alpha _s)\) and the PDF \(f(x,Q^2)\), and where the sum over parton flavors is left implicit. In Eq. (3.14) both \(\alpha _s\) and the PDF are evaluated at the physical scale of the process, so nothing depends on unphysical renormalization or factorization scales. We can determine the MHOU associated with the structure function F due to the truncation of the perturbative expansion of the coefficient function by fixing the factorization scheme and keeping fixed the scale at which the PDF is evaluated (usually referred to as factorization scale), but varying the renormalization scale used in the computation of the coefficient function itself.

The scale-dependent structure function \({\overline{F}}\) will then be given by

$$\begin{aligned} {\overline{F}}(Q^2, \mu ^2) = {\overline{C}}\left( \alpha _s(\mu ^2), \mu ^2/Q^2\right) \otimes f(Q^2)\, , \end{aligned}$$
(3.15)

where \(\mu ^2\) is the renormalization scale used in the computation of the coefficient function, or equivalently by

$$\begin{aligned} {\overline{F}}(t, \kappa ) = {\overline{C}}(\alpha _s(t + \kappa ), \kappa )\otimes f(t), \end{aligned}$$
(3.16)

where as in Eq. (3.5) we are using the notation \(t=\ln Q^2/\Lambda ^2\) and \(\kappa = \ln \mu ^2/Q^2\). Note that in Eq. (3.15) the structure function is written as a function of \(\mu ^2\) in the sense of the RG equation (3.4): the dependence on \(\mu ^2\) cancels order by order, and the residual dependence can be used to estimate the MHOU.

In phenomenological applications, it is more customary to write \(F(Q^2)\), i.e. not to write the dependence of F on \(\mu ^2\), thereby emphasizing the renormalization scale independence of the physical observable, and just to indicate the scale dependence of the hard coefficient function \({\overline{C}}(\alpha _s(\mu ^2), \mu ^2/Q^2)\). Here and in the sequel we will stick to the notation used in RG equations since we wish to emphasize that, as the scale is varied, we are dealing with a one-parameter family of theory predictions for the physical (RG invariant) observable, which all coincide to the accuracy at which they are calculated but which differ by higher order terms.

Now, the RG invariance of physical cross-sections, and therefore of the structure function F, requires RG invariance of the coefficient function. This is because we are not varying the factorization scheme, so the PDF is independent of the renormalization scale \(\mu \). It follows that, as in Eq. (3.11),

$$\begin{aligned} {\overline{C}}(\alpha _s(t + \kappa ), \kappa )&= C(t + \kappa ) - \kappa {{d}\over {dt}} C(t + \kappa ) \nonumber \\&\quad + {{1}\over {2}}\kappa ^2 {{d^2}\over {dt^2}} C(t + \kappa )+\cdots , \end{aligned}$$
(3.17)

where \(C(t) = {\overline{C}}(\alpha _s(t),0)\) is the coefficient function evaluated at \(\mu ^2=Q^2\), and thus \(\kappa =0\). Then, given the perturbative expansion of the coefficient function,

$$\begin{aligned} C(t) = c_0 + \alpha _s(t) c_1 + \alpha _s^2(t) c_2 + \alpha _s^3(t) c_3 +\cdots , \end{aligned}$$
(3.18)

its derivatives can be easily evaluated using the beta function expansion Eq. (3.3),

$$\begin{aligned} \begin{aligned} {{d}\over {dt}}{C}(t)&= \alpha _s^2(t) \beta _0 c_1+ \alpha _s^3(t) (\beta _1c_1+2\beta _0c_2) + \cdots ,\\ {{d^2}\over {dt^2}}{C}(t)&= 2\alpha _s^3(t) \beta _0^2 c_1+ \cdots , \end{aligned} \end{aligned}$$
(3.19)

and we find that the renormalization scale variation of the coefficient function is

$$\begin{aligned} {\overline{C}}(\alpha _s(t + \kappa ), \kappa )&= c_0 \nonumber \\&\quad + \alpha _s(t + \kappa ) c_1 + \alpha _s^2(t + \kappa ) (c_2 - \kappa \beta _0 c_1)\nonumber \\&\quad +\alpha _s^3(t + \kappa )\big (c_3-\kappa (\beta _1c_1+2\beta _0c_2)\nonumber \\&\quad + \kappa ^2 \beta _0^2 c_1\big ) + \cdots \, . \end{aligned}$$
(3.20)

Again, note that in the case where \(\mu ^2=Q^2\), and so \(\kappa =0\), one recovers the standard perturbative expansion Eq. (3.18). We can now find the scale-dependent structure function,

$$\begin{aligned} {\overline{F}}(t, \kappa )&= c_0\otimes f(t) + \alpha _s(t + \kappa ) c_1\otimes f(t)\nonumber \\&\quad + \alpha _s^2(t + \kappa ) \left( c_2 - \kappa \beta _0 c_1 \right) \otimes f(t)\nonumber \\&\quad +\alpha _s^3(t + \kappa ) \left( c_3-\kappa (\beta _1c_1+2\beta _0c_2)+ \kappa ^2 \beta _0^2 c_1 \right) \nonumber \\&\quad \otimes f(t) + \cdots . \end{aligned}$$
(3.21)

Note that evaluating these expressions is numerically very straightforward, in that the scale-varied expression Eq. (3.21) has the same form, involving the same convolutions of \(c_i\) with f, as the convolution with the PDFs to the given order at the central scale Eqs. (3.14) and (3.18), only with rescaled coefficients. This means there is no need to recompute NNLO corrections, K-factors, etc.: all that is necessary is to change the coefficients in the perturbative expansion at the central scale according to Eq. (3.21).
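
The coefficient rescaling can be sketched in a few lines. The example below assumes, for simplicity, that each \(c_i\) is a single number (for instance a Mellin moment at fixed N) rather than a function of x; the function name and argument layout are illustrative only.

```python
from typing import List, Sequence

def scale_varied_coefficients(
    c: Sequence[float], beta: Sequence[float], kappa: float
) -> List[float]:
    """Coefficients multiplying alpha_s(t + kappa)**i in Eq. (3.20), for i = 0..3.

    c    = (c0, c1, c2, c3): central-scale coefficients of Eq. (3.18)
    beta = (beta0, beta1):   beta-function coefficients of Eq. (3.3)
    """
    c0, c1, c2, c3 = c
    b0, b1 = beta
    return [
        c0,
        c1,
        c2 - kappa * b0 * c1,
        c3 - kappa * (b1 * c1 + 2.0 * b0 * c2) + kappa ** 2 * b0 ** 2 * c1,
    ]
```

Setting `kappa = 0.0` returns the input coefficients unchanged, as expected from Eq. (3.18). In x space the same linear combinations would be applied to the convolutions \(c_i\otimes f\) that are already available at the central scale.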

3.2.2 Hadronic processes

MHOUs in the partonic hard cross-sections of hadronic processes can be computed in the same way as for DIS. The only additional complication is that the physical observable – typically, a cross-section \(\Sigma \) – now depends on the convolution of two PDFs:

$$\begin{aligned} \Sigma (t) = H(t)\otimes ( {f}(t)\otimes {f}(t)) \, , \end{aligned}$$
(3.22)

where again the physical scale is \(t = \ln (Q^2 / \Lambda ^2)\), H(t) is the partonic hard-scattering cross-section, the PDFs are convoluted together into a parton luminosity \(\mathcal {L}=f\otimes f\), and the sum over parton flavors is left implicit. Then, varying the renormalization scale \(\kappa = \ln \mu ^2/Q^2\) in the hard cross-section, we have

$$\begin{aligned} {\overline{\Sigma }}(t,\kappa ) = {\overline{H}} (\alpha _s(t+\kappa ), \kappa )\otimes (f(t)\otimes f(t)), \end{aligned}$$
(3.23)

where, just as for electroproduction, for PDFs evaluated at a fixed scale t, RG invariance tells us that \({\overline{H}} (\alpha _s(t), \kappa )\) is given in terms of H(t) by Eq. (3.9):

$$\begin{aligned} {\overline{H}}(\alpha _s(t), \kappa ) = H(t) - \kappa {{d}\over {dt}} H(t) + {{1}\over {2}}\kappa ^2 {{d^2}\over {dt^2}} H(t)+\cdots . \end{aligned}$$
(3.24)

If the partonic process begins at \(O(\alpha _s^n)\), with \(n=0,1,2,\ldots \), then one can expand the hard cross-section as follows

$$\begin{aligned} H(t) = \alpha _s^n(t)h_0 + \alpha _s^{n+1}(t)h_1 + \alpha _s^{n+2}(t)h_2+ \cdots . \end{aligned}$$
(3.25)

Then, as in the case of electroproduction, using Eq. (3.3) we can readily evaluate these derivatives,

$$\begin{aligned} \begin{aligned} {{d}\over {dt}}{H}(t)&= n\alpha _s^{n-1}(t)\beta (\alpha _s) h_0 + (n+1)\alpha _s^n(t)\beta (\alpha _s) h_1 + \cdots \\&= \alpha _s^{n+1} n \beta _0 h_0 + \alpha _s^{n+2} (n \beta _1 h_0 + (n+1) \beta _0 h_1) + \cdots \\ {{d^2}\over {dt^2}}{H}(t)&= \alpha _s^{n+2} n(n+1) \beta _0^2 h_0 + \cdots \end{aligned} \end{aligned}$$
(3.26)

so that, putting everything together, the expression for the scale-varied partonic cross-section to be used to evaluate the scale-varied hadronic cross-section \({\overline{\Sigma }}\), Eq. (3.23), will be given by

$$\begin{aligned} {\overline{H}}(\alpha _s, \kappa )= & {} \alpha _s^n h_0 + \alpha _s^{n+1} (h_1 - \kappa n \beta _0 h_0) \nonumber \\&+\alpha _s^{n+2} \Big (h_2 - \kappa (n \beta _1 h_0 + (n+1) \beta _0 h_1) \nonumber \\&+ {{1}\over {2}}\kappa ^2 n(n+1) \beta _0^2 h_0\Big ) + \cdots . \end{aligned}$$
(3.27)

This is rather more involved than Eq. (3.21), but shares the same advantages: the convolutions to be evaluated in Eq. (3.23) have the same structure as those in Eq. (3.22), so all that is required to vary the renormalization scale is to modify their coefficients.
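
A minimal sketch of this coefficient modification for a process starting at \(O(\alpha _s^n)\) is given below. It treats the \(h_i\) as plain numbers, which is an assumption made for illustration; the \(\kappa ^2\) term is the \({{1}\over {2}}\kappa ^2\, d^2H/dt^2\) contribution of Eq. (3.26), which is proportional to \(h_0\). The function name is hypothetical.

```python
from typing import List, Sequence

def scale_varied_hard_coefficients(
    h: Sequence[float], beta: Sequence[float], n: int, kappa: float
) -> List[float]:
    """Coefficients multiplying alpha_s**(n + i), i = 0, 1, 2, in the scale-varied H.

    h    = (h0, h1, h2):   central-scale coefficients of Eq. (3.25)
    beta = (beta0, beta1): beta-function coefficients of Eq. (3.3)
    n    : power of alpha_s at which the process starts
    """
    h0, h1, h2 = h
    b0, b1 = beta
    return [
        h0,
        h1 - kappa * n * b0 * h0,
        h2
        - kappa * (n * b1 * h0 + (n + 1) * b0 * h1)
        + 0.5 * kappa ** 2 * n * (n + 1) * b0 ** 2 * h0,
    ]
```

For \(n=0\) the leading coefficient carries no coupling, so only the \(-\kappa \beta _0 h_1\) term survives at this order.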

3.3 Scale variation for PDF evolution

The renormalization scale variation described in the previous section can be used to estimate the MHOU in any partonic cross-section of an electroproduction or hadroproduction process evaluated to a fixed order in perturbation theory. However, when computing factorized observables of the form Eqs. (3.14, 3.22), an entirely independent source of MHOU arises from the truncation of the perturbative expansion of the splitting functions (or anomalous dimensions in Mellin space) that govern the PDF evolution equations. We now show that this MHOU can again be estimated by scale variation; we will also show that this scale variation can be performed in different ways: either at the level of the anomalous dimension; or at the level of the PDFs themselves; or finally at the level of the hard-scattering partonic coefficient functions, by exploiting the fact that physical results cannot depend on the scale at which the PDF is evaluated, and so one may trade the effect of scale variation between the PDF and the hard coefficient function.

Consider a PDF \(f(\mu ^2)\), where \(\mu \) is the scale at which the PDF is evaluated. For simplicity, throughout this section the argument is presented assuming a Mellin space formalism, so that convolutions are replaced by ordinary products. Also, indices labeling different PDFs are left implicit, so our argument applies directly to the nonsinglet case but can be straightforwardly generalized to the singlet evolution and to other flavor combinations.

The scale dependence of \(f(\mu ^2)\) is fixed by the evolution equation

$$\begin{aligned} \mu ^2 {{d}\over {d \mu ^2}} f(\mu ^2) = \gamma (\alpha _s(\mu ^2)) f(\mu ^2)\, , \end{aligned}$$
(3.28)

which applies also to the general singlet case assuming that a sum over parton flavors is left implicit. The anomalous dimension admits a perturbative expansion of the form

$$\begin{aligned} \gamma (t) = \alpha _s(t) \gamma _0 + \alpha _s^2(t) \gamma _1 + \alpha _s^3(t) \gamma _2 + \cdots . \end{aligned}$$
(3.29)

Eq. (3.28) can be integrated to give

$$\begin{aligned} f(\mu ^2) = \text {exp}\bigg (\int ^{\mu ^2} {{d \mu '^{2}}\over {\mu '^{2}}} \gamma (\alpha _s(\mu '^{2}))\bigg ) f_0 \, , \end{aligned}$$
(3.30)

where \(f_0\) indicates the PDF at the initial scale \(\mu _0\). Of course, the left-hand side of the equation is independent of this initial scale \(\mu _0\), so the dependence can be left implicit also on the right-hand side, by not specifying the lower limit on the integral. In practice, if the PDF \(f_0\) were extracted from data, any change in this scale would be entirely reabsorbed by the fitting procedure.

We now observe the well-known fact that the anomalous dimension in Eq. (3.28) is an RG invariant quantity, and therefore the scale on which it depends is physical. However, this physical scale can in general be different from the renormalization scale used to determine the anomalous dimension itself (e.g. if it were determined through the renormalization of a twist-two operator). We let \(\mu ^2 = k Q^2\), where as in the general argument of Sect. 3.1, \(\mu ^2\) is an arbitrary renormalization scale and \(Q^2\) is a physical scale. We can make \(\gamma \) independent of the renormalization scale order by order in perturbation theory if we define its scale-varied counterpart in the same way as before

$$\begin{aligned} {\overline{\gamma }}(\alpha _s(t), \kappa ) = \gamma (t) - \kappa {{d}\over {dt}}{\gamma }(t) + {{1}\over {2}}\kappa ^2 {{d^2}\over {dt^2}}{\gamma }(t) + \cdots , \end{aligned}$$
(3.31)

with \(\kappa \) given by Eq. (3.5) and \(\gamma (t)= {\overline{\gamma }}(\alpha _s(t), 0)\), so that given the perturbative expansion Eq. (3.29) one has that

$$\begin{aligned} {\overline{\gamma }}(\alpha _s(t + \kappa ), \kappa )= & {} \alpha _s(t+\kappa ) \gamma _0 + \alpha _s^2(t+\kappa ) (\gamma _1 - \kappa \beta _0 \gamma _0) \nonumber \\&+ \alpha _s^3(t+\kappa ) (\gamma _2 - \kappa (\beta _1 \gamma _0 + 2 \beta _0 \gamma _1) \nonumber \\&+ \kappa ^2 \beta _0^2 \gamma _0) + \cdots \end{aligned}$$
(3.32)

is independent of \(\kappa \) up to higher order terms, order by order. Note that Eq. (3.32) has the same form as Eqs. (3.25)–(3.27) (with \(n=1\)).
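
As a quick check of this correspondence (a worked substitution, not an additional result), setting \(n=1\) and relabeling \(h_i\rightarrow \gamma _i\) in Eq. (3.26), with \(\gamma _i\) the coefficients of the expansion Eq. (3.29), gives

$$\begin{aligned} \begin{aligned} {{d}\over {dt}}\gamma (t)&= \alpha _s^2(t) \beta _0 \gamma _0 + \alpha _s^3(t) (\beta _1 \gamma _0 + 2 \beta _0 \gamma _1) + \cdots ,\\ {{d^2}\over {dt^2}}\gamma (t)&= 2 \alpha _s^3(t) \beta _0^2 \gamma _0 + \cdots , \end{aligned} \end{aligned}$$

so that the \(\mathcal {O}(\kappa ^2)\) term of Eq. (3.31) contributes \({{1}\over {2}}\kappa ^2\cdot 2\beta _0^2\gamma _0 = \kappa ^2\beta _0^2\gamma _0\) at \(\mathcal {O}(\alpha _s^3)\), as in Eq. (3.32).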

We have shown that variation of the scale on which the anomalous dimension depends can be used, in the usual way, to generate higher order terms which estimate MHOUs in the expansion of the anomalous dimension itself. We now show how the same result can be obtained by scale variation at the PDF level. Inserting the result Eq. (3.32) in the solution of the evolution equations for the PDFs, Eq. (3.30), one finds that the evolution factor can be expressed as

$$\begin{aligned}&\exp \left( \int ^{t} dt' {\overline{\gamma }}(\alpha _s(t' + \kappa ), \kappa )\right) \nonumber \\&\quad = \exp \left( \int ^{t + \kappa } dt' {\overline{\gamma }}(\alpha _s(t'), \kappa )\right) \nonumber \\&\quad = \exp \left( \left[ \int ^{t + \kappa } dt' \gamma (t')\right] \right. \nonumber \\&\qquad \left. - \kappa \gamma (t + \kappa ) + {{1}\over {2}}\kappa ^2 {{d}\over {dt}} {\gamma }(t + \kappa ) + \cdots \right) \nonumber \\&\quad = \left[ 1 - \kappa \gamma (t + \kappa ) + {{1}\over {2}}\kappa ^2 (\gamma ^2(t + \kappa ) \right. \nonumber \\&\qquad \left. +{{d}\over {dt}}{\gamma }(t + \kappa )) + \cdots \right] \exp \left( \int ^{t + \kappa } dt' \gamma (t')\right) \ , \end{aligned}$$
(3.33)

where in the first line we changed integration variable (ignoring any change in the lower limit of integration), in the second we used Eq. (3.31), and in the third we expanded the exponential perturbatively. We can now use this result to determine renormalization scale variation in the evolution directly from the scale dependence of the PDF, as in Ref. [17]. Defining a scale-varied PDF as

$$\begin{aligned} {\overline{f}}(\alpha _s(t + \kappa ), \kappa ) = \text {exp}\bigg (\int ^t dt' {\overline{\gamma }}(\alpha _s(t' + \kappa ), \kappa )\bigg ) f_0 \, , \end{aligned}$$
(3.34)

that is, as the PDF obtained by varying the renormalization scale in the anomalous dimension, then \(f(t) = {\overline{f}}(\alpha _s(t), 0)\), and using Eq. (3.33) we find that

$$\begin{aligned} {\overline{f}}(\alpha _s(t + \kappa ), \kappa )= & {} \left[ 1 - \kappa \gamma (t + \kappa ) + {{1}\over {2}}\kappa ^2 (\gamma ^2(t + \kappa ) \right. \nonumber \\&\left. +{{d}\over {dt}}{\gamma }(t + \kappa )) + \cdots \right] \,f(t+\kappa ),\nonumber \\ \end{aligned}$$
(3.35)

provided only that any variation of the initial scale \(\mu _0\) due to changes in \(\kappa \) has been reabsorbed into the initial PDF \(f_0\).

Equation (3.35) is the same as the result obtained from varying the scale \(\mu ^2\) at which the PDF is evaluated about the physical scale \(Q^2\): just as in the derivation of Eq. (3.24), this gives

$$\begin{aligned} \begin{aligned} {\overline{f}}(\alpha _s(t + \kappa ), \kappa )&= f(t + \kappa ) - \kappa {{d}\over {dt}} {f}(t + \kappa ) \\&\quad + {{1}\over {2}}\kappa ^2 {{d^2}\over {dt^2}}{f}(t + \kappa ) + \cdots \\&= f(t + \kappa ) - \kappa \gamma f(t + \kappa ) \\&\quad + {{1}\over {2}}\kappa ^2 \big (\gamma ^2 + {{d}\over {dt}}\gamma \big ) f(t + \kappa ) + \cdots , \end{aligned} \end{aligned}$$
(3.36)

where in the second line we used the PDF evolution equation, Eq. (3.28). Thus there is little point in varying the renormalization scale of the anomalous dimension and the scale at which the PDF is evaluated independently: provided we absorb changes in the initial scale in the initial PDF, and use the linearised solution of the evolution equation, the result (Eq. (3.35) or Eq. (3.36)) is precisely the same. This is essentially because the PDF f(t) depends on only a single scale.
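
The structure of Eq. (3.36) can be checked numerically in a toy setting. The sketch below assumes a single Mellin moment with a leading-order anomalous dimension and one-loop running, so that \(\gamma (t)=a/t\) and \(f(t)\propto t^a\), where the constant \(a\) is a hypothetical stand-in for \(-\gamma _0/\beta _0\). In this setting the bracket in Eq. (3.36) is the Taylor expansion of \(f(t)\) about \(t+\kappa \), so each additional power of \(\kappa \) should bring the scale-varied PDF closer to \(f(t)\).

```python
def evolved_pdf(t: float, a: float) -> float:
    """Toy Mellin-space solution of Eq. (3.28) with gamma(t) = a / t, normalised to f(1) = 1."""
    return t ** a

def scale_varied_pdf(t: float, kappa: float, a: float, order: int = 2) -> float:
    """Eq. (3.36) truncated at the given power of kappa (0, 1 or 2)."""
    s = t + kappa
    gamma = a / s          # gamma evaluated at t + kappa
    dgamma = -a / s ** 2   # d(gamma)/dt evaluated at t + kappa
    factor = 1.0
    if order >= 1:
        factor -= kappa * gamma
    if order >= 2:
        factor += 0.5 * kappa ** 2 * (gamma ** 2 + dgamma)
    return factor * evolved_pdf(s, a)
```

Comparing `scale_varied_pdf(t, kappa, a, order)` with `evolved_pdf(t, a)` for `order = 0, 1, 2` shows the residual \(\kappa \) dependence shrinking as more terms are retained, which is the sense in which the remaining difference estimates the MHOU.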

Equation (3.35) indicates that the \(\kappa \) dependence can be factorized out of the PDF. We can use this property to factor it into the hard-scattering coefficient function. Consider for example electroproduction, whose factorized structure function is given by Eq. (3.14):

$$\begin{aligned} {\widehat{F}}(t, \kappa )= & {} {C}(t) {\overline{f}}(\alpha _s(t+\kappa ),\kappa )\nonumber \\= & {} {C}(t) \left[ 1 - \kappa \gamma (t + \kappa ) + {{1}\over {2}}\kappa ^2 (\gamma ^2(t + \kappa )\right. \nonumber \\&\left. +{{d}\over {dt}}{\gamma }(t + \kappa )) + \cdots \right] f(t+\kappa )\nonumber \\\equiv & {} {\widehat{C}}(t, \kappa ) f(t+\kappa ) \, , \end{aligned}$$
(3.37)

where in the second line we used the expansion Eq. (3.35), and the third line should be viewed as the definition of the scale-varied coefficient function \({\widehat{C}}(t, \kappa )\). Moreover, given the relation

$$\begin{aligned} {{d}\over {dt}}\gamma (\alpha _s) = \beta (\alpha _s) {{d\gamma }\over {d\alpha _s}} \, , \end{aligned}$$
(3.38)

and then using the perturbative expansions of the beta function \(\beta \), the anomalous dimension \(\gamma \), and the coefficient function C, Eqs. (3.3), (3.29), and (3.18), respectively, one finds

$$\begin{aligned} {\widehat{C}}(t, \kappa )= & {} c_0 + \alpha _s(t) (c_1 -\kappa \gamma _0 c_0) + \alpha _s^2(t) \nonumber \\&\times \left( c_2- \kappa (\gamma _0 c_1 +\gamma _1 c_0) + {{1}\over {2}}\kappa ^2 \gamma _0(\gamma _0 + \beta _0)c_0\right) \nonumber \\&+ \cdots \, . \end{aligned}$$
(3.39)

Note that this result for \({\widehat{C}}(t, \kappa )\) is not the same as \({\overline{C}}(t + \kappa , \kappa )\), Eq. (3.20). The reason is that \({\overline{C}}(t + \kappa , \kappa )\) is obtained from the variation of the renormalization scale of the hard coefficient function, and can be used to estimate the MHOU in the perturbative expansion of the coefficient function, while \({\widehat{C}}(t, \kappa )\) is obtained from the variation of the renormalization scale of the anomalous dimension, and can be used to estimate the MHOU in the perturbative evolution of the PDF. We have obtained the former from RG invariance of the hard cross-section, and the latter from RG invariance of the anomalous dimension. However, Eq. (3.37) can be equivalently viewed as expressing the fact that the physically observable structure function cannot depend on the scale at which the PDF is evaluated in the factorized expression, usually referred to as factorization scale: provided we absorb changes in the initial scale in the initial PDF, varying the scale of the anomalous dimension is identical to varying the scale of the PDF.

It is customary to refer to the scale variation which estimates MHOU in the coefficient function as renormalization scale variation: this corresponds to evaluating \({\overline{C}}(t +\kappa , \kappa )\) in Eq. (3.20). The scale variation which estimates MHOU in the anomalous dimension, and corresponds to \({\widehat{C}}(t, \kappa )\) in Eq. (3.39), is usually called instead factorization scale variation. This terminology is used for example by the Higgs Cross-Section working group [18] and more generally within the context of LHC physics; in the older DIS literature the same terminology has a somewhat different meaning, as we shall discuss in Sect. 3.4 below.

The previous discussion entails that in practice there are (at least) three different ways of estimating the MHOU associated to the PDF evolution in terms of the anomalous dimension at fixed order in perturbation theory by means of scale variations:

  1. (A) The renormalization scale of the anomalous dimension can be varied directly, using Eq. (3.32). This approach works well provided that the initial PDF \(f_0\) is refitted, but if it is held fixed care must be taken to absorb scale variations of the initial scale into the initial PDF. This method was used for DIS renormalization scale variations in many older papers (see e.g. Refs. [19,20,21]). It has the disadvantage that it requires refitting the PDF as the scale is varied, which is cumbersome for most applications.

  2. (B) The scale at which the PDF is evaluated can be varied, either analytically or numerically, using Eq. (3.36). This is in many ways the simplest method, as the initial PDF remains unchanged, while only the PDF is involved so the result is manifestly universal. Furthermore it is easily adapted to a variable flavor number scheme (VFNS), since the MHOUs in the PDFs with different numbers of active flavors can each be estimated separately. The numerical method was employed in [17], in the context of small x resummation. It has the disadvantage that if one wishes to estimate the impact on a given physical observable one needs to first generate the scale-varied PDF, before combining it with the hard coefficient function.

  3. (C) The scale at which the PDF is evaluated is varied, but the compensating scale-dependent terms are factorized into the coefficient function using for example Eq. (3.39). This factorization scale variation is most commonly used when evaluating a new process using an established PDF set, e.g. in studies of LHC processes (as in Ref. [18]), since it has the advantage that it can be implemented directly using an external interpolated PDF set (such as provided by LHAPDF [22]). It has the conceptual disadvantage that the universality of the variation is obscured, since the scale dependent terms are mixed in the expansion of the coefficient function (this is particularly complicated in a VFNS, where the coefficient functions also depend on heavy quark masses), and the practical disadvantage that it requires the evaluation of new contributions to the coefficient function involving additional convolutions. Also, it can be impractical in situations where higher order corrections are difficult to evaluate precisely due to numerical issues.

Note that whereas these methods are in principle completely equivalent, they can differ by subleading terms according to the convention used to truncate the perturbation expansion. Indeed, in method (A) the expansion of the anomalous dimension is truncated, but higher order terms in the exponentiation may be retained depending on the form of the solution to the evolution equations adopted; in method (B) the exponential has been expanded (see Eq. (3.33)) so the result is the same as would be obtained with a linearized solution of the evolution equation; while in method (C) cross-terms between the expansion of linearized evolution and coefficient function expansion have also been dropped (compare Eq. (3.37) with Eq. (3.39)). However, since the differences always involve higher order terms, each method can be regarded as giving an equally valid estimate of the MHOU in the perturbative evolution: differences between methods should be viewed as the uncertainty on the MHOU itself when estimated by scale variation.

3.4 Double scale variations

We now discuss the combination of the two independent scale variations of Sects. 3.2 and 3.3, respectively estimating MHOUs in the hard cross-section and in perturbative evolution, thereby deriving master formulae for scale variation up to NNLO which will then be used in the subsequent sections. For completeness, we will also discuss different options for scale variation which have been considered in the literature, and clarify some terminological mismatches, especially between the older studies of DIS and the more recent applications to LHC processes.

3.4.1 Electroproduction

Consider first the more general factorization of an electroproduction cross-section, such as a DIS structure function:

$$\begin{aligned} {\overline{F}}(Q^2, \mu _f^2, \mu _r^2) = {\overline{C}} \left( \alpha _s(\mu _r^2),\ \mu _r^2/Q^2 \right) \otimes {\overline{f}} \left( \alpha _s(\mu _f^2), \mu _f^2/Q^2\right) , \end{aligned}$$
(3.40)

where here and in the following we adopt the (standard) terminology that we introduced in Sect. 3.3, and the viewpoint which corresponds to option (B) of that section: \(\mu _r\) denotes the renormalization scale, whose dependence is entirely contained in the hard coefficient function \({\overline{C}}\) (as in Eq. (3.15)), and whose variation estimates MHOUs in its expansion; while \(\mu _f\) denotes the factorization scale, whose dependence is entirely contained in the PDF (as in Eq. (3.34)), and whose variation estimates MHOUs in the expansion of the anomalous dimension (or equivalently the splitting functions). In the following, as in Sect. 3.3, we will omit the convolution as well as the parton indices.

Note that again, as in Eq. (3.15), and then in Eqs. (3.23), (3.31), and (3.36), the dependence on the scales \(\mu _f\) and \(\mu _r\) should be understood in the sense of the RG equation: the structure function does not depend on them, but as the scales are varied there remains a subleading dependence which estimates the MHOU. As already mentioned, this notation, while standard in the context of RG equations, is somewhat unusual in the context of factorization, where instead it is more customary to omit the scale dependence of the physical observable.

Given that the structure function \({\overline{F}}(Q^2, \mu _f^2, \mu _r^2)\) factorizes into the hard coefficient function and the PDF, the factorization and renormalization scales \(\mu _f\) and \(\mu _r\) can be chosen completely independently; the scale dependence will also factorize. Explicitly, we define

$$\begin{aligned} \mu _f^2= & {} k_fQ^2 \, , \quad \mu _r^2 = k_rQ^2\, , \quad \mathrm{with} \quad t_f = t + \kappa _f ,\nonumber \\ \quad t_r= & {} t + \kappa _r \, , \end{aligned}$$
(3.41)

where \(\kappa _f=\ln k_f\) and \(\kappa _r=\ln k_r\). In terms of these variables, the factorized structure function will be given by

$$\begin{aligned} {\overline{F}}(t, \kappa _f, \kappa _r) = {\overline{C}}(t_r, \kappa _r) {\overline{f}}(t_f, \kappa _f), \end{aligned}$$
(3.42)

where, as in Sects. 3.2 and 3.3, the scale-varied PDF and coefficient functions are

$$\begin{aligned} \begin{aligned} {\overline{f}} (t_f, \kappa _f)&= f(t_f) - \kappa _f {{d}\over {dt}}{f}(t_f) + {{1}\over {2}}\kappa _f^2 {{d^2}\over {dt^2}}{f}(t_f) + \cdots \, , \\ {\overline{C}} (t_r, \kappa _r)&= C(t_r) - \kappa _r {{d}\over {dt}}{C}(t_r) + {{1}\over {2}}\kappa _r^2 {{d^2}\over {dt^2}}{C}(t_r) + \cdots \, , \end{aligned} \end{aligned}$$
(3.43)

where \(f(t_f) \equiv {\overline{f}}(t_f, 0)\) and \(C(t_r) \equiv {\overline{C}}(t_r, 0)\) stand for the PDF and the coefficient function evaluated at the central scale, \(\mu _f^2=Q^2\) and \(\mu _r^2=Q^2\), respectively. Recalling that \({{\partial }\over {\partial t}} \sim \mathcal {O}(\alpha _s)\), the structure function is therefore given by

$$\begin{aligned} {\overline{F}}(t, \kappa _f, \kappa _r)= & {} C(t_r)f(t_f) \nonumber \\&- \left( \kappa _r {{d}\over {dt}}{C}(t_r) f(t_f) + \kappa _f C(t_r) {{d}\over {dt}}{f} (t_f)\right) \nonumber \\&+ {{1}\over {2}}\left( \kappa _r^2 {{d^2}\over {dt^2}}{C}(t_r)f(t_f) \right. \nonumber \\&\left. + 2\kappa _r \kappa _f {{d}\over {dt}}{C}(t_r){{d}\over {dt}}{f}(t_f) + \kappa _f^2 C(t_r) {{d^2}\over {dt^2}} f(t_f) \right) \nonumber \\&+ \mathcal {O}(\alpha _s^3) \, . \end{aligned}$$
(3.44)

From this expression, it follows that scale variations with respect to \(\kappa _f\) can be determined by taking derivatives with respect to \(t_f\) while holding \(t_r\) fixed and vice-versa, so one has

$$\begin{aligned} {\overline{F}}(t, \kappa _f, \kappa _r)= & {} F(t_f,t_r) - \bigg ( \kappa _f\ {{\partial F}\over {\partial t_f}}\bigg |_{t_r} + \kappa _r\ {{\partial F}\over {\partial t_r}}\bigg |_{t_f} \bigg ) \nonumber \\&+ {{1}\over {2}}\bigg ( \kappa _f^2 {{\partial ^2 F}\over {\partial t_f^2}}\bigg |_{t_r} \nonumber \\&+ 2 \kappa _f \kappa _r {{\partial ^2 F}\over {\partial t_f \partial t_r}} + \kappa _r^2 {{\partial ^2 F}\over {\partial t_r^2}}\bigg |_{t_f} \bigg ) + \cdots \, .\nonumber \\ \end{aligned}$$
(3.45)

In other words, we can think of the two variations as being generated by \(\kappa _f {{\partial }\over {\partial t_f}}\) and \(\kappa _r {{\partial }\over {\partial t_r}}\) respectively.

We can equivalently treat the factorization scale variation using method (C) of the previous section, and thus factorize both scale variations into the coefficient function, as done in Eq. (3.39). In the case of electroproduction, inserting the expansions of Eq. (3.18) in Eq. (3.44) one obtains

$$\begin{aligned} {\overline{F}}(t,\kappa _f, \kappa _r) = \widehat{{\overline{C}}}(\alpha _s(t_r), \kappa _f, \kappa _r) f(t_f) \, , \end{aligned}$$
(3.46)

with now all dependence on \(\kappa _r\) and \(\kappa _f\) encoded into a redefined coefficient function:

$$\begin{aligned}&\widehat{{\overline{C}}}(\alpha _s(t_r), \kappa _f, \kappa _r) \equiv c_0 + \alpha _s(t_r) c_1 - \alpha _s(t_f) \kappa _f\ c_0 \gamma _0 \nonumber \\&\quad + \alpha _s(t_r)^2 (c_2 - \kappa _r \ \beta _0c_1) - \alpha _s(t_r) \alpha _s(t_f) \kappa _f \ c_1 \gamma _0 \nonumber \\&\quad + \alpha _s^2(t_f)(-\kappa _f \ c_0\gamma _1 + {{1}\over {2}}\kappa _f^2 c_0 \gamma _0(\beta _0 + \gamma _0)) + \cdots \nonumber \\&= c_0 + \alpha _s(t_r)(c_1 -\kappa _f \ c_0 \gamma _0) + \alpha _s^2(t_r) \nonumber \\&\quad \times \big (c_2 - \kappa _r \ \beta _0 c_1 - \kappa _f\ (c_1\gamma _0 + c_0\gamma _1) \nonumber \\&\quad + {{1}\over {2}}\kappa _f^2 c_0\gamma _0(\gamma _0 - \beta _0) + \kappa _f \kappa _r \beta _0 c_0 \gamma _0\big ) + \cdots \end{aligned}$$
(3.47)

up to terms of \(\mathcal {O}(\alpha _s^3(t_r))\), given that one can change the scale that enters the coupling using

$$\begin{aligned} \alpha _s(t_f)=\alpha _s(t_r) +(\kappa _f-\kappa _r)\beta _0\alpha _s^2(t_r)+\cdots \, . \end{aligned}$$
(3.48)

Note that in the expression for \( \widehat{{\overline{C}}}\) the coupling constant is always evaluated at the renormalization scale \(\mu _r\), and that for \(\kappa _r=\kappa _f=0\) one gets back the original perturbative expansion Eq. (3.18).
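
The equivalence of the two forms of the redefined coefficient function in Eq. (3.47) can be checked numerically. The following is a minimal sketch, not part of the original derivation: all coefficient values (`c`, `g`, `beta0`) and the function names are made-up placeholders, and Mellin-space quantities are treated as plain numbers. It uses the truncated relation of Eq. (3.48) and verifies that the difference between the two forms is of \(\mathcal {O}(\alpha _s^3)\).

```python
# Numerical check that the two forms of Eq. (3.47) agree up to O(alpha_s^3).
# All coefficient values below are arbitrary toy numbers (assumptions).

def alpha_f_from_alpha_r(a_r, kf, kr, beta0):
    """Coupling at t_f re-expressed at t_r, truncated as in Eq. (3.48)."""
    return a_r + (kf - kr) * beta0 * a_r**2

def c_hat_mixed(a_r, kf, kr, c, g, beta0):
    """First form of Eq. (3.47): couplings evaluated at both t_r and t_f."""
    c0, c1, c2 = c
    g0, g1 = g
    a_f = alpha_f_from_alpha_r(a_r, kf, kr, beta0)
    return (c0 + a_r * c1 - a_f * kf * c0 * g0
            + a_r**2 * (c2 - kr * beta0 * c1)
            - a_r * a_f * kf * c1 * g0
            + a_f**2 * (-kf * c0 * g1 + 0.5 * kf**2 * c0 * g0 * (beta0 + g0)))

def c_hat_single(a_r, kf, kr, c, g, beta0):
    """Second form of Eq. (3.47): every coupling evaluated at t_r."""
    c0, c1, c2 = c
    g0, g1 = g
    return (c0 + a_r * (c1 - kf * c0 * g0)
            + a_r**2 * (c2 - kr * beta0 * c1 - kf * (c1 * g0 + c0 * g1)
                        + 0.5 * kf**2 * c0 * g0 * (g0 - beta0)
                        + kf * kr * beta0 * c0 * g0))

def difference(a_r, kf=0.7, kr=-0.4, c=(1.0, 0.8, 1.5), g=(-0.6, 0.3), beta0=-0.5):
    """Difference between the two forms for a toy choice of coefficients."""
    return (c_hat_mixed(a_r, kf, kr, c, g, beta0)
            - c_hat_single(a_r, kf, kr, c, g, beta0))

# Halving the coupling should reduce the difference by roughly a factor 2^3.
for a in (0.2, 0.1, 0.05):
    print(a, difference(a))
```

The residual shrinks approximately cubically with the coupling, as expected for two expressions that differ only by terms beyond the order retained.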

However, especially in the context of PDF determinations, as opposed to the situation in which a pre-computed PDF set is being used, it is rather more convenient to use either of methods (A) or (B) from Sect. 3.3 when estimating the MHOU in the scale dependence of the PDF, since this can be done without reference to any particular process. We can then determine the universal \(\mu _f\) variation by varying the scale in the PDF evolution, as done for instance in Eq. (3.32) or Eq. (3.36), while instead the process-dependent \(\mu _r\) variation is estimated by varying the renormalization scale in the coefficient function, as done in Eq. (3.20), or Eq. (3.27) in the case of hadronic processes.

Note that since all scale-varied terms ultimately derive from the scale dependence of the universal QCD coupling \(\alpha _s(\mu ^2)\), it is reasonable to treat the independent scale variations of \(\mu _f\) and \(\mu _r\) symmetrically, e.g. by varying in the range \(|\kappa _f|, |\kappa _r| \le \ln 4\). Indeed, this symmetry is an advantage of the method: we use the same variation for estimating all MHOUs. Since \(\mu _f\) and \(\mu _r\) can each be varied independently, a simple option is to perform the double scale variations by considering the five scale choices \((\kappa _f,\kappa _r)= (0,0),(\pm \ln 4,0),(0,\pm \ln 4)\). We will refer to this as 5-point scale variation; alternative schemes will be considered in the next section.
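
As an illustration, the five scale choices of the 5-point variation can be enumerated directly. This is only a sketch: the function names are hypothetical, and the conversion to \(\mu /Q\) simply uses \(\kappa =\ln \mu ^2/Q^2\), so that \(|\kappa |\le \ln 4\) corresponds to varying each scale by a factor of two about its central value.

```python
import math

W = math.log(4.0)  # kappa = ln(mu^2/Q^2), so |kappa| <= ln 4 means 1/2 <= mu/Q <= 2

def five_point_choices(w=W):
    """Central point plus one-at-a-time variations of (kappa_f, kappa_r)."""
    return [(0.0, 0.0), (w, 0.0), (-w, 0.0), (0.0, w), (0.0, -w)]

def scale_ratio(kappa):
    """Convert kappa = ln(mu^2/Q^2) into the ratio mu/Q."""
    return math.exp(0.5 * kappa)

for kf, kr in five_point_choices():
    print(f"mu_f/Q = {scale_ratio(kf):.2f}, mu_r/Q = {scale_ratio(kr):.2f}")
```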

Note finally that if we set the renormalization and factorization scales in Eq. (3.40) to be equal to each other, \(\mu _f^2=\mu _r^2 = {\tilde{\mu }}^2\), we have the factorization

$$\begin{aligned} {\widetilde{F}}(Q^2, {{\tilde{\mu }}}^2) = {\widetilde{C}}(\alpha _s({{\tilde{\mu }}}^2), {{\tilde{\mu }}}^2/Q^2)\ f({{\tilde{\mu }}}^2) \, . \end{aligned}$$
(3.49)

In most of the earlier papers, mainly concerned with DIS structure functions, e.g. [19, 20, 23,24,25], the scale \({{\tilde{\mu }}}^2\) was termed the factorization scale: this originates in the earliest papers on the OPE. However, in our current terminology it corresponds to both renormalization and factorization scales taken equal to each other. Likewise, in the earlier papers what here we call the factorization scale \(\mu _f\) was referred to as the renormalization scale. Here, to avoid confusion, we will call \({{\tilde{\mu }}}^2\) in Eq. (3.49) the scale of the process.

For clarity the different nomenclatures for the various scales used in the earlier papers, and in more modern work (and in this paper), are summarized in Table 1.

Table 1 Nomenclatures for the different scale variations used in some of the earlier papers (mainly in the context of DIS), and in more recent work (mainly in the context of hadronic processes), as discussed in detail in the text. The ‘modern’ terminology is adopted throughout this paper

Consider now the effect on the structure function of varying the scale of the process. As before, we define \({{\tilde{\kappa }}} = \ln {\tilde{\mu }}^2/Q^2\) and write

$$\begin{aligned} {\widetilde{F}}(t + {{\tilde{\kappa }}}, {{\tilde{\kappa }}}) = {\widetilde{C}}(\alpha _s(t + {{\tilde{\kappa }}}), {{\tilde{\kappa }}})\ f(t+ {{\tilde{\kappa }}}) \, . \end{aligned}$$
(3.50)

Now the renormalization group invariance of the cross-section [i.e. Eq. (3.4)] requires a cancellation between scale variations in the coefficient function and the PDF: with \(F(t) \equiv {\widetilde{F}}(t, 0)\),

$$\begin{aligned} {\widetilde{F}}(t + {{\tilde{\kappa }}}, {{\tilde{\kappa }}})&= F(t + {{\tilde{\kappa }}}) - {{\tilde{\kappa }}} {{d}\over {dt}}{F}(t+{{\tilde{\kappa }}}) \nonumber \\&\quad +{{1}\over {2}}{{\tilde{\kappa }}}^2 {{d^2}\over {dt^2}}{F}(t + {{\tilde{\kappa }}}) + \cdots \nonumber \\&= Cf - {{\tilde{\kappa }}}\left( {{d}\over {dt}} C + \gamma C\right) f \nonumber \\&\quad + {{1}\over {2}}{{\tilde{\kappa }}}^2 \left( {{d^2}\over {dt^2}}{C} + 2\gamma {{d}\over {dt}}{C} + C{{d}\over {dt}}{\gamma } + C\gamma ^2\right) f \nonumber \\&\quad + \cdots . \end{aligned}$$
(3.51)

where the first line is the same as Eq. (5.8) in Ref. [17] while in the second line we used Eq. (3.36) for scale variation of the PDF. Then, expanding in the usual way, we find that

$$\begin{aligned} {\widetilde{C}}(\alpha _s(t + {{\tilde{\kappa }}}), {{\tilde{\kappa }}})&= c_0 + \alpha _s(t+ {{\tilde{\kappa }}})(c_1 -{{\tilde{\kappa }}} c_0 \gamma _0) \nonumber \\&\quad + \alpha _s^2 ( t + {{\tilde{\kappa }}})\big (c_2 - {{\tilde{\kappa }}}(\beta _0 c_1 + c_1 \gamma _0 + c_0 \gamma _1) \nonumber \\&\quad + {{1}\over {2}}{{\tilde{\kappa }}}^2 \ c_0 \gamma _0 (\beta _0 + \gamma _0)\big )+\cdots \end{aligned}$$
(3.52)

which indeed coincides with the expression for what is referred to as factorization scale variation in this earlier literature: see e.g. Ref. [26], Eq. (2.17). Therefore, varying the scale of the process mixes together the scale dependence in the coefficient function and the scale dependence in the PDF: indeed, if in Eq. (3.47) we set \(\kappa _f=\kappa _r={{\tilde{\kappa }}}\), it reduces to Eq. (3.52).
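
As a check of this reduction, written out here for convenience, the scale-dependent terms of \(\mathcal {O}(\alpha _s^2)\) in the second form of Eq. (3.47) combine as follows when \(\kappa _f=\kappa _r={{\tilde{\kappa }}}\):

$$\begin{aligned} - \kappa _r \beta _0 c_1 - \kappa _f (c_1\gamma _0 + c_0\gamma _1)&\rightarrow - {{\tilde{\kappa }}}(\beta _0 c_1 + c_1 \gamma _0 + c_0 \gamma _1) \, , \nonumber \\ {{1}\over {2}}\kappa _f^2 c_0\gamma _0(\gamma _0 - \beta _0) + \kappa _f \kappa _r \beta _0 c_0 \gamma _0&\rightarrow {{1}\over {2}}{{\tilde{\kappa }}}^2 c_0\gamma _0(\gamma _0 - \beta _0) + {{\tilde{\kappa }}}^2 \beta _0 c_0 \gamma _0 \nonumber \\&= {{1}\over {2}}{{\tilde{\kappa }}}^2 c_0 \gamma _0 (\beta _0 + \gamma _0) \, , \end{aligned}$$

which are the \({{\tilde{\kappa }}}\)-dependent terms appearing at \(\mathcal {O}(\alpha _s^2)\) in Eq. (3.52).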

Fig. 1 The two-dimensional space of scale variations for a single process: \(\kappa _r\) is the renormalization scale (giving the MHOU in the hard cross-section), \(\kappa _f\) is the factorization scale (giving the MHOU in the evolution of the PDF) and \({{\tilde{\kappa }}}\) is the variation of the scale of the process (called factorization scale variation in the earlier literature), obtained by setting \(\kappa _f=\kappa _r\)

Clearly, variations of \({{\tilde{\mu }}}^2\) are not independent of the variations of \(\mu _f^2\) or \(\mu _r^2\): rather they are generated by \({{\tilde{\kappa }}}\ ({{\partial }\over {\partial t_f}} + {{\partial }\over {\partial t _r}})\), so they correspond to directions along the diagonal in the space of \(\kappa _f\) and \(\kappa _r\), see Fig. 1. In the earlier literature, MHOUs were estimated by combining renormalization scale variation with this latter variation, namely by varying \({{\tilde{\mu }}}^2\) and \(\mu _f^2\): see e.g. Refs. [19, 20]. This however has the disadvantage of generating large scale ratios: performing variations of \({{\tilde{\mu }}}^2\) and \(\mu _f^2\) sequentially we can obtain \(\kappa _f = 2 \ln 4\), because

$$\begin{aligned} {{\tilde{\kappa }}} \left( {{\partial }\over {\partial t_f}} + {{\partial }\over {\partial t _r}} \right) + \kappa _f\ {{\partial }\over {\partial t_f}} = ({{\tilde{\kappa }}}+\kappa _f)\ {{\partial }\over {\partial t_f}} + {{\tilde{\kappa }}} \ {{\partial }\over {\partial t _r}} \, . \end{aligned}$$
(3.53)
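
The size of the resulting scale ratios can be made explicit with a short enumeration. This is a sketch under the assumption that each of the two sequential variations samples the three values \(0, \pm \ln 4\); the function name is hypothetical.

```python
import math
from itertools import product

W = math.log(4.0)
VALUES = (-W, 0.0, W)

def effective_kappas(kappa_tilde, kappa_f):
    """Eq. (3.53): a process-scale shift followed by a factorization-scale shift.

    Returns the effective (kappa_f, kappa_r) of the combined variation.
    """
    return (kappa_tilde + kappa_f, kappa_tilde)

points = {effective_kappas(kt, kf) for kt, kf in product(VALUES, repeat=2)}
max_kf = max(abs(kf) for kf, _ in points)
print(f"largest |kappa_f| = {max_kf / W:.0f} ln 4, "
      f"i.e. mu_f/Q up to {math.exp(0.5 * max_kf):.0f}")
```

The effective renormalization scale stays within \(|\kappa _r|\le \ln 4\), while the effective factorization scale reaches twice that range.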

A way of avoiding these large ratios was constructed in Ref. [26]: first do the scale variation of Eq. (3.52), but then substitute

$$\begin{aligned} c_2 \rightarrow c_2 - (\kappa _r - \kappa _f) \beta _0 c_1 = c_2 - \left( \ln \mu _r^2/\mu _f^2\right) \beta _0 c_1 \, , \end{aligned}$$
(3.54)

where care must be taken to use the correct argument of \(\alpha _s\) in each term. Indeed, this procedure then agrees with Eq. (3.46) given that

$$\begin{aligned} \kappa _f\ {{\partial }\over {\partial t_f}} + \kappa _r\ {{\partial }\over {\partial t _r}} = \kappa _f \left( {{\partial }\over {\partial t_f}} + {{\partial }\over {\partial t _r}}\right) + (\kappa _r - \kappa _f){{\partial }\over {\partial t _r}} \, . \end{aligned}$$
(3.55)

3.4.2 Hadronic processes

Consider now the case of a hadronic process as in Eq. (3.22). For these processes, the factorization has the general form

$$\begin{aligned} {\overline{\Sigma }}(t_f, t_r, \kappa _f, \kappa _r) = {\overline{H}}(\alpha _s(t_r), \kappa _r)\otimes \left( {\overline{f}}(t_f, \kappa _f)\otimes {\overline{f}}(t_f, \kappa _f) \right) \, . \end{aligned}$$
(3.56)

The hard coefficient function will have the same expansion as Eq. (3.27). Just as for electroproduction, it is possible to factorize variations of \(\kappa _f\) into the hard coefficient functions: then

$$\begin{aligned} {\overline{\Sigma }}(t_f, t_r, \kappa _f, \kappa _r) = \widehat{{\overline{H}}}(\alpha _s(t_r), \kappa _r,\kappa _f)\otimes ({f}(t_f)\otimes {f}(t_f)), \end{aligned}$$
(3.57)

where (using as above Mellin space, to avoid the convolutions), one finds

$$\begin{aligned} \widehat{{\overline{H}}}= & {} ~\alpha _s^n(t_r) h_0 + \alpha _s^{n+1}(t_r)(h_1 - \kappa _r \ n\beta _0 h_0) \nonumber \\&- 2\alpha _s^n(t_r)\alpha _s(t_f) \kappa _f \ h_0 \gamma _0 \nonumber \\&+ \alpha _s^{n+2}(t_r)\big (h_2 - \kappa _r(n\beta _1h_0 + (n+1)\beta _0 h_1)\nonumber \\&+ {{1}\over {2}}\kappa _r^2 n(n+1) \beta _0^2 h_0\big ) \nonumber \\&- \alpha _s^{n+1}(t_r)\alpha _s(t_f)\big (\kappa _f (h_1 - \kappa _r \ n\beta _0 h_0)2 \gamma _0\big )\nonumber \\&+\alpha _s^n(t_r)\alpha _s^2(t_f)\big (- \kappa _f h_0 2\gamma _1 + {{1}\over {2}}\kappa _f^2h_02 \gamma _0 (\beta _0 + 2\gamma _0)\big )\nonumber \\&+ \cdots . \end{aligned}$$
(3.58)

However these expressions are even more cumbersome than in the case of electroproduction, thereby demonstrating the greater clarity of methods (A) or (B) in determining the dependence on the scale \(\mu _f\). By adopting one of these two methods, we can determine the MHOU in a hadronic process through independent variations of the factorization scale \(\mu _f\) and the renormalization scale \(\mu _r\) in just the same way as we estimated the MHOU in the deep inelastic structure function in the previous section.

3.5 Multiple scale variations

We finally consider simultaneous scale variation in a pair of processes: for instance the electroproduction process of Sect. 3.4.1 and a hadronic process as in Sect. 3.4.2. Clearly, the PDF is universal, but the coefficient functions are process-dependent. It follows that while the scale variations of \(\kappa _r\) in the two coefficient functions will be totally independent, the scale variation \(\kappa _f\) of the PDF will be correlated between these two processes.

The degree of this correlation is somewhat subtle: indeed, \(\kappa _f\) generates MHO terms in anomalous dimensions, but the anomalous dimension matrix has several independent eigenvalues (two singlet and one nonsinglet which at NLO and beyond further splits into C-even and C-odd). Hence in principle one should introduce an independent factorization scale variation for each of these components, which is then fully correlated across all processes. For the time being, we will perform fully correlated variations of the factorization scale. This is an approximation, which may not be accurate particularly for processes which depend on PDFs whose evolution is controlled by different anomalous dimensions (such as, say, the singlet and the isospin triplet). We will comment further on this approximation in the sequel.

Now, considering both processes together, we have three independent scales to vary, \(\mu _f\), \(\mu _{r_1}\), and \(\mu _{r_2}\), where \(\mu _{r_1}\) is the renormalization scale for the deep inelastic process, and \(\mu _{r_2}\) is the renormalization scale for the hadronic process. The relation of the factorization scale \(\mu _f\) to the physical scale of each process (whatever that is) is the same for both processes, since the PDFs are universal. Thus if we vary all scales independently by a factor two about their central value we end up with seven scale choices. We can think of the additional renormalization scale as an extra dimension in the space of possible scale variations.

By trivial generalization for p independent processes \(\pi _a\), \(a=1,\ldots ,p\), we will have \(p+1\) independent scale parameters \(\mu _f,\mu _{r_1},\ldots \mu _{r_p}\) corresponding to a total of 3+2p scale variations. Writing \(\kappa _{r_a} = \ln \mu _{r_a}^2/Q^2\) with \(a= 1,\ldots ,p\), the traditional range of variation of \(\kappa _f, \kappa _{r_1},\ldots , \kappa _{r_p}\) would then be defined by

$$\begin{aligned} |\kappa _f| \le \ln 4,\quad |\kappa _{r_a}| \le \ln 4,\quad a=1,\ldots p \, . \end{aligned}$$

Clearly all prescriptions constructed in this way will be symmetrical in the different scales.
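
The counting of \(3+2p\) scale choices can be reproduced by enumerating the central point together with the upward and downward shift of each scale in turn. This is a minimal sketch with a hypothetical function name; each point is stored as a tuple \((\kappa _f, \kappa _{r_1},\ldots ,\kappa _{r_p})\).

```python
import math

W = math.log(4.0)

def one_at_a_time_variations(p, w=W):
    """Central point plus +-w shifts of each of the p+1 scales, one at a time.

    Each point is a tuple (kappa_f, kappa_r1, ..., kappa_rp).
    """
    n_scales = p + 1
    points = [tuple([0.0] * n_scales)]
    for idx in range(n_scales):
        for sign in (+1.0, -1.0):
            point = [0.0] * n_scales
            point[idx] = sign * w
            points.append(tuple(point))
    return points

for p in (1, 2, 3):
    print(p, len(one_at_a_time_variations(p)))
```

For \(p=1\) this gives the five choices of the 5-point variation, and for \(p=2\) the seven choices quoted above.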

We now see why, for the determination of MHOUs in PDFs, it is advantageous to work with the independent scales \(\kappa _f\), \(\kappa _{r_a}\), \(a = 1,\ldots ,p\) rather than with the traditional factorization scales \({{\tilde{\kappa }}}\) used in the older treatments of scale variation: while the scale \(\kappa _f\) used to estimate MHOUs in the PDF evolution is universal, the scales \(\kappa _{r_a}\) used to estimate MHOUs in the hard cross-sections are instead process-dependent. We can therefore only define process scales \({{\tilde{\kappa }}}\) by either introducing artificial correlations between the scales of the hard cross-sections for different processes (which would result in underestimated MHOU in the hard cross-sections), or else by sacrificing universality of the PDFs, with uncorrelated evolution uncertainties for different processes (which would result in overestimated MHOU from PDF evolution). Neither of these options is very satisfactory, though we consider the latter briefly in Sect. 4.3 below, where it gives rise to asymmetric scale-variation prescriptions.

4 Scale variation prescriptions for the theory covariance matrix

Having set out a general formalism for the inclusion of MHOUs through a theory covariance matrix, based on assuming a distribution of shifts between a theory calculation at finite perturbative order and the true all–order value (Sect. 2), and having discussed how scale variation can be used to produce estimates for such shifts (Sect. 3), we now provide an explicit prescription for the construction of a theory covariance matrix from scale variation. Because of the intrinsic arbitrariness involved in the procedure, we actually propose several alternative prescriptions, which will be then validated in the next section by studying cases in which the next perturbative order is in fact known. We will also assess the impact at the PDF fit level of varying the prescription used for constructing the theory covariance matrix.

We consider a situation in which we have p different types of processes \(\pi _a =\{i_a\}\), where \(i_a\) labels the data points belonging to the a-th process and \(a = 1,\ldots ,p\). Each of the p processes is characterized by a factorization scale \(\mu _f\) (associated with the PDFs) and a renormalization scale \(\mu _{r_a}\) (associated with the hard coefficient functions), to be understood in the sense of the ‘modern’ terminology in Table 1. We will perform scale variation of both scales following Sect. 3.4, by taking them as independent, as discussed in that section. When considering a pair of different processes, as explained in Sect. 3.5, we assume the variations of \(\mu _{r_a}\) to be uncorrelated among them, while those of \(\mu _f\) are taken to be fully correlated.

The theory covariance matrix is then constructed by averaging outer products of the shifts with respect to the central scales, given for the a-th process as

$$\begin{aligned} \Delta _{i_a} (\kappa _f, \kappa _{r_a} ) \equiv T_{i_a}(\kappa _f, \kappa _{r_a}) - T_{i_a}(0,0) \, , \end{aligned}$$
(4.1)

over points in the space of scales. Here, as before, we have defined \(\kappa _{r_a}=\ln k_{r_a} = \ln \mu _{r_a}^2/Q^2\) and \(\kappa _{f}=\ln k_{f} = \ln \mu _{f}^2/Q^2\). In Eq. (4.1), \(T_{i_a}(\kappa _f, \kappa _{r_a})\) indicates the theoretical prediction evaluated at these scales with \(T_{i_a}(0,0)\) being the central theory prediction, and the index \(i_a\) running over all data points corresponding to process a.

We assume here that all scale variations correspond to the same range

$$\begin{aligned} |\kappa _f| \le w,\quad |\kappa _{r_a}| \le w,\quad a=1,\ldots , p, \end{aligned}$$

for some w (typically \(w = \ln 4\), as in Eq. (3.5)). In practice, in each prescription the three points \(\kappa = 0, \pm w\) are sampled for each scale. The theory covariance matrix is then

$$\begin{aligned} S_{ij} = N_m \sum _{V_m} \Delta _{i_a} (\kappa _f, \kappa _{r_a}) \Delta _{i_b} (\kappa _f, \kappa _{r_b}) \end{aligned}$$
(4.2)

where \(i_a\ \in \ \pi _a\) and \(i_b\ \in \ \pi _b\) indicate two data points, possibly corresponding to different processes \(\pi _a\) and \(\pi _b\), m labels the prescription, \(V_m\) is the set of scale points to be summed over in the given prescription, and \(N_m\) is a normalization factor, both to be determined. Different prescriptions to construct the theory covariance matrix \(S_{ij}\) vary in the set of combinations of scales which are summed over in Eq. (4.2), as we will discuss below.

Because Eq. (4.2) is a sum of outer products of shifts, the theory covariance matrix \(S_{ij}\) is positive semi-definite by construction. To demonstrate this, consider a real vector \(v_i\): then it follows that

$$\begin{aligned} \sum _{ij} v_iS_{ij}v_j = N_m \sum _{V_m} \left( \sum _i v_i\Delta _{i}\right) ^2 \ge 0. \end{aligned}$$
(4.3)

Note however that because the number of elements of \(V_m\) is finite, and in practice much smaller than the number of data points, \(S_{ij}\) will generally be singular, since for any vector \(z_j\) which is orthogonal to the space S spanned by the set of vectors \(\{\Delta _{i_a}(\kappa _f, \kappa _{r_a}): \kappa _f, \kappa _{r_a} \in V_m\}\), \(S_{ij}z_j=0\). This property will be important when we come to validate the covariance matrix in the following section, by constructing the set of orthonormal eigenvectors \(e_i^\alpha \) which span the space S.
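
Both properties can be seen in a small toy example. The sketch below is purely illustrative: the shift vectors and the normalization are made-up numbers, and the example is restricted to three data points so that a vector orthogonal to the two shifts can be built with a cross product.

```python
# Toy example: 3 data points and 2 non-central scale points (made-up shifts).
shifts = [
    [0.10, -0.20, 0.05],
    [-0.08, 0.15, 0.02],
]
NORM = 0.5  # placeholder for the normalization N_m

def theory_covmat(shifts, norm):
    """S_ij = norm * sum over scale points of Delta_i * Delta_j."""
    n = len(shifts[0])
    return [[norm * sum(d[i] * d[j] for d in shifts) for j in range(n)]
            for i in range(n)]

def quadratic_form(S, v):
    """Return v^T S v."""
    n = len(v)
    return sum(v[i] * S[i][j] * v[j] for i in range(n) for j in range(n))

def mat_vec(S, z):
    """Return the vector S z."""
    return [sum(row[j] * z[j] for j in range(len(z))) for row in S]

def cross(a, b):
    """Vector orthogonal to both a and b (three dimensions only)."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

S = theory_covmat(shifts, NORM)
v = [1.0, -2.0, 0.5]                 # arbitrary real vector
z = cross(shifts[0], shifts[1])      # orthogonal to the space spanned by the shifts
print("v^T S v =", quadratic_form(S, v))  # non-negative, as in Eq. (4.3)
print("S z     =", mat_vec(S, z))         # vanishes up to rounding
```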

It is interesting to note that the diagonalization of \({\widehat{S}}_{ij}\) can be rephrased in terms of nuisance parameters of the systematic uncertainties associated with the MHOU. For example, following the notation of Appendix A.2 of Ref. [27], the absolute correlated uncertainties \(\beta _{i,\alpha }\) may be expressed in terms of the eigenvalues and eigenvectors of the normalised covariance matrix \({{{\widehat{S}}}}_{ij} = \sum _{\alpha =1}^{N_{\mathrm{sub}}} (s^\alpha )^2 e_i^\alpha e_j^{\alpha }\) as

$$\begin{aligned} \beta _{i,\alpha } = T_i^{\mathrm{NLO}} s^{\alpha } e_i^\alpha , \end{aligned}$$
(4.4)

for \(\alpha =1,\ldots ,N_{\mathrm{sub}}\). An algorithm for constructing the eigenvectors \(e_i^\alpha \) from the shifts induced by scale variation is given in Appendix A. This way of looking at the theory covariance matrix might be useful in that the nuisance parameters can be interpreted in terms of missing higher order contributions. For instance, the values of the nuisance parameters which optimize the agreement between data and theory are the most likely guess for the MHO terms favored by the data, everything else being equal.

We now consider various prescriptions. Because \(S_{ij}\) will in general span the full set of data points, we must consider both the case in which points \(i,\>j\) in Eq. (4.2) belong to the same process (“single process prescription”) and the case in which they belong to two different processes (“multiple process prescription”). We first discuss the case of symmetric scale variation, in which the two scales are varied independently, and then the case in which the two scales are varied in a correlated way, the latter scenario being equivalent to varying the “scale of the process” (in the sense of Table 1), thereby leading to asymmetric prescriptions as already mentioned in Sect. 3.5.

4.1 Symmetric prescriptions for individual processes

We consider first the prescriptions for when there is just a single process, that is, \(p=1\). In this case, there are at most two independent scales, the factorization and renormalization scales \(\kappa _f\) and \(\kappa _r\). The theory covariance matrix is then constructed as

$$\begin{aligned} S_{ij} = n_{m} \sum _{v_{m}} \Delta _{i} (\kappa _f, \kappa _r)\Delta _{j} (\kappa _f, \kappa _r)\, , \end{aligned}$$
(4.5)

where again \(v_m\) represents the set of points to be summed over in the given prescription, limited here to points in the space of the two scales \(\kappa _f\) and \(\kappa _r\), and \(n_m\) is the normalization factor. Let s be the number of independent scales being varied (so \(s=1\) or \(s=2\)), and m be the number of points in the variation (so m is the number of elements of \(v_m\)): a given scheme is then usually described as an ‘\((m+1)\)-point scheme’. Note that we do not include in \(v_m\) trivial points for which \(\Delta _{i}\) vanishes (which in practice means the single point \(\kappa _f=\kappa _r=0\)), since these do not contribute to the sum.

The normalization factor \(n_m\) in Eq. (4.5) is determined by averaging over the number of points associated with the variation of each scale, and adding the contributions from variation of independent scales. This means that

$$\begin{aligned} n_m = s/m. \end{aligned}$$
(4.6)

Fig. 2 Symmetric prescriptions for a single process, indicating the sampled values for the factorization scale \(\kappa _f\) and renormalization scale \(\kappa _r\) in each case. The origin of coordinates corresponds to the central scales \(\kappa _f=\kappa _r= 0\). We show the three prescriptions 5-point (left), \(\bar{5}\)-point (center) and 9-point (right)

We consider three different prescriptions, represented schematically in Fig. 2.

  • 5-point  We vary \(\kappa _f\) keeping \(\kappa _r = 0\) and vice versa, so \(v_4 = \{(\pm ;0), (0; \pm ) \}\), where the pairs denote the values of the two independent scales \((\kappa _f; \kappa _r)\). Then \(s=2\), \(m=4\), and the normalisation is \(n_4 = 1/2\). This definition implies that we can average over the two nontrivial values of each scale in turn, and add the results:

    $$\begin{aligned} S^{\mathrm{(5pt)}}_{ij}\! = \!{{1}\over {2}}\Big \{ \Delta _i^{+0}\Delta _j^{+0} + \Delta _i^{-0} \Delta _j^{-0} + \Delta _i^{0+} \Delta _j^{0+} + \Delta _i^{0-} \Delta _j^{0-} \Big \} \, , \end{aligned}$$
    (4.7)

    where we have adopted the abbreviated notation \(\Delta _i^{+0}=\Delta _i(+w,0)\), \(\Delta _i^{0-}=\Delta _i(0,-w)\), etc. for the shifts.

    Note that the variations of \(\kappa _f\) and \(\kappa _r\) are added in quadrature since they are independent: this is why it is important to make sure that the variations are indeed independent, as is the case for renormalization and factorization scales, as discussed in Sect. 3.4.

  • \({\overline{5}}\)-point  This is an alternative 5-point prescription, which is basically the complement of 5-point: \({\overline{v}}_4 = \{(\pm ; \pm ) \}\), where \((\pm ;\pm )\) are assumed uncorrelated, i.e. 4 independent points.

    The counting is the same as for 5-point: \(s=2\), \(m=4\) and again \({\overline{n}}_4 = 1/2\):

    $$\begin{aligned} S^{(\mathrm {\overline{5}}pt)}_{ij}&= {{1}\over {2}}\Big \{ \Delta _i^{++}\Delta _j^{++} + \Delta _i^{--}\Delta _j^{--} \nonumber \\&\quad + \Delta _i^{+-}\Delta _j^{+-} + \Delta _i^{-+} \Delta _j^{-+}\Big \} \, . \end{aligned}$$
    (4.8)

    As before, the two scales are varied in a manifestly independent way.

  • 9-point  Here we vary \(\kappa _f\) and \(\kappa _r\) completely independently, giving the union of the 5-point and \({\overline{5}}\)-point prescriptions: \(v_8=v_4\oplus {\overline{v}}_4\). Now \(s=2\), \(m=8\) and \(n_8=1/4\), and the theory covariance matrix is given by

    $$\begin{aligned} S^{(\mathrm 9pt)}_{ij}&= {{1}\over {4}}\Big \{ \Delta _i^{+0} \Delta _j^{+0} + \Delta _i^{-0}\Delta _j^{-0} + \Delta _i^{0+} \Delta _j^{0+} + \Delta _i^{0-}\Delta _j^{0-} \nonumber \\&\quad + \Delta _i^{++}\Delta _j^{++} + \Delta _i^{+-} \Delta _j^{+-} + \Delta _i^{-+}\Delta _j^{-+} \nonumber \\&\quad + \Delta _i^{--} \Delta _j^{--} \Big \}. \end{aligned}$$
    (4.9)
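
The three single-process prescriptions can be sketched in a few lines. This is an illustration only: `toy_prediction` is a made-up stand-in for the theory predictions \(T_i(\kappa _f,\kappa _r)\) of two data points, the dictionary keys are hypothetical labels, and the normalization is taken from Eq. (4.6).

```python
import math

W = math.log(4.0)

def toy_prediction(kf, kr):
    """Made-up predictions for two data points as a function of the scales."""
    return [1.0 + 0.05 * kf - 0.02 * kr + 0.01 * kf * kr,
            2.0 - 0.03 * kf + 0.04 * kr]

def shift(kf, kr):
    """Delta_i(kappa_f, kappa_r) = T_i(kappa_f, kappa_r) - T_i(0, 0), Eq. (4.1)."""
    central = toy_prediction(0.0, 0.0)
    return [t - c for t, c in zip(toy_prediction(kf, kr), central)]

# Non-central points (kappa_f, kappa_r) in units of w for each prescription.
POINTS = {
    "5pt": [(+1, 0), (-1, 0), (0, +1), (0, -1)],
    "5barpt": [(+1, +1), (-1, -1), (+1, -1), (-1, +1)],
}
POINTS["9pt"] = POINTS["5pt"] + POINTS["5barpt"]

def covmat(prescription, w=W, n_scales=2):
    """Theory covariance matrix of Eq. (4.5) with n_m = s/m from Eq. (4.6)."""
    pts = POINTS[prescription]
    norm = n_scales / len(pts)
    deltas = [shift(a * w, b * w) for a, b in pts]
    n = len(deltas[0])
    return [[norm * sum(d[i] * d[j] for d in deltas) for j in range(n)]
            for i in range(n)]

for name in ("5pt", "5barpt", "9pt"):
    print(name, covmat(name))
```

With these normalizations the 9-point matrix is the average of the 5-point and \({\overline{5}}\)-point matrices, as follows from comparing Eq. (4.9) with Eqs. (4.7) and (4.8).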

4.2 Symmetric prescriptions for multiple processes

Now we consider multiple processes, i.e. \(p>1\), with scale variations either uncorrelated or partially correlated. In Eq. (4.2), the set \(V_m\) now involves possible variations of the \(p+1\) scales \(\kappa _f\), \(\kappa _{r_1},\ldots \kappa _{r_p}\), where \(\kappa _{r_a}\) indicates the renormalization scale for process \(a=1,\ldots ,p\). This implies that now \(V_m\) is a much bigger set than \(v_m\). However any given element of \(S_{ij}\) in Eq. (4.2) can involve at most two different processes, \(\pi _a\) and \(\pi _b\), so to compute this element we can simply ignore the other processes. Consequently, it is sufficient to consider \(p=2\), since generalization to \(p>2\) will then be straightforward.

Fig. 3 Same as Fig. 2, now for the case of two different processes \(\pi _1\) and \(\pi _2\) with a common factorization scale \(\kappa _f\) and different renormalization scales \(\kappa _{r_1}\) and \(\kappa _{r_2}\), so the diagrams are now in three dimensions. The origin of coordinates is associated to the central scale, \(\kappa _f=\kappa _{r_1}=\kappa _{r_2}=0\). We again show the three prescriptions 5-point (left), \(\bar{5}\)-point (center) and 9-point (right)

For a given pair of processes, say \(\pi _1\) and \( \pi _2\), the covariance matrix has diagonal elements \(S_{i_1j_1}, S_{i_2j_2}\) and off-diagonals \(S_{i_1j_2} = S_{j_2i_1}\), where as above the extra subscript indicates the process: \(i_1,j_1\in \pi _1\), \(i_2,j_2\in \pi _2\). Thus one can write

$$\begin{aligned} S_{ij} = \left( \begin{array}{cc} S_{i_1j_1}&{}S_{i_1j_2}\\ S_{i_2j_1}&{}S_{i_2j_2}\end{array}\right) \, . \end{aligned}$$
(4.10)

Consider first the diagonal blocks \(S_{i_1j_1}\) and \(S_{i_2j_2}\). Adding process \(\pi _2\) cannot change the theoretical uncertainty in process \(\pi _1\), although the two uncertainties may be correlated. Consequently \(S_{i_1j_1}\) and \(S_{i_2j_2}\) are each given by the same expression as in the single-process case, Eq. (4.5), so we must have

$$\begin{aligned} S_{i_1j_1}= & {} N_m \sum _{V_m} \Delta _{i_1} (\kappa _f, \kappa _{r_1}) \Delta _{j_1} (\kappa _f, \kappa _{r_1})\nonumber \\= & {} n_{m} \sum _{v_{m}} \Delta _{i_1} (\kappa _f, \kappa _{r_1})\Delta _{j_1} (\kappa _f, \kappa _{r_1}) \, . \end{aligned}$$
(4.11)

This can only be true if the set of points \(v_m\) in Eq. (4.5) is a subset of the set \(V_m\) in Eq. (4.2): so when for example computing \(S_{i_1j_1}\), \(\Delta _{i_1}\) and \(\Delta _{j_1}\) depend only on the scales \(\kappa _f\) and \(\kappa _{r_1}\) associated with \(\pi _1\), and are independent of the scale \(\kappa _{r_2}\) associated with \(\pi _2\). Consequently, when we sum over \(V_m\) in Eq. (4.2), performing the trivial sum over \(\kappa _{r_2}\) must reduce \(V_m\) to its subset \(v_m\), up to a degeneracy factor \(d_m\) which counts the number of copies of elements of \(v_m\) contained in \(V_m\). This fixes the overall normalization factor \(N_m\):

$$\begin{aligned} N_m = n_m/d_m \, . \end{aligned}$$
(4.12)

It remains to determine \(V_m\) for a given \((m+1)\)-point prescription. It is easy to see that in each case we obtain a unique result, which is in a sense a direct product of p copies of \(v_m\), taking into account the common scale \(\kappa _f\). The points in the \((\kappa _f,\kappa _{r_1},\kappa _{r_2})\) space that are being sampled in each prescription when there are two processes are shown in Fig. 3 (corresponding to the single-process prescriptions shown in Fig. 2).

To show how this works, we consider each prescription in turn, starting with the \({\overline{5}}\)-point prescription which is easier to construct than 5-point.

  • \({\overline{5}}\)-point  For two processes, \(\pi _1\) and \(\pi _2\) say, we now have three scales, \(\kappa _f, \kappa _{r_1}, \kappa _{r_2}\) which can each be varied independently. For the \({\overline{5}}\)-point prescription we only consider variations in which none of the scales is at the central value: \({\overline{v}}_4 = \{(\pm ;\pm )\}\), where the ± variations are performed independently. It follows that \({\overline{V}}_4 = \{(\pm ;\pm ,\pm )\}\), where the triples denote the three independent scales, \((\kappa _f; \kappa _{r_1},\kappa _{r_2})\), varied independently.

    The set \({\overline{V}}_4\) thus has eight points in total. For each element of \({\overline{v}}_4\), there are two elements of \({\overline{V}}_4\), so \({\overline{d}}_4=2\), and since \({\overline{n}}_4=1/2\), \({\overline{N}}_4=1/4\). The result for the off-diagonal blocks of the theory covariance matrix in this prescription is thus given by

    $$\begin{aligned} S^{(\mathrm {\overline{5}}pt)}_{i_1j_2}&= {{1}\over {4}}\left\{ \left( \Delta _{i_1}^{++} + \Delta _{i_1}^{+-}\right) \left( \Delta _{j_2}^{++} + \Delta _{j_2}^{+-} \right) \right. \nonumber \\&\quad \left. + \left( \Delta _{i_1}^{-+} + \Delta _{i_1}^{--}\right) \left( \Delta _{j_2}^{-+} + \Delta _{j_2}^{--} \right) \right\} . \end{aligned}$$
    (4.13)

    From this expression it is clear that while the scale \(\kappa _f\) is varied coherently between the two processes, the scales \(\kappa _{r_1}\) and \(\kappa _{r_2}\) are varied incoherently, as required.

    It is straightforward to generalize this procedure to three processes: then \({\overline{V}}_4 = \{(\pm ;\pm ,\pm ,\pm )\}\), so \({\overline{d}}_4=4\), and since \({\overline{n}}_4=1/2\), \({\overline{N}}_4=1/8\). However, Eq. (4.13) remains unchanged, in the sense that it can be used to evaluate all three off-diagonal blocks \(S_{i_1j_2}\), \(S_{i_2j_3}\), \(S_{i_3j_1}\): this must always be the case, since each term in the sum Eq. (4.2) involves at most three scales. For p processes, it is easy to see that the number of distinct elements of \({\overline{V}}_4\) is \(2^{p+1}\).

  • 5-point  Again for two processes we have three scales, but now one varies each holding the other fixed to its central value: \(v_4 = \{(\pm ;0),(0;\pm )\}\), so \(V_4 = \{2(\pm ;0,0), (0;\pm ,\pm )\}\), where the two in front of the first element indicates that there are two copies of it, so \(V_4\) has eight elements in total. Then for each element of \(v_4\), there are precisely two elements of \(V_4\), so \(d_4=2\), and since \(n_4=1/2\), \(N_4=1/4\).

    The result for the off-diagonal entries of the theory covariance matrix in this prescription is thus given by

    $$\begin{aligned} S^{(\mathrm 5pt)}_{i_1j_2}= & {} {{1}\over {4}}\left\{ 2\Delta _{i_1}^{+0}\Delta _{j_2}^{+0} + 2\Delta _{i_1}^{-0}\Delta _{j_2}^{-0} \right. \nonumber \\&\left. + \left( \Delta _{i_1}^{0+} + \Delta _{i_1}^{0-} \right) \left( \Delta _{j_2}^{0+} + \Delta _{j_2}^{0-}\right) \right\} . \end{aligned}$$
    (4.14)

    Note that also in this expression the variations of \(\kappa _f\) are manifestly correlated between the two processes, whereas the variations of \(\kappa _{r_1}, \kappa _{r_2}\) are not.

    When there are three processes, it is easy to see that \(V_4 = \{4(\pm ;0,0,0), (0;\pm ,\pm ,\pm )\}\), i.e. it has 16 elements, though only 10 are distinct: the other six are simply copies, necessary to obtain the correct coefficients in Eq. (4.7) and Eq. (4.14). There are now four elements of \(V_4\) for each element of \(v_4\), so now \(d_4=4\), and \(N_4=1/8\). Again Eq. (4.14) can be used to calculate all three off-diagonal blocks. For p processes, it is easy to see that \(V_4\) has \(2^{p+1}\) elements, but that only \(2+2^p\) of these are actually distinct.

  • 9-point  Here we vary the three scales completely independently: \(v_8 = v_4 \oplus {\overline{v}}_4\). Constructing \(V_{8}\) is now somewhat more involved, since while terms with \(\kappa _f=0\) have degeneracy 2, terms where \(\kappa _f\) is varied have degeneracy 3, so we need three copies of the former and two of the latter to take the overall degeneracy to 6. The solution is thus \(V_8 = \{3(0;\pm ,\pm ), 2(\pm ;\star ,\star )\}\), where \(\star \) means either \(+\), − or 0. Thus \(V_8\) has 48 elements, of which only 22 are actually distinct. Since the first term of \(V_8\) has a degeneracy of 2, while the last has a degeneracy of 3, the overall degeneracy is \(d_8=6\), and since \(n_8=1/4\), \(N_8=1/24\). It follows that the off-diagonal blocks of the theory covariance matrix in this prescription are

    $$\begin{aligned} S^{(\mathrm 9pt)}_{i_1j_2}&= {{1}\over {24}}\left\{ 2\left( \Delta _{i_1}^{+0}+\Delta _{i_1}^{++} + \Delta _{i_1}^{+-}\right) \right. \nonumber \\&\quad \times \left( \Delta _{j_2}^{+0} + \Delta _{j_2}^{++} + \Delta _{j_2}^{+-} \right) \nonumber \\&\quad + 2\big (\Delta _{i_1}^{-0} + \Delta _{i_1}^{-+} + \Delta _{i_1}^{--}\big )\big (\Delta _{j_2}^{-0} + \Delta _{j_2}^{-+} + \Delta _{j_2}^{--} \big )\nonumber \\&\quad \left. + 3\left( \Delta _{i_1}^{0+}+ \Delta _{i_1}^{0-}\right) \left( \Delta _{j_2}^{0+} + \Delta _{j_2}^{0-} \right) \right\} . \end{aligned}$$
    (4.15)

    The pattern of correlations in the variation of the three scales in this expression should be clear from the way it is written.

    When there are three processes, \(V_8 = \{9(0;\pm ,\pm ,\pm ), 4(\pm ;\star ,\star ,\star )\}\), with \(\star \) standing for any of \(+\), − or 0, whence \(d_8=36\), and since \(n_8=1/4\), \(N_8=1/144\). Again, Eq. (4.15) can be used to calculate all three off-diagonal blocks. \(V_8\) now has 288 elements, of which 62 are distinct. For p processes, there are \(2^p+2\cdot 3^p\) distinct elements.
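The counting in the three symmetric prescriptions can be checked by direct enumeration. The following Python sketch is illustrative only and is not part of the original analysis: the function names are hypothetical, and the copy counts \(2^{p-1}\) and \(3^{p-1}\) used for general p are an extrapolation of the two- and three-process cases quoted above. It builds \({\overline{V}}_4\), \(V_4\) and \(V_8\) as multisets, reads off \(d_m\) by projecting onto \((\kappa _f;\kappa _{r_1})\), and compares the brute-force sum over \(V_8\) with the factorized form of Eq. (4.15) for arbitrary integer shifts.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def V5bar(p):
    """Points (kf; kr_1..kr_p) with no scale at its central value."""
    return Counter(product("+-", repeat=p + 1))

def V5(p):
    """One scale varied at a time; 2**(p-1) copies of (+-;0,..,0) assumed."""
    pts = Counter()
    for kf in "+-":
        pts[(kf,) + ("0",) * p] += 2 ** (p - 1)
    for krs in product("+-", repeat=p):
        pts[("0",) + krs] += 1
    return pts

def V9(p):
    """All scales varied; copy counts 3**(p-1) and 2**(p-1) are assumed."""
    pts = Counter()
    for krs in product("+-", repeat=p):
        pts[("0",) + krs] += 3 ** (p - 1)
    for kf in "+-":
        for krs in product("+-0", repeat=p):
            pts[(kf,) + krs] += 2 ** (p - 1)
    return pts

def degeneracy(V):
    """Number of copies of each single-process point (kf; kr_1) inside V."""
    proj = Counter()
    for pt, mult in V.items():
        proj[(pt[0], pt[1])] += mult
    counts = set(proj.values())
    assert len(counts) == 1, "degeneracy must be the same for every point"
    return counts.pop()

def offdiag(V, N, d1, d2):
    """N * sum over V of Delta_{i1}(kf, kr1) * Delta_{j2}(kf, kr2), for p = 2."""
    return N * sum(m * d1[kf, r1] * d2[kf, r2] for (kf, r1, r2), m in V.items())

# n_m = s/m: 1/2 for the two 5-point prescriptions, 1/4 for 9-point.
# Columns: prescription, p, elements, distinct elements, d_m, N_m.
for name, build, n in [("5bar", V5bar, Fraction(1, 2)),
                       ("5", V5, Fraction(1, 2)),
                       ("9", V9, Fraction(1, 4))]:
    for p in (2, 3):
        V = build(p)
        d = degeneracy(V)
        print(name, p, sum(V.values()), len(V), d, n / d)

# Arbitrary integer shifts Delta^{kf kr} for one data point of each process.
keys = [k for k in product("+-0", repeat=2) if k != ("0", "0")]
d1 = {k: i + 1 for i, k in enumerate(keys)}
d2 = {k: 3 * i - 5 for i, k in enumerate(keys)}
brute = offdiag(V9(2), Fraction(1, 24), d1, d2)
closed = Fraction(1, 24) * (
    2 * sum(d1["+", r] for r in "0+-") * sum(d2["+", r] for r in "0+-")
    + 2 * sum(d1["-", r] for r in "0+-") * sum(d2["-", r] for r in "0+-")
    + 3 * (d1["0", "+"] + d1["0", "-"]) * (d2["0", "+"] + d2["0", "-"]))
print(brute == closed)
```

For p = 2 and p = 3 the enumeration reproduces the element counts, degeneracies and normalizations quoted in the text, and the brute-force sum agrees with Eq. (4.15).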

4.3 Asymmetric prescriptions

It is sometimes argued that since only the cross-section is actually physical, a single process has only one scale, namely the scale of the process in the sense of Table 1 and Eq. (3.49). Therefore, in order to estimate the MHOUs, only this single scale should be varied. Alternatively, one may consider the variation of the scale of the process on top of the variation of the renormalization and factorization scales considered previously.

The logic of the first alternative (variation of the scale of the process only) is that after all there is only one scale in the factorised expressions, for example those given by the Wilson expansion applied to DIS. The logic of the second alternative (variation of the scale of the process, the renormalization scale, and the factorization scale) is that each of these estimates a different source of MHOU: varying the scale of the process generates terms related to missing higher order contributions to the hard cross-section which are proportional to collinear logarithms, the renormalization scales to missing higher order contributions to the hard cross-section which are proportional to the beta function, and finally the factorization scale to missing higher order contributions to the anomalous dimension.

On the other hand, both alternatives might be criticized on the grounds that they suppress correlations between the uncertainties in PDF evolution across different processes, and thus seriously overestimate uncertainties (the first worse than the second). Ultimately, however, they can be considered as possible options to be tested in a situation in which the true answer is known. Such a validation will be performed in the next section.

Fig. 4

Same as Fig. 2, now in the case of the asymmetric prescriptions for a single process with factorization scale \(\kappa _f\) and renormalization scale \(\kappa _r\). We display the 3-point (left) and 7-point (right) prescriptions, defined in the text

We now consider these two options in turn, both for the single-process case, which is represented schematically in Fig. 4, and for multiple processes.

  • 3-point  For a single process, we set \(\kappa _f = \kappa _r\) and only vary the single resulting scale. Then \(v_2 = \{\pm \}\) in an obvious notation, and \(s =1\), \(m=2\) and \(n_2 = 1/2\), i.e. we simply average over the two nontrivial values of the single scale. For a single process we thus find that

    $$\begin{aligned} S^{(\mathrm 3pt)}_{ij} = {{1}\over {2}}\left\{ \Delta _i^{++}\Delta _j^{++} + \Delta _i^{--}\Delta _j^{--}\right\} , \end{aligned}$$
    (4.16)

    whenever \(i,j\in \pi \).

    Likewise, for two different processes \(\pi _1\) and \(\pi _2\), we set \(\kappa _f = \kappa _{r_1}\) for \(\pi _1\), set \(\kappa _f = \kappa _{r_2}\) for \(\pi _2\), and then vary \(\kappa _{r_1}\) and \(\kappa _{r_2}\) independently. This procedure necessarily ignores the correlations in the variation of \(\kappa _f\) between \(\pi _1\) and \(\pi _2\). Since \(v_2 = \{\pm \}\), \(V_2 = v_2 \otimes v_2 = \{(\pm , \pm )\}\), where the ordered pairs denote the two independent scales \((\kappa _{r_1},\kappa _{r_2})\). Clearly, for each element of \(v_2\) there are two elements of \(V_2\), so \(d_2 =2\), Eq. (4.12) gives \(N_2 = 1/4\), and the off-diagonal entries of the theory covariance matrix are

    $$\begin{aligned} S^{(\mathrm 3pt)}_{i_1j_2} = {{1}\over {4}}\left\{ \left( \Delta _{i_1}^{++} + \Delta _{i_1}^{--} \right) \left( \Delta _{j_2}^{++} + \Delta _{j_2}^{--} \right) \right\} . \end{aligned}$$
    (4.17)

    It can be seen from this factorised expression that the variations for each process are entirely uncorrelated. Generalization to more than two processes is straightforward: for p processes \(V_2\) has \(2^p\) elements, all of them distinct.

    Because in this prescription we ignore correlations in the PDF evolution uncertainties, we expect this prescription to significantly overestimate the MHOUs. Note that a fully correlated 3-point prescription in which we set \(\kappa _f = \kappa _{r_1} = \kappa _{r_2}\) would instead significantly underestimate the MHOUs, which is why we do not consider it.

  • 7-point  We now combine the variation of the scale of the process with the variation of renormalization and factorization scales. As we saw in Sect. 3.4, a change in the scale of the process is generated by \({{\tilde{\kappa }}} (\partial _{t_r} + \partial _{t_f})\), so it moves diagonally in the \((\kappa _f,\kappa _{r})\) plane. Thus for a single process, varying the scale of the process just corresponds to a new point-prescription, symmetric only about the line \(\kappa _f=\kappa _r\), but asymmetric about the \(\kappa _f\) and \(\kappa _r\) axes. However, because variations of the scale of the process are assumed uncorrelated across different processes, while \(\mu _f\) variations are correlated, such a scheme can give reduced correlations when there are several processes.

    For a single process, variation of the scale of the process just gives two extra points \((+;+),(-;-)\) (in the same notation as before, i.e. variations in the \(\kappa _f=\kappa _r\) plane), so \(v_4 =\{(\pm ;0),(0;\pm )\}\) becomes \(v_6 = \{(\pm ;0),(0;\pm ),(+;+),(-;-)\} =\{(\pm ;0),(0;\pm ),(\overline{\pm ;\pm })\}\), where \((\overline{\pm ;\pm })\) simply means that the variation is fully correlated (so there are only 2 terms, not 4).

    We then have \(v_6 = \{(\pm ;0),(0;\pm ),(\overline{\pm ;\pm })\}\), \(s=2\) (note we still have only two independent scales), \(m=6\) and \(n_6 = 1/3\), and thus for a single process

    $$\begin{aligned} S^{(\mathrm 7pt)}_{ij}= & {} {{1}\over {3}}\Big \{ \Delta _i^{+0}\Delta _j^{+0} + \Delta _i^{-0}\Delta _j^{-0} + \Delta _i^{0+}\Delta _j^{0+} \nonumber \\&+ \Delta _i^{0-}\Delta _j^{0-} + \Delta _i^{++}\Delta _j^{++} + \Delta _i^{--}\Delta _j^{--} \Big \} \, . \end{aligned}$$
    (4.18)

    When there is more than one process, we have to remember that variations of the scale of the process are uncorrelated between different processes, so they can decorrelate the allowed variations of \(\mu _f\). This means the allowed variations for two processes are in a space of four dimensions rather than three: call these say (\(\kappa _{f_1},\kappa _{r_1};\kappa _{f_2},\kappa _{r_2})\). The extension of \(v_6\) is then \(V_6=\{2(+,0;+,0),2(-,0;-,0),(0,\pm ;0,\pm ),(\overline{\pm ,\pm };\overline{\pm ,\pm })\}\), where \((\overline{\pm ,\pm };\overline{\pm ,\pm })=\{(+,+;+,+),(+,+;-,-),(-,-;+,+),(-,-;-,-)\}\), and thus \(d_6=2\), so \(N_6=1/6\), and the off-diagonal theory covariance matrix reads

    $$\begin{aligned} \begin{aligned} S^{(\mathrm 7pt)}_{i_1j_2} =&{{1}\over {6}}\left\{ 2\Delta _{i_1}^{+0} \Delta _{j_2}^{+0} + 2\Delta _{i_1}^{-0} \Delta _{j_2}^{-0} \right. \\&+ \left( \Delta _{i_1}^{0+}+\Delta _{i_1}^{0-}\right) \left( \Delta _{j_2}^{0+} + \Delta _{j_2}^{0-} \right) \\&\left. +\left( \Delta _{i_1}^{++}+\Delta _{i_1}^{--}\right) \left( \Delta _{j_2}^{++} + \Delta _{j_2}^{--} \right) \right\} \, . \end{aligned} \end{aligned}$$
    (4.19)

    This prescription gives smaller correlations than the symmetric prescriptions since the variation of the two factorization scales \(\mu _{f_1}\) and \(\mu _{f_2}\) is now entirely uncorrelated.

    Generalization to p processes is again straightforward: since the variations of the scale of the process are in effect independent of the separate variations of \(\mu _f\) and \(\mu _r\), \(V_6=V_4\oplus V_2\) for any number of processes, so there are in total \(2+2^{p+1}\) distinct elements.
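The two-process point sets of the asymmetric prescriptions can be verified in the same spirit. The sketch below is an illustration under stated assumptions rather than the implementation used for the results: variable names and the integer shifts are hypothetical. It encodes \(V_2\) and \(V_6\) as listed above, extracts \(d_2\) and \(d_6\) by projecting onto the scales of \(\pi _1\), and checks that the sums over \(V_2\) and \(V_6\) reproduce Eqs. (4.17) and (4.19).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# 3-point: one varied scale per process, pairs (kr_1, kr_2).
V2 = Counter(product("+-", repeat=2))

# 7-point: points (kf_1, kr_1; kf_2, kr_2), following V_6 in the text.
V6 = Counter()
for kf in "+-":
    V6[(kf, "0", kf, "0")] += 2      # correlated kf variation, two copies
for r1, r2 in product("+-", repeat=2):
    V6[("0", r1, "0", r2)] += 1      # independent kr variations
for a, b in product("+-", repeat=2):
    V6[(a, a, b, b)] += 1            # independent process-scale variations

def degeneracy(V, width):
    """Copies of each process-1 point (first `width` entries) inside V."""
    proj = Counter()
    for pt, mult in V.items():
        proj[pt[:width]] += mult
    counts = set(proj.values())
    assert len(counts) == 1, "degeneracy must be the same for every point"
    return counts.pop()

d2, d6 = degeneracy(V2, 1), degeneracy(V6, 2)
N2 = Fraction(1, 2) / d2             # n_2 = 1/2
N6 = Fraction(1, 3) / d6             # n_6 = 1/3

# Arbitrary integer shifts Delta^{kf kr} for one data point of each process.
labels = ["+0", "-0", "0+", "0-", "++", "--"]
D1 = {tuple(k): v for k, v in zip(labels, [3, -2, 5, 1, 4, -6])}
D2 = {tuple(k): v for k, v in zip(labels, [1, 7, -3, 2, -1, 8])}

# 3-point uses Delta^{++} and Delta^{--} because kf = kr for each process.
brute3 = N2 * sum(m * D1[a, a] * D2[b, b] for (a, b), m in V2.items())
closed3 = (Fraction(1, 4) * (D1["+", "+"] + D1["-", "-"])
           * (D2["+", "+"] + D2["-", "-"]))

brute7 = N6 * sum(m * D1[f1, r1] * D2[f2, r2]
                  for (f1, r1, f2, r2), m in V6.items())
closed7 = Fraction(1, 6) * (
    2 * D1["+", "0"] * D2["+", "0"] + 2 * D1["-", "0"] * D2["-", "0"]
    + (D1["0", "+"] + D1["0", "-"]) * (D2["0", "+"] + D2["0", "-"])
    + (D1["+", "+"] + D1["-", "-"]) * (D2["+", "+"] + D2["-", "-"]))

print(d2, N2, d6, N6, len(V6), brute3 == closed3, brute7 == closed7)
```

The enumeration gives \(d_2=d_6=2\), \(N_2=1/4\), \(N_6=1/6\) and ten distinct elements of \(V_6\) for two processes, in agreement with \(2+2^{p+1}\).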

5 Validation of the theory covariance matrix

In this section we determine the theory covariance matrix \(S_{ij}\) at NLO using the different prescriptions formulated in Sect. 4, we introduce a method for the validation of the theory covariance matrix when the next-order result is known, and we use it to validate the theory covariance matrices that we computed against the known NNLO results. This validation is performed on a global dataset based on the same processes as those used in the NNPDF3.1 PDF determination. This dataset will then be used to produce fits incorporating MHOUs using the theory covariance matrix (Sect. 6), and also, for comparison, fits using scale-varied theories (Appendix B).

5.1 Input data and process categorization

The validation of the theory covariance matrix and the PDF determination to be discussed in the next section are performed using a set of theory predictions for a dataset which is very similar to that used in the NNPDF3.1 PDF determination [5], but differs from it in some details, as we now discuss.

The input dataset used here includes fixed-target [28,29,30,31,32,33,34,35] and HERA [36] deep-inelastic inclusive structure functions; charm cross-sections from HERA [37]; gauge boson production from the Tevatron [38,39,40,41]; and electroweak boson production, inclusive jet, Z \(p_T\) distributions, and \(t\bar{t}\) total and differential cross-sections from ATLAS [42,43,44,45,46,47,48,49,50], CMS [51,52,53,54,55,56,57,58,59] and LHCb [60,61,62,63] at \(\sqrt{s}=7\) and 8 TeV (two data points for the ATLAS and CMS total \(t\bar{t}\) cross-sections are at 13 TeV).

This input dataset differs in many small respects from that used in the NNPDF3.1 baseline. Firstly, the fixed-target Drell–Yan (DY) cross-sections [64,65,66,67] are excluded from the fit since APFEL currently does not allow the calculation of scale-varied fixed-target DY cross-sections. Secondly, the value of the lower kinematic cut has been increased from \(Q_{\mathrm{min}}^2=2.69\) GeV\(^2\) to 13.96 GeV\(^2\) in order to ensure the validity of the perturbative QCD expansion when scales are varied downwards. Thirdly, we include only jet data for which the exact NNLO calculations are available, as discussed in [68], namely the ATLAS and CMS inclusive jet cross-sections at 7 TeV from the 2011 dataset. Finally, we exclude the bottom structure function \(F_2^b\) measurements, for which the implementation of scale variations is complicated by the crossing of the heavy quark thresholds.

Also, in the original NNPDF3.1 determination somewhat different cuts were applied to data at NLO and NNLO (essentially in order to remove from the NLO fit data which are subject to large NNLO corrections). Here we wish to have exactly the same dataset at NLO and NNLO, in order to make sure that the differences between NLO and NNLO are due purely to differences in the theoretical calculations, and not in the input datasets. Therefore, the baseline kinematic cuts of NNPDF3.1 have been slightly modified so that the data points excluded at NLO are also excluded at NNLO and vice-versa.

Taking into account all these modifications, in total the input dataset includes \(N_{\mathrm{dat}}=2819\) datapoints. The fact that the dataset differs somewhat from that of Ref. [5] must be kept in mind when assessing the impact of theory uncertainties, and indeed for this purpose in Sect. 6.2 we will construct a new baseline PDF set which only differs from that of Ref. [5] in that it is based on the dataset we present here. Specifically, the loss of fixed-target Drell–Yan data will lead to an increased uncertainty in the \(\bar{u}-{\bar{d}}\) combination, and the higher \(Q^2\) cutoff to somewhat larger uncertainties in the small-x region where the low \(Q^2\) data are concentrated. Here our main goal is to assess the impact of theory uncertainties, not to construct the most competitive, state-of-the-art PDF set, which will be the subject of future work.

Because the prescriptions in Sect. 4 assume that renormalization scale variation is fully correlated within a given process, but uncorrelated between different processes, it is necessary to define what is meant by “process”, i.e., to classify datasets into processes. This requires an educated guess as to which theory computations share the same higher order corrections. For example, it is necessary to decide whether charged-current (CC) and neutral-current (NC) DIS are the same process or not, and whether the transverse momentum and rapidity distributions for one observable (such as, say, Z production) should be grouped together. Our categorization is summarized in Table 2.

Table 2 The categorization of the input datasets into different processes adopted in this work. Each dataset is assigned to one of five categories: neutral-current DIS (DIS NC), charged-current DIS (DIS CC), Drell–Yan (DY), jet production (JET) and top quark pair production (TOP). For each dataset, we also provide the corresponding publication reference and the number of data points after cuts. We also show the total number of points in each of the five categories of process

Specifically, we group the data into five distinct categories: DIS NC, DIS CC, Drell–Yan (DY), inclusive jet production (JET), and top quark pair production (TOP). More refined categorizations will be considered elsewhere, but we consider this to be sufficient for a first study. The logic underlying this choice is that we group together processes that are likely to share the same MHO terms. Thus for instance the predictions for all DY processes are obtained by integrating the same underlying fully differential distributions, and thus have a similar perturbative structure. Because different distributions impact different PDF combinations (so e.g. the Z \(p_T\) distribution mostly impacts the gluon, while the W rapidity distributions mostly impact flavor separation), this will induce nontrivial correlations in the PDF fitting.

All calculations are performed using the same settings as in [5]: PDF evolution and the calculation of DIS structure functions up to NNLO are carried out using the APFEL [69] program; heavy quark mass effects are included by means of the FONLL general-mass variable flavor number scheme [70,71,72]; the charm PDF is fitted alongside the light quark PDFs [73], rather than being generated from perturbative evolution of light partons; the charm quark pole mass is taken to be \(m_c=1.51\) GeV, and the strong coupling constant is fixed to be \(\alpha _s(m_Z) = 0.118\), consistent with the latest PDG average [74].

Fig. 5

Comparison of the diagonal experimental uncertainties (blue) and the diagonal theoretical uncertainties evaluated using the 9-point prescription (red), all normalized to the central experimental value. The data are grouped by process and, within a process, by experiment, following Table 2

In order to evaluate the theory covariance matrix \(S_{ij}\), it is necessary to be able to evaluate both DIS structure functions and hadronic cross-sections for a range of values of the factorization and renormalization scales, i.e., in the notation of Eq. (3.41), for \(\kappa _f\ne 0\) and \(\kappa _r\ne 0\). In this case, the entries of the NLO theory covariance matrix have been constructed by means of the ReportEngine software [75] taking the scale-varied NLO theory cross-sections \(T_i(\kappa _f,\kappa _r)\) as input. These are provided by APFEL [69] for the DIS structure functions and by APFELgrid [76] combined with APPLgrid [77] for the hadronic cross-sections. The evaluation of these scale-varied cross-sections has been validated by means of independent programs, in particular with HOPPET [78] and OpenQCDrad [79] for the DIS structure functions, and with the built-in scale variation functionalities of APPLgrid. All these NLO cross-sections are evaluated using the central NLO PDF obtained by performing an NLO fit to the same dataset, for consistency.
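To illustrate how scale-varied predictions translate into covariance matrix entries, a minimal Python sketch for a single process in the 9-point prescription is given below. It is not the ReportEngine implementation: the function name, the dictionary layout and the toy predictions are hypothetical, and the shifts are assumed to be \(\Delta _i(\kappa _f,\kappa _r) = T_i(\kappa _f,\kappa _r) - T_i(0,0)\).

```python
from itertools import product

def theory_covmat_9pt(T):
    """Single-process 9-point S_ij = n_8 * sum over v_8 of Delta_i * Delta_j.

    T maps (kf, kr), each in {-1, 0, +1}, to a list of predictions T_i.
    Assumes Delta_i(kf, kr) = T_i(kf, kr) - T_i(0, 0) and n_8 = 1/4.
    """
    central = T[0, 0]
    ndat = len(central)
    S = [[0.0] * ndat for _ in range(ndat)]
    for point in product((-1, 0, 1), repeat=2):
        if point == (0, 0):
            continue  # the central point carries no shift
        delta = [t - c for t, c in zip(T[point], central)]
        for i in range(ndat):
            for j in range(ndat):
                S[i][j] += 0.25 * delta[i] * delta[j]
    return S

# Hypothetical predictions for two data points with a toy linear scale
# dependence; real inputs would come from the scale-varied calculations.
T = {(kf, kr): [10.0 + 0.4 * kf + 0.2 * kr, 5.0 - 0.1 * kf + 0.3 * kr]
     for kf, kr in product((-1, 0, 1), repeat=2)}
S = theory_covmat_9pt(T)
for row in S:
    print(row)
```

The off-diagonal blocks between different processes would instead follow Eq. (4.15), with the renormalization scales of the two processes varied independently.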

5.2 The theory covariance matrices at NLO

We now present results for the theory covariance matrices, constructed using NLO calculations and evaluated according to the prescriptions introduced in Sect. 4, and discuss some of their qualitative features.

In Fig. 5 we show the diagonal elements of the experimental and theory covariance matrices, or more specifically the experimental uncertainty normalized to the data, \((C_{ii})^{1/2}/D_i\), and the MHOU normalized to the data, \((S_{ii})^{1/2}/D_i\), for \(i=1,\ldots ,N_\mathrm{dat}\), where \(D_i\) is the i-th datapoint. Here and henceforth, the experimental covariance matrix \(C_{ij}\) includes all uncorrelated statistical uncertainties as well as correlated systematic uncertainties, as published by the respective experiments, and used to assess fit quality as e.g. in Sect. 3.2 of Ref. [68]. Note that this differs from the covariance matrix \(C_{ij}\) used for PDF minimization in the treatment of multiplicative uncertainties (such as normalization or luminosity uncertainties) in that the latter must be treated using the so-called \(t_0\) method of Ref. [80] in order to avoid bias. As in all previous NNPDF determinations, the \(t_0\) covariance matrix is used for PDF minimization while the experimental covariance matrix is used in order to assess fit quality, in order to ensure reproducibility of results.

The datapoints are grouped by process and, within a process by experiment, following Table 2. The theory covariance matrix \(S_{ij}\) is computed using the 9-point prescription (the one with the largest number of independent variations; recall Sect. 4). Broadly speaking, the estimated NLO MHOU is roughly comparable to experimental uncertainties, as expected. However for some datapoints the experimental uncertainty is dominant (and thus the theory uncertainty will have only a small effect), while for others the MHOU is dominant. These latter datapoints will carry less weight in a PDF fit with MHOU included, depending also on the underlying correlation pattern. Some datasets have datapoints in both these categories: the HERA NC DIS are particularly striking, since at high \(Q^2\) (where statistics are low) the dominant uncertainty is experimental, while at low \(Q^2\) (and thus small x, where perturbation theory is less reliable) the dominant uncertainty is due to MHO.
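The quantities compared in Fig. 5 are simple to compute once the two covariance matrices are available. The following sketch uses hypothetical numbers for three data points (they are not taken from the fit) and flags, point by point, which of the two diagonal uncertainties is larger.

```python
import math

def relative_uncertainties(C, S, D):
    """Return (sqrt(C_ii)/D_i, sqrt(S_ii)/D_i) for every data point i."""
    exp = [math.sqrt(C[i][i]) / D[i] for i in range(len(D))]
    mhou = [math.sqrt(S[i][i]) / D[i] for i in range(len(D))]
    return exp, mhou

# Hypothetical inputs: diagonal C, rank-one (fully correlated) S.
D = [10.0, 5.0, 2.0]
C = [[0.04, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.0016]]
S = [[0.09, 0.015, 0.006], [0.015, 0.0025, 0.001], [0.006, 0.001, 0.0004]]

exp, mhou = relative_uncertainties(C, S, D)
dominant = ["MHOU" if t > e else "experimental" for e, t in zip(exp, mhou)]
print(dominant)
```

Only the diagonal entries enter this comparison; the effect on a fit also depends on the off-diagonal correlations, which are discussed next.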

Fig. 6

Comparison of the experimental \(C_{ij}\) (left) and the theoretical \(S_{ij}\) (right) covariance matrices, the latter evaluated using the 9-point prescription. All entries are normalized to the central experimental value. The data are grouped by process and, within a process, by experiment, following Table 2

In Fig. 6 we compare the complete experimental covariance matrix \(C_{ij}\) to the theory covariance matrix \(S_{ij}\), again computed using the 9-point prescription. Both covariance matrices are displayed as heat maps, with each entry expressed as a fraction with respect to the corresponding experimental central value; i.e. \(C_{ij}/D_iD_j\) and \(S_{ij}/D_iD_j\). It is clear from Fig. 6 that the theory covariance matrix has, as expected, a richer structure of correlations than its experimental counterpart: for example data from the same process (such as DIS) are correlated even when the corresponding experimental measurements are completely uncorrelated (such as HERA and fixed target). Furthermore, correlation of the factorization scale variation between disparate processes, such as DIS processes and hadronic processes, results in nonzero entries in the theory covariance matrix even in these regions.

Fig. 7

Comparison of the experimental correlation matrix Eq. (5.1) (top left) and the combined experimental and theoretical correlation matrices Eq. (5.2) computed using the prescriptions described in Sect. 4: the symmetric prescriptions (5-pt top right, \({\overline{5}}\)-pt center left, 9-pt center right), and asymmetric prescriptions (3-pt bottom left, 7-pt bottom right). The data are grouped by process and within a process by experiment, as in Fig. 6

The precise structure of these theory-induced correlations depends on the choice of prescription adopted. To illustrate this, Fig. 7 compares the experimental correlation matrix, given by

$$\begin{aligned} \rho ^{(C)}_{ij} = {{C_{ij}}\over { \sqrt{C_{ii}}\sqrt{C_{jj}} }} \, , \end{aligned}$$
(5.1)

with the corresponding combined experimental and theoretical correlation matrices, defined by

$$\begin{aligned} \rho ^{(C+S)}_{ij} = {{( C+S)_{ij} }\over { \sqrt{(C+S)_{ii}} \sqrt{(C+S)_{jj}} }} \, , \end{aligned}$$
(5.2)

for all the prescriptions defined in Sect. 4. Specifically, from top left to bottom right we have the experimental correlations \(\rho ^{(C)}\) followed by \(\rho ^{(C+S)}\) for the symmetric 5, \(\bar{5}\), 9 point prescriptions, and the asymmetric 3 and 7 point prescriptions. As in Fig. 6, the cross-sections are grouped by process type and, within that, by experiment.
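As a small worked illustration of Eqs. (5.1) and (5.2), the sketch below evaluates both correlation matrices for a hypothetical two-point example in which the experimental uncertainties are uncorrelated and the theory uncertainties are fully correlated. The helper names and the numbers are assumptions made for the example only.

```python
import math

def correlation(M):
    """rho_ij = M_ij / (sqrt(M_ii) * sqrt(M_jj)), as in Eqs. (5.1), (5.2)."""
    n = len(M)
    return [[M[i][j] / (math.sqrt(M[i][i]) * math.sqrt(M[j][j]))
             for j in range(n)] for i in range(n)]

def add(A, B):
    """Element-wise sum of two matrices given as nested lists."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Hypothetical covariance matrices for two data points.
C = [[4.0, 0.0], [0.0, 9.0]]   # uncorrelated experimental uncertainties
S = [[1.0, 2.0], [2.0, 4.0]]   # rank-one theory covariance matrix

rho_C = correlation(C)
rho_CS = correlation(add(C, S))
print(rho_C[0][1], rho_CS[0][1])
```

In this toy case the experimental correlation vanishes, while the combined matrix acquires an off-diagonal entry \(2/\sqrt{65}\approx 0.25\) that is entirely induced by the theory covariance matrix.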

Some qualitative features of the theory-induced correlations are apparent. There are clearly large positive correlations within individual experiments along the diagonal blocks, this being particularly evident for DIS NC and DY data, which have large numbers of points which are relatively close kinematically. Off the diagonal, but still within the same process, there are large correlations between experiments for the DY, jets and top data points, and large anticorrelations for the DIS NC data points (these mostly between fixed target and HERA). Correlations and anticorrelations between different processes are also often present but are somewhat weaker: for example the DY data points (from LHC) are quite correlated with the HERA NC DIS data points, but anticorrelated with fixed target NC DIS data points.

When comparing different prescriptions, it is clear that the 3-point prescription leads to especially small correlations between processes, which is expected because with this prescription the factorization scale and renormalization scale variations are uncorrelated between processes. The correlations between processes are also weaker in 7-point than in 5-point, due to the fact that (as discussed in Sect. 4.3) the correlated variation of the factorization scale is combined with the uncorrelated variations of the scale of the process for the pair of processes involved. It is worth noting, however, that the pattern of correlations is similar for all the symmetric prescriptions.

In order to decide which prescriptions are best, and more generally whether or not they produce a reliable estimate of MHOUs, we must proceed to their validation.

5.3 Construction of validation tests

We wish to construct a validation test for the NLO theory covariance matrix, by comparing it to the known NNLO theoretical result. We do so by viewing the set of experimental data as a vector with components \(D_i\), where \(i=1, \ldots , N_{\mathrm{dat}}\). The vector lives in an \(N_{\mathrm{dat}}\)-dimensional “data” space D, on which the theory covariance matrix \(S_{ij}\) acts as a linear operator. The matrix \(S_{ij}\) is symmetric and positive semi-definite, meaning that all of its non-zero eigenvalues are positive. In a PDF fit, \(S_{ij}\) always enters as an additive contribution to the experimental covariance matrix \(C_{ij}\), and thus their sum is always invertible, owing to the non-zero statistical uncertainties on the data, which bound the eigenvalues from below.

The matrix \(S_{ij}\) defines ellipsoids E corresponding to a given confidence level in the data space, centered on the NLO theoretical prediction, \(T^{\mathrm{NLO}}_i\equiv T^{\mathrm{NLO}}_i(0,0)\), evaluated using the central scale choice. In the context of MHOUs, the one-sigma ellipsoid \(E_{1\sigma }\) then estimates a 68% confidence level for the MHO correction. We can validate whether \(S_{ij}\) correctly predicts both the size and the correlation pattern of the MHO terms by testing the extent to which the shift vector \(\delta _i \sim T^{\mathrm{NNLO}}_i-T^{\mathrm{NLO}}_i\), i.e. the difference between the NNLO and NLO predictions for \(T_i\), falls within a given ellipsoid E. These predictions should be taken with a fixed underlying PDF (which could indeed be any standard reference PDF): it is the change in prediction due to the change of perturbative evolution and hard matrix element which is relevant here. Note that the dimensionality of the subspace spanned by the ellipsoid E is much smaller than that of the data space D: in a global fit the data space has dimension \(\mathcal {O}(3000)\) (Table 2), while even the most complex prescriptions in Sect. 4 have \(\mathcal {O}(30)\) independent variations, not all of which correspond to independent eigenvectors, as we will see shortly. So E actually lives in a subspace S of dimension \(N_{\mathrm{sub}}\) of the full space D: \(E \subset S \subset D\). For a single process we expect \(N_{\mathrm{sub}}\) to be of order a dozen or so at most. In fact, even for a single process (see Table 2) we always have \(N_{\mathrm{sub}} \ll N_{\mathrm{dat}}\). Hence, a nontrivial validation of the theory covariance matrix is that the component of the shift vector \(\delta _i\) lying outside S is small, i.e. that the angle between \(\delta _i\) and the projection of \(\delta _i\) onto S is small.

Furthermore we expect the component of \(\delta _i\) along each axis of the ellipsoid E to be of the same order as the typical one-sigma variation. The physical interpretation of such a successful validation is that the eigenvectors of \(S_{ij}\) correctly estimate the independent directions of uncertainty in theory space, with the size of the shift estimated by the corresponding eigenvalue. The null subspace of E, i.e. the directions of vanishing eigenvalues, would then correspond to directions in D for which the theory uncertainty is so small that it cannot be reliably estimated and so can be safely neglected. These are highly nontrivial tests, given the huge discrepancy between the dimensionality of the space D, and the dimensionality of S.
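The angle criterion can be illustrated without diagonalizing anything. Because \(S_{ij}\) is a positively weighted sum of outer products of shift vectors, as in Eq. (4.2), its non-null subspace is the span of those vectors. The sketch below is one possible way of doing this and not necessarily the procedure of Appendix A; the vectors are hypothetical. It orthonormalizes the shift vectors with a modified Gram–Schmidt step and returns the angle between a given \(\delta _i\) and its projection onto their span.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(vectors, tol=1e-12):
    """Modified Gram-Schmidt; linearly dependent vectors are dropped."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(w, e)
            w = [x - c * y for x, y in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([x / norm for x in w])
    return basis

def angle_to_subspace(delta, vectors):
    """Angle in degrees between delta and its projection onto span(vectors),
    together with the dimension of that span."""
    basis = orthonormal_basis(vectors)
    proj_sq = sum(dot(delta, e) ** 2 for e in basis)
    cos = math.sqrt(proj_sq / dot(delta, delta))
    return math.degrees(math.acos(min(1.0, cos))), len(basis)

# Hypothetical shift vectors for N_dat = 4; the third is the sum of the
# first two, so the subspace has dimension N_sub = 2.
shifts = [[0.10, 0.20, 0.00, -0.10],
          [-0.05, 0.10, 0.20, 0.00],
          [0.05, 0.30, 0.20, -0.10]]
delta = [0.05, 0.12, 0.05, -0.03]   # stand-in for the NNLO-NLO shift
theta, n_sub = angle_to_subspace(delta, shifts)
print(theta, n_sub)
```

A small angle means that most of the shift lies along directions that the scale variations actually explore; an angle close to 90 degrees would signal that the covariance matrix misses the dominant direction of the higher-order correction.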

Let us now see how this works in detail. First, we need to identify the spaces E and S. To do this, we normalize the NLO theory covariance matrix \(S_{ij}\) to the central NLO theory prediction \(T_i\), so that all its elements are dimensionless, allowing a meaningful comparison: we define

$$\begin{aligned} {\widehat{S}}_{ij} = S_{ij}/\left( T^{\mathrm{NLO}}_iT^{\mathrm{NLO}}_j\right) . \end{aligned}$$
(5.3)

Likewise, we define a normalized shift vector with components

$$\begin{aligned} {\delta }_i = \left( T^{\mathrm{NNLO}}_i-T^{\mathrm{NLO}}_i\right) /T^{\mathrm{NLO}}_i \, . \end{aligned}$$
(5.4)

The NNLO prediction \(T^{\mathrm{NNLO}}_i\) is computed using NNLO matrix elements and parton evolution, but with the same NLO PDF set used in the computation of \(T^{\mathrm{NLO}}_i\) and \(S_{ij}\). In this way the shift \(\delta _i\) only takes account of the perturbative effects due to NNLO corrections, which are estimated by \(S_{ij}\), and not the additional effect of refitting.
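As a minimal sketch of the normalization in Eqs. (5.3) and (5.4), assuming the predictions and the covariance matrix are stored as NumPy arrays: the function name `normalize_to_nlo` and the toy numbers are purely illustrative and are not taken from the analysis.

```python
import numpy as np

def normalize_to_nlo(S, T_nlo, T_nnlo):
    """Return (S_hat, delta) following Eqs. (5.3) and (5.4)."""
    S = np.asarray(S, dtype=float)
    T_nlo = np.asarray(T_nlo, dtype=float)
    T_nnlo = np.asarray(T_nnlo, dtype=float)
    # Eq. (5.3): divide element (i, j) by T_i^NLO * T_j^NLO
    S_hat = S / np.outer(T_nlo, T_nlo)
    # Eq. (5.4): relative NNLO-NLO shift, same PDF in both predictions
    delta = (T_nnlo - T_nlo) / T_nlo
    return S_hat, delta

# Toy input (hypothetical): two data points whose predictions differ by a factor of 20
T_nlo = np.array([10.0, 200.0])
T_nnlo = np.array([10.5, 190.0])
S = np.array([[1.0, 10.0],
              [10.0, 400.0]])

S_hat, delta = normalize_to_nlo(S, T_nlo, T_nnlo)
print(S_hat)   # both diagonal entries equal 0.01, i.e. a 10% relative uncertainty
print(delta)   # relative shifts of +5% and -5%
```

After normalization the two data points can be compared directly even though their absolute cross-sections differ by more than an order of magnitude.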

We then diagonalize \({\widehat{S}}_{ij}\), to give eigenvectors, \(e_i^\alpha \) (chosen to be orthonormal, i.e. \(\sum _i e_i^\alpha e_i^\beta = \delta ^{\alpha \beta }\)), with corresponding non-zero eigenvalues, \(\lambda ^\alpha =(s^\alpha )^2\); \(\alpha = 1, \ldots , N_{\mathrm{sub}}\). All these eigenvalues are real and positive, see Eq. (4.3). The eigenvectors span the subspace S. There are also \(N_{\mathrm{dat}}-N_{\mathrm{sub}}\) zero eigenvalues. These are degenerate, and their eigenvectors span the space D / S. Because of the zero eigenvalues, the diagonalization of \({\widehat{S}}_{ij}\) is in practice rather difficult: the procedure we use to identify the subspace S and its dimensionality \(N_{\mathrm{sub}}\), and then diagonalize the projection of \({\widehat{S}}_{ij}\) into S, is described in some detail in Appendix A.

Next we project the shift vector \(\delta _i\) onto the eigenvectors,

$$\begin{aligned} \delta ^\alpha = \sum _{i=1}^{N_{\mathrm{dat}}} \delta _i e_i^\alpha \, . \end{aligned}$$
(5.5)

These projections \(\delta ^\alpha \) should be of the same order as the size of the ellipse in this direction, i.e. the \(s^\alpha \): more specifically in an ideal world 68% of the \(\delta ^\alpha /s^\alpha \) would be less than one. This is all the meaningful statistical information that is contained in \({\widehat{S}}_{ij}\).
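The diagonalization and projection steps can be sketched as follows. This is only an illustration: the simple relative threshold used to discard zero eigenvalues is an assumption standing in for the more careful procedure of Appendix A, and the toy covariance matrix is built from two made-up variation vectors.

```python
import numpy as np

def nonzero_eigensystem(S_hat, rel_tol=1e-10):
    """Eigenvalues (descending) and orthonormal eigenvectors of S_hat,
    keeping only eigenvalues above rel_tol times the largest one.
    The threshold is an illustrative stand-in for Appendix A."""
    lam, vec = np.linalg.eigh(S_hat)          # eigh returns ascending eigenvalues
    keep = lam > rel_tol * lam.max()
    lam, vec = lam[keep][::-1], vec[:, keep][:, ::-1]
    return lam, vec

# Toy setup (hypothetical numbers): 6 data points, 2 independent variations
rng = np.random.default_rng(0)
variations = rng.normal(scale=0.05, size=(2, 6))
S_hat = variations.T @ variations / 2         # rank-2 matrix, so N_sub = 2
delta = rng.normal(scale=0.05, size=6)        # made-up normalized shift

lam, e = nonzero_eigensystem(S_hat)
s = np.sqrt(lam)                              # semi-axes s^alpha of the ellipse
delta_alpha = e.T @ delta                     # Eq. (5.5)
ratios = np.abs(delta_alpha) / s              # ideally about 68% are below one
print(len(lam), ratios)
```

The columns of `e` play the role of the eigenvectors \(e_i^\alpha \), so `e.T @ delta` evaluates all the sums in Eq. (5.5) at once.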

Fig. 8

Schematic representation of the geometric relation between the shift vector \(\delta \in D\) (here drawn as a three dimensional space), and the component \(\delta ^S\) of the shift vector which lies in the subspace S (here drawn as a two dimensional space, containing the ellipse E defined by the theory covariance matrix). The angle \(\theta \) between \(\delta \) and \(\delta ^S\) is also shown: the dotted line shows the other side of the triangle, \(\delta ^{\mathrm{miss}}\in D/S\)

Finally, we can now resolve the shift vector \(\delta _i\) into its component lying within S

$$\begin{aligned} \delta _i^{S} = \sum _{\alpha =1,\ldots , N_{\mathrm{sub}}} \delta ^\alpha e_i^\alpha , \end{aligned}$$
(5.6)

and the complementary component within the remaining space D / S, \(\delta _i^{\mathrm{miss}} = \delta _i-\delta ^S_i\). For a successful test, we expect most of \(\delta \) to lie within S, so \(|\delta _i^{\mathrm{miss}}|\ll |\delta _i|\), or equivalently \(|\delta _i^{S}|\approx |\delta _i|\). By construction \(\delta _i^S\) and \(\delta _i^{\mathrm{miss}}\) are orthogonal (since the subspaces S and D / S are orthogonal spaces), thus the three vectors \(\delta _i^S\), \(\delta _i^{\mathrm{miss}}\) and \(\delta _i\) form a right-angled triangle, with \(\delta _i\) being its hypotenuse. The geometrical relation between the shift vector \(\delta _i\), and the component of the shift vector which lies in the subspace S, \(\delta ^S_i\) is illustrated in Fig. 8.

With these definitions, the theory covariance matrix \(S_{ij}\) provides a reasonable estimate of the MHOU if the angle

$$\begin{aligned} \theta = \arccos \left( {{|\delta ^S_i|}\over {|\delta _i|}} \right) = \arcsin \left( {{|\delta ^{\mathrm{miss}}_i|}\over {|\delta _i|}}\right) \, \end{aligned}$$
(5.7)

between the shift \(\delta _i\) and its component in the subspace S, \(\delta _i^S\) is reasonably small. As mentioned above, for a global PDF fit the typical situation that one encounters is that \(N_{\mathrm{dat}}\gg N_{\mathrm{sub}}\) (in the present case \(N_{\mathrm{dat}}\sim \mathcal {O}(3000)\), while \(N_{\mathrm{sub}}\sim \mathcal {O}(30)\)). So this validation test is highly nontrivial, since finding the relatively small subspace S in the huge space D is rather hard: for a random symmetric matrix \(S_{ij}\), components of \(\delta _i\) in D / S will generally be as large as those in S, and thus \(|\delta _i^{S}|\ll |\delta _i|\), and \(\theta \) will be very close to a right angle.
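The decomposition of Eq. (5.6) and the angle of Eq. (5.7) can be sketched as below, assuming the eigenvectors are available as the orthonormal columns of an array. The helper `validation_angle` is hypothetical. The first example reproduces the geometry of Fig. 8; the second uses a randomly chosen subspace with the dimensions quoted above, to illustrate why a small angle is a nontrivial outcome.

```python
import numpy as np

def validation_angle(delta, eigvecs):
    """Split delta into its parts inside and outside the span of eigvecs
    (orthonormal columns) and return (theta_degrees, delta_S, delta_miss)."""
    delta_S = eigvecs @ (eigvecs.T @ delta)      # Eqs. (5.5)-(5.6)
    delta_miss = delta - delta_S
    # arctan2 of the two legs equals the arccos/arcsin forms of Eq. (5.7)
    theta = np.degrees(np.arctan2(np.linalg.norm(delta_miss),
                                  np.linalg.norm(delta_S)))
    return theta, delta_S, delta_miss

# Geometry of Fig. 8: D is three dimensional, S is the x-y plane
e_plane = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.0, 0.0]])
theta, delta_S, delta_miss = validation_angle(np.array([1.0, 0.0, 1.0]), e_plane)
print(theta)          # 45 degrees, up to rounding

# A random 30-dimensional subspace of a 3000-dimensional space
rng = np.random.default_rng(1)
e_random, _ = np.linalg.qr(rng.normal(size=(3000, 30)))
theta_random, _, _ = validation_angle(rng.normal(size=3000), e_random)
print(theta_random)   # close to a right angle
```

Using the two legs of the right-angled triangle avoids rounding problems in the arccosine when \(\delta _i\) lies almost entirely within S.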

5.4 Results of validation tests

Fig. 9

The diagonal uncertainties \(\sigma _i\) (red) symmetrized about zero, compared to the shift \(\delta _i\) for each datapoint (black), for the symmetric prescriptions: 5-point (top), \({\overline{5}}\)-point (middle), and 9-point (bottom). All values are shown as percentage of the central theory prediction

Fig. 10

Same as Fig. 9 but for the asymmetric prescriptions: 3-point (top) and 7-point (bottom)

We now explicitly perform the validation tests discussed in Sect. 5.3, with the NLO theory covariance matrices \({\widehat{S}}_{ij}\) (normalized to NLO theory, as in Eq. (5.3)) constructed from scale variations for all data points in Table 2, and for each prescription of Sect. 4. These are then validated using the shift vector \(\delta _i\) constructed as the difference of NNLO and NLO theory, normalized to the latter, as in Eq. (5.4).

A very first comparison can be done at the level of diagonal elements \(\sigma _i\), where \({\widehat{S}}_{ii} = (\sigma _i)^2\), by comparing them directly to the normalized shifts \({\delta }_i\) Eq. (5.4). This already tells us whether the overall size of the scale variation is of the right order of magnitude: one expects the shifts \(\delta _i\) and the uncertainties \(\sigma _i\) to be of roughly the same order.

These comparisons are shown in Figs. 9 and 10. In each plot the data points are presented sequentially on the horizontal axis, organized by process as in Table 2. The shape of the estimated MHOU imitates the shape of the true shift rather faithfully, for each of the five processes, and for each prescription. This shows that the theory covariance matrix gives a qualitatively reliable estimate of the true MHOU, in the sense that the estimate is small when the MHOU is small, large when it is large, and moreover correctly incorporates the correlations in the MHOU between nearby kinematic regions, responsible for the shape. There is little discernible difference between all the various point prescriptions, except in the overall size of the estimates: for example comparing the symmetric prescriptions, we see that 5-point is the least conservative and \({\overline{5}}\)-point is the most conservative, whilst 9-point lies somewhere between the two. This is particularly noticeable in the DY data.

It is clear from these plots that the overall size of the estimated uncertainties, given by varying renormalization and factorization scales by a factor of two in either direction (i.e. as in Eq. (4) with \(w = \ln 4\)) is, by and large, roughly correct: if the range were significantly smaller, some of the uncertainties would have been underestimated, whereas if it were larger all uncertainties would have been overestimated. This said, for several data points the MHOU at NLO is clearly overestimated by scale variation: this is particularly true of the small-x NC DIS data from HERA in the center of the plot.

Overall, these plots demonstrate that, since there are only small differences in the diagonal elements of each prescription, the differences in performance between the prescriptions lie in the detailed correlations between data points. To expose this, we need to diagonalize the theory covariance matrix (using the procedure in Appendix A), so that we can see in detail which components of the shift vector are correctly estimated, and which are missed, as explained in Sect. 5.3.

Table 3 The angle \(\theta \) Eq. (5.7) between this shift and its component \(\delta _i^S\) lying within the subspace S (see Fig. 8) spanned by the theory covariance matrix for different prescriptions. The dimension of the subspace S in each case is also given
Table 4 Same as Table 3 for each process of Table 2. The number of data points in each process is given directly below the name of the process
Fig. 11

The NNLO-NLO shift \(\delta _i\) (black) compared to its component \(\delta _{\mathrm{miss}}\) (blue) which lies outside the subspace S, computed using the 9-point prescription

As discussed in Sect. 5.3, once we have the eigenvectors corresponding to the nonzero eigenvalues of the theory covariance matrix, the first validation test consists of checking how much of the shift vector \(\delta _i\) lies within the space spanned by these eigenvectors, S, and has thus been included in the estimation of MHOU provided by the theory covariance matrix. The results of this test for the global dataset, described in Sect. 5.1, are shown in Table 3: for each prescription we give the dimension \(N_{\mathrm{sub}}\) of S, i.e. the number of linearly independent eigenvectors \(e_i^\alpha \) of \(S_{ij}\), and then the value of the angle \(\theta \), defined in Eq. (5.7), between the shift \(\delta _i\) and its component \(\delta ^S_i\), defined in Eq. (5.6), lying within the subspace S spanned by \(e_i^\alpha \). We note that all the angles are reasonably small, despite the fact that \(N_{\mathrm{sub}}\) is so much smaller than the dimension 2819 of the data space.

The 9-point prescription performs best, with an angle of \(\theta =26^{{\circ }}\) between the shift \(\delta _i\) and its projection \(\delta _i^S\) in the subspace S: clearly the more complicated pattern of scale variations (compared to the other two symmetric prescriptions) improves the estimation of the MHOU. The 3-point prescription performs worst, suggesting that lack of correlation in the factorization scale between processes in this prescription means that much of the correlation in the MHOU due to universal PDF evolution has been missed. The 7-point prescription is however only a little worse than 9-point, presumably due to the dilution of the correlation in factorization scale variation which is a feature of this prescription. Note that since these results for \(\theta \) are geometrical, they are largely independent of the range of the scale variation Eq. (4).
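The geometrical nature of \(\theta \) can be illustrated with a short sketch, under the simplifying assumption that changing the range of the scale variation rescales the whole covariance matrix by a common factor (in practice this holds only approximately, hence "largely" above). The helper name and the toy inputs are hypothetical.

```python
import numpy as np

def angle_and_semi_axes(S_hat, delta, rel_tol=1e-10):
    """Return (theta in degrees, semi-axes s^alpha) for a covariance matrix S_hat."""
    lam, vec = np.linalg.eigh(S_hat)
    keep = lam > rel_tol * lam.max()
    e = vec[:, keep]                               # orthonormal basis of S
    cos_theta = np.linalg.norm(e.T @ delta) / np.linalg.norm(delta)
    theta = np.degrees(np.arccos(np.clip(cos_theta, 0.0, 1.0)))
    return theta, np.sqrt(lam[keep])

# Toy setup (hypothetical numbers): 8 data points, 3 independent variations
rng = np.random.default_rng(3)
variations = rng.normal(size=(3, 8))
S_hat = variations.T @ variations / 3
delta = rng.normal(size=8)

theta_1, s_1 = angle_and_semi_axes(S_hat, delta)
theta_4, s_4 = angle_and_semi_axes(4.0 * S_hat, delta)   # all variations doubled
print(theta_1, theta_4)   # same angle
print(s_4 / s_1)          # semi-axes are doubled
```

A common rescaling changes the size of the ellipse E but not the subspace S, so only the comparison between \(\delta ^\alpha \) and \(s^\alpha \) is sensitive to the range of variation.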

Fig. 12

The projection \(\delta ^\alpha \) Eq. (5.5) of the normalized shift vector \(\delta _i\) Eq. (5.4) along each eigenvector \(e^\alpha _i\) of the normalized theory covariance matrix Eq. (5.3), compared to the corresponding eigenvalue \(s^\alpha \), ordered by the size of the projections (from largest to smallest). In each case results are shown as absolute (upper) and as ratios \(\delta ^\alpha /s^\alpha \) (lower), the horizontal line indicating when this ratio is one. The length of the component of \(\delta _i\) that is not captured at all by the theory covariance matrix, \(|\delta ^{\mathrm{miss}}_i|\) is also shown (blue star). Results are shown for the symmetric prescriptions: 5-point (top left), \({\overline{5}}\)-point (top right), and 9-point (bottom)

It is interesting to ask whether all processes are equally well described, and whether there are significant differences in correlations between processes or within a process. To this purpose, in Table 4 we list the angle \(\theta \) computed for each individual process using the various prescriptions. Three conclusions emerge from inspection of this table. First, when each process is taken individually, the results seen in Table 3 for the relative merits of each prescription are replicated process by process: again 3-point is worst, and 9-point is best. Secondly, processes with large numbers of data points are much harder to describe than those with only a few data points (i.e. \(\theta \) is smallest for the smallest datasets): this is hardly surprising, since the larger datasets cover a wider kinematic range and thus have more structure to predict. Finally, the quality of the description of the global dataset for each prescription is in each case dominated by the process (DIS NC) which is described worst; however, the global dataset is actually described a little better (for each prescription) than the dataset for this process, particularly for 9-point, less so for 3-point. This suggests that correlations across processes are actually described reasonably well, and are anyway less critical than correlations within processes.

We next look in more detail at the part of \(\delta _i\) which falls outside the subspace S, \(\delta ^{\mathrm{miss}}_i = \delta _i-\delta ^S_i\). This is shown for the 9-point prescription in Fig. 11. While this is generally uniformly small, of order a few percent, across the full range of processes, it also has nonzero components in all datasets, and all processes. Furthermore, for most processes the shape of \(\delta ^{\mathrm{miss}}\) closely follows that of the shift \(\delta _i\). This may suggest that a significant fraction of \(\delta ^{\mathrm{miss}}\) might be due to the fact that there is a component of \(\delta _i\) which is systematically missing for most or all processes. This in turn suggests that a sizable part of \(\delta ^{\mathrm{miss}}\) might be due to poor estimation of the MHOU in PDF evolution, rather than poor estimation of MHOU in hard cross-sections which can vary substantially between different processes (and indeed different kinematics). Indeed, as already mentioned in Sect. 3.5, our current treatment of factorization scale variation is only approximate, and a more sophisticated treatment would involve performing separate scale variation for each eigenvalue of perturbative evolution.

Having established that most of the NNLO-NLO shift \(\delta _i\) lies within S, we now proceed to examine what fraction of \(\delta ^S_i\) lies within the error ellipse E specified by the theory covariance matrix. To that end, the eigenvalues \(\lambda ^\alpha = (s^\alpha )^2\) of the theory covariance matrix of the global dataset are shown in Fig. 12 for symmetric prescriptions, and in Fig. 13 for the asymmetric ones: these define the length of the semi-axes of E. Since there are five distinct processes, there are 8, 12 and 28 positive eigenvalues for the symmetric 5-point, \({\overline{5}}\)-point and 9-point prescriptions respectively, and 6 and 14 positive eigenvalues for the asymmetric 3-point and 7-point prescriptions respectively, as explained in Appendix A. Also shown are the projections \(\delta ^\alpha \) of the normalized shift vector \(\delta \) Eq. (5.4) along each corresponding eigenvector \(e^\alpha _i\), Eq. (5.5).

Fig. 13

Same as Fig. 12 but for the asymmetric prescriptions: 3-point (left) and 7-point (right)

Fig. 14

The components \(e^\alpha _i\) (green) of the eigenvectors, corresponding to the five largest eigenvalues for the 9-point theory covariance matrix, shown in the same format as Fig. 9. The NNLO-NLO shift, \(\delta _i\) (black), is shown for comparison

Inspection of these plots confirms that all the prescriptions seem to perform reasonably well. The largest eigenvalue is always very similar in size to the shift, and the size of the eigenvalues generally falls as the projected shifts get smaller. As expected, the 3-point prescription clearly overestimates uncertainties, since \(\delta ^\alpha < s^\alpha \) for all the eigenvalues. The same is true, but to a lesser extent, for both 5-point and \({\overline{5}}\)-point. For the more complicated 7-point and 9-point prescriptions the largest projections (corresponding to the first seven or eight eigenvalues) are estimated rather well, though still perhaps a little conservatively, but for the smaller projections the scatter increases significantly, with some projected shifts hardly predicted at all. This is perhaps not surprising: when varying just six independent scales, we can only expect to obtain a limited amount of information on the MHO terms. However the correct estimation of the largest projected shifts shows that the theory covariance matrix is giving a reasonable estimation of the MHOU, especially when implemented through the more complicated prescriptions.

On each of these plots, we also show the length of the component \(\delta ^{\mathrm{miss}}_i\) that is orthogonal to S, and thus completely outside E. For the symmetric prescriptions, \(|\delta ^{\mathrm{miss}}_i|\) is always less than the largest component of \(\delta \) in S, while for the asymmetric prescriptions it is greater, very significantly so for the 3-point prescription. This is another indication that the symmetric prescriptions give a better account of the correlations in theoretical uncertainties.

A more detailed understanding of the physical meaning of each eigenvector can be acquired by inspecting its components \(e_i^\alpha \) in the data space. These are shown in Fig. 14 for the eigenvectors corresponding to the five largest eigenvalues in the 9-point prescription: the shift vector \(\delta _i\) is also shown for comparison. It is clear that there is a close correspondence between eigenvectors and MHO contributions to individual processes. For instance the first eigenvector contributes mostly to DIS NC, the second to both DIS NC and DIS CC, the third to DY, the fourth mainly to DIS CC, and the fifth mainly to JETS. Clearly the ordering of these larger eigenvalues is related to the number of data points for the respective processes: the more datapoints, the larger the eigenvalue of the (correlated) uncertainty estimate. Even relatively small eigenvalues can give an important contribution, though to processes with fewer datapoints: for example the ninth eigenvector (not shown) clearly dominates TOP.
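One simple way to quantify which process an eigenvector mostly contributes to is to sum its squared components over the data points of each process. This is only a sketch of such a diagnostic, not the procedure used to produce Fig. 14: the helper `process_weights`, the labels and the toy eigenvector are hypothetical.

```python
import numpy as np

def process_weights(eigvec, process_labels):
    """Fraction of a unit-norm eigenvector carried by each process,
    i.e. the sum of (e_i)^2 over the data points i of that process."""
    labels = np.asarray(process_labels)
    return {p: float(np.sum(eigvec[labels == p] ** 2))
            for p in dict.fromkeys(process_labels)}   # keeps first-seen order

# Toy eigenvector (hypothetical) on four data points from three processes
eigvec = np.array([0.6, 0.0, 0.8, 0.0])
labels = ["DIS NC", "DIS NC", "DY", "TOP"]

weights = process_weights(eigvec, labels)
dominant = max(weights, key=weights.get)
print(weights, dominant)
```

Because the eigenvectors are normalized, the weights of each eigenvector sum to one and can be read as fractions.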

In summary, from these validation tests it is apparent that the 9-point prescription gives a reasonable estimate of most of the MHOU, both for individual processes and for the global dataset, with the 7-point being just slightly worse. Based on this, we will therefore adopt 9-point as a default prescription for the theory covariance matrix in the PDF determination to be discussed in the next section.

6 PDFs with missing higher order uncertainties

We can now present the main results of this work: the first determination of the parton distributions of the proton which systematically accounts for the MHOUs affecting the theory calculations of the input processes for the fit. First we present the results for PDFs obtained by fitting only DIS data. This provides us with an initial test case, which we will study by comparing PDFs obtained including the combined experimental and theoretical covariance matrix to the corresponding baseline fit in which only experimental uncertainties are included.

We then turn to the global PDF determination, which offers a nontrivial validation of our methodology, specifically by comparing NLO PDFs, with and without MHOUs, to NNLO PDFs. For global fits, we also study the stability of the results to changes in the prescription used for the computation of the theory covariance matrix: specifically, we compare PDFs obtained with the 9-point prescription (which is our default) to those based on the 7- and 3-point ones. We also study PDFs determined by only partially including the theory covariance matrix, either only in the data generation or only in the fitting. As discussed in the introduction, this provides us with a way of disentangling the impact of the theory covariance matrix on the central value of the PDFs or on the PDF uncertainty.

As discussed in Sect. 2, the theory uncertainties are included by simply replacing the experimental covariance matrix \(C_{ij}\) with the sum \((C+S)_{ij}\) of the experimental and theory covariance matrices in the expression for the likelihood of the true value given the data. The NNPDF methodology, as used specifically in the determination of the most recent NNPDF3.1 PDF set [5], is otherwise unchanged. Within this methodology, the covariance matrix is used to generate \( N_{\mathrm{rep}}\) pseudodata replicas \(D^{(k)}_i\) for each datapoint i, with \(k=1,\dots , N_{\mathrm{rep}}\), whose distribution must reproduce the covariance of any two data points. This means that with theory uncertainties included,

$$\begin{aligned}&\lim _{N_{\mathrm{rep}}\rightarrow \infty }{{1}\over {(N_{\mathrm{rep}}-1)}}\sum _{k=1}^{N_{\mathrm{rep}}} \left( D_i^{(k)}-\langle D_i\rangle \right) \left( D_j^{(k)}-\langle D_j\rangle \right) \nonumber \\&\qquad = C_{ij}+S_{ij}, \end{aligned}$$
(6.1)

with \(\langle D_i\rangle = {{1}\over {N_{\mathrm{rep}}}}\sum _{k=1}^{N_{\mathrm{rep}}} D_i^{(k)}\) denoting the average over Monte Carlo replicas.
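The condition of Eq. (6.1) can be checked numerically with a minimal sketch that assumes purely Gaussian fluctuations drawn from the total covariance matrix. This is a simplification: it ignores, for instance, any special treatment of multiplicative uncertainties in the actual replica generation, and all numbers below are made up.

```python
import numpy as np

def generate_pseudodata(D, C, S, n_rep, seed=0):
    """Draw n_rep Gaussian replicas of the data D with covariance C + S."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean=D, cov=C + S, size=n_rep)

# Toy input (hypothetical): two data points
D = np.array([1.0, 2.0])
C = np.diag([0.01, 0.04])              # uncorrelated experimental uncertainties
S = np.array([[0.02, 0.01],
              [0.01, 0.02]])           # correlated theory uncertainties

replicas = generate_pseudodata(D, C, S, n_rep=100_000)
sample_cov = np.cov(replicas, rowvar=False)   # uses the 1/(N_rep - 1) normalization
print(sample_cov)
print(C + S)
```

For a large number of replicas the sample covariance approaches \(C_{ij}+S_{ij}\), including the off-diagonal element, which here comes from the theory covariance matrix alone.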

A PDF replica is then fitted to each pseudodata replica \(D_i^{(k)}\) by minimizing a figure of merit, which in the presence of theory uncertainties becomes

$$\begin{aligned} \chi ^{2}={{1}\over {N_{\mathrm{dat}}}}\sum _{i,j=1}^{N_{\mathrm{dat}}}\left( D_i-T_i\right) \left( C+S \right) ^{-1}_{ij} \left( D_j-T_j\right) , \end{aligned}$$
(6.2)

where \(T_i\) is the theory prediction evaluated with the central scale choice, and the theory covariance matrix \(S_{ij}\) is computed using one of the prescriptions presented in Sect. 4.
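A minimal sketch of Eq. (6.2) is given below, with made-up numbers for two data points. It solves a linear system rather than forming the inverse explicitly, and it does not implement the \(t_0\) prescription. The two toy theory covariance matrices have the same diagonal elements and differ only in the sign of the correlation.

```python
import numpy as np

def chi2_per_point(D, T, C, S=None):
    """Eq. (6.2); with S=None only the experimental covariance matrix is used."""
    r = D - T
    cov = C if S is None else C + S
    return r @ np.linalg.solve(cov, r) / r.size

# Toy input (hypothetical): theory overshoots one point and undershoots the other
D = np.array([1.0, 2.0])
T = np.array([1.2, 1.8])
C = 0.01 * np.eye(2)
S_anti = 0.03 * np.array([[1.0, -1.0], [-1.0, 1.0]])   # anticorrelated MHOU
S_corr = 0.03 * np.array([[1.0, 1.0], [1.0, 1.0]])     # fully correlated MHOU

chi2_exp = chi2_per_point(D, T, C)
chi2_anti = chi2_per_point(D, T, C, S_anti)
chi2_corr = chi2_per_point(D, T, C, S_corr)
print(chi2_exp, chi2_anti, chi2_corr)
```

In this example the residual points along the anticorrelated direction, so `S_anti` reduces the \(\chi ^2\) substantially while `S_corr` leaves it unchanged: the correlation pattern of the theory covariance matrix matters, not just its diagonal.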

It is thus clear that the inclusion of a theory-induced contribution in the covariance matrix affects only two steps of the procedure: the pseudodata generation, and the minimization. Everything else is unchanged, and is identical to the default NNPDF methodology. Note that in particular the experimental covariance matrix C used in the fitting is determined, as in NNPDF3.1 and previous NNPDF releases, using the so-called \(t_0\) method for the treatment of multiplicative uncertainties, in order to avoid d’Agostini bias (see Refs. [27, 80] for a detailed discussion). As in previous NNPDF releases, minimization is thus performed using the \(t_0\) definition of the \(\chi ^2\), but all \(\chi ^2\) values shown are computed using the covariance matrix as published by the respective experiments.

In what follows, in order to assess fit quality we will provide \(\chi ^2\) values, and we will also study the estimator \(\phi \), defined in Ref. [4] as

$$\begin{aligned} \phi = \sqrt{\left\langle \chi _{\mathrm{exp}}^2[T_i]\right\rangle - \chi ^2_{\mathrm{exp}}[\langle T_i\rangle ]}\, , \end{aligned}$$
(6.3)

where by \( \chi _{\mathrm{exp}}^2[T_i]\) we denote the value of the \(\chi ^2\) computed using the i-th PDF replica, and only including the experimental covariance matrix (thus Eq. (6.2), but with \(S_{ij}\) set to zero). The average \(\chi ^2\) values which enter Eq. (6.3) are then \(\langle \chi ^2[T_i]\rangle \), the mean value of this \(\chi ^2\) averaged over replicas, and \(\chi ^2[\langle T_i\rangle ]\), the value of the \(\chi ^2\) computed using the “central” PDF set which is found by averaging over replicas.

Table 5 Summary of the PDF sets discussed in this section. The dataset, perturbative order and nature of the treatment of uncertainties for each set are indicated
Table 6 The values of the \(\chi ^2/N_{\mathrm{dat}}\) and of the \(\phi \) estimator in the NNPDF3.1 DIS-only fits with the theory covariance matrix \(S^{\mathrm{(9pt)}}\), compared to the results based on including only the experimental covariance matrix C

It was shown in Ref. [4] that \(\phi \) then gives the average over all datapoints of the ratio of the uncertainties of the predictions to the uncertainties of the original experimental data, taking account of correlations:

$$\begin{aligned} \phi = \left( {{1}\over {N_{\mathrm{dat}}}}\sum _{i,j=1}^{N_{\mathrm{dat}}}(C)^{-1}_{ij} T_{ij} \right) ^{1/2}\, , \end{aligned}$$
(6.4)

where \(T_{ij}= \langle T_i T_j\rangle - \langle T_i\rangle \langle T_j\rangle \) is the covariance matrix of the theoretical predictions. For an uncorrelated covariance matrix, this is just the ratio of the uncertainty in the prediction using the output PDF to that of the original data. Hence, the value of \(\phi \) provides an estimate of the mutual theoretical consistency of the data which are being fitted: consistent data are combined by the underlying theory and lead to an uncertainty in the prediction which is significantly smaller than that of the original data. Note that \(\phi \) is always defined so that the uncertainty in the prediction is normalized to the original experimental uncertainty (rather than combined experimental and theory uncertainties). In particular, when considering PDFs determined including a theory covariance matrix, this means that PDFs are determined minimizing the \(\chi ^2\) Eq. (6.2), but \(\chi ^2_{\mathrm{exp}}\) is instead used in the computation of \(\phi \) Eq. (6.3).
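The equivalence of Eqs. (6.3) and (6.4) can be verified numerically with a short sketch, assuming that \(T_{ij}\) is computed as the covariance over replicas with a \(1/N_{\mathrm{rep}}\) normalization. The function names and the randomly generated inputs are illustrative only.

```python
import numpy as np

def chi2_exp(D, T, C):
    """Experimental chi^2 per data point, i.e. Eq. (6.2) with S set to zero."""
    r = D - T
    return r @ np.linalg.solve(C, r) / D.size

def phi_from_chi2(D, T_reps, C):
    """Eq. (6.3): replica-averaged chi^2 minus chi^2 of the averaged prediction."""
    mean_chi2 = np.mean([chi2_exp(D, T, C) for T in T_reps])
    central_chi2 = chi2_exp(D, T_reps.mean(axis=0), C)
    return np.sqrt(mean_chi2 - central_chi2)

def phi_from_covariance(T_reps, C):
    """Eq. (6.4): sum over i, j of C^{-1}_ij T_ij, written as a trace."""
    T_cov = np.cov(T_reps, rowvar=False, bias=True)   # <T_i T_j> - <T_i><T_j>
    return np.sqrt(np.trace(np.linalg.solve(C, T_cov)) / C.shape[0])

# Toy input (hypothetical): 4 data points, 50 replica predictions
rng = np.random.default_rng(2)
n_dat, n_rep = 4, 50
A = rng.normal(size=(n_dat, n_dat))
C = A @ A.T + n_dat * np.eye(n_dat)                   # positive-definite covariance
D = rng.normal(size=n_dat)
T_reps = D + 0.3 * rng.normal(size=(n_rep, n_dat))

phi_a = phi_from_chi2(D, T_reps, C)
phi_b = phi_from_covariance(T_reps, C)
print(phi_a, phi_b)   # the two definitions agree
```

The data \(D_i\) drop out of the difference in Eq. (6.3), which is why the second function does not need them.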

When changing the covariance matrix from C to \(C'=C+S\) the fluctuations of the replicas will change, according to Eq. (6.1), and if theoretical uncertainties change in the same proportion one would expect the value of \(\phi \) to become \(\phi '= r_\phi \phi \), with

$$\begin{aligned} r_\phi = \left( 1+{{1}\over {N_{\mathrm{dat}}}}\sum _{i,j=1}^{N_{\mathrm{dat}}}(C)^{-1}_{ij} S_{ij} \right) ^{1/2}. \end{aligned}$$
(6.5)

Thus, when including MHOU, all else being equal, we would expect PDF uncertainties to increase by a factor \(r_\phi \). This will provide us with a baseline to which we can compare the change in uncertainty which is actually observed.
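As a quick sketch of Eq. (6.5), with a hypothetical helper and toy matrices: if the theory covariance matrix were exactly three times the experimental one, the total variance would be four times larger and the expected uncertainty would double.

```python
import numpy as np

def expected_phi_ratio(C, S):
    """Eq. (6.5): expected ratio r_phi when C is replaced by C + S."""
    n_dat = C.shape[0]
    # sum over i, j of C^{-1}_ij S_ij equals trace(C^{-1} S) for symmetric matrices
    return np.sqrt(1.0 + np.trace(np.linalg.solve(C, S)) / n_dat)

# Toy input (hypothetical): three data points
C = 0.01 * np.eye(3)
r_none = expected_phi_ratio(C, np.zeros((3, 3)))   # no theory uncertainty
r_large = expected_phi_ratio(C, 3.0 * C)           # theory variance three times larger
print(r_none, r_large)
```

Values of \(r_\phi \) well above one, such as those quoted below, therefore indicate that the theory uncertainties are comparable to or larger than the experimental ones.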

All the PDF sets which have been produced and which will be discussed in this section are listed in Table 5. For each of the fits, we indicate its label, the input dataset, the perturbative order and the covariance matrix used.

For the fits that include a theory covariance matrix, we also indicate the prescription with which it has been constructed. In the remainder of this section we discuss the main features of these PDF sets.

Fig. 15

Comparison of DIS-only PDFs determined with and without MHOUs in the covariance matrix. The gluon (left) and quark singlet (right) are shown at \(Q=10\) GeV. The theory covariance matrix S has been constructed using the 9-point prescription. The central value of the NNLO determined without MHOU is also shown. All results are shown as a ratio to the central value of the set with theory covariance matrix not included. Note that the uncertainty band has a different meaning according to whether the theory covariance matrix is included or not: if not it is the standard PDF uncertainty coming from data, while if it is included, then it is the total uncertainty including the MHOU

6.1 DIS-only PDFs

We first discuss PDF sets based on DIS data only. Fit quality indicators are collected in Table 6. The theory covariance matrix is always constructed using the 9-point prescription. We show the value of \(\chi ^2/N_{\mathrm{dat}}\) and of the \(\phi \) estimator defined in Eqs. (6.2, 6.3) respectively. Results are shown for both the total dataset and for the individual DIS experiments of Table 2. Note that the total \(\chi ^2\) is no longer just the weighted sum of the individual \(\chi ^2\)s, because it now also includes correlations between experiments.

It is apparent from Table 6 that in all cases the \(\chi ^2\) improves when including the theory covariance matrix, both for individual experiments and for the total dataset. Specifically, the \(\chi ^2\) decreases by about 2-3% when including a theory covariance matrix \(S^{\mathrm{(9pt)}}\) evaluated with the 9-point prescription.

The value of \(\phi \) increases very substantially, suggesting a significant increase in the PDF uncertainty. The expected increase according to Eq. (6.5) is \(r_\phi =2.07\): NLO MHOUs in DIS are much larger than experimental uncertainties. The observed increase, by a factor of 2.17, is in good agreement with this expectation. It is interesting to observe that the NNLO value of \(\phi \) is actually also rather larger than the NLO value, though not quite so much larger, suggesting that at NNLO the MHOUs in DIS might still be quite large.

Next we compare PDFs: in Fig. 15 we compare the gluon and the total quark singlet PDF at \(Q=10\) GeV with and without MHOUs in the covariance matrix, determined using the 9-point prescription.

The NLO results are also compared with the central value of the NNLO fit based on the experimental covariance matrix only. Note that in these comparison plots the PDF uncertainty band is always computed using standard NNPDF methodology, i.e., as the standard deviation over the PDF replica sample. Therefore, this uncertainty band has a different meaning dependent on whether or not the theory covariance matrix is included: when it is not included, the band represents the conventional “PDF uncertainty”, reflecting the uncertainties from the data (and methodology), while when it is included, the band provides the combined “PDF” and MHO uncertainty.

The comparison shows that for PDFs which are strongly constrained by data, such as the quark singlet PDF for \(x\gtrsim 10^{-3}\), the uncertainty does not increase much upon inclusion of the theory covariance matrix, and sometimes it even decreases. However, for several PDFs, including the gluon PDF, which is only loosely constrained by the DIS data, the uncertainty increases substantially with MHOUs. This is of course consistent with the fact that, in the absence of stringent experimental constraints, an extra contribution to the covariance matrix will lead to increased uncertainties in the best fit.

6.2 Global PDFs

We now discuss PDFs determined from the global dataset presented in Sect. 5.1. Only NLO PDFs will be discussed here, with global NNLO PDFs left for future work. The \(\chi ^2\) values and \(\phi \) values are shown in Tables 7 and 8 respectively, both for the total dataset and for the individual processes of Table 2. In comparison to the DIS-only case of Table 6 we now also show results obtained using the 7-point and 3-point prescriptions, and also for the default 9-point prescription but where the results were obtained by including the theory covariance matrix either only in the \(\chi ^2\) definition Eq. (6.2), or only in the data generation Eq. (6.1), in order to understand better the two distinct effects. The baseline NLO and NNLO PDF sets (without theory covariance matrix) are identical in all respects to the NNPDF3.1 PDF sets [68], except for the somewhat different dataset as discussed in Sect. 5.1.

As in the case of the DIS-only fit, upon adding the MHOU we find a reduction of \(\chi ^2\) both for the global fit and for individual datasets. Specifically, the \(\chi ^2\) for the NLO global fit with theory covariance matrix computed with the 9-point prescription decreases by about 3%, and almost coincides with the NNLO \(\chi ^2\), suggesting that indeed the theory uncertainty is correctly accounting for the missing NNLO correction. The pattern at the level of individual datasets is more complex, due to a variety of reasons. In particular, consider the CMS Z \(p_T\) distribution, where a very significant decrease in \(\chi ^2\) is observed when going from NLO to NNLO, but not when adding the theory uncertainty to the NLO. This turns out to be due to a sizable uncorrelated uncertainty which must be added to the NNLO theory prediction in order to account for numerical instabilities (see the discussion of Fig. 6 in Ref. [68]).

On the other hand, the value of \(\phi \) now increases much less than expected: with our favorite 9-point prescription the increase is by about 30%, while the expected \(r_\phi =1.69\). This is an indication that by accounting for the missing NNLO terms, the inclusion of MHOUs resolves some of the tensions in the fit with only the experimental uncertainties, thus reducing the overall effect of the MHOUs. The NNLO fit also shows an increase, to 0.36, but the fact that this is already quite close to 0.41 perhaps suggests that the effect of adding MHOUs to the NNLO global fit will be relatively modest.

Comparing the different prescriptions, results are reasonably stable, even when comparing to the 3-point prescription which, as discussed in Sects. 4.3 and 5.3, spans a much smaller subspace of theory variations. However, the 9-point prescription appears to perform best in terms of \(\chi ^2\) quality with very little difference in \(\phi \), in agreement with the results of Sect. 5.4.

We finally turn to fits in which the theory covariance matrix is included either in the \(\chi ^2\) definition Eq. (6.2) but not in the data generation Eq. (6.1), or in the data generation Eq. (6.1) but not in the \(\chi ^2\) definition Eq. (6.2). In the former case, we expect the MHOUs to affect mostly the central value (since the relative weighting of different data points is altered during the fitting according to the relative size of their MHOUs), and to a lesser extent the uncertainties (since the data replicas only fluctuate according to the experimental uncertainties). The results show that indeed including the MHOU in the \(\chi ^2\) definition alone leads to a \(\chi ^2\) value which is very close to that found when the MHOU is fully included, consistent with the expectation that it is the inclusion of the theory covariance matrix in the \(\chi ^2\) which mostly drives the best fit, while the \(\phi \) value increases somewhat less. In the latter case, we expect to obtain increased uncertainties but a worse fit, since the data replica fluctuations are wider due to the MHOU, and this is not accounted for in the \(\chi ^2\). The results indeed show a significant deterioration of fit quality, as expected for an inconsistent fit: the \(\chi ^2\) goes up, and also the \(\phi \) value goes up dramatically, showing the increase in uncertainty due to the inclusion of MHOU in the sampling, now uncompensated by a rebalancing of the datasets through the inclusion of MHOU in the fit.
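The four configurations compared here can be sketched with a toy example. This is only an illustration under stated assumptions: that the data generation of Eq. (6.1) is a multivariate Gaussian sampling around the measured values, and that Eq. (6.2) is the usual quadratic form with a given covariance matrix. All names (`generate_replicas`, `chi2`, `configs`) and all numbers are hypothetical, and no actual fit is performed.

```python
import numpy as np

def generate_replicas(data, cov_sampling, n_rep, rng):
    """Draw pseudodata replicas fluctuating around `data` according to `cov_sampling`."""
    return rng.multivariate_normal(data, cov_sampling, size=n_rep)

def chi2(theory, data, cov_fit):
    """Quadratic form (D - T)^T cov_fit^{-1} (D - T)."""
    diff = np.asarray(data) - np.asarray(theory)
    return float(diff @ np.linalg.solve(cov_fit, diff))

# Toy inputs (hypothetical numbers, for illustration only).
data = np.array([1.00, 2.00, 3.00])
theory = np.array([1.10, 2.05, 2.90])
C = np.diag([0.01, 0.04, 0.09])        # experimental covariance matrix
beta = np.array([0.20, 0.10, 0.05])    # one fully correlated theory shift
S = np.outer(beta, beta)               # toy theory covariance matrix

# label -> (covariance used for sampling, covariance used in the chi^2)
configs = {
    "C": (C, C),
    "C+S": (C + S, C + S),
    "S_fit": (C, C + S),      # S in the chi^2 definition only
    "S_sampl": (C + S, C),    # S in the data generation only
}

rng = np.random.default_rng(0)
for label, (cov_sampling, cov_fit) in configs.items():
    replicas = generate_replicas(data, cov_sampling, n_rep=1000, rng=rng)
    spread = float(replicas.std(axis=0, ddof=1).mean())
    print(label, "mean replica spread:", round(spread, 3),
          "chi2:", round(chi2(theory, data, cov_fit), 3))
```

In this toy setup the replica spread is controlled only by the sampling covariance, while the \(\chi ^2\) for a fixed theory prediction is controlled only by the covariance used in the quadratic form, which is the separation of effects that the two partial fits are designed to probe.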

Table 7 The values of the \(\chi ^2/N_{\mathrm{dat}}\) in NLO global fits with the theory covariance matrix S, compared to the results based on including only the experimental covariance matrix C. Results are shown for the 9-, 7-, and 3-point prescriptions. For the 9-point prescription we also show results obtained including the theory covariance matrix in the \(\chi ^2\) definition Eq. (6.2) but not in the data generation Eq. (6.1) (marked \(S_{\mathrm{fit}}^{9pt}\)) and then in the data generation Eq. (6.1) but not in the \(\chi ^2\) definition Eq. (6.2) (marked \(S_{\mathrm{sampl}}^{9pt}\)). Values corresponding to the NNLO fit with experimental covariance matrix C only are also shown
Table 8 Same as Table 7, but for the values of the \(\phi \) estimator

We now move on to discuss the corresponding results at the PDF level, in analogy with the comparisons presented for the DIS-only fits in Fig. 15. Specifically, in Fig. 16 we show the results of the NLO fits based on C and \(C+S^\mathrm{(9pt)}\), as well as the central value of the NNLO fit based on C, for the gluon, the total quark singlet, the anti-down quark, and the total strangeness PDFs, all at \(Q=10\) GeV. We also show in Fig. 17 the same PDFs but at the scale \(Q=1.6\) GeV at which PDFs are parametrized.

Fig. 16

Same as Fig. 15 now for the NNPDF3.1 global fits. We show the results of the NLO fits based on C and \(C+S^{\mathrm{(9pt)}}\) normalized to the former, as well as the central value of the NNLO fit based on C, for the gluon, the total quark singlet, the anti-down quark, and the total strangeness PDFs, all at \(Q=10\) GeV

We find that in the data region the PDF uncertainty is only very moderately increased by the inclusion of the theory covariance matrix, while central values can shift significantly, by up to one sigma. This is consistent with the observation that the \(\phi \) values in Table 8 increase by only a moderate amount upon inclusion of the theory covariance matrix. This provides evidence that in the data region the inclusion of the theory covariance matrix resolves tensions which are otherwise present in the global dataset. In contrast, in regions where PDFs are only loosely constrained by the data, and in particular in the extrapolation regions, the PDF uncertainty increases significantly.

Fig. 17

Same as Fig. 16 but now with results shown at the scale \(Q=1.6\) GeV at which PDFs are parametrized

When comparing PDFs at the parametrization scale in Fig. 17, an especially interesting comparison is with respect to the central NNLO value: not only is this quite compatible with the uncertainty band, but there is now clear evidence that upon inclusion of the NLO MHOU the central best fit moves towards the correct NNLO result. Of course, this improved agreement of the best-fit NLO and NNLO PDFs is scale-dependent, since PDFs at NLO and NNLO evolve in different ways, and the scale at which NLO and NNLO become closest will depend on the scale of the data which dominate the determination of each PDF combination. However, the agreement is seen in Fig. 16 to persist by and large also at high scale. This is further evidence that indeed the theory covariance matrix has resolved tensions due to MHOs. This improved agreement of the central value of the NLO \(C+S^{(\mathrm 9pt)}\) with the NNLO C fits is non-trivial: for instance, inclusion of the theory covariance matrix leads to a suppression of the gluon at large x and an enhancement of strangeness, both of which are indeed also observed at NNLO.

Next, in Fig. 18 we compare PDFs obtained using different prescriptions. The corresponding relative PDF uncertainties are compared in Fig. 19. In agreement with what we saw for the \(\chi ^2\) and \(\phi \) values in Tables 7 and 8, results are quite stable with respect to the choice of prescription, though in the most extreme case of the 3-point prescription, where factorization scale variations are entirely uncorrelated between different processes, we observe somewhat smaller uncertainties, and a central value which is closer to that when the MHOU is not included.

Fig. 18

Same as Fig. 16 now comparing the results of the NNPDF3.1 global fits with the theory covariance matrix constructed according to the 3-, 7-, and 9-point prescriptions, normalized to the central value of the latter

Fig. 19

Same as Fig. 18, now showing relative PDF uncertainties, normalized to the central value of the baseline set. Note that the y-axes ranges are different for each PDF combination

Finally, in Fig. 20 we compare PDFs obtained including the theory covariance matrix only in the \(\chi ^2\) definition Eq. (6.2) but not in the data generation Eq. (6.1) and conversely. We see that when the theory covariance matrix is included in the replica generation but not in the \(\chi ^2\), uncertainties increase very significantly. This result is in agreement with the observation from Table 7 that in this case the fit quality significantly deteriorates, which is because the fit becomes inconsistent due to the \(\chi ^2\) not matching the wider fluctuations in the data. The effect is particularly visible for the quark distributions. On the other hand, including the theory covariance matrix only in the \(\chi ^2\) singles out the effect of the theory covariance matrix on central values, due to rebalancing of datapoints in the fit according to their relative MHOU. Indeed in this case the central value is very close to that obtained when including the MHOU in both data generation and fit. We also see that the change in uncertainties in the data region is now very small, consistent with Table 7. These results confirm our expectation that in the full fit, while the MHOU results in a substantial increase in the fluctuations of data replicas, this is compensated by a relaxation of tensions due to the inclusion of MHOU in the fit, with the net result that while central values shift, overall uncertainties do not increase much.

Fig. 20

Same as Fig. 16, now comparing the results of the baseline \(C+S^{(\mathrm 9pt)}\) fit with those in which the theory covariance matrix S is included either in the \(\chi ^2\) definition or in the generation of Monte Carlo replicas, but not in both

7 Implications for phenomenology

Whereas a full assessment of the impact of the inclusion of MHOU in PDFs will be possible only once we have global NNLO sets with MHOU, it is worth performing a first phenomenological investigation, by computing reference LHC standard candles with the NLO PDF sets which include MHOUs presented in Sect. 6, and comparing to results with the corresponding NLO PDF sets in which no MHOU is included.

In this section we will specifically consider Higgs boson production in gluon-fusion and in vector-boson fusion, top quark pair and Z and W electroweak gauge boson production. Note that the latter processes are among those which have been used for PDF determination, see Table 2. This raises the issue of possible double counting of uncertainties between the MHOU in the PDF and in the hard matrix element. This will be addressed in Sect. 8.1 below.

As discussed in Sect. 6, once the MHOU is included in the covariance matrix, the standard NNPDF methodology can be used, but with the PDF uncertainties now also including a theory-induced contribution. Specifically, PDF uncertainties (which now include the MHOU uncertainty) are obtained as standard deviations over the replica sample. The total uncertainty on a physical prediction is then obtained by combining this uncertainty with that on the hard cross-section for the given process. The latter is conventionally obtained as the envelope of a 7-point scale variation, see e.g. Ref. [18]. Of course, an alternative possibility is to compute the theory uncertainty on the hard cross-sections in exactly the same way as we compute it when performing PDF determination, i.e. using the theory covariance matrix. In this case, the MHOU on any measurement is found as the diagonal element of the covariance matrix, evaluated for the given measurement. Here we will compute the theory uncertainty both using the theory covariance matrix (with the 9-point prescription, given in Eq. (4.9)), and as a 7-point envelope. The MHOU uncertainty on the hard cross-section can then be combined with the total uncertainty on the PDF (which includes both MHOU and data uncertainties) in quadrature. A more detailed discussion of prescriptions for the computation of the total uncertainty on a physical observable, including explicit formulae, will be given in Sect. 8.1 below.

The current state of the art for precision phenomenology is NNLO, and thus NNLO PDFs would be needed for accurate predictions. However, as discussed in Sect. 6, at present only NLO global PDFs with MHOU are available. In principle, NNLO PDFs from a DIS only fit are also available. However, also as discussed in Sect. 6, some of these PDFs (specifically the gluon) are affected by large uncertainties due to the lack of experimental constraints. The comparison of PDFs with and without MHOU for such sets would thus be rather misleading. Therefore, in this section we will focus on NLO PDFs. It should of course be kept in mind that NNLO PDFs with MHOU are likely to have smaller uncertainties.

7.1 Higgs production

We first discuss Higgs production in gluon fusion (ggF) and in vector boson fusion (VBF). These two processes are of direct relevance for the characterization of the Higgs sector and are both currently known at N\(^3\)LO accuracy [81,82,83,84]. Note that the perturbative behavior and leading partonic channels for these processes are quite different. Higgs production in gluon fusion is driven by the gluon-gluon luminosity and its perturbative expansion converges slowly, with manifest convergence reached only at N\(^3\)LO. Vector boson fusion is driven by the quark-antiquark luminosity and it exhibits fast perturbative convergence.

In Table 9 we present predictions for Higgs production in gluon fusion at the LHC for \(\sqrt{s}=13\) TeV. We perform the calculation at NLO, NNLO and N\(^3\)LO in the rescaled effective theory approximation using ggHiggs [85,86,87,88,89,90] with \(\mu _f=\mu _r=m_H/2\) as central scale, with the NLO global sets obtained in this paper, with and without MHOUs, as input PDFs at all orders. The results are displayed graphically in Fig. 21, where, for the NNLO computation, we also show the central value found using NNLO PDFs.

We find that for all perturbative orders the central values obtained with PDFs with and without MHOU are very similar, while the PDF uncertainty is about 50% larger when MHOU are included in the PDF fit. This can be understood by noticing that for the intermediate values of the momentum fraction, \(x\simeq 10^{-2}\), relevant for Higgs production in gluon fusion, the PDF uncertainty of the gluon is increased in the \(C+S^{(\mathrm 9pt)}\) fit as compared to the C-only fits, see Fig. 16. Comparison to the result obtained using NNLO PDFs (for the NNLO computation) shows that upon inclusion of the MHOU the PDF uncertainty band of the result with NLO PDFs now includes the NNLO PDF result, while it would not in the absence of MHOU, both because of the (small) shift in central value and of the widening of the uncertainty band.

From Table 9 one can also observe that the MHOU on the hard matrix element, \(\sigma _{\mathcal {F}}^{\mathrm{th}}\), evaluated using the 9-point theory covariance matrix, Eq. (4.9), is compatible with the canonical 7-point envelope if the latter is symmetrized by taking the maximum value between the lower and upper uncertainties. In particular, the theory covariance matrix estimate is slightly larger than the envelope prescription at NLO and at NNLO, while it becomes a little smaller at N\(^3\)LO. Even so, the NLO uncertainty band does not contain the NNLO central value, which lies just above the edge of the band.
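The symmetrized envelope referred to above can be sketched as follows. This is a minimal illustration, assuming the conventional 7-point set in which the factorization and renormalization scales are each varied by a factor of two around the central choice, with the two opposite-direction combinations excluded. The function name and the cross-section values are hypothetical and are not taken from Table 9.

```python
# (k_f, k_r) multipliers of the central factorization and renormalization scales
# in the conventional 7-point prescription (assumed here).
SEVEN_POINT = [
    (1.0, 1.0),
    (2.0, 2.0), (0.5, 0.5),
    (2.0, 1.0), (0.5, 1.0),
    (1.0, 2.0), (1.0, 0.5),
]

def symmetrized_envelope(xsec_by_scale):
    """Return max(upper, lower), where upper and lower are the distances of the
    envelope edges from the central-scale cross-section."""
    central = xsec_by_scale[(1.0, 1.0)]
    values = [xsec_by_scale[point] for point in SEVEN_POINT]
    upper = max(values) - central
    lower = central - min(values)
    return max(upper, lower)

# Hypothetical cross-sections (arbitrary units) at each scale choice.
toy_xsec = {
    (1.0, 1.0): 40.0,
    (2.0, 2.0): 38.0, (0.5, 0.5): 42.0,
    (2.0, 1.0): 39.5, (0.5, 1.0): 40.5,
    (1.0, 2.0): 37.5, (1.0, 0.5): 41.5,
}
print(symmetrized_envelope(toy_xsec))
```

Here the upper and lower deviations are 2.0 and 2.5 respectively, so the symmetrized uncertainty is the larger of the two.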

We conclude that using NLO PDFs in the N\(^3\)LO calculation, the inclusion of MHOU in the PDFs translates into a few per-mille increase of the PDF uncertainty at the cross-section level. In Ref. [82] NNLO PDFs were used with the N\(^3\)LO calculation in order to provide a state-of-the-art result, and an MHOU on the NNLO PDF was estimated based on the difference between results obtained using NLO and NNLO PDFs. Once NNLO PDFs with MHOUs determined within our approach are available it will be interesting to compare our results with this estimate.

Table 9 The total cross-sections for Higgs production in gluon fusion (in pb) obtained by using NLO global PDFs based on either C or \(C+S^{\mathrm{(9pt)}}\), see Table 5. We quote the central prediction, the total PDF uncertainty (first) and the MHOU uncertainty on the hard cross-section (second) expressed as a percentage of the central value. The latter is evaluated either using the theory covariance matrix (9-point prescription) or, in parentheses, as a (symmetrized) envelope of the 7-point scale variations (see Sect. 8.1), obtained by taking the maximum value between the lower and upper uncertainties
Table 10 Same as Table 9, now for Higgs production in vector boson fusion
Fig. 21

Graphical representation of the results of Tables 9 and 10. At each perturbative order the pair of uncertainty bands on the left (blue) is computed with PDFs based on the experimental covariance matrix C, while the pair of uncertainty bands on the right (red) is computed with PDFs based on the combined experimental and theoretical covariance matrix \(C+S\) (9-point prescription). The light-shaded bands represent the uncertainty on the hard cross-section (“scale uncertainty”) evaluated using the theory covariance matrix (see text); the dark bands represent the PDF uncertainty. For the NNLO result, we also show the central value obtained using NNLO PDFs as a dashed horizontal line

We now turn to Higgs production in vector boson fusion. We perform the calculation at N\(^3\)LO accuracy using proVBFH-inclusive [84, 91] with central factorization and renormalization scales set equal to the squared four-momentum of the vector boson. Results are collected in Table 10 and shown in Fig. 21. The MHOU corrections to the PDFs are very small, so PDF uncertainties with or without theory covariance matrix are very similar. Also in this case, like for gluon fusion, the uncertainty on the hard matrix element computed with the 9-point theory covariance matrix is similar to the one obtained by symmetrizing the 7-point envelope.

The smallness of the MHOU in the PDF follows from the fact that VBF Higgs production is driven by the quark-antiquark luminosity, which in turn is dominated by the quark PDF in the data region, whose uncertainties, as we have seen in Sect. 6.2, are almost unaffected by the inclusion of MHOU. Comparison to the result obtained using NNLO PDFs (for the NNLO computation) shows that the NNLO PDF result is at the edge of the PDF uncertainty band of the result with NLO PDFs if MHOU are included, while it is off by almost two \(\sigma \) if they are not. This is essentially due to the significant shift in central value, in agreement with the observation made in Sect. 6.2, where we noticed that MHOUs have the effect of moving the central value of the PDFs in the data region towards the NNLO result. The shift in the central value of the VBF cross-section due to the MHOU is in fact quite significant: its size is comparable to the MHOU \(\mathop {\sigma _{\mathcal {F}}}\nolimits ^{\mathrm{th}}\) on the NLO matrix element, and indeed to the shift when going from NLO to NNLO matrix elements, and thus much larger than the corresponding N\(^3\)LO correction.

We conclude that for VBF the main effect of including the MHOU in the PDF is a significant shift in the central value of the prediction. Also in this case estimates of the MHOU on the NNLO PDF were presented in Ref. [84], and it will be interesting to compare them to our approach once NNLO PDFs with MHOU determined within our approach are available.

A common feature of gluon fusion and vector-boson fusion is that it is only upon inclusion of the MHOU that the result found using NNLO PDFs is within or at the edge of the PDF uncertainty band of the result found with NLO PDFs.

7.2 Top quark pair production

We now study the impact of the PDF-related MHOU on the total top-quark pair production cross-section at the LHC for different center-of-mass energies. In Table 11 we collect, using the same format as Table 9, the predictions for the top-quark pair-production cross-sections at \(\sqrt{s}=7\), 8 and 13 TeV obtained using the top++ code [92] and setting the central scales to \(\mu _f=\mu _r=m_t=172.5\) GeV. The results in the case of 8 and 13 TeV are also displayed in Fig. 22, where again at NNLO we also show the result obtained using NNLO PDFs.

Table 11 Same as Table 9, now for top-quark pair-production at \(\sqrt{s}=7,8\) and 13 TeV
Fig. 22

Same as Fig. 21 for top-quark pair production at 8 and 13 TeV, see also Table 11

Just as in the case of Higgs production via gluon-gluon fusion, we find that for top-quark pair production the central values obtained with PDFs with and without MHOU are rather similar, and well within the one-\(\sigma \) PDF uncertainty. We also observe that the PDF uncertainty at \(\sqrt{s}=7\) and 8 TeV (13 TeV) is about 50% (20%) larger once MHOU are included in the determination of the PDFs. This is again compatible with the corresponding behavior of the gluon PDF shown in Fig. 16, where it can be observed that, for \(x\simeq 0.1\), relevant for top pair production at \(\sqrt{s}=7\) and 8 TeV, the PDF uncertainty is increased in the \(C+S^{\mathrm{(9pt)}}\) fit compared to the C-only fit, while this increase is less marked for \(x\sim 0.3\), relevant for top pair production at \(\sqrt{s}=13\) TeV. Also in this case, the NNLO prediction using NLO PDFs is in better agreement with that using NNLO PDFs once MHOUs are included, and in fact only in this case is the latter within the PDF error band of the former.

In addition, we note once again that the uncertainty on the hard cross-section \({\sigma _{\mathcal {F}}}^{\mathrm{th}}\) evaluated using the 9-point covariance matrix is rather similar to that obtained from the symmetrized 7-point envelope. In particular, the 9-point result is slightly larger (smaller) than the 7-point envelope at NNLO (NLO). Finally, from Fig. 22 we notice that for this process the MHOU on the hard cross-section dominates the PDF uncertainty (with or without MHOU included), even with NLO PDFs.

7.3 Z and W gauge boson production

We finally turn to gauge boson production, for which we obtain predictions using the computational framework Matrix [93]. In this formalism, all tree-level and one-loop amplitudes are obtained from OpenLoops [94,95,96]. For these theoretical predictions for inclusive W and Z production cross sections at \(\sqrt{s}=13\) TeV, we adopt realistic kinematic cuts similar to those applied by ATLAS and CMS. The fiducial phase space for the \(W^{\pm }\) cross-section is defined by requiring \(p_{l,T}\ge 25\) GeV and \(|\eta _{l}| \le 2.5\) for the charged-lepton transverse momentum and pseudorapidity, and a missing energy from the neutrino of \(p_{\nu ,T}\ge 25\) GeV. In the case of Z production, we require \(p_{l,T}\ge 25\) GeV and \(|\eta _l|\le 2.5\) for the charged-lepton transverse momentum and pseudorapidity, and \(66 \le m_{ll} \le 116\) GeV for the di-lepton invariant mass.
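The fiducial selections just described can be written down as simple predicates. This is only a sketch of the cuts as stated in the text, assuming that the pseudorapidity requirement is on the absolute value for both W and Z. The function names are hypothetical and are unrelated to the Matrix interface. Momenta and masses are in GeV.

```python
PT_MIN = 25.0                   # minimum transverse momentum, GeV
ETA_MAX = 2.5                   # maximum |pseudorapidity| of charged leptons
MLL_MIN, MLL_MAX = 66.0, 116.0  # di-lepton invariant mass window, GeV

def passes_w_cuts(pt_lep, eta_lep, pt_nu):
    """W selection: one charged lepton plus missing transverse momentum."""
    return pt_lep >= PT_MIN and abs(eta_lep) <= ETA_MAX and pt_nu >= PT_MIN

def passes_z_cuts(leptons, m_ll):
    """Z selection: `leptons` is a sequence of (pt, eta) pairs for the two charged leptons."""
    leptons_ok = all(pt >= PT_MIN and abs(eta) <= ETA_MAX for pt, eta in leptons)
    return leptons_ok and MLL_MIN <= m_ll <= MLL_MAX

# Hypothetical kinematics, for illustration only.
print(passes_w_cuts(pt_lep=32.0, eta_lep=-1.7, pt_nu=28.0))
print(passes_z_cuts([(40.0, 0.3), (27.0, -2.6)], m_ll=91.0))
```

The first toy event satisfies all W requirements, while the second fails the Z selection because one lepton lies outside the pseudorapidity acceptance.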

In Table 12 we display a similar comparison as in Table 9 now for W and Z gauge boson production at \(\sqrt{s}=13\) TeV. The corresponding graphical representation of the results is provided in Fig. 23, again using the same conventions as in Fig. 21 and again also showing the NNLO result with NNLO PDFs.

We find that when including the MHOU the PDF uncertainty is increased by \(\simeq 70\%, 30\%\) and \(75\%\) for Z, \(W^+\), and \(W^-\) production respectively. Given that W and Z production at ATLAS and CMS at \(\sqrt{s}=13\) TeV is sensitive to the light sea quarks down to \(x\simeq 10^{-3}\), this increase in the PDF uncertainty once MHOU are accounted for is consistent with the corresponding increase reported in the case of the singlet PDF in Fig. 19.

Table 12 Same as Table 9, now for W and Z gauge boson production at \(\sqrt{s}=13\) TeV. The cross-section is given in nb
Fig. 23

Same as Fig. 21 for \(W^\pm \) and Z gauge boson production at \(\sqrt{s}=13\) TeV, see also Table 12

Similarly to Higgs production in vector-boson-fusion, we find that the inclusion of MHOU in the PDF shifts the central value of the prediction, by an amount which is comparable to or larger than the data-driven PDF uncertainty. Yet again, the agreement of the NNLO prediction with NLO PDFs with that which is obtained when NNLO PDFs are used is significantly improved: for Z production within the PDF error band and for W production just barely outside it. We conclude that for weak gauge boson production at the LHC the impact of the MHOU associated with the PDFs is twofold: on the one hand an overall increase in the PDF uncertainties that ranges between 30 and 70% depending on the process, and on the other hand a shift in the central values which is comparable in size to the PDF uncertainties of the fit without MHOU.

8 Usage and delivery

As mentioned previously, the PDF sets with MHOU presented in Sect. 6 can be used in essentially the same way as the standard NNPDF sets. In this section we discuss how MHOUs included in PDF sets should be combined with those in hard matrix elements, specifically addressing some conceptual issues, and we then provide detailed instructions for their use. We then discuss the delivery of the PDF sets presented in this work, and provide a list of the sets which are being made publicly available by means of the LHAPDF interface.

8.1 Combining MHOUs in PDFs and hard matrix elements

As discussed in the introduction, the MHOU on PDFs discussed in this paper arises due to the fact that PDFs are determined using perturbative computations performed at a finite order in the perturbative expansion, and it manifests itself in the fact that PDFs change when varying the order at which they are determined: NLO and NNLO PDFs differ. We have further seen in Sect. 3 that there exist two distinct sources of MHOU in the PDF: that related to MHOs in the computation of the hard cross-sections for those processes used for PDF determination, and that coming from MHOs in the anomalous dimensions. These two sources of MHOU in the PDFs are respectively associated with renormalization and factorization scale variation and can be treated as independent of each other, at least with the definition given here and summarized in Table 1.

On top of this MHOU on the PDF, when computing a factorized prediction for a PDF-dependent hard process not used in the determination of the PDFs, but rather predicted using a given PDF set, there is then the usual MHOU on the hard process itself. This, in turn, just like the MHOU on the PDF, comes from two separate sources: the MHOU on the hard cross-section for the given process, and the MHOU on the evolution of the PDF from the initial scale to the scale of the process. This has been seen explicitly in the phenomenological results presented in Sect. 7, Tables 9, 10, 11, 12 and Figs. 21, 22, 23. So each prediction carries two uncertainties, a PDF uncertainty, which includes the MHOU in the determination of the PDFs (shown as a dark band in the plots, and given as the first uncertainty in the tables), and a “scale” uncertainty in the prediction (shown as a light band in the plots, and given as the second uncertainty in the tables). Note that in all these plots and tables the PDF uncertainty (when including the theory covariance matrix) includes both the MHOU, and the standard PDF uncertainty due to the uncertainties in the experimental data, while the “scale” uncertainty is just the usual MHOU in the prediction.

In summary, a factorized prediction is affected by two different sources of MHOU: the MHOU in the PDF determination, included in the PDF uncertainty, and then the MHOU in the calculation of the prediction itself. Each in turn receives contributions from both renormalization and factorization scale variation. This immediately raises the question as to whether some of these uncertainties are correlated, and – if this is the case – whether this correlation can be easily accounted for.

A first obvious source of correlation arises when producing a prediction for a process which is among those included for the PDF determination. Examples of this category of processes are top quark pair and gauge boson production, discussed in Sect. 7. They are already included among the processes of Table 2. The MHOU coming from renormalization scale variation is then correlated. Indeed, we know from Fig. 7 that any two predictions for the same physical process are highly correlated, particularly at points which are kinematically close. One might choose to ignore this problem, on the grounds that the main purpose of PDF determinations is to predict new processes, such as Higgs production, or BSM processes: after all, if there is new data for an existing process, it can be included in the PDF fit, and then all correlations would be retained. However this (partial) solution is not available for factorization scale variations, which are used to estimate the MHOU in the evolution between different scales: since the PDFs are universal, these MHOUs are correlated across all processes, both within the fit and also in any predictions made subsequently using the PDFs.

The existence of correlations between MHOU in the fitted process and MHOU in the predicted process can be demonstrated rather clearly [15] by noting that PDFs are merely a tool to express a physical observable in terms of other physical observables. In particular QCD predicts the cross-section for one observable in terms of measurements of cross-sections for the same or other observables. Normally to do this one first extracts the PDF from the cross-section data at a range of scales, and then computes cross-sections at some other scale using the extracted PDFs. However in the case of nonsinglet structure functions (discussed in Ref. [15] as a simple paradigm), where the relation between structure function and PDF is straightforward and linear, one can eliminate the nonsinglet PDF altogether: given the structure function at one scale, QCD then predicts the structure function at a different scale, with no reference to any PDF.

Now, it is clear that when expressing one process in terms of another process directly, without any PDFs, there is a significant cancellation of MHOU, specifically that related to perturbative evolution, estimated by means of factorization scale variation. In the example of the nonsinglet structure function, if the structure function at one scale is predicted from its value at some different scale, the factorization scale uncertainty will only depend on the evolution between the two scales involved. Hence there is only one source of MHOU in the prediction. On the other hand, when using a PDF, there are, as explained above, two sources of MHOU estimated through factorization scale variation: that from evolving the initial PDF up to the scale of the data used in the fit, and that from evolving the initial PDF up to the scale of the prediction. Hence, one has in effect two sources of MHOU, and if these are assumed to be uncorrelated, and thus added in quadrature, any cancellations are lost and the result will inevitably be an over-estimate of the uncertainty.

If PDFs are to be delivered in the usual way as a universal (i.e. process independent) PDF set, much of the detailed information about the specific data, their uncertainties, and the theoretical calculations, and in particular their MHOUs that have gone into determining the PDFs is lost: all that remains are the process independent PDFs. Given only the PDFs, it is clearly impossible to reconstruct the original data, or the MHOUs specific to calculations at each data point, since many different data sets, from different processes, can yield the same PDFs. Consequently, when using PDFs to make a prediction, the correlation between the MHOU in the prediction and that in the calculations used to determine the PDFs cannot be computed, even in principle: with only the universal PDFs as input, the correlation is no longer available. The loss of this correlation is the inevitable price to pay for PDF universality.

Having understood that neglecting such correlations is inevitable, at least without extending the range of deliverables, one may ask how serious the issue is. The total MHOU in the determination of the PDF arises from the combination of the MHOU of theoretical predictions made for a large number of datapoints. The correlations between all these are automatically taken into account by the fitting procedure. Inevitably the fit adjusts to take the MHOU into account: datapoints associated with large MHOU (compared to the experimental uncertainty) will be deweighted in the fit, while the effect of data with small MHOU (compared to their experimental uncertainty) will be relatively unchanged. This rebalancing of the fit is one of the main consequences of including the MHOU.

Hence, as we saw in our global fit results, the MHOUs have only a relatively small impact on the overall PDF uncertainty: rather by resolving tensions in the fit due to MHOs in the theoretical predictions, they lead to significant shifts in the central value. However when making a prediction, the uncertainty due to MHOU in the hard process can be large: in fact in many cases as large or even larger than the total PDF uncertainty (including its MHOU). Neglecting the correlation between the MHOU in the prediction (which might be large) and the MHOU in the PDFs (which is relatively small) by adding them in quadrature is then likely to be a small effect. Note that this does not mean that the MHOU on the PDF was negligible in the first place: and indeed as we have seen it may significantly affect the central value of the prediction. Rather, it is its effect on the overall PDF uncertainty which, at least in the data region that we are discussing here, is relatively small. Furthermore, because what is being neglected is a correlation which would lead to a cancellation of uncertainties, it can at worst lead to a small overestimate of uncertainties.
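As a schematic illustration of this argument (a toy parametrization, not a formula used elsewhere in this paper), one may describe the neglected correlation through an assumed effective coefficient \(\rho \), with \(-1\le \rho \le 0\) since the correlation is expected to produce a cancellation. Writing \(\sigma ^{\mathrm{PDF}}_{\mathcal {F}}\) for the total PDF uncertainty and \(\sigma ^{\mathrm{th}}_{\mathcal {F}}\) for the MHOU on the hard cross-section, the combined uncertainty would then read

$$\begin{aligned} \left( \sigma ^{\mathrm{tot}}_{\mathcal {F}}\right) ^2 = \left( \sigma ^{\mathrm{PDF}}_{\mathcal {F}}\right) ^2 + \left( \sigma ^{\mathrm{th}}_{\mathcal {F}}\right) ^2 + 2\rho \, \sigma ^{\mathrm{PDF}}_{\mathcal {F}}\, \sigma ^{\mathrm{th}}_{\mathcal {F}}, \end{aligned}$$

so that combination in quadrature corresponds to setting \(\rho =0\). For \(\sigma ^{\mathrm{PDF}}_{\mathcal {F}}\ll \sigma ^{\mathrm{th}}_{\mathcal {F}}\), expanding to first order gives \(\sigma ^{\mathrm{tot}}_{\mathcal {F}}\simeq \sigma ^{\mathrm{th}}_{\mathcal {F}}+\rho \, \sigma ^{\mathrm{PDF}}_{\mathcal {F}}\), whereas the quadrature result is \(\simeq \sigma ^{\mathrm{th}}_{\mathcal {F}}\) at the same order. Within this toy parametrization the overestimate incurred by setting \(\rho =0\) is therefore of order \(|\rho |\, \sigma ^{\mathrm{PDF}}_{\mathcal {F}}\), i.e. at most of the size of the smaller of the two uncertainties.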

We conclude that, while there is clearly a correlation between the MHOU in the determination of the PDFs and the MHOU of the hard matrix element of the predicted process, ignoring this correlation, and thus adding the two sources of MHOU in quadrature, will give a result which is at worst a little conservative. Given all the well known uncertainties intrinsic to the estimate of MHOUs through scale variation, we consider such an approach both pragmatic and realistic.

8.2 Computation of the total uncertainty

Having concluded that uncorrelated combination of the MHOU on the PDF and on the hard matrix element is justified, we summarize our procedure for computing uncertainties in practice.

To begin with, the PDF uncertainty \(\sigma ^{\mathrm{PDF}}_{\mathcal {F}}\) associated with a given cross-section \(\mathcal {F}\) is evaluated as usual in the NNPDF methodology as the standard deviation over the replica set:

$$\begin{aligned} \sigma ^{\mathrm{PDF}}_{\mathcal {F}} = \left( {{1}\over {N_{\mathrm{rep}}-1}} \sum _{k=1}^{N_{\mathrm{rep}}} \left( \mathcal {F} [ \{ q^{(k)} \}] - \left\langle \mathcal {F} [ \{ q \}] \right\rangle \right) ^2 \right) ^{1/2}. \end{aligned}$$
(8.1)
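As a minimal sketch of Eq. (8.1), the following Python snippet computes the standard deviation of a cross-section over a set of replica predictions, using the \(1/(N_{\mathrm{rep}}-1)\) normalization. The function name `pdf_uncertainty` and the input numbers are hypothetical and serve only as an illustration; in practice the list would hold the cross-section evaluated with each PDF replica.

```python
import math


def pdf_uncertainty(replica_predictions):
    """Standard deviation over replicas, normalized by N_rep - 1 as in Eq. (8.1)."""
    n_rep = len(replica_predictions)
    if n_rep < 2:
        raise ValueError("at least two replicas are needed")
    # Central value: average of the prediction over the replica set.
    mean = sum(replica_predictions) / n_rep
    variance = sum((f - mean) ** 2 for f in replica_predictions) / (n_rep - 1)
    return math.sqrt(variance)


# Hypothetical cross-section values (arbitrary units), one per replica.
example_replicas = [10.0, 10.4, 9.6, 10.2, 9.8]
sigma_pdf = pdf_uncertainty(example_replicas)
print(f"sigma_PDF = {sigma_pdf:.4f}")
```

The same quantity is returned by `statistics.stdev` from the Python standard library; the explicit form is shown here only to mirror the equation term by term.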

If this prescription is applied to a PDF set with “standard” PDF uncertainty (such as the published NNPDF3.1 set [5]), the resulting uncertainty only includes the correlated statistical and systematic uncertainties from the data, and the methodological uncertainty intrinsic to any PDF fit. If the PDF sets including MHOU presented in Sect. 5 of this paper are used instead, the resulting uncertainty obtained from Eq. (8.1) accounts for both the data-driven uncertainty and the MHOU on the PDF, with all correlations taken into account.

Because the MHOU on the hard matrix element is treated as uncorrelated to the PDF uncertainty, it can in principle be computed with any prescription preferred by the end-user. A commonly used prescription is 7-point scale variation [18]. Our preferred prescription is instead to use the same methodology as used for the computation of the theory covariance matrix. In this case, the uncertainty on the cross-section \(\mathcal {F}\) is simply the square root of the corresponding diagonal element of the theory covariance matrix, namely

$$\begin{aligned} {\sigma _{\mathcal {F}}}^{\mathrm{th}} = \left[ S^{(\mathrm{9pt})}_{{\mathcal {F}}{\mathcal {F}}}\right] ^{1/2}, \end{aligned}$$
(8.2)

where \(S^{(\mathrm{9pt})}_{{\mathcal {F}}{\mathcal {F}}}\) is evaluated using our default 9-point prescription defined by Eq. (4.9), with \(\Delta _{ij}\) computed for \(i=j=\mathcal {F}\), i.e. the theory prediction for the given observable. We showed in Sect. 7 that for various standard candles our 9-point theory covariance matrix prescription and the 7-point envelope prescription give very similar results, provided the envelope prescription is symmetrized.
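The evaluation of Eq. (8.2) for a single observable can be sketched as follows. The sketch rests on an assumption that should be checked against Eq. (4.9): for a single process, the diagonal element is taken to be the average of the squared shifts with respect to the central prediction over the eight non-central scale choices. The names `NINE_POINT` and `theory_uncertainty`, and the input format (a dictionary keyed by the scale ratios), are hypothetical.

```python
import math

# The eight non-central (k_f, k_r) combinations of a 9-point scale variation.
NINE_POINT = [
    (kf, kr)
    for kf in (0.5, 1.0, 2.0)
    for kr in (0.5, 1.0, 2.0)
    if (kf, kr) != (1.0, 1.0)
]


def theory_uncertainty(predictions):
    """Square root of the diagonal theory covariance element for one observable.

    predictions maps (k_f, k_r) to the cross-section computed with those
    scale ratios and must contain the central point (1.0, 1.0).
    """
    central = predictions[(1.0, 1.0)]
    shifts = [predictions[point] - central for point in NINE_POINT]
    # ASSUMPTION: normalization by the number of variations (8); the exact
    # normalization is the one given by Eq. (4.9).
    s_ff = sum(delta * delta for delta in shifts) / len(shifts)
    return math.sqrt(s_ff)
```

With this input format the function can be called on the nine scale-varied predictions of any observable; the PDF set is kept fixed while the scales in the hard matrix element are varied.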

The PDF uncertainty Eq. (8.1) and the uncertainty on the hard matrix element Eq. (8.2) can then be treated as uncorrelated uncertainties. It is then appropriate to combine them in quadrature, so the total uncertainty on the cross-section \(\mathcal {F}\) is simply

$$\begin{aligned} \sigma _{\mathcal {F}}^{\mathrm{tot}} = \left( \left( {\sigma _{\mathcal {F}}}^{\mathrm{th}} \right) ^2 + \left( \sigma ^{\mathrm{PDF}}_{\mathcal {F}} \right) ^2\right) ^{1/2} \, . \end{aligned}$$
(8.3)
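Eq. (8.3) amounts to a single line of code. The sketch below, with the hypothetical function name `total_uncertainty` and made-up percentage inputs, also illustrates how weakly the total depends on the smaller of the two contributions.

```python
import math


def total_uncertainty(sigma_th, sigma_pdf):
    """Combine the two uncertainties in quadrature, as in Eq. (8.3)."""
    return math.hypot(sigma_th, sigma_pdf)


# Hypothetical relative uncertainties, in percent.
sigma_th_example = 3.0
sigma_pdf_example = 1.0
sigma_tot_example = total_uncertainty(sigma_th_example, sigma_pdf_example)
print(f"sigma_tot = {sigma_tot_example:.2f}%")
```

For these inputs the total is \(\sqrt{10}\approx 3.16\), only slightly above the larger of the two contributions.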

We believe that this prescription provides a conservative estimate of the combined MHOU on the predicted cross-section.

Note that when using a \(\chi ^2\) to assess the quality of the agreement between experimental data and the associated theory predictions for a PDF set which includes MHOUs, the MHOU must always be included in the definition of the \(\chi ^2\) estimator, ideally (though not necessarily) by means of the theory covariance matrix. This is because, as seen in Sect. 6.2, the inclusion of MHOU modifies the best-fit central value, and thus if the MHOU were not included in the \(\chi ^2\), these PDFs would not provide the best fit, and the results might be misleading. Because the theory covariance matrix has been included in the fitting (based on the argument of Sect. 2) as uncorrelated to the experimental covariance matrix, when assessing fit quality it should be regarded as an additional systematic uncertainty, specific to the determination of PDFs from the data, to be added in quadrature to the usual experimental systematics.
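A minimal sketch of such a \(\chi ^2\) is given below, assuming the estimator has the form \((D-T)^T (C+S)^{-1} (D-T)\), with \(C\) the experimental and \(S\) the theory covariance matrix added as uncorrelated contributions. The function names and the two-point example are hypothetical, and refinements of the actual figure of merit used in the fits are not reproduced here. A small Gaussian elimination routine is included so that the example needs only the Python standard library.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("singular covariance matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def chi2(data, theory, cov_exp, cov_th):
    """chi^2 with the theory covariance matrix added to the experimental one."""
    n = len(data)
    residual = [d - t for d, t in zip(data, theory)]
    total_cov = [[cov_exp[i][j] + cov_th[i][j] for j in range(n)] for i in range(n)]
    weighted = solve(total_cov, residual)
    return sum(r * w for r, w in zip(residual, weighted))


# Hypothetical two-point example with a fully correlated theory uncertainty.
data_example = [10.0, 20.0]
theory_example = [9.0, 19.0]
cov_exp_example = [[1.0, 0.0], [0.0, 1.0]]
cov_th_example = [[1.0, 1.0], [1.0, 1.0]]
zero_th = [[0.0, 0.0], [0.0, 0.0]]
print(chi2(data_example, theory_example, cov_exp_example, cov_th_example))
print(chi2(data_example, theory_example, cov_exp_example, zero_th))
```

In this example both datapoints lie above the prediction by the same amount, a pattern that the fully correlated theory uncertainty can absorb, so the \(\chi ^2\) drops from 2 without the theory covariance matrix to 2/3 with it.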

8.3 Delivery

The variants of the NNPDF3.1 NLO global sets presented in this work are publicly available in the LHAPDF format [22] from the NNPDF website:

http://nnpdf.mi.infn.it/nnpdf3-1th/

In the following, we list the PDF sets that are made available. The NLO sets based on the theory covariance matrix are:

NNPDF31_nlo_as_0118_scalecov_9pt

NNPDF31_nlo_as_0118_scalecov_7pt

NNPDF31_nlo_as_0118_scalecov_3pt

which correspond to the fits based on Eq. (6.2) in the cases in which the theory covariance matrix \(S_{ij}\) has been evaluated with the 9-, 7-, and 3-point prescriptions, respectively.

We have also constructed NLO PDF sets based on scale-varied theories, to be discussed in Appendix B below. These are determined using Eq. (B.1), and they are

NNPDF31_nlo_as_0118_kF_1_kR_1

NNPDF31_nlo_as_0118_kF_2_kR_2

NNPDF31_nlo_as_0118_kF_0p5_kR_0p5

NNPDF31_nlo_as_0118_kF_2_kR_1

NNPDF31_nlo_as_0118_kF_1_kR_2

NNPDF31_nlo_as_0118_kF_0p5_kR_1

NNPDF31_nlo_as_0118_kF_1_kR_0p5

where the naming convention indicates the values of the scale ratios \(k_f\) and \(k_r\). Note that the NNPDF31_nlo_as_0118_kF_1_kR_1 set is also the baseline (central scales and experimental covariance matrix only) to be used in the comparisons with the fits based on the theory covariance matrix listed above. Finally, we also provide the set

NNPDF31_nnlo_as_0118_kF_1_kR_1

which corresponds to the NNLO fit with central scales and experimental covariance matrix only, that has been produced for validation purposes.
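The naming convention of the scale-varied sets can be decoded programmatically. The helper below is a hypothetical convenience function, not part of LHAPDF or of the NNPDF code; it assumes only that the suffix has the form `_kF_<value>_kR_<value>`, with `p` standing for the decimal point, as in the set names listed above.

```python
import re

# Matches the trailing "_kF_<value>_kR_<value>" of a scale-varied set name.
_SCALE_PATTERN = re.compile(r"_kF_(?P<kf>\d+(?:p\d+)?)_kR_(?P<kr>\d+(?:p\d+)?)$")


def scale_ratios(set_name):
    """Return (k_f, k_r) as floats for a scale-varied PDF set name."""
    match = _SCALE_PATTERN.search(set_name)
    if match is None:
        raise ValueError(f"no scale ratios found in set name: {set_name}")

    def to_float(token):
        # "0p5" encodes 0.5; integers such as "2" are converted directly.
        return float(token.replace("p", "."))

    return to_float(match["kf"]), to_float(match["kr"])


print(scale_ratios("NNPDF31_nlo_as_0118_kF_0p5_kR_1"))
```

Names of the sets based on the theory covariance matrix, such as `NNPDF31_nlo_as_0118_scalecov_9pt`, carry no scale ratios and make the function raise a `ValueError`.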

It is important to bear in mind that the variants of the NNPDF3.1 fits presented in this work are based on a somewhat different dataset to that used in the default NNPDF3.1 analysis. Therefore, when using these sets it is important to be consistent: for example by comparing fits with and without MHOU that are based on a common input dataset.

In addition to the sets listed above, the other PDF sets presented in this paper, such as the DIS-only fits based on scale-varied calculations and on the theory covariance matrix, are available from the authors upon request.

9 Summary and outlook

In this work we have presented the first PDF determination that includes MHOU as part of the PDF uncertainty. This is in principle required for consistency, given that MHOU are routinely part of the theoretical predictions for hadron collider processes, and likely to become a requirement for precision collider phenomenology as other sources of uncertainties decrease.

The bulk of our work amounted to establishing a general language and formalism for the inclusion of MHOU when multiple processes are considered at once in the global PDF fit, constructing prescriptions for estimating these MHOU by means of scale variation, and for validating them in cases in which the higher order corrections are known. The formalism presented here is sufficiently flexible that it can also be applied to different sources of theoretical uncertainty, such as nuclear corrections or higher twists, and could also be used in conjunction with alternative ways of estimating MHOU, such as for example the Cacciari–Houdeau method.

The validation studies presented here suggest however that the conventional scale variation method to estimate the MHOU works remarkably well. Indeed, when coupled to the theory covariance matrix formalism that we introduced, this method turns out to be free of the instabilities that plague envelope techniques, and it leads to results which appear to be reasonably stable and thus insensitive to the arbitrary choices that are inherent to its implementation. The reason for these properties is essentially that, within a covariance matrix approach, possible directions which do not correspond to actual MHO have no impact on the fitting.

Our results however also suggest that even more realistic estimates of MHOU might be obtained through more complex patterns of scale variation than those considered here. Specifically, a more refined treatment of factorization scale variation is likely to be advantageous, in which independent variation is performed for each eigenvalue of the anomalous dimension matrix. Also, it might be advantageous to vary independently the renormalization scales in different partonic sub-channels. Indeed, we have observed that the validation of our estimate of MHOU, while always reasonably successful for the datasets considered here, deteriorates as the size of the dataset increases, which suggests that more complex structures might be required. Here we have performed a first investigation, and the exploration of these more complex patterns of scale variation will be left for future work.

On the phenomenological side, our results show that at least at NLO the main effect of the inclusion of MHOU in PDF determination is to improve the accuracy of the result, while not significantly reducing its precision. Indeed, whenever experimental information is abundant, in particular for a global dataset, we have found that the total PDF uncertainty is only moderately affected by the inclusion of MHOU – in fact, for the datapoints included in PDF determination it even decreases – but the central value moves closer to the true result. Moreover, the fit quality improves, thereby showing that the main effect of the inclusion of MHOU is in reducing tensions between datasets due to imperfections in their theoretical description.

The most interesting future phenomenological development will be of course the extension of our methodology to the determination of MHOU in a state-of-the-art global NNLO PDF set. It will be interesting to assess to what extent the behaviour observed at NLO persists there. More generally, the inclusion of MHOU at NNLO is expected to lead to the most precise and accurate PDF sets that can be determined with currently available theoretical and experimental information.