On the Consistent Use of Scale Variations in PDF Fits and Predictions

We present an investigation of the theoretical uncertainties in parton distribution functions (PDFs) due to missing higher-order corrections in the perturbative predictions used in the fit, and their relationship to the uncertainties in subsequent predictions made using the PDFs. We consider in particular the standard approach of factorization and renormalization scale variation, and derive general results for the consistent application of these at the PDF fit stage. To do this, we use the fact that a PDF fit may be recast in a physical basis, where the PDFs themselves are bypassed entirely, and one instead relates measured observables to predicted ones. In the case of factorization scale variation we find that in various situations there is a high degree of effective correlation between the variation in the fit and in predicted observables. In particular, including such a variation in both cases can lead to an exaggerated theoretical uncertainty. More generally, a careful treatment of this correlation appears mandatory, at least within the standard scale variation paradigm. For the renormalization scale, the situation is less straightforward, but again we highlight the potential for correlations between related processes in the fit and predictions to enter at the same level as between processes in the fit or prediction alone.


Introduction
The history of the determination of parton distribution functions (PDFs) from comparison to data goes back many decades, see [1] for a recent review. For some years the precision in both the data and theory was such that no systematic uncertainty estimate on the PDF at all was warranted or required. If some estimate of uncertainty were needed, then a comparison of different PDFs from different groups, or using different assumptions for one group, were thought to be sufficient. The situation changed in the first years of the new millennium. This was largely driven by the very precise measurements of structure function data over a wide range of both x and Q 2 by the HERA experiment (see [2] for the final Run I + II combination). In addition, various apparent observed excesses over the Standard Model predictions, such as in high E ⊥ inclusive jet production at CDF [3] were subsequently explained by a suitable modification of the PDFs [4], rather than being due to new physics. A systematic evaluation of PDF uncertainties therefore became essential, and the first global PDFs with an estimate of the uncertainty due to the experimental precision were released by CTEQ [5,6] and MRST [7], building on earlier DIS-only fits [8][9][10][11].
In these first fits PDF uncertainties were a few percent at best, and much larger for many PDF flavours and x regions. Soon after this the full calculation of the next-to-next-to leading order (NNLO) splitting functions for the evolution of PDFs were presented in full [12,13] and NNLO extractions of PDFs became possible, as in e.g. [14,15] (with minor approximations for some data sets). At this point it was assumed that the theoretical precision on the PDFs was rather better than the uncertainties from the data, as well as being more difficult to quantify. Hence PDF uncertainties were always interpreted as being an experimental uncertainty due to the statistical and systematic uncertainties of the data included in the fit. In recent years the understanding of the experimental uncertainties on PDFs has improved, and as well as the Hessian-based approach of the original PDF fitters an alternative approach based on neural networks and generation of statistically distributed replicas of PDFs has reached full maturity, as implemented in the NNPDF sets (see [16] for the most recent fit). Good agreement between the two approaches, both for central values and uncertainties is seen [17]. Indeed PDFs can be combined and those generated via the Hessian procedure can be converted to replicas and vice versa [18,19].
Any systematic consideration of theoretical uncertainties in PDF fits was limited to variations due to changes in the strong coupling α S (M 2 Z ), quark masses and sometimes possible higher twist terms. In addition to this, the convergence of different variants of variable flavour number schemes has been demonstrated [20], and some agreement on the influence of use of variable flavour as opposed to fixed flavour schemes, and on fitting rather than imposing cuts for higher twist effects has been demonstrated [21][22][23]. However, these omit a potentially significant source of uncertainty, namely due to the fact that fixed-order perturbative predictions are used for the theoretical input to PDF fits. The uncertainty due to the approximate form of these, namely due to missing higher order (MHO) corrections, has until very recently not been considered at all, even on a semi-quantitative basis. We should note that introduction of a tolerance in PDF fits, such as the dynamic tolerance procedure for ∆χ 2 determination introduced in [24], is largely introduced to take account of tension between data sets in the fit, as justified in for example [18]. However, some of the apparent tension between data sets in a fixed order fit may in fact be due to these missing higher order corrections. Such an effect is clearly seen in a LO PDF fit, for example, where the NLO corrections to DIS and Drell Yan cross sections are very different. Hence some part of the tolerance can likely be attributed to this, although in a NNLO fit this is probably a small component, and it is certainly challenging to quantify this precisely. Nonetheless, it would be interesting to pursue a quantitative study of the degree to which observed tensions between data sets in the fit reduce as additional higher order corrections and/or uncertainties due to the missing higher orders are included.
A more focussed study of the theoretical uncertainties in PDFs has just begun in some quarters, and preliminary results have been presented in [25]. This is based on a variation of factorization and renormalization scales by a fixed factor of two in the theory input for the fit, a method that is frequently taken as the standard means of estimating MHO corrections in QCD. The purpose of this article is to critically examine a potentially important and quite general issue with taking such an approach, based on the fact that the PDFs are not themselves physical quantities. In particular, we argue that this type of straightforward scale variation does not provide a particularly obvious definition of what one means by 'theoretical uncertainty' for PDFs.
In more detail, in practice one obtains the PDFs by fitting to data for one physical quantity (or more generally a set of them), and then predicts another physical quantity from these, using perturbatively calculated partonic cross sections for these quantities. Ultimately it is the uncertainty on the predicted quantity that is required. This clearly has a contribution from the experimental uncertainty on the PDFs, and this is included as standard. There is then also a theoretical uncertainty on the prediction arising from the finite order of the calculation for both the predicted cross section and for the cross sections entering the PDF extraction. The former of these is included as standard (normally using scale variations) while the latter is not. However, we will argue in this article that for factorisation scale variation there is a highly non-trivial interplay between the scale variation when obtaining the PDF and when obtaining the required prediction, and this can potentially lead to a misinterpretation of the 'theory uncertainty' on the prediction. In short, the aim of the process is to measure one physical quantity, and in terms of this predict another quantity. If, as seems natural, one interprets the 'theory uncertainty' as that inherent in expressing the predicted quantity in terms of the measured quantity due to MHOs in the relationship between the two, then we will show that varying the factorisation scale by a set amount in both the PDF extraction and also in the prediction in terms of PDFs can lead to an effectively exaggerated factorisation scale variation when determining the full theoretical uncertainty. Our arguments rely on the fact that it is possible, and sometimes preferable (in principle at least), to bypass the intermediate PDFs entirely, instead working purely at the level of physical observables (structure functions and so on) and the relationships between them. This was behind the original proposal of the DIS factorisation scale [26], and has subsequently been developed in for example [27][28][29][30][31][32] under the general name of 'physical schemes'.
We also consider the case of the renormalization scale variation, which we find to be less amenable to this treatment, implying that the conventional approach should be reliable here. Nonetheless, one basic implication of working in this physical basis where one considers a PDF fit to be a (complex) relation between physical observables does follow in this case. Namely, any correlation between renormalization scale variations that one assumes is present in related physical processes entering the fit, should also in principle be included at the same level between fit and predicted processes.
The outline of this paper is as follows. In Section 2 we consider the simplest possible case of fitting to a non-singlet structure function observable, and then predicting a second such structure function, and a non-singlet Drell-Yan cross section, in Sections 2.1 and 2.2, respectively. In Section 3 we generalise this to the case of structure functions involving both quark and gluon contributions, and comment on the corresponding high and low x limits. In Section 4 we discus the case of renormalization scale variation. In Section 5 we discuss the implications of our findings and conclude. In Appendix A we briefly summarise the case of DGLAP evolution in the diagonal basis at NLO. To illustrate the key issue, we start with the simplified case of a fit to a non-singlet structure function, and corresponding prediction of a second (distinct) non-singlet structure function. This represents the simplest example of our general argument, and having done this we will show how the case of a predicted hadronic observables, namely the non-singlet Drell-Yan cross section, follows straightforwardly.
The structure function we consider is given purely in terms of the quark non-singlet distribution. A concrete example of this is the neutral current structure function F NC 3 due to Z exchange, but we will for the sake of generality refer to the observable as F NS . We write this to NLO as whereα S ≡ α S /2π, µ is the factorization scale, C (n) q is the order n coefficient function, and we consider for simplicity only one quark flavour. Note that for now we assume a fixed renormalization scale, the argument of which is suppressed for simplicity, and do not consider any issues related to its variation 1 . To keep the expressions which follow simpler we have also taken the leading order coefficient function as C (0) q = 1, which can always be achieved by a suitable redefinition of the normalisation of F NS . We use the shorthand throughout. The non-singlet quark combination q NS = q − q obeys the usual NLO DGLAP evolution We now consider an idealised PDF fit of q NS to the structure function, that is we assume that F NS has been measured to arbitrary accuracy over the x region we are interested in. We are free to do this as we are only considering the impact of theoretical uncertainties on the fit, and the inclusion of the (unrelated) experimental sources of uncertainty will not qualitatively effect the argument which follows. Indeed, we are precisely most interested in the case where the former dominates over the latter. Defining the ratio a i = µ 2 /Q 2 , we can rewrite (1) as where in the second line we consistently drop terms of O(α 2 S ). Note a 'standard' fit, i.e. where one does not consider any scale variation and takes the conventional choice of µ 2 = Q 2 , simply corresponds to while for example a standard factor of 2 scale variation about the central scale µ = Q would correspond to taking a i ∈ ( 1 4 , 4). We now use this to predict a second, distinct, non-singlet structure function, F ′ NS : where we have defined a f = µ 2 /Q 2 . We now consider the standard fit approach, that is using (6) to express the quark distribution in terms of the structure function, giving Thus we can rewrite our prediction so that no reference is made to the intermediate PDFs, and instead we express one observable quantity, F ′ NS , in terms of the another, F NS . This expression is accurate to O(α S ) and following the standard rule of thumb approach we can then evaluate the theoretical uncertainty from MHOs by performing a scale variation with a f ∈ ( 1 4 , 4). We now consider the case that the scale is allowed to vary in the fit as well, which simply corresponds to keeping the a i dependence as in (5). We find where we have defined a f i ≡ a f /a i . Thus the a f and a i dependence is entirely contained in the ratio a f /a i , such that (9) is identical to (8) upon the replacement a f → a f i . Note that really at this stage the distinction between a f i and a f is entirely artificial. There is only one physical relation between the two structure functions where a corresponds to the relative difference in (squared) scale at which we evaluate F NS and F ′ NS . In other words, we can see that the effect of varying the scale in the fit (5) is the same as the previous approach, but with a larger range of variation, a = a f i ∈ ( 1 16 , 16). How should we interpret this result? The rule of thumb variation is applied to a broad category of observables, under the expectation that this will provide an estimate of the MHO uncertainty. Concretely, one varies the logarithms in a within a reasonable range, in order to track of the decreasing dependence on these with increasing perturbative order, but nonetheless keeping the argument a to be O(1) in order to avoid spoiling the overall perturbative convergence. The precise choice of a ∈ ( 1 4 , 4) above is of course arbitrary, but is nonetheless guided by these principles.
We have seen in the above scenario that the intermediate PDFs themselves can be bypassed entirely in favour of a straightforward and arguably more fundamental relation between the physical observables F NS and F ′ NS . This simply reflects the fact that the PDFs are not themselves observables, and follows in a similar way to the physical factorization approach discussed elsewhere [27][28][29]32]. In terms of this relation, there is only one degree of freedom for scale variation, namely a in (10). Within the context of the standard rule of thumb variation, the only reasonable and consistent choice appears to be to take (10) and vary a ∈ ( 1 4 , 4). Now of course from a practical point of view one will not in general work explicitly in this physical framework, but rather in terms of the PDFs. The aim should therefore be to be consistent with the above results when doing so and evaluating a theoretical uncertainty on the PDFs themselves. We have seen above that in our example one should either vary the factorization scale in the prediction by the canonical factor of 2, or equivalently in the fit, but not in both. One may clearly call into question the reliability of such simple scale variations, but nonetheless at least under the assumption that this a ∈ ( 1 4 , 4) variation provides an accurate estimate of the theoretical uncertainty for general observables, this result will hold.
While the above result is demonstrated at NLO for simplicity, this remains true at arbitrary order. This becomes clearest when we work in Mellin space, where we can write the DGLAP evolution (3) in the simple form where j denotes the Mellin moment. Then, our expression for F NS , at a scale Q, can be written in the form where we define a i = µ 2 /Q 2 as before. In the above expression, and in what follows, it is understood that the result at any arbitrary order in perturbation theory is given by expanding this out to the desired order in α S . Here correspond to the non-singlet qq anomalous dimension and Mellin transform of the coefficient function at order n, respectively. Note that we have defined f (0) q ≡ 1 in the structure function case above, but we leave the expression completely general here. Now if we consider the second structure function at the same scale Q, we have and so expressing F ′ NS (j, Q 2 ) in terms of F NS (j, Q 2 ) we obtain the very simple result We can see that there is now no explicit dependence on the factorization scale, µ, at all. The above situation is however defined for the quite specific case that we wish to relate the two observables at exactly the same scale. More generally, we have the freedom to express , that is choose to express the predicted quantity a different physical scale, Q f , to the scale at which we express the measured quantity, Q i . In this case we have Hence, we see that to any arbitrary order the relationship between the prediction and the measurement can be expressed as a ratio of scales, and that the relationship between the factorization scale chosen when fitting the PDF to that chosen when making the prediction becomes equivalent to the relationship between the scale at which one predicts the physical quantity and that at which one evaluates the physical quantity entering the fit. To be completely explicit, we can suggestively define Q 2 ≡ Q 2 f and µ 2 ≡ Q 2 i , in terms of which we have where F NS obeys the same DGLAP evolution equation (11) as q NS , up to terms in the ratio of the coefficient function which depend on the running of the coupling; indeed, if we more correctly re-introduce the scale dependence of α S into the above results, these would follow in precisely the same way, up to this difference in evolution. Thus, we can express this at any arbitrary order, as is done in (1) at NLO, and we have an analogous freedom to vary the scale µ at which one evaluates F NS as in the case of the standard factorization scale. However, as we have only one ratio of scales involved in this relation, there is only one such degree of freedom, and not the two implied by varying the factorization scale independently in the fit and prediction. Finally, we note that while it might seem most direct in the above expression to choose the scale of F NS (µ = Q i ) equal to the scale at which the measurement is made, this is, of course, not mandatory. If the scale at which one structure function is fit is significantly different to that at which the second is to be predicted (Q = Q f ) it would normally be assumed to be more sensible to express the measured quantity at a scale similar to the predicted quantity, relying on the validity of the evolution equation and avoiding obvious large logarithms in the expression relating the two physical quantities.

Drell-Yan Cross Section
As a second simple example of the above argument, and to demonstrate how the above result can readily be generalised to the case of hadronic observables, we can use the same fit as above to predict the non-singlet Drell-Yan production cross section, i.e. q N S (x 1 )q N S (x 2 ) → l + l − . We can write this at NLO as where Q 2 is the dilepton invariant mass and τ = Q 2 /s. As before, for simplicity we normalise our cross section so that the LO coefficient function is unity. Defining a f = µ 2 /Q 2 , to the order we are calculating we can rewrite this as .
Proceeding as before, and expressing the quark distribution in terms of the non-singlet structure function, we have where a = a f for the case that we vary the scale only in the prediction, and a = a f i if we vary in both cases. In other words, the argument follows through in exactly the same way. Exactly as in the previous example, we note that the above results can be more simply expressed in Mellin space. In particular, taking the Mellin transform with respect to τ the DY cross section simply becomes dσ DY NS (j, Q 2 ) dQ 2 = q NS (j, µ 2 )aα sγqq (j,αs) where the anomalous dimension can be calculated at any arbitrary order, and the '· · · ' indicate these higher order contributions to the coefficient function.
3 A more general example: the quark singlet and gluon

Set-up
The above examples considered the special case of observables given in terms of a single nonsinglet quark distribution. This leads to a simple and transparent result, but it is not immediately clear how it will generalise to the case that includes both quark and gluon partons, which obey the fully coupled DGLAP equation.
For the sake of generality and demonstration, we consider a PDF fit to a pair of arbitrary 'structure function' observables F and H, which are then used to predict a third such observable, K. The generalisation to the case of hadronic observables would render the corresponding analysis a great deal more complex in practice, but in principle should not change the basic argument. We will also work in Mellin space, as this will simplify the calculation, although all the results which follow hold analogously in x space as well. We write Thus, we assume that these observables depend on the gluon (g) and total quark singlet (Σ q ) PDFs only. In other words, any dependence on non-singlet quark combinations, which would be introduced by e.g. including a quark flavour-dependence (due typically to the quark EW charges), is omitted to limit the observables we need to consider, although the set-up can readily be generalised to include these. We will drop the Mellin moment argument j for brevity in what follows. We can write the coefficient functions at NLO as and similarly for H. Note that the q ↔ g mixing introduces a corresponding mixing in the coefficients f q,g of the expansions of the F q,g , and similarly for H q,g . This simplifies if we instead use the basis of eigenvectors of the DGLAP equation, which we denote Σ ± . In terms of these we can write at arbitrary order. Here the diagonal anomalous dimensions γ ± and the PDF eigenvectors are given explicitly in Appendix A. These then define the coefficients F ± and H ± above, which are given in terms of F q and F g , and similarly for H. We now consider a PDF fit to these observables, for which we take the factorization scales µ 2 = a f Q 2 and µ 2 = a h Q 2 . In this case we have (see (46) which we can then invert to give Substituting these into the third structure function K at scale µ 2 = a k Q 2 , at NLO we find where it is understood that the fixed-order coefficients, H ± and so on, should also be expanded out to the appropriate order, but we omit this for clarity. For comparison, if we do not vary the scales in the fit, i.e. we take a f = a h = 1, then we have Thus we can immediately see that the situation is somewhat more complex then in the simplified purely non-singlet case considered in the previous section. In particular, our general expression (30) contains the two ratios a f,h /a k corresponding to the arguments of the F, H, similar to the non-singlet case we considered before. However in addition we can see that the result includes a contribution that depends on the ratio a h /a f , which is purely due to the scale variation in the fit stage, and is completely absent in the prediction without this variation. We note that while the above expression is written in terms of three ratios, only two of these are independent, exactly as we would expect following the discussion towards the end of Section 2.1. In particular, we are now expressing one predicted physical quantity defined at one scale in terms of two measured physical quantities, each of which may be evaluated at a different scale. There are therefore two independent scale ratios, and two physically meaningful ratios of factorization scales. On the other hand, while the ratio a h /a f can be written in terms of the independent ratios a f,h /a k , we can see that this results in a mixing of these two ratios which does not immediately reduce to the simple situation we had in the case of the non-singlet structure functions. We note that in [25] it was advocated that, as all physical quantities share common PDFs, the factorization scale should be varied in a fully correlated way across all processes entering the fit. In the above analysis, we can see that this corresponds to taking a f = a h , and we are left with (31), after replacing a k → a k /a f = a f /a h . In other words, our situation and conclusions are exactly the same as for the simpler non-singlet case, i.e. variations of factorization scale in the predictions are entirely equivalent to those in the fitting and vice versa. We fully expect this to hold in the more general case appropriate to a global fit, as here too we will only have one independent ratio of scales for any given predicted process. Thus, if one makes this assumption, one could bypass the complication of including these variations at the fit stage entirely and simply include them in the prediction, with the assumption of full correlations implying that this should be done in the same way for different predictions at the same time. On the other hand, varying the factorization scale in both the fit and prediction would be a type of double counting, i.e varying the scale by a factor of two more than may naively be expected.
However, as discussed further below, in general this appears to be an overly constraining assumption, given there is the question of the central choice of scale to consider and, potentially more significantly, the fact that the partons and x range probed by the fit processes can be rather different. With this in mind, how do we interpret our above result if we do not make this simplifying assumption of fully correlated factorization scales for quantities in the PDF fit? We will first consider this result in various kinematic limits, before discussing the more general implications.

The low and high x limits
One can simplify the full result (30) by for example assuming that F − = 0 and H + = 0, in other words that F, H are only sensitive to the contribution from either the negative or positive eigenvectors. In this case we have and the terms proportional to the ratio a f /a h vanish. As an aside, if in addition the prediction is only sensitive to one eigenvector, then this will reduce to a single ratio, in precisely the same way we saw for the non-singlet distribution.
While at first glance it may appear somewhat arbitrary to consider these limits, in fact in the low and high x regions this can be precisely the situation we find ourselves in. As discussed further in Appendix A, if we take the high x limit, we have the well known result that that is the quark distribution (∼ Σ q , q N S ) and gluon are independent eigenvectors of the DGLAP evolution. In the alternative low x (j ∼ 1) limit we have for sufficiently high scale µ. That is, the positive eigenvector is dominant. These regimes play a direct role in PDF phenomenology at the LHC and elsewhere. For example, a topical case is the high x gluon, which is relatively poorly determined, and in which there is currently a great deal of interest in placing further constraints. This typically involves the use of LHC observables such as inclusive jet and tt production, and the Z boson p ⊥ distribution, for which the high x gluon plays a dominant role. Although a global fit includes of course a wider dataset, the extracted high x gluon will to a significant extent be driven by these. One can then take the result of this fit and predict the gluon-initiated production of e.g. a high mass BSM object. In such a scenario the gluon evolution will be effectively decoupled and both the fit and predicted observables will be dominated by the positive eigenvector Σ + . In other words, we are in an analogous situation to Section 2, where the factorization scales for the fit and prediction are fully correlated, and varying the factorization scale in both will lead to and effective double counting of the theoretical uncertainty. The corresponding situation for the high x quark, where both the singlet and non-singlet are decoupled from the gluon, is similar.
At low x, as we increase the scale of the observed process we find that the quark and gluon contribution are completely correlated by evolution, and only the positive eigenvector contributes. This is equally true for observables such as scaling violations of the structure function, dF 2 /d ln Q 2 , which only depends on the positive eigenvector at low x, for all scales. Thus, for fit processes such as a dF 2 /d ln Q 2 and predicted processes such as Drell-Yan production at the LHC (in particular in the lower mass region), we will expect a large degree of correlation.
We note that in a realistic PDF fit we will in general include multiple observables at the fit stage which may be dominated by a particular PDF eigenvector. This will therefore introduce a common set of factorization scale dependent logarithms in to the corresponding predictions, or equivalently the scale evolution of these observables will be same up to running coupling effects. It therefore seems natural in such a case to vary the factorization scale about some fixed central value for each data set in a correlated way across these observables, either in the fit or prediction stage (but not both). However, the choice of central/best fit is not obvious, e.g. µ = Q 2 may be more appropriate for DIS data and µ = M/2 may be more appropriate for Drell Yan production. Further to this, while one might argue that if one varies the scale for one type of structure function from a central value of µ = Q to µ = 2Q, then one does it for all structure functions, it does not seem so clear that e.g. for some jet data related observable one should simultaneously vary the scale from a central value of e.g. µ = p T to µ = 2p T 2 . Such scale allocations are therefore not in any clear sense correlated, and certainly the degree of variation in the cross section when applying the rule of thumb variation of the scale will depend on the central scale, the precise 'preferred' value of which is not necessarily clear. We do not advocate strictly fitting the best scale for each type of process, but advise that some note, based on experience, should probably be taken to use central scales that provide good fits for a given process (or at the very least, to avoid those known not to be optimal).
As described above, even for processes that depend dominantly on the same eigenvector, the correlation of the scale variation between these between processes is not entirely trivial. However, if two quantities (either fit and predicted, or both fit) are dominated by alternative PDF eigenvectors and therefore completely independent, then clearly the variation of scales, in either fit or prediction, is not correlated. In this case imposing correlation between scale variations in these processes can result in artificial correlations between the predicted processes (depending on how they depend on the initial PDFs), or in the context of a fit, the PDFs themselves. In general, most predictions will depend on combinations of fit PDFs where full correlation is, to a lesser or greater degree, an overly restrictive assumption.

Renormalization scale Variation
In this section we will discuss how the issues presented above for factorization scale variation apply to the separate question of renormalization scale variation. We define a physical quantity beginning at O(α S ): where µ i is the renormalization scale. We first consider this to be the quantity we measure in order to determine the value of the coupling. With this, we wish to predict the second quantity As with the PDFs and structure functions one can invert (35) to obtaiñ where as usual a i = µ 2 i /Q 2 and we drop terms of O(α 2 S ); note that while the physical quantities begin at O(α S ), the relation between them is one order lower, and is accurate to O(α S ). For B(Q 2 ), taking a f = µ 2 f /Q 2 and substituting in our expression for A(Q 2 ), we get Hence, we can indeed derive a final result which is expressed only in terms of a single ratio a f i = a f /a i , as in the structure function case.
In the above example we have used our initial physical quantity in order to determine the coupling constant, not the PDFs. This is really the most natural thing to to when thinking about renormalization scale variation, since the scale is that used for the definition of the coupling, while the factorization scale is that used for the definition of the PDFs. However, in a PDF fit one instead uses the physical quantity A(Q 2 ) to determine the PDFs. To see how this changes the result, we now assume our toy observables depend on a single PDF q (the non-singlet quark, say, although this is not essential). Implicitly we work in Mellin space to avoid complications with convolutions, but as before this does not change the basic argument. We consider the case of a fixed factorization scale µ 2 F = Q 2 , while setting µ 2 i = a i Q 2 for the renormalization scale. We have In terms of this the PDF can be written as Inserting this into the expression for B(Q 2 ), we obtain We see that now the expression is certainly not just a function of the ratio of scales µ 2 f /µ 2 = a f i . Let us examine the explicit consequence of this. For example, in our earlier case of factorisation scale variation, the choice of µ 2 f = µ 2 i i.e. a f i = 1 resulted in no change in the expression for the prediction compared to choosing both a i = a f = 1. In fact for the first term in (41) this equivalent result appears, and similarly for the sum of the second and third terms. However, for the fourth and fifth terms, i.e. those dependent on the coefficients of the scale-independent parts of the NLO expressions for A(Q 2 ) and B(Q 2 ), this is not the case. In this limit each becomes proportional toα s (µ 2 f ), but depends on the absolute value of the scale. This term depends on the difference in the relative size of these NLO corrections (compared to the LO contributions to each quantity), so the violation of the dependence on ratios is violated by the scale independent NLO corrections. If we consider other types of scale variation, e.g. multiplying µ f by 2 but dividing µ i by 2 we see that even though the effect in the combination of the first, second and third terms is close to effect of either multiplying µ f by 4 or dividing µ i by 4, it is not identical, and the discrepancy is larger in the fourth and fifth terms.
The fact that the expression of the predicted physical quantity in terms of the measured physical quantity does not break down into an expression depending on the ratio of the renormalization scales used for each calculation is a consequence of the fact that the renormalization scale is fundamentally associated to the scale of the coupling, but here we do not directly relate the physical quantities to the coupling constant, but to the PDF. It is also the case that different physical quantities depend on the coupling in different ways, i.e. the perturbative order starts at zeroth, first or second order for very standard quantities (and higher order for more exclusive quantities). Here we have given perhaps the simplest example of two quantities which each start at first order. However, the common input in PDF fits of the F 2,3 structure functions starts at zeroth order, so at lowest order has no renormalization scale dependence in the hard cross section. The renormalization scale dependence of F 2 will therefore be suppressed by a power of α S relative to the case of e.g. top-pair production in hadron-hadron scattering, which begins at O(α 2 S ). In contrast, all cross sections are linear in the PDF of any of the hadrons participating in the scattering.
Finally, we can also consider the case of two related physical observables. This could be for example, jet production at ATLAS and CMS, or more generally W and Z boson production, for which the LO results are of course uncorrelated in normalization, but the effect of higher-order QCD corrections is similar. Considering the latter example, for our toy observables above we would have and so (41) can be written as Now to maintain consistency with our requirement that this ratio should be approximately constant under inclusion of higher-order QCD corrections, we can see that we must take µ f = µ i (a f = a i ), i.e. vary the renormalization scale in the fit and prediction in a correlated way. It is of course a well-known procedure to vary QCD renormalization scales in such a way when predicting this type of ratio (of e.g. the W to Z boson cross sections 3 ). In [25] this argument is extended to hold between processes at the fit stage. It is perhaps not particularly surprising to find an equivalent requirement between the fit and prediction here, and clearly the inclusion of this in a global fit would be intractable. Nonetheless, we can see that this correlation enters in principle at the same level as that between processes entering the fit, and so the question of whether it is necessary or sensible to include one without the other requires further investigation. Certainly, the relative importance of the correlation between processes in the fit stage and between the fit and prediction will in general depend on the specific data sets being considered.

Summary and Conclusions
In this paper we have discussed the inclusion of theoretical uncertainties in PDFs due to missing higher-order terms in the pQCD results for the processes entering the fit. Such uncertainties, while routinely included in the predictions, have previously not been explicitly included in the PDF fit itself. We are now firmly in the high precision LHC era, both in terms of the available data for PDF fitting and the standard for phenomenology which applies these PDFs. Therefore, such an approach may be increasingly called into question, and certainly requires careful consideration. As a first step towards this, we have considered the standard approach to evaluating MHO uncertainties, namely due to QCD factorization and renormalization scale variation around a central value by some set factor. Focussing on the case of the factorization scale, we have in particular shown that if we take this standard criterion seriously and apply it consistently to both a PDF fit and the predicted observables resulting from that fit, then in general there is a strong overlap between the variation in the fit and prediction stage. To demonstrate this, we have considered in Section 2 the simplest possible case of a fit to a non-singlet structure function, before generalising to include coupled quark and gluon contributions in Section 3. We have shown how the explicit dependence on the PDFs can be removed entirely, and the outcome of the fit recast instead in terms of observable quantities only. We have then found that written in this way, scale variation in the fit corresponds to precisely a scale variation in the prediction in certain regimes, in particular at low or high enough x. Our results have relied on the basic fact that it is possible, and sometimes preferable, to bypass the intermediate PDFs entirely, instead working purely at the level of physical observables (structure functions and so on) and the relationships between them. This idea of working in such a 'physical basis' is in fact quite an old one; here we simply derive the implications of this for scale variation uncertainties in PDF fits.
We have also briefly considered the case of renormalization scale variation, finding that the situation is not as straightforward. This is unsurprising, given the quite different roles that the the QCD coupling and PDFs play in fits. However, one basic implication of working in a physical basis is that the motivation for including correlated renormalization scale variations between related processes in the fit or prediction state, is equally present between processes entering both the fit and prediction. While including such correlations in a realistic global fit would certainly in practice be impossible, clearly this raises questions if for example one wishes to include such correlations at the fit stage. Now, the true situation in a global fit is certainly significantly more complicated than the examples we have considered explicitly in this paper. Here, we fit a very wide array of structure function and hadron collider data, sensitive to a range of different (and overlapping) x values and scales. Indeed, while in the simple non-singlet case of Section 2 we find a complete overlap between the fit and prediction, we have seen that in the somewhat more general (although still simplified) scenario of Section 3 the situation is not as straightforward. Nonetheless, as mentioned above in certain (e.g. low and high x) regimes the same conclusion holds, and more generally these considerations serve as clear guidance for the case of a genuine PDF fit. In particular, a naive variation of factorization scales in both the PDF fit and prediction will certainly correspond to a degree of overestimation in the total theoretical error, and should be avoided. On the other hand, the considerations of Section 3 also suggest that variation of the overall factorization scale in the prediction alone does not capture the full degree of uncertainty due to MHOs in the problem. This suggests that the correct approach, maintaining generality while avoiding overlap in the regimes where it may occur, would be to consider scale variations only in the fit and not in the prediction.
Further to this, we have also seen that if one considers factorization scales between all quantities to be fully correlated in the fit, then the factorization scale variation can equivalently be performed entirely in the calculation of the predictions, with the first assumption implicitly leading to full correlation also being maintained between the factorization scale across different predictions. As discussed earlier, this seems to be a overly strong assumption in general, but as the arguments in Section 3 suggest it may, in practice, not be such a bad assumption for certain specific physical quantities. Therefore, for factorisation scale variations, the current approach of only varying the scale in the prediction is certainly an underestimate of the full uncertainty from this source, but probably not as significant an underestimate as might naively be expected. The arguments in this article then suggest it is more reliable to consider factorization scale in the fit alone, but this should in general include a variation which is only correlated for physical processes which depend on the same independent PDF combinations. This type of procedure would clearly need significant compromise in practice, as few physical processes depend on exactly the same PDFs, so many sets of processes will either be weakly correlated, strongly correlated, or somewhere in between. We note that in all cases the choices of central scale is largely a separate issue: this should be taken independent for different quantities, allowing for something close to the best possible fit at a fixed theoretical order, and possibly relieving some tensions between data sets in a fit. Indeed, such an approach also has the likely benefit of reducing the sensitivity of the fit quality, χ 2 , to MHOs, which may confuse the interpretation of PDF fits.
The eventual interpretation of these results has the potential to be a matter of some debate, given the known issues with the 'rule of thumb' scale variation approach and availability of alternative, potentially superior, approaches (see for example [36][37][38]). Nonetheless the initial investigations of the inclusion of theoretical uncertainties in PDF fits currently apply the scale variation paradigm [25,33,39], and so this result is certainly directly relevant to these studies. Thus, one is free to apply a potentially more complete and reliable approach than scale variation to evaluating the theoretical uncertainty due to missing higher orders in the PDF fit. Indeed, our result may be taken as further evidence that such an approach is preferable. In such a case, the analysis above will not directly apply, although some element of the basic approach, namely expression of the predicted observables directly in terms of the fit observables, will certainly be relevant. If, on the other hand, one does apply the standard factorization scale variation approach, then clearly considerable care is necessary to maintain consistency with the requirements demonstrated in this paper. Future work will consider the impact of such variations, consistently performed, within the context of the global MMHT fit and its interplay with the tolerance criteria to evaluate the PDF uncertainties.