A First Determination of Parton Distributions with Theoretical Uncertainties

The parton distribution functions (PDFs) which characterize the structure of the proton are currently one of the dominant sources of uncertainty in the predictions for most processes measured at the Large Hadron Collider (LHC). Here we present the first extraction of the proton PDFs that accounts for the missing higher order uncertainty (MHOU) in the fixed-order QCD calculations used in PDF determinations. We demonstrate that the MHOU can be included as a contribution to the covariance matrix used for the PDF fit, and then introduce prescriptions for the computation of this covariance matrix using scale variations. We validate our results at next-to-leading order (NLO) by comparison to the known next order (NNLO) corrections. We then construct variants of the NNPDF3.1 NLO PDF set that include the effect of the MHOU, and assess their impact on the central values and uncertainties of the resulting PDFs.

The search for new physics at present [1] and future [2] high-energy colliders, and specifically at the LHC, has turned from the mapping of the energy frontier to the exploration of the precision frontier: the search for subtle deviations from Standard Model predictions. In this endeavor, an accurate estimate of uncertainties associated to these predictions is crucial. At present, these uncertainties have two main origins. The first is the missing higher order uncertainty (MHOU) that arises from the truncation of the QCD perturbative expansion. The second is related to knowledge of the structure of the colliding protons, as encoded in the parton distribution functions (PDFs) [3].
PDFs are extracted by comparing theoretical predictions to experimental data. Currently, PDF uncertainties only account for the propagated statistical and systematic errors on the measurements used in their determination. However, the same MHOU which affects predictions at the LHC also affect predictions for the various processes that enter the PDF determination. These are currently neglected, perhaps because they are believed to be generally less important than experimental uncertainties. However, as PDFs become more precise, in particular thanks to ever tighter constraints from LHC data [4], eventually MHOUs in PDF determinations will become * Electronic address: stefano.forte@mi.infn.it significant. Already in recent PDF sets which make extensive use of LHC data, such as NNPDF3.1 [5], the shift between PDFs at next-to-leading order (NLO) and the next order (NNLO) is sometimes larger than the PDF uncertainties from the experimental data.
Here we present the first PDF extraction that systematically accounts for the MHOU in the QCD calculations used to extract them. MHOUs are routinely estimated by varying the arbitrary renormalization µ r and factorization µ f scales of perturbative computations [1], though alternative methods have also been proposed [6][7][8]. Our inclusion of the MHOU in a PDF fit involves two steps: first we establish how theoretical uncertainties can be included in such fit through a covariance matrix [9,10], and then we find a way of computing and validating the covariance matrix associated to the MHOU using scale variations [11]. By producing variants of NNPDF3.1 which include the MHOU, we are then able to finally address the long-standing question of their impact on state-ofthe-art PDF sets. A detailed discussion of these results will be presented in a companion paper [12].
Assuming that theory uncertainties can be modelled as Gaussian distributions, in the same way as experimental systematics, then the associated theory covariance matrix S ij can be expressed in terms of nuisance parameters where ∆  spect to the central theory prediction for the i-th crosssection, T i , due to the theory uncertainty, and N is a normalization factor determined by the number of independent nuisance parameters. Since theory uncertainties are independent of the experimental ones, they can be combined with them in quadrature: the χ 2 used to assess the agreement of theory and data is given by with D i the central experimental value of the i-th datapoint, and C ij the experimental covariance matrix. More details of the implementation of the theory covariance matrix in PDF fits may be found in Refs. [9,10].
The choice of nuisance parameters ∆ (k) i used in Eq. (1) to estimate a particular theoretical uncertainty is not unique, reflecting the fact that such estimates always have some degree of arbitrariness. Here we focus on the MHOU, and choose to use scale variations to estimate ∆ (k) i . A standard procedure [1] is the so-called 7-point prescription, in which the MHOU is estimated from the envelope of results obtained with the following scales (k f , k r ) ∈ {(1, 1), (2, 2), (.5, .5), (2, 1), (1, 2), (.5, 1), (1, .5 are the ratios of the renormalization and factorization scales to their central values. Varying µ r estimates the MHOU in the hard coefficient function of the specific process, while the µ f variation estimates the MHOU in PDF evolution. In order to compute a covariance matrix, we must not only choose a set of scale variations, but also make some assumptions about the way they are correlated. We do this by, first of all, classifying the input datasets used in PDF fits into processes as indicated in Table I: chargedcurrent (CC) and neutral-current (NC) deep-inelastic scattering (DIS), Drell-Yan (DY) production of gauge bosons (invariant mass, transverse momentum, and rapidity distributions), single-jet inclusive and top pair production cross-sections. Note that this step requires making an educated guess as to which cross-sections are likely to have a similar structure of higher-order corrections.
Next, we formulate a variety of prescriptions of how to construct Eq. (1) by picking a set of scale variations and correlation patterns. A simple possibility is the 3-point prescription, in which we vary coherently both scales (thus setting k f = k r ) by a fixed amount about the central value, independently for each process. More sophisticated prescriptions are constructed by varying the two scales independently, but by the same amount, and assuming that while µ r is only correlated within a given process, µ f is fully correlated among processes. This assumption is based on the observation that µ f variations estimate the MHOU in the evolution equations, which are universal (process-independent). In the appendix we provide expressions for S ij in the 3-and 9-point cases.
We then proceed to the validation of the resulting covariance matrices at NLO. We use the same experimental data and theory calculations as in the NNPDF3.1 α s study [13] with two minor differences: the value of the lower kinematic cut has been increased from Q 2 min = 2.69 GeV 2 to 13.96 GeV 2 in order to ensure the validity of the perturbative QCD expansion when scales are varied downwards, and the HERA F b 2 and fixed-target Drell-Yan cross-sections have been removed (for technical reasons). In total we then have N dat = 2819 data points. The theory covariance matrix S ij has been constructed by means of the ReportEngine software [14] taking as input the scale-varied NLO theory cross-sections T i (k f , k r ), provided by APFEL [15] for the DIS structure functions and by APFELgrid [16] combined with APPLgrid [17] for the hadronic cross-sections.
Since for the processes in Table I the NNLO predictions are known, we can then validate the NLO covariance matrix against the known NNLO result. For this exercise, a common input NLO PDF is used in both cases. In order to validate the diagonal elements of S ij , which correspond to the overall size of the MHOU, we first normalise it to the central theory prediction, Then we compare in Fig. 1 the relative uncertainties, σ i = S ii to the relative shifts between predictions at NLO and NNLO, , for each of the N dat = 2819 cross-sections. In all cases, δ i turns out to be smaller or comparable to σ i , showing that this prescription provides a good (if somewhat conservative) estimate of the diagonal theory uncertainties.
The validation of the full covariance matrix including correlations is subtler. We first diagonalise S ij , by finding the (orthonormal) eigenvectors e a i which correspond to positive eigenvalues (s a ) 2 : these define a subspace S orthonormal to the large null subspace. The dimension N S of S depends on the total number of independent scale variations, the number of processes, and the correlation pattern. For the 5 processes in Table I, and the 9-point prescription, we find N S = 28, while for the simpler 3-point prescription N S = 6. We then compute the N S projections δ a of the NLO-NNLO shifts δ i along each eigenvector, and compare them to the square root of the corresponding eigenvalues, s a . Finally we compute the   correlations of theory shifts in a 2819-dimensional vector space is well captured by just 28 nuisance parameters. Adding the theory covariance matrix S ij to the experimental covariance matrix C ij , while increasing the diagonal uncertainty on each individual prediction, also (and perhaps more importantly) introduces a set of theoryinduced correlations between different experiments and processes, even when the experimental data points are uncorrelated. This is illustrated in Fig. 3, showing the combined experimental and theoretical (9-point) correlation matrix: it is clear that sizable correlations appear even between experimentally unrelated measurements.
We can now proceed to a NLO global PDF determination with a theory covariance matrix S ij computed using the 9-point prescription. From the point of view of the NNPDF fitting methodology, the addition of the theory contribution to the covariance matrix does not entail any changes: we follow the procedure of Ref. [18], but with the covariance matrix C ij now replaced by C ij +S ij , both in the Monte Carlo replica generation and in the fitting. In Table II we show some fit quality estimators for the resulting PDF sets obtained using only the experimental covariance matrix, and then also the theory covariance matrix with two different prescriptions. In particular, we show the χ 2 per datapoint and the φ estimator [18], which corresponds to the ratio of the uncertainty in the predictions using the output PDFs to that of the original  data. The quality of the fit is improved by the inclusion of the MHOU, with the 9-point prescription performing rather better than 3-point. Interestingly, φ is unaffected by the inclusion of the theory covariance matrix, implying that taking the MHOU into account does not increase the PDF uncertainties in the fitted cross-sections but instead resolves some of the tensions between data and theory, so that the larger overall uncertainty is compensated by the improved fit quality.
In Fig. 4 we compare at Q = 10 GeV the gluon and quark singlet PDFs obtained at NLO with and without theory covariance matrix, normalised to the latter. We also show the central NNLO result when the theory covariance matrix is not included. Three features of this comparison are apparent. First, when including the MHOU, the increase in PDF uncertainty is rather moderate (as seen in Tab. II, the uncertainty on predictions is unchanged). Second, the NLO-NNLO shift is fully compatible with the overall uncertainty. Finally, also the central value is modified by the inclusion of S ij in the fit, as the balance between different data sets adjusts according to their relative theoretical precision. Interestingly, the  Finally, in Fig. 5 we compare the dependence of the fit results on the specific choice of prescription for S ij , specifically for the 3-and 9-point cases, normalized to the latter. Whereas results with the 3-point prescription have somewhat smaller uncertainties, and a central value which is closer to that when the MHOU is not included (see Fig. 4), in general the results are consistent.
An alternative way of benchmarking our results is to compare to PDFs determined using different choices of central scale. One may then compare the PDF fit results obtained using Eq. (2) to the envelope of PDF central values obtained with different scale choices in the theory prediction T i (k f , k r ). This option is briefly discussed in the appendix and more in detail in [12].
In summary, we have presented the first global PDF analysis that accounts for the MHOU associated to the fixed order QCD perturbative calculations used in the fit. The inclusion of the MHOU shifts central values by an amount that is not negligible on the scale of the PDF uncertainty, moving the NLO result towards the result of the NNLO fit. PDF uncertainties increase moderately, because of the improvement of fit quality due to the rebalancing of datasets according to their theoretical precision. Note that for this to be effective, the correlations in S ij play a crucial role.
Our results pave the way towards a fully consistent treatment of MHOU for precision LHC phenomenology. The NLO results presented here will be upgraded to NNLO, and this will be facilitated by tools such as the APPLfast grid interface to the NNLOJET program [19]. We thus anticipate that the upcoming NNPDF4.0 PDF set will be able to fully account for MHOU both at NLO and NNLO, as well as other sources of theory uncertainty, such as those related to nuclear corrections [10,20].

Theory covariance matrix in the 3-point and 9-point prescriptions
In this appendix, we provide the explicit expressions for the entries of the theory covariance matrix S ij , Eq. (1), determined using either the 3-point or the 9-point prescriptions. For the 3-point prescription, in which k f = k r and variations are uncorrelated between different processes, one has if i and j are two datapoints from the same process, and if i and j are from different processes (see the classification in Table I). Here we have defined as the scale-induced shifts with respect to the central scale choice for the theory prediction T i (k f , k r ). For the 9-point prescription, in which variations of k f are correlated across all processes, but variations of k r are uncorrelated between different processes, the theory covariance matrix is if i and j are two datapoints from the same process, while if i and j belong to different processes where we have defined additional nuisance parameters in terms of the scale-varied cross-sections T i (k f , k r ): Further details concerning the construction of different prescriptions for the theoretical covariance matrix S ij associated to the MHOU, as well as their corresponding validation in terms of statistical estimators such as those provided in Figs. 1 and 2, will be presented in the extended companion paper [12].

PDF fits from scale-varied theories
As mentioned in the main manuscript, an alternative validation of our determination of the MHOU associated to the PDFs can be provided by comparison to PDFs determined based on theory calculations with different choices of renormalization and factorization scales. In this case, only the standard experimental covariance matrix is used in each PDF determination, but different choices of the renormalization µ r and factorization µ f scales are made each time. That is, instead of Eq. (2), now the figure of merit used for the minimisation is with T r ), and where s labels the corresponding individual scale choices used in each fit. Note that here we are assuming that renormalization scale variations for different processes are fully correlated, whereas the theory covariance matrix approach is more flexible concerning the assumptions on the correlation patterns.
In the upper panels of Fig. 6 we show the central values of the gluon and the quark singlet PDFs obtained in the fits based on theories T (s) i with different choices of scale in Eq. (9). In particular, results for eight variations of (k f , k r ) are shown, normalized to the central scale choice (k f , k r ) = (1, 1). Note how a 9-point envelope, including the (k f , k r ) = (2, 0.5) and (0.5, 2) variations, would lead to unrealistically large theory uncertainties, thereby justifying the common usage of the 7-point envelope instead. This is due to the fact that theory predictions based on these scale choices lead to unnaturally large corrections and produced unstable output PDFs. In the lower panels of the same figure the 7-point envelope constructed from these shifts is compared both to results obtained including the MHOU through the theory covariance matrix, using the 9-point prescription, and to the known NNLO result, see also Fig. 4. Hence, an envelope prescription strongly depends on the set of scale variations included in the envelope. This is to be contrasted to the approach based on a theory covariance matrix, where, as seen in Fig. 5 results are quite stable upon variation of the prescription used for the computation of the covariance matrix.
Note that whereas the covariance matrix approach leads to a combined total experimental and theory uncertainty on the PDFs, the envelope method only provides an estimate of the theory uncertainty, which will then have to be combined with that associated to the experimental covariance matrix according to a suitable prescription. In comparison to the theory covariance matrix approach, the 7-point envelope prescription therefore appears to be rather more conservative, and also less clearly well-defined, with the theory covariance matrix approach offering a better prediction for the known NNLO result.

Delivery and usage
The variants of the NNPDF3.1 NLO global sets presented in this work are publicly available in the LHAPDF format [21] from the NNPDF website: https://data.nnpdf.science/nnpdf31 th/ Here we list the PDF sets that are made available. The NLO sets based on the theory covariance matrix are NNPDF31 nlo as 0118 scalecov 9pt NNPDF31 nlo as 0118 scalecov 3pt which correspond to the fits based on Eq. (2) in the cases in which the theory covariance matrix S ij has been evaluated with the 9-and 3-point prescriptions, respectively. The NLO sets based on scale-varied theories and determined using Eq. (9) are the following: NNPDF31 nlo as 0118 kF 1 kR 1 NNPDF31 nlo as 0118 kF 2 kR 2 NNPDF31 nlo as 0118 kF 0p5 kR 0p5 NNPDF31 nlo as 0118 kF 2 kR 1 NNPDF31 nlo as 0118 kF 1 kR 2 NNPDF31 nlo as 0118 kF 0p5 kR 1 NNPDF31 nlo as 0118 kF 1 kR 0p5 where the naming convention indicates the values of the scale ratios k f and k r . The central values of each of these sets have been shown in the upper panel of Fig. 6. Note that the NNPDF31 nlo as 0118 kF 1 kR 1 set is also the baseline (central scales and experimental covariance matrix only) to be used in the comparisons with the fits based on the theory covariance matrix listed above. Finally, we also provide the set NNPDF31 nnlo as 0118 kF 1 kR 1 which corresponds to the NNLO fit with central scales and experimental covariance matrix only, that has been produced for validation purposes and that was shown in Figs. 4 and 6). It is important to keep in mind that the variants of the NNPDF3.1 fits presented in this work are based on a somewhat different dataset as those from the default NNPDF3.1 analysis [5]. Therefore, when using these sets one should always be consistent: for example by comparing fits with and without MHOU that are based on a common input dataset. In terms of their usage, PDFs based on the theory covariance matrix should be treated in the same way as any other NNPDF set. In particular, the associated uncertainties, which now also account for the effects of the MHOU in the theory predictions for the fitted cross-sections, should be evaluated using standard prescriptions. For instance, the PDF uncertainty σ PDF F (including the PDF-related MHOU) associated to a given cross-section F is the standard deviation over the replica sample, When evaluating the uncertainty on any theory prediction F, the PDF uncertainty determined thus should be combined in quadrature with the theory uncertainties σ th F on F: The theory uncertainties σ th F can be estimated in the usual way, using renormalization and factorization scale variations, either through an envelope prescription or constructing the theory covariance matrix for the cross-section F. Note that combining σ th F and σ PDF F in quadrature is somewhat conservative, in the sense that it ignores the correlations [22] between the scale variations in F with those in the processes that enter the PDF fit. However, this effect should be rather small, particularly for processes F, such as Higgs production, that are not part of global PDF determinations. A more detailed study of the usage of the NNPDF fits with MHOU and their implications for precision LHC phenomenology will be presented in the extended companion paper [12].
It is important to also stress here that when assessing the quality of the agreement of any PDF set with the data, if the PDF set has been determined using a theory covariance matrix, then the χ 2 in the data-theory comparison should also include it. Otherwise, the shift in best-fit PDF induced by the theory covariance matrix would not be accounted for and the comparison would be inconsistent.