Parton distributions from lattice data: the nonsinglet case

We review the relation between Parton Distribution Functions (PDFs) and the matrix elements computable in lattice QCD, focusing on the quasi-Parton Distribution Functions (qPDFs) approach. We exploit the relation between PDFs and qPDFs for the unpolarized isovector parton distribution to obtain a factorization formula relating the real and imaginary parts of qPDF matrix elements to specific nonsinglet distributions, and we propose a general framework to extract PDFs from the available lattice data, treating them on the same footing as experimental data. We implement the proposed approach within the NNPDF framework and study the potential of such lattice data to constrain PDFs, assuming a set of plausible scenarios for the unknown systematic uncertainties. Finally, we extract the two nonsinglet distributions involved in our analysis from a selection of the available lattice data.


Introduction
A precise understanding of the structure of the proton, encoded in the Parton Distribution Functions (PDFs), is required for predictions and analyses in collider physics. The PDF sets currently used in phenomenological studies are extracted from global QCD analyses of experimental data [1][2][3][4][5]. The increasing number and precision of the experiments included in these fits, together with the development of robust fitting methodologies, are pushing the determination of PDFs to a new level of accuracy.
PDFs are formally defined as matrix elements of renormalized operators in QCD. The computation of PDFs from first principles using lattice QCD has long been considered a great challenge. However, recent theoretical developments [6,7] have shown how lattice QCD simulations of certain spatial correlation functions between two boosted nucleon states can give access to the so-called quasi-PDFs. These functions are then directly related to PDFs by means of a factorization theorem, just as cross sections of high-energy processes are.
Following these novel ideas, a number of publications have appeared (see Refs. [8,9] for recent reviews), from both the lattice and the high-energy communities, addressing the main theoretical problems of the new lattice approach: the definition and renormalization of the non-local operators involved in the lattice simulation [10-20], the proof of the factorization theorem between PDFs and quasi-PDFs [7,21-25], and the computation of the matching coefficients relating lattice-computable quantities to PDFs in different renormalization schemes [21,22,24-33]. Moreover, data from the first lattice QCD simulations have started to appear and have already reached a relatively advanced stage over the last few years [13,30-46]. This gives an idea of what PDFs from the lattice might look like, not only for nonsinglet quark PDFs of the nucleon, but also for the pion PDF and distribution amplitude, as well as for the gluon PDF of the nucleon. Given the general interest shown by the community, rapid improvement in the technologies involved in such lattice simulations is to be expected in the next few years. A large quantity of increasingly precise lattice data is then likely to become available in the near future, requiring detailed studies of the possible impact they might have on the overall precision of PDF determinations.
Despite the increasing number of numerical results becoming available, an optimal strategy for reconstructing PDFs from these data has not yet been fully established. For example, in Ref. [47] a series of possible approaches to the problem of the incomplete and discretized Fourier transform was presented, showing some interesting and promising results. In this work, we exploit the factorization of quasi-PDFs into PDFs and perturbatively computable coefficients to extract nonsinglet distributions from the data of Refs. [30,45], setting up a general strategy within the NNPDF framework which allows one to systematically extract parton distributions from the available lattice data. The discussion focuses on the quasi-PDF case, but it can be extended to include data coming from different lattice approaches, such as pseudo-PDFs [48][49][50], fictitious heavy/light quarks [51,52] or current-current correlators [22,24,53], paving the way to a first global lattice QCD analysis [24].
The structure of the paper is the following. In Sec. 2 we review the theoretical basis of the problem and set up the notation. In Sec. 3 we describe the lattice data considered in this work, together with the corresponding systematic uncertainties. We consider different possible scenarios for the latter, studying their impact on the final result. We then work out a factorization formula relating such data to the two nonsinglet distributions T_3 and V_3, defined in Sec. 2. In Sec. 4 we describe the fit settings, reviewing the main features of the NNPDF framework, and in Sec. 5 we present our results. We conclude in Sec. 6.

From Parton Distributions to quasi-Parton Distributions
In this section, we review the formal definition of PDFs and quasi-PDFs as matrix elements of non-local operators in QCD. We recall the main theoretical framework concerning the definition, renormalization and matching between spatial and light-cone quantities. For a broad overview of several aspects of quasi-PDFs, we refer to the recent extensive review [8], and references therein.
The bare unpolarized quark PDF is defined as [54,55]

  f_q^{(0)}(x) = \int_{-\infty}^{+\infty} \frac{d\xi^-}{4\pi} \, e^{-i x P^+ \xi^-} \langle P| \bar{\psi}_q^{(0)}(\xi^-) \, \gamma^+ \, U(\xi^-, 0) \, \psi_q^{(0)}(0) |P\rangle ,    (2.1)

where |P\rangle denotes a hadronic state with momentum P^\mu = (P^0, 0, 0, P^z), and P^\pm = (P^0 \pm P^z)/\sqrt{2} are light-cone coordinates. The index q identifies the parton under investigation; for instance, in a theory where we only consider the four lightest quarks, we have q = u, d, s, c. The momentum carried by the parton is k^\mu = x P^\mu, \psi_q^{(0)} is the bare quark field operator, and the Wilson line U is given by

  U(\xi^-, 0) = \mathcal{P} \exp\left( -i g_0 \int_0^{\xi^-} d\eta^- \, A^{+(0)}(\eta^-) \right) .    (2.2)

An analogous definition can be given for the bare gluon PDF, denoted f_g^{(0)}(x). The superscripts (0) in the above expressions identify bare fields: the matrix elements that enter the definition of f_q^{(0)} are ultraviolet divergent, and therefore need to be renormalized. Renormalized parton distributions are usually defined by minimal subtraction, and the relation between the bare and the renormalized quantities is given by

  f_a^{(0)}(x) = \sum_b \int_x^1 \frac{dy}{y} \, Z_{ab}(y) \, f_b\!\left(\frac{x}{y}, \mu^2\right) ,    (2.3)

where the indices a and b run over all the parton types (gluon and flavors of quarks) and \mu denotes the renormalization scale introduced by the minimal subtraction scheme [54].

Focusing on the quark PDFs for now, the renormalized distributions introduced above have a compact support given by the interval [-1, 1]. For phenomenological applications, it is customary to define the PDFs on the interval [0, 1], and to introduce independent functions for the quarks and the antiquarks, which we denote q(x, \mu^2) and \bar{q}(x, \mu^2) respectively. The relation between f_q, q and \bar{q} is

  q(x, \mu^2) = f_q(x, \mu^2) , \qquad \bar{q}(x, \mu^2) = -f_q(-x, \mu^2) , \qquad x \in [0, 1] .    (2.4)

It is useful to consider the symmetrised and antisymmetrised combinations of f_q in the interval x \in [0, 1]:

  q^\pm(x, \mu^2) = f_q(x, \mu^2) \mp f_q(-x, \mu^2) .    (2.5)

It can be readily shown that

  q^+(x, \mu^2) = q(x, \mu^2) + \bar{q}(x, \mu^2) , \qquad q^-(x, \mu^2) = q(x, \mu^2) - \bar{q}(x, \mu^2) ,    (2.6)

where q^+ and q^- are defined by the equations above. The flavor decomposition can be rewritten by collecting the quark fields in a vector, e.g. \psi = (\psi_u, \psi_d, \psi_s, \psi_c), and defining the following nonsinglet bare PDFs:

  f_k^{(0)}(x) = \int_{-\infty}^{+\infty} \frac{d\xi^-}{4\pi} \, e^{-i x P^+ \xi^-} \langle P| \bar{\psi}^{(0)}(\xi^-) \, \gamma^+ \, \Lambda_k \, U(\xi^-, 0) \, \psi^{(0)}(0) |P\rangle ,    (2.9)

where the \Lambda_k are diagonal flavor matrices, e.g.

  \Lambda_3 = \mathrm{diag}(1, -1, 0, 0) .    (2.10)

In this notation f_3 corresponds to f_u^{(0)} - f_d^{(0)}, and so on. The symmetrised and antisymmetrised combinations map directly into the so-called evolution basis for the PDFs that is normally used in phenomenological studies, see e.g. Ref. [56] for a detailed definition of the flavor decomposition. More specifically, we have:

  T_3(x, \mu^2) = u^+(x, \mu^2) - d^+(x, \mu^2) = f_3(x, \mu^2) - f_3(-x, \mu^2) ,
  V_3(x, \mu^2) = u^-(x, \mu^2) - d^-(x, \mu^2) = f_3(x, \mu^2) + f_3(-x, \mu^2) .    (2.14)

We will return to these relations below when we discuss the factorization of the quasi-PDFs.

The renormalized PDFs f_a(x, \mu^2), or equivalently q(x, \mu^2) and \bar{q}(x, \mu^2), are currently extracted from global fits of experimental data, see e.g. Refs. [57,58] for recent reviews. Eq. (2.1) defines the PDFs in terms of matrix elements of QCD operators computed between hadron states, which makes their universal and non-perturbative nature explicit. Their direct computation through lattice QCD simulations therefore looks like a very appealing target. Unfortunately, the real-time dependence of PDFs makes it impossible to compute these matrix elements on the lattice, where the theory is defined on a Euclidean space with imaginary time. To overcome this problem, a new approach was proposed in Ref. [6]: the matrix element appearing in Eq. (2.1), defined along a light-cone direction, is replaced by a correlator defined along a purely spatial direction. The resulting quantity is called a (bare) quasi-PDF.

Denoting by \Gamma a generic Dirac structure and by the suffix A the specific nonsinglet distribution we want to consider, we can introduce the Ioffe time distributions [48,59], defined as the matrix element between nucleon states with momentum P:

  M_{\Gamma, A}(\xi, P) = \langle P| \bar{\psi}(\xi) \, \Gamma \, \Lambda_A \, U(\xi, 0) \, \psi(0) |P\rangle .    (2.15)

The vector bilocal operator obtained for \Gamma = \gamma^\mu can be decomposed in terms of two form factors depending on Lorentz invariants:

  M_{\gamma^\mu, A}(\xi, P) = 2 P^\mu \, h_A(\xi \cdot P, \xi^2) + 2 \xi^\mu \, \tilde{h}_A(\xi \cdot P, \xi^2) .    (2.16)
The light-cone PDF in Eq. (2.9) is obtained by taking the Fourier transform with respect to \xi^- of an Ioffe time distribution with \Gamma = \gamma^+, \xi = (0, \xi^-, 0_\perp) and P = (P^+, 0, 0_\perp), given by

  f_A^{(0)}(x) = \int_{-\infty}^{+\infty} \frac{d\xi^-}{4\pi} \, e^{-i x P^+ \xi^-} \, M_{\gamma^+, A}(\xi^-, P) .    (2.17)

Choosing instead a purely spatial direction \xi = (0, 0, 0, z), and taking the Fourier transform with respect to z, we obtain the definition of the quasi-PDF

  \tilde{f}_{\Gamma, A}^{(0)}(x, P_z) = \int_{-\infty}^{+\infty} \frac{dz}{4\pi} \, e^{i x P_z z} \, M_{\Gamma, A}(z, P_z) .    (2.20)

Choosing for example \Gamma = \gamma^0 we get

  \tilde{f}_{\gamma^0, A}^{(0)}(x, P_z) = \int_{-\infty}^{+\infty} \frac{dz}{4\pi} \, e^{i x P_z z} \, \langle P| \bar{\psi}^{(0)}(z) \, \gamma^0 \, \Lambda_A \, U(z, 0) \, \psi^{(0)}(0) |P\rangle ,    (2.21)

which will be the case considered in this work.

As in the case of standard PDFs (which from now on we will call light-cone PDFs), the matrix elements defining the quasi-PDFs contain UV divergences, and need to be renormalized. The perturbative renormalization of a bilocal operator, such as the one appearing in Eq. (2.21), is one of the theoretical problems of this novel approach. It was shown in Ref. [16] that the UV behaviour of the quasi-PDFs is very different from that of PDFs, and that the position space operator appearing in Eq. (2.21) can be multiplicatively renormalized, according to

  M_{\Gamma, A}^{R}(z, P_z) = Z_A(z) \, e^{\delta m |z|} \, M_{\Gamma, A}^{(0)}(z, P_z) ,    (2.22)

where the exponential factor e^{\delta m |z|} reabsorbs the power divergence from the Wilson line, and the position-dependent factor Z_A(z) takes care of the remaining UV logarithmic divergences. Importantly, the quasi-PDFs retain a dynamical dependence on the hadron momentum P, unlike PDFs, which are defined to be invariant under Lorentz boosts. Also, their support is the full real axis.

The interest in quasi-PDFs comes from the possibility of relating them to light-cone PDFs: factorization allows us to rewrite the quasi-PDFs as a convolution of the light-cone PDFs with a coefficient function that can be computed in perturbation theory, up to corrections that are suppressed by inverse powers of P_z. As verified at 1-loop order in Ref. [21], and proved in Refs. [24,25] using the OPE, renormalized quasi-PDFs share the same IR behaviour as the renormalized light-cone PDFs. It follows that they can be written as

  \tilde{f}_A(x, P_z, \mu^2) = \int_{-1}^{1} \frac{dy}{|y|} \, C_A\!\left(\frac{x}{y}, \frac{\mu}{|y| P_z}\right) f_A(y, \mu^2) + O\!\left(\frac{M^2}{P_z^2}, \frac{\Lambda_{QCD}^2}{P_z^2}\right) ,    (2.23)

where the terms O(M^2/P_z^2, \Lambda_{QCD}^2/P_z^2) include the power corrections suppressed by the hadron momentum. The functions C_A, usually called matching coefficients, depend on the choice of the renormalization scheme, and on the kind of quasi-PDF under consideration.

The first matching expressions, for all Dirac structures, were derived in Ref. [21], using a simple transverse momentum cutoff scheme. In later works, matching coefficients were derived that relate the quasi-PDFs in different renormalization schemes to light-cone PDFs in the MS scheme. The matching from MS quasi-PDFs was first considered in Ref. [28], both for nonsinglet and singlet quark PDFs, as well as for gluons. Even though one can choose operators for the latter that do not mix with singlet quark quasi-PDFs under renormalization [19,20], mixing under matching is inevitable. No mixing of the flavour nonsinglet sector with the flavour singlet or gluon sectors occurs, as stated in Eq. (2.23). Ref. [28] did not, however, address the known issue of self-energy corrections, which exhibit a logarithmic UV divergence. This was resolved in Ref. [25] by adding terms outside of the plus prescription in the matching coefficient. As noticed in Ref. [30], such a prescription for renormalizing this divergence violates vector current conservation, i.e. the integral of the matched PDF differs from the integral of the input quasi-PDF, and is not necessarily equal to 1 over the whole integration range. As a remedy, a modified matching expression, given explicitly in Eq. (A.1) of App. A, was proposed in Ref. [30]. It consists in resorting to pure plus functions when subtracting the logarithmic divergence in the self-energy corrections.
However, this is an additional subtraction with respect to the minimal subtraction of the MS scheme, and thus defines a modified MS scheme, the so-called MMS scheme. As such, it requires the quasi-PDF to be expressed in this modified scheme. The expression for the conversion of MS-renormalized matrix elements to the MMS scheme was worked out in Ref. [45], and we refer to it for the details of the procedure. Nevertheless, this modification is numerically very small, as also shown in Ref. [45]. An alternative modification of the MS scheme that guarantees vector current conservation was derived in an updated version of Ref. [25]. This defines the so-called "ratio" scheme.
In this scheme, only pure plus functions are used, as in the MMS scheme, but the modification is applied also in the "physical" region 0 < ξ < 1 (in the notation of Eq. (A.1)). The expected numerical effect of this modification is therefore larger, as shown explicitly in Ref. [45]. For this reason, we choose to use the MMS procedure; details of the lattice computation of the bare matrix elements and of the renormalization in the MMS scheme are outlined in the next section. Yet another possibility for performing the matching consists in directly relating the quasi-PDFs in the intermediate RI-type scheme to MS light-cone PDFs. This was proposed in Ref. [29] for the unpolarized case. Such a procedure is equivalent to the one adopted here, with possibly different systematic effects. All of the papers discussed above considered the matching only to first order in perturbation theory. It remains an important direction to derive the two-loop formulae, which would allow one to estimate the size of perturbative truncation effects in the conversion from the intermediate lattice scheme to MS and in the matching itself.
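To make the structure of Eq. (2.23) concrete, the following minimal sketch evaluates the matching convolution numerically. The true one-loop MMS coefficient of Refs. [25,30,45] is lengthy, so a toy O(alpha_s) kernel with the correct large-|ξ| falloff stands in for it; both the kernel and the input distribution are illustrative assumptions, not the actual expressions.

```python
# Minimal numerical sketch of the matching convolution in Eq. (2.23),
#   f~(x, Pz) = \int dy/|y| C(x/y) f(y) + power corrections.
# A TOY O(alpha_s) kernel with a 1/xi^2 large-|xi| falloff replaces the
# true one-loop MMS coefficient, purely to illustrate the structure.
import numpy as np

ALPHA_S = 0.3  # illustrative value of the strong coupling


def toy_nlo_kernel(xi):
    """Mock O(alpha_s) piece of the matching coefficient (NOT the real one)."""
    return ALPHA_S / (2.0 * np.pi) / (1.0 + xi**2)


def quasi_pdf(x, pdf, y_grid):
    """LO term plus the toy NLO convolution, via trapezoidal quadrature."""
    integrand = toy_nlo_kernel(x / y_grid) * pdf(y_grid) / np.abs(y_grid)
    return pdf(np.array([x]))[0] + np.trapz(integrand, y_grid)


def f3(y):
    """Toy nonsinglet light-cone PDF with support on [-1, 1]."""
    y = np.asarray(y, dtype=float)
    return np.where(np.abs(y) < 1.0, np.sign(y) * (1.0 - np.abs(y)) ** 3, 0.0)


# grid avoiding y = 0, where the 1/|y| factor is singular
y = np.concatenate([np.linspace(-1.0, -1e-3, 500), np.linspace(1e-3, 1.0, 500)])
print(quasi_pdf(0.3, f3, y))
```

The same quadrature skeleton applies once the toy kernel is replaced by the true coefficient, including its plus-distribution pieces (cf. App. A).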

Nonsinglet distributions from quasi-PDFs Matrix Elements
In this section, we describe the lattice data used in this work, briefly presenting the quasi-PDF matrix elements (MEs) computed in Refs. [30,45]. Using the results recalled in Sec. 2, we show that such matrix elements factorize into two nonsinglet distributions and a perturbatively computable coefficient, just as if they were experimental data for high-energy cross sections.

Lattice data
The field of nucleon isovector (u − d) quasi-PDFs has matured in recent years. Exploratory studies for all types of collinear PDFs (unpolarized, helicity and transversity) were performed in 2014-2016 [34][35][36][37]. They used lattice ensembles with non-physical pion masses, and the results had unsubtracted divergences, due to the lack of a well-defined renormalization procedure. The latter was proposed and applied for the first time in Refs. [12,13], utilizing a variant of the regularization-independent momentum subtraction scheme (RI'-MOM) [60]. Another major advance for unpolarized PDFs was the identification of a lattice-induced mixing between the bilinear operator used in the first exploratory studies, which was defined using γ^z as the Dirac structure, and the scalar bilinear operator (in spin space) [12]. Even though in principle it is possible to compute the matrix elements of the latter and a mixing renormalization matrix to properly subtract the divergences [13], this is bound to lead to significantly degraded precision, due to the rather poor signal for the scalar operator. Instead, it is preferable to define the quark bilinear using the γ^0 Dirac matrix, since this choice does not give rise to mixing. Moreover, the quasi-PDF computed with it converges faster, in powers of 1/P_z^2, to the light-cone PDF, as argued in Ref. [61]. Summarising in just one sentence, the major advances with respect to the early works on unpolarized quasi-PDFs came from: (1) a change of the Dirac structure to avoid the mixing, (2) a non-perturbative renormalization procedure, and (3) simulations at the physical pion mass. Matrix elements corresponding to such a setup were computed in Refs. [30,45], and they are briefly described below. For a recent review of other available results for quasi-PDF matrix elements, see e.g. Ref. [8].
The data used in this work to illustrate the impact of lattice calculations on phenomenological fits were computed by the Extended Twisted Mass Collaboration (ETMC)¹. They used one ensemble of gauge field configurations with two degenerate light quarks [62], with masses chosen to reproduce the physical value of the pion mass (m_π ≈ 130 MeV, i.e. slightly below the actual physical value). The lattice spacing is a = 0.0938(3)(2) fm [63] and the lattice has 48³ × 96 sites, corresponding to a spatial extent L of around 4.5 fm and to m_π L = 2.98. ETMC calculated bare quasi-PDF matrix elements for the unpolarized, helicity and transversity cases, but we concentrate only on the unpolarized one. The lattice data are available for three nucleon boosts, P_z = 6π/L, 8π/L and 10π/L (0.83 GeV, 1.11 GeV and 1.38 GeV in physical units; cf. the numerical check below), and for four values of the temporal separation between the nucleon creation and annihilation operators, t_s/a = 8, 9, 10, 12 (0.75, 0.84, 0.94, 1.13 fm). As shown in Refs. [30,45], there are signs of convergence in the nucleon momentum (the largest two momenta give compatible results), indicating that the boost is already large enough to suppress higher-twist effects below the statistical precision. Moreover, as pointed out in Ref. [45], excited-states contamination at the largest source-sink separation is small, i.e. the single-state fits at this t_s are compatible with two-state fits including all four values of t_s. Hence, for the purpose of this study, we consider only the data at the largest nucleon boost and at the largest source-sink separation. They are shown in Fig. 3.1.

The bare lattice data contain two types of divergences. First, there are standard logarithmic divergences with respect to the regulator, i.e. terms that behave like log(aµ). Additionally, for non-zero Wilson line lengths, further power-like divergences appear. They resum into a multiplicative exponential factor, e^{δm|z|/a}, where δm is operator-independent. The desired renormalization scheme for the final results is the MS scheme of dimensional regularization. However, the latter is obviously impossible on a lattice, which is restricted to integer dimensions. Thus, the usage of an intermediate lattice renormalization scheme is required. In Ref. [13], it was proposed to use an RI'-type prescription. The renormalization conditions are enforced on the amputated vertex functions of operators with different Wilson line lengths z, setting them to their tree-level values. A similar renormalization condition is applied to the quark propagator. This results in a set of matrix elements renormalized in the RI' scheme. A perturbative conversion from the RI' to the MS scheme is therefore needed. Such a conversion was derived in Ref. [12] to one-loop order and was applied to the RI'-renormalized matrix elements. As we discussed in the previous section, to guarantee vector current conservation, we use a modified MS scheme, termed the MMS scheme. Thus, another perturbative conversion of the MS-renormalized matrix elements is required, according to the formula given in Ref. [45]. After this conversion, renormalized matrix elements in the MMS scheme are the starting point of the current analysis.
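As a quick arithmetic cross-check of the physical units quoted above (an illustrative snippet; the inputs are the values stated in the text):

```python
# Cross-check of the quoted physical units; hbar*c is the standard
# conversion constant between fm and GeV.
import math

HBARC = 0.19733          # GeV * fm
a = 0.0938               # lattice spacing in fm
L = 48 * a               # spatial extent, ~4.5 fm
m_pi = 0.130             # pion mass in GeV

for n in (6, 8, 10):
    Pz = n * math.pi / L * HBARC   # boost in GeV
    print(f"Pz = {n}pi/L = {Pz:.2f} GeV")

print(f"m_pi * L = {m_pi * L / HBARC:.2f}")  # ~2.98, as quoted
```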
It is important to emphasize that despite having numerical evidence for the smallness of the effects of the nucleon momentum and of excited states, matrix elements from lattice studies come with a variety of other systematic effects. We discuss them in the next subsection. For more details about the lattice computation of the matrix elements, we refer the reader to Ref. [45].

Systematics in matrix elements of quasi-PDFs
A proper investigation of systematic effects in matrix elements evaluated in lattice QCD simulations is a difficult task, necessitating dedicated efforts. Such efforts consist in simulating with varied parameter values, such as the lattice spacing, the lattice volume, or the temporal separation between the source and the sink in nucleon three-point functions. Moreover, unrelated to the lattice regularization, there are theoretical uncertainties intrinsic to the quasi-distribution approach whose impact should also be assessed². For an extensive review of these different uncertainties, we refer to Refs. [8,9], while a discussion of the systematic effects in the ETMC quasi-PDF computation can be found in Ref. [45]. The latter contains an analysis of the effects investigated so far and a discussion of the directions that need to be pursued to fully quantify all the relevant systematics.
Here, we briefly summarize the conclusions reached up to the present stage. It is important to emphasize that while the impact of some systematics is already known to a reasonable degree, reliable estimates of certain other effects are still largely missing. Nevertheless, rough assessments can be made even in the absence of dedicated lattice data, by looking at the behaviour of related observables, such as the average quark momentum fraction or the nucleon charges, that have a long history of evaluations on the lattice [66][67][68][69][70]. This allows us to build scenarios describing the potential impact of the systematics on the matrix elements of quasi-PDFs. We consider three scenarios where the systematic effect is a given percentage of the central value of the matrix element, and three further ones where it is a given additive shift. We always exclude from the analysis the imaginary part of the matrix element at z = 0, which is equal to 0 by antisymmetry with respect to the sign change of z.
Cut-off effects. One of the most obvious systematic effects in lattice computations comes from the finite value of the lattice spacing, a, i.e. the ultraviolet cut-off imposed for the regularization of the theory. While a proper investigation of this uncertainty requires explicit simulations at several values of the lattice spacing, which are still missing for quasi-PDFs, we may assume that discretization effects are not excessive. This expectation is based on two indirect, but related, premises. First, one of the manifestations of large cut-off effects is the violation of the continuum relativistic dispersion relation, which is, however, not observed in the lattice data of Ref. [45]. Second, the first moment of the unpolarized u − d PDF gives the quark momentum fraction ⟨x⟩_{u−d}. This quantity has been intensively investigated on the lattice, and we may take over the typical size of discretization effects found in such studies. Looking at a summary plot including data from different lattice groups, such as Fig. 12 of Ref. [67], we see that cut-off effects at lattice spacings comparable to the one of the present work are typically at the 5-15% level in a fixed lattice setup (same discretization, pion mass, volume etc.). Thus, we investigate 6 plausible choices for the magnitude of cut-off effects: 10%, 20%, 30% of the matrix element, and additive effects of 0.1, 0.2 and 0.3.
Finite volume effects (FVE). Another natural source of uncertainty in all lattice simulations is the finite size of the box, L, which acts as an infrared regularization. Similarly to discretization effects, a robust investigation of these effects necessitates running the computations for several values of the lattice size. However, the difference with respect to lattice spacing effects, typically linear in a or a² in the asymptotic scaling regime, is that FVE are usually suppressed as exp(−m_π L), where m_π is the pion mass. This leads to typically O(1-5%) effects in hadron structure observables if m_π L ≥ 3. For the matrix elements used in this work, m_π L ≈ 3; thus, a reasonable assumption for the size of FVE is approximately 5%. In addition to these "standard" FVE of lattice computations, it has recently been pointed out that the usage of a spatially extended operator, including a Wilson line, may lead to additional FVE [71]. The intuition behind this is that further FVE may appear when the Wilson line has a non-negligible size with respect to the lattice length in the boost direction. The analysis of Ref. [71] pertains to a toy scalar theory and predicts a FVE of the form exp(−M(L − z)) (possibly with a polynomial amplifying prefactor), with M being the analogue of the mass of the investigated hadron in the quasi-PDF approach. Given that the nucleon mass at the physical point is around 7 times larger than the pion mass, this would lead to totally irrelevant effects, since the maximum considered z is more than 3 times smaller than L. However, it cannot be excluded that in QCD the form of this FVE is more severe, e.g. exp(−m_π(L − z)). With the physical m_π and z_max ≈ L/3, this could amplify the FVE from O(5%) to even above 10% at large z. We remark that ETMC has investigated FVE in the renormalization functions for the matrix elements and found no sign of excessive FVE at large z (total FVE not larger than around 3%) [45]. We investigate 3 scenarios for fixed percentage effects: constant FVE of 2.5% and 5%, as well as z-dependent ones of the form exp(−3 + 0.062 z/a)%, where 0.062 is the pion mass of the present ensemble expressed in lattice units. In addition, we consider 3 shifts: 0.025, 0.05 and exp(−3 + 0.062 z/a).
Excited states contamination. One of the key uncertainties in nucleon structure calculations is whether the hadron ground state is properly isolated. If the temporal separation between the interpolating operators creating and annihilating the nucleon is too small, uncontrolled excited-states contamination may appear, leading to a bias in the results. In the context of quasi-PDFs, an important aspect is that this contamination depends strongly on the boost, causing a delicate interplay between the need for a large momentum, required for a robust matching to light-cone distributions, and excited-states contamination, which grows with the boost. Thus, a careful analysis is needed to ensure ground state dominance. Such an analysis was performed for the matrix elements used in this work [45]. The conclusion that we use for the present case is that these matrix elements are safe against excited-states effects at the level of their statistical precision. Accordingly, we choose three values for the uncertainty from excited states: 5%, 10% and 15%. When the renormalized matrix elements are close to zero, the relative uncertainty can be larger, and thus we also consider three additive scenarios with magnitude 0.05, 0.1 and 0.15.
Truncation effects. The perturbative ingredients of the quasi-PDF approach are of two kinds. The first is related to the fact that the lattice approach works in integer dimensions, so that the dimensional regularization of the MS scheme is impossible. Instead, as discussed above, a non-perturbative renormalization programme was proposed by ETMC [13], utilizing a variant of the regularization-independent momentum subtraction scheme (RI'-MOM). The renormalization functions obtained in this way can then be translated perturbatively to the MS scheme and finally to the MMS scheme, using formulae derived in Refs. [12,45]. These formulae are currently available to one-loop order and are thus subject to a truncation effect from unknown higher orders. A manifestation of this effect is the fact that the Z-factors have a non-vanishing imaginary part even after conversion to MS, where they should be purely real.
To evaluate the impact of this uncertainty, we compare the renormalized matrix elements with the ones obtained by applying only the real parts of the Z-factors. We find that the matrix elements obtained by this alternative procedure are compatible with the actual ones within statistical uncertainties, with relatively larger effects observed for small z/a in the imaginary part (up to O(5%)) and intermediate z/a in the real part (the real part is small there, so the observed absolute effects of around 0.2 can be a large percentage of the value). Apart from the scheme-conversion truncation effects, the other necessary perturbative ingredient of the approach is the matching between quasi-PDFs and light-cone distributions, also known to one loop [21,25,45]. Without knowing the two-loop formulae, it is difficult to estimate their size. Comparing the quasi-distribution and the resulting light-cone PDF, the numerical magnitude of the matching factor can be significant, and thus the higher-order effects may be sizable. The "natural" size of such truncation effects is of O(α_s²), which amounts to around 10% at the renormalization scale we consider. However, these are uncertainties of the procedure rather than of the matrix elements, so they cannot be translated into uncertainties of the matrix elements. They are the analogue of the theoretical uncertainties that come from a truncated perturbative expansion in the description of observables in phenomenological fits of the PDFs, and can be treated as mentioned in the footnote above. Finally, we consider 6 scenarios for truncation effects pertinent to the matrix elements (i.e. originating from the perturbative uncertainty in the Z-factors): 10%, 20%, 30% of the central value of the matrix element, as well as shifts of 0.1, 0.2 and 0.3.
Higher twist effects. For the current analysis we decide to ignore the effect of higher twists, i.e. the presence of power-like corrections to the factorization formula. At this preliminary stage, we are not concerned with their effects, but a more precise phenomenological analysis should definitely take them into account. We will come back to this point in the conclusions.
Other effects. Apart from the systematics mentioned above, there are some other effects that potentially affect the results. One of them is the usage of a setup including two degenerate light quarks. However, this effect of working in the isospin limit, instead of taking into account the different masses and electric charges of the light quarks, is expected to be much below the level of the current precision: it is of the order of the proton-neutron mass splitting, i.e. at the per mille level. A similar magnitude can be expected for the contribution of the neglected sea quark loops from heavier quarks. Such effects can at present be safely ignored, and will become important only when aiming at an O(1%) precision or better.

Final scenarios. In the end, we define 6 scenarios for the possible impact of systematic effects, summarized in Tab. 3.1. Scenarios S1-S3 include uncertainties that are a fixed percentage of the central value of the matrix element, while for S4-S6 the uncertainties are additive shifts independent of the value of the matrix element. Scenarios S1 and S4 can be considered the most "optimistic" ones. More realistic estimates of the uncertainties are included in S2 and S5. Finally, S3 and S6 are "pessimistic", i.e. they assume the largest plausible estimates of the various systematic effects.
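A minimal sketch of how such a scenario could be turned into a covariance matrix for the fit of Sec. 4 follows. The function names are our own, and treating the systematic shift as fully correlated across z is an assumption made for illustration; only the numerical magnitudes follow the scenarios above.

```python
# Sketch of how a scenario from Tab. 3.1 could enter the fit covariance.
import numpy as np


def fve_percent(z_over_a):
    """z-dependent finite-volume scenario of the text: exp(-3 + 0.062 z/a) %."""
    return np.exp(-3.0 + 0.062 * np.asarray(z_over_a, dtype=float))


def covariance(me, stat_err, sys_percent=0.0, sys_shift=0.0):
    """Diagonal statistical covariance plus one systematic contribution,
    either relative (sys_percent, scenarios S1-S3) or additive
    (sys_shift, scenarios S4-S6)."""
    me = np.asarray(me, dtype=float)
    cov = np.diag(np.asarray(stat_err, dtype=float) ** 2)
    delta = np.abs(me) * sys_percent / 100.0 + sys_shift
    return cov + np.outer(delta, delta)  # fully correlated systematic (assumption)


# example: a 20% percentage scenario on mock matrix-element values
me = np.array([1.0, 0.8, 0.4, 0.1])
print(covariance(me, stat_err=0.02 * np.ones(4), sys_percent=20.0))
```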

From parton distributions to lattice observables
In this work, we aim at describing the data presented in the previous subsection; further studies with more data and a more detailed treatment of systematic errors are deferred to future publications. Hence, we specialize our discussion to the case of the unpolarized isovector parton distribution. Following the notation introduced above, the parton distribution f_3 is defined as

  f_3(x, \mu^2) = f_u(x, \mu^2) - f_d(x, \mu^2) ,    (3.1)

where the support is given by x ∈ [−1, 1]. The factorization theorem in Eq. (2.23) becomes

  \tilde{f}_3(x, P_z, \mu^2) = \int_{-1}^{1} \frac{dy}{|y|} \, C_3\!\left(\frac{x}{y}, \frac{\mu}{|y| P_z}\right) f_3(y, \mu^2) + O\!\left(\frac{M^2}{P_z^2}, \frac{\Lambda_{QCD}^2}{P_z^2}\right) ,    (3.2)

where the quasi-PDF is the one given by Γ = γ^0, and the explicit expression of the matching coefficient is given in App. A. Starting from the definition of quasi-PDFs given in Eq. (2.21), it is clear that the lattice ME is given by the inverse Fourier transform of Eq. (3.2), which yields an equation relating the light-cone PDF on the right-hand side to the lattice observable:

  h_{\gamma^0, 3}(z, P_z, \mu^2) = \int_{-\infty}^{+\infty} dx \, e^{-i x P_z z} \int_{-1}^{1} \frac{dy}{|y|} \, C_3\!\left(\frac{x}{y}, \frac{\mu}{|y| P_z}\right) f_3(y, \mu^2) .    (3.3)

We can split the above complex identity into two real equations, relating the real and imaginary parts of the ME h_{\gamma^0, 3}(z) to the light-cone distribution f_3. For the purpose of this work, we introduce two lattice observables, denoted by O^{Re}_{\gamma^0}(z, \mu) and O^{Im}_{\gamma^0}(z, \mu), defined as

  O^{Re}_{\gamma^0}(z, \mu) = \mathrm{Re} \, h_{\gamma^0, 3}(z, P_z, \mu^2) , \qquad O^{Im}_{\gamma^0}(z, \mu) = \mathrm{Im} \, h_{\gamma^0, 3}(z, P_z, \mu^2) ,    (3.4, 3.5)

where we have only included z and µ in the arguments of O^{Re}_{\gamma^0} and O^{Im}_{\gamma^0} in order to simplify the notation: since we are working here with only one value of P_z, there is little advantage in keeping all the arguments. The explicit expression of C_3 contains plus distributions. Making them explicit, we can write the equations above as

  O^{Re}_{\gamma^0}(z, \mu) = \int_0^1 dx \, C^{Re}_3(x, z, P_z, \mu) \, V_3(x) ,    (3.6)
  O^{Im}_{\gamma^0}(z, \mu) = \int_0^1 dx \, C^{Im}_3(x, z, P_z, \mu) \, T_3(x) ,    (3.7)

where V_3 and T_3 are the nonsinglet distributions defined by

  V_3(x) = f_3(x, \mu^2) + f_3(-x, \mu^2) , \qquad T_3(x) = f_3(x, \mu^2) - f_3(-x, \mu^2) ,    (3.8, 3.9)

where, for simplicity, the µ dependence of V_3 and T_3 has been omitted. The equations above relate the position space matrix elements computable on the lattice to the collinear PDFs. Similar expressions were worked out in Ref. [72] in the context of the pion distribution amplitude. The proof of Eqs. (3.6), (3.7) does require some care, and it is fully worked out in App. A, together with the explicit expressions for the coefficients C^{Re,Im}_3. A discussion about the convergence of the integrals involved is also reported there.

The above results show how fixed-z matrix elements defining the quasi-PDF in position space give direct access to two independent nonsinglet distributions, through the integration of the parton distribution over its full support against a perturbatively computable coefficient. We will denote this operation by ⊛.
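The structure of this operation can be illustrated with a leading-order numerical sketch: at O(alpha_s^0) the matching coefficient is trivial, and Eqs. (3.6), (3.7) collapse to cosine and sine transforms of V_3 and T_3. The toy input shapes below are hypothetical, chosen only so that the valence sum rule ∫₀¹ V_3 ≈ 1 holds.

```python
# Leading-order illustration of Eqs. (3.6)-(3.7).
import numpy as np


def lattice_observables_LO(nu, V3, T3, n=4000):
    """nu = z * Pz is the Ioffe time; trapezoidal quadrature over x."""
    x = np.linspace(0.0, 1.0, n)
    o_re = np.trapz(np.cos(x * nu) * V3(x), x)
    o_im = np.trapz(np.sin(x * nu) * T3(x), x)
    return o_re, o_im


V3 = lambda x: 9.84 * np.sqrt(x) * (1.0 - x) ** 3  # toy valence-like shape
T3 = lambda x: 9.84 * np.sqrt(x) * (1.0 - x) ** 3  # toy, same shape for brevity

for nu in (0.0, 1.5, 3.0):
    print(nu, lattice_observables_LO(nu, V3, T3))
```

At nu = 0 the real part reproduces the integral of V_3, which is why vector current conservation is directly visible in the z = 0 matrix element.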
It is useful at this point to recall the form of the QCD factorization formula for the DIS nonsinglet structure function. If F_2^p and F_2^d are the structure functions of the proton and of the deuteron (per nucleon) respectively, we have

  F_2^{NS}(x, Q^2) = F_2^p(x, Q^2) - F_2^d(x, Q^2) = x \int_x^1 \frac{dy}{y} \, C_{NS}\!\left(\frac{x}{y}, \alpha_s(Q^2)\right) T_3(y, Q^2) ,    (3.10)

where the coefficient function C_{NS} includes the quark charge factors (at leading order, F_2^{NS} = x T_3/6). Comparing Eqs. (3.6), (3.7) with Eq. (3.10), we see that the lattice observables introduced above can be treated on the same footing as experimental data for DIS structure functions, as they are related to the nonsinglet distributions through a convolution with a coefficient that can be computed in perturbation theory. However, the form of such a convolution, denoted by ⊛, is quite different from the one appearing in the DIS case, denoted by ⊗: the former involves a DIS-like convolution first, to go from the PDFs to the quasi-PDFs, followed by an integration over the full x-range to go to position space. This suggests that this kind of convolution, if implemented in a PDF fit, may constrain the output much more than the standard DIS convolution does.

Neural network fits
In this section, we set up a neural net fit based on the results presented in Sec. 3. The implementation is done within the NNPDF framework. We begin by briefly recalling the main features of such fits, focusing on the parametrization of the parton distributions, the minimization and cross-validation algorithms and the Monte Carlo replicas approach, referring to the specific NNPDF publications for more details. We then describe in detail the implementation of Eqs. (3.6), (3.7), and in particular the construction of FastKernel tables for the lattice observables. Once the observables can be computed using a FastKernel table, their inclusion in a PDF fit within the NNPDF environment becomes straightforward.

Fitting strategy and general settings
Given an ensemble of data whose values can be computed as a convolution of a perturbative Wilson coefficient and the PDFs using some factorization theorem, the PDFs can be extracted from a minimum-χ² fit to these data. It is worth stressing once again that in this respect there is no difference between lattice data and experimental data. In order to define the χ², a parametrization of the functional form is required; the minimization is then performed as a function of the parameters that define such a functional form. In our case, and more generally in the NNPDF approach, the functional form of the PDFs is given by a single-layer neural network, and the parameters are the weights and the biases of the network [73]. The independently parametrized parton distributions are the gluon and the singlet distribution, (g, Σ), which mix under evolution, and the nonsinglet distributions (V, T_3, V_3, T_8, V_8, T_15), whose definition is given for example in Ref. [74]. As discussed in the previous sections, the lattice data we consider here give access only to T_3 and V_3. Since the mixing with other flavors occurs neither in the matching nor in the DGLAP evolution, our fits yield results for these two distributions only. The parametrization of their x-dependence at the fitting scale µ_0 = 1.65 GeV is

  V_3(x, \mu_0^2) = x^{1-\alpha_{V_3}} (1 - x)^{\beta_{V_3}} \, NN_{V_3}(x) ,    (4.1)
  T_3(x, \mu_0^2) = x^{1-\alpha_{T_3}} (1 - x)^{\beta_{T_3}} \, NN_{T_3}(x) ,    (4.2)

where NN denotes a neural network, and the parameters α and β are additional free parameters, fitted during the training. As extensively discussed in several NNPDF publications, see for example Refs. [74,75], the preprocessing function multiplying the neural network improves the stability and the speed of the minimization procedure, without introducing a bias in the result.
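A minimal sketch of this parametrization is given below, assuming a single hidden layer with a tanh activation; the architecture, initialization and exponent values are illustrative placeholders, not the actual NNPDF settings.

```python
# Sketch of the parametrization of Eqs. (4.1)-(4.2): a single-layer
# feed-forward network times the preprocessing factor x^(1-alpha) (1-x)^beta.
import numpy as np

rng = np.random.default_rng(0)


class NNParametrization:
    def __init__(self, n_hidden=25):
        # 1 input (x) -> n_hidden tanh units -> 1 linear output
        self.w1 = rng.normal(size=(n_hidden, 1))
        self.b1 = rng.normal(size=n_hidden)
        self.w2 = rng.normal(size=n_hidden)
        self.b2 = rng.normal()
        self.alpha, self.beta = 1.0, 3.0  # preprocessing exponents, also fitted

    def __call__(self, x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        h = np.tanh(self.w1 @ x[None, :] + self.b1[:, None])  # hidden layer
        nn = self.w2 @ h + self.b2                            # linear output
        return x ** (1.0 - self.alpha) * (1.0 - x) ** self.beta * nn


T3 = NNParametrization()
print(T3([0.1, 0.5, 0.9]))
```

In a fit, the weights, biases and the exponents of each distribution together form the parameter set θ entering the figure of merit below.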
Having chosen a parametrization for the PDFs, the optimal fit is determined by varying its free parameters in order to minimize some figure of merit representing the fit quality. This is given by the χ² function, defined as

  \chi^2(\theta) = \sum_{i,j} \left[ O(z_i) - O^{th}(z_i, \theta) \right] (C^{-1})_{ij} \left[ O(z_j) - O^{th}(z_j, \theta) \right] ,    (4.3)

where O(z_i) denotes the measured lattice observable and O^{th}(z_i, θ) is the corresponding theoretical prediction, in terms of the matching coefficient and the parametrized parton distribution, as given in Eqs. (3.6), (3.7). We denote by θ the set of free parameters entering the neural network and the preprocessing term. The covariance matrix C entering the χ² definition is constructed to take into account the distribution of the experimental data entering the fit and their correlations. It is used both for the χ² definition and for the generation of Monte Carlo replicas, as described below. In the specific case of lattice data, it has to be constructed using the information about the statistical and systematic uncertainties that arise in a lattice QCD simulation. The different sources of systematics are described in Sec. 3.2, and the scenarios we consider here are summarized in Tab. 3.1.

The minimization of the χ² is performed with the CMA algorithm [76,77], employing a cross-validation technique to avoid overfitting. In this method, the available data are split into two sets. The first, the training set, is used for the minimization of the error function, while the second, the validation set, does not enter the fitting procedure. At each iteration of the minimization algorithm, the error function between the theory predictions from the neural network and the data is computed for both the training and the validation set. At an early stage of the training, both these quantities are expected to decrease. Towards the end of the training, however, while the error function over the training set keeps decreasing, the same quantity computed over the validation data reaches a minimal value, and eventually it may even start increasing. This is a signal of overfitting, and the point in parameter space yielding the minimal value of the validation error is the one taken as the fit result. It is important to notice that such a procedure is all the more important when only a small amount of data is available, as in the present case, since the fewer data there are, the easier it is to fit their statistical noise.
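In code, the figure of merit and the cross-validation split could look as follows (a schematic sketch; the random 50/50 partition and the function names are our assumptions):

```python
# Sketch of the figure of merit, Eq. (4.3), and of the cross-validation split.
import numpy as np


def chi2(obs, theory, cov_inv):
    """Residual vector contracted with the inverse covariance matrix."""
    r = np.asarray(obs) - np.asarray(theory)
    return r @ cov_inv @ r


def train_validation_split(n_data, frac=0.5, seed=1):
    """Random partition of the data indices into training/validation sets."""
    idx = np.random.default_rng(seed).permutation(n_data)
    k = int(frac * n_data)
    return idx[:k], idx[k:]
```

During the CMA iterations, chi2 is monitored on both index sets, and the parameters at the minimum of the validation chi2 are retained as the fit result.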
In the NNPDF framework, a Monte Carlo approach is used to obtain a reliable estimate of the PDF uncertainty. In this method, an ensemble of N_rep artificial data points is generated for each measured point, assuming a multi-Gaussian distribution given by the experimental covariance matrix. N_rep independent fits are then performed, generating a Monte Carlo ensemble of PDFs that faithfully reproduces the statistical features of the original experimental (or lattice) data. The Monte Carlo method therefore propagates the uncertainty from the data to the PDF set in a natural way, without any assumption about how the errors propagate.
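Generating such replicas amounts to sampling a multivariate Gaussian built from the central values and the covariance matrix of Sec. 3.2 (a minimal sketch; N_rep = 100 is an arbitrary illustrative choice):

```python
# Sketch of the Monte Carlo replica generation.
import numpy as np


def generate_replicas(central, cov, n_rep=100, seed=42):
    """Draw n_rep pseudo-data replicas from N(central, cov)."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.asarray(central), cov, size=n_rep)
```

One independent fit per replica then yields the ensemble of PDFs from which central values and uncertainties are computed.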

FastKernel implementation
One of the primary issues in PDF fits is the computational time required to obtain the theoretical predictions for the data entering the definition of the χ² in Eq. (4.3): the parton distributions have to be evolved from the fitting scale up to the scale of the observable, and then convoluted with the appropriate coefficient function. As seen in Sec. 3.3, in the case of the lattice observables an integration of the parton distributions over their full support is needed, which makes the form of the convolution more complicated than the one we usually have for other observables. Therefore, despite the fact that in this work we use a small number of lattice data, we find it useful to implement Eqs. (3.6), (3.7) by means of FastKernel tables, introduced and validated in Refs. [75,78] in the context of global QCD fits, and currently used within the NNPDF code to obtain all the required theoretical predictions in a global fit. From a practical point of view, this also allows us to treat the lattice data on exactly the same footing as the experimental ones, providing a smooth and natural way to introduce them in a parton distribution fit. In the following we briefly review this approach, referring to the original publications quoted above for more details.
The lattice observables O^{Re,Im}_{\gamma^0}(z, \mu^2) are determined at a given renormalization scale µ². They can be written in terms of the nonsinglet distributions at a given reference scale µ_0², by first evolving the parton distribution up to the scale µ², and then convoluting it with the coefficients C^{Re,Im}_3 defined in Eqs. (3.6), (3.7) and worked out in App. A. For the nonsinglet distributions considered in this work, the evolution is given by

  T_3(x, \mu^2) = \int_x^1 \frac{dy}{y} \, K^{(+)}\!\left(\frac{x}{y}; \mu^2, \mu_0^2\right) T_3(y, \mu_0^2) ,    (4.4)
  V_3(x, \mu^2) = \int_x^1 \frac{dy}{y} \, K^{(-)}\!\left(\frac{x}{y}; \mu^2, \mu_0^2\right) V_3(y, \mu_0^2) ,    (4.5)

where the kernels K^{(±)} are obtained by solving the DGLAP evolution equations in the nonsinglet sector. For V_3 and T_3 we have two different nonsinglet evolution kernels, denoted by K^{(−)} and K^{(+)} respectively. They are currently available from a number of public codes; one way in which they can be worked out and implemented is summarised in App. B. Eqs. (4.4) and (4.5) can be rewritten expressing the parton distribution in terms of an interpolation basis [75], for instance for the case of T_3

  T_3(x, \mu_0^2) = \sum_\beta I_\beta(x) \, T_3(x_\beta, \mu_0^2) + O(x^p) ,    (4.6)

where p is the lowest order neglected in the interpolation. In other words, the interpolating functions act by picking up the value of the PDF at some point x_β of a predefined x-grid. Substituting in the evolution equation Eq. (4.4) we get

  T_3(x, \mu^2) = \sum_\beta \left[ \int_x^1 \frac{dy}{y} \, K^{(+)}\!\left(\frac{x}{y}; \mu^2, \mu_0^2\right) I_\beta(y) \right] T_3(x_\beta, \mu_0^2) \equiv \sum_\beta \tilde{K}^{(+)}_\beta(x) \, T_3(x_\beta, \mu_0^2) .    (4.7)

The interpolation basis used at the initial scale can also be used to interpolate the parton distributions at the scale µ² appearing in Eqs. (3.6), (3.7). For the imaginary part of the lattice observable we get

  O^{Im}_{\gamma^0}(z, \mu) = \sum_\alpha \left[ \int_0^1 dx \, C^{Im}_3(x, z, P_z, \mu) \, I_\alpha(x) \right] T_3(x_\alpha, \mu^2) .    (4.9)

Putting together Eqs. (4.7) and (4.9) we get

  O^{Im}_{\gamma^0}(z_i, \mu) = \sum_\beta FK^{Im}_{i\beta} \, T_3(x_\beta, \mu_0^2) ,    (4.11)

where

  FK^{Im}_{i\beta} = \sum_\alpha \left[ \int_0^1 dx \, C^{Im}_3(x, z_i, P_z, \mu) \, I_\alpha(x) \right] \tilde{K}^{(+)}_{\alpha\beta} , \qquad \tilde{K}^{(+)}_{\alpha\beta} \equiv \tilde{K}^{(+)}_\beta(x_\alpha) .    (4.12)

Eq. (4.12) defines the FastKernel table which enters the computation of the χ² during the fit. It connects the parton distribution at the fitting scale to the lattice observable, taking into account the QCD evolution, the matching and the Fourier transform, and expressing them through a single matrix-vector multiplication. Clearly, a similar set of equations defines a FastKernel table that yields the real part of the lattice observable, O^{Re}_{\gamma^0}, as a function of the valence parton distribution V_3.
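The following sketch reduces the FastKernel idea to its essentials: all linear operations (evolution, matching, Fourier transform and the x-integration) are precomputed into a single matrix, so that each theory prediction during the fit is one matrix-vector product. The operators below are random placeholders standing in for K̃ and the integrated coefficients.

```python
# The FastKernel idea of Eq. (4.12) in schematic form.
import numpy as np

n_x, n_z = 50, 16
x_grid = np.geomspace(1e-3, 1.0, n_x)

K_evol = np.eye(n_x)  # placeholder DGLAP kernel on the grid, mu0 -> mu
C_fourier = np.random.default_rng(2).normal(size=(n_z, n_x))  # matching + FT

FK = C_fourier @ K_evol  # precomputed once before the fit starts


def predict(pdf_at_mu0):
    """O(z_i) for all z values in one matrix-vector product."""
    return FK @ pdf_at_mu0


print(predict(x_grid * (1.0 - x_grid) ** 3).shape)  # -> (16,)
```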

Results
Let us now proceed to present and discuss our numerical results. First, we study how the available lattice data might constrain the parton distributions in a fit, by means of closure tests: fake data for the real and imaginary parts of the ME are generated according to Eqs. (3.6), (3.7), using a chosen PDF set as input. The fitting code is then run over these pseudo-data, using exactly the same settings as in a standard fit. By comparing the output of such fits with the known input PDF set, we can assess the accuracy we may expect from the current knowledge of the lattice data and their systematics.
Then we present results for fits run over the data presented in Sec. 3.1, studying the 6 different scenarios for the treatment of the systematic errors described in Sec. 3.2 and summarized in Tab. 3.1. The results presented here have been produced using the NNPDF fitting code [1] and the ReportEngine software [79].

Closure tests
As shown in Sec. 3.3, we can relate PDFs to lattice observables through the matching convolution of Eq. (3.2) followed by a Fourier transform. As already pointed out at the end of Sec. 3.3, the resulting convolution is quite different from the one entering standard QCD fits. In this section, we assess how much this operation, together with the available lattice data from Refs. [30,45], is able to constrain parton distributions in a fit, by running some preliminary closure tests. For a detailed description of the closure test procedure, we refer to Sec. 4 of Ref. [80]. We generate pseudo-data corresponding to the data of Ref. [30] using NNPDF31_nlo_as_0118 as our input PDF set, and we run the fitting code over them. The outcome of the closure test fit is then used to assess how well the input PDFs can be reconstructed starting from the 16 position-space ME points and their uncertainties.
In order to get an idea of the impact on the fit of the statistical and systematic ME errors, we consider three different scenarios. First, we generate fake data assuming no systematic uncertainties and a small uncorrelated statistical uncertainty for each point, constant for all of them and of the order of the smallest real one. From the results of this closure test we can estimate the true constraining power of the convolution ⊛, assuming an ideal scenario where all the systematics are under control and the statistical error is kept small. Second, we repeat the exercise using the real statistical uncertainties, to assess how much the actual statistics of the current simulations affect the conclusions of the previous case. Finally, we look at the effect of the systematics, considering as a specific example the scenario S2 of Tab. 3.1. The three setups, labelled CT1-CT3, are summarized in Table 5.1.

  Closure test   Statistics      Systematics
  CT1            small, fixed    none
  CT2            real            none
  CT3            real            S2

Table 5.1. Closure tests with different choices of the statistical and systematic errors.

The results for each of the options above are summarised in the plots below. Looking at the results for CT1, Fig. 5.1, it is worth stressing that the lattice data entering the fit are just 16 points for the real part and 15 for the imaginary part of the matrix element. Only half of them are actually used in the training procedure, while the others are used to build the validation set. In a standard NLO QCD global fit, like the one used as input PDF here, the number of points entering the analysis is O(4000). Fig. 5.1 shows how effective the convolution ⊛ is in constraining the PDFs, assuming an ideal scenario where all the systematics are under control and the statistical errors are kept small. Looking at the results for CT2 and CT3 in Figs. 5.2 and 5.3, it is clear how large the impact of the statistical and systematic uncertainties of the ME is on the PDF uncertainty: in both cases the input PDF set is reconstructed at the 1-sigma level, with some tension for V_3 at medium x in the first case. The PDF uncertainty grows correspondingly, becoming very large when the full systematics are considered. Fig. 5.3 shows what we may expect in a real-life scenario.
To sum up, the results from CT1 show how promising this kind of lattice data might be in constraining PDFs. On the other hand, the results from CT2 and CT3 highlight the importance of having good control over both the statistical and the systematic uncertainties in the lattice simulations of the ME. It is worth noticing, however, that the overall error band of the reconstructed PDFs, even in the presence of the full systematic errors, would be reduced once new data become available.

Fit results
In this section, we present our results for fits run on the data from Refs. [30,45], described in Sec. 3.1. As mentioned before, we consider 6 different scenarios for the treatment of the systematic errors, summarized in Tab. 3.1. We show results for the "optimistic" (S1, S4), "realistic" (S2, S5) and "pessimistic" (S3, S6) scenarios, the difference between the elements of each pair being the nature of the systematic errors: a shift given by a percentage of the ME for the first, a constant additive shift for all the ME points for the second.
The results of the fit for the two optimistic scenarios are shown in Fig. 5.4. S1 is slightly more conservative than S4, but overall there is not much difference between them. The situation changes for the more realistic scenarios (Fig. 5.5), where S2 is much more conservative than S5. In the former case the tension with NNPDF31_nlo_as_0118 is smaller than what we observe in the previous scenarios, due to the increase in the error band and to a slight shift of the central replica of the fit. Similar comments can be made for the most pessimistic scenarios, shown in Fig. 5.6, with S3 exhibiting a very large error band and a more pronounced shift of the central replica towards that of NNPDF31. Overall, we notice that when the systematics are given as a percentage of the ME we obtain qualitatively different results moving from one scenario to the other, while in the case of constant shifts there is not much difference between the various cases.

Figure 5.4. S1 vs. S4: the S1 results are slightly more conservative than the S4 ones, but overall there is no significant difference between the two optimistic scenarios.

Conclusions and future work
In this work, we have used the factorization of quasi-PDFs to relate the unpolarized isovector parton distribution to well-defined matrix elements computable on the lattice. Using some of the currently available lattice data, we have applied this result to extract the nonsinglet distributions V_3 and T_3 within the NNPDF framework, studying also different possible scenarios for the treatment of the systematic uncertainties of lattice QCD simulations.
Our first results from closure tests show how effective these lattice data might be in constraining PDFs, allowing a consistent determination of the target distribution starting from O(15) ME points. On the other hand, we have shown that a consistent treatment of the lattice systematics is extremely important, and that the final result of the fit strongly depends on the specific systematics scenario considered. For the most realistic ones, agreement with the phenomenological PDFs is observed at the 1-sigma level for both the nonsinglet distributions considered here. The error bands are, however, very large compared with those of the corresponding phenomenological PDFs, showing again how important control over the lattice systematics is.
Despite having focused on the quasi-PDF case, the framework we implemented is general enough to allow the inclusion in the same analysis of data coming from different lattice simulations. One direction for future work might be to repeat a similar analysis considering, for example, pseudo-PDF data rather than quasi-PDF data, to see whether the conclusions obtained here also apply to data coming from different lattice approaches. A fully global lattice QCD fit, which could easily be implemented within this framework, would then be the next logical step: it would be interesting to see how much the inclusion of more lattice data in the same analysis affects the error band of the resulting PDFs, and whether or not a better agreement with the PDFs determined from experimental data is reached.
Our approach also allows one to combine experimental and lattice data in the same analysis. In principle, we could then study the impact of the available lattice data on an existing PDF set, like NNPDF31. This, however, would require a deeper control and assessment of the lattice systematics involved in the simulations, together with a suitable treatment of the theoretical errors involved in the computation of the theory predictions entering the fits (missing higher orders in the matching coefficients and power-correction terms), to avoid biasing the global QCD fit. This will be done in future work.
Finally, we note that the current analysis can be extended without any additional complications to the other nonsinglet distributions defined in Eq. (2.9), as soon as lattice data for the corresponding matrix elements become available.


A Matching coefficients

The one-loop matching coefficient C_3 in the MMS scheme, Eq. (A.1), is taken from Refs. [30,45], to which we refer for its explicit (lengthy) form. The matching coefficients relate the light-cone PDF to the quasi-PDF, up to power-suppressed terms, according to

  \tilde{f}_3(x, P_z, \mu^2) = \int_{-1}^{1} \frac{dy}{|y|} \, C_3\!\left(\frac{x}{y}, \frac{\mu}{|y| P_z}\right) f_3(y, \mu^2) + O\!\left(\frac{M^2}{P_z^2}, \frac{\Lambda_{QCD}^2}{P_z^2}\right) .    (A.2)

In the following, we work out the full expression of the coefficients appearing in Eqs. (3.6), (3.7). Starting from Eq. (A.2), we take the inverse Fourier transform of the quasi-PDF,

  h_{\gamma^0, 3}(z, P_z, \mu^2) = \int_{-\infty}^{+\infty} dx \, e^{-i x P_z z} \int_{-1}^{1} \frac{dy}{|y|} \, C_3\!\left(\frac{x}{y}, \frac{\mu}{|y| P_z}\right) f_3(y, \mu^2) .

Let us focus on the next-to-leading order term, making the plus distribution explicit. In order to do so, we find it useful to split the y-integral into the two contributions y < 0 and y > 0. A change of variables, x/y = ξ, then turns the inner integral into an integral over ξ along the full real axis. The plus distribution appearing in the matching coefficients is implemented as follows:

  \int d\xi \, \left[ C(\xi) \right]_+ g(\xi) = \int d\xi \, C(\xi) \left[ g(\xi) - g(1) \right] .

It can be easily verified that the two contributions appearing in the above equation are indeed well defined for every fixed x: the singularity at ξ = +1 is cured by the plus prescription, while for ξ → ±∞ the matching coefficient behaves like C(ξ) ∼ 1/ξ², which is enough to guarantee the convergence of all the integrals above. For numerical stability we find it useful to avoid the singularity at ξ = +1 by introducing a suitable small parameter δ ∼ 10⁻⁶ and excising the region |ξ − 1| < δ. Changing variables back through x/ξ = y, the integration range for x becomes (0, 1), since x < y < 1. Renaming variables, one arrives at the explicit expression for C^{Re}_3, Eq. (A.20).

We now turn to the imaginary part of the Fourier transform. The computation is exactly the same as in the previous case, with the only difference that we now have a sine instead of a cosine. Therefore, the results for the imaginary part can be obtained from those for the real part simply by replacing cos with sin and V_3 with T_3.
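The plus-prescription integral described above lends itself to a direct numerical implementation. The sketch below uses a toy kernel with the characteristic 1/(1 − ξ) singularity (the true coefficients are those of Eq. (A.1)) and the δ-excision discussed in the text; the function names and the integration range are our own choices.

```python
# Numerical sketch of the plus-prescription integral,
#   int dxi [C(xi)]_+ g(xi) = int dxi C(xi) (g(xi) - g(1)),
# with the point xi = 1 excised by delta ~ 1e-6 for numerical stability.
import numpy as np

DELTA = 1e-6


def plus_integral(C, g, lo=0.0, hi=4.0, n=200_001):
    xi = np.linspace(lo, hi, n)
    xi = xi[np.abs(xi - 1.0) > DELTA]  # excise the singular point
    return np.trapz(C(xi) * (g(xi) - g(1.0)), xi)


C_toy = lambda xi: 1.0 / (1.0 - xi)  # toy singular kernel
g = lambda xi: np.exp(-xi)           # smooth test function
print(plus_integral(C_toy, g))
```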

B QCD evolution equations
Let us briefly summarise how the QCD evolution equation is solved for the nonsinglet sector, yielding the evolution kernel that is used in the construction of FastKernel tables. For more details and for the validation of this approach, we refer to Ref. [73] and the subsequent NNPDF publications.
Denoting the nonsinglet distributions V_3 and T_3 by f^{(−)} and f^{(+)} respectively, the QCD evolution equations can be written as

  \frac{\partial f^{(\pm)}(x, \mu^2)}{\partial \ln \mu^2} = \int_x^1 \frac{dy}{y} \, P^{(\pm)}\!\left(\frac{x}{y}, \alpha_s(\mu^2)\right) f^{(\pm)}(y, \mu^2) ,    (B.1)

where P^{(±)} are the nonsinglet splitting functions.
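For reference, a standard route to the kernels K^{(±)} of Sec. 4.2 is to solve Eq. (B.1) in Mellin space, where the convolution diagonalizes. The leading-order solution below is textbook material, quoted under the convention (an assumption we state explicitly) P = (α_s/4π) P^{(0)} + ...; at this order the + and − kernels coincide, and they differ starting at NLO.

```latex
% Leading-order solution of Eq. (B.1) in Mellin space. Conventions:
% gamma^{(0)}(N) = \int_0^1 dx x^{N-1} P^{(0)}(x), beta_0 = 11 - 2 n_f / 3.
\begin{align}
  f^{(\pm)}(N,\mu^2) &= \int_0^1 dx\, x^{N-1}\, f^{(\pm)}(x,\mu^2) , \\
  \frac{\partial f^{(\pm)}(N,\mu^2)}{\partial \ln\mu^2}
    &= \frac{\alpha_s(\mu^2)}{4\pi}\,\gamma^{(0)}(N)\, f^{(\pm)}(N,\mu^2)
  \;\Longrightarrow\;
  f^{(\pm)}(N,\mu^2)
    = \left[\frac{\alpha_s(\mu^2)}{\alpha_s(\mu_0^2)}\right]^{-\gamma^{(0)}(N)/\beta_0}
      f^{(\pm)}(N,\mu_0^2) .
\end{align}
```

The x-space kernel K^{(±)}(x; µ², µ_0²) then follows from an inverse Mellin transform of the evolution factor, which is how FastKernel-type tables are typically assembled.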