Neural-network analysis of Parton Distribution Functions from Ioffe-time pseudodistributions

We extract two nonsinglet nucleon Parton Distribution Functions from lattice QCD data for reduced Ioffe-time pseudodistributions. We perform such analysis within the NNPDF framework, considering data coming from different lattice ensembles and discussing in detail the treatment of the different source of systematics involved in the fit. We introduce a recipe for taking care of systematics and use it to perform our extraction of light-cone PDFs.


Introduction
Parton Distribution Functions (PDFs), encoding the structure of the proton in terms of quarks and gluons, are one of the main ingredients required to do precise high-energy phenomenology. The available PDFs sets are extracted through global fits over experimental data [1][2][3][4][5]. Their non-perturbative nature makes them a natural candidate for a lattice QCD investigation, however it has been known for a long time that it is not possible to obtain them directly from first principle computations, due to the Euclidean metric of the lattice. In the last few years, several methods have been formulated [6,7], which would allow us to compute on the lattice specific quantities that, in turn, can be related to PDFs through a factorization theorem. For a detailed discussion of the theoretical background, we refer the reader to recent reviews such as Refs. [8][9][10][11].
Examples of such quantities are the equal time correlators underlying the definition of quasiand pseudo-PDFs [12,13], given by µ (z, P ) = P |ψ (0) (z) γ µ U (0) (z, 0) ψ (0) (0) |P , (1.1) with P denoting the momentum of the external proton states, while the suffix (0) reminds us that these are bare quantities. The matrix element of the vector bilocal operator of Eq. (1.1) can be decomposed in terms of two form factors which only depend on the Lorentz invariants z 2 and ν ≡ −z · P as M (0) µ (z, P ) = 2P µ M (0) ν, z 2 + z µ N (0) ν, z 2 . (1.2) As pointed out in [14], only the first form factor, M (0) , contains leading twist information. This can be seen by choosing a light-cone separation z = (0, z − , 0 ⊥ ) together with γ µ = γ + and P = (P + , 0, 0 ⊥ ), then we get + is not directly computable on a Euclidean lattice. We can define a different quantity that is amenable to lattice simulations by choosing a purely spatial separation, z = (0, 0, 0, z 3 ), together with γ µ = γ 0 and P = (E, 0, 0, The correlators defined in Eqs. (1.3) and (1.4) are known in the literature as (bare) Ioffe-time distribution (ITD) and pseudodistribution (pseudo-ITD) respectively [13,15]. For z 2 3 = 0, in addition to usual ultraviolet (UV) divergences (leading to coupling renormalization), they have specific link-related UV divergences, which are regularized by a finite lattice spacing a. Thus, The a → 0 UV divergences are multiplicatively renormalizable [16,17]. The relevant renormalization factor Z(z 2 3 , a 2 ) does not depend on ν and, for small z 2 3 , is known at one loop. Its explicit form is inessential if one introduces the so-called reduced Ioffe-time pseudo-distributions first defined in Ref. [13] as M ν, z 2 3 = M ν, −z 2 3 ; a 2 M 0, −z 2 3 ; a 2 . (1.5) The Z-factors of the numerator and denominator are the same and cancel in the ratio leaving the reduced distribution on the left-hand side without any residual dependence on the lattice spacing.
Working in the small-z 2 3 limit, the pseudo-ITD can be matched at one-loop level to the corresponding ITD through a finite perturbative kernel, expressing the pseudo-ITD in terms of the collinear PDFs through a factorization formula based on the operator product expansion (OPE). The computation of the relevant QCD diagrams has been performed in a number of independent papers. The original QCD computation is reported, for example, in Refs. [18][19][20][21]. A simple discussion of the basic features of the derivation of the factorization formula in nongauge theories can be found in Ref. [22].
The QCD result reads Eqs. (1.6), (1.7) allow to relate collinear PDFs to quantities which are computable in lattice QCD simulations, through a factorized expression similar to those relating collinear PDFs to physical cross sections. In the spirit of the "good lattice cross sections" proposed in Refs. [23,24], this formula can be used in a fitting framework, to extract PDFs from lattice data, performing the same kind of analysis which is usually done when considering experimental data. This approach was first studied in Ref. [25], and subsequently in Ref. [26][27][28]. In Ref. [27], it has been implemented within the NNPDF fitting environment. Considering data for quasi-PDFs matrix elements produced in Refs. [29,30] and starting from the momentum space factorization formula connecting quasi-PDFs to collinear PDFs, upon numerical implementation of the Fourier transform an expression analogous to Eq. (1.6) was obtained, relating parton distributions directly to position space quasi-PDFs matrix elements. A similar analysis was very recently performed by the JAM collaboration in Ref. [31] for the spin-averaged and spin-dependent PDFs employing quasi-PDF lattice data.
In the present work we perform an analogous exercise considering this time the reduced pseudo-ITD approach, implementing within the NNPDF environment the lattice data presented in Ref. [32,33], and using the position space factorization formula of Eqs. (1.6), (1.7). This, besides being a complementary exercise to the one performed in Ref. [27], has also some practical advantages. First, when working in the pseudo-ITD approach, the factorization is realized in the limit of small-z 2 . Unlike in the quasi-PDFs approach, where the factorization is realized for high values of P , here we are allowed to keep in the analysis data coming from a wide range of momentum values, without having to remove those with lower P . This advantage is particularly important, because in lattice QCD, the low momentum data are significantly more precise for a fixed computational cost. Second, we can directly use the position space factorization formula of Eq. (1.6), relying on the analytical expression for the perturbative coefficient of Eq. (1.7) and without having to perform the numerical Fourier transform described in the appendix A of Ref. [27].
In this article we extend the general strategy that has been developed within the NNPDF framework and which allows us to systematically extract parton distributions from the available lattice data. In the implementation of this idea once the lattice data have reached some level of maturity in terms of precision and systematic effects, one could combine data from all pertinent lattice formalisms such as quasi-distributions [26,29,30,[34][35][36][37][38][39][40][41][42][43][44][45][46][47][48][49][50] and pseudo-distributions [28,32,33,[51][52][53][54][55]. One can also include results from the so-called "Good Lattice Cross-Sections" (LCS) approach, which is described in [23] and represents a general framework, where one computes matrix elements that can be factorized into PDFs at short distances. Papers [56][57][58][59][60] describe implementations of the latter formalism. Clearly a global analysis only makes sense after having scrutinised each set of data individually, and having understood the systematics that affect them. The structure of the paper is as follows. In Sec. 2 we define the lattice observable considered in the fit, describe the corresponding data and briefly recall the main features of the NLO terms entering the factorization formulas. In Sec. 3 we present the first set of results: we consider the fits where only the statistical uncertainties of the lattice data are taken into account. Analyzing data from different lattice ensembles we show that, in general, without accounting for systematic effects it is not possible to obtain a good fit. In Sec. 4 we discuss and quantify some of the systematic uncertainties affecting the lattice data. We include them into the analysis and study their impact on the final PDFs and on the fit quality. Sec. 5 summarizes our conclusions.

Lattice data and observables
In this section we describe the lattice observables we will consider in the present work, together with the corresponding data. By lattice observable, we mean a quantity which can be computed on the lattice on one hand, and related to some collinear PDFs through some kind of factorization theorem on the other. We will consider two different observables corresponding to the real and imaginary part of the reduced pseudo-ITD defined in Eq. (1.5).
Considering the case of the unpolarized isovector parton distribution and recalling the definition of the two nonsinglet PDFs V 3 and T 3 taking the real and complex parts of Eq. (1.6) and using Eq. (1.7), we can define the two lattice observables where the kernels B (w) and L (w), according to Eq. (1.7), are given by It is worth recalling some important features of the NLO coefficients given in Eqs. (2.5), (2.6).
The contributions proportional to the two kernels B (w) and L (w) of Eqs. (2.7), (2.8) can be seen as an evolution and a scheme change term respectively [32,61]: while the former is responsible for the evolution from the PDF scaleẑ to the pseudo-ITD scale z 2 , the latter takes into account the finite terms characterizing the specific choice of the renormalization scheme. They are plotted in Fig. 2.1 for both the real and imaginary part, using the PDFs set NNPDF31 nlo as 0118 as input. The evolution term B (w) also connects pseudo-ITD points having different values of z 2 : considering for example the real part, from Eqs.
which relates the real part of the pseudo-ITD point at the scale z 2 with the one having the same Ioffe time at the scale z 2 0 [51,62]. In the present work, we will consider the data for reduced pseudo-ITD from Refs. [32,33]: the datasets presented in Ref. [32] have been produced starting from three different lattice ensembles, denoted as fine, big and coarse and which differ for the volume and lattice spacing used in the simulations. They have been produced using values of the pion mass ranging from 390 MeV (fine) to 415 MeV (coarse and big). In the present work we will focus on the datapoints produced from the fine ensemble, while those from the coarse and big ones will be used to estimate systematic effects due to continuum limit and finite lattice volume. We will also consider pseudo-ITD points presented in Ref. [33], produced at the physical pion mass. Following the original convention of Ref. [33] we will denote the corresponding lattice ensemble as 170. These four ensembles of 2 + 1 flavor lattice QCD were generated by the JLab/W&M collaboration using clover Wilson fermions and a tree level tadpole-improved Symanzik gauge action. One iteration of stout smearing with the weight ρ = 0.125 for the staples is used in the fermion action. A direct consequence of the stout smearing is that the value of the tadpole corrected tree-level clover coefficient c SW used is very close to the non-perturbative value determined, a posteriori, using the Schrödinger functional method. The detailed features of these ensembles are reported in Tab. 2.1, together with the number of reduced pseudo-ITD datapoints n dat computed from each of them.
Given a set of lattice data for the real and imaginary part of the reduced pseudo-ITD, the distributions T 3 and V 3 can be extracted from them through a standard minimum-χ 2 fit, following the approach described in Refs. [22,27]: the unknown x-dependence of the PDFs is parameterized at the chosen scale µ 2 , using a suitable parametric form, whose best parameters  In this work we will perform this exercise using the NNPDF fitting framework, running the same machinery commonly used to extract PDFs from experimental data, already applied to lattice results in Ref. [27]. In the following we briefly recall its main relevant features, referring to Ref. [27] for more details. The x-dependence of the distributions f q (x) (V 3 and T 3 in our case) is parameterized through a neural network NN q multiplied by a preprocessing polynomial factor, as α q , β q being additional free parameters to be determined during the fit, alongside the weights and biases defining the neural network. Denoting the free parameters of the model as θ, the best fit is determined minimizing the χ 2 function, defined as where O (z i ) denotes the measured lattice observable and O th (z i , θ) is the corresponding theoretical prediction, expressed using the matching coefficients of Eqs. Refs. [63,64] in the context of global QCD fits, and currently used within the NNPDF code to obtain all the required theoretical predictions in a global fit. The covariance matrix entering Eq. (2.11) describes the distribution of the data, and takes into account the statistical and systematic uncertainties and their correlations. It enters both the definition of the χ 2 and the generation of Monte Carlo replicas [65], being therefore important for both the central value of the fit and the final PDFs error. A solid knowledge of the covariance matrix is therefore an essential ingredient to get reliable results. The minimization of the χ 2 is performed numerically: different algorithms can be implemented, here the CMA algorithm [66] is used, employing a cross-validation technique to avoid overfitting. The specific code used in the present work is the one employed for the production of the PDFs set NNPDF31 [1], together with the ReportEngine software [67].
The NNPDF methodology has been used to produce PDF sets for many years now, and provides a flexible environment within which it has been possible to fit more than 4000 experimental points, coming from a variety of different high energy processes in different kinematic ranges [1,65]. Therefore it represents a reliable framework which can be used to study and analyze the available lattice data, to assess how well these are able to constrain the PDFs and to compare lattice results with those coming from standard PDF sets. It is important to emphasise once again that in this analysis, once the FastKernel tables have been generated, the lattice data are treated exactly on the same footing as any other data, viz. the exact same methodology and code are used for fitting experimental and lattice data.

Fits over lattice data: statistical uncertainties only
In this section, we will present results for fits performed over the lattice data computed from the ensembles fine and 170, denoted as fine-stat and 170-stat respectively. Such fits have been produced considering statistical uncertainties only. We will show how, in general, without having the complete information regarding the lattice systematic uncertainties it is not always possible to obtain a good fit. In the next section, taking as example the case of the fine ensemble, we will discuss and estimate some of the possible systematic effects, studying their impact on the fit quality and on the resulting PDFs.
Parton distributions resulting from fits fine-stat and 170-stat, together with the corresponding error bands, are plotted in the upper and lower plots of Fig. 3.1, and the χ 2 values are reported in Tab. 3.1: despite the PDFs extracted from the two datasets are compatible within one σ, the error band of the fit fine-stat appears to be slightly smaller than the other, with an average χ 2 value per datapoint equal to 8.36, pointing out a possible underestimation of the error and a bad fit quality. This could be caused by inconsistencies between different datapoints, due to unknown systematic uncertainties affecting them. On the other hand, the fit 170-stat shows better χ 2 values, with an average value per datapoint equal to 1.38.
Focusing on the more problematic case of the fine ensemble results, in order to assess which points are more likely to be affected by large systematic errors, we will study the contribution to the χ 2 coming from each datapoint

1)
D i and T i being the i-th lattice point and the corresponding prediction from the fit respectively, and find out which points D i are more than 4σ (or 3σ) off from the fitted distribution T i . These are the points that, most likely, do not belong to the fitted distribution and which therefore might be affected by larger systematic effects.

The contributions
√ δ i are plotted in the upper plot of Fig. 3.2 as a function of the Ioffetime ν, with the red and yellow lines highlighting the 4σ and 3σ cut respectively: it is clear that a bunch of points having small Ioffe-time values are those giving the highest contribution to the total χ 2 , being more than 3σ or 4σ off. We can implement 4σ and 3σ cuts, removing the problematic points from the dataset and producing new fits, denoted as fine-stat-3σ and fine-stat-4σ: the new fits show more reasonable χ 2 values, reported in Tab. 3    √ δ i contributions for each datapoint of teh fine ensemble. The red and yellow lines highlight the 4σ and 3σ cut respectively. Lower plots: PDFs from fits fine-stat (orange) and fine-stat-3σ (green), normalized to the former. upon removing the outliers, the remaining points, coming from a wide range of momentum p and Euclidean separation z 3 , are fitted reasonably well. The PDFs resulting from the 3σ cut are plotted in the lower plot of Fig. 3.2, normalized to the fit without any cuts: it is clear how, despite spoiling the total χ 2 , the problematic points do not seem to have a big impact on the final PDFs.
We conclude that, depending on the specific lattice ensemble we consider, quite a high number of small Ioffe-time points do not belong to the fitted distribution. In order to get reasonable χ 2 values, such points have to be removed from the fit. This highlights possible tensions between datapoints and may point out the presence of systematic effects. In order to avoid any underestimation of the PDFs error and to introduce back in the analysis all the available points, systematic uncertainties need to be quantified and implemented in the fit.

Systematic effects 4.1 Discussion
The high χ 2 values of the fits presented in the previous section might point out the presence of some tensions between datapoints. In the following, focusing on the case of the fine ensemble results, we will show that this is indeed the case, and we will investigate possible sources of systematic uncertainties and their numerical values.
The matrix element defining the pseudo-ITD is a function of the Ioffe-time ν and of the scale z 2 . Points having the same Ioffe-time but different Euclidean separation can be related through Eq. (2.9), which can be used to evolve each pseudo-ITD point up to a chosen reference scale z 2 0 = (0.7 a) 2 . Looking at Fig. 2.1 it is clear that, given this choice for z 0 , the sign of the NLO correction of Eq. (2.9) will be positive for every datapoint, so that the evolution increases the real part of the pseudo-ITD. Considering the imaginary part, the sign of the NLO evolution term is initially negative, and it turns positive at bigger values of ν. Such effects can be seen in Fig. 4.1, where the pseudo-ITD points computed from the fine ensemble are plotted before (blue) and after evolution (red). After evolution, points having the same Ioffe time should have  the same value. In practice, they should be compatible within errors. Looking at the red points of Fig. 4.1, where each point is plotted with the corresponding statistical uncertainty, it is clear how, expecially in the small Ioffe time region, this is not always the case: after evolution, some points having the same Ioffe time are not compatible between each other. Such discrepancies might be explained by the presence of systematic effects we are not accounting for. A proper investigation of the systematic effects affecting the computation of the equal time correlators underlying the definition of pseudo-PDFs is a difficult and expensive task which would require to run different lattice simulations varying a set of parameters, like for example the lattice spacing, the lattice volume, the pion mass. Alongside systematic effects due to the lattice simulation, other sources of errors are those connected to the theoretical framework of the pseudo-PDFs approach, like the presence of higher twist effects and perturbative matching truncation effects. A detailed discussion of many of these uncertainties, together with a series of possible scenarios for their numerical values, can be found for example in Ref. [27].
As mentioned in Sec. 2, in Ref. [32] additional pseudo-ITD points were computed starting from other two lattice ensembles, with pion mass similar to that of the fine one, but having different volume and lattice spacing, denoted as big and coarse, whose features are reported in Tab. 2.1. Systematic uncertainties due to the continuum limit (CL) and finite volume (FV) can be directly estimated using these additional results as detailed in Ref. [32]: the real and imaginary components of the pseudo-ITD are fitted to a polynomial as a function of the Ioffetime ν; the difference between coarse and fine ensemble results is taken as an estimate for lattice spacing effects as a function of ν, while the analogous difference considering the coarse and big ensembles gives an estimate for uncertainties due to finite lattice volume. These differences will be considered as two independent sources of correlated systematic, affecting each datapoint entering the analysis. They are shown in the upper plots of Fig. 4.2 as functions of the Ioffe-time, denoted as FV (finite volume) and CL (continuum limit).
It is important to understand whether or not these systematic uncertainties are enough to account for the discrepancies described at the beginning of the section. In the lower plots of Fig. 4.2 FV and CL systematic effects are plotted for the relevant Ioffe-time values, together with the aforementioned discrepancies. Consistently with what observed previously, the latter seem to affect mostly low Ioffe-time points, which are also those for which the estimated finite volume and continuum limit systematics reach their minimum values. Therefore from Fig. 4.2 it follows that FV and CL systematics cannot be considered responsible for the big contributions to the χ 2 noted in the fits of the previous section. In other words, they are likely not enough to account for the observed discrepancies affecting low Ioffe-time points. It should be noted that a study of more than 2 ensembles for each systematic error may be necessary for a more definitive conclusion. Upper plots: finite volume (FV) and continuum limit (CL) systematics provided as functions of the ioffe-time ν for the real (left) and imaginary (right) part of the matrix element. Lower plots: discrepancies between data having the same ioffe-time (red) together with the total FV and CL systematic effects (blue).
Excited states contaminations might represent another possible source of systematic effects. Also missing higher orders in perturbation theory and higher twist effects could in principle be treated as additional systematic uncertainties. Unlike the case of the FV and CL systematic uncertainties discussed above, we cannot estimate the size of such effects using the current lattice results. One could then follow the approach adopted in Ref. [27], where different scenarios for the size of such systematics have been considered, and try to draw conclusions about their impact on the PDFs and on the fit quality. Here we will follow a different approach, trying to quantify an additional uncertainty which accounts for the unknown missing systematic effects, following a Bayesian approach as detailed in the following.
The figure of merit which is minimized during a Gaussian fit is defined as the probability of the data D given the model parameters θ, namely the likelihood where Σ is the covariance matrix of the data D, accounting for the known statistical and systematic uncertainties, and T (θ) is the theoretical prediction, function of the model parameters.

(4.2)
Assuming a Gaussian prior distribution P (∆) = exp − 1 2 ∆ TΣ−1 ∆ we can marginalize over ∆ getting which defines the relevant likelihood to be minimized. Eq. (4.3) shows how the presence of unknown systematic effects can be accounted for by introducing in the likelihood an additional contribution to the covariance matrix, denoted byΣ, which defines the prior probability distribution of these systematics. Its specific definition is of course arbitrary, and depends on the knowledge of the missing uncertainties we have. This Bayesian approach, despite not providing a general method to estimate the missing systematics, allows to include in the analysis the partial information we may have about them. It has already been applied in different physical problems, when the data are affected by unknown sources of systematics: in the case of global QCD analysis, in Refs. [68,69], a suitable covariance matrixΣ was defined by mean of scale variations, in order to take into account the theoretical error due to missing higher orders, while in Ref. [70] a similar approach was applied to cosmological data. In our case, we only know the discrepancies observed at the beginning of this section, not described by continuum limit and finite volume effects. We can look at such discrepancies as an indication of the minimal size of the systematic effects affecting the data and use them to construct a suitableΣ: for each couple of points having a given Ioffe-time value, we will define the two corresponding diagonal components ofΣ as half of the distance between evolved points, setting the off diagonal elements to zero. Each point sharing the same Ioffe time value with at least another one will therefore be affected by an additional, uncorrelated systematic such that, after evolution, datapoints having the same Ioffe-time will be compatible between each other. Clearly, this global, uncorrelated systematic will be the dominant one for small Ioffe-time points, where most of the problematic points are, while for higher value of ν lattice spacing and finite volume effects will dominate.

Results
To sum up, in Sec. 4.1, we have discussed and estimated three different source of systematics: the first two, accounting for finite volume and lattice spacing effects, can be computed directly from the available lattice results as a function of the Ioffe-time ν, and will be implemented in the fit as two independent source of correlated systematics; the third one has been estimated using the size of the discrepancies observed between points having the same Ioffe-time, and will be considered as an additional uncorrelated uncertainty, in order to take into account the minimal size of all the remaining systematic effects we have not directly computed. As mentioned in Sec. 2, such systematics enter the definition of the covariance matrix responsible for both replicas generation and the χ 2 definition, and therefore it has a central role in both the determination of the fit central value and its error band. The new fit is denoted as fine-sys and the resulting PDFs are plotted in Fig. 4.3, together with the results from the fit fine-stat presented Sec. 3: while the distribution T 3 is basically unaffected by the introduction of the systematic errors, both the central value and the error band of V 3 change, with an overall down shift of the former and a sizable increase of the latter. The χ 2 values are reported in Tab. 4.1: the average value per datapoint is now 1.29, showing a good fit quality.
Despite it is probably too early to draw comparisons between our results and phenomenological distributions, it is interesting to see how they look when plotted together: given the fact that nowadays V 3 and T 3 are very well constrained by experimental data, the discrepancies we observe between lattice and phenomenological results might be a good indication of the size of the systematic we are still missing, highlighting specific x-region where the lattice PDFs error might have been underestimated. In Fig. 4

Conclusions
In the present paper, we have considered the pseudo-ITD data produced in Refs. [32,33]. Using the position space factorization theorem relating such data to collinear PDFs, we have extracted two nonsinglet distributions within the NNPDF framework.
After extracting PDFs from different data sets and considering statistical uncertainties only, we have shown that in one of the cases considered, the fit quality appears to be really poor, pointing out the need for a detailed knowledge of the systematic effects. Using the results of Ref. [32] we have directly estimated those connected to finite volume and lattice spacing effects. As for systematic uncertainties which cannot be directly computed from lattice results (like for example truncation effects and higher twist corrections), starting from the observed discrepancies between low Ioffe-time points we have used a Bayesian approach to introduce an additional systematic which allows us to mitigate the tensions between the problematic datapoints, using the partial pieces of information which are available to us.
The Bayesian approach however is not completely satisfying, since it relies on a partial knowledge of the missing uncertainties and requires to make a number of assumptions about them. More work has to be done to achieve a detailed knowledge of the systematic uncertainties in lattice simulations: without a stringent control over them, it is not possible to draw reliable conclusions and to make comparisons with phenomenological distributions.
Finally, we stress once more that the analysis performed in this paper is complementary to that presented in Ref. [27], where quasi-PDFs matrix elements where considered instead, starting from the momentum space version of the factorization theorem. In both cases, results have been produced within the NNPDF environment, running the same machinery used for global QCD analysis over experimental data. The next logical step might be a global lattice QCD fit within this same framework, where data for multiple lattice observables coming from different simulations are simultaneously included in the analysis.