Determining the helicity structure of the nucleon at the Electron Ion Collider in China

Understanding how sea quarks behave inside a nucleon is one of the most important physics goals of the proposed Electron-Ion Collider in China (EicC), which is designed to collide a 3.5 GeV polarized electron beam (80% polarization) with a 20 GeV polarized proton beam (70% polarization) at an instantaneous luminosity of $2 \times 10^{33} {\rm cm}^{-2} {\rm s}^{-1}$. A specific topic at the EicC is to understand the polarization of individual quarks inside a longitudinally polarized nucleon. The potential of various future EicC data, including the inclusive and semi-inclusive deep inelastic scattering data from both doubly polarized electron-proton and electron-$^3{\rm He}$ collisions, to reduce the uncertainties of parton helicity distributions is explored at next-to-leading order in QCD, using the Error PDF Updating Method Package ({\sc ePump}), which is based on the Hessian profiling method. We show that the semi-inclusive data provide good separation between flavour distributions and constrain their uncertainties in the $x>0.005$ region, especially when electron-$^3{\rm He}$ collisions, acting as effective electron-neutron collisions, are taken into account. To enable this study, we have generated a Hessian representation of the DSSV14 set of PDF replicas, named DSSV14H PDFs.


Introduction
Understanding the helicity structure of the nucleon in terms of quark and gluon degrees of freedom is of fundamental interest in modern hadronic physics. In the naive parton model, the proton spin is considered to originate from its three valence quarks. This naive picture was first challenged by the pioneering measurements of polarized deep inelastic scattering (DIS) performed by the EMC experiment in the late 1980s [1], in which the spin carried by the three valence quarks was shown to be much less than the expected 1/2. Since then, tremendous experimental progress has been made over the past 32 years, including fixed-target experiments of polarized lepton-proton and lepton-ion scattering at SLAC, CERN, DESY, and JLab [2], as well as the seminal spin program of polarized proton-proton and proton-helium collisions at the Relativistic Heavy Ion Collider (RHIC) [3].
Identifying the gluon spin contribution and the flavour-separated quark helicity distributions are key steps toward precisely pinning down the proton spin configuration. The first evidence of gluon polarization inside the proton [4] came from the RHIC spin program, particularly from the measurement of jet production in polarized proton-proton collisions [5]; however, no reliable conclusion could be drawn on the exact gluon contribution to the proton spin because of the limited kinematic coverage. Another interesting measurement at RHIC was charged weak vector boson production in polarized proton-proton collisions [6,7], which was expected to provide a sensitive channel for determining the flavour-separated helicity distributions. Again, because of the limited kinematic coverage, no conclusive statement could be made. As a result, the parton helicity distributions were left largely unconstrained.
It has long been recognized that parton helicity distributions can be probed in polarized lepton-nucleon scattering. Recently, there have been several proposals worldwide to build a new generation of polarized Electron-Ion Colliders (EIC), such as an EIC at Brookhaven National Laboratory [8] and an EIC in China [9]. The EIC machine in the US is designed to probe the parton helicity distributions in a relatively small $x$ region compared with the ongoing experiments at JLab. The Electron-Ion Collider in China (EicC) is proposed to be built on an upgraded heavy-ion accelerator, the High-Intensity heavy-ion Accelerator Facility (HIAF), which is currently under construction in Guangdong Province of China, together with an additional electron ring. The EicC is designed to cover a center-of-mass energy range from 15 to 20 GeV with a luminosity of about $2 \times 10^{33} {\rm cm}^{-2} {\rm s}^{-1}$ in electron-proton collisions, and aims to bridge the kinematic coverage between the EIC and JLab. In addition to a polarized electron beam with a polarization of 80% and a polarized proton beam with a polarization of 70%, it also plans to offer polarized light-ion beams such as $^3{\rm He}$ with a polarization of 70%.
The design of the EicC offers unprecedented opportunities to study the spin structure of the nucleon. In this paper, we present a quantitative study of the impact that future EicC inclusive and semi-inclusive deep inelastic scattering (SIDIS) measurements will have on the determination of various helicity distributions inside the nucleon, as well as on the determination of their contributions to the proton spin. In our study, we use simulated DIS and SIDIS data from electron-proton and electron-$^3{\rm He}$ collisions in order to disentangle the constraining power that each combination of initial and final state has on the different flavour helicity distributions. In the course of the discussion, it will become apparent that both proton targets and effective neutron targets, such as $^3{\rm He}$, together with identified final states of well-known flavour content, such as pions and kaons, are needed to achieve a consistent reduction of uncertainties across the different helicity distributions. We will also show that the high accuracy with which these processes are planned to be measured at the future EicC will allow for unprecedented precision in the extraction of quark helicity distributions in the region usually referred to as the "sea-quark" region.
Apart from the intrinsic advancement of our knowledge of the proton spin content that comes with the improved overall precision of helicity distributions, reaching such precision in the sea sector also means that the EicC will help to clarify some intriguing problems and phenomena observed in the previous experiments, such as the asymmetry in the distribution of polarized light sea quarks, and the polarization of strange quarks inside a polarized nucleon, etc. [10][11][12][13][14][15].
The paper is organized as follows. In the next section, we introduce the theoretical framework used to compute the Double-Spin-Asymmetry observables in DIS and SIDIS processes. In Sec. 3 we present the methodology applied to generate EicC pseudo-data for the different processes considered. Our main discussions and findings are collected in Sec. 4. After introducing the ePump tool [16,17] used to perform our analysis, we summarize in Sec. 4.1 the procedure used to convert replica sets of Parton Distribution Functions (PDFs) into equivalent Hessian sets, a step needed in order to use the chosen replica PDF set within the Hessian framework of the ePump software. The theoretical framework upon which ePump operates is summarized in Secs. 4.2 and 4.3. In Sec. 4.4 we describe the specific choices that went into making the theory and data tables fed to ePump. In Sec. 4.5 we present and discuss our main results. The sensitivity of helicity distributions to specific data samples is discussed in a more quantitative manner in Sec. 4.6. We conclude by summarizing our findings and remarks in Sec. 5.

Polarized Lepton-Nucleon scatterings to access helicity distributions
In the Deep-Inelastic Scattering (DIS) process, $e + p(n) \to e + X$, an electron scatters off a nucleon such as a proton (neutron), which breaks up into unobserved hadronic remnants $X$, while the scattered electron is detected. Making use of both longitudinally polarized electron and proton (or effective neutron) beams, a possible observable that can be measured in this process is the so-called Double-Spin-Asymmetry (DSA):
$$ A_{LL} = \frac{1}{P_e P_p}\,\frac{N^{++} - N^{+-}}{N^{++} + N^{+-}}, \tag{2.1} $$
where the superscripts "+" and "−" denote the helicities of the two beams, respectively, $P_e$ ($P_p$) is the polarization of the electron (proton) beam, and $N$ is the luminosity-normalized number of events in a specific spin orientation state. To discuss the experimental observables, it is necessary to define the kinematic variables. With $k$ and $k'$ denoting the four-momenta of the incoming and outgoing electron, $p$ the four-momentum of the incoming nucleon, and $q = k - k'$ the four-momentum transfer, one can define the following kinematic variables:
$$ Q^2 = -q^2 = -(k - k')^2, \tag{2.2} $$
$$ x = \frac{Q^2}{2\,p\cdot q}, \tag{2.3} $$
$$ y = \frac{p\cdot q}{p\cdot k}, \tag{2.4} $$
$$ W_h = \sqrt{(p + q)^2}. \tag{2.5} $$
The measured DSA can be written approximately as
$$ A_{LL} \approx D(y)\,A_1, \qquad D(y) = \frac{y(2-y)}{y^2 + 2(1-y)(1+R)}, $$
in a kinematic region where $x$ is small while the momentum transfer $Q^2$ is relatively high [18,19]. The factor $R$ is the ratio of the cross sections for the absorption of a longitudinally polarized and a transversely polarized virtual photon by a nucleon, $R = \sigma_L/\sigma_T$. The asymmetry $A_1$ is related to the longitudinal spin structure function $g_1$ by
$$ A_1 = \frac{g_1}{F_1} = \frac{2x(1+R)\,g_1}{F_2}, $$
where $F_2$ or $F_1$ denotes the spin-independent structure function. In the inclusive DIS process, where only the scattered electron is detected, the $F_1$ and $g_1$ structure functions can be expressed at leading order (LO) in the parton model, for $Q$ much smaller than the $Z$ boson mass, as
$$ F_1(x,Q^2) = \frac{1}{2}\sum_q e_q^2\, q(x,Q^2), \qquad g_1(x,Q^2) = \frac{1}{2}\sum_q e_q^2\, \Delta q(x,Q^2), $$
where the sums run over quark and antiquark flavours with electric charge $e_q$, and $q$ and $\Delta q$ denote the unpolarized and helicity parton distribution functions, respectively.
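As a quick numerical illustration of the kinematic variables introduced above, the following minimal Python sketch evaluates the squared center-of-mass energy, $Q^2$ and $W^2$ for the nominal EicC beam energies. It assumes head-on collisions with the beam masses neglected in $s$ (so that $s \approx 4E_eE_p$ and $Q^2 \approx x\,y\,s$); the function name `dis_kinematics` and the chosen $(x, y)$ point are illustrative and are not taken from the analysis.

```python
import math

PROTON_MASS = 0.938272  # GeV

def dis_kinematics(e_electron, e_proton, x, y):
    """Return (s, Q2, W2) in GeV^2 for a head-on e-p collision.

    Assumption: beam masses are neglected in s, so 2 p.k ~ s.
    """
    s = 4.0 * e_electron * e_proton            # squared c.m. energy
    q2 = x * y * s                             # from x = Q^2/(2 p.q), y = p.q/p.k
    w2 = PROTON_MASS**2 + q2 * (1.0 - x) / x   # (p+q)^2 = M^2 + 2 p.q - Q^2
    return s, q2, w2

# Nominal EicC configuration: 3.5 GeV electrons on 20 GeV protons.
s, q2, w2 = dis_kinematics(3.5, 20.0, x=0.05, y=0.5)
print(f"sqrt(s) = {math.sqrt(s):.2f} GeV, Q2 = {q2:.2f} GeV^2, W2 = {w2:.2f} GeV^2")
```

The resulting $\sqrt{s}$ falls inside the 15 to 20 GeV range quoted for the EicC design.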
On the other hand, in the SIDIS process, where a leading hadron such as $\pi^\pm$ or $K^\pm$ is detected in addition to the scattered electron, the measured asymmetry $A_1^h$ is related to the corresponding semi-inclusive unpolarized and longitudinal spin structure functions, $F_1^h$ and $g_1^h$, which can be expressed at LO in the parton model, for $Q$ much smaller than the $Z$ boson mass, as
$$ F_1^h(x,z,Q^2) = \frac{1}{2}\sum_q e_q^2\, q(x,Q^2)\, D_{q\to h}(Q^2,z), \qquad g_1^h(x,z,Q^2) = \frac{1}{2}\sum_q e_q^2\, \Delta q(x,Q^2)\, D_{q\to h}(Q^2,z), $$
where the sums run over quark and antiquark flavours. Here $D_{q\to h}(Q^2,z)$ describes the fragmentation of a quark $q$ into a hadron $h$, and $z$ is the fraction of the produced quark's momentum carried by the final-state hadron, whose four-momentum is denoted $P_h$. Experimentally it is defined as $z = (P_h\cdot p)/(q\cdot p)$. As the LO expressions show, the final-state hadron in SIDIS weights the flavours of the initial-state quark differently from inclusive DIS. Hence, SIDIS processes provide a powerful way to separate single-flavour distributions. In electron-proton collisions, the $\pi^\pm$ and $K^\pm$ SIDIS processes give four sets of data. In addition, using polarized $^3{\rm He}$ as an effective neutron target offers four more sets of data. The LO expressions for $A_1^h$ of these eight data sets can be found in Appendix A.
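The flavour-weighting argument can be made concrete with a toy calculation. The sketch below evaluates the LO ratio $A_1^h = \sum_q e_q^2 \Delta q\, D_{q\to h} / \sum_q e_q^2 q\, D_{q\to h}$ at a single kinematic point. All numerical inputs (PDF values, helicity values, fragmentation functions) are hypothetical placeholders chosen only for illustration; they are not DSSV14 values or any fitted fragmentation set.

```python
# Squared electric charges (in units of e^2) for the light flavours.
CHARGE2 = {"u": 4/9, "ubar": 4/9, "d": 1/9, "dbar": 1/9, "s": 1/9, "sbar": 1/9}

# Toy values at one fixed (x, Q^2): hypothetical numbers, not from any real fit.
UNPOL = {"u": 0.60, "d": 0.30, "ubar": 0.10, "dbar": 0.12, "s": 0.05, "sbar": 0.05}
HELICITY = {"u": 0.25, "d": -0.10, "ubar": 0.01, "dbar": -0.02, "s": -0.005, "sbar": -0.005}

# Toy fragmentation functions at one fixed z; "favoured" flavours are the
# valence quarks of the produced pion (u, dbar for pi+; d, ubar for pi-).
FF_PI_PLUS = {"u": 0.40, "dbar": 0.40, "d": 0.15, "ubar": 0.15, "s": 0.10, "sbar": 0.10}
FF_PI_MINUS = {"d": 0.40, "ubar": 0.40, "u": 0.15, "dbar": 0.15, "s": 0.10, "sbar": 0.10}

def lo_sidis_asymmetry(helicity, unpol, ff):
    """LO parton-model ratio g_1^h / F_1^h at a single (x, z, Q^2) point."""
    num = sum(CHARGE2[q] * helicity[q] * ff[q] for q in CHARGE2)
    den = sum(CHARGE2[q] * unpol[q] * ff[q] for q in CHARGE2)
    return num / den

a_pi_plus = lo_sidis_asymmetry(HELICITY, UNPOL, FF_PI_PLUS)
a_pi_minus = lo_sidis_asymmetry(HELICITY, UNPOL, FF_PI_MINUS)
print(f"A1(pi+) = {a_pi_plus:.3f}, A1(pi-) = {a_pi_minus:.3f}")
```

With these toy inputs the $\pi^+$ asymmetry is dominated by the positively polarized $u$ quark, while the $\pi^-$ asymmetry receives a relatively larger contribution from the negatively polarized $d$ quark, which is why the two channels constrain different flavour combinations.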
Beyond LO, factorization of short-distance (high-energy) and long-distance (low-energy) interactions in DIS and SIDIS allows one to write the previous LO expressions in an all-order form. For instance, the spin-dependent structure function $g_1^h$ can be written as
$$ g_1^h(x,z,Q^2) = \frac{1}{2}\sum_{f,f'} \int_x^1 \frac{d\hat{x}}{\hat{x}} \int_z^1 \frac{d\hat{z}}{\hat{z}}\; \Delta f\!\left(\frac{x}{\hat{x}},\mu\right) D_{f'\to h}\!\left(\frac{z}{\hat{z}},\mu\right) \Delta C_{f'f}\!\left(\hat{x},\hat{z},\frac{Q}{\mu},\alpha_s(\mu)\right), $$
where with $\mu$ we collectively denote all factorization and renormalization scales, $\hat{x}$ and $\hat{z}$ are the partonic counterparts of the hadronic variables $x$ and $z$, and $\Delta C_{f'f}$ are spin-dependent coefficient functions. Similar expressions can be written for the inclusive $g_1$ by removing the fragmentation functions. For the unpolarized cases, such as $F_1^h$ and $F_1$, one has to use unpolarized parton distributions and unpolarized coefficient functions accordingly.
Moreover, using perturbation theory, the coefficient functions can be expanded in powers of the strong coupling constant $\alpha_s$. For example,
$$ \Delta C_{f'f} = \Delta C^{(0)}_{f'f} + \frac{\alpha_s(\mu)}{2\pi}\,\Delta C^{(1)}_{f'f} + \mathcal{O}(\alpha_s^2), $$
and its LO expression yields
$$ g_1^h(x,z,Q^2) = \frac{1}{2}\sum_q e_q^2\, \Delta q(x,Q^2)\, D_{q\to h}(Q^2,z), $$
where we have again set $\mu^2 = Q^2$. Explicit expressions for the polarized and unpolarized coefficient functions can be found for SIDIS in [20][21][22][23][24][25][26] and for DIS in [25,27]. In the following sections, the impact of the EicC SIDIS data sets on the various helicity distributions at NLO will be discussed in detail.

Description of the Pseudo-data
The EicC accelerator is optimized to collide a 3.5 GeV electron beam with a 20 GeV proton beam (or a 40 GeV $^3{\rm He}$ beam). The pseudo-data were produced according to this design. The $Q^2$-$x$ coverage of the DIS process at the EicC, together with the coverage of an optional energy configuration at the US EIC and of the JLab-12 experiments, is shown in Fig. 1. As mentioned above, the instantaneous luminosity at the EicC is about $2 \times 10^{33} {\rm cm}^{-2} {\rm s}^{-1}$ per nucleon, which means that about 50 ${\rm fb}^{-1}$ of integrated luminosity will be accumulated in 10 months of running, before accounting for beam delivery and detector efficiency.
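The quoted integrated luminosity can be checked with a one-line conversion. The sketch below assumes 30-day months and continuous running at the nominal instantaneous luminosity (no beam-delivery or detector inefficiency), which are simplifying assumptions made here for the arithmetic only.

```python
SECONDS_PER_DAY = 86400
FB_INV_PER_CM2_INV = 1e-39  # 1 fb = 1e-39 cm^2, so 1 cm^-2 = 1e-39 fb^-1

def integrated_luminosity_fb(inst_lumi_cm2_s, days):
    """Integrated luminosity in fb^-1 for continuous running."""
    return inst_lumi_cm2_s * days * SECONDS_PER_DAY * FB_INV_PER_CM2_INV

# 10 months of 30 days each at 2e33 cm^-2 s^-1 (assumed 100% duty factor).
lumi = integrated_luminosity_fb(2e33, days=10 * 30)
print(f"{lumi:.1f} fb^-1")
```

The result is slightly above 50 ${\rm fb}^{-1}$, consistent with the round number used in the text.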
The DJANGOH event generator [28] was employed to produce the pseudo-data. It has been widely used at HERA and then modified to accommodate the needs of the EIC community for various simulations. DJANGOH can simulate deep inelastic lepton-nucleon (nuclei) scattering including both QED and QCD radiative effects. It is an interface of the Monte-Carlo programs HERACLES [29] and LEPTO [30]. The HERACLES can treat the electron-proton scattering using either parametrized structure functions or PDFs in the framework of the quark-parton model. The LEPTO does the integration on electroweak cross-sections and, based on the cross-section, it simulates lepton-nucleon scattering with hadronic final states by using the JETSET library [31].
Once the pseudo-data were produced, the following cuts were applied: $Q^2 > 2\ {\rm GeV}^2$, $W^2 > 12\ {\rm GeV}^2$, $0.05 < y < 0.8$, and $0.05 < z < 0.8$. Afterwards, the data were binned in two dimensions, $x$ and $Q^2$, as shown in Fig. 2. In each $x$-$Q^2$ bin, a log-likelihood was defined in terms of the expected yield,
$$ {\rm yield} = (1 + \lambda P_e P_p D(y) A_1)\cdot\sigma_0\cdot L\cdot A, \tag{3.2} $$
and its normalization. Here $\lambda$, with value $\pm 1$, denotes the different spin combinations in Eq. (2.1), $A$ is the detector acceptance, and $L$ is the integrated luminosity. If the acceptance is the same for the $\lambda = \pm 1$ states, the uncertainty of the $A_1$ measurement in a particular bin follows from maximizing the log-likelihood. For electron-proton collisions, the resulting uncertainty projection on $A_{1p}$, where $p$ denotes proton data, can be used directly in the following impact study. For e-$^3{\rm He}$ collisions, after the uncertainty projection on $A_{1\,^3{\rm He}}$ is obtained, the dilution factor needs to be considered in order to get the $A_{1N}$ projection, where $N$ denotes neutron.

Figure 2. The $Q^2$-$x$ coverage of the pseudo-data points for electron-proton and electron-$^3{\rm He}$ collisions after kinematic cuts. In electron-proton (or electron-$^3{\rm He}$) collisions, there are 5 sets of data: inclusive DIS, $\pi^\pm$ and $K^\pm$ SIDIS.
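To give a feeling for the size of the projected statistical uncertainty, the sketch below works out the maximum-likelihood error on $A_1$ for a simple counting model. It is a sketch under stated assumptions, not necessarily the exact expression used in the analysis: the $N$ events in a bin are assumed to split binomially between the two spin states with probabilities $(1 \pm P_e P_p D(y) A_1)/2$, the acceptance is taken to be spin-independent, and the depolarization factor is assumed to take the common small-$x$ form $D(y) = y(2-y)/[y^2 + 2(1-y)(1+R)]$. The function names and default values are illustrative.

```python
import math

def depolarization(y, r=0.0):
    """Assumed small-x form of the photon depolarization factor D(y)."""
    return y * (2.0 - y) / (y * y + 2.0 * (1.0 - y) * (1.0 + r))

def delta_a1(n_events, y, a1=0.0, p_e=0.8, p_p=0.7, r=0.0):
    """Statistical error on A_1 for a binomial split of n_events.

    With raw asymmetry a_raw = P_e P_p D(y) A_1, the maximum-likelihood
    error on a_raw is sqrt((1 - a_raw^2) / N); dividing by P_e P_p D(y)
    propagates it to A_1.
    """
    scale = p_e * p_p * depolarization(y, r)
    a_raw = scale * a1
    return math.sqrt(1.0 - a_raw**2) / (scale * math.sqrt(n_events))

# Example: one million events in a bin at y = 0.5, vanishing asymmetry.
print(f"delta A1 = {delta_a1(1_000_000, y=0.5):.4f}")
```

The error scales as $1/\sqrt{N}$ and is amplified at low $y$, where $D(y)$ is small, and by the finite beam polarizations.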
The ground state of the $^3{\rm He}$ nuclear wave function is dominated by the S-state, in which the proton spins cancel each other and the nuclear spin is mostly carried by the neutron [32]. Hence, $^3{\rm He}$ can be used as an effective polarized neutron source. Neutron asymmetries can be obtained from $^3{\rm He}$ asymmetries using the effective nucleon polarizations [33][34][35] by
$$ A_{1N} = \frac{1}{P_n (1 - f_p)}\left(A_{1\,^3{\rm He}} - f_p P_p A_{1p}\right), \tag{3.5} $$
with dilution factor $f_p = 2\sigma_p/\sigma_{^3{\rm He}}$, neutron effective polarization $P_n = 0.86^{+0.036}_{-0.02}$, and proton effective polarization $P_p = -0.028^{+0.009}_{-0.004}$. The dilution factor was calculated bin by bin for the individual SIDIS channels using a dedicated DJANGOH electron-proton simulation with the proton beam energy set to 40/3 GeV in order to match the electron-$^3{\rm He}$ collisions. To extract neutron results from $^3{\rm He}$ data, various nuclear effects have to be taken into account, including spin depolarization, nuclear binding and Fermi motion of the nucleons, the off-shellness of the nucleons, the presence of non-nucleonic degrees of freedom, and nuclear shadowing and anti-shadowing. Since we are merely interested in the propagation of data uncertainties in this paper, we apply Eq. (3.5) as an effective method to extract neutron results from $^3{\rm He}$ data, without including non-nucleonic degrees of freedom or nuclear shadowing and anti-shadowing effects. Considering the precision of the data expected at future electron-ion colliders, one will have to use the convolution approach, instead of the effective method, to include all other nuclear effects in order to obtain the correct central value of the neutron results [32]. This would require dedicated theoretical and experimental effort in the EIC era.
In the following discussion, the data samples for electron-proton and electron-$^3{\rm He}$ (effective neutron) collisions are each assumed to be 50 ${\rm fb}^{-1}$. In this scenario, when the impact of different data subsets is discussed without mentioning proton or neutron data explicitly, for example "EicC(50 ${\rm fb}^{-1}$)DIS" or "EicC(50 ${\rm fb}^{-1}$)SIDIS", the label refers to 100 ${\rm fb}^{-1}$ of data (50 ${\rm fb}^{-1}$ of e-p plus 50 ${\rm fb}^{-1}$ of e-$^3{\rm He}$ collisions).
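The effective-polarization extraction can be sketched in a few lines. The code below assumes the standard effective-polarization relation $A_{^3{\rm He}} = P_n (1 - f_p) A_n + f_p P_p A_p$ and inverts it for the neutron asymmetry; the central values of $P_n$ and $P_p$ are those quoted in the text, while the function names, the value of $f_p$ and the input asymmetries are hypothetical. The uncertainty propagation keeps only the $^3{\rm He}$ statistical error, which is a simplification.

```python
def neutron_asymmetry(a_he3, a_p, f_p, p_n=0.86, p_p=-0.028):
    """Invert A_He3 = P_n (1 - f_p) A_n + f_p P_p A_p for A_n."""
    return (a_he3 - f_p * p_p * a_p) / (p_n * (1.0 - f_p))

def neutron_uncertainty(da_he3, f_p, p_n=0.86):
    """Propagate only the 3He statistical error to the neutron asymmetry."""
    return da_he3 / (p_n * (1.0 - f_p))

# Round trip with hypothetical inputs: build A_He3 from known A_n and A_p.
a_n_true, a_p, f_p = -0.05, 0.30, 0.70
a_he3 = 0.86 * (1.0 - f_p) * a_n_true + f_p * (-0.028) * a_p
a_n = neutron_asymmetry(a_he3, a_p, f_p)
da_n = neutron_uncertainty(0.001, f_p)
print(f"A_n = {a_n:.4f}, dA_n = {da_n:.4f}")
```

Because the two protons dilute the neutron signal, an error of a given size on the $^3{\rm He}$ asymmetry is enlarged by the factor $1/[P_n(1 - f_p)]$ when translated to the neutron.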

Description of the impact study using ePump
In this section, we discuss how to quantitatively study the impact of new (pseudo) data on updating the existing PDF sets. After briefly reviewing various methods for this type of study, we discuss how to convert a Monte Carlo PDF set to a Hessian PDF set. Following that, the Hessian profiling method will be discussed and applied to the pseudo-data to carry out the impact study.
The two commonly used methods for extracting PDFs and their uncertainties from a global analysis of high-energy scattering data are the Monte Carlo method, used by NNPDF [36], and the Hessian method, used in CT14HERA2 [37,38], for example. In the Monte Carlo method, a statistical ensemble of PDF sets is provided, which is assumed to approximate the probability distribution of possible PDFs, as constrained by the global analysis of the data. In the Hessian method, a smaller number of error PDF sets is provided along with the central set, which minimizes the $\chi^2$ function in a global analysis. These error PDF sets correspond to the plus and minus eigenvector directions in the space of PDF parameters, and are used to approximate the $\chi^2$ function near its global minimum.
An understanding of the uncertainties due to PDFs is crucial to precision studies of the standard model, as well as to searches for new physics beyond the standard model at lepton-hadron and hadron-hadron colliders. In turn, new measurements of standard model processes can be used to constrain the uncertainties of PDFs. The most complete way to obtain constraints on the PDFs from new data would be to add the new data to the global analysis package and perform a full re-analysis of the PDFs. However, this is impractical for most users of PDFs, so a technique for estimating the impact of new data on the PDFs without performing a full global analysis is extremely useful. In the context of Monte Carlo PDFs, the PDF reweighting method has become commonplace. It applies a weight factor, which depends on the new data and the theory predictions, to each of the PDFs in the ensemble [39][40][41] when performing ensemble averages. Because the weight factor for some of the PDFs in the ensemble may be small, the effective number of PDFs in the ensemble is reduced. Therefore, the number of initial PDF replicas in the ensemble must be increased to get sufficient statistics in the reweighted averages. However, this may not always be sufficient to guarantee a successful outcome of the reweighting method. When the new data are particularly constraining, the effective number of replicas surviving the reweighting procedure can drop to a few dozen or fewer, and the statistical value of the reweighted replica set is lost. For instance, this is what was observed in [19] when attempting to assess the impact on the DSSV14 PDFs [42] of semi-inclusive deep inelastic scattering off helium at the future US Electron-Ion Collider. In that specific case, the authors observed the failure of the reweighting method and stressed the need for a new fit.
To overcome this possible limitation, one can estimate the impact of new data directly with the so-called "Hessian profiling" method [16,17,43,44], which updates the existing Hessian PDF sets. The advantage of this Hessian updating method over the Monte Carlo reweighting method is that it works directly with the (smaller set of) Hessian PDFs, and it is a simpler and much faster way to estimate the effects of the new data. This method directly calculates the minimum of the updated $\chi^2$ function within the Hessian approximation.
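The loss of statistical power under reweighting can be quantified with an effective replica number. The sketch below uses the weight and Shannon-entropy definitions commonly adopted in Bayesian reweighting of Monte Carlo PDF sets, $w_k \propto (\chi^2_k)^{(n-1)/2} e^{-\chi^2_k/2}$ and $N_{\rm eff} = \exp[\frac{1}{N}\sum_k w_k \ln(N/w_k)]$; whether these are exactly the definitions used in the cited references is an assumption, and the $\chi^2$ values are synthetic.

```python
import numpy as np

def reweight(chi2, n_data):
    """Replica weights w_k ~ chi2^((n-1)/2) exp(-chi2/2), normalized to sum to N."""
    chi2 = np.asarray(chi2, dtype=float)
    log_w = 0.5 * (n_data - 1) * np.log(chi2) - 0.5 * chi2
    log_w -= log_w.max()              # avoid overflow before exponentiating
    w = np.exp(log_w)
    return w * len(w) / w.sum()

def n_eff(weights):
    """Effective number of replicas from the Shannon entropy of the weights."""
    n = len(weights)
    nz = weights[weights > 0]         # w ln(1/w) -> 0 as w -> 0
    return float(np.exp(np.sum(nz * np.log(n / nz)) / n))

# Synthetic chi2 values for 100 replicas against 50 new data points.
flat = reweight(np.full(100, 50.0), n_data=50)
spread = reweight(np.linspace(40.0, 400.0, 100), n_data=50)
print(f"N_eff (all replicas equally good) = {n_eff(flat):.1f}")
print(f"N_eff (widely spread chi2)        = {n_eff(spread):.1f}")
```

When the new data discriminate strongly between replicas, only a handful of weights remain sizeable and $N_{\rm eff}$ collapses, which is the failure mode described above.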
In this work, the software package ePump (error PDF Updating Method Package) [16,17], which can update any given set of Hessian PDFs obtained from an earlier global analysis, was used. Although the DSSV14 PDFs were presented as Monte Carlo sets, we can apply the package MC2Hessian [45] to produce an equivalent Hessian set, which will be named as DSSV14H in this work. The details of this conversion will be discussed below. After that, one is able to use the ePump package to estimate the impact of new (pseudo) data on updating the DSSV14H Hessian PDFs. A few similar studies, but for unpolarized PDFs, can be found in Refs. [46][47][48].

A Hessian representation for Monte Carlo PDFs
In this subsection, a brief overview of the methodology employed by the MC2Hessian package to generate a reliable Hessian representation from a PDF Monte Carlo (MC) replica set will be presented, followed by its application in our specific case for the DSSV14 PDF set.
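PDFs extracted using Monte Carlo methods are given in terms of an ensemble of functions, called "replicas", which form a discrete representation of the probability distribution describing the PDF functional space for a given set of experimental data. Although the probabilistic interpretation of the PDF "best fit" and "uncertainties" as the mean and standard deviation of the replica distribution is straightforward in this case, Monte Carlo methods usually do not require optimizing the final number of replicas describing a specific PDF set. As a consequence, PDFs are often given in terms of a large number of replicas that may be strongly correlated with each other. In such a case, it has been shown [49] that it is possible to find an equivalent representation of the PDFs using a smaller subset of the original replicas. This also implies that, for a sufficiently large number of replicas $\{f_\alpha^{(k)}\}_{k=1,\ldots,N_{\rm rep}}$, where $\alpha = 1,\ldots,N_{\rm pdf}$ runs over the quark, antiquark, and gluon PDFs, one may be able to describe the Monte Carlo sample as a linear combination of a suitably chosen subset of replicas $\{\eta_\alpha^{(i)}\}_{i=1,\ldots,N_{\rm eig}}$,
$$ f_\alpha^{(k)} \approx f_{H,\alpha}^{(k)} \equiv f_\alpha^{(0)} + \sum_{i=1}^{N_{\rm eig}} a_i^{(k)} \left(\eta_\alpha^{(i)} - f_\alpha^{(0)}\right), \tag{4.1} $$
where $f_\alpha^{(0)}$ is the central value of the set, i.e. the average over the replicas. As shown in more detail below, given such a replica basis, it is possible to produce a Hessian representation of the original PDF set in the space of the linear expansion coefficients $a_i^{(k)}$. However, the deviation of the Hessian representation from the original MC sample ends up being proportional to the deviation from gaussianity of the starting probability distribution. Such deviations can arise in specific kinematic regions (such as small $x$ and large $x$) where limited experimental data are available and the PDF uncertainties are determined mainly by theoretical constraints. Nonetheless, in most cases, where PDF uncertainties are driven by copious experimental data, gaussianity is a reasonable approximation and the described strategy turns out to be a reliable way to convert MC PDF sets into Hessian PDF sets.
From a practical point of view, this is achieved by choosing the optimal basis $\{\eta_\alpha^{(i)}\}$ and the corresponding coefficients $\{a_i^{(k)}\}$. In order to get the coefficients $\{a_i^{(k)}\}$, given the covariance matrix of the PDFs,
$$ {\rm cov}^{\rm pdf}_{ij,\alpha\beta} = \frac{N_{\rm rep}}{N_{\rm rep}-1}\left(\langle f_\alpha(x_i,Q_0^2)\, f_\beta(x_j,Q_0^2)\rangle - \langle f_\alpha(x_i,Q_0^2)\rangle \langle f_\beta(x_j,Q_0^2)\rangle\right), \tag{4.2} $$
where the averages are calculated over the original $N_{\rm rep}$ replicas, one can define a figure of merit
$$ \chi^{2(k)} = \sum_{i,j=1}^{N_x} \sum_{\alpha,\beta=1}^{N_{\rm pdf}} \left[f_{H,\alpha}^{(k)}(x_i,Q_0^2) - f_\alpha^{(k)}(x_i,Q_0^2)\right] \left({\rm cov}^{\rm pdf}\right)^{-1}_{ij,\alpha\beta} \left[f_{H,\beta}^{(k)}(x_j,Q_0^2) - f_\beta^{(k)}(x_j,Q_0^2)\right], \tag{4.3} $$
where $i$ and $j$ run over the $N_x$ sampling points of a discretized $x$-grid.
The coefficients $\{a_i^{(k)}\}$ are then obtained by minimizing Eq. (4.3) for each replica and by calculating the Hessian matrix, defined as the diagonalized inverse of the covariance matrix of the coefficients,
$$ {\rm cov}^{(a)}_{ij} = \frac{N_{\rm rep}}{N_{\rm rep}-1}\left(\langle a_i a_j\rangle - \langle a_i\rangle \langle a_j\rangle\right). \tag{4.4} $$
If we define $v_{ij}$ to be the rotation matrix used to diagonalize the inverse of ${\rm cov}^{(a)}$ and $\lambda_i$ the resulting set of eigenvalues, the PDF uncertainties can be expressed as
$$ \sigma_{H,\alpha} = \sqrt{\sum_{i,j=1}^{N_{\rm eig}} \left(\eta_\alpha^{(i)} - f_\alpha^{(0)}\right) {\rm cov}^{(a)}_{ij} \left(\eta_\alpha^{(j)} - f_\alpha^{(0)}\right)}, \tag{4.5} $$
whereas the $N_{\rm eig}$ symmetric Hessian eigenvectors describing the original Monte Carlo sample are given by
$$ \tilde{f}_\alpha^{(i)} = f_\alpha^{(0)} + \frac{1}{\sqrt{\lambda_i}} \sum_{j=1}^{N_{\rm eig}} v_{ij} \left(\eta_\alpha^{(j)} - f_\alpha^{(0)}\right). \tag{4.6} $$
Using Eq. (4.6), Eq. (4.5) yields
$$ \sigma_{H,\alpha} = \sqrt{\sum_{i=1}^{N_{\rm eig}} \left(\tilde{f}_\alpha^{(i)} - f_\alpha^{(0)}\right)^2}. \tag{4.7} $$
An optimal choice of $\{\eta_\alpha^{(i)}\}$ implies that Eq. (4.7) should yield results similar to the one-sigma PDF uncertainty of the Monte Carlo representation, defined as
$$ \sigma_\alpha = \sqrt{\frac{N_{\rm rep}}{N_{\rm rep}-1}\left(\langle f_\alpha^2\rangle - \langle f_\alpha\rangle^2\right)}. \tag{4.8} $$
To determine the optimal set of $\{\eta_\alpha^{(i)}\}$, MC2Hessian utilizes a Genetic Algorithm (GA) to optimize an "estimator" defined as
$$ {\rm ERF}_\sigma = \sum_{i=1}^{N_x} \sum_{\alpha=1}^{N_{\rm pdf}} \left|\frac{\sigma_{H,\alpha}(x_i,Q_0^2) - \sigma_\alpha(x_i,Q_0^2)}{\sigma_\alpha(x_i,Q_0^2)}\right| \tag{4.9} $$
for a fixed value of $N_{\rm eig}$. A detailed discussion of the specific GA is beyond the scope of this paper and we refer the reader to the original work [45]. The package has an additional optimization parameter defined as
$$ \epsilon_\alpha(x_i,Q_0^2) = \frac{\left|\sigma_\alpha(x_i,Q_0^2) - \sigma_\alpha^{68}(x_i,Q_0^2)\right|}{\sigma_\alpha^{68}(x_i,Q_0^2)}, \tag{4.10} $$
where $\sigma_\alpha$ and $\sigma_\alpha^{68}$ are respectively the one-sigma and 68% confidence level intervals for the $\alpha$-th PDF. This parameter allows one to discard points on the $x$-grid for which the Gaussian approximation deviates by more than a threshold value $\epsilon$, i.e. only points with $\epsilon_\alpha(x_i, Q_0^2) < \epsilon$ are retained. For the purpose of this paper, MC2Hessian has been applied to the DSSV14 PDF set [42], which, from its original analysis, is given in the form of $N_{\rm rep} = 1000$ Monte Carlo replicas. Among the consistency checks of the PDF Monte Carlo extraction, the DSSV collaboration has performed a comparison between the provided Monte Carlo sample and a version of the same analysis with error bands produced using the Lagrange Multiplier procedure. Similar to the Monte Carlo procedure, this method allows one to drop the requirement of a linearized error analysis, typical of the Hessian representation. However, uncertainty bands are defined in terms of a tolerated increase in the $\chi^2$, denoted the "tolerance" $\Delta\chi^2$. For normal (Gaussian) errors, a 68% Confidence Level (CL) band would correspond to a tolerance $\Delta\chi^2 = 1$.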
In the context of PDF fits, a deviation from this standard textbook value is usually employed to cope with neglected uncertainties that cannot be quantified and included in the analysis, such as possible tensions among data sets included in a global analysis. It is interesting to notice that, in their comparison, the DSSV collaboration achieves good agreement between the two extracted sets by setting $\Delta\chi^2 \sim 10$-$15$. As the resulting one-sigma variance from the Monte Carlo replica method has a solid probabilistic interpretation, the comparable error band for the Lagrange Multiplier method with $\Delta\chi^2 \sim 10$-$15$ shares the same interpretation.
Following the logic of the methodology described above, this may suggest that the deviation from the Gaussian distribution could end up spoiling our attempt to use MC2Hessian to convert DSSV14 into a Hessian representation. However, we have performed the conversion for different values of $N_{\rm eig}$ at $Q_0^2 = 1\ {\rm GeV}^2$, and have found very good agreement in the range $0.001 < x < 0.9$, which is the range relevant to our analysis, for $N_{\rm eig} = 52$. This leads to a total of 104 error PDF sets, with symmetric errors in the positive and negative eigenvector directions away from the central PDF set. Below, we refer to this set of PDFs as the DSSV14H Hessian PDFs. Moreover, no limitation from the parameter $\epsilon$ was found, which means that no $x$-grid points are eliminated during the conversion by the gaussianity condition $\epsilon_\alpha(x_i, Q_0^2) < \epsilon$. Fig. 3 shows a direct comparison between the two representations: the original DSSV14 MC replica set and the derived DSSV14H Hessian set. The difference between the representations is below the per-mil level, which is more than sufficient for the scope of this work. Hereafter, we shall only consider the DSSV14H Hessian PDFs, which will also be referred to as DSSV14 interchangeably.
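The idea of trading a large replica ensemble for a small set of symmetric eigenvector shifts can be illustrated with a simpler stand-in for the procedure described in this subsection. The sketch below does not use the MC2Hessian genetic algorithm or a replica basis; it instead performs a principal-component decomposition (via a singular value decomposition) of the replica deviations, which is a different and swapped-in technique that shares the same goal. The replicas are synthetic, and the function names are hypothetical.

```python
import numpy as np

def mc_to_hessian(replicas, n_eig):
    """PCA stand-in for an MC-to-Hessian conversion.

    replicas: array (N_rep, N_points) of PDF values on a flattened (flavour, x) grid.
    Returns the central set and n_eig symmetric eigenvector shifts.
    """
    f0 = replicas.mean(axis=0)
    dev = replicas - f0
    _, s, vt = np.linalg.svd(dev, full_matrices=False)
    # Each shift is a principal direction scaled to its one-sigma length.
    shifts = (s[:n_eig, None] / np.sqrt(len(replicas) - 1)) * vt[:n_eig]
    return f0, shifts

def hessian_sigma(shifts):
    """Symmetric Hessian uncertainty: quadrature sum over eigenvector shifts."""
    return np.sqrt((shifts**2).sum(axis=0))

# Synthetic ensemble: 200 replicas on 10 grid points, built from 3 Gaussian modes.
rng = np.random.default_rng(0)
modes = rng.normal(size=(3, 10))
replicas = 1.0 + rng.normal(size=(200, 3)) @ modes

f0, shifts = mc_to_hessian(replicas, n_eig=3)
mc_sigma = replicas.std(axis=0, ddof=1)
print("max |sigma_H - sigma_MC| =", np.abs(hessian_sigma(shifts) - mc_sigma).max())
```

For this exactly Gaussian, three-mode ensemble, three eigenvectors reproduce the replica standard deviation to numerical precision; for real replica sets the agreement degrades where the underlying distribution is non-Gaussian, which is the caveat raised above.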

A brief review of the Hessian profiling method
The error PDFs and the PDF-induced uncertainty of theoretical calculations can be effectively calculated by using the Hessian method. In order to find the Hessian eigenvector pairs of PDF sets (i.e., error PDFs) after the inclusion of new data sets, a full global fit is needed so that one can evaluate variations of PDFs around the best-fit with respect to PDF parameters. Since a full global fit might be complicated and time-consuming, one may desire a faster and simpler approach to estimate the impact of a new data set. Paukkunen and Zurita [43] introduced a method that utilizes the Hessian eigenvector set to study the impact of the new data input. This method has been implemented in the software package ePump [16], which we used in this study. In this subsection, we briefly review some details of this method.
Consider a new data set for the measurement of an observable $X$ with $N_{\rm pt}$ data points. Let us denote its experimental values as $X_i^E$, the inverse of its covariance matrix for the correlated experimental errors as $C^{-1}_{ij}$, and the corresponding theoretical predictions as $X_i^T(z)$, where $z$ is the vector of Hessian parameters. With the inclusion of this new data set, the variation of the total $\chi^2$ becomes
$$ \Delta\chi^2_{\rm new}(z) = \chi^2_{\rm old} + \sum_{i,j=1}^{N_{\rm pt}} \left(X_i^T(z) - X_i^E\right) C^{-1}_{ij} \left(X_j^T(z) - X_j^E\right), \tag{4.11} $$
where $\chi^2_{\rm old} = T^2 \sum_{r=1}^{N} z_r^2$ and $T$ is the tolerance parameter. In practice, the deviation in a particular eigenvector direction $z_r$ is limited by the Dynamical Tolerance, $T_r^\pm$, at the given confidence level, so the parameters take the values $z_r^\pm = \pm T_r^\pm/T$. For simplicity, we will ignore the Dynamical Tolerance dependence in the discussion. For more detail, refer to Refs. [16,17].
The new total $\chi^2$ can be rewritten by expanding $X_j^T(z)$ up to the linear term, $X_j^T(z) \approx X_j^T(0) + \sum_r X_j^r z_r$ with $X_j^r = (X_j^{r,+} - X_j^{r,-})/2$,
$$ \Delta\chi^2_{\rm new}(z) = \Delta\chi^2_{\rm new}(0) + T^2\left[\sum_{r} z_r^2 - 2\sum_{r} A_r z_r + \sum_{r,s} z_r M_{rs} z_s\right], \tag{4.12} $$
where we define a vector $A$ and a matrix $M$ such that
$$ A_r = \frac{1}{T^2}\sum_{i,j=1}^{N_{\rm pt}} \left(X_i^E - X_i^T(0)\right) C^{-1}_{ij} X_j^r, \tag{4.13} $$
$$ M_{rs} = \frac{1}{T^2}\sum_{i,j=1}^{N_{\rm pt}} X_i^r\, C^{-1}_{ij}\, X_j^s. \tag{4.14} $$
With new information on the PDFs being introduced by the new data set, the PDF parameters are driven to updated values. By minimizing Eq. (4.12), the updated central values of the parameters are found to be
$$ z^0 = (1 + M)^{-1} A. \tag{4.15} $$
Then, the $r$-th new eigenvector $U^{(r)}$ and its corresponding eigenvalue $\lambda^{(r)}$ are defined by
$$ M\, U^{(r)} = \lambda^{(r)}\, U^{(r)}. \tag{4.16} $$
With the inclusion of new information, the Hessian eigenvector directions are also updated from the set of old bases $z$ to a new set of bases $z'$. By diagonalizing the quadratic terms of Eq. (4.12), the new parameters are
$$ z'_r = \sqrt{1 + \lambda^{(r)}}\; U^{(r)} \cdot (z - z^0). \tag{4.17} $$
Consider another observable $Y = Y(z)$ whose PDF-induced uncertainty is constructed from the Hessian eigenvector sets. After the inclusion of the new data set $X_i$, the central value of $Y(z)$ is updated to
$$ Y(z^0) = Y(0) + \sum_{r} Y^r z_r^0, \qquad Y^r = \frac{Y^{r,+} - Y^{r,-}}{2}. \tag{4.18} $$
The extreme values of $Y$ for the new $r$-th eigenvector can be calculated in a similar manner as in Eq. (4.18),
$$ Y^{\pm(r)} = Y(z^0) \pm \frac{1}{\sqrt{1 + \lambda^{(r)}}} \sum_{s} U^{(r)}_s Y^s. \tag{4.19} $$
Notice that we can of course choose $Y$ to be the PDFs $f(x, Q_0)$. In this case, Eqs. (4.18) and (4.19) give the updated central and error PDFs. To quantitatively summarize in a single value the change in the best-fit PDFs after the new data have been added to the global fit, the measure $d^0$ was introduced in Ref. [16] as
$$ d^0 = \sqrt{\sum_{r=1}^{N} \left(\frac{T}{T_r}\, z_r^0\right)^2}, $$
where, again, the dynamical tolerance $T_r$ limits the deviation in the particular ($r$-th) eigenvector direction. To be precise, $d^0$ is the length of the shift of the best-fit point in parameter space, relative to the 90% confidence level (C.L.) boundary of the original PDFs. Thus, $d^0 = 1$ means that the new best fit touches the 90% C.L. boundary, while a value of $d^0 \ll 1$ implies a very small change to the best-fit PDFs. One should note that $d^0$ only reflects the change in the best-fit PDFs, so that it is still possible for the new data to produce a significant reduction in the PDF error bands, even if $d^0$ is small. A value $d^0 > 1$ would indicate that either there is tension between the new data and the original data, or the uncertainties in the original global analysis were underestimated [16].
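The updating procedure reviewed in this subsection reduces to a few lines of linear algebra once the theory predictions are linearized in the Hessian parameters. The sketch below is a minimal illustration and not the ePump implementation: it assumes a global tolerance $T$ with no dynamical tolerances, symmetric eigenvector shifts, and a $\chi^2$ of the form $T^2 z^2 + (X(z) - X^E)^{\rm T} C^{-1} (X(z) - X^E)$. The function names and the one-parameter toy example are hypothetical.

```python
import numpy as np

def hessian_update(x_exp, cov_inv, x0, dx, tol=1.0):
    """Profile new data onto a Hessian set.

    x_exp: data (N_pt,); cov_inv: inverse covariance (N_pt, N_pt);
    x0: theory at the central PDF (N_pt,);
    dx: symmetric eigenvector shifts of the theory, shape (N_eig, N_pt).
    Returns the shifted best fit z0 and the eigen-decomposition of M.
    """
    a = dx @ cov_inv @ (x_exp - x0) / tol**2
    m = dx @ cov_inv @ dx.T / tol**2
    z0 = np.linalg.solve(np.eye(len(a)) + m, a)   # minimum of the new chi^2
    lam, u = np.linalg.eigh(m)                    # columns of u are eigenvectors
    return z0, lam, u

def updated_observable(y0, dy, z0, lam, u):
    """Updated central value and symmetric PDF uncertainty of an observable."""
    y_new = y0 + dy @ z0
    dy_new = (u.T @ dy) / np.sqrt(1.0 + lam)      # shrunken eigenvector shifts
    return y_new, float(np.sqrt(np.sum(dy_new**2)))

# Toy check: one parameter, one data point, and Y taken to be X itself.
# Prior: X = 0 +- 1; measurement: X = 1 +- 1.
z0, lam, u = hessian_update(
    x_exp=np.array([1.0]), cov_inv=np.array([[1.0]]),
    x0=np.array([0.0]), dx=np.array([[1.0]]),
)
y_new, y_err = updated_observable(0.0, np.array([1.0]), z0, lam, u)
d0 = float(np.linalg.norm(z0))   # best-fit shift, ignoring dynamical tolerances
print(y_new, y_err, d0)
```

In the toy case the update reproduces the familiar combination of two equally precise Gaussian measurements: the central value moves halfway towards the data and the uncertainty shrinks by $1/\sqrt{2}$.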

A brief review of the data set rediagonalization
In the previous subsection, it has been discussed in detail that the PDF uncertainty can be effectively expressed in terms of a Hessian set, and that the impact of the measurement for a new observable is also easily accessible with the updating method. These methods are frequently used in the analyses of the PDF uncertainty with respect to experimental errors or kinematical cuts. Although they are straightforward conceptually and well-applicable numerically, repeated exercises for a Hessian set with a large number of error PDFs would still be time-consuming. It would be more convenient if one can reproduce the majority of the PDF dependence for given observables with a reduced Hessian set, so that it is not necessary to repeatedly evaluate all of the error PDFs, but a smaller number of them. The members of this optimized Hessian set are chosen in such a way that the combination of them recovers the PDF uncertainty for the observables to any desired precision.
The idea of this optimization method is based on the data set diagonalization procedure by Pumplin [50]. Noting that the representation of the diagonalized parameters $z$ is not unique, one can take advantage of this freedom to rotate the diagonalized parameters $z$ into a new set of parameters $z'$ in which the PDF sensitivity for a given data set is maximized along a certain direction. Note that the given data set may or may not be included in the original global fit, and that the optimized eigenvectors after the rotation contain exactly the same information as the original eigenvectors do.
Our goal is to find a direction along which the variation of a set of observables $X_i(z)$, where $i$ runs from 1 to $N_{pt}$, from its best-fit values $X_i^0 = X_i(0)$ is maximized. Therefore, we define a function $S(z, \lambda)$, where $\lambda$ is the Lagrange multiplier. To simplify the expression, we again take the usual approximation and expand $X_i(z)$ up to the linear terms, with $X_i^r = (X_i^{r,+} - X_i^{r,-})/2$ as defined before. The matrix $M_{ij}$ is normalized in such a way that ${\rm Tr}\{M\} = N_{pt}$. The extreme values of $S(z, \lambda)$ appear when the equality $\partial S/\partial z_r = 0$ holds for all Hessian indices $r$. Therefore, the optimized parameters can be found by solving the corresponding eigenequation, which yields a new set of parameters $z'$ in the rediagonalized space. The rediagonalized error PDFs are then calculated as usual.
For the old parameters $z$, we do not discriminate among the $N$ Hessian eigenvector directions, since when diagonalizing the Hessian matrix we have already normalized the orthogonal bases by their corresponding eigenvalues. For the rediagonalized parameters $z'$, however, each new eigenvector direction $z'_r$ is associated with its corresponding eigenvalue $\lambda_r$ of the error matrix $M$ for the observable $X$ of interest. The eigenvalue $\lambda_r$ reflects how sensitive the rediagonalized direction $z'_r$ is to the given observable $X$. Since the matrix $M$ is normalized such that ${\rm Tr}\{M\} = N_{pt}$, we have $\sum_r \lambda_r = N_{pt}$. Therefore, one is able to quantify how many data points of the set $X_i$ are constraining a specific direction $z'_r$. If a rediagonalized error PDF has a small eigenvalue $\lambda_r$, one can conclude that essentially no data points in the set $X_i$ are sensitive to this rediagonalized direction $z'_r$. Thus, one can ignore the error PDFs in this direction without a significant loss of accuracy. This data set rediagonalization is referred to as the ePump-optimization procedure, and is implemented in the ePump package.
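As an illustration of the rediagonalization idea, the following sketch builds a sensitivity matrix from eigenvector shifts and extracts its leading direction by power iteration. The normalization used here (each data point divided by its total symmetric Hessian uncertainty squared) is an assumption chosen only because it reproduces ${\rm Tr}\{M\} = N_{pt}$; the function names and numbers are hypothetical and this is not the ePump code.

```python
import math


def sensitivity_matrix(dx):
    """Build M_rs from dx[r][i] = (X_i^{r,+} - X_i^{r,-}) / 2.

    Assumed normalization: each data point is weighted by the inverse of its
    total Hessian uncertainty squared, so that the trace equals N_pt.
    """
    n_eig, n_pt = len(dx), len(dx[0])
    tot2 = [sum(dx[r][i] ** 2 for r in range(n_eig)) for i in range(n_pt)]
    return [[sum(dx[r][i] * dx[s][i] / tot2[i] for i in range(n_pt))
             for s in range(n_eig)] for r in range(n_eig)]


def leading_direction(m, iterations=500):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    n = len(m)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(m[r][s] * v[s] for s in range(n)) for r in range(n)]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return lam, v


# Hypothetical shifts: three eigenvector directions, four data points.
dx = [[1.00, 0.80, 0.90, 1.10],
      [0.10, -0.20, 0.30, 0.10],
      [0.05, 0.10, -0.10, 0.20]]

m = sensitivity_matrix(dx)
lam, v = leading_direction(m)
n_pt = len(dx[0])
print("trace =", sum(m[r][r] for r in range(len(m))), " N_pt =", n_pt)
print("leading eigenvalue fraction =", lam / n_pt)
```

The ratio `lam / n_pt` plays the role of the fraction of data points that constrain the leading rediagonalized direction.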

Updating the PDFs with pseudo data
In our analysis, the DSSV14H Hessian PDFs are updated with the new EicC (pseudo-)data using the Hessian updating method detailed above, via ePump. To perform the analysis, ePump requires two sets of inputs: data templates and theory templates. The data templates consist of the new experimental data values and their statistical and systematic uncertainties, including correlations, exactly as they would be included in a standard global analysis. The theory templates consist of the corresponding theory predictions for the same observables, evaluated using the central PDF and each of the DSSV14H Hessian eigenvector PDFs. Note that any number of new data sets can be included in the update by ePump. The output of ePump consists of updated central and Hessian eigenvector PDFs, which approximate the result that would be obtained from a full global re-analysis that includes the new data. As an additional benefit, ePump can also directly output the updated predictions and uncertainties for any other observables of interest (such as the cross-section in the signal region), without the need to recalculate them using the updated PDFs. More details about the use of ePump can be found in Refs. [16,17].
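To make the role of a theory template concrete, the sketch below evaluates the symmetric Hessian uncertainty of a single observable from predictions obtained with the plus and minus members of each eigenvector pair, using the shifts $(X^{r,+} - X^{r,-})/2$ added in quadrature. The data layout and all numbers are hypothetical and do not reflect the actual ePump file format.

```python
import math


def hessian_uncertainty(plus, minus):
    """Symmetric Hessian uncertainty: quadrature sum of (X^{r,+} - X^{r,-}) / 2."""
    return math.sqrt(sum(((p - m) / 2.0) ** 2 for p, m in zip(plus, minus)))


# Hypothetical template row for one data point: prediction with the central
# PDF, then with the (+, -) members of three eigenvector pairs.
central = 0.120
plus = [0.126, 0.121, 0.118]
minus = [0.115, 0.119, 0.123]

delta = hessian_uncertainty(plus, minus)
print(f"A1 = {central:.3f} +/- {delta:.4f}")
```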
The theory predictions have been generated according to the NLO formulae discussed in Section 2, expressed in the standard $\overline{\rm MS}$ factorization scheme. To include data from both proton and neutron targets in the analysis, we apply SU(2) proton-neutron isospin symmetry and impose $\Delta u^{\rm neutron} = \Delta d^{\rm proton}$, $\Delta\bar{u}^{\rm neutron} = \Delta\bar{d}^{\rm proton}$, $\Delta d^{\rm neutron} = \Delta u^{\rm proton}$ and $\Delta\bar{d}^{\rm neutron} = \Delta\bar{u}^{\rm proton}$. We have also set all renormalization and factorization scales to $\mu^2 = Q^2$. Investigations of the factorization scale and scheme dependence are well beyond the scope of this analysis. However, they may be an interesting ingredient to include in the error analysis of future global extractions of PDFs, once real EicC data become available. Another source of theoretical error that we defer to future studies is the uncertainty of the fragmentation functions. In the case of parton-to-pion fragmentation functions, it has already been shown [51] that the inclusion of such systematic errors produces effects of at most a few percent. However, due to the lower production rate of kaons with respect to pions, and the consequently lower precision of kaon SIDIS data, uncertainties for parton-to-kaon fragmentation functions are generally larger than the respective pion ones (see for example [52] and references therein). Analyses such as [53] suggest that future electron-ion collider SIDIS data will have a remarkable impact on the kaon fragmentation functions. On the other hand, it is also argued that, in order to use the rich information of future high-precision SIDIS data to extract parton distribution and fragmentation functions with reliable uncertainties, a simultaneous fit of PDFs and FFs is needed to disentangle the highly correlated set of parameters describing them [54][55][56]. In fact, traditional methods of global fitting of FFs using SIDIS data fix a specific PDF set as a baseline and account for its error by propagating it into the FFs themselves. 
This introduces a non-trivial correlated double-counting effect when such FFs are used to extract PDFs and their uncertainties. For example, in [54] when performing a reweighting of PDFs using SIDIS data, it has been shown how explicitly choosing to include (or not to include) current kaon FF uncertainties [52] extracted with traditional methods results in an under (or over) estimation of the PDF uncertainties. Solving this conundrum will definitely be a task of the more sophisticated simultaneous PDFs and FFs fitting machinery once real SIDIS data with precision comparable to the inclusive case are available. For the time being and for the scope of this article we want to concentrate on the effect of EicC future data solely on helicity PDFs. Hence, we choose not to include FFs uncertainties in our analysis and to use only the central "best-fit" values of DSS [57]. Our results may be biased by this assumption and have to be taken as the "best possible outcome" in the future case where pion and, in particular, kaon FFs will be known at very high precision.
The data templates have been constructed using the uncertainty calculations of the pseudo-data according to Sec. 3, cf. Eq. (3.4). However, the fit using ePump requires a central value of the experimental observable as well. The central value of the asymmetry $A_1$, cf. Eq. (2.7), for each data point was taken from the theory tables and then smeared by a random shift drawn from a Gaussian distribution centered at 0 with a standard deviation equal to the estimated $A_1$ uncertainty of the pseudo-data. This ensures a reasonable estimate of the central value of $A_1$ while not affecting the $\chi^2$ artificially during the fit.
Lastly, the tolerance value for the ePump updating has been set to $\Delta\chi^2 = 10$, which is of the same order of magnitude as the tolerance used in the DSSV14 analysis when studying the uncertainties by means of the Lagrange multiplier method.
In total, ten pairs of theory-data templates have been prepared for this analysis: two for the DIS process (one for electron-proton and one for electron-neutron collisions), and eight for the SIDIS process, corresponding to each combination of the two possible nucleon targets (proton or neutron) and the four observed final-state hadrons ($\pi^\pm$ or $K^\pm$). The original DSSV14 analysis also included data with unidentified charged hadrons in the final state. For the purpose of this study, we concentrate only on final states of known flavour content, as this is very helpful for investigating the impact of specific data sets on the flavour content of the proton (neutron) target.
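The smearing procedure can be sketched as follows. This is a minimal illustration, assuming uncorrelated uncertainties; the function names, the seed and the toy theory values are hypothetical. Each pseudo-data central value is the theory prediction plus a Gaussian shift of zero mean and width equal to the estimated uncertainty, so that the $\chi^2$ per point against the input theory is expected to be close to one.

```python
import random


def make_pseudo_data(theory, sigma, seed=12345):
    """Smear each theory value by a Gaussian shift with mean 0 and width sigma_i."""
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, s) for t, s in zip(theory, sigma)]


def chi2(theory, data, sigma):
    """Uncorrelated chi^2 between the input theory and the smeared pseudo-data."""
    return sum(((d - t) / s) ** 2 for t, d, s in zip(theory, data, sigma))


# Hypothetical asymmetry values and uncertainties for 2000 kinematic points.
theory = [0.05 + 0.0001 * i for i in range(2000)]
sigma = [0.01] * len(theory)

pseudo = make_pseudo_data(theory, sigma)
print("chi2 per point =", chi2(theory, pseudo, sigma) / len(theory))
```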
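The counting of the template pairs can be checked with a short enumeration; the labels below are hypothetical names used only for this illustration.

```python
from itertools import product

targets = ["proton", "neutron"]
hadrons = ["pi+", "pi-", "K+", "K-"]

# Two inclusive DIS templates, one per target.
templates = [("DIS", target, None) for target in targets]
# Eight SIDIS templates, one per (target, identified hadron) combination.
templates += [("SIDIS", target, hadron) for target, hadron in product(targets, hadrons)]

for process, target, hadron in templates:
    print(process, target, hadron or "-")
print("total:", len(templates))
```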

Updated PDFs and their moments
In the following, the results of the ePump updating procedure are presented. All results are given at $Q^2 = 10$ GeV$^2$, with uncertainties at 68% CL. Fig. 4 shows the impact of the DIS and SIDIS EicC pseudo-data on the parton helicity distributions, separately. As a general remark, all plots show that the SIDIS data (green areas) have, to varying degrees, a larger constraining power than the DIS data (red areas). The small difference between the impact of the two types of data on the $\Delta u$ distribution is to be expected, as $\Delta u$ is already the flavour distribution best constrained by the available high-precision inclusive proton data. On the other hand, the largest difference between the effects of including DIS versus SIDIS data can be seen in the sea quark distributions ($\Delta\bar{u}$, $\Delta\bar{d}$ and $\Delta s$) over the whole $x$-range spanned by the pseudo-data, down to about $x = 0.005$. The ability of the EicC machine to pin down the sea distributions through the SIDIS process is a core feature around which the accelerator is being designed. Our result shows the benefit of using EicC SIDIS data to determine the sea quark distributions, for which the DIS pseudo-data alone yield only a minor or no reduction of the PDF uncertainties. For $\Delta u$ and $\Delta d$, the largest uncertainty reduction occurs around $x = 0.2$, which is the region where the valence quarks are expected to contribute the most to the proton and neutron flavour content.
The EicC is not specifically planned to investigate the small-$x$ gluon distribution. A machine better suited for this purpose will be the future Electron-Ion Collider planned in the USA (EIC). Nonetheless, we show in Fig. 4 that both DIS and SIDIS EicC pseudo-data are able to reduce the uncertainties on $\Delta g$ for $10^{-3} \lesssim x \lesssim 4 \times 10^{-2}$. The reduction of the uncertainties below the pseudo-data range, $x \lesssim 10^{-3}$, that we observe in both $\Delta s$ (the green grid part) and $\Delta g$ (the red and green grid parts) can be traced back to assumptions made in the original DSSV14 analysis. Among those, the most stringent ones are the initial parametrization bias of the helicity PDFs and, in particular for $\Delta s$, the hyperon $\beta$-decay constraints that will be discussed later in this section.
To study the difference in impact between the proton and neutron target pseudo-data, the uncertainty bands for the first five plots in Fig. 5 are presented as differences with respect to their central values, meaning that the uncertainty bands for the original DSSV14H (light blue), the DSSV14H including proton pseudo-data (green) and the DSSV14H including neutron pseudo-data (red) are always centered on the zero axis. In the last row of Fig. 5(b), the error bands of $\Delta u$ and $\Delta d$ are presented as ratios to the central values for better visualization of the percentage uncertainties.
For both DIS (Fig. 5(a)) and SIDIS (Fig. 5(b)), the proton data constrain the $\Delta u$ distribution better than the neutron data. This is consistent with a quark model picture in which the proton content is dominated by $u$ quarks and the neutron content by $d$ quarks. This can be observed in more detail in Fig. 5(b), where this behavior remains the same for $\Delta\bar{u}$ and is accordingly inverted for the $\Delta d$ and $\Delta\bar{d}$ distributions. More significantly, it shows the importance of including both SIDIS neutron and proton target data in order to efficiently constrain the up and down quark and antiquark distributions.
Figure 6. Results on the uncertainty band of the polarized strange quark distribution after a next-to-leading order fit including EicC pseudo-data. The light blue band represents the original DSSV14 global fit. The red (green) band shows the results after adding EicC SIDIS $K$ (SIDIS $\pi$) pseudo-data.
In Fig. 6 the impact of SIDIS pion pseudo-data (green) versus SIDIS kaon pseudo-data (red) is shown in the same type of difference plot for the $\Delta s$ distribution. Since the flavour content of kaons is dominated by strange quarks, identifying kaons in the final state effectively "tags" the strange quarks scattering out of the target. Hence, the SIDIS kaon data constrain $\Delta s$ better than SIDIS data with non-strange final-state hadrons such as pions. Moreover, comparing with Fig. 4, we notice that $\Delta s$ is further constrained after including the pion data on top of the kaon data. This is due to the correlation between $\Delta u$, $\Delta d$ and $\Delta s$ introduced by the relation imposed in the DSSV14 analysis, which we discuss further below in Eq. (4.30).
As shown in Figs. 4-6, the EicC pseudo-data can effectively constrain the polarized PDFs, in the sense that their error bands are significantly reduced. As for the changes of the PDF central values after the updates, we have calculated the measure $d^0$ defined in Eq. (4.21) and list the values in Table 1. The fact that none of those $d^0$ values is greater than one indicates that the ePump updating provides a reasonable fit, cf. Sec. 4.2. This result is expected from the construction of the pseudo-data, discussed in Sec. 3. Although this measure is a powerful tool to quantify the shift of the central value due to a new set of data, in the context of pseudo-data it cannot act as a physically meaningful prediction of the shift of the best fit that will result when real experimental data are used. Pseudo-data are constructed such that they embed a faithful estimate of the future EicC data uncertainties, but they have an unknown degree of deviation from the central values of the future experimental data. For this reason, any definitive statement on central-value shifts has to be postponed until an update with real EicC experimental data is possible.
Quantities of particular interest in the field are the moments of the singlet combination $\Delta\Sigma$, i.e. the sum over all flavour PDFs (see Appendix A), and of the gluon distribution. More specifically, their first moment, i.e. their integral over the parton momentum fraction, has a simple interpretation as the net quark and gluon contribution to the proton spin. The impact of the EicC pseudo-data on such quantities is shown in Fig. 7, where the correlated uncertainties of the truncated first moments of the gluon and singlet distributions are depicted in a two-dimensional plot at $Q^2 = 10$ GeV$^2$. The lower truncation of the integral is set to $x_{\min} = 0.005$, which corresponds to the theoretical lower momentum fraction accessible to the future EicC machine and, consequently, below which pseudo-data have not been generated for this analysis. Investigating the spin contribution to the proton from quarks and gluons with smaller momentum fraction by stretching our analysis beyond this lower threshold would return biased results, constrained only by the original DSSV14 assumptions such as the choice of the initial parametrization form or the continuity, integrability, and positivity (i.e. $|\Delta q| < q$) requirements of the helicity distributions. However, lower values of $x_{\min}$ will be accessible at the future EIC, and analyses using EIC pseudo-data down to $x_{\min} \sim 10^{-5}$, such as [19], have been performed. That study also extended the integration from $x_{\min} \sim 10^{-5}$ down to $x_{\min} \sim 10^{-6}$ and observed that the integrals tend to saturate quite early, suggesting a picture where very low-$x$ partons become unpolarized. Independently of whether or not this feature will be confirmed by future EIC measurements at very low $x$, the allowed central values of the two integrals will depend on the precision with which the distributions are known over the full $x$-range. 
In this respect, the EicC acts as a complementary machine to the EIC by being able to better determine the distributions in the sea-quark region ($x \gtrsim 10^{-2}$). The red and blue contours in Fig. 7 show the allowed values of the contribution to the integrals, together with their central values, for $x > 5 \times 10^{-3}$ according to different EicC pseudo-data sets. The black cross and black contour are, respectively, the central value and the allowed values of the contribution to the integrals for the same $x$-region for the actual DSSV14. As can be observed in all plots, the uncertainty area of the DSSV14 is predicted to be substantially reduced by the ePump updating after including EicC pseudo-data.
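A truncated first moment of the kind discussed here, $\int_{x_{\min}}^{1} \Delta f(x)\, dx$, can be evaluated numerically as in the sketch below. The integrand is a purely hypothetical toy shape and not a DSSV14 distribution; a logarithmic grid in $x$ is an assumption made here to resolve the small-$x$ region efficiently.

```python
import math


def truncated_moment(f, x_min, x_max=1.0, n=2000):
    """Trapezoid rule for the integral of f(x) dx on a grid uniform in ln(x)."""
    t0, t1 = math.log(x_min), math.log(x_max)
    h = (t1 - t0) / n
    total = 0.0
    for k in range(n + 1):
        x = min(math.exp(t0 + k * h), x_max)   # guard against rounding above x_max
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(x) * x             # dx = x d(ln x)
    return total * h


def toy_delta_f(x):
    """Hypothetical helicity-like shape, for illustration only."""
    return 0.5 * x ** 0.7 * (1.0 - x) ** 3


for x_min in (0.05, 0.005):
    print(f"x_min = {x_min}: truncated moment = {truncated_moment(toy_delta_f, x_min):.5f}")
```

Lowering `x_min` adds the contribution of the extra small-$x$ interval, which is the quantity whose uncertainty grows quickly once the integration leaves the region covered by data.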
In more detail, Fig. 7(a) explicitly shows the higher constraining power of the SIDIS pseudo-data (blue region) with respect to the DIS ones (red region). The shifts of the respective central values are a consequence of the slight shifts observed for the "sea-quark" distribution $\Delta\bar{u}$. Moreover, the effect of the SIDIS pseudo-data on the central value is sizeably larger than the effect observed for the DIS pseudo-data. As already discussed above, the particular value of the shift has no real physical meaning in this analysis and depends on the specific realization of the Gaussian smearing used to produce the pseudo-data. Nonetheless, it cannot be excluded that real future EicC data may change the shape of the gluon and single flavour distribution in the region $x > 5 \times 10^{-3}$.
Figs. 7(b) and 7(c) show the effect on the integrals of the EicC proton and neutron DIS and SIDIS pseudo-data, separately. Both electron-proton and electron-neutron collisions have been generated with an integrated luminosity of 50 fb$^{-1}$. However, the neutron data sample has been obtained from electron-$^3$He collision data through the dilution procedure described in Sec. 3. The effect of the additional uncertainties introduced in the neutron data sample by the required neutron and proton effective polarization values can be directly observed, for both DIS and SIDIS, as a lower constraining power of the neutron pseudo-data (red regions) with respect to the proton pseudo-data (blue regions). Moreover, the central values updated with proton and neutron (SI)DIS pseudo-data do not differ from each other by a significant amount. This is an expected result dictated by the underlying SU(2) proton-neutron isospin symmetry imposed when calculating the theory tables for this analysis. Future global fits will be able to lift this assumption by exploiting the ability of precise SIDIS data to discriminate flavours over a large kinematical range. In fact, the DSSV collaboration already allows deviations from exact SU(2) and SU(3) flavour symmetries in their analyses, in the form of two additional fitting parameters $\varepsilon_{SU(2)}$ and $\varepsilon_{SU(3)}$. They are inserted in the fitting procedure in order to relax the constraints coming from the hyperon semi-leptonic $\beta$-decays and their implicit flavour symmetry assumptions, normally imposed in polarized PDF extractions based solely on DIS data [57]. More specifically, this translates into various first moments being related by Eqs. (4.30) and (4.31), where $F$ and $D$ are constants parametrizing the $\beta$-decay rates [58], at the input scale $\mu_0 = 1$ GeV of the DSSV analysis.
As Eqs. (4.30) and (4.31) show, the precision with which the $\varepsilon_{SU(2)}$ and $\varepsilon_{SU(3)}$ parameters can be determined is tied to the accuracy with which the integrals can be calculated over the full $0 < x < 1$ span. Due to the lower kinematical limit lying at $x_{\min} \sim 10^{-3}$, data from the planned EicC machine would not be sufficient by themselves to impose very strict constraints on these parameters. For $x \lesssim 10^{-3}$, the integrals in Eq. (4.30) would be strongly biased by the initial PDF parametrization form and only loosely constrained by general helicity PDF requirements such as the continuity, integrability, and positivity of the distributions. However, a precise determination of the deviations from flavour SU(2) and SU(3) in global fits will become possible, and will be greatly improved, by taking into account both future EIC and EicC SIDIS data which, combined, will span with unprecedented precision a larger $x$-$Q^2$ area than the SIDIS world data currently available. A study of the effect of DIS and SIDIS EIC pseudo-data was presented in [19].
The effect of data samples with identified pions and kaons in the final state is shown in Fig. 7(d). The larger area delimited by the SIDIS kaon pseudo-data comes from the fact that larger statistical uncertainties are associated with the kaon pseudo-data, as kaons are produced at a lower rate than pions. In the same plot, we can observe a slightly different shift of the central values produced by the two data sets. Since in the SIDIS process different PDFs are weighted with different fragmentation functions, the different flavour content of the identified final-state hadrons is responsible for the dissimilar shifts after ePump updating. The ability of specific data sets to constrain particular flavour distributions will be discussed in detail in Sec. 4.6.
In addition to the quark and gluon spin contributions, the remaining part of the proton spin is related to the quark and gluon orbital angular momentum (for more on the subject, see [59] and references therein). In Fig. 8 we show the net contributions of quarks and gluons to the proton spin, and their cumulative difference from the actual proton spin value 1/2, as a function of the lower bound $x_{\min}$ used to compute the truncated moments. In all plots one can observe a clear reduction of the uncertainties when including DIS and SIDIS EicC pseudo-data, with the SIDIS data having the larger impact. The tendency of the central values of the integrals to saturate at low $x$, shown in Figs. 8(a) and 8(b), is compatible with a picture where partons carrying very small momentum fraction $x$ are mostly unpolarized. However, partons with momentum fraction lower than $x \sim 10^{-3}$ may still contribute to the proton spin, in which case the above picture could turn out to be incorrect. At the moment, the huge uncertainties associated with the $\Delta g$ distribution at low $x$ are still the main limiting factor in reaching a more definitive conclusion on the matter. The US EIC will be the ideal machine to precisely pin down the low-$x$ $\Delta g$. On the other hand, the EicC is planned to explore the complementary part of the phase space, particularly suited for a better determination of the "sea-quark" sector. This becomes apparent if we observe the significant reduction of the uncertainties on the quark spin contribution in Fig. 8(a), which extends from $x_{\min} \sim 10^{-3}$ up to high $x_{\min}$ values. In contrast, in Fig. 8(b) we observe a much lower impact on the uncertainties, which is almost entirely confined to the $x_{\min} \lesssim 10^{-2}$ region of the plot. Nonetheless, this exercise shows, once again, the importance of including the information coming from the SIDIS process when it comes to the precision extraction of both gluon and quark helicity distribution functions and their moments.
Finally, Fig. 8(c) shows the evolution of the missing contribution to the proton spin as we include partons with smaller and smaller momentum fraction $x$ in the computation of the integrals. The central value of the quantity shown in the plot seems to saturate asymptotically at small $x$, with the SIDIS data having a greater constraining effect on the uncertainties than the DIS data. Under the assumption that all of the missing proton spin comes exclusively from the quark and gluon orbital angular momentum and that partons with $x \lesssim 10^{-3}$ are mostly unpolarized (i.e. their spin contribution turns out to be negligible), the uncertainties of the plots at $x \sim 10^{-3}$ represent the room left by the EicC data for the quark and gluon orbital angular momentum contribution to the proton spin. Deviations from this picture will become apparent as soon as the EIC is able to precisely fix the very low-$x$ region, and in particular the $\Delta g$ distribution, and the EicC constrains with unprecedented accuracy the remaining middle and high-$x$ range.
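The bookkeeping behind the missing contribution can be sketched as follows, using the standard decomposition in which the quark spin enters as $\frac{1}{2}\Delta\Sigma$ and the gluon spin as $\Delta G$. The input moments, their uncertainties and the correlation coefficient are hypothetical placeholders, not results of this analysis, and the error propagation assumes Gaussian uncertainties.

```python
import math


def missing_spin(delta_sigma, delta_g):
    """Part of the proton spin 1/2 not carried by (truncated) quark and gluon helicities."""
    return 0.5 - 0.5 * delta_sigma - delta_g


def missing_spin_error(err_sigma, err_g, rho):
    """Propagated uncertainty for correlated moments with correlation coefficient rho."""
    var = (0.5 * err_sigma) ** 2 + err_g ** 2 + 2.0 * rho * 0.5 * err_sigma * err_g
    return math.sqrt(max(var, 0.0))


# Hypothetical truncated moments and uncertainties.
delta_sigma, err_sigma = 0.30, 0.04
delta_g, err_g = 0.20, 0.10
rho = -0.3

print("missing contribution =", missing_spin(delta_sigma, delta_g),
      "+/-", missing_spin_error(err_sigma, err_g, rho))
```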

Optimizing the PDFs with pseudo data and results
To quantitatively analyze the sensitivity of the individual pseudo-data sets to the various parton flavour PDFs in given $x$ ranges, and to demonstrate how these eight different data sets play complementary roles in reducing the PDF uncertainty in the PDF-updating procedure, we deploy the ePump-optimization (or PDF-rediagonalization) method of the ePump code. As explained in Sec. 4.3, this application of ePump is based on ideas similar to those used in the data set diagonalization method developed by Pumplin [50]. It takes a set of Hessian error PDFs and constructs an equivalent set of error PDFs that exactly reproduces the Hessian symmetric PDF uncertainties; in addition, each new eigenvector pair has an eigenvalue that quantitatively describes its contribution to the PDF uncertainty of a given data set or sets. The new optimized error PDF pairs are ordered by their eigenvalues, so that the first optimized error PDF pair has the largest eigenvalue while successive pairs have smaller eigenvalues. This ordering makes it easy to choose a reduced set that covers the PDF uncertainty for the data set to any desired accuracy [16,17]. Higher accuracy corresponds to a choice of eigenvector pairs for which the sum of their eigenvalues is closer to the total number of new data points. The contributions of the new eigenvector pairs to the uncertainties of the PDFs, for a given parton flavour, provide useful information on how sensitive the new data points are to the PDFs in various $x$ ranges.
Figure 10. The same as Fig. 9, but for the sixth, tenth, thirteenth and fifteenth pairs of the optimized eigenvector PDFs. Panel (f): the EV15 contribution to $\Delta\bar{u}$.
Figure 11. Fractional contributions of the first four and the sixth optimized eigenvector pairs to the PDF uncertainties of various SIDIS observables. The sizes of the dots correspond to the relative contribution of the optimized eigenvector pairs to these observables. In the legends of "SIDIS Data Set", the notation "N+$K^+$", for example, stands for the pseudo-data set of the experimental observable $A_1$ for the neutron measurement with a $K^+$ as the final-state hadron. The same rule applies to the other data sets in the legends. Panel (e): the EV6 contribution to SIDIS.
Figure 12. The same as Fig. 11, but for the tenth, thirteenth and fifteenth optimized eigenvector pairs. The sizes of the dots are scaled differently from Fig. 11 for better visualization. Panel (c): the EV15 contribution to SIDIS.
By applying the ePump optimization method to the DSSV14H PDFs with the combination of both the EicC DIS and SIDIS pseudo-data sets, which contains 332 data points in total, we found that the first six eigenvector pairs (out of the total of 52 eigenvector pairs) play the most important roles in constructing the total PDF error bands. Their eigenvalues are 167.8, 38.7, 28.3, 21.5, 16.2, and 11.3, respectively. Together, these six eigenvector pairs provide 85.5% of the total PDF error bands, while the first fifteen eigenvector pairs cover 99.1% of the total error bands. The successive eigenvector pairs have even smaller eigenvalues. Hence, one could ignore the successive eigenvector pairs and reproduce the major part of the DSSV14 PDF uncertainties with only the first fifteen eigenvector pairs, instead of all of the $N_{\rm eig} = 52$ eigenvector pairs or all of the $N_{\rm rep} = 1000$ replicas.
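The quoted coverage follows directly from the eigenvalues, since they sum to the number of data points. The short sketch below reproduces the cumulative fractions for the six eigenvalues given in the text; the helper name `pairs_needed` is hypothetical.

```python
eigenvalues = [167.8, 38.7, 28.3, 21.5, 16.2, 11.3]   # leading six, from the text
n_points = 332                                         # DIS + SIDIS pseudo-data points


def pairs_needed(values, total, target):
    """Smallest number of leading pairs whose eigenvalue sum reaches target * total."""
    running = 0.0
    for count, lam in enumerate(values, start=1):
        running += lam
        if running / total >= target:
            return count
    return None  # target not reached with the listed eigenvalues


running = 0.0
for rank, lam in enumerate(eigenvalues, start=1):
    running += lam
    print(f"EV1-EV{rank}: {100.0 * running / n_points:.1f}% of the data points")

fraction_six = sum(eigenvalues) / n_points
```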
These fifteen optimized eigenvector pairs also reveal how sensitive the pseudo-data points are to the different parton flavour PDFs in given $x$ ranges. Such information can be obtained in two steps. Firstly, the optimized eigenvector pairs dominating the contributions to the uncertainties of each flavour are identified. The largest contributions of optimized eigenvector pairs to the DSSV14 PDFs are shown in Figs. 9 and 10. Secondly, the sensitivity of the new data points to the optimized eigenvector pairs is assessed by how much the optimized eigenvector pairs contribute to the data points. The sizes of the dots in the $x$-$Q^2$ plots of Figs. 11 and 12 show the fractional contributions of the optimized eigenvector pairs that dominate the DSSV14 uncertainties to the PDF uncertainties of the various EicC SIDIS pseudo-data points.
Table 2. The leading eigenvector pairs (EV sets), after the ePump-optimization, contributing to the PDF error band of each flavour, and the SIDIS EicC pseudo-data sets (SIDIS obs.) which provide leading constraints on the specific eigenvector pair PDFs. Note that EV6 and EV10 are sensitive to multiple flavours; therefore, when presenting the constraints of the pseudo-data sets on EV6 and EV10, we arrange the pseudo-data sets according to their flavour contents. The meaning of the notations, such as "N+$K^+$", in the last column is the same as in Fig. 11.
The summary of this analysis is given in Table 2, while our major physical findings in this section are as follows:
• The majority of the EicC SIDIS pseudo-data points are essential to improve the DSSV14 $\Delta u$ distribution. Fig. 9(a) shows that the first optimized eigenvector pair (EV1) dominates the $\Delta u$ error band, while Fig. 11(a) clearly shows that EV1 contributes largely to most of the SIDIS pseudo-data points. Therefore, it becomes apparent that the EicC SIDIS pseudo-data are highly sensitive to the $\Delta u$ distribution.
• The EV1 also contributes to the ∆d error band. But as shown in Fig. 11(b) it is not as dominant as it is for the ∆u. The fact that the absolute value of the down quark charge is half of that of the up quark results in this difference. The majority of SIDIS pseudo-data points have the power of constraining the ∆u and the ∆d simultaneously.
• We expect that the ∆d distribution will be particularly constrained by the future EicC Neutron+K + data. As shown in Fig. 9(d), the third optimized eigenvector pair (EV3) largely covers the ∆d error band for x > 0.1. We also notice in Fig. 11(c) that the Neutron+K + pseudo-data points receive large fractional contributions from EV3 for the same x > 0.1 region, hence indicating that the Neutron+K + pseudodata is sensitive to the ∆d. Given the flavour content of the K + meson, this is to be expected as the Neutron+K + data should be able to probe the ∆u distribution inside the neutron, which, due to the isospin symmetry, corresponds to the ∆d distribution inside the proton.
• Both EicC Proton and Neutron kaon SIDIS data will be important for constraining the ∆s. In Fig. 9(e), we observe that the fourth optimized eigenvector pair (EV4) dominates the ∆s error band. Fig. 11(d) shows that EV4 is particularly sensitive to both Proton and Neutron+K − pseudo-data sets. This is consistent with the quark model picture, where the K − meson is considered to be composed by s andū quarks.
• In the naive parton model picture, as discussed in Appendix A, one could easily conclude that the kaon data must play a decisive role in determining ∆s. To check on this, we show in Fig. 9(f) the result of another ePump optimization study in which only SIDIS Kaon pseudo-data are considered. Unexpectedly, there is no single eigenvector pair dominating the PDF error band of ∆s when only the SIDIS kaon pseudo-data sets are included in ePump-optimization. By the nature of the Hessian profiling method, the eigenvectors are orthogonal to each other. This implies that those SIDIS kaon pseudo-data sets are providing information about ∆s at different x values. On the contrary, Fig. 9(e) shows that the eigenvector pair EV4 dominates the constraint on the error band of ∆s when all the DIS and SIDIS (pion and kaon) pseudo-data are included. Hence, there must be some other pseudo-data sets that provide an additional constraint on ∆s via some underlying correlation present in the original DSSV14 PDFs. Since the theoretical predictions used in this study are generated with DSSV14 PDFs, it is possible that the underlying correlation comes from the original setting of DSSV14 PDFs. The identity of Eq. (4.30) implies a correlation between ∆s and ∆u, ∆ū, ∆d and ∆d imposed in the construction of DSSV14 PDFs, such that the pseudo-data sets sensitive to ∆u, ∆ū, ∆d and ∆d are also providing information on constraining ∆s. This explains why adding those non-kaon data can further constrain ∆s when using the DSSV14 PDFs.
• Fig. 10(d)-(f) indicate that the 8 SIDIS pseudo-data will constrain ∆ū in different ranges of x, and it takes mainly three eigenvector sets (EV10, EV13 and EV15) to represent the error band of ∆ū PDF in DSSV14. Furthermore, Fig. 12 shows that the leading data sets that contribute to the eigenvector sets EV10, EV13 and EV15 are the kaon data. This is the kind of information that can not be read out from Fig.  4, directly. Although one could perform ePump-updating by adding only one pseudodata set at a time to study the impact from each individual data set, one could use ePump-optimization to quickly gain information about the complimentary role that each data set plays in constraining a certain flavour PDF at a given x region after ePump-updating. The SIDIS EicC pseudo-data sets which provide leading constraints on the specific eigenvector pair PDFs can be read out from Table 2.
• For ∆d, the EicC SIDIS Neutron K⁻ or π⁻ data will be important. In Fig. 10, the uncertainty band of ∆d, as a function of x, exhibits sensitivity to EV6 and EV10. At the same time, Fig. 11(e) shows that Neutron+K⁻ pseudo-data provide the leading constraint on EV6 for x < 0.3, and Fig. 12(a) shows that both Neutron+K⁻ and Neutron+π⁻ pseudo-data also constrain EV10. Hence the ∆d distribution is mostly constrained by the Neutron+K⁻ data, while the Neutron+π⁻ data also provide information on ∆d with x < 0.03.
• As for ∆g, none of these fifteen optimized eigenvector pairs contributes a large fraction of the error band. This is expected, as the EicC is a machine better suited to investigating the "sea-quark" sector through SIDIS than to exploring the ∆g distribution, which dominates the small-x region and can be effectively probed at the EIC [8].
In summary, by employing the ePump-optimization procedure, we have explored the complementary role played by different data sets in reducing the PDF uncertainty in the PDF-updating procedure. In Figs. 9 and 10, we show the contributions provided by the leading pairs of eigenvector PDFs to the PDF error bands of various flavours, after performing ePump-optimization with the inclusion of the pseudo-data sets considered in this study. Through this, we have identified which EV sets dominantly constrain the PDF error bands of a given flavour. Furthermore, in Figs. 11 and 12, we have depicted the fractional contributions of the leading optimized eigenvector pairs to the PDF uncertainties of the various SIDIS data included in this study. This tells us whether the included pseudo-data sets provide similar or independent information on reducing the PDF uncertainties in the ePump-updating procedure. Lastly, we note that, with the first 15 optimized EV sets, as much as 99.1% of the DSSV14 error bands (calculated from a total of 52 eigenvector pairs given by the Mc2Hessian package) can be recovered when applying the ePump-optimization procedure to DSSV14H PDFs with the combination of both the EicC DIS and SIDIS pseudo-data sets (with a total of 332 data points).
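As a rough illustration of the bookkeeping behind such statements, the sketch below evaluates the standard symmetric Hessian error from eigenvector pairs and the fraction of that error reproduced by a leading subset of pairs. It is a toy under stated assumptions, not the ePump implementation: the function names `hessian_error` and `recovered_fraction`, the array layout (one row per eigenvector pair) and the toy shifts are all hypothetical, and the ratio used here is only one plausible way to quantify how much of an error band is "recovered".

```python
import numpy as np


def hessian_error(x_plus, x_minus):
    """Symmetric Hessian uncertainty from eigenvector pairs.

    x_plus, x_minus: arrays of shape (n_pairs, ...) holding an observable
    (or a PDF value at fixed x) evaluated with the '+' and '-' error PDF
    of each eigenvector pair. Layout is an assumption of this sketch.
    """
    diff = np.asarray(x_plus, dtype=float) - np.asarray(x_minus, dtype=float)
    # Master formula: 0.5 * sqrt(sum over pairs of (X_k^+ - X_k^-)^2)
    return 0.5 * np.sqrt(np.sum(diff**2, axis=0))


def recovered_fraction(x_plus, x_minus, n_leading):
    """Ratio of the error from the first n_leading pairs to the full error."""
    x_plus = np.asarray(x_plus, dtype=float)
    x_minus = np.asarray(x_minus, dtype=float)
    full = hessian_error(x_plus, x_minus)
    lead = hessian_error(x_plus[:n_leading], x_minus[:n_leading])
    return lead / full


if __name__ == "__main__":
    # Toy input: 52 pairs at one x value, with rapidly decreasing shifts,
    # mimicking a set already ordered by importance (hypothetical numbers).
    central = 1.0
    shifts = 0.5 ** np.arange(52)
    plus, minus = central + shifts, central - shifts
    print("full error:", hessian_error(plus, minus))
    print("fraction from first 15 pairs:", recovered_fraction(plus, minus, 15))
```

Because the contributions add in quadrature, a handful of leading pairs can reproduce nearly the whole band once the set has been rotated so that the most constrained directions come first.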

Summary
In this work, we have presented a study that assesses the impact of future EicC data on the uncertainty bands of the DSSV14 helicity distribution functions and their moments. With EicC pseudo-data including DIS and SIDIS processes from doubly polarized electron-proton (3.5 GeV × 20 GeV) and electron-³He (3.5 GeV × 40 GeV) collisions, the DSSV14 PDF sets were updated by using a Hessian updating procedure via the ePump tool. The resulting updated Hessian set of the DSSV14 PDFs, named DSSV14H PDFs, was also used to evaluate the effects of specific initial and final state combinations of DIS and SIDIS processes at the future EicC on the uncertainties of the distributions and their moments. Moreover, the DSSV14H PDF set was rotated into an equivalent Hessian set via the ePump-optimization procedure, which is employed to explore the complementary role played by different data sets in reducing the PDF uncertainty in the PDF-updating procedure. By identifying the optimized eigenvector sets that dominate the error band of each flavour, together with their contributions to the pseudo-data points, we then assessed the sensitivity of the EicC pseudo-data points to the reduction of the PDF error bands over various x ranges.
As expected from the design goals of the EicC, we have observed a great reduction of the uncertainties of quark helicities in the sea-quark region, especially when considering SIDIS processes. It is important to remark that both electron-proton and electron-helium data are needed to obtain a consistent reduction of uncertainties over all quark flavours. The kaon SIDIS data are also essential to pin down the strange quark distribution. In this regard, one of the limiting factors that will inevitably hinder the accuracy of the strange distribution in a global analysis with real EicC data will be the still rather large uncertainty of the kaon fragmentation functions. With the advent of the Electron-Ion Colliders, extracting high-precision fragmentation functions will become an increasingly essential task.
The reduction of uncertainty of the gluon helicity distribution, although not as impressive as the one reported for the future US EIC in Ref. [19], is still very significant for the low-x region. The ability of the EicC to constrain the gluon distribution at energies lower than the ones reachable by the US EIC plays, nonetheless, a fundamental role in extending the coverage of meaningful data over a larger span of the phase space that will contribute to future extraction in a global analysis of the gluon helicity distribution.
In our discussion, we have stressed the complementary role of the two machines, the US EIC and the EicC, especially when it comes to determining the relation between the proton spin and its flavour content.
As is well known, the current electron-proton (neutron) fixed target experiments are limited by either their low center-of-mass energy or their low luminosity, which allow a precise exploration of only the low-Q² and high-x region. Answering the fundamental question of the origin of the proton spin is one of the main objectives of the future electron-ion machines, and the extension of the accessible Q²−x coverage to higher Q² and lower x values is a key component towards this goal. In our study, we have shown how the EicC data will greatly constrain the value of the spin contributions coming from quarks and gluons with momentum fraction x ≳ 10⁻³. On the other hand, the US EIC is better suited to constrain them for even lower values of x. Together, the two machines will reduce the room left for speculation and enhance our understanding of the relation between the proton spin and the quark/gluon spin as well as their orbital angular momentum down to an extremely small momentum fraction region.
Finally, we note that in this study the tolerance value for the ePump updating has been set to ∆χ² = 10, which is of the same order of magnitude as the tolerance used in the DSSV14 analysis when studying the uncertainties by means of the Lagrange multiplier method. Using the typical choice of ∆χ² = 1 for this update would be inconsistent with the DSSV14 error PDFs used for generating the DSSV14H Hessian PDFs. As explained in Refs. [16,17], using the small value of ∆χ² = 1 to update the given error PDFs is equivalent to overweighting these pseudo-data by about a factor of 10 in this study. This would result in much smaller PDF error bands than those obtained in this paper, so that using ∆χ² = 1 would greatly overestimate the effect of these pseudo-data on reducing the PDF errors.

C.-P. Yuan is also grateful for the support from the Wu-Ki Tung endowed chair in particle physics.
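The equivalence between a smaller tolerance and an overweighting of the pseudo-data, noted above in connection with the choice ∆χ² = 10, can be made explicit in a one-parameter toy model. The sketch below is not the ePump algorithm; it assumes a single eigenvector direction z, a quadratic prior `tolerance_sq * z**2` (so that the original error PDFs sit at z = ±1), and a data term whose curvature is summarized by a hypothetical number `sensitivity` (the sum over data points of the squared observable shift per unit z divided by the squared data error). The function name and its arguments are illustrative only.

```python
import math


def updated_error_ratio(sensitivity, tolerance_sq, weight=1.0):
    """Updated-to-original PDF error ratio along one eigenvector direction.

    Toy chi^2: weight * sensitivity * (z - z0)**2 + tolerance_sq * z**2.
    The updated error is the |z - z_min| at which the total chi^2 rises
    by tolerance_sq above its minimum; the original error is |z| = 1.
    """
    curvature = tolerance_sq + weight * sensitivity
    return math.sqrt(tolerance_sq / curvature)


if __name__ == "__main__":
    m = 10.0  # hypothetical sensitivity of the pseudo-data to this direction
    consistent = updated_error_ratio(m, tolerance_sq=10.0)
    too_small = updated_error_ratio(m, tolerance_sq=1.0)
    overweighted = updated_error_ratio(m, tolerance_sq=10.0, weight=10.0)
    print("tolerance 10          :", consistent)
    print("tolerance 1           :", too_small)
    print("tolerance 10, weight 10:", overweighted)
```

In this toy, updating with a tolerance of 1 gives exactly the same error reduction as keeping the tolerance at 10 while multiplying the weight of the data by 10, which is why the smaller tolerance would exaggerate the impact of the pseudo-data.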
From Eq. (A.2), it can be inferred that only ∆v₃⁺ may be determined directly from measurements of the proton and neutron g₁ structure functions. On the other hand, ∆v₈⁺ and ∆Σ may only be determined thanks to their scaling properties. Moreover, using only DIS data, it is impossible to disentangle quark from anti-quark distributions, as there is no weighting factor to discriminate between them.