nNNPDF2.0: Quark Flavor Separation in Nuclei from LHC Data

We present a model-independent determination of the nuclear parton distribution functions (nPDFs) using machine learning methods and Monte Carlo techniques based on the NNPDF framework. The neutral-current deep-inelastic nuclear structure functions used in our previous analysis, nNNPDF1.0, are complemented by inclusive and charm-tagged cross-sections from charged-current scattering. Furthermore, we include all available measurements of W and Z leptonic rapidity distributions in proton-lead collisions from ATLAS and CMS at $\sqrt{s}=5.02$ TeV and 8.16 TeV. The resulting nPDF determination, nNNPDF2.0, achieves a good description of all datasets. In addition to quantifying the nuclear modifications affecting individual quarks and antiquarks, we examine the implications for strangeness, assess the role that the momentum and valence sum rules play in nPDF extractions, and present predictions for representative phenomenological applications. Our results, made available via the LHAPDF library, highlight the potential of high-energy collider measurements to probe nuclear dynamics in a robust manner.


Introduction
Decades of experimental investigation have clearly demonstrated that high-energy scattering processes involving heavy nuclei cannot be described within perturbative QCD using a free-nucleon formalism. The parton distribution functions (PDFs) of nucleons bound within nuclei, commonly known as nuclear PDFs (nPDFs) [1,2], can be significantly modified with respect to their free-nucleon counterparts [3] as a result of non-perturbative dynamics. While a first-principles understanding of the theoretical mechanisms that generate such QCD dynamics remains an open challenge, phenomenological determinations of nPDFs have provided vital information about parton behavior in the cold nuclear medium.
Precise extractions of nPDFs are not only crucial to study the strong interaction in the high-density regime, but are also necessary to model the initial state of heavy ion collisions which aim to characterize the Quark-Gluon Plasma (QGP) [4,5] using hard probes. Furthermore, nPDFs also contribute to global QCD analyses of the proton structure [6][7][8][9] via the inclusion of neutrino structure function data collected in reactions involving heavy nuclear targets. These measurements on nuclear targets provide important information on the quark flavor separation and strangeness in the proton [10].
This work focuses on the determination of the quark and antiquark nuclear PDFs, with emphasis on their flavor separation. Since measurements of neutral-current (NC) deep-inelastic scattering (DIS) nuclear structure functions on isoscalar targets are only sensitive to a single quark PDF combination, one needs to rely on the information provided by independent processes to disentangle quarks and antiquarks of different flavors. The main options available to accomplish this are neutrino-induced charged-current (CC) DIS cross-sections on heavy nuclear targets, which are sensitive to different quark combinations than the NC case, and electroweak gauge boson production at the LHC.
From the methodological point of view, there exist two primary limitations that affect the separation between quark and antiquark flavors in nPDF extractions. The first is the reliance on ad hoc theoretical assumptions to model the dependence of the nuclear modifications on both the parton momentum fraction x and the atomic mass number A, where in some cases the expected behavior is hard-coded in the nPDF parameterization. The second is the lack of consistency between the nuclear PDF determination and that of the corresponding proton baseline, to which the former should reduce in the A → 1 limit, both in terms of central values and of uncertainties. This consistency is particularly important given that precision LHC data impose stringent constraints on the quark flavor separation of the proton PDFs, for example via measurements of inclusive W and Z production with per-mille level uncertainties. Ensuring that the LHC constraints on the proton PDF baseline are appropriately propagated to the nPDF determination for A > 2 is therefore critical.
In this study we present a model-independent determination of nuclear PDFs using machine learning methods and Monte Carlo techniques based on the NNPDF framework [38][39][40][41][42][43][44][45][46][47]. We complement our previous nNNPDF1.0 analysis of NC DIS nuclear structure functions with CC inclusive and charm-tagged measurements from fixed-target neutrino experiments as well as with inclusive W and Z production cross-sections in proton-lead collisions from ATLAS and CMS at √s = 5.02 TeV (Run I) and 8.16 TeV (Run II). The A = 1 proton PDF baseline used in the present analysis is defined to be a variant of the NNPDF3.1 fit which excludes heavy nuclear target data. This choice allows us to indirectly incorporate the constraints on quark flavor separation provided by the pp measurements from the LHC.
The nNNPDF2.0 results allow us to tackle several important issues concerning nuclear effects among various quark flavors. First, we assess the compatibility of the LHC W and Z leptonic rapidity distributions from proton-lead collisions with the constraints coming from DIS structure functions, and demonstrate that the former allow for a marked improvement in the quark PDF uncertainties. We also study the nuclear effects on the total strange content of heavy nuclei, highlighting the interplay between the information provided by DIS and hadronic data. This interplay is also interesting from the proton PDF point of view, where the pull on strangeness provided by the ATLAS W, Z distributions [48] is opposite to that of neutrino data and of other LHC processes such as the W+c cross-sections.
We then analyze the impact that the momentum and total valence sum rule constraints have in the global nPDF determination, and demonstrate that the corresponding integrals agree with QCD predictions within uncertainties even when the sum rules are not explicitly imposed. We conclude the paper by providing theoretical predictions based on nNNPDF2.0 for representative processes of phenomenological interest in proton-ion collisions: isolated photon production in the central and forward rapidity regions and inclusive pion production.
This paper is organized in the following manner. In Sect. 2, we provide the input experimental observables used in this analysis and detail the corresponding theoretical calculations. We define a set of conventions and notation used in this work and describe new aspects of our fitting methodology in Sect. 3. The nNNPDF2.0 nuclear parton distributions are then presented in Sect. 4, followed by a discussion of phenomenological implications in Sect. 5. Lastly, in Sect. 6 we conclude with a summary and highlight future directions of study.

Experimental data and theory calculations
In this section, we provide details of the experimental measurements used as input for the nNNPDF2.0 determination. Particular emphasis is placed on the new datasets added with respect to those present in nNNPDF1.0. We then discuss the theoretical calculations corresponding to these datasets and their numerical implementation in our fitting framework.

Input dataset
Common to the previous nNNPDF1.0 analysis are the nuclear NC DIS measurements listed in Table 2.1. For each dataset, the two target nuclei A_1 and A_2 used by each experiment are indicated together with their atomic mass numbers. We also list the number of data points remaining after the DIS kinematic cuts and provide the corresponding publication reference.
The DIS kinematic cuts are the same as in our previous study, i.e. Q^2 ≥ 3.5 GeV^2 and W^2 ≥ 12.5 GeV^2, consistent with the NNPDF3.1 proton PDF baseline used for our boundary condition.
Note that all NC DIS measurements listed in Table 2.1 are provided in terms of ratios of structure functions between two different nuclei. In most cases the denominator is given by the deuterium structure function, but ratios to carbon and lithium are also provided. As we will discuss in Sect. 3, our fitting approach parameterizes the PDFs entering the absolute structure functions for each value of A, after which their ratios are constructed.
The remaining input data, newly added in our nNNPDF2.0 analysis, are presented in terms of absolute cross-sections, without normalizing to any baseline nucleus. We list these data in Table 2.2, divided into two categories: CC neutrino DIS reduced cross-sections on nuclear targets, and leptonic rapidity distributions in electroweak gauge boson production from proton-lead collisions at the LHC. The neutrino and anti-neutrino reduced cross-sections are further separated into inclusive cross-sections from CHORUS [62] and charm-tagged cross-sections from NuTeV [63]. The LHC measurements are divided into data from ATLAS and from CMS from the Run I and Run II data-taking periods. In this table we also indicate the total number of data points included in the fit, combining the NC and CC cross-section measurements with the LHC data. In total, the nNNPDF2.0 global fit contains n_dat = 1467 data points.
Starting with the CC measurements from CHORUS, we fit the inclusive neutrino and anti-neutrino double-differential cross-sections, d^2σ^{νN}/dxdQ^2. After imposing kinematic cuts, the dataset consists of n_dat = 846 data points, equally distributed between neutrino and anti-neutrino beams. The fitted cross-sections are not corrected for the non-isoscalarity of the lead target, and therefore the corresponding theory calculations take into account effects related to the difference between Z = 82 and A/2 = 104. The situation is therefore different from the treatment of the NC nuclear structure functions, where the experimental results are presented with non-isoscalar effects already subtracted, as discussed in [13].
In addition to the CHORUS reduced cross-sections, nNNPDF2.0 also includes the NuTeV dimuon cross-sections from neutrino-iron scattering. Dimuon events in neutrino DIS are associated with the W^± + s(d) → c scattering process, where the charm quark hadronizes into a charmed meson that subsequently decays into a final state containing a muon. This process is dominated by strange-initiated contributions, since other initial states are CKM-suppressed, thus providing direct sensitivity to the strange quark nuclear PDF. In fact, the NuTeV dimuon data are known to play an important role in studies of proton strangeness in global QCD analyses. After kinematic cuts, we end up with n_dat = 39 and 37 data points for the neutrino and anti-neutrino cross-sections, respectively. Together with the CHORUS cross-section data, the CC measurements comprise the majority of the input dataset, with a total of n_dat = 922 data points.
Moving now to the LHC electroweak gauge boson cross-sections, we consider in this work the four datasets that are listed in Table 2.2. Three of the datasets come from the Run I data-taking period, corresponding to a per-nucleon center-of-mass energy of √s = 5.02 TeV. These are the ATLAS Z rapidity distributions [23] and the CMS W^± [25] and Z [24] rapidity distributions, which contain n_dat = 14, 20, and 12 data points, respectively. Note that ATLAS does not have a published measurement of the W^± rapidity distributions from Run I; only preliminary results have been presented [22].
In the same way as the CC reduced cross-sections, the LHC measurements of electroweak gauge boson production are provided as absolute distributions. In this case, however, it is possible to construct new observables from the LHC W^± and Z production data that might be beneficial for nPDF determinations. For example, the EPPS16 analysis constructed and analyzed the forward-to-backward ratio, in which cross-sections at positive lepton rapidities are divided by those at negative rapidities. Nevertheless, in this work we choose to work only with the absolute rapidity distributions presented in the experimental publications.
In addition to the three Run I results, we also include, for the first time in an nPDF analysis, measurements from Run II corresponding to a per-nucleon center-of-mass energy of √s = 8.16 TeV. More specifically, these are the W^+ and W^− leptonic rapidity distributions [64] from CMS, which provide an additional n_dat = 48 data points. The fact that the amount of data is more than doubled compared to the corresponding Run I measurements is a consequence of the increase in the center-of-mass energy as well as the higher integrated luminosity: the Run II measurements are based on L = 173 nb^-1, compared to L = 34.6 nb^-1 from Run I. The CMS Run II data are therefore expected to provide important constraints on the quark flavor separation of the nuclear PDFs. As opposed to the situation in proton-proton collisions, the LHC gauge boson production measurements in proton-lead collisions do not provide information on the correlations between experimental systematic uncertainties. For this reason, we construct the total experimental error by adding the various sources of uncertainty in quadrature. The only source of systematic error kept as fully correlated among all the data bins of a given dataset is the overall normalization uncertainty, which is correlated within a single experiment and LHC data-taking period but uncorrelated between different experiments and periods.
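As an illustration of this error treatment, the sketch below builds a toy covariance matrix in which statistical and uncorrelated systematic errors enter only the diagonal, while a single fractional normalization uncertainty is fully correlated across bins. All numbers are made up, and `build_covariance` is a hypothetical helper, not part of any fitting code:

```python
import numpy as np

def build_covariance(stat, sys_uncorr, norm_frac, central):
    """Toy covariance matrix: statistical and uncorrelated systematic errors
    are added in quadrature on the diagonal, while one fractional
    normalization uncertainty is fully correlated across all bins."""
    stat = np.asarray(stat, float)
    sys_uncorr = np.asarray(sys_uncorr, float)
    central = np.asarray(central, float)
    diag = np.diag(stat**2 + sys_uncorr**2)          # uncorrelated part
    shift = norm_frac * central                      # correlated normalization
    return diag + np.outer(shift, shift)

# Two illustrative bins: absolute stat/syst errors and a 2% normalization
cov = build_covariance([1.0, 2.0], [0.5, 0.5], 0.02, [100.0, 200.0])
```

The off-diagonal entries come entirely from the normalization term, so removing it would make the covariance matrix diagonal.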
In order to illustrate the coverage of the experimental data included in nNNPDF2.0 and summarized in Tables 2.1 and 2.2, we display in Fig. 2.1 their kinematic range in the (x, Q^2) plane. Here the horizontal and curved dashed lines correspond to the kinematic cuts of Q^2 = 3.5 GeV^2 and W^2 = 12.5 GeV^2, respectively. In addition, we show for each hadronic data point the two values of x corresponding to the parton momentum fractions of the incoming proton and lead beams, computed at leading order.
There are several interesting observations that one can make regarding Fig. 2.1. First of all, the LHC proton-lead measurements significantly extend the kinematic coverage of the fixed-target DIS reduced cross-sections, both in x and in Q^2. In particular, the LHC data reside at Q^2 values that are orders of magnitude larger, while the coverage in partonic momentum fraction is extended down to x ∼ 10^-3. Secondly, the CC reduced cross-sections have a similar coverage to the NC ones, providing sensitivity to different quark and antiquark combinations across the shared medium- to large-x region. Finally, the kinematics of the LHC W and Z measurements largely overlap. Their simultaneous description therefore provides a test of the compatibility between the experimental data and the theoretical calculations.
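At leading order, the two momentum fractions probed in gauge boson production follow from the boson mass and its center-of-mass rapidity, x_{1,2} = (M_V/√s) e^{±y}. The quick numerical check below (Z production at √s = 8.16 TeV at forward rapidity; `lo_x` is just an illustrative helper) shows how the lead-side fraction reaches the 10^-3 region:

```python
import math

def lo_x(M, sqrt_s, y):
    """Leading-order momentum fractions for producing a boson of mass M
    (GeV) at CM rapidity y: x1 on the proton side, x2 on the lead side."""
    tau = M / sqrt_s
    return tau * math.exp(y), tau * math.exp(-y)

# Z boson at forward rapidity y = 2 with sqrt(s) = 8.16 TeV per nucleon
x_p, x_pb = lo_x(91.19, 8160.0, 2.0)
```

At central rapidity (y = 0) both fractions reduce to M_Z/√s ≈ 0.011; at y = 2 the lead-side fraction drops to roughly 1.5 × 10^-3.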

Theoretical calculations
DIS structure functions. For the NC DIS structure functions we use the same theoretical settings as in nNNPDF1.0, i.e. the structure functions are evaluated at NLO using APFEL [65] in the FONLL-B general-mass variable flavor number scheme [66]. The value of the strong coupling constant is taken to be the same as in the NNPDF3.1 proton PDF fit, α_s(m_Z) = 0.118, as are the charm and bottom mass thresholds, m_c = 1.51 GeV and m_b = 4.92 GeV, respectively. The charm PDF is generated perturbatively by the DGLAP evolution equations and is thus absent at the input scale, where the n_f = 3 scheme applies. Lastly, the structure functions are tabulated using APFELgrid [67] fast interpolation tables, which allow for efficient evaluations during the PDF fit.
Concerning the CC neutrino reduced cross-sections, most of the theory settings are shared with their NC counterparts. The main difference is that the heavy quark contributions in the CC predictions at NLO are instead accounted for in the FONLL-A scheme, to maintain consistency with the proton baseline. Massive O(α_s^2) corrections to charm production in CC DIS have been presented in Ref. [68], and subsequently used to study their impact on the determination of the strange content of the nucleon in Ref. [69]. Further details about the implementation of heavy quark mass corrections in the NNPDF framework for charged-current scattering can be found in Ref. [46].
Hadronic cross-sections. The rapidity distributions from W and Z boson production in proton-lead collisions are evaluated at NLO using MCFM v6.8 [70] interfaced with APPLgrid [71]. We have ensured that the numerical integration uncertainties in the MCFM calculations are always much smaller than the corresponding experimental errors. Furthermore, our calculations are benchmarked against the reference theoretical values whenever these are provided in the corresponding experimental publications. To illustrate this benchmarking, we display in Fig. 2.2 the muon rapidity distributions for W^− boson production at √s = 8.16 TeV in the center-of-mass frame. Here we compare our MCFM-based calculation with the theory predictions provided in Ref. [64], both at the level of absolute cross-sections (upper panel) and as ratios to the central experimental values (lower panel). In both cases, the CT14 NLO proton PDF set is adopted as input and nuclear corrections are neglected. As can be seen in the figure, there is agreement at the ∼1% level between our calculations and the reference results provided in the CMS paper. Similar agreement is obtained for the rest of the hadronic datasets included in the present analysis.
Since the fast interpolation grids are computed in the center-of-mass frame of the proton-lead collision, rapidity bins that are given in the laboratory frame η_lab are shifted to the center-of-mass frame η_CM when required. This shift is given by η_lab = η_CM + 0.465 at both √s = 5.02 TeV and 8.16 TeV. Lastly, we note that the same theoretical settings were used for the evaluation of W and Z production in proton-proton collisions for the baseline NNPDF3.1 fit.
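The numerical value of the shift follows from the beam-energy asymmetry: with both beams at the same magnetic rigidity, the lead beam carries energy Z/A per nucleon relative to the proton beam, so the nucleon-nucleon center-of-mass frame is boosted by Δy = ½ ln(A/Z). A one-line check (illustrative only, not part of the fitting code):

```python
import math

# Boost of the nucleon-nucleon CM frame in p+Pb collisions: both beams
# share the same magnetic rigidity, so E_Pb per nucleon = (Z/A) * E_p
# and Delta_y = 0.5 * ln(E_p / E_Pb) = 0.5 * ln(A/Z).
Z, A = 82, 208  # lead
delta_y = 0.5 * math.log(A / Z)
```

This reproduces the 0.465 shift quoted above, independent of the beam energy, which is why the same value applies at √s = 5.02 TeV and 8.16 TeV.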

Fitting methodology
In this section, we describe the fitting methodology that was adopted for the nNNPDF2.0 determination, focusing in particular on the differences and improvements with respect to the nNNPDF1.0 analysis. We begin by establishing the PDF notation and conventions that will be used throughout this work. We then detail our strategy to parameterize the nuclear parton distributions, including the treatment of sum rules, preprocessing factors, and the proton boundary condition. Lastly, we outline the implementation of the cross-section positivity constraint.

Notation and conventions
Parton distributions can be parametrized in a number of different bases, all of which are related by linear transformations. Two popular choices are the flavor basis, corresponding to the individual quark and antiquark PDFs, and the evolution basis, given by the eigenvectors of the DGLAP evolution equations [6]. For illustrative purposes we consider here three active quarks, a vanishing strangeness asymmetry, and heavy quark PDFs generated by perturbative evolution. At the parameterization scale Q_0 < m_c, the flavor basis is composed of the u, ū, d, d̄, s, and g PDFs, with s = s̄, while the corresponding evolution basis is given by Σ, T_3, T_8, V, V_3, and g. Expressed in terms of the elements of the flavor basis, the evolution basis distributions are given by

Σ = u^+ + d^+ + s^+ ,
T_3 = u^+ − d^+ ,
T_8 = u^+ + d^+ − 2 s^+ ,
V = u^− + d^− + s^− ,
V_3 = u^− − d^− ,

where q^± ≡ q ± q̄. Although the results of an nPDF analysis should be independent of the basis choice for the parameterization, some bases offer practical advantages, for example in the implementation of the sum rules discussed later.
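The basis change above is a fixed linear map from flavor to evolution space. As a cross-check, the sketch below applies it to illustrative flavor-basis values (arbitrary numbers with no physical meaning):

```python
import numpy as np

# Rows: Sigma, T3, T8, V, V3; columns: u, ubar, d, dbar, s, sbar.
# (The text assumes s = sbar; both are kept explicit here.)
T = np.array([
    [1,  1,  1,  1,  1,  1],   # Sigma = u+ + d+ + s+
    [1,  1, -1, -1,  0,  0],   # T3    = u+ - d+
    [1,  1,  1,  1, -2, -2],   # T8    = u+ + d+ - 2 s+
    [1, -1,  1, -1,  1, -1],   # V     = u- + d- + s-
    [1, -1, -1,  1,  0,  0],   # V3    = u- - d-
])

flavor = np.array([0.5, 0.1, 0.3, 0.12, 0.08, 0.08])  # toy u,ubar,d,dbar,s,sbar
evol = T @ flavor
```

Because the map is linear and invertible (once the gluon, which is basis-independent, is appended), fitting in one basis and presenting results in the other loses no information.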
In this work, we define f^(N/A) to be the PDF of flavor f for the average nucleon N bound in a nucleus with atomic number Z and mass number A. This object can be written as

f^(N/A)(x, Q) = (Z/A) f^(p/A)(x, Q) + ((A − Z)/A) f^(n/A)(x, Q) ,   (3.2)

where f^(p/A) and f^(n/A) represent the PDFs of a proton and a neutron, respectively, bound in the same nucleus of mass number A. Assuming isospin symmetry, the PDFs of the neutron in Eq. (3.2) can be expressed in terms of the proton PDFs via

u^(n/A) = d^(p/A) ,   d^(n/A) = u^(p/A) ,   ū^(n/A) = d̄^(p/A) ,   d̄^(n/A) = ū^(p/A) ,

with all other flavors unchanged. Using these relations, the strange and gluon distributions of the average bound nucleon (f^(N/A)) and of the bound proton (f^(p/A)) coincide, while the up and down distributions of the average bound nucleon are linear combinations of the bound proton PDFs, with coefficients depending on the values of A and Z.

nPDF parameterization
The cross-sections for hard scattering processes involving heavy nuclei can be expressed either in terms of f^(N/A) or f^(p/A). The two options are fully equivalent, as is highlighted by the LO expressions of the observables collected in Appendix A. One can therefore choose to parameterize either the PDFs of the average bound nucleon or those of the bound proton in a global nPDF analysis. In this study we choose the latter option for two main reasons. First, the connection with the free-proton boundary condition is more straightforward. Second, the valence sum rules for non-isoscalar nuclei are independent of A and Z. If instead f^(N/A) were parameterized, one of the valence sum rules would depend on the value of the Z/A ratio and thus differ for each nucleus, which is inconvenient from the parameterization point of view.
The relation between f^(N/A) and f^(p/A) is also simple for the PDF combinations that comprise the evolution basis. Consider for example the total quark singlet, whose flavor combination is the same in the proton and in the neutron, i.e. Σ^(p/A) = Σ^(n/A). From Eq. (3.2), it then follows that Σ^(N/A) = Σ^(p/A). The same equivalence also holds for V and T_8. However, the distinction is important for T_3 and V_3, for which we have

T_3^(N/A) = (2Z/A − 1) T_3^(p/A) ,   V_3^(N/A) = (2Z/A − 1) V_3^(p/A) ,

so there is an overall rescaling factor of (2Z/A − 1) between f^(N/A) and f^(p/A). The main consequence of this relation is highlighted by considering an isoscalar nucleus, with Z = A/2. In this case, T_3^(N/A) = V_3^(N/A) = 0, while their bound proton counterparts are different from zero. Unless otherwise indicated, the nPDFs discussed in this section will always correspond to those of the bound proton.
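These relations are straightforward to verify numerically. The sketch below builds average-nucleon PDFs from toy bound-proton values (arbitrary numbers; `avg_nucleon` is a hypothetical helper) and checks the (2Z/A − 1) rescaling of T_3:

```python
def avg_nucleon(f_p, Z, A):
    """Average-nucleon PDFs from bound-proton PDFs, assuming isospin
    symmetry: u(n/A) = d(p/A), d(n/A) = u(p/A); other flavors unchanged."""
    w_p, w_n = Z / A, (A - Z) / A
    swap = {'u': 'd', 'd': 'u', 'ubar': 'dbar', 'dbar': 'ubar'}
    return {k: w_p * v + w_n * f_p[swap.get(k, k)] for k, v in f_p.items()}

# Toy bound-proton values at some fixed (x, Q); lead has Z = 82, A = 208
f_p = {'u': 0.6, 'd': 0.3, 'ubar': 0.05, 'dbar': 0.07, 's': 0.04, 'sbar': 0.04}
Z, A = 82, 208
f_N = avg_nucleon(f_p, Z, A)

# T3 = u+ - d+ picks up the factor (2Z/A - 1) relative to the bound proton
t3_p = f_p['u'] + f_p['ubar'] - f_p['d'] - f_p['dbar']
t3_N = f_N['u'] + f_N['ubar'] - f_N['d'] - f_N['dbar']
```

For an isoscalar nucleus (Z = A/2) the factor vanishes and t3_N is identically zero, while t3_p is not, as stated in the text.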
Fitting basis and functional form. In our previous nNNPDF1.0 analysis, we parameterized only three independent evolution basis distributions at the initial scale Q 0 , namely the total quark singlet Σ(x, Q 0 ), the quark octet T 8 (x, Q 0 ), and the gluon nPDF g(x, Q 0 ). From the LO expression of Eq. (A.5), it is clear that NC structure functions are sensitive only to a specific combination of Σ and T 8 for isoscalar nuclei, in particular Σ + T 8 /4, while the contribution proportional to T 3 vanishes. In other words, Σ and T 8 are strongly anti-correlated and only the combination Σ + T 8 /4 can be meaningfully determined from the data.
The picture is quite different in the present study, where the availability of charged-current DIS data and electroweak gauge boson production cross-sections in proton-lead collisions allows additional elements of the evolution PDF basis to be parameterized (see App. A). If non-isoscalar effects are neglected, only a single distribution needs to be added to our evolution basis, namely the total valence quark combination V = u^− + d^−. However, non-isoscalar corrections are necessary for the targets considered in this analysis, particularly lead. In this case, the quark triplet T_3 = u^+ − d^+ and the valence triplet V_3 = u^− − d^− must also be parameterized. Note that since T_3 and V_3 correspond to bound protons, they will be different from zero even for isoscalar nuclei. However, in such cases their contribution to the scattering cross-section vanishes, and the data therefore provide no constraint on these combinations.
Putting together these considerations, in this work we parameterize six independent PDF combinations in the evolution basis as follows:

x f(x, Q_0, A) = B_f(A) x^{α_f} (1 − x)^{β_f} NN_f(x, A) ,   f = Σ, T_3, T_8, V, V_3, g .   (3.6)

In these expressions, NN_f(x, A) stands for the value of the neuron in the output layer of the neural network associated with each specific distribution. As was done in nNNPDF1.0, we use a single artificial neural network consisting of an input layer, one hidden layer with a sigmoid activation function, and an output layer with a linear activation function. The input layer contains three neurons that take as input the values of the momentum fraction x, ln(1/x), and the atomic mass number A, respectively. Since the hidden layer contains 25 neurons, there are a total of N_par = 256 free parameters (weights and thresholds) in the neural network used to model our nPDFs.
The neural-net parameterization in Eq. (3.6) is then complemented by three normalization coefficients B g , B V , and B V 3 which are fixed by the sum rules, and by twelve preprocessing exponents α f and β f which are fitted simultaneously with the network parameters. Since our proton baseline is a variant of the NNPDF3.1 global NLO fit [72] with perturbative charm, we adopt for consistency the same parameterization scale of Q 0 = 1 GeV.
It is important to emphasize here that the parameterization in Eq. (3.6) is valid from A = 1 (free-proton) up to A = 208 (lead). As a result, the nNNPDF2.0 analysis incorporates an independent determination of the free-proton PDFs, where agreement with the proton PDF baseline is enforced by means of a boundary condition as explained below. This is a relevant distinction, implying that the A = 1 PDF can by construction differ slightly from our proton baseline, for example as a result of positivity constraints that are more general in the former case, or by new information contained in the LHC proton-lead cross-sections.
Sum rules. For every nucleus, we assume that the fitted nuclear PDFs satisfy the same valence and momentum sum rules as in the proton case. The sum rules are implemented via overall normalization factors in the PDF parameterization, which are adjusted each time the neural network parameters are modified in order to ensure that the sum rules remain satisfied. Note that these sum rules need only be applied at the input scale Q_0, since the properties of DGLAP perturbative evolution guarantee that they will also be satisfied for any Q > Q_0. First, energy conservation leads to the momentum sum rule constraint,

∫_0^1 dx x [ Σ^(p/A)(x, Q_0) + g^(p/A)(x, Q_0) ] = 1   ∀ A ,   (3.7)

which is enforced by fixing the gluon normalization to be

B_g(A) = [ 1 − ∫_0^1 dx x Σ^(p/A)(x, Q_0) ] / ∫_0^1 dx x g^(p/A)(x, Q_0) ,   (3.8)

where the denominator of Eq. (3.8) is evaluated using Eq. (3.6) and setting B_g = 1. Our nuclear PDFs are also constructed to comply with the three valence sum rules that follow from the valence quark quantum numbers of the proton:

∫_0^1 dx [ u^(p/A)(x, Q_0) − ū^(p/A)(x, Q_0) ] = 2 ,   (3.9)
∫_0^1 dx [ d^(p/A)(x, Q_0) − d̄^(p/A)(x, Q_0) ] = 1 ,   (3.10)
∫_0^1 dx [ s^(p/A)(x, Q_0) − s̄^(p/A)(x, Q_0) ] = 0 ,   (3.11)

where the final relation is trivially satisfied due to our inherent flavor assumption of s = s̄.
To implement the first two valence sum rules in our analysis, we must first derive the corresponding constraints in the evolution basis. Adding Eqs. (3.9) and (3.10) results in

∫_0^1 dx V^(p/A)(x, Q_0) = 3 .   (3.12)

This condition can then be implemented in the same way as the momentum sum rule, namely by setting the overall normalization factor of V as

B_V(A) = 3 / ∫_0^1 dx V^(p/A)(x, Q_0) ,   (3.13)

where the denominator of Eq. (3.13) is evaluated using Eq. (3.6) and setting B_V = 1. The second valence sum rule in the evolution basis is the one related to the quark valence triplet V_3. Subtracting Eq. (3.10) from Eq. (3.9) gives

∫_0^1 dx V_3^(p/A)(x, Q_0) = 1 ,   (3.14)

which again is imposed by setting

B_{V_3}(A) = 1 / ∫_0^1 dx V_3^(p/A)(x, Q_0) ,   (3.15)

where the denominator of Eq. (3.15) is evaluated using Eq. (3.6) with B_{V_3} = 1.
In this analysis, the normalization pre-factors B_g(A), B_V(A), and B_{V_3}(A) are computed using the trapezoidal rule for numerical integration between x_min = 10^-9 and x_max = 1 each time the fit parameters are updated by the minimization procedure. With a suitable choice of the ranges for the preprocessing exponents (see the discussion below), we guarantee that each quark combination satisfies the corresponding physical integrability conditions. Lastly, we have confirmed that individual replicas satisfy the sum rules to a precision of a few per mille or better.
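The normalization procedure can be sketched as follows: for a toy valence shape (chosen arbitrarily, not the fitted one), B_V is fixed so that the valence sum rule integral equals 3, using the trapezoidal rule on a logarithmic grid between x_min = 10^-9 and 1:

```python
import numpy as np

def trapezoid(f, x):
    """Trapezoidal rule on an arbitrary (here logarithmic) grid."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

x = np.logspace(-9, 0, 2000)         # grid from x_min = 1e-9 to x_max = 1
V_unnorm = x**-0.5 * (1 - x) ** 3    # toy V(x) evaluated with B_V = 1
B_V = 3.0 / trapezoid(V_unnorm, x)   # enforce the valence sum rule (3.12)
```

For this particular shape the integral is the Beta function B(1/2, 4) = 32/35, so the exact normalization is B_V = 3·35/32 = 3.28125, which the numerical estimate reproduces at the per-mille level quoted in the text.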
An interesting question in the context of nuclear global QCD analyses is the extent to which theoretical constraints such as the sum rules are satisfied by the experimental data when not explicitly imposed. In fact, it was shown in Ref. [46] that the momentum sum for the free proton agrees with the QCD expectation within ∼ 1% in this scenario. Here we will revisit this analysis for the nuclear case, and will present in Sect. 4.3 variants of the nNNPDF2.0 fit where either the momentum sum rule, Eq. (3.7), or the valence sum rule, Eq. (3.12), is not enforced. Interestingly, we will find that the experimental data is in agreement with sum rule expectations, albeit within larger uncertainties, demonstrating the remarkable consistency of the nuclear global QCD analysis.
Preprocessing exponents. The x^{α_f} (1 − x)^{β_f} polynomial pre-factors appearing in Eq. (3.6) are included to increase the efficiency of the parameter optimization, since they approximate the general PDF behavior in the small- and large-x asymptotic limits [73]. Note that the exponents α_f and β_f are A-independent, implying that the A dependence of the nPDFs arises entirely from the output of the neural network. As in the nNNPDF1.0 analysis, the values of α_f and β_f are fitted for each Monte Carlo replica on the same footing as the weights and thresholds of the neural network.
The ranges of the preprocessing parameters are determined both by physical considerations and by empirical observations. First of all, the lower limit of the small-x exponent is set so that each PDF combination satisfies the relevant integrability conditions. In particular, the non-singlet combinations xV, xV_3, xT_3, and xT_8 must tend to zero at small x, otherwise the valence sum rules and other relations such as the Gottfried sum rule [74,75] would be ill-defined. Moreover, the singlet combinations xΣ and xg must exhibit finite, integrable behavior as x → 0, otherwise the momentum integral cannot be computed. Concerning the large-x exponents β_f, the lower limits of their ranges ensure that the PDFs vanish in the elastic limit, while the upper limits are determined largely from general arguments related to sum rule expectations. In general, however, the upper values of both α_f and β_f are chosen to be sufficiently large to allow flexibility in exploring the parameter space while keeping the fit efficient.
Under these considerations, we restrict the allowed values of the preprocessing exponents to the intervals of Eq. (3.16), where the ranges in parentheses are those used to randomly select the initial values of α_f and β_f at the start of the minimization. We do not impose any specific relation between the small- or large-x exponents of the different quark combinations, so that each is fitted independently. It is also worth emphasizing here that the neural network has the ability to compensate for any deviations in the shape of the preprocessing function, so the dependence on x and A of the nPDFs in the data region will be dominated by the neural network output. This implies that the preprocessing exponents primarily affect the results in the extrapolation regions.
To illustrate the values of the small- and large-x preprocessing exponents preferred by the experimental data, we display in Figs. 3.1 and 3.2 the probability distributions associated with the α_f and β_f exponents, respectively, computed using the N_rep = 1000 replicas of the nNNPDF2.0 analysis. Note how these exponents are restricted to lie within the intervals given by Eq. (3.16). For T_3 and T_8, we can see that, despite not imposing the strict integrability requirement α_f > 0, it is nevertheless satisfied for a large majority of the replicas, especially for T_3. Interestingly, the gluon seems to prefer a valence-like behavior at small x. However, such behavior is only observed at the parameterization scale; as soon as Q > Q_0, DGLAP evolution drives it to the expected singlet-like behavior.
Concerning the behavior of the large-x exponents β_f, we find that they are reasonably well constrained for the quark distributions, where the best-fit values are located in a region close to β_f ≃ 3. The fact that they share similar β_f exponents can be explained by the observation that in the large-x region the quark combinations are dominated by their valence components. Interestingly, a best-fit value of β_f ≃ 3 for the valence quarks is consistent with the expectations from the QCD counting rules, as discussed in [73]. Furthermore, the best-fit value for β_g is also consistent with the QCD counting rules prediction of β_g ≃ 5, although with significant uncertainties. The fact that β_g is found to vary in a wide range is due to the lack of direct constraints on the large-x nuclear gluon PDF in the present analysis.
The free-proton boundary condition. As was done in our previous study, we again implement the condition that the proton PDF baseline, obtained with consistent theoretical and methodological choices, is reproduced when A → 1. This condition should be constructed to match the free-proton distributions not only in terms of central values but also at the level of PDF uncertainties. In other words, it should allow a full propagation of the information contained in the proton baseline, which is particularly important to constrain the nPDFs of relatively light nuclei. Note, however, that for the reasons explained above, the nNNPDF2.0 A = 1 set will in general not be strictly identical to the corresponding proton baseline.
The proton boundary condition constraint is implemented by adding a quadratic penalty term to the χ² of the form

χ²_BC = λ_BC Σ_f Σ_{i=1}^{N_x} [ f^{(A=1)}(x_i, Q_0) − f^{(p)}(x_i, Q_0) ]² ,   (3.17)

where the sum over flavors f runs over the six independent elements in the PDF evolution basis. Since the properties of DGLAP evolution ensure that the distributions for Q > Q_0 also satisfy the condition, only the PDFs at the parameterization scale Q_0 enter the penalty term. In Eq. (3.17), we use a grid with N_x = 60 points, of which 10 are distributed logarithmically from x_min = 10⁻³ to x_mid = 0.1 and the remaining 50 points are distributed linearly from x_mid = 0.1 to x_max = 0.7. The value of the boundary-condition hyperparameter is fixed to λ_BC = 100, as was done in the previous nNNPDF1.0 analysis. For such a value, we find that the central values and uncertainties of the proton baseline are reasonably well described (see Sect. 4).
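A minimal sketch of this penalty and its x-grid follows. Only the grid layout and λ_BC = 100 are taken from the text; the function name, the PDF callables, and the absence of any normalization by N_x or by the baseline uncertainty are illustrative assumptions:

```python
import numpy as np

# x-grid from the text: 10 log-spaced points in [1e-3, 0.1] followed by
# 50 linearly spaced points in [0.1, 0.7], for N_x = 60 in total.
x_grid = np.concatenate([np.logspace(-3, -1, 10), np.linspace(0.1, 0.7, 50)])

lambda_bc = 100.0  # boundary-condition hyperparameter quoted in the text

def chi2_bc(fitted_A1, proton, flavors, x=x_grid):
    """Quadratic penalty tying the fitted A = 1 PDFs to the proton baseline.

    `fitted_A1` and `proton` map a flavor label in the evolution basis to a
    callable returning x*f(x) at Q0; both are hypothetical stand-ins.
    """
    penalty = 0.0
    for f in flavors:
        penalty += np.sum((fitted_A1[f](x) - proton[f](x)) ** 2)
    return lambda_bc * penalty
```

With identical inputs the penalty vanishes, while a constant offset of 0.1 in a single flavor costs 100 × 60 × 0.01 = 60 χ² units, which gives a feel for how strongly λ_BC = 100 enforces the constraint.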
In this analysis the proton baseline, f^(p)(x, Q_0), is taken to be a variant of the NNPDF3.1 NLO fit [72] with perturbative charm, where the neutrino DIS cross-sections from NuTeV and CHORUS are removed along with the di-muon production measurements in proton-copper collisions from the E605 experiment [76]. As such, the proton baseline not only avoids double counting of the CC DIS data but also excludes constraints from heavy nuclear target data where nuclear effects are neglected. This choice is different from that used for nNNPDF1.0, where the global NNPDF3.1 fit was used and double-counting of experimental data was not an issue.
In order to illustrate the differences between the free-proton boundary condition used in nNNPDF1.0 and that employed in the present analysis, we compare in Fig. 3.3 the NNPDF3.1 NLO global and no-heavy-nuclear fits at the initial parameterization scale of Q_0 = 1 GeV. Displayed are the gluon, up quark, down sea quark, and total strange PDFs in the range of x in which the proton boundary condition is constrained by Eq. (3.17). Here one can see that removing the heavy nuclear data from NNPDF3.1 results in a moderate increase of the PDF uncertainties associated with the quarks as well as an upward shift of the central value of the total strange distribution. The former effect is primarily a consequence of the loss of information on quark flavor separation once the neutrino scattering measurements are removed. The strangeness feature, on the other hand, arises due to the absence of sensitivity from the NuTeV neutrino dimuon cross-sections, resulting in an upward pull by the ATLAS W, Z 2011 rapidity distributions, which are known to prefer an enhanced strange sea with respect to the up and down quark sea. The results of Fig. 3.3 highlight the importance of a consistent choice of the free-proton baseline in order to draw solid conclusions on the nuclear modifications, for example those associated with the nucleon's strange content. In order to ensure that all central values and PDF uncertainties are reproduced, we select a different replica from the NNPDF3.1 proton baseline when constructing Eq. (3.17) for each replica of nNNPDF2.0. Since we perform a large number N_rep of fits to estimate the uncertainties in nNNPDF2.0, we are able to propagate the necessary information contained in NNPDF3.1 to the resulting nPDFs in a robust manner. Lastly, we note that Eq. (3.17) is the only place in the analysis where the free-proton NNPDF3.1 baseline is inserted.
In other parts of the fit where a free-nucleon PDF is required, for example in the theoretical predictions of the proton-lead scattering cross-sections, the nNNPDF2.0 set with A = 1 is used instead.

Cross-section positivity
While parton distributions are scheme-dependent and thus not necessarily positive-definite beyond leading order in perturbative QCD, physical cross-sections constructed from them are scheme-independent and should be positive-definite in the region of validity of the perturbative expansion. 1 In the NNPDF family of proton PDF fits, the requirement that cross-sections remain positive is implemented by adding to the χ² a penalty term in the presence of negative cross-section values [6]. The cross-sections that enter this penalty term correspond to theoretical predictions based on pseudo-data generated for representative processes that are directly sensitive to a sufficient number of PDF combinations.
In the nNNPDF1.0 analysis, cross-section positivity was not imposed, which led to some observables, such as the longitudinal structure function F_L(x, Q²), becoming negative at small-x values outside the data region. To bypass this problem, and also to improve the methodological consistency with the free-proton baseline, in nNNPDF2.0 we impose the positivity of physical cross-sections for all nuclei used in the fit by adding a suitable penalty to the figure of merit. In this case, the positivity penalty is expressed as

χ²_pos = λ_pos Σ_{l=1}^{N_pos} Σ_A Σ_{i=1}^{N_dat^(l)} max( 0, −F_i^(l)(A) ) ,   (3.18)

for N_pos positivity observables F^(l). Each observable contains N_dat^(l) kinematic points that are chosen to cover an adequately large region of phase space relevant to various PDF combinations, as we discuss in more detail below. The computed observables are summed over all N_A nuclei for which we have experimental data, as listed in Tables 2.1 and 2.2, as well as for the free proton at A = 1. Finally, the value of the hyperparameter λ_pos = 1000 is determined by manual inspection of the optimization process and is chosen so that positivity is satisfied without distorting the training on the real experimental data. 2 In Table 3.1 we list the F^(l) processes used in this analysis for which the positivity of physical cross-sections is imposed using Eq. (3.18). For each observable, the LO expressions in terms of the average bound nucleon PDFs and bound proton distributions are given together with the number of pseudo-data points N_dat and the corresponding kinematic coverage. Note that the LO expressions in Table 3.1 are shown for illustration purposes only, and in our analysis these observables are computed using the full NLO formalism.
Here we consider two types of positivity observables. The first type are the DIS structure functions F₂^u, F₂^d, F₂^s, and F_L. The former three quantities, which contain only u, d, and s contributions, respectively, are constructed to be positive-definite since there exist consistent physical theories where the photon couples only to up-, down-, or strange-type quarks. The longitudinal structure function F_L, on the other hand, largely impacts the nuclear gluon PDF, since F_L enters only at NLO and is dominated by the gluon contribution. Lastly, we evaluate each of these structure functions on a grid of N = 20 pseudo-data points between x = 10⁻⁷ and x = 0.9 at Q = √5 GeV, which is slightly above the input parameterization scale Q_0 = 1 GeV to ensure perturbative stability.
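A minimal sketch of such a penalty, assuming a max(0, −F) form for Eq. (3.18); the log spacing of the pseudo-data grid and all function names are assumptions of this sketch, with only λ_pos = 1000 and the grid endpoints taken from the text:

```python
import numpy as np

lambda_pos = 1000.0  # positivity hyperparameter quoted in the text

def chi2_pos(predictions):
    """Schematic positivity penalty: sum of max(0, -F) over all observables.

    `predictions` is a list of arrays holding the computed observables
    F_i^(l)(A), one array per positivity observable and nucleus.
    """
    return lambda_pos * sum(np.sum(np.maximum(0.0, -F)) for F in predictions)

# DIS positivity observables are sampled on 20 points between x = 1e-7 and
# x = 0.9 at Q = sqrt(5) GeV; log spacing is an assumption of this sketch.
x_pos = np.logspace(np.log10(1e-7), np.log10(0.9), 20)
```

The penalty vanishes whenever all predictions are non-negative, so the final replicas pay no cost once positivity is satisfied, consistent with the statement in Sect. 4 that the penalty vanishes for all final nNNPDF2.0 replicas.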
The second type of observable for which cross-section positivity is imposed is the double-differential Drell-Yan cross-section. In particular, we enforce the positivity of both neutral- and charged-current Drell-Yan cross-sections in pA scattering for the specific quark-antiquark annihilation combinations listed in Table 3.1. At leading order, the Drell-Yan cross-section can be written schematically as

d²σ^DY/dy dQ² ∝ Σ_q c_q [ q(x_1, Q²) q̄^(A)(x_2, Q²) + q̄(x_1, Q²) q^(A)(x_2, Q²) ] ,   (3.19)

where the momentum fractions x_1 and x_2 are related at leading order to the rapidity y and invariant mass Q of the final state by x_{1,2} = (Q/√s) e^{±y}. Here we set Q² = 5 GeV² and adjust the rapidity range and center-of-mass energy √s so that the LO kinematic range for x_1 and x_2 corresponds to 10⁻² ≤ x_{1,2} ≤ 0.9. Note that the positivity of Eq. (3.19) will also affect the fitted nNNPDF2.0 A = 1 distribution, which enters as the free-proton PDF. While most of the positivity observables coincide with those included in the free-proton baseline from which the A = 1 distribution is derived, the ud̄, ūd, us̄, and ūs combinations of Table 3.1 are new in the nNNPDF2.0 determination. Consequently, we impose these new observables only for proton-iron and proton-lead collisions, the two nuclei for which experimental data from charged-current DIS and Drell-Yan reactions are analyzed to study quark flavor separation.
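The LO kinematic matching described above can be checked numerically; the particular choice of √s below is a hypothetical one that realizes the quoted x coverage at Q² = 5 GeV²:

```python
import math

def dy_lo_kinematics(y, Q, sqrt_s):
    """LO Drell-Yan momentum fractions: x_{1,2} = (Q / sqrt(s)) * exp(+-y)."""
    tau = Q / sqrt_s
    return tau * math.exp(y), tau * math.exp(-y)

Q = math.sqrt(5.0)                    # Q^2 = 5 GeV^2 as in the text
sqrt_s = Q / math.sqrt(1e-2 * 0.9)    # hypothetical choice fixing x1*x2 = 9e-3
y_max = 0.5 * math.log(0.9 / 1e-2)    # at |y| = y_max one endpoint reaches 0.9

x1, x2 = dy_lo_kinematics(y_max, Q, sqrt_s)   # (0.9, 0.01) up to rounding
```

Since x_1 x_2 = Q²/s is fixed at LO, scanning y between −y_max and +y_max sweeps both momentum fractions across the full target interval.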
In Sect. 4.4 we will demonstrate the positivity of cross-sections for all relevant processes in the entire kinematical range. We have verified that, in the absence of these constraints, the DIS structure functions and the DY cross-section will in general not satisfy positivity.

Results
In this section we present the main results of this work, namely the nNNPDF2.0 determination of nuclear PDFs. We first study the features of the nNNPDF2.0 fit by assessing the quality of its agreement with experimental data, focusing largely on the LHC weak boson production cross-sections, and by studying the behavior of nuclear modification ratios across different nuclei. Subsequently, we contrast this new nPDF determination with its predecessor, nNNPDF1.0, and trace the origin of observed differences via a series of fits with systematic changes. We then study the role that the valence and momentum sum rules play in the global nPDF determination by presenting two variants of the nNNPDF2.0 fit in which one of the two sum rules is not imposed. Finally, we demonstrate that the nNNPDF2.0 fit satisfies the positivity of physical cross-sections in the kinematic range where experimental data is available.

The nNNPDF2.0 determination
We begin by discussing the fit quality, which is assessed across the various datasets and quantified by the χ² figure of merit. A comparison is then made between the nNNPDF2.0 predictions and the LHC weak gauge boson production measurements. Following this, we take a closer look at the nNNPDF2.0 parton distributions and the corresponding ratios to the free-nucleon case. Lastly, we investigate the sensitivity of the nuclear modification factors to the atomic mass A, in particular for the sea quarks and strangeness, and compare our results with those from the EPPS16 analysis. Tables 4.1 and 4.2 collect the values of the χ² per data point for the datasets included in the corresponding fits. The numbers presented in Tables 4.1 and 4.2 contain only the contribution to the χ² associated with the experimental data, and do not include the penalties from the proton boundary condition or the positivity constraint (the latter of which vanishes for all final nNNPDF2.0 replicas anyway). Moreover, while we use the t0 prescription [43] during the optimization to avoid the D'Agostini bias, the quoted numbers correspond instead to the experimental definition of the χ² [79], in which the central experimental value is used to compute the correlated multiplicative uncertainties.
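The experimental χ² definition mentioned above can be sketched as follows, assuming one additive (statistical) uncertainty and a set of fractional multiplicative correlated uncertainties per point; all function and variable names are illustrative:

```python
import numpy as np

def chi2_experimental(data, theory, stat, mult_sys):
    """'Experimental' chi2: multiplicative correlated uncertainties are
    converted to absolute shifts using the central *data* values, which is
    the origin of the D'Agostini bias that the t0 prescription avoids
    during the fit itself.

    data, theory, stat : arrays of shape (n_dat,)
    mult_sys           : fractional correlated uncertainties, (n_dat, n_sys)
    """
    beta = mult_sys * data[:, None]           # absolute correlated shifts
    cov = np.diag(stat**2) + beta @ beta.T    # experimental covariance matrix
    residual = data - theory
    return residual @ np.linalg.solve(cov, residual)
```

In the t0 prescription the same covariance is instead built from fixed theory predictions, which is what removes the bias during minimization; the experimental definition above is then used only for the quoted χ² values.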
From the results of Tables 4.1 and 4.2, one can see that the nNNPDF2.0 determination achieves a satisfactory description of all datasets included in this analysis. A good χ² is obtained in particular for the charged-current DIS cross-sections and the LHC gauge boson production distributions. For instance, for the precise W boson rapidity distributions at √s = 8.16 TeV from CMS one finds χ²/n_dat = 0.74 for n_dat = 48 data points. The corresponding predictions using EPPS16 (which does not include this dataset) also lead to good agreement, with χ²/n_dat = 0.88. The description of most neutral-current DIS datasets is comparable to that of nNNPDF1.0. Some datasets, such as the SLAC iron structure functions, are described somewhat worse than in nNNPDF1.0, possibly due to a mild tension with the CC cross-sections. Further, we find that the resulting fit quality for the CC deep-inelastic structure functions is similar to that obtained in the corresponding proton PDF analysis [72].
Overall, the resulting χ²_tot/n_dat = 0.976 for the n_dat = 1467 data points included in the fit highlights the remarkable consistency between the experimental data on nuclear targets and the corresponding theory predictions based on the QCD factorization framework. A similar total χ² is obtained for the theoretical predictions computed using EPPS16 as input. As will be shown below, the fact that both global fits lead to comparable χ² values can be explained by the corresponding similarities at the nPDF level.
Comparison with experimental data. To facilitate the discussion of the comparison between data and theory calculations, we first introduce the conventions that we use to define the nuclear modification factors. Following the notation of Sect. 3.1, the Drell-Yan rapidity distributions in proton-nucleus collisions can be expressed as

dσ^DY_{N/A}(y)/dy = (Z/A) dσ^DY_{p/A}(y)/dy + ((A−Z)/A) dσ^DY_{n/A}(y)/dy ,   (4.1)

where the superindices N/A, p/A, and n/A indicate respectively the collision between a proton and an average nucleon, a proton, or a neutron bound within a nucleus of atomic number Z and mass number A. As in the case of the PDFs, the bound proton and neutron cross-sections are related to each other via isospin symmetry. The expression in Eq. (4.1) emphasizes the two reasons why the Drell-Yan cross-sections differ between pp and pA collisions: the modifications of the bound proton PDFs in nuclei, namely the fact that σ^DY_{p/A} ≠ σ^DY_p, and the non-isoscalarity of the target, A ≠ 2Z. We then define the nuclear modification factor in Drell-Yan proton-nucleus collisions as the ratio of Eq. (4.1) to the same combination evaluated with free-nucleon PDFs,

R^DY_A(y) = [ dσ^DY_{N/A}(y)/dy ] / [ dσ^DY_N(y)/dy ] .   (4.2)

With the above definition, one should find R^DY_A(y) ≠ 1 only in the presence of genuine nuclear corrections, that is, when σ^DY_{p/A} ≠ σ^DY_p. The definition of Eq. (4.2) differs from that of an observable frequently measured in proton-lead collisions, where the proton-nucleus cross-section is normalized to a proton-proton baseline,

R^DY_{A,exp}(y) = [ dσ^DY_{N/A}(y)/dy ] / [ dσ^DY_{pp}(y)/dy ] .   (4.3)

As explained above, in proton-nucleus collisions one will find R^DY_{A,exp}(y) ≠ 1 for non-isoscalar targets even when σ^DY_{p/A} = σ^DY_p. In Fig. 4.1 we compare the ATLAS and CMS measurements of Z-boson production in proton-lead collisions with the theoretical predictions using nNNPDF2.0. The theory calculations were computed using Eq. (4.1) with the nNNPDF2.0 A = 1 and A = 208 distributions, in the latter case also including the corresponding 90% CL uncertainty band. Note that for the theory cross-section obtained with the A = 1 PDFs, nuclear effects are absent since it corresponds to the free-proton distributions. From top to bottom, the three panels display the absolute cross-sections as a function of the dilepton rapidity y, the ratio between data and theory, and the nuclear modification factor R_A(y) defined by Eq. (4.2).
Here the ATLAS and CMS measurements of the dilepton rapidity distributions are both provided in the Z-boson center-of-mass reference frame. The CMS absolute cross-sections are lower than those of ATLAS due to the different kinematical selection cuts. We also note that for these datasets, as well as for the rest of the LHC measurements, forward rapidities correspond to the direction of the incoming lead nucleus. From the comparisons in Fig. 4.1 we can see that the theory predictions are in good agreement with the experimental data. Interestingly, the R_A(y) ratios exhibit a strong preference for nuclear modifications, especially at forward rapidities, which correspond to small values of x for the bound nucleons. As will be discussed below, this behavior can be explained at the level of the nuclear PDFs by a notable shadowing effect at small-x for the up and down quarks and antiquarks.
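The distinction between the two ratio definitions can be illustrated with toy numbers (the cross-section values below are purely hypothetical):

```python
def nucleon_average(sigma_p, sigma_n, Z, A):
    """Average-nucleon combination (Z/A)*sigma_p + ((A-Z)/A)*sigma_n, as in Eq. (4.1)."""
    return (Z / A) * sigma_p + ((A - Z) / A) * sigma_n

# With no nuclear modifications, the bound-nucleon cross-sections equal the
# free ones, so R_A of Eq. (4.2) is exactly 1, while the pp-normalized ratio
# of Eq. (4.3) still deviates from 1 for a non-isoscalar target like lead.
Z, A = 82, 208
sigma_p, sigma_n = 1.00, 0.80                              # hypothetical free p/n values
sigma_NA = nucleon_average(sigma_p, sigma_n, Z, A)
R_A = sigma_NA / nucleon_average(sigma_p, sigma_n, Z, A)   # = 1: no nuclear effects
R_A_exp = sigma_NA / sigma_p                               # != 1 from neutron excess alone
```

This makes concrete why only Eq. (4.2), and not the pp-normalized ratio, isolates genuine nuclear corrections.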
The corresponding comparisons for the CMS muon rapidity distributions in W⁻ and W⁺ production show a similar picture. It is interesting to note that for the high-statistics CMS measurement at 8.16 TeV, the A = 208 prediction describes the data remarkably better than the free-proton one, particularly at forward rapidities, where one is sensitive to the small-x region of the bound nucleons. This feature highlights how the CMS 8.16 TeV W production data provide direct evidence for the nuclear modifications of the valence and sea quark distributions.
Nuclear parton distributions. In Fig. 4.4 we display the nNNPDF2.0 set of nuclear PDFs for three different nuclei: ¹²C, ⁵⁶Fe, and ²⁰⁸Pb. From the comparisons in Fig. 4.4, we can see that the nuclear PDFs exhibit a moderate dependence on the mass number A. The resulting pattern of PDF uncertainties can partly be explained by the input data. For example, the nPDF uncertainties on strangeness are smaller in ¹²C and ⁵⁶Fe compared to ²⁰⁸Pb, due to the impact of the proton boundary condition and the NuTeV dimuon data, respectively. We also observe that the PDF uncertainties on the gluon (and correspondingly on the dynamically generated charm PDF) at medium and small-x are larger in iron than in carbon and lead. While the gluon uncertainties for carbon are largely determined by the impact of the free-proton boundary condition, those on lead nuclei can likely be attributed to the LHC measurements of W and Z production and the large amount of charged-current DIS data, which indirectly provide constraints via DGLAP evolution. Fig. 4.4 also shows that in the case of ²⁰⁸Pb there is a clear difference between the d_v and u_v distributions, a consequence of the non-isoscalar nature of the nucleus: d_v is larger due to the significant neutron excess in lead. The fact that u_v and d_v do not overlap within the 90% CL bands in a wide range of x highlights how a careful treatment of the quark and antiquark flavor separation is essential in order to describe the precise data available on lead targets, especially the weak boson production measurements in proton-lead collisions at the LHC.
To further illustrate the features of the nNNPDF2.0 determination, it is useful to study them in terms of ratios with respect to the corresponding free-nucleon baseline. In the following we will define the nuclear modification ratios of PDFs for a nucleus with mass number A as

R^A_f(x, Q²) = f^(A)(x, Q²) / f^(A=1)(x, Q²) .   (4.4)
When evaluating Eq. (4.4), it is important to account for the uncertainties associated with both the nuclear and the free-proton PDFs. In the case of a Monte Carlo set such as nNNPDF2.0, this entails evaluating R^A_f for each of the N_rep replicas and then determining the resulting median and 90% CL interval.
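The replica-based evaluation of the median and 90% CL interval just described can be sketched as follows (array names are illustrative):

```python
import numpy as np

def ratio_median_cl(nuclear_reps, proton_reps, cl=90.0):
    """Median and cl% interval of R_A = f_A / f_(A=1) over paired MC replicas.

    Both inputs have shape (N_rep, N_x); evaluating the ratio replica by
    replica propagates the correlation with the free-proton baseline.
    """
    ratios = nuclear_reps / proton_reps
    lo, hi = np.percentile(ratios, [(100 - cl) / 2, (100 + cl) / 2], axis=0)
    return np.median(ratios, axis=0), lo, hi
```

Taking the ratio before computing the statistics, rather than dividing the medians, is what keeps the nuclear-proton correlations in the band.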
In Fig. 4.5 we show the nNNPDF2.0 distributions for A = 1 that enter the denominator of Eq. (4.4). They are compared with the NNPDF3.1 proton baseline used to implement the boundary condition via Eq. (3.17) at the input parameterization scale Q_0 = 1 GeV. Overall, there is very good agreement between our A = 1 result and the proton boundary condition, particularly in the region of x where the constraint is imposed, 10⁻³ < x < 0.7. It is important to emphasize that, because the nNNPDF2.0 A = 1 set is determined not only by the boundary condition but also by the positivity constraints and the LHC cross-sections, one expects some moderate differences with the NNPDF3.1 proton baseline. The most notable differences are indeed found in the ū and d̄ PDFs at medium to large x, where the newly added DY positivity observables for the ūd and ud̄ quark combinations, as well as the LHC proton-lead data, play a significant role. Nevertheless, the level of agreement reported in Fig. 4.5 is quite remarkable and highlights how the nNNPDF2.0 determination manages to take into account the extensive information provided by the global analysis of free-proton structure.
In Fig. 4.6 we display the nuclear PDF ratios, defined by Eq. (4.4), for the same parton flavors as in Fig. 4.5. Here the ratios for ¹²C, ⁵⁶Fe, and ²⁰⁸Pb nuclei are compared at Q = 10 GeV. The shaded bands, indicating the 90% confidence level intervals, also take into account the correlations with the proton baseline. Overall, the comparison highlights how the central values and uncertainties of the nuclear ratios R^A_f change as A is varied from lighter to heavier nuclei.
For the up and down quark nPDFs in Fig. 4.6, we can see that the shadowing effects become more prominent at small-x as A increases, following the constraints provided by the LHC data, as will be shown in Sect. 4.2. In the large-x region, deviations from the R^A_f = 1 scenario (no nuclear corrections) are more prominent for the quarks and antiquarks of heavier nuclei.
Turning now to the valence quarks, one can distinguish the shadowing and anti-shadowing regions for all values of A, though nuclear effects in carbon are quite small. While one generally finds a suppression R^A_f < 1 at large x that is consistent with the EMC-effect expectation, the position of the so-called "EMC minimum" is not universal or even guaranteed at the nPDF level. For the antiquarks, the only region where a well-defined qualitative behavior is observed is the small-x shadowing region, while at large-x the nPDF uncertainties are too large to draw any solid conclusion. Finally, we observe that the nuclear modifications of the gluon nPDF are rather stable as A is varied. Comparison with EPPS16. In Figs. 4.7 and 4.8, we display the nuclear PDF modification ratios at Q = 10 GeV for iron, R^Fe_f(x, Q²), and lead, R^Pb_f(x, Q²), for nNNPDF2.0 and EPPS16, each normalized to the corresponding free-proton baseline. As in Fig. 4.6, we show the up and down quarks, the corresponding antiquarks, the total strangeness, and the gluon. The bands again correspond to the 90% CL uncertainties constructed using the appropriate prescription for each set. This means that for EPPS16, the error is computed by adding in quadrature the differences in value between the N_eg eigenvectors of the Hessian set and the best-fit result. 4 Beginning with the nPDF comparison for iron nuclei, we find that there is good agreement between the results of nNNPDF2.0 and EPPS16, both in terms of central values and of nPDF uncertainties, in the range of x for which experimental data is available. In the small- and large-x extrapolation regions, the uncertainties are larger in the nNNPDF2.0 case. We also note that the Fermi-motion-like growth of R^Fe_u and R^Fe_d at very large x, which is built into the EPPS16 parameterization, is absent in the nNNPDF2.0 results. There one instead finds a suppression compared to the free-proton case, especially for R^Fe_d.
As expected, the observed pattern of nuclear modifications is very similar between up and down quarks and between the corresponding antiquarks due to iron being nearly isoscalar.
Considering the behavior of the sea quarks, nNNPDF2.0 and EPPS16 agree well in terms of central values and uncertainties in the shadowing region, x ≲ 0.05. However, there are more significant differences at large-x, where the qualitative behavior of the two nPDF sets is the opposite: nNNPDF2.0 favors an enhancement compared to the free-proton baseline, while EPPS16 prefers instead a suppression. In any case, the differences are well within the large uncertainty bands, and additional data are needed to ascertain the correct behavior in this region. Note that at large-x the free-proton baseline antiquarks are also affected by large errors, complicating the interpretation of R^Fe_ū and R^Fe_d̄. Turning to the nuclear modification of the total strangeness, in nNNPDF2.0 we find a suppression of ∼20% compared to the proton baseline in the relevant range of x. This is consistent with studies of the interplay between the NuTeV dimuon and the ATLAS W, Z 2011 data in proton global analyses, where the latter dataset largely suppresses strangeness in contrast to the former. Such behavior is not reported by EPPS16, which exhibits much larger nPDF uncertainties, likely due to the absence of the strange-sensitive NuTeV cross-sections in their analysis. Furthermore, the ATLAS W, Z 2011 distributions are missing from the CT14 proton baseline used by EPPS16 (although these data have been accounted for in the recent CT18 release [7]).
Finally, concerning the gluon PDF, we find from this comparison that in the nNNPDF2.0 fit there is little evidence for nuclear shadowing, with R^Fe_g ≃ 1 in the region x ≤ 0.05. We also find that the nPDF uncertainties on the gluon are larger compared to EPPS16 by roughly a factor of two. At larger values of x, the uncertainties increase significantly and nNNPDF2.0 prefers a suppressed central value, unlike EPPS16. We note that neither of the two analyses includes direct constraints on the large-x nuclear gluons, hence the sizeable nPDF uncertainties, although available data on dijet and photon production could improve this situation.
In the corresponding comparison for lead nuclei, displayed in Fig. 4.8, one observes a number of similarities and differences with respect to the nPDFs of iron. Concerning the up and down quarks, we find that our nNNPDF2.0 result provides significant evidence for shadowing at small-x. For instance, at x ≃ 5 × 10⁻³ we find that R^Pb_u deviates from unity at the four-sigma level or higher. Interestingly, nPDF uncertainties in the shadowing region are up to a factor of two smaller in nNNPDF2.0 than in EPPS16. While in both cases anti-shadowing at x ≃ 0.1 is observed, the qualitative behavior at larger x differs between the two analyses, with EPPS16 finding the (built-in) EMC suppression followed by the Fermi-motion rise, while in nNNPDF2.0 the pattern of nuclear modifications is different. In any case, the agreement between the central values of nNNPDF2.0 and EPPS16 for R^Pb_u and R^Pb_d in the region x ≲ 0.3 is quite remarkable given the very different methodologies employed in each study. Concerning the nuclear modifications of the sea quarks in lead nuclei, one finds a similar qualitative behavior as in the case of iron. For x ≲ 0.1 there is good agreement between the central values of R^Pb_ū and R^Pb_d̄ from EPPS16 and nNNPDF2.0, with the latter exhibiting smaller uncertainties. The two sets instead differ more notably for x ≳ 0.1, where EPPS16 predicts an EMC-like suppression common to ū and d̄, while nNNPDF2.0 favors a large suppression for ū but an enhancement for d̄. However, the large nPDF uncertainties in this region prevent any definitive conclusions, and the two fits are fully consistent within the 90% CL bands. One possible source for the differences could be the respective free-proton counterparts, where large-x antiquarks are poorly known.
For the total strangeness, the behavior of R^Pb_{s+} is similar to that of iron: nNNPDF2.0 predicts a suppression more or less independent of x, with rather larger uncertainties for EPPS16 compared to our nNNPDF2.0 result due to the missing constraints from the NuTeV dimuon cross-sections.
Finally, regarding the nuclear modifications of the lead gluon PDF illustrated in the bottom right panel of Fig. 4.8, we again find that R^Pb_g agrees with unity across all relevant x. Here the initial-scale differences are partially washed out by DGLAP evolution, but clearly the shadowing in the nuclear gluons is less apparent than for the quarks. Although the nPDF uncertainties increase at large x due to the lack of direct constraints, the qualitative behavior of R^Pb_g differs between the two PDF determinations.
Nuclear strangeness. The strangeness content of the proton in unpolarized PDF fits has attracted a lot of attention recently. Traditionally, the determination of s(x, Q²) in global proton PDF fits has been dominated by the constraints provided by charm production in neutrino DIS [80][81][82][83][84]. These measurements suggest that the strange sea is suppressed compared to its up and down quark counterparts, favoring values of around r_s ≃ 0.5 when expressed in terms of the strangeness ratio defined as

r_s(x, Q²) = [ s(x, Q²) + s̄(x, Q²) ] / [ ū(x, Q²) + d̄(x, Q²) ] .   (4.5)

Other strange-sensitive processes agree qualitatively with the constraints on r_s provided by the neutrino DIS data, such as W production in association with charm quarks [85] from CMS [86][87][88] and ATLAS at 7 TeV [89], and semi-inclusive deep-inelastic scattering (SIDIS) [90][91][92]. However, the ATLAS measurements of the leptonic rapidity distributions in inclusive W and Z production at 7 TeV [48,93] exhibit instead a strong preference for a symmetric strange sea with r_s ≃ 1. One should point out that general considerations based on perturbative DGLAP evolution imply that r_s → 1 at large Q and small-x, but at low Q and medium/large-x the value of r_s is dictated by non-perturbative dynamics.
As was motivated in Ref. [10], it is important to carefully assess the uncertainties associated with the nuclear strangeness, given that these will potentially affect the determination of the proton strangeness in global fits based on neutrino data. We display in Fig. 4.9 the strangeness ratio r_s(x, Q²) defined by Eq. (4.5), obtained with our nNNPDF2.0 result for the proton (A = 1), ⁵⁶Fe, and ²⁰⁸Pb, at both the input parameterization scale Q_0 = 1 GeV and at a higher scale of Q = 10 GeV.
From the comparison in Fig. 4.9 we find that at the input parameterization scale r_s is particularly suppressed in the case of lead, where the central value of nNNPDF2.0 satisfies r_s < 0.5 for 5 · 10⁻³ ≲ x ≲ 0.2. A similar preference for a suppressed strange sea, albeit less pronounced, can be seen in iron nuclei. In any case, the nPDF uncertainties affecting this ratio are rather large, in particular for the heavier nuclei. The fact that for x ≲ 10⁻³ one obtains r_s ≃ 1 for all three nuclei is a consequence of the parameterization preprocessing, whose ranges are chosen to ensure that in the small-x extrapolation region all quark and antiquark flavors behave in the same way (see Figs. 3.1 and 3.2). Once DGLAP evolution takes place, r_s tends to become closer to unity across a wider range in x, but even at the higher scale a suppressed strangeness is preferred for x ≳ 0.01 for both iron and lead. The results in Fig. 4.9 suggest that including neutrino CC structure functions such as those from CHORUS and NuTeV in proton PDF fits without accounting for nuclear uncertainties might not be a justified approximation, given the precision that modern fits achieve. It will nonetheless be interesting to determine the impact on the global NNPDF proton PDF fits when nNNPDF2.0 is used to account for nuclear uncertainties via the procedure outlined in Ref. [10].
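In practice one would evaluate Eq. (4.5) from a released grid, e.g. through the LHAPDF Python interface (`lhapdf.mkPDF` and `xfxQ`); the self-contained sketch below just encodes the ratio itself, with purely illustrative toy values:

```python
def strangeness_ratio(s, sbar, ubar, dbar):
    """r_s = (s + sbar) / (ubar + dbar) of Eq. (4.5), evaluated pointwise in x."""
    return (s + sbar) / (ubar + dbar)

# Hypothetical values at a medium-x point: a suppressed nuclear strange sea
# versus a symmetric one.
r_suppressed = strangeness_ratio(0.10, 0.10, 0.25, 0.25)   # suppressed, r_s < 0.5
r_symmetric = strangeness_ratio(0.25, 0.25, 0.25, 0.25)    # symmetric, r_s = 1
```

For a Monte Carlo set, this ratio would be evaluated replica by replica and summarized by its median and 90% CL interval, as done for the PDF ratios above.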

Comparison with nNNPDF1.0
We now turn to study the differences between the nNNPDF1.0 and nNNPDF2.0 determinations by tracing back the impact of the various improvements in the latter with a series of comparisons. The goal of this exercise is to assess which of these differences can be identified with specific methodological improvements, such as the cross-section positivity constraint, and which ones are related to the impact of the new experimental information, either the DIS charged current structure functions or the LHC gauge boson production measurements.
The starting point for this study is a fit denoted by nNNPDF1.0r, which has been obtained with the code used to produce nNNPDF2.0 but with the same theory settings, methodology, and input dataset as in the nNNPDF1.0 analysis. The only differences at this level are related to optimizations and improvements implemented in the code to speed up its performance. We have verified that nNNPDF1.0 and nNNPDF1.0r are statistically indistinguishable, and thus we can safely adopt the latter as the baseline for the comparisons that follow.
We have then produced two variants of this nNNPDF1.0r baseline, each adding one extra feature. The first is a fit where the proton boundary condition has been updated to the no-nuclear NNPDF3.1 fit shown in Fig. 3.3. The second is a fit where, in addition to the updated boundary condition, the positivity of cross-sections has been imposed following the procedure described in Sect. 3.3. We display in Fig. 4.10 the comparison between nNNPDF1.0r and these two fit variants. Since the isoscalar neutral-current DIS structure functions used in nNNPDF1.0 are primarily sensitive to the specific quark combination $\Sigma + T_8/4$, we plot this combination together with the gluon distribution as a function of $x$ at $Q^2 = 10~{\rm GeV}^2$ for carbon, iron, and lead.
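This sensitivity can be made explicit at leading order. As a sketch in the three-flavor case (neglecting heavy quarks and higher-order corrections), write the isoscalar structure function per nucleon in the evolution basis, with $q^+ \equiv q + \bar q$, $\Sigma = u^+ + d^+ + s^+$ and $T_8 = u^+ + d^+ - 2s^+$:

$$
\frac{F_2^N}{x} = \frac{5}{18}\left(u^+ + d^+\right) + \frac{1}{9}\,s^+
= \frac{5}{18}\,\frac{2\Sigma + T_8}{3} + \frac{1}{9}\,\frac{\Sigma - T_8}{3}
= \frac{2}{9}\left(\Sigma + \frac{T_8}{4}\right),
$$

so that neutral-current isoscalar data constrain precisely the combination plotted in Fig. 4.10.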
First, one can see from Fig. 4.10 that the impact of the new proton boundary condition in the nuclear fit is generally moderate as far as the size of the uncertainty band is concerned. There are some differences at the central value level for the small-$x$ quarks and for the nuclear gluon PDF of lead, but in both cases the shifts are much smaller than the associated uncertainties. This does not imply that using the updated proton boundary condition is irrelevant for nNNPDF2.0, but rather that this choice is not particularly impactful for the specific PDF combinations that can be constrained by the nNNPDF1.0 dataset. As shown in Fig. 3.3, the differences between the two variants of the proton boundary condition are more pronounced for the total strangeness than for the other quark flavors.
On the other hand, imposing the positivity of the cross-sections leads to more significant differences. This is not unexpected: it is well known that a model-independent (n)PDF analysis will in general lead to some cross-sections becoming negative unless their positivity is explicitly imposed. In our case, the quarks are largely unaffected, but there are clear changes for the nuclear gluons in iron and lead, especially the latter. Here we see that imposing the positivity of cross-sections leads to a significant reduction of the nPDF uncertainty band, which in the case of lead can be up to a factor of two.
To illustrate the impact of the cross-section positivity constraint, we display in Fig. 4.11 the longitudinal structure function $F_L(x, Q^2)$ at the positivity scale $Q^2 = 5~{\rm GeV}^2$. We compare the predictions of the nNNPDF1.0r fits including the updated proton boundary condition with and without the cross-section positivity constraint imposed in both the extrapolation and the data regions. In the left panel, we also display the predictions from the individual replicas of the nNNPDF1.0r(newBC) fit that do not satisfy the cross-section positivity constraints. Indeed, one can observe that many $F_L$ replicas become negative in some region of $x$ unless this constraint is explicitly imposed, and that removing them leads to a significant reduction of the nPDF uncertainties, particularly in the small-$x$ region. Interestingly, at medium-$x$ it is largely the upper (rather than the lower) 90% CL limit which is reduced by the positivity constraint: the replicas with very negative $F_L$ at small-$x$ were actually higher than the median value at medium-$x$ in order to satisfy the momentum sum rule.
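The replica-filtering effect described above can be sketched with a toy Monte Carlo example (synthetic replicas with an arbitrary small-$x$ shape, not the actual fit output): discarding the replicas where $F_L$ goes negative lifts the lower edge of the 90% CL band.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Monte Carlo replicas of F_L(x): each replica is a positive shape f(x)
# scaled by a random normalisation, so some replicas dip below zero
# (illustrative only; not the actual nNNPDF parameterization).
x = np.logspace(-5, -1, 50)
shape = x**-0.2                        # smooth, positive small-x shape
norms = rng.normal(0.02, 0.05, 200)    # replica normalisations, some negative
replicas = norms[:, None] * shape[None, :]

def cl90(sample):
    """90% CL band (lower, upper) from the Monte Carlo replica distribution."""
    return np.percentile(sample, [5.0, 95.0], axis=0)

band_all = cl90(replicas)

# Impose positivity: discard replicas where F_L is negative anywhere.
positive = replicas[(replicas >= 0).all(axis=1)]
band_pos = cl90(positive)

# The lower edge of the band moves up once negative replicas are removed.
assert (band_pos[0] >= band_all[0]).all()
assert (band_pos[0] >= 0).all()
```

In the actual fit, of course, the replicas are correlated functions of $x$ rather than rescaled copies of a fixed shape, but the mechanism by which the positivity cut narrows the uncertainty band is the same.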
Concerning the impact of the new datasets, a direct comparison of the nNNPDF1.0r-like fits with those including CC DIS and LHC data is not possible since, as discussed in Sect. 3.2, the input parameterization basis and the flavor assumptions are different. However, we can still assess the relative contribution of the CC structure functions and the LHC gauge boson cross-sections in determining the nNNPDF2.0 results. In Fig. 4.12 we display the nuclear modification ratios for the nPDFs in lead, as was shown in Fig. 4.8, but now comparing the nNNPDF2.0 baseline results with those of a fit restricted to DIS structure functions, including charged-current scattering, and using identical theoretical and methodological settings. One of the most remarkable features of this comparison is the sizeable impact that the LHC measurements have in reducing the uncertainties of the nuclear PDFs. This effect is particularly significant for the gluon and for all quark flavors at $x \lesssim 0.1$. On the one hand, the LHC data clearly reveal the presence of nuclear shadowing at small-$x$ for both the valence and sea quarks, something which is not accessible in a DIS-only fit. This result is consistent with the nuclear modification ratios for the LHC Drell-Yan distributions reported e.g. in Fig. 4.3. On the other hand, the impact of the LHC data on the central values and uncertainties of nNNPDF2.0 at $x \gtrsim 0.1$ is less prominent, although in that region one also observes a reduction of the uncertainties. The fact that the large-$x$ nuclear modifications of $R_{\bar u}$ and $R_{\bar d}$ in lead are already present at the level of the DIS-only fit implies that this trend is favored by the CHORUS and NuTeV charged-current structure functions.

The momentum and valence integrals in nuclei
As was discussed in Sect. 3.2, we impose three sum rules in the nNNPDF2.0 determination, namely the momentum sum rule, Eq. (3.7), and the two valence sum rules, Eqs. (3.12) and (3.14). These constraints are satisfied by adjusting the overall prefactors $B_f$ in Eq. (3.6) for the gluon $g$, the total valence $V$, and the valence triplet $V_3$ distributions, respectively. Furthermore, they are imposed independently for each value of $A$ for which experimental data are available.
Here we investigate the role played by these sum rules in the global nPDF determination. In particular, we address whether the physical requirements of energy and valence quark number conservation are satisfied (within uncertainties) by the phenomenological fit to experimental data when the sum rules are not explicitly imposed. Recently, theoretical arguments have been put forward suggesting that the momentum sum rule for nucleons in nuclei might not hold [94]. Motivated by this, we have carried out a study similar to that presented in Ref. [46], where global proton PDF fits were performed without imposing the momentum sum rule. In the proton case, while the LO prediction for the momentum integral was found to be far from the QCD expectation, both the NLO and NNLO fits exhibited remarkable agreement at the 1% level [46].
We have therefore produced two variants of the nNNPDF2.0 analysis, each based on $N_{\rm rep} = 250$ replicas, where either the momentum sum rule or the total valence sum rule is not imposed. Afterwards, we evaluate in each case the corresponding momentum and total valence integrals, defined as

$$ I_M(A) \equiv \int_0^1 dx\, x\left[\,\Sigma(x,Q,A) + g(x,Q,A)\,\right], \tag{4.6} $$
$$ I_V(A) \equiv \int_0^1 dx\, V(x,Q,A), \tag{4.7} $$

and assess whether or not they are in agreement with the QCD expectations, namely $I_M(A) = 1$ and $I_V(A) = 3$ respectively. One should note that the momentum and valence sum rules are already satisfied at the level of the proton boundary condition, and thus some constraints are expected to be propagated to the lighter nuclei. However, the analysis of Ref. [46] demonstrates that the results would be largely unchanged had the momentum and valence sum rules also been excluded from the free-proton baseline. In Fig. 4.13 we display the distributions of the momentum and valence integrals, Eqs. (4.6) and (4.7). The corresponding 90% confidence level intervals for each of these two integrals for the relevant values of $A$, as well as for the free-proton baseline $A = 1$, can be found in Table 4.3.
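In the Monte Carlo approach, these integrals and their confidence intervals are evaluated replica by replica. A schematic illustration, using a toy momentum density and synthetic replica normalisations rather than the actual nNNPDF2.0 replicas:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy momentum density x*(Sigma+g)(x): an illustrative shape normalised so
# that its integral over x in [0,1] equals one (not the actual fitted nPDFs).
x = np.linspace(1e-5, 1.0, 2000)
density = 12.0 * x * (1.0 - x) ** 2

# 250 synthetic replicas: the density rescaled by a random normalisation.
norms = rng.normal(1.0, 0.03, 250)
replicas = norms[:, None] * density[None, :]

# Momentum integral I_M per replica, then the 90% CL interval over replicas.
integrals = np.trapz(replicas, x, axis=1)
lo, hi = np.percentile(integrals, [5.0, 95.0])

# Consistency with the QCD expectation I_M = 1 within the 90% CL interval.
assert lo < 1.0 < hi
```

The valence integral $I_V$ is treated in exactly the same way, with the expectation $I_V = 3$.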
From the results presented in Fig. 4.13 and Table 4.3 one finds that the momentum integral agrees with the QCD expectation, $I_M(A) = 1$, within uncertainties for all nuclei. In the case of $^{12}$C, for example, one finds $0.97 \lesssim I_M \lesssim 1.10$ at the 90% confidence level, with somewhat larger uncertainties for the heavier nuclei. Even for lead, where the proton boundary condition has little effect, the median of the distribution is reasonably close to the QCD expectation. The uncertainties on $I_M$ are larger in the nuclear PDF analysis than the 1% found in the proton case [46], as expected, since the experimental data for nuclear collisions are far less abundant and are further distributed among different nuclei. Nevertheless, the overall consistency with the QCD expectations is quite compelling. Note also that here the proton boundary condition is imposed only for $x \geq 10^{-3}$, and therefore our prediction for $I_M(A=1)$ is expected to be less accurate than in the proton global analysis. The result that the momentum integral agrees with the theoretical expectation for all nuclei is a non-trivial validation of the global nuclear PDF analysis framework based on the QCD factorization hypothesis. It further demonstrates the robustness of our fitting methodology, in that the resulting nPDFs are reasonably stable regardless of whether or not the momentum sum rule is imposed during the fit. To better illustrate this latter point, in Fig. 4.14 we compare the baseline nNNPDF2.0 fit at $Q_0 = 1$ GeV with the variant in which the momentum sum rule is not imposed, showing the total quark singlet and the gluon for both $^{56}$Fe and $^{208}$Pb. Recall that the momentum sum rule is used to fix the overall gluon normalization in Eq. (3.6).
In the case of lead, where the experimental constraints are relatively abundant, we find that both the singlet and the gluon are reasonably similar irrespective of whether or not the momentum sum rule is imposed. The momentum sum rule plays a larger role in iron, especially in reducing the gluon nPDF uncertainties, but interestingly the central values of all distributions are quite stable when comparing the two fits. This stability is consistent with the results reported in Fig. 4.13.
The main conclusions are qualitatively similar for a fit in which the total valence sum rule is not imposed. Results of this fit are displayed in the right panel of Fig. 4.13, where the normalized frequency of the total valence integral is shown for the same three nuclei discussed previously. The corresponding 90% CL intervals are also reported in Table 4.3. Similarly to the momentum sum rule results, we find that the valence integral agrees with the QCD expectation within uncertainties. The preferred (median) value turns out to be $I_V \simeq 2.8$ irrespective of $A$. This implies that even when Eq. (4.7) is not imposed explicitly, the experimental measurements favor a value close to the QCD prediction for all values of $A$ relevant for the present study. We have also verified that, as in Fig. 4.14, the resulting nPDFs are reasonably stable regardless of whether or not the valence sum rule is imposed.
Putting together the results of these two exercises, one can conclude that the fit results are relatively stable in the nNNPDF framework even in the absence of the sum rules, consistent with the fact that experimental data and the QCD expectations based on the factorization theorem are in agreement with each other for hard-scattering collisions involving heavy nuclei.

The positivity of physical cross-sections
As was discussed in Sect. 3.3, we impose the requirement that the cross-sections of arbitrary physical processes are positive-definite quantities. This constraint is implemented by means of an additive penalty term in the figure of merit, Eq. (3.18). The penalty is constructed from the pseudo-data summarized in Table 3.1, which correspond to lepton-nucleus scattering structure functions and Drell-Yan cross-sections in proton-nucleus collisions. Recall that the kinematics of the positivity pseudo-data were chosen to cover those of the actual data used in the fit, see Fig. 2.1.
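Schematically, an additive positivity penalty of this kind can be sketched as follows; the linear ReLU-like form and the multiplier value below are illustrative assumptions, not the exact expression of Eq. (3.18):

```python
import numpy as np

def positivity_penalty(observables, lam=1000.0):
    """Additive penalty for negative positivity pseudo-data points.

    observables: predicted cross-sections/structure functions on the
    positivity grid; lam: Lagrange multiplier (illustrative value).
    Only negative predictions contribute, via a ReLU-like term.
    """
    obs = np.asarray(observables, dtype=float)
    return lam * np.sum(np.maximum(0.0, -obs))

def total_loss(chi2, positivity_observables, lam=1000.0):
    # Figure of merit: data chi2 plus the positivity penalty.
    return chi2 + positivity_penalty(positivity_observables, lam)

# Positive predictions leave the chi2 untouched...
assert total_loss(10.0, [0.5, 0.1, 2.0]) == 10.0
# ...while negative ones are penalised in proportion to the violation.
assert total_loss(10.0, [0.5, -0.2, 2.0]) == 10.0 + 1000.0 * 0.2
```

During the fit, such a penalty steers the minimizer away from parameter regions where the positivity pseudo-observables turn negative, without otherwise affecting the data description.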
Here we demonstrate that the nNNPDF2.0 determination indeed satisfies these positivity constraints. In Fig. 4.15 we display a representative selection of the positivity observables imposed in nNNPDF2.0. In particular, we show the DIS structure functions $F_2^s(x,Q^2)$ and $F_L(x,Q^2)$, as well as the Drell-Yan rapidity distributions $\sigma^{\rm DY}_{u\bar u}(y)$ and $\sigma^{\rm DY}_{u\bar d}(y)$, where the bands indicate the 90% confidence level uncertainty interval. We use a scale of $Q^2 = 5~{\rm GeV}^2$, the same scale at which Eq. (3.18) is imposed. Furthermore, we provide the positivity predictions for both iron and lead nuclei. Note that since the Drell-Yan cross-sections are not normalized by the value of $A$, the absolute magnitudes for the two nuclei differ. Of course, the overall normalization is not relevant for the implementation of the positivity constraint.
The selection of positivity observables in Fig. 4.15 is representative in that it contains one of the quark structure functions ($F_2^s$) constraining a $q^+$ combination, $F_L$, which is sensitive to the gluon positivity, and a diagonal and an off-diagonal DY cross-section, which are relevant for different aspects of quark flavor separation (see also the LO expressions in App. A).
Here the Drell-Yan cross-sections are shown as a function of $x_2$, the momentum fraction of the nuclear projectile obtained using the LO kinematics of Eq. (A.15). While the positivity constraint was implemented only for $x_2 \gtrsim 10^{-2}$ with a per-nucleon center-of-mass energy of $\sqrt{s} = 23.5$ GeV, here we illustrate the positivity for a choice of kinematics that allows a reach down to $x_2 \simeq 10^{-3}$, with a per-nucleon center-of-mass energy of $\sqrt{s} = 74.5$ GeV. As can be seen from Fig. 4.15, the nNNPDF2.0 determination satisfies the positivity of physical cross-sections in the entire kinematic range. The nPDF uncertainty bands become larger near the kinematic endpoints ($x = 1$ for DIS and $x_2 \simeq 10^{-3}$ for Drell-Yan), since these correspond to regions of phase space where experimental constraints are scarce. Recall that, by virtue of the properties of DGLAP evolution, these results ensure that cross-sections involving higher momentum transfers, $Q^2 > 5~{\rm GeV}^2$, will also be positive, provided one maintains the initial coverage in $x_2$. Therefore, we conclude that while we have not explicitly imposed positivity at the level of the nuclear PDFs, physical observables constructed from nNNPDF2.0 are guaranteed to satisfy the positivity requirement.
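Assuming the standard LO Drell-Yan kinematics, which we take Eq. (A.15) to express, $x_{1,2} = (M_{\ell\ell}/\sqrt{s})\,e^{\pm y}$, the quoted small-$x_2$ reach can be checked numerically. The dilepton mass (taken here at the positivity scale, $M_{\ell\ell}^2 = 5~{\rm GeV}^2$) and the rapidity value below are illustrative choices, not taken from Table 3.1:

```python
import math

def dy_momentum_fractions(m_ll, sqrt_s, y):
    """LO Drell-Yan momentum fractions x1 (beam) and x2 (nuclear projectile)
    for a dilepton pair of mass m_ll (GeV) produced at rapidity y, at
    per-nucleon centre-of-mass energy sqrt_s (GeV)."""
    tau = m_ll / sqrt_s
    return tau * math.exp(y), tau * math.exp(-y)

# Illustrative: M_ll = sqrt(5) GeV at forward rapidity y = 3.45 with
# sqrt(s) = 74.5 GeV reaches x2 below 1e-3 while keeping x1 physical.
x1, x2 = dy_momentum_fractions(math.sqrt(5.0), 74.5, 3.45)
assert x2 < 1e-3
assert 0.0 < x1 < 1.0
```

At the lower energy $\sqrt{s} = 23.5$ GeV the same mass and rapidity would give $x_1 > 1$, which illustrates why the extended $x_2$ coverage requires the larger center-of-mass energy.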

Implications for photon and hadron production in nuclear collisions
In this section we discuss some phenomenological applications of the nNNPDF2.0 determination. Theoretical predictions for isolated photon production in proton-lead collisions at the LHC are first compared with recent measurements from the ATLAS collaboration taken at $\sqrt{s} = 8.16$ TeV. We then revisit the potential of the FoCal upgrade to the ALICE detector in constraining the small-$x$ gluon nuclear PDF using measurements of direct photon production in the forward region. Finally, we provide predictions based on nNNPDF2.0 for inclusive hadron production in proton-nucleus collisions, a process that can constrain both the quark and gluon nuclear PDFs as well as the corresponding fragmentation functions in vacuum and in medium.
Several additional applications of our nNNPDF2.0 result are expected to be of phenomenological interest. In particular, our nPDFs could be used to study the constraining power of inclusive and heavy quark structure function measurements at the recently approved Electron Ion Collider (EIC) [95] and the proposed Large Hadron electron Collider (LHeC) [96]. Initial studies based on nNNPDF1.0 and EIC neutral-current structure function pseudo-data were presented in Ref. [13]. However, updated projections for EIC pseudo-data are now being finalized based on more realistic accelerator and detector settings. We will therefore defer an update to our nNNPDF1.0 study of EIC pseudo-data to an upcoming Conceptual Design Report where the more accurate EIC specifications will be presented together with impact studies from various nuclear PDF analysis groups.

Isolated photon production in pA collisions with ATLAS
Production of isolated photons in proton-proton collisions is primarily sensitive to the gluon content of the proton via the QCD Compton scattering process [97,98]. However, several complications associated with the measurement of photon production hinder a clean interpretation in terms of hard-scattering cross-sections, such as the need to subtract the fragmentation component and to remove photons coming from pion decays. Although early PDF fits used photon production in fixed-target scattering to constrain the gluon, these data were eventually discarded in favor of the cleaner and more abundant data on jet production at the Tevatron [3]. However, photon production measurements from ATLAS and CMS were later revisited using NLO QCD theory in Ref. [99] and again at NNLO in Ref. [100]. These studies demonstrated the consistency of collider-based isolated photon production measurements with QCD predictions and with the rest of the datasets in the global analysis. Moreover, they help to reduce the uncertainties on the gluon PDF at $x \simeq 10^{-2}$, a region that is particularly relevant for theoretical predictions of Higgs production in gluon fusion.
Photon production is also a highly relevant process in the context of heavy ion collisions. Being a color-neutral probe, the photon traverses the quark-gluon plasma without modification and thus represents a robust baseline for studying the properties of the hot and dense medium [101]. In order to disentangle hot from cold nuclear matter effects, photon production has been measured in proton-lead collisions at the LHC, providing a new channel to constrain the nuclear modifications of the gluon PDF. Here we focus on the recent ATLAS measurements from Run II at $\sqrt{s} = 8.16$ TeV based on an integrated luminosity of $\mathcal{L} = 165~{\rm nb}^{-1}$ [102]. This analysis provides the inclusive production rates of isolated prompt photons in three different rapidity regions as a function of the photon transverse energy $E_T^\gamma$, in the range from 20 GeV to 550 GeV. To compute the corresponding theoretical predictions, we use NLO QCD theory with the same settings as in Ref. [100], adopting a modified version of MCFM v6.8 interfaced to APPLgrid. The settings of the calculation have been adjusted to match the experimental isolation conditions and thus bypass the need to explicitly account for the fragmentation component. We have benchmarked the results of this MCFM-based calculation against the theory predictions presented in Ref. [102] based on the JETPHOX program [103], finding reasonable agreement but also some differences at large rapidities.
In Fig. 5.1 we compare the ATLAS measurements of the photon transverse energy ($E_T^\gamma$) distributions in three rapidity bins with the corresponding NLO QCD calculations based on MCFM with nNNPDF2.0 and EPPS16 as input. For each rapidity bin, the upper panels display the absolute distributions and the lower panels the ratio between the central value of the experimental data and the theory calculations. The error bands on the experimental measurements indicate the sum in quadrature of the statistical and systematic uncertainties, while for the theory calculations the error bands correspond to the 90% CL ranges.
From this comparison one finds that the theory calculations appear to undershoot the experimental data by roughly 25% in all three rapidity bins for most of the $E_T^\gamma$ range, both for nNNPDF2.0 and EPPS16. This discrepancy cannot be accommodated within the stated experimental and theoretical uncertainties. A similar qualitative behavior was reported in the original ATLAS publication based on the JETPHOX predictions. On the other hand, the NNPDF3.1 proton PDFs are known to describe well the corresponding isolated photon measurements from proton collisions at both $\sqrt{s} = 8$ and 13 TeV [100].
As expected, the disagreement between the experimental data and the theory calculations found in Fig. 5.1 translates into poor $\chi^2$ values. Using nNNPDF2.0, one obtains $\chi^2/n_{\rm dat} = 9.1$, 10.5, and 8.5 in the forward, central, and backward rapidity bins, respectively. Similar numbers are obtained for the theory predictions based on EPPS16. The situation does not improve much if the ATLAS photon data are added to the nNNPDF2.0 global analysis. In that case the agreement between the theory calculations and the experimental data improves somewhat, with $\chi^2/n_{\rm dat} = 6.1$, 7.5, and 5.7 for the three rapidity bins, but it remains far from satisfactory.
Until the origin of this disagreement between theory and data is better understood, it will not be possible to include the ATLAS prompt photon production measurements in a global nPDF fit. Alternatively, one could consider fitting related observables presented in the same ATLAS publication. The first of these is the nuclear modification ratio $R_{p{\rm Pb}}$, where the absolute $E_T^\gamma$ distributions in pPb collisions are normalized to their pp counterparts, the latter derived from a simulation-based extrapolation of data taken in proton-proton collisions at $\sqrt{s} = 8$ TeV. The second is the ratio between different rapidity bins, such as forward over backward, as a function of $E_T^\gamma$. The advantage of such ratios is that many experimental systematic uncertainties partially cancel, facilitating the comparison with theoretical predictions. On the other hand, these observables might also exhibit a reduced nPDF sensitivity, in particular the ratio between different rapidity bins. Future studies should shed more light on the usefulness of prompt photon measurements to constrain nuclear PDFs within a global analysis.

Isolated photon production in pA collisions with FoCal
Current measurements of direct photon production at the LHC, such as those discussed above from the ATLAS collaboration [102] as well as related measurements from CMS and ALICE [104], are restricted to the central rapidity region, since this is the only region instrumented with electromagnetic calorimeters suitable for identifying photons. A measurement of isolated photon production in the forward region, however, would also be highly interesting for nPDF studies. Not only would such measurements provide direct access to the poorly-known gluon nuclear modifications at small-$x$, they would also allow testing for the possible onset of non-linear QCD dynamics [105].
With this motivation, a new forward calorimeter extension of the ALICE detector, dubbed FoCal [106,107], has been proposed. Both the acceptance and instrumentation of this detector have been optimized to provide access to the nuclear PDFs at low scales and small momentum fractions via the measurement of isolated photon production at low transverse momenta and forward rapidities in proton-ion collisions. The FoCal is proposed for installation during the Long Shutdown 3 (2025-2026) phase of the LHC.
The impact of future FoCal measurements on the small-$x$ nuclear PDFs was first studied in Ref. [108]. In that analysis, pseudo-data based on the expected kinematical reach and experimental uncertainties of FoCal were generated and used to constrain the nNNPDF1.0 determination by means of the Bayesian reweighting method [109,110]. It was found that the FoCal measurement would constrain the nuclear gluon modifications down to $x \simeq 10^{-5}$, leading to an uncertainty reduction of up to an order of magnitude as compared to the baseline fit. These results indicated a comparable or superior constraining power on the small-$x$ nPDFs when compared to related projections from future facilities, such as the Electron Ion Collider [111].
Motivated by the new and improved projections for the FoCal pseudo-data that have recently become available, we revisit their impact on the nuclear PDFs using the present nPDF determination. In this case, the nNNPDF2.0 PDFs represent a more realistic baseline, since they provide a robust quark flavor separation with a better handle on the gluon. Moreover, the positivity of physical cross-sections is guaranteed, a constraint that helps to reduce the small-$x$ nuclear PDF uncertainties.
For this study we have adopted the same settings as in Ref. [108] and computed NLO QCD predictions with a modified version of INCNLO that benefits from improved numerical stability at forward rapidities [112]. Theoretical predictions for the FoCal cross-sections have been computed with $N_{\rm rep} = 400$ replicas of nNNPDF2.0, which are subsequently used to account for the impact of the FoCal pseudo-data by means of Bayesian reweighting.

Fig. 5.2 displays the nuclear modification factor $R_{p{\rm Pb}}(p_T^\gamma)$ for direct photon production in pPb collisions at $\sqrt{s} = 8.8$ TeV at a rapidity of $\eta^\gamma = 4.5$, as a function of the photon transverse momentum $p_T^\gamma$. The theoretical predictions based on NLO QCD are compared with the FoCal pseudo-data for two sets of input nPDFs: the original nNNPDF2.0 set, and the variant reweighted with the FoCal projections. Here the central value of the FoCal pseudo-data has been chosen to coincide with the nNNPDF2.0 prediction. In the right panel of Fig. 5.2 we show the gluon nuclear modification factor $R_g(x,Q)$ at $Q^2 = 10~{\rm GeV}^2$ for both the original and the reweighted nNNPDF2.0 fits. In all cases, the nPDF uncertainty bands correspond to the 90% confidence level intervals.
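The Bayesian reweighting of Refs. [109,110] assigns each of the $N_{\rm rep}$ replicas a weight determined by its $\chi^2$ to the new pseudo-data. A compact sketch of the weight and effective-replica-number formulas, applied here to toy $\chi^2$ values rather than the actual FoCal ones:

```python
import numpy as np

def gk_weights(chi2, ndat):
    """Reweighting weights w_k ~ chi2_k^((n-1)/2) * exp(-chi2_k/2),
    normalised so that sum(w_k) = N_rep (chi2 here is the total, not
    per data point)."""
    chi2 = np.asarray(chi2, dtype=float)
    logw = 0.5 * (ndat - 1) * np.log(chi2) - 0.5 * chi2
    logw -= logw.max()                  # stabilise the exponentials
    w = np.exp(logw)
    return len(chi2) * w / w.sum()

def n_eff(weights):
    """Shannon-entropy estimate of the effective number of replicas."""
    w = np.asarray(weights, dtype=float)
    n = len(w)
    w = w[w > 0]
    return np.exp(np.sum(w * np.log(n / w)) / n)

# Toy example: 400 replicas with chi2 drawn around the number of data points.
rng = np.random.default_rng(1)
chi2 = rng.chisquare(df=20, size=400)
w = gk_weights(chi2, ndat=20)
assert abs(w.sum() - 400.0) < 1e-8
assert 0 < n_eff(w) <= 400.0
```

The effective number of replicas quantifies how much information the new data add: the further $N_{\rm eff}$ falls below $N_{\rm rep}$, the stronger the constraint imposed by the pseudo-data.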
From the results of Fig. 5.2, one finds that the FoCal measurements would still reduce the uncertainties of the nuclear gluon modifications at small-$x$, especially in the upper limit of the uncertainty band. The effective number of replicas in this case is $N_{\rm eff} = 345$. Note that nNNPDF2.0 exhibits a preference for $R_{p{\rm Pb}} \simeq 1$, and thus shadowing is not favored in the gluon sector, consistent with the results reported in Fig. 4.6. On the other hand, nNNPDF2.0 does not contain any dataset with particular sensitivity to the nuclear gluon modifications, implying that these projections for the impact of FoCal in the global nPDF analysis could be somewhat over-optimistic (see also the discussion in Sect. 6).
Crucially, however, we have assumed in this exercise that the central value of the FoCal measurement coincides with the baseline prediction. In Fig. 5.3 we instead display the results of the reweighting for a scenario in which the FoCal pseudo-data have a central value of $R_{p{\rm Pb}} \simeq 0.6$. In this case, the effective number of replicas is much smaller, $N_{\rm eff} = 117$, indicating that the FoCal data add a significant amount of new information to the global fit. Here the resulting value for the gluon nuclear modification ratio at small-$x$ would be $R_g \simeq 0.7$. This analysis therefore indicates that FoCal measurements could be sensitive either to gluon shadowing effects or to possible non-linear QCD dynamics. To disentangle one from the other, a dedicated analysis of the $\chi^2$ and nPDF behavior in the small-$x$ region would be required, following the approach developed in Ref. [113].

Inclusive hadron production in pA collisions
The inclusive production of pions and kaons in hadronic collisions provides information not only on the initial state (the parton distribution functions) but also on the final-state hadronization of partons into hadrons. The latter is described by the fragmentation functions (FFs), which are extracted from experimental data by means of a global analysis akin to that of the PDFs [92,114-119]. Likewise, in proton-nucleus collisions the production of identified hadrons can provide information on the initial-state nuclear PDFs as well as on parton-to-hadron hadronization in the presence of cold nuclear matter effects.

In Fig. 5.4 we display the nuclear modification ratio $R^{\pi^0}_{p{\rm Pb}}$ for the production of neutral pions in proton-lead collisions as a function of the pion transverse momentum $p_T$. The theoretical calculations are based on NLO QCD and use the DSS14 hadron fragmentation functions [114] for both the nNNPDF2.0 and EPPS16 predictions (we are grateful to Ilkka Helenius for providing us with the results of this calculation). The central values and 90% CL uncertainties are provided for RHIC kinematics, corresponding to $\sqrt{s} = 200$ GeV, and for LHC kinematics, where $\sqrt{s} = 8.16$ TeV. In both cases, the pions are assumed to be measured at central rapidity, $y_{\pi^0} = 0$. See Refs. [115,120] for additional details regarding the theoretical calculation of inclusive pion production in hadronic collisions.

From Fig. 5.4 we see that the nNNPDF2.0 prediction for $R^{\pi^0}_{p{\rm Pb}}$ is consistent with unity within uncertainties for all values of $p_T$, both at RHIC and at LHC kinematics. At RHIC kinematics, we find that the central value of the ratio is below one at the smallest $p_T$ values, satisfies $R > 1$ between $p_T = 3$ and 17 GeV, and then falls below one again. Since inclusive hadron production is dominated by quark-gluon scattering, in particular the scattering of valence quarks for neutral pion production, this behavior is consistent with the results shown in Fig. 4.8: from low to high $p_T$, one moves from the shadowing region to the anti-shadowing enhancement, and ends in the region sensitive to the EMC suppression. A similar explanation can be given for the trends in $R^{\pi^0}_{p{\rm Pb}}$ at LHC kinematics. However, here the ratio $p_T/\sqrt{s}$ does not become large enough to reach the EMC region, and thus the ratio remains larger than one for most of the $p_T$ range as a result of anti-shadowing effects. Lastly, the EPPS16 predictions agree with the nNNPDF2.0 result well within uncertainties, reflecting the underlying consistency at the nPDF level.
Overall, the results of Fig. 5.4 confirm that inclusive hadron production in proton-nucleus collisions can provide a handle on the nuclear PDF modifications at medium and large $x$, although an optimal interpretation of the experimental data can only be achieved by a simultaneous determination of the nPDFs together with the hadron fragmentation functions.

Summary and outlook
In this work we have presented a model-independent global determination of nuclear parton distributions by incorporating the constraints from nuclear DIS structure functions and gauge boson production in proton-lead collisions. We have demonstrated that a satisfactory description of all the fitted data sets can be achieved, highlighting the reliability of the QCD factorization paradigm in the heavy nuclear sector. Our results demonstrate significant nuclear effects among the quark flavors in nuclei, in particular a shadowing of the up and down quark distributions in heavy nuclei such as lead. Nuclear modifications are found also in the strangeness of heavier nuclei, displaying a suppression with respect to the free proton across a large region of x. In addition, we have shown that upon releasing the momentum and valence sum rule constraints, the data prefer integral values that agree with QCD expectations for all values of A.
We have also explored some phenomenological implications of the nNNPDF2.0 determination. We first compared nNNPDF2.0 theoretical predictions with ATLAS measurements of isolated photon production at the LHC, an important hard probe in proton-lead collisions. We then studied the impact on the small-$x$ nuclear gluon PDF of forward isolated photon production at the FoCal upgrade of the ALICE detector. Lastly, we analyzed our theory predictions for neutral pion production in proton-lead collisions at RHIC and LHC center-of-mass energies. Apart from these applications, the nNNPDF2.0 PDF set can be used as input to theoretical predictions for a range of other hard processes in pPb and PbPb collisions, in particular for heavy ion collisions involving lighter nuclei, in comparisons with non-perturbative nuclear models, and in QCD calculations at small-$x$ involving dense nuclear matter. We also expect the nNNPDF2.0 release to be used in future proton global PDF fits to estimate the theory uncertainties associated with neutrino scattering data [10], as well as in high-energy astroparticle physics processes that involve hard scattering on nuclei [121,122].
While the input data set used in this work allowed for a state-of-the-art determination of the nuclear quarks and anti-quarks, it provided only loose constraints on the nuclear gluon PDF, especially for heavier nuclei where uncertainties are relatively large. To bypass this limitation, the next step in the nNNPDF family of nuclear PDF fits will be to include additional datasets that provide direct information on the nuclear gluon modifications. In addition to the isolated photon production measurements discussed in Sect. 5.1, perhaps the most attractive candidate in this respect is dijet production in pPb collisions. Measurements of dijet production in pp collisions from Run I have recently been analyzed in the framework of NNLO QCD theory in Ref. [123], demonstrating a good compatibility with the global dataset and a marked constraining power on the large-x gluon. In the corresponding pPb case, an EPPS16-based profiling analysis [37] of CMS dijet data at $\sqrt{s}=5.02$ TeV [124] revealed a significant pull of this measurement on the nuclear gluon modifications.
Another process that is known to provide important information on the nuclear gluon PDF is charmed meson production, in particular from the LHCb measurements in the forward region [32]. This process offers unique sensitivity to the small-x (n)PDFs down to $x \simeq 10^{-6}$, as was demonstrated by proton [122,125,126] and nuclear studies [35,127]. Fully exploiting the constraints provided by these measurements requires, as for the other hard probes in nuclear collisions, a consistent theoretical and methodological treatment of charm production in both proton and nuclear global QCD analyses.
On a longer timescale, one might aim to determine the proton and nuclear PDFs simultaneously from a universal analysis, thus bypassing the need to include proton information by means of the proton boundary condition penalty. In the same spirit as the QCD analyses of proton PDFs and fragmentation functions presented in Refs. [92,128], such an integrated fit of proton and nuclear PDFs would ensure the ultimate theoretical and methodological consistency of the determination of the nuclear modifications of the free-nucleon quark and gluon structure.
The nNNPDF2.0 determination is available in the LHAPDF6 library [129] for all relevant nuclei from A = 1 to A = 208. The nNNPDF2.0 sets are provided both for the nPDFs of bound protons, $f^{(p/A)}(x,Q^2)$, and for those of bound nucleons, $f^{(N/A)}(x,Q^2)$, following the conventions in Sect. 3.1. Each of these sets is composed of $N_{\rm rep}=250$ correlated replicas; see Sect. 5 of [13] for their usage prescriptions. The naming convention used for the sets is the following:

Bound nucleons, $f^{(N/A)}$        Bound protons, $f^{(p/A)}$
nNNPDF20_nlo_as_0118_N1            nNNPDF20_nlo_as_0118_N1
nNNPDF20_nlo_as_0118_D2            nNNPDF20_nlo_as_0118_p_A2_Z1
nNNPDF20_nlo_as_0118_He4           nNNPDF20_nlo_as_0118_p_A4_Z2
nNNPDF20_nlo_as_0118_Li6           nNNPDF20_nlo_as_0118_p_A6_Z3
nNNPDF20_nlo_as_0118_Be9           nNNPDF20_nlo_as_0118_p_A9_Z4
nNNPDF20_nlo_as_0118_C12           nNNPDF20_nlo_as_0118_p_A12_Z6
nNNPDF20_nlo_as_0118_N14           nNNPDF20_nlo_as_0118_p_A14_Z7
nNNPDF20_nlo_as_0118_Al27          nNNPDF20_nlo_as_0118_p_A27_Z13
nNNPDF20_nlo_as_0118_Ca40          nNNPDF20_nlo_as_0118_p_A40_Z20
nNNPDF20_nlo_as_0118_Fe56          nNNPDF20_nlo_as_0118_p_A56_Z26
nNNPDF20_nlo_as_0118_Cu64          nNNPDF20_nlo_as_0118_p_A64_Z29
nNNPDF20_nlo_as_0118_Ag108         nNNPDF20_nlo_as_0118_p_A108_Z47
nNNPDF20_nlo_as_0118_Sn119         nNNPDF20_nlo_as_0118_p_A119_Z50
nNNPDF20_nlo_as_0118_Xe131         nNNPDF20_nlo_as_0118_p_A131_Z54
nNNPDF20_nlo_as_0118_Au197         nNNPDF20_nlo_as_0118_p_A197_Z79
nNNPDF20_nlo_as_0118_Pb208         nNNPDF20_nlo_as_0118_p_A208_Z82

Additional variants of the nNNPDF2.0 NLO fit presented in this work, such as the fits without the momentum and valence sum rules and an $N_{\rm rep}=1000$ replica set for lead, are available on the NNPDF collaboration website: http://nnpdf.mi.infn.it/for-users/nuclear-pdf-sets/

Appendix A: Leading-order expressions

In this appendix we collect the leading-order expressions for the DIS structure functions and weak boson production cross-sections used in this analysis, in terms of the nPDFs in the flavor and evolution bases, and in terms of the average bound nucleon PDFs. In what follows, we further simplify the conventions adopted in Sect. 3.1.

A.1 Lepton-nucleus scattering
The double-differential cross-section for the DIS of a charged lepton off a nucleus with mass number A is given by

$$\frac{d^2\sigma^i}{dx\,dQ^2} = \frac{2\pi\alpha^2}{x\,Q^4}\,\eta^i\left[\, Y_+\, F_2^i(x,Q^2) - y^2\, F_L^i(x,Q^2) \mp Y_-\, xF_3^i(x,Q^2)\,\right], \qquad i = {\rm NC,\, CC}, \qquad ({\rm A.3})$$

with $\eta^{\rm NC}=1$ and $\eta^{\rm CC}=(1\pm\lambda)^2\,\eta_W$, where $\eta_W$ denotes the squared ratio of the W-boson couplings and propagator with respect to those of the photon, and $\lambda$ is the helicity of the incoming lepton. In the case of NC DIS we restrict ourselves to photon-mediated processes, for which $F_3^{\rm NC}=0$. The usual DIS kinematic variables are defined as

$$Y_\pm = 1 \pm (1-y)^2, \qquad x = \frac{Q^2}{2P\cdot q}, \qquad Q^2 = -q^2, \qquad y = \frac{q\cdot P}{k\cdot P}. \qquad ({\rm A.4})$$

A.1.1 Neutral-current DIS

For NC DIS, the underlying process is lepton-nucleus scattering mediated by virtual photon exchange, $l^\pm + A \xrightarrow{\gamma} l^\pm + X$. Since the available data lie at $Q^2 \ll M_Z^2$, the contributions from Z-boson exchange can be neglected. The double-differential cross-section for the scattering of a charged lepton off a nucleus with atomic and mass numbers Z and A is proportional at leading order to the $F_2$ structure function, which can be expressed as

$$F_2^{{\rm NC},(N/A)}(x,Q^2) = x\left[\, \frac{2}{9}\,\Sigma^{(p/A)} + \frac{1}{6}\,\frac{(2Z-A)}{A}\,T_3^{(p/A)} + \frac{1}{18}\,T_8^{(p/A)}\,\right].$$

Here we define the structure function per nucleon, rather than the structure function of the nucleus as a whole. We can see from the above expression that for an isoscalar nucleus, A = 2Z, the contribution proportional to $T_3$ vanishes and the nuclear structure functions depend only on $\Sigma$ and $T_8$.

A.1.2 Charged-current DIS

For CC DIS the underlying process is (anti)neutrino-nucleus scattering mediated by W-boson exchange, e.g. $\bar\nu + A \xrightarrow{W^-} l^+ + X$. Because the $W^+$ and $W^-$ bosons couple to different quark flavors, the difference between neutrino and antineutrino structure functions provides a handle on quark flavor separation. Here we provide the expressions for $\nu(\bar\nu)A$ scattering; the expressions for the conjugate process involving the CC scattering of charged leptons are the same. The inclusive CC structure functions via $W^-$-boson exchange, $\bar\nu + A \xrightarrow{W^-} l^+ + X$, read

$$F_2^{W^-,(N/A)} = \frac{2x}{A}\left[\, Z\left( u^{(p/A)} + \bar d^{(p/A)}\right) + (A-Z)\left( d^{(p/A)} + \bar u^{(p/A)}\right) + A\,\bar s^{(p/A)}\,\right],$$
$$xF_3^{W^-,(N/A)} = \frac{2x}{A}\left[\, Z\left( u^{(p/A)} - \bar d^{(p/A)}\right) + (A-Z)\left( d^{(p/A)} - \bar u^{(p/A)}\right) - A\,\bar s^{(p/A)}\,\right].$$

Here we consider only the contributions from up-, down-, and strange-initiated processes for simplicity; the generalization to heavy-quark-initiated processes is straightforward.
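The NC expression above can be cross-checked numerically against its flavor-basis form, $F_2 = x\left(\frac{4}{9}u^+ + \frac{1}{9}d^+ + \frac{1}{9}s^+\right)$ for a proton target, and the isoscalar decoupling of $T_3$ can be made explicit. A minimal sketch with toy (illustrative) PDF values, not taken from any fit:

```python
# Numerical cross-check of the LO neutral-current F2 per bound nucleon,
# using toy PDF values (illustrative numbers, not taken from any fit).
def f2_nc(x, Sigma, T3, T8, Z, A):
    """Evolution-basis LO photon-exchange F2 per nucleon."""
    return x * (2.0 / 9.0 * Sigma
                + 1.0 / 6.0 * (2 * Z - A) / A * T3
                + 1.0 / 18.0 * T8)

# Flavor-basis cross-check for a proton target (Z = A = 1), with q+ = q + qbar:
x, u, d, s = 0.1, 3.0, 2.0, 0.5
Sigma, T3, T8 = u + d + s, u - d, u + d - 2 * s
direct = x * (4.0 / 9.0 * u + 1.0 / 9.0 * d + 1.0 / 9.0 * s)
assert abs(f2_nc(x, Sigma, T3, T8, 1, 1) - direct) < 1e-12

# For an isoscalar nucleus (A = 2Z) the T3 contribution decouples:
assert f2_nc(x, Sigma, T3, T8, 20, 40) == f2_nc(x, Sigma, 0.0, T8, 20, 40)
```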
An interesting observation from these expressions is that $F_2$ and $F_3$ provide complementary information on quark flavor separation: $F_2$ is sensitive to sums of quark and antiquark distributions, while $xF_3$ is sensitive to their differences.
The corresponding expressions for the inclusive charged-current structure functions in the case of neutrino scattering via $W^+$-boson exchange, $\nu + A \xrightarrow{W^+} l^- + X$, are given by

$$F_2^{W^+,(N/A)} = \frac{2x}{A}\left[\, Z\left( d^{(p/A)} + \bar u^{(p/A)}\right) + (A-Z)\left( u^{(p/A)} + \bar d^{(p/A)}\right) + A\,s^{(p/A)}\,\right],$$
$$xF_3^{W^+,(N/A)} = \frac{2x}{A}\left[\, Z\left( d^{(p/A)} - \bar u^{(p/A)}\right) + (A-Z)\left( u^{(p/A)} - \bar d^{(p/A)}\right) + A\,s^{(p/A)}\,\right].$$

We now turn to the exclusive CC charm production structure functions, required for the description of the NuTeV cross-sections. Following the FONLL treatment of heavy-quark structure functions, the charm contribution to the inclusive structure function is defined by all terms in which the charm quark couples to the W boson. Note that this definition differs somewhat from the experimental one, where charm structure functions are identified by requiring the presence of charm in the final state, while the theory definition includes charm-initiated contributions as well. The charm production structure functions read, for $W^-$ (antineutrino-initiated),

$$F_2^{W^-,c,(N/A)} = \frac{2x}{A}\left[\, |V_{cd}|^2\left( Z\,\bar d^{(p/A)} + (A-Z)\,\bar u^{(p/A)}\right) + |V_{cs}|^2\, A\,\bar s^{(p/A)}\,\right], \qquad xF_3^{W^-,c,(N/A)} = -F_2^{W^-,c,(N/A)}, \qquad ({\rm A.11})$$

and for $W^+$ (neutrino-initiated):

$$F_2^{W^+,c,(N/A)} = \frac{2x}{A}\left[\, |V_{cd}|^2\left( Z\,d^{(p/A)} + (A-Z)\,u^{(p/A)}\right) + |V_{cs}|^2\, A\,s^{(p/A)}\,\right], \qquad xF_3^{W^+,c,(N/A)} = +F_2^{W^+,c,(N/A)}.$$

A.2 Weak boson production

Here we provide the corresponding leading-order expressions for weak gauge boson production in proton-nucleus collisions. Experimental measurements of NC observables are provided in terms of the invariant mass M and rapidity of the dilepton final state, while CC measurements are binned in terms of the charged-lepton rapidity. We show the LO expressions without accounting for the gauge-boson decay, so that in the case of $W^\pm$ production the connection with the experimental data is somewhat less direct than for Z-boson production.
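A direct consequence of the inclusive CC expressions is that, for an isoscalar target (Z = A/2), the difference $F_2^{W^+} - F_2^{W^-} = 2x\,(s-\bar s)$ isolates the strange asymmetry. A toy numerical check (illustrative PDF values, u/d/s only, diagonal CKM):

```python
# Toy check (illustrative PDF values, diagonal CKM): for an isoscalar nucleus
# (Z = A/2) the difference of the LO CC structure functions isolates the
# strange asymmetry, F2^{W+} - F2^{W-} = 2 x (s - sbar).
def f2_wplus(x, u, d, ub, db, s, sb, Z, A):
    return 2.0 * x / A * (Z * (d + ub) + (A - Z) * (u + db) + A * s)

def f2_wminus(x, u, d, ub, db, s, sb, Z, A):
    return 2.0 * x / A * (Z * (u + db) + (A - Z) * (d + ub) + A * sb)

x = 0.05
u, d, ub, db, s, sb = 4.0, 3.0, 0.4, 0.5, 0.30, 0.25  # toy values
diff = (f2_wplus(x, u, d, ub, db, s, sb, 20, 40)
        - f2_wminus(x, u, d, ub, db, s, sb, 20, 40))
assert abs(diff - 2.0 * x * (s - sb)) < 1e-12
```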
A.2.1 Neutral-current Drell-Yan

We start by discussing NC Drell-Yan (DY) production, namely Z-boson production in proton-nucleus collisions, $p + A \xrightarrow{Z} l^+l^- + X$. The leading-order expression is given by

$$\frac{d\sigma^{pA}_Z}{dM^2\,dy} \propto \left[\, a_u\left( u_1\bar u_2 + \bar u_1 u_2\right) + a_d\left( d_1\bar d_2 + \bar d_1 d_2 + s_1\bar s_2 + \bar s_1 s_2\right)\,\right], \qquad ({\rm A.14})$$

where $q_1 = q^{(p)}(x_1, M^2)$ and $q_2 = q^{(N/A)}(x_2, M^2)$, and the Drell-Yan kinematics at LO are

$$x_{1,2} = \frac{M}{\sqrt{s}}\,e^{\pm y}.$$

The effective couplings are $a_q = \left( t_3^{(q)} - 2Q_q\sin^2\theta_W\right)^2 + \left( t_3^{(q)}\right)^2$, where $t_3^{(f)}$ is the weak isospin of fermion f, $Q_f$ is its electric charge, and $\theta_W$ is the Weinberg angle.
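The LO kinematics $x_{1,2}=(M/\sqrt{s})\,e^{\pm y}$ map each rapidity bin onto the momentum fractions probed in the two beams. A short sketch for Z production at $\sqrt{s}=8.16$ TeV; which of $x_1, x_2$ corresponds to the nuclear side depends on the beam convention and is an assumption here:

```python
import math

# LO Drell-Yan kinematics: x_{1,2} = (M / sqrt(s)) * exp(+-y). Illustrative
# numbers for Z production at sqrt(s) = 8.16 TeV; the assignment of x2 to the
# nuclear beam is a convention-dependent assumption.
def dy_x(M, sqrts, y):
    tau = M / sqrts
    return tau * math.exp(y), tau * math.exp(-y)

MZ, SQRTS = 91.1876, 8160.0
for y in (-2.5, 0.0, 2.5):
    x1, x2 = dy_x(MZ, SQRTS, y)
    # the product x1 * x2 = M^2 / s is independent of rapidity
    print(f"y = {y:+.1f}: x1 = {x1:.3e}, x2 = {x2:.3e}")
```

Forward (backward) rapidities thus probe small (large) momentum fractions in the beam moving away from (toward) the detector acceptance.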
A.2.2 Charged-current Drell-Yan

The corresponding LO expressions in the case of $W^-$ and $W^+$ boson production in pA collisions are given by

$$\frac{d\sigma^{pA}_{W^-}}{dy} \propto \left[\, |V_{ud}|^2\left( d_1\bar u_2 + \bar u_1 d_2\right) + |V_{us}|^2\left( s_1\bar u_2 + \bar u_1 s_2\right)\,\right],$$
$$\frac{d\sigma^{pA}_{W^+}}{dy} \propto \left[\, |V_{ud}|^2\left( u_1\bar d_2 + \bar d_1 u_2\right) + |V_{us}|^2\left( u_1\bar s_2 + \bar s_1 u_2\right)\,\right],$$

where in this case y stands for the W-boson rapidity, which in general differs from the pseudo-rapidity of the charged lepton that is measured by experiments. In either case, however, the PDF combinations that enter at LO remain the same. From the comparison between the LO expressions for the DIS structure functions and for the W and Z production cross-sections in pA collisions, it is clear that each group of processes is sensitive to a different combination of quark and antiquark PDFs. Therefore, their combination into a global QCD analysis, exploiting PDF universality in QCD, provides a unique handle for a robust separation between quark and antiquark flavors in nuclei.
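As a toy illustration of this flavor sensitivity, the following sketch evaluates the LO $W^\pm$ quark-antiquark pairings for illustrative PDF values, comparing a free-proton target with a lead-like bound-nucleon average; the neutron excess of lead enhances the $W^-$ rate relative to $W^+$. All numbers (PDF values and the CKM factors) are assumptions for illustration only:

```python
# Toy LO W+/W- quark-antiquark pairings in pA collisions (illustrative PDF
# values at fixed x1, x2; |V_ud|^2, |V_us|^2 values are assumptions; u/d/s only).
VUD2, VUS2 = 0.949, 0.051

def sigma_wplus(p1, p2):
    return (VUD2 * (p1["u"] * p2["db"] + p1["db"] * p2["u"])
            + VUS2 * (p1["u"] * p2["sb"] + p1["sb"] * p2["u"]))

def sigma_wminus(p1, p2):
    return (VUD2 * (p1["d"] * p2["ub"] + p1["ub"] * p2["d"])
            + VUS2 * (p1["s"] * p2["ub"] + p1["ub"] * p2["s"]))

proton = {"u": 2.0, "d": 1.0, "ub": 0.10, "db": 0.12, "s": 0.05, "sb": 0.05}
Z, A = 82, 208  # lead: average over Z protons and A-Z neutrons (u <-> d swap)
lead = dict(proton)
lead["u"] = (Z * proton["u"] + (A - Z) * proton["d"]) / A
lead["d"] = (Z * proton["d"] + (A - Z) * proton["u"]) / A
lead["ub"] = (Z * proton["ub"] + (A - Z) * proton["db"]) / A
lead["db"] = (Z * proton["db"] + (A - Z) * proton["ub"]) / A

r_pp = sigma_wminus(proton, proton) / sigma_wplus(proton, proton)
r_pPb = sigma_wminus(proton, lead) / sigma_wplus(proton, lead)
print(r_pp, r_pPb)  # neutron excess enhances W- relative to W+
```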