Parton distribution functions (PDFs) are key elements required to generate concrete predictions for processes with hadronic initial states in the context of QCD factorization theorems. The success of this theoretical framework has been extensively demonstrated in fixed-target and collider experiments (e.g., at the Tevatron, SLAC, HERA, RHIC, LHC), and will be essential for making predictions for future facilities (EIC, LHeC, FCC). Despite these achievements, there is still much to learn about hadronic structure and the detailed composition of the PDFs [1, 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20].

Fig. 1

Contribution of strange-initiated channels to \(W^{\pm }\) and Z boson production in proton-lead (pPb) collisions at the LHC. The blue lines represent total cross-sections, the yellow lines are cross-sections with the strange-initiated channels subtracted. The lower panels show the ratio compared to the total cross-section. This calculation used the nCTEQ15 nPDFs [1] with a modified version of the FEWZ [2, 3] program.

Although the up and down PDF flavors are generally well-determined across much of the partonic x range, there is significant uncertainty in the strange component, s(x). The strange PDF is especially challenging because, in many processes, it is difficult to separate it from the larger down component. However, as we push to higher precision and energies, an accurate determination of the strange PDF is, apart from its intrinsic fundamental importance, essential not only for LHC measurements, but for a wide variety of processes [4, 13,14,15,16,17,18,19,20]. For example, the knowledge of the nuclear strange distribution in heavy nuclei is crucial for providing a reliable baseline for hard probes of the quark gluon plasma (QGP) which is characterized by enhanced production of strangeness [21,22,23]. Additionally, small x nuclear PDFs are essential for computing the composition of air showers from ultra-high energy cosmic rays [24,25,26,27,28].

The recent results from the LHC for inclusive W/Z boson production in pp collisions prefer a large strange to light-sea ratio [29,30,31]. This is a rather surprising result as it differs from earlier determinations based on analyses of neutrino deep inelastic scattering (DIS) data from the NuTeV and CCFR experiments [32,33,34] or charged kaon production data from HERMES [35]. Furthermore, LHC measurements of associated \(W+c\) production generally favor a smaller strange to light-sea ratio, which is more in line with the above fixed-target results [36,37,38]. See Ref. [39] for more details on the earlier determinations of the strange distribution, and Ref. [40] for a study of the compatibility of the ATLAS and CMS results on associated \(W+c\) and inclusive W/Z boson production using the xFitter framework [41].

It is not easy to directly compare the pp LHC results with the fixed target experiments, as the earlier measurements were generally done using nuclear targets (typically Fe or Pb). Additional complications arise from the fact that there is a controversy about the proper nuclear correction factors for the charged current (CC) and neutral current (NC) DIS measurements [42,43,44,45,46]. As a result, the choice of heavy target neutrino DIS data sets varies widely among not just the many nuclear PDF (nPDF) determinations, but also for the proton PDF fits [4, 5]. Moreover, in the proton case the nuclear corrections are applied in different ways.

Conversely, it was already demonstrated that the W/Z LHC data can provide some important information on the strange and gluon nPDFs [6, 8, 47]. To demonstrate the impact of the heavy ion \(W^\pm /Z\) data on the strange PDF, in Fig. 1 we display the contribution of the strange-initiated process as a function of rapidity. We observe that the strange component can be as much as 20–30% of the total. These plots were produced with FEWZ [2, 3] modified for the pPb beams using the nCTEQ15 nPDFs [1]. Since the nCTEQ15 nPDFs are based on a proton where the strange PDF is given by \(s+{\bar{s}}=\kappa ({\bar{u}}+{\bar{d}})\), we expect these plots to represent a conservative estimate of the strange contribution to the W/Z channels.

For this reason, we concentrate in the following on the constraints for the nuclear strange and gluon distributions given by the W and Z data from proton-lead (pPb) collisions at the LHC. This process is an ideal QCD “laboratory” as it is sensitive to (i) the heavy flavor components \(\{s,c,\ldots \}\), (ii) the nuclear corrections, and (iii) the underlying “base” proton PDFs. Such an analysis provides an independent perspective on the subject and can help disentangle the flavor separation and nuclear modifications.

In the current investigation, we will study the production of W and Z bosons in proton–lead (pPb) collisions at the LHC; this involves similar considerations as the pp case, but also brings in the nuclear corrections. We will be focusing, in particular, on the strange and gluon distributions to see how these are modified when the LHC measurements are included. In Sect. 2 we review the various data sets used in our analysis along with the separate fits extracted. In Sect. 3 we present the quality of the fits and comparisons of data with the theory, and demonstrate the impact on the resulting PDFs. In Sect. 4 we compare our final PDF fit with other results from the literature. In Sect. 5 we recap the key outcomes of this study. Finally, in Appendix A we provide additional details on the normalization of data sets and this contribution to the \(\chi ^2\).

Fits to experimental data

The nCTEQ++ framework

The nCTEQ project extends the proton PDF global fitting effort by fully including the nuclear dimension (Footnote 1). Prior to the nCTEQ effort, nuclear data were “corrected” to isoscalar data and added to the proton PDF fit without any uncertainties [48]. In contrast, the nCTEQ framework allows full communication between the nuclear data and the proton data; this enables us to investigate whether observed tensions between data sets could potentially be attributed to the nuclear corrections.

The details of the nCTEQ15 nPDFs are presented in Ref. [1]. The present analysis is performed in a new C++ code base (nCTEQ++) which enabled us to easily interface to external programs such as HOPPET [49], APPLgrid [50], and MCFM [51]. The nCTEQ15 fit has been reproduced in this new nCTEQ++ framework.

For the current set of fits, we use the same 16 free parameters as for the nCTEQ15 set, and additionally open up three parameters for the strange PDF, for a total of 19 parameters. Recall that for the nCTEQ15 set, the strange PDF was constrained by the relation \(s={\bar{s}}=(\kappa /2)({\bar{u}}+{\bar{d}})\) at the initial scale \(Q_0=1.3\) GeV so that it had the same form as the sea quarks.

Our PDFs are parameterized as

$$\begin{aligned} x f_i^{p/A}(x,Q_0) = c_0 x^{c_1} (1-x)^{c_2} e^{c_3 x} (1+ e^{c_4} x)^{c_5} , \end{aligned}$$

and the nuclear A dependence is encoded in the coefficients as

$$\begin{aligned} c_k \longrightarrow c_k(A) \equiv c_{k,0} + c_{k,1} (1-A^{-c_{k,2}}), \end{aligned}$$

where \(k=\{1,\ldots ,5\}\).
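As a concrete illustration of the two formulas above, the following minimal Python sketch evaluates the A-dependent coefficients and the resulting \(x f_i^{p/A}(x,Q_0)\). The function names and all numerical coefficient values are hypothetical placeholders chosen only for illustration; they are not fitted nCTEQ values, and the normalization \(c_0\) is simply passed in as a fixed input here.

```python
import math

def coeff(ck0, ck1, ck2, A):
    """A-dependent coefficient c_k(A) = c_{k,0} + c_{k,1} * (1 - A**(-c_{k,2}))."""
    return ck0 + ck1 * (1.0 - A ** (-ck2))

def xf(x, c):
    """x*f(x, Q0) = c0 * x^c1 * (1-x)^c2 * exp(c3*x) * (1 + exp(c4)*x)^c5."""
    c0, c1, c2, c3, c4, c5 = c
    return (c0 * x**c1 * (1.0 - x)**c2
            * math.exp(c3 * x) * (1.0 + math.exp(c4) * x)**c5)

# Hypothetical (c_{k,0}, c_{k,1}, c_{k,2}) triples for k = 1..5 (illustrative only).
table = [(0.2, 0.05, 0.3), (3.0, 0.4, 0.3), (0.0, 0.0, 0.0),
         (1.0, 0.0, 0.0), (0.5, 0.0, 0.0)]

def xf_bound_proton(x, A, c0=1.0):
    """Evaluate x*f^{p/A} with every c_k (k = 1..5) taken at nuclear mass number A."""
    cs = [coeff(k0, k1, k2, A) for (k0, k1, k2) in table]
    return xf(x, [c0] + cs)

print(xf_bound_proton(0.03, 1))    # free-proton limit, A = 1
print(xf_bound_proton(0.03, 208))  # lead, A = 208
```

Note that for \(A=1\) the factor \((1-A^{-c_{k,2}})\) vanishes, so each coefficient reduces to its proton value \(c_{k,0}\); this is how the parameterization recovers the underlying proton PDF.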

The 16 free parameters used for the nCTEQ15 set model the x-dependence of the \(\{g, u_v, d_v, {\bar{d}}+{\bar{u}} \}\) PDF combinations, and we do not vary the \({\bar{d}}/{\bar{u}}\) parameters; see Ref. [1] for details. To this, we now add three strange PDF parameters: \(\{c_{0,1}^{s+{\bar{s}}},c_{1,1}^{s+{\bar{s}}},c_{2,1}^{s+{\bar{s}}} \}\); these parameters describe, correspondingly, the overall normalization, the low-x exponent and the large x exponent of the strange distribution.

Experimental data sets

In this analysis we use the deep inelastic scattering (DIS), Drell–Yan (DY) lepton pair production, and RHIC pion data employed in our earlier nCTEQ15 analysis [1]. Additionally, we use W and Z inclusive data from proton-lead collisions at the LHC. Specifically, we include the following data sets: ALICE \(W^{\pm }\) boson production [57, 58], ATLAS Z boson production [53], ATLAS \(W^{\pm }\) boson production [52], CMS Z boson production [55], CMS \(W^{\pm }\) boson production [54], CMS Run II \(W^{\pm }\) boson production [56], and LHCb Z boson production [59]. The data sets are outlined in Table 1. We note that ALICE has just released a new analysis of Z boson production in Ref. [60]; this data will be included in a future update.

Table 1 Overview of the LHC \(W^\pm /Z\) data sets for the pPb system, including the per-nucleon center-of-mass energy \(\sqrt{s_{NN}}\), the experimental normalization uncertainty of the data \(\sigma _{norm}\) arising from the luminosity uncertainty, the number of data points, and references

For the calculation of the \(W^\pm /Z\) cross sections, we used the MCFM-6.8 program [61] to generate APPLgrid [50] tables which allow for efficient computation inside the Minuit fitting loop. As a cross check, we validate these grids using the FEWZ program [2, 3] which has been modified to accommodate the proton-lead initial state. Similarly, we use a version of HOPPET [49], for our DGLAP evolution which is extended to accommodate grids of multiple nuclei.

All the theory calculations and PDF evolution are performed at the next-to-leading order (NLO) of QCD. Although the above tools allow for higher precision with NNLO calculations [62], the current uncertainties on the nuclear PDFs are sufficiently large that NLO accuracy is entirely satisfactory for our present study.

For the \(W^\pm /Z\) cross sections, our grids are also computed at NLO, and it is important to estimate the potential uncertainty arising from this choice. These cross sections have been computed up to NNLO in Ref. [63], where a large shift was observed between the LO and NLO results. However, comparing the NLO and NNLO results for \(W^\pm /Z\) production, the uncertainty bands decrease while the NNLO results lie within the NLO error band. In a separate analysis, Ref. [64] computed the \(W^\pm /Z\) production cross sections in a PDF framework at both NLO and NNLO for both the Tevatron (\(p{\bar{p}}\)) and the LHC (pp). This study found the variation of the \(W^\pm /Z\) production cross sections in both cases to be about 1.5%, cf. Tables 1 and 2 of Ref. [64]. While we have not assigned any theoretical uncertainties in our current analysis, the above suggests that a reasonable estimate would be on the order of a percent or two, which is small enough that it will not significantly alter our general conclusions. Similarly, the impact of the NNLO corrections to the DIS structure functions will be very small and even further minimized due to the fact that these data come as structure function ratios.

The PDF fits

We now use the nCTEQ++ framework to include the LHC \(W^\pm /Z\) pPb data and extend the nCTEQ15 fit. Comparing the LHC pPb data to the nCTEQ15 results, we find that these data generally lie above the theory predictions [47]; hence, including a normalization uncertainty is essential to obtain a good fit to these data sets.

Table 2 The LHC \(W^\pm /Z\) data sets and their corresponding IDs. The normalization shifts \(N_{norm}\) are applied to the data sets where appropriate. The normalization shifts for the nCTEQ15WZ set are the same as for the Norm3 fit
Table 3 We present the \(\chi ^2/N_{dof}\) for the individual data sets, the individual processes {DIS, DY, Pion, LHC}, and the total. We also show the total \(\chi ^2\) contribution from the LHC normalization penalty. Note that the Pion \(\chi ^2/N_{dof}\) is shown for comparison, but these data are only included in the nCTEQ15WZ fit

Normalization Factors: In our fits, we can allow for a floating normalization of individual data sets, and we label these \(N_{norm}\) as listed in Table 2. The experiments have an associated luminosity uncertainty, which we identify in Table 1 as \(\sigma _{norm}\). As our fit modifies the floating normalization factors \(N_{norm}\), the quoted experimental luminosity uncertainties \(\sigma _{norm}\) serve as a gauge to calibrate these normalization shifts, and the resulting contribution to the \(\chi ^2\) is displayed in Table 3. Additional details are presented in Appendix A.

It is reasonable to tie the normalizations for data from individual experiments to a single normalization factor as these uncertainties are fully correlated. Thus, in our collection of fits below, we use a total of 3 normalization factors \(N_{norm}\) as summarized in Table 2: (1) ATLAS Run I \(\{W^\pm ,Z\}\), (2) CMS Run I \(\{W^\pm ,Z\}\), and (3) CMS Run II \(\{W^\pm \}\). We do not add additional normalization factors for ALICE and LHCb sets as they have limited data points and we obtain a good \(\chi ^2/N_{dof}\) without any normalization shift.

The Fits: Previous studies implied a close connection between the normalization of the \(W^\pm /Z\) data and the extracted strange PDF [47]. To systematically investigate the effect of the normalization in detail, we will use a series of fits outlined in Table 3 and summarized below:


nCTEQ15: This is the original set of nuclear PDFs as computed in Ref. [1].

Norm0: We include the LHC pPb data, but we do not allow for any floating normalization of the LHC data.

Norm2: We include the LHC pPb data, and allow for 2 normalization factors: one for the ATLAS Run I data and one for the CMS Run I data; we do not normalize the CMS Run II data in this fit.

Norm3: We include the LHC pPb data, and we allow for 3 normalization factors: one for the ATLAS Run I data, one for the CMS Run I data, and a separate one for the CMS Run II data.

nCTEQ15WZ: This is the same as Norm3, but we also include the RHIC inclusive pion data directly in the fit. This is discussed in Sect. 4.

All four of these new PDF fits are based on the DIS and DY data from the nCTEQ15 analysis and the LHC data sets, as outlined in Sect. 2.2 and Table 1.

As with our nCTEQ15 study, we will present results both with and without the inclusive pion data [65, 66]. For the comparison of the \(W^\pm /Z\) normalization fits {Norm0, Norm2, Norm3}, we will not include the pion data; however, we do compute the pion \(\chi ^2\), as shown in Table 3, to demonstrate the compatibility (Footnote 2).

In Sect. 4 we then present a separate fit, nCTEQ15WZ, which does include the pion data. As we will see the impact of the pion data is marginal.

The Normalization Shifts: Table 2 shows the fitted normalization factors \(N_{norm}\) used in each fit. All the normalization shifts are between \(1 \sigma \) and \(2 \sigma \) of the quoted normalization uncertainty \(\sigma _{norm}\), and all are systematically below unity.

The appropriate normalization penalties are included in the \(\chi ^2\) calculations, and the detailed prescription we use for fitting data normalizations is provided in Appendix A. In the case where no normalization shift is applied, we effectively have \({N_{norm}=1}\), and the \(\chi ^2\) contribution of the normalization shift is zero; cf. the last term of Eq. (6).
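To make the role of the penalty term concrete, the sketch below evaluates a \(\chi ^2\) with one floating normalization. It assumes a commonly used form, with uncorrelated point-to-point errors, the theory rescaled by \(1/N_{norm}\), and a quadratic penalty \(((1-N_{norm})/\sigma _{norm})^2\); this is an assumption made for illustration and not necessarily the exact prescription of Appendix A. The function name and all numbers are hypothetical.

```python
def chi2_with_norm(data, theory, sigma, n_norm, sigma_norm):
    """Return (total chi2, penalty) for a single floating normalization.

    Assumed form (illustrative, uncorrelated errors):
        chi2 = sum_i ((D_i - T_i / N) / sigma_i)^2 + ((1 - N) / sigma_norm)^2
    """
    shifted = [t / n_norm for t in theory]
    chi2_points = sum(((d - t) / s) ** 2
                      for d, t, s in zip(data, shifted, sigma))
    penalty = ((1.0 - n_norm) / sigma_norm) ** 2
    return chi2_points + penalty, penalty

# Toy example: data lying about 5% above the theory, 3.5% luminosity uncertainty.
data = [1.05, 1.04, 1.06]
theory = [1.00, 1.00, 1.00]
sigma = [0.02, 0.02, 0.02]

for n in (1.00, 0.97, 0.95):
    total, pen = chi2_with_norm(data, theory, sigma, n, sigma_norm=0.035)
    print(f"N_norm={n:.2f}  chi2={total:.2f}  penalty={pen:.2f}")
```

In this toy setup a normalization below unity lowers the total \(\chi ^2\) even after paying the penalty, which mirrors the qualitative behavior described in the text when the data lie systematically above the predictions.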

Results and discussion

Having presented our series of fits, we now examine (i) the quality of these fits as measured by the \(\chi ^2\) values, (ii) the comparison of the data with our theory predictions, and (iii) the impact on the underlying PDFs.

Quality of the fits

Overall quality of the fits: In Table 3 we present the \(\chi ^2/N_{dof}\) for selected data sets as well as for each experiment type (Footnote 3), and the contribution of the normalization penalty to the total \(\chi ^2/N_{dof}\). We compute the normalization penalty as outlined in Appendix A, and this is included in the total.

Examining the total \(\chi ^2/N_{dof}\) of the fits, we see a broad range spanning from 1.66 for nCTEQ15 to values below 1.00, and an even larger range of the \(\chi ^2/N_{dof}\) for the individual LHC data sets.

Quality of individual data sets: To provide more details regarding the source of the \(\chi ^2\) contributions, in Fig. 2 we display the \(\chi ^2/N_{dof}\) values for each individual experiment which enters the fit. The experiments are identified by their 4-digit ID, and the number of data points is indicated at the top of each bar (Footnote 4). Additionally, the bars are color-coded to indicate the type of observable {DIS, DY, W/Z}.

The \(\chi ^2/N_{dof}\) bar charts provide incisive information as to which data sets are driving the fit. We discuss each fit in turn.

nCTEQ15: Starting with the nCTEQ15 set, we note that (except for a few outliers) the DIS and DY data are well described by these PDFs (Footnote 5); by comparison, the LHC \(W^\pm /Z\) data (which were not included in the original nCTEQ15 fit) are not well described. As was detailed in Ref. [47], an important contribution to this large \(\chi ^2\) comes from the small x region where the nuclear PDFs are poorly constrained. The re-weighting analysis of Ref. [47] demonstrated that we can improve the fit by adjusting the small x behavior of the PDFs, but this alone will not bring all the data sets into the range of \(\chi ^2/N_{dof}\sim 1\); something else is required.

Norm0: As a first step, this fit includes the LHC \(W^\pm /Z\) data, but does not include any floating normalization factors. This fit will tell us the extent to which we can adjust the PDFs to fit the LHC data before we begin to adjust the normalization factors. Examining Fig. 2, we see the impact of this fit on the DIS and DY data is generally small for many of the data sets, but it does result in a noticeable improvement for a few of the sets including, for example, 5115 (NMC Ca/D) and 5121 (NMC Li/D). More importantly, it significantly improves the description of the LHC \(W^\pm /Z\) data, reducing the partial \(\chi ^2/N_{dof}\) of this data from 6.20 to 1.47 for the 120 LHC data points. Although this is a notable improvement, a number of the LHC data sets still have \(\chi ^2/N_{dof}\) values well above one.

Norm2: In this fit, we allow two floating normalization factors (one for ATLAS and one for CMS Run I) to vary. The contribution of the normalization penalty is included in the total \(\chi ^2\).

We see the impact of the floating normalization factors on the DIS and DY data is again small, as was the case for the Norm0 fit. But the Norm2 fit dramatically improves the LHC \(W^\pm /Z\) data reducing \(\chi ^2/N_{dof}\) of this data to 1.15 as compared to 1.47 for the Norm0 fit. While the Norm2 fit is a substantial improvement over the Norm0 results and all LHC data sets have \(\chi ^2/N_{dof}<2\), there are still a few sets at the upper limit of this range.

Norm3: Finally, we perform a fit with three normalization factors: one for ATLAS (Run I), one for CMS Run I, and one for CMS Run II.

As before, the modifications to the DIS and DY sets are minimal, but we do continue to see an improvement in the LHC sets; namely the \(\chi ^2/N_{dof}\) of this data improves to 0.91 as compared to 1.15 for the Norm2 fit.

Comparing Norm3 with nCTEQ15 for the other data sets, we see that the \(\chi ^2/N_{dof}\) for the DIS data is essentially the same (0.91 vs. 0.90), the DY increases slightly (0.73 vs. 0.77), and the pion \(\chi ^2\) (computed a posteriori) increases slightly as well (0.25 vs. 0.39); these differences are relatively small compared to the significant improvement in the LHC data (6.20 vs. 0.90).

Fig. 2

The \(\chi ^2/N_{dof}\) of the individual experiments. The number of data points is indicated at the top of the bar, and the total \(\chi ^2/N_{dof}\) for the entire set is shown below the legend. For the nCTEQ15 set, four of the bars extend beyond the chart range; these are \(\chi ^2/N_{dof} = \{6.1, 9.7, 6.4, 13.2\}\) for experiments \(\{6231, 6232,6233,6234\}\)

Fig. 3

Comparison of data with theory for ATLAS and CMS \(W^\pm \) production. The normalization shifts are applied to the theory so we can compare all the results on a single plot; the data is unaltered. For reference, ATLAS Run I \(\{W^-,W^+\}=\{6211,6213\}\), CMS Run I \(\{W^-,W^+\}=\{6231,6233\}\) and CMS Run II \(\{W^-,W^+\}=\{6232,6234\}\)

Fig. 4

Comparison of data with theory for ATLAS and CMS Z production. The normalization shifts are applied to the theory so we can compare all the results on a single plot; the data is unaltered. For reference, ATLAS Run I \(\{Z\}=\{6215\}\) and CMS Run I \(\{Z\}=\{6235\}\)

Fig. 5

Comparison of data with theory for ALICE \(W^\pm \) production. The normalization shifts are applied to the theory so we can compare all the results on a single plot; the data is unaltered. For reference, ALICE Run I \(\{W^-,W^+\}=\{6251,6253\}\)

Comparison of data with theory

To obtain a more complete view of the fit quality, in Figs. 3, 4, 5, and 6 we display the comparison of the LHC data with theory predictions. The data points and errors are taken directly from the experimental measurements. However, it is important to note that we have shifted the theoretical predictions by the appropriate normalization factors; this allows us to present the fits with different normalizations on a single plot, and provides a more accurate visual description of the quality of the fit.

Fig. 6

Comparison of data with theory for LHCb Z production (ID 6275). The normalization shifts are applied to the theory so we can compare all the results on a single plot; the data is unaltered

Large x region

Our first observation is that the experimental data consistently lies above our theoretical predictions. From Table 2, recall that all the fitted normalization factors are less than one, indicating that the fit prefers a reduction of the data values, typically in the range of \(\sim 5\%\); because we have shifted the theory, this is not as obvious in Figs. 3, 4, 5 and 6.

Even with the normalization shifts, we see the theory predictions still lie well below the data for a number of sets. This is most evident in the negative y region for the Run I \(W^-\) data sets 6211 (ATLAS \(W^-\)) and 6231 (CMS \(W^-\) Run I), and to a lesser extent 6215 (ATLAS Z). Interestingly, the Run II data generally show good agreement across the full y range.

The negative rapidity region corresponds to the large x region of the lead PDF. The large x region is already rather well constrained by the fixed-target measurements, so there are limits as to how much the new LHC data can shift the PDFs in this region. Also note that in the large x region we are in the “anti-shadowing region” (\(x\sim 0.1 \)) where the nuclear corrections typically enhance the nuclear PDF relative to the proton. Thus, not including the nuclear corrections in this region would increase the discrepancy.

Small x region

In the large rapidity (small x) region, we generally find good agreement between our new fits and the data. This is in striking contrast to the nCTEQ15 PDF, which lies well below many of the data points at large y; this behavior is clearly evident, for example, in 6215 (ATLAS Z), and 6213, 6232, 6233, 6234 (CMS \(W^\pm \) Run I and Run II). Clearly, the new LHC \(W^\pm /Z\) data provide important new PDF constraints in this kinematic region that were not available in the nCTEQ15 analysis.

As larger rapidity corresponds to smaller x values, this puts us in the “shadowing region” (\(x\lesssim 0.1 \)) where the nuclear PDFs are generally expected to be suppressed relative to the proton. If the nuclear shadowing correction were reduced in this region, that would bring the theory closer in line with the data without the need for large normalization factors. The precise value of the nuclear corrections is still an open question; for example, Refs. [42, 43, 68] found that the shadowing correction for the \(\nu N\) charged-current neutrino DIS was reduced as compared to the \(\ell ^\pm N\) neutral-current DIS. If such an adjustment were applied to the LHC \(W^\pm /Z\) data, it would move the theory closer to the data and reduce the normalization factor. Disentangling the nuclear effects from the underlying parton flavor components is intricate, and a reanalysis of the neutrino DIS data is currently in progress [69].
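The correspondence between rapidity and the momentum fraction probed in the lead nucleus can be estimated with a short sketch. It assumes leading-order \(2\rightarrow 1\) kinematics, \(x_{Pb} = (M/\sqrt{s_{NN}})\,e^{-y}\), with y measured in the nucleon-nucleon center-of-mass frame and positive y taken along the proton beam direction. The value \(\sqrt{s_{NN}}=5.02\) TeV (LHC Run I pPb) and the W mass of 80.4 GeV are inputs assumed here for illustration; the function name is hypothetical.

```python
import math

def x_lead(mass_gev, sqrt_s_nn_gev, y):
    """LO estimate of the parton momentum fraction in the lead nucleon.

    Assumes x_Pb = (M / sqrt(s_NN)) * exp(-y), with y > 0 along the proton
    direction (a convention assumed for this sketch).
    """
    return mass_gev / sqrt_s_nn_gev * math.exp(-y)

M_W = 80.4          # GeV, approximate W boson mass
SQRT_S_NN = 5020.0  # GeV, assumed Run I pPb energy per nucleon pair

for y in (-2.0, 0.0, 2.0):
    print(f"y = {y:+.1f}  ->  x_Pb ~ {x_lead(M_W, SQRT_S_NN, y):.4f}")
```

Under these assumptions, negative rapidities of about two units reach \(x_{Pb}\gtrsim 0.1\), while forward rapidities probe \(x_{Pb}\) of a few times \(10^{-3}\), in line with the large x and small x regions discussed above.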

The PDFs

Finally, we make a detailed examination of the underlying flavor PDFs from these various fits. In Figs. 7, 8 and 9 we display the nPDFs for a full lead nucleus at three separate scales. The lowest scale (\(Q=2\,\hbox {GeV}\)) is close to our initial evolution scale of \(Q_0=1.3\,\hbox {GeV}\), the largest scale (\(Q=90\,\hbox {GeV}\)) is in the range relevant for \(W^\pm /Z\) production, and the intermediate scale (\(Q=10\,\hbox {GeV}\)) helps illustrate the effects of the DGLAP evolution.

Fig. 7

The full lead (Pb) PDFs for \(Q=2\,\hbox {GeV}\). The uncertainty band for nCTEQ15 is shown in gray, and for Norm3 in blue. The increase of the Norm0 set is evident for the strange and gluon PDFs in the region of \(x\sim 0.03\)

Fig. 8

The full lead (Pb) PDFs for \(Q=10\,\hbox {GeV}\). The uncertainty band for nCTEQ15 is shown in gray, and for Norm3 in blue. The increase in the Norm0 set for the strange and gluon PDFs is reduced, compared to the lower Q result, and shifted to smaller \(x\sim 0.02\) values

Fig. 9

The full lead (Pb) PDFs for \(Q=90\,\hbox {GeV}\). The uncertainty band for nCTEQ15 is shown in gray, and for Norm3 in blue

Fig. 10

Ratio of strange and gluon nPDFs compared to the corresponding nCTEQ15 nPDFs for \(Q=2\) GeV (upper row) and \(Q=90\) GeV (lower row). The uncertainty band for nCTEQ15 is shown in gray, and for Norm3 in blue

We choose to display the full lead nPDF as this is the physical quantity which enters the calculation (Footnote 6). This is computed using:

$$\begin{aligned} f_i^{(A,Z)} (x,Q) = \frac{Z}{A} f_i^{p/A} (x,Q)+ \frac{A-Z}{A} f_i^{n/A} (x,Q) , \end{aligned}$$

and we assume isospin symmetry to derive the neutron PDF.
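The construction above can be sketched in a few lines of Python. Here isospin symmetry is implemented by exchanging \(u \leftrightarrow d\) and \({\bar{u}} \leftrightarrow {\bar{d}}\) to obtain the bound-neutron PDFs from the bound-proton ones, with all other flavors unchanged. The function name, the flavor labels, and the input PDF values at a fixed \((x,Q)\) are hypothetical placeholders, not nCTEQ results.

```python
def full_nucleus_pdf(f_p, A, Z):
    """Per-nucleon nuclear PDF: (Z/A) * f^{p/A} + ((A - Z)/A) * f^{n/A}.

    f_p maps flavor labels to bound-proton PDF values at a fixed (x, Q).
    The bound-neutron PDFs follow from isospin symmetry (u <-> d, ubar <-> dbar).
    """
    swap = {"u": "d", "d": "u", "ubar": "dbar", "dbar": "ubar"}
    f_n = {fl: f_p[swap.get(fl, fl)] for fl in f_p}
    return {fl: (Z * f_p[fl] + (A - Z) * f_n[fl]) / A for fl in f_p}

# Illustrative bound-proton values at some fixed (x, Q); not real fit output.
f_proton = {"u": 2.0, "d": 1.0, "ubar": 0.40, "dbar": 0.50, "s": 0.30, "g": 5.0}

f_lead = full_nucleus_pdf(f_proton, A=208, Z=82)
for flavor, value in f_lead.items():
    print(f"{flavor:5s} {value:.4f}")
```

Because lead has more neutrons than protons, the full-nucleus d distribution comes out larger than u in this toy example, while isospin-blind flavors such as the strange and gluon are unchanged by the averaging.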

Strange and gluon nPDFs

Examining the curves for up and down distributions, we see there is minimal variation between different fits as these flavors are strongly constrained by other data. Interestingly, we also see that the small x uncertainty is reduced at higher scales (see Figs. 8 and 9). We observe a slight modification in the \({\bar{u}}\) and \({\bar{d}}\) distributions as these are closely linked to the gluon and strange distributions which we will discuss in the following.

Turning to the gluon and strange PDFs, we see significant differences. In particular, the fits seem to prefer a larger value for both the gluon and strange PDFs at intermediate x values, which is the region relevant for the LHC heavy ion \(W^\pm /Z\) production. We discuss these fits in turn.

Norm0: Examining the Norm0 fit for \(Q=2\,\hbox {GeV}\) (Fig. 7), we see a distinct excess in the strange and gluon PDFs in the region \(x\sim 0.03\); this is also evident in Fig. 10 where we have plotted the ratio relative to the nCTEQ15 values. At \(Q=2\,\hbox {GeV}\), the peaks of the gluon and strange distributions are located at approximately \(x\sim 0.03\); via the DGLAP evolution these peaks shift down (Footnote 7) to the region \(x\sim 0.017\) for \(Q=90\,\hbox {GeV}\), consistent with the expectation for the central x value of \(\sim M_{W,Z}/\sqrt{s}\).

Recall that the Norm0 fit does not allow any normalization adjustment in the fit. Since the data consistently lie above the theoretical predictions, it appears that the Norm0 fit is exploiting the uncertainty of gluon and strange PDFs to try and pull up the theoretical predictions in line with the data by increasing the PDFs in the relevant x region. Additionally, we observe a similar (but less pronounced) behavior in the \({\bar{u}}\) and \({\bar{d}}\) distributions.

As momentum must be conserved, we see the Norm0 strange PDF dips below nCTEQ15 at both high and low x values, while the gluon is below nCTEQ15 at higher x values. Part of the reason the deformation of the gluon and strange PDFs is so large at \(Q=2\,\hbox {GeV}\) is to compensate for the DGLAP evolution which will tend to diffuse the excess in the gluon and strange distributions at the \(Q=90\,\hbox {GeV}\) scale, cf., Fig. 9.

Norm2 and Norm3: In contrast to the Norm0 result above, the Norm2 and Norm3 fits allow us to investigate the effect of including the normalization parameters into the fit; this is crucial in reducing the \(\chi ^2/N_{dof}\) for the LHC heavy ion data. The effect on the resulting nPDFs is evident as shown in Fig. 7 where we see that the excess in both the strange and gluon is systematically reduced as we introduce normalization parameters.

In Fig. 7 we also observe the greatly increased error band on the Norm3 strange PDF as compared to nCTEQ15. This counter-intuitive result is due to the additional fitting parameters for the strange quark included in the Norm3 analysis. The nCTEQ15 fit contained minimal data which was sensitive to the strange quark; therefore, it imposed the condition \({s \sim } \kappa {({\bar{u}}+{\bar{d}})/2}\) with the boundary condition of \(\kappa =0.5\) for \(A=1\). Consequently, the nCTEQ15 error bands in Fig. 7 reflect only the uncertainty of \(({\bar{u}}+{\bar{d}})\), which is comparatively well determined. The phenomenon of increasing error bars has been observed in other examples such as the transition from CTEQ6.1 to CTEQ6.6 when additional strange parameters were introduced, or the transition from EPS09 to EPPS16 when additional gluon parameters were introduced (Footnote 8).

To highlight the magnitude of these differences, in Fig. 10 we plot the ratios of the PDFs compared to nCTEQ15. At \(Q=2\,\hbox {GeV}\), we see that the Norm0 gluon is nearly a factor of 2 times the nCTEQ15 value, with a peak at \(x\sim 0.03\). The Norm2 and Norm3 gluon PDFs are reduced to \({\sim 60\%}\) and \({\sim 40\%}\) above nCTEQ15, respectively (Footnote 9). Similarly, at \({x\sim 0.03}\) and \(Q=2\,\hbox {GeV}\), the strange PDFs for both Norm0 and Norm2 are \({\sim 60\%}\) above the nCTEQ15 value, while the Norm3 result is reduced to \(\sim 25\%\).

In Fig. 10 we also display the ratio at \(Q=90\,\hbox {GeV}\), which illustrates the effect of the DGLAP evolution. We see that the gluon is now reduced to \(\sim 15\%\) above the nCTEQ15 value, the strange is reduced to \(\sim 25\%\) above the nCTEQ15 value, and both peaks have shifted to lower x values.

Because the DGLAP evolution has “washed out” the detailed peak structure at low Q values, it is necessary for the fit to amplify the distortion at low Q so that a remnant of the effect survives at high Q. Nevertheless, the remaining excess at \(Q=90\,\hbox {GeV}\) is sufficient to improve the \(\chi ^2\) of the fits.

Additionally, we note that the heavy-flavor reweighting analysis of Ref. [70] also observed an increase of the gluon nPDF in the intermediate to small x region relative to the nCTEQ15 results. While the shift of the PDF in the reweighting was in the same direction as in this analysis, its magnitude was much smaller.

We now turn our attention to the error band of the gluon distribution in Fig. 10. The gluon first enters \(W^{\pm }\) and Z boson production at NLO, through the gq-initiated contributions. One might therefore naively expect that adding the \(W^{\pm }/Z\) LHC data to the fit would not add significant constraining power for the gluon distribution. Contrary to this naive expectation, due to the high center-of-mass energy and the relatively small values of the probed x, the gluon distribution can make a considerable contribution to \(W^{\pm }/Z\) production processes; this is reflected in the reduced error bands of Norm3 as compared with nCTEQ15. Indeed, an independent variation of the open gluon parameters around the minimum in the Norm3 fit confirms that the \(\chi ^2\) contribution from the LHC data is similarly steep or steeper than the contribution from all the other data included in the fit (Footnote 10).

Fig. 11

Comparison of the full lead (Pb) PDFs at \(Q=2\,\hbox {GeV}\) for nCTEQ15WZ and Norm3 fits. The uncertainty band for nCTEQ15WZ is shown in purple, Norm3 in blue. This shows that the central values of these fits are essentially identical, and the error bands are also virtually identical with the exception of small differences in the strange quark PDF


Comparison with other nPDFs

Having investigated the impact of the \(W^\pm /Z\) heavy ion data including normalization effects, we now compare our PDFs with other results from the literature.

Fig. 12

Comparison of the full lead (Pb) PDFs at \(Q=2\,\hbox {GeV}\) for nCTEQ15, EPPS16, nNNPDF2.0 and nCTEQ15WZ. The uncertainty band for nCTEQ15 is shown in gray, nCTEQ15WZ in violet, nNNPDF2.0 in yellow and EPPS16 in green

Fig. 13

Comparison of the full lead (Pb) PDFs at \(Q=90\,\hbox {GeV}\) for nCTEQ15, EPPS16, nNNPDF2.0 and nCTEQ15WZ. The uncertainty band for nCTEQ15 is shown in gray, nCTEQ15WZ in violet, nNNPDF2.0 in yellow and EPPS16 in green

There are a number of nPDF sets available [67, 72, 73] including some new determinations [7, 8, 74]. The TUJU19 analysis [74] extends the xFitter framework to include nuclear PDFs; this open-source program provides a valuable tool for the PDF community. As an initial step, TUJU19 assumed \(s={\bar{s}}\) and \(s={\bar{u}}={\bar{d}}\), and the resulting nPDFs compare favorably with EPPS16 and nCTEQ15 within uncertainties.

A separate effort by the NNPDF collaboration [7, 8] uses neural network techniques to extract the gluon and quark nPDFs; this method provides a complementary approach to the traditional parameterized function-based method. Their recent analysis [8] has produced the nNNPDF2.0 nPDF set, which includes charged current DIS data from NuTeV (Fe) and Chorus (Pb) as well as LHC \(W^\pm /Z\) data. They also compute the strangeness ratio, \(R_s=(s+{\bar{s}})/({\bar{u}}+{\bar{d}})\), and find that the nuclear value is reduced compared to the proton. The neutrino DIS data and LHC \(W+c\) associated production seem to prefer a lower \(R_s\) value, while inclusive W and Z production favors a larger value. These interesting observations raise some important issues, and additional investigation is warranted to better understand the strange distribution [9].

The EPPS16 data sets include DIS, DY, RHIC inclusive pion, and LHC \(W^\pm /Z\) and dijet data; in particular, this set incorporates a number of parameters to provide flexibility in both the strange and gluon PDFs. Therefore, it will be interesting to compare the variation of these flavors between our original nCTEQ15 nPDFs and our nCTEQ15WZ fit.

The nCTEQ15WZ fit is based on the Norm3 fit (with 3 normalization parameters), and in addition includes the RHIC pion data in the fitting loop. The RHIC pion data are fit with the Binnewies–Kniehl–Kramer (BKK) fragmentation functions [75] using a custom gridding technique for fast evaluation [1]. The resulting nCTEQ15WZ nPDFs are nearly identical to the Norm3 nPDFs, as is evident when comparing the \(\chi ^2/N_{dof}\) values of Table 3, as well as the PDFs in Fig. 11.

We now compare the results of our nCTEQ15WZ fit with the nCTEQ15, EPPS16, and nNNPDF2.0 nPDFs in Figs. 12 and 13. To begin, we focus on the plots at \(Q=2\,\hbox {GeV}\) as the variations are more evident here. For the up and down components \(\{u, d, {\bar{u}}, {\bar{d}}\}\), nCTEQ15WZ is quite similar to nCTEQ15, and these flavors generally lie below EPPS16 and nNNPDF2.0, but are within uncertainties. As discussed in Sect. 3.3.1, we recall that the nCTEQ15 error bands on the strange PDF are underestimated due to restriction of the parameters. For the strange and gluon, we see that nCTEQ15 and EPPS16 are generally similar for larger x values, and then diverge somewhat for small x. The nCTEQ15WZ nPDFs lie below nCTEQ15 and EPPS16 for large x values, and then above at intermediate to small x values; this allows s(x) and g(x) to increase the \(W^\pm /Z\) cross section in the region of the data (\(x\sim 0.02\)) while not perturbing the momentum sum rules. nNNPDF2.0 is similar to nCTEQ15 and EPPS16 for large x values, but then increases for smaller x. For the strange distribution, nNNPDF2.0 coincides with nCTEQ15WZ at small x, while for the gluon, nNNPDF2.0 exceeds nCTEQ15WZ at small x. Similar effects to the above are generally evident at larger Q values (Fig. 13), but their magnitude is diminished due to the DGLAP evolution effects.
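For reference, the momentum sum rule invoked above can be written as follows. The notation is an assumption made here for illustration (bound-proton PDFs \(f_i^{p/A}\), with the region \(x>1\) neglected) and may differ from the conventions used elsewhere in this paper:

\[ \int _0^1 dx\, x \left[ g^{p/A}(x,Q) + \sum _{q} \left( q^{p/A}(x,Q) + {\bar{q}}^{p/A}(x,Q) \right) \right] = 1 . \]

Because the integrand is weighted by x, an enhancement of s(x) and g(x) at small x carries relatively little momentum, so it can be compensated by a modest reduction at large x. This is consistent with the pattern described above, where nCTEQ15WZ lies below nCTEQ15 at large x and above it at intermediate to small x.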

Fig. 14

(a) Recent (preliminary) result from ATLAS on the strange ratio for the proton [31]. (b) The nuclear strange ratio for lead (Pb) nPDFs as obtained in our fits. The uncertainty band for nCTEQ15 is shown in gray, and Norm3 in blue

Comparison with proton results

The strange quark PDF has also been studied extensively for the proton case by many groups including ABM [13], CT18 [4], JAM [15], MMHT [16, 17], and NNPDF [18]. There is a close connection between the proton and nuclear PDFs; for example, nCTEQ15 uses the proton PDF as a boundary condition, and EPPS16 fits nuclear ratios relative to the proton.

One quantity of interest we can compare between the proton and the nuclear PDFs is the ratio of the strange PDF relative to the light-sea quarks: \(R_s={(s+{\bar{s}})/({\bar{u}}+{\bar{d}})}\). A very recent analysis of the proton strange PDF was presented in Ref. [76] which includes the LHC inclusive \(W^\pm /Z\) production and associated \(W+c\) channel, as well as neutrino DIS data from NuTeV and NOMAD; this study obtains \(R_s=0.78\pm 0.20\). In Fig. 14, we compute \(R_s\) for selected Q values, and compare this to the proton result as extracted by ATLAS [31, 77].

Comparing the proton and the lead results at \(Q^2=1.9~\mathrm{GeV}^2\), we see that the behavior of the Norm3 curve (panel b) is quite similar to the proton result (panel a). In contrast, the nCTEQ15 result is generally flat across all x values because the strange was set to be a fixed fraction of the u/d-sea PDFs, \(s={\bar{s}}=\kappa ({\bar{u}}+{\bar{d}})/2\). We also display the other fits, Norm0 and Norm2, to illustrate the range of possible variations. The uncertainty bands for Norm3 are displayed; these are large at small x, where the strange is poorly constrained, and also at very large x, where the quark sea denominator vanishes. Finally, we display larger Q values, which illustrate the convergent effects of the DGLAP evolution.
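The flatness of the nCTEQ15 ratio follows directly from the constraint \(s={\bar{s}}=\kappa ({\bar{u}}+{\bar{d}})/2\), which gives \(R_s=\kappa \) at the input scale for every x. A minimal sketch of this check is given below. The sea-quark shapes `toy_ubar` and `toy_dbar` and the value of `KAPPA` are hypothetical choices for illustration only; they are not fitted PDFs and are not taken from nCTEQ15.

```python
import math


def strange_ratio(s, sbar, ubar, dbar):
    """Strangeness ratio R_s = (s + sbar) / (ubar + dbar) at a single x."""
    return (s + sbar) / (ubar + dbar)


# Hypothetical light-sea shapes (illustrative only, not fitted PDFs).
def toy_ubar(x):
    return 0.20 * x ** (-0.2) * (1.0 - x) ** 7


def toy_dbar(x):
    return 0.25 * x ** (-0.2) * (1.0 - x) ** 8


KAPPA = 0.5  # hypothetical strange fraction, chosen only for this example


def toy_strange(x):
    """Strange PDF tied to the light sea: s = sbar = kappa * (ubar + dbar) / 2."""
    return KAPPA * (toy_ubar(x) + toy_dbar(x)) / 2.0


# Logarithmic grid from x = 1e-4 up to roughly x = 0.5.
xs = [10 ** (-4.0 + 3.7 * i / 49) for i in range(50)]
ratios = [
    strange_ratio(toy_strange(x), toy_strange(x), toy_ubar(x), toy_dbar(x))
    for x in xs
]

# With the constraint imposed, the ratio equals KAPPA at every grid point.
print(all(math.isclose(r, KAPPA, rel_tol=1e-12) for r in ratios))
```

Once the strange parameters are freed, as in the Norm fits, s(x) is no longer proportional to the light sea and \(R_s\) acquires an x dependence.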

In the previous section we raised the question as to whether the enhanced strange distribution was reflecting the true underlying physics, or was instead an artifact of the fit. The similarities of \(R_s\) between the proton and lead PDFs may indicate that the enhanced strange PDF is, in fact, a real effect. To definitively answer this question will require additional analysis, and this work is ongoing.


Our ability to fully characterize fundamental observables, like the Higgs boson couplings and the W boson mass, and to constrain both SM and BSM signatures is strongly limited by how accurately we determine the underlying PDFs [78]. A precise determination of the strange PDF is an important step in advancing these measurements.

The new nCTEQ++ framework allowed us to include the LHC W/Z data directly in the fit. While these new fits significantly reduced the overall \(\chi ^2\) for the W/Z LHC data, we still observe tensions in individual data sets which require further investigation. Our analysis has identified factors which might further reduce the apparent discrepancies including: increasing the strange PDF, modifying the nuclear correction, and adjusting the data normalization.

Compared to the nCTEQ15 PDFs, these new fits favor an increased strange and gluon distribution in the x region relevant for heavy ion \(W^\pm /Z\) production. While we obtain a good fit in terms of the overall \(\chi ^2\) values, we must ask: (i) how the uncertainties and data normalization affect the resulting PDFs, and (ii) whether the results truly reflect the underlying physics, or whether the fit is simply exploiting s(x) because it is one of the least constrained flavors. Answering these important questions will require additional study; this is currently under investigation.

The LHAPDF files of the resulting nCTEQ15WZ nPDFs will be made available at the nCTEQ website which is hosted at