An analysis of the impact of LHC Run I proton–lead data on nuclear parton densities

We report on an analysis of the impact of available experimental data on hard processes in proton–lead collisions during Run I at the large hadron collider on nuclear modifications of parton distribution functions. Our analysis is restricted to the EPS09 and DSSZ global fits. The measurements that we consider comprise production of massive gauge bosons, jets, charged hadrons and pions. This is the first time a study of nuclear PDFs includes this number of different observables. The goal of the paper is twofold: (i) checking the description of the data by nPDFs, as well as the relevance of these nuclear effects, in a quantitative manner; (ii) testing the constraining power of these data in eventual global fits, for which we use the Bayesian reweighting technique. We find an overall good, even too good, description of the data, indicating that more constraining power would require a better control over the systematic uncertainties and/or the proper proton–proton reference from LHC Run II. Some of the observables, however, show sizeable tension with specific choices of proton and nuclear PDFs. We also comment on the corresponding improvements as regards the theoretical treatment.


E-mail addresses: nestor.armesto@usc.es, hannu.paukkunen@jyu.fi, jmanpen@gmail.com, carlos.salgado@usc.es, pia.zurita@usc.es

Introduction
The main physics motivations [1] for the proton-lead (p-Pb) collisions at the large hadron collider (LHC) were to obtain a reliable baseline for the heavy-ion measurements and to shed light on the partonic behaviour of the nucleus, particularly at small values of momentum fraction x. As such, this program constitutes a logical continuation of the deuteron-gold (d-Au) experiments at the relativistic heavy-ion collider (RHIC) but at significantly higher energy. The p-Pb data have, however, proved richer than initially pictured and also entailed genuine surprises (see the review [2]).
One of the key factors in interpreting the p-Pb data is the set of nuclear parton distribution functions (nPDFs) [3,4]. More than three decades have passed since large nuclear effects in deeply inelastic scattering were unexpectedly found (for a review, see Ref. [5]); these effects were later shown to be factorisable into the PDFs [6]. However, the amount and variety of experimental data that go into the global determinations of nPDFs has been very limited, and the universality of the nPDFs has largely remained a conjecture, although no clear violation has been found to date. The new experimental data from the LHC p-Pb run give a novel opportunity to further check these ideas and also provide new constraints. The aim of this paper is, on the one hand, to chart the importance of nPDFs in describing the data (both globally and separately for individual data sets) and, on the other hand, to estimate the quantitative constraints that these data render. The latter question would have traditionally required a complete reanalysis adding the new data on top of the old ones. Luckily, faster methods, collectively known as reweighting techniques, have been developed [7–13].
In a preceding work [14], a specific version [10] of the Bayesian reweighting technique was employed to survey the potential impact of the p-Pb program on nPDFs by using pseudodata. However, at that point the reweighting method used was not yet completely understood, and some caution has to be exercised regarding those results. Along with the developments of Ref. [13], we can now apply the Bayesian reweighting more reliably. Also, instead of pseudodata we can now use the available p-Pb measurements. We will perform the analysis with two different sets of nPDFs (EPS09 [15] and DSSZ [16]) and, in order to control the bias coming from choosing a specific free-proton reference set, we will consider two sets of proton PDFs (MSTW2008 [17] and CT10 [18]). The procedure is completely general and can be applied to any process with at least one hadron or nucleus in the initial configuration. In particular, it is feasible to apply it to nucleus-nucleus collisions, at least for those observables that are expected to be free from effects other than the nuclear modification of PDFs. In the present situation, data on EW bosons could be used. However, in contrast to p-Pb collisions, the precision of these data is inferior, the constraining power is smaller by construction in the symmetric Pb-Pb case [19], and a higher computational load would be involved. For these reasons we do not include Pb-Pb data in the present analysis.
The paper is organised as follows: in Sect. 2 we briefly explain the Bayesian reweighting, devoting Sect. 3 to the observables included in the present analysis. In Sect. 4 we show the impact of the data on the nPDFs, and we discuss similarities and differences between the four possible PDF-nPDF combinations. Finally, in Sect. 5 we summarise our findings.

The Bayesian reweighting method
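The Bayesian reweighting technique [7–13] is a tool to quantitatively determine the implications of new data within a set of PDFs. In this approach, the probability distribution $P_{\rm old}(f)$ of an existing PDF set is represented by an ensemble of PDF replicas $f_k$, $k = 1, \ldots, N_{\rm rep}$, and the expectation value $\langle O \rangle$ and variance $\delta \langle O \rangle$ for an observable $O$ can be computed as

$$\langle O \rangle = \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} O[f_k], \qquad \delta \langle O \rangle = \sqrt{\frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} \left( O[f_k] - \langle O \rangle \right)^2}.$$

Additional information from a new set of data $y \equiv \{y_i, i = 1, \ldots, N_{\rm data}\}$ can now be incorporated, by the Bayes theorem, as

$$P_{\rm new}(f) \propto P(y|f)\, P_{\rm old}(f),$$

where $P(y|f)$ stands for the conditional probability for the new data, for a given set of PDFs. It follows that the average value for any observable depending on the PDFs becomes a weighted average:

$$\langle O \rangle_{\rm new} = \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} \omega_k\, O[f_k], \tag{4}$$

where the weights $\omega_k$ are proportional to the likelihood function $P(y|f)$. For PDF sets with uncertainties based on the Hessian method (with $N_{\rm eig}$ eigenvalues resulting in $2N_{\rm eig} + 1$ members) and fixed tolerance $\Delta\chi^2$ (which is the case in the present study), the functional form of the likelihood function that corresponds to a refit [13] is

$$\omega_k = \frac{\exp\left[ -\chi_k^2 / (2\Delta\chi^2) \right]}{\frac{1}{N_{\rm rep}} \sum_{k'=1}^{N_{\rm rep}} \exp\left[ -\chi_{k'}^2 / (2\Delta\chi^2) \right]},$$

where

$$\chi_k^2 = \sum_{i,j=1}^{N_{\rm data}} \left( y_i[f_k] - y_i \right) C^{-1}_{ij} \left( y_j[f_k] - y_j \right),$$

with $C$ the covariance matrix of the new data $y$ and $y_i[f_k]$ the theoretical values computed with the replica $f_k$. The ensemble of PDFs required by this approach is defined by

$$f_k \equiv f_{S_0} + \sum_{i=1}^{N_{\rm eig}} \frac{f_{S_i^+} - f_{S_i^-}}{2}\, R_{ik}, \tag{9}$$

where $f_{S_0}$ is the central fit, and $f_{S_i^\pm}$ are the $i$th error sets. The coefficients $R_{ik}$ are random numbers selected from a Gaussian distribution centred at zero and with variance one. After the reweighting, the values of $\chi^2$ are evaluated as

$$\chi^2_{\rm post} = \sum_{i,j=1}^{N_{\rm data}} \left( \langle y_i \rangle_{\rm new} - y_i \right) C^{-1}_{ij} \left( \langle y_j \rangle_{\rm new} - y_j \right),$$

where $\langle y_i \rangle_{\rm new}$ are computed as in Eq. (4). An additional quantity in the Bayesian method is the effective number of replicas $N_{\rm eff}$, a useful indicator defined as

$$N_{\rm eff} \equiv \exp\left\{ \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} \omega_k \ln\left( N_{\rm rep} / \omega_k \right) \right\}.$$

Having $N_{\rm eff} \ll N_{\rm rep}$ indicates that some of the replicas are doing a significantly better job in describing the data than others, and that the method becomes inefficient. In this case a very large number of replicas may be needed to obtain a converging result. In this work we have taken $N_{\rm rep} = 10^4$.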
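As a rough illustration of the reweighting step, the following Python sketch turns a list of per-replica $\chi^2_k$ values into weights and evaluates the weighted average and the effective number of replicas. It assumes weights of the form $\omega_k \propto \exp[-\chi^2_k/(2\Delta\chi^2)]$, normalised so that they average to one, and $N_{\rm eff} = \exp[(1/N_{\rm rep}) \sum_k \omega_k \ln(N_{\rm rep}/\omega_k)]$. The function names, the toy $\chi^2$ values and the tolerance used in the usage lines are hypothetical and are not taken from the analysis.

```python
import numpy as np


def reweighting_weights(chi2, delta_chi2):
    """Weights w_k proportional to exp(-chi2_k / (2 * delta_chi2)), with mean one."""
    chi2 = np.asarray(chi2, dtype=float)
    # Shifting by the minimum leaves the normalised weights unchanged
    # but prevents all exponentials from underflowing at once.
    unnormalised = np.exp(-(chi2 - chi2.min()) / (2.0 * delta_chi2))
    return unnormalised / unnormalised.mean()


def weighted_average(observable, weights):
    """Reweighted expectation value (1/N_rep) * sum_k w_k * O[f_k]."""
    observable = np.asarray(observable, dtype=float)
    return float(np.mean(np.asarray(weights, dtype=float) * observable))


def effective_replicas(weights):
    """N_eff = exp[(1/N_rep) * sum_k w_k * ln(N_rep / w_k)]."""
    weights = np.asarray(weights, dtype=float)
    n_rep = weights.size
    nonzero = weights > 0.0  # a vanishing weight contributes nothing to the sum
    entropy = np.sum(weights[nonzero] * np.log(n_rep / weights[nonzero])) / n_rep
    return float(np.exp(entropy))


# Toy usage with made-up chi2 values for four replicas (hypothetical numbers).
toy_chi2 = [10.0, 12.0, 30.0, 55.0]
toy_weights = reweighting_weights(toy_chi2, delta_chi2=50.0)
toy_n_eff = effective_replicas(toy_weights)
```

When all replicas describe the data equally well, every weight equals one and the effective number of replicas equals the number of replicas; when a single replica dominates, it tends to one.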

Bayesian reweighting in the linear case
The reweighting procedure begins by first generating the replicas $f_k$ by Eq. (9), which are then used to compute the observables required to evaluate the values of $\chi^2_k$ that determine the weights. In general, this involves looping the computational codes over the $N_{\rm rep}$ replicas, which can render the work quite CPU-time consuming. There is, however, a way to reduce the required time if the PDFs that we are interested in enter the computation linearly. Let us exemplify this with the process p + Pb → $O$. The cross section corresponding to the $k$th replica can be schematically written as

$$d\sigma_k = f^{\rm p} \otimes d\hat{\sigma} \otimes f_k^{\rm Pb},$$

where $\otimes$ denotes in aggregate the kinematic integrations and summations over the partonic species. If now we replace $f_k^{\rm Pb}$ by Eq. (9), we have

$$d\sigma_k = f^{\rm p} \otimes d\hat{\sigma} \otimes \left[ f^{\rm Pb}_{S_0} + \sum_{i=1}^{N_{\rm eig}} \frac{f^{\rm Pb}_{S_i^+} - f^{\rm Pb}_{S_i^-}}{2}\, R_{ik} \right],$$

which can be written as

$$d\sigma_k = d\sigma_{S_0} + \sum_{i=1}^{N_{\rm eig}} \frac{d\sigma_{S_i^+} - d\sigma_{S_i^-}}{2}\, R_{ik},$$

where $d\sigma_{S_0}$ is the cross section obtained with the central set, and $d\sigma_{S_i^\pm}$ are the cross sections evaluated with the error sets. In this way, only $2N_{\rm eig} + 1$ (31 for EPS09, 51 for DSSZ) cross-section evaluations are required (instead of $N_{\rm rep}$).
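The saving in the linear case can be sketched in a few lines of Python. The sketch assumes that the replicas are built as the central set plus Gaussian-weighted half-differences of the error sets, so that the same linear combination can be applied directly to the cross sections computed once with each of the $2N_{\rm eig} + 1$ members. The function name, the array layout and the toy numbers are hypothetical.

```python
import numpy as np


def replica_cross_sections(sigma_central, sigma_plus, sigma_minus, n_rep, seed=0):
    """Build replica predictions from the 2*N_eig + 1 Hessian evaluations.

    sigma_central: shape (n_bins,), prediction with the central set
    sigma_plus, sigma_minus: shape (n_eig, n_bins), predictions with the error sets
    Returns (replicas, R) with shapes (n_rep, n_bins) and (n_rep, n_eig).
    """
    sigma_central = np.asarray(sigma_central, dtype=float)
    half_difference = 0.5 * (np.asarray(sigma_plus, dtype=float)
                             - np.asarray(sigma_minus, dtype=float))
    rng = np.random.default_rng(seed)
    # R[k, i]: Gaussian random numbers with zero mean and unit variance.
    R = rng.standard_normal((n_rep, half_difference.shape[0]))
    # sigma_k = sigma_S0 + sum_i R_ik * (sigma_Si+ - sigma_Si-) / 2
    return sigma_central + R @ half_difference, R


# Toy usage: two bins, two eigenvector directions (made-up numbers).
central = np.array([1.0, 2.0])
plus = np.array([[1.1, 2.0], [1.0, 2.3]])
minus = np.array([[0.9, 2.0], [1.0, 1.9]])
replicas, R = replica_cross_sections(central, plus, minus, n_rep=10_000)
```

No call to the cross-section code appears inside the replica loop: the expensive evaluations enter only through the three input arrays, and the spread of the resulting replicas reproduces the symmetric Hessian uncertainty up to statistical fluctuations.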

Comparison with the experimental data
All the data used in this work (165 points in total) were obtained at the LHC during Run I, in p-Pb collisions at a centre-of-mass energy $\sqrt{s} = 5.02$ TeV per nucleon: W from ALICE and CMS, Z from ATLAS and CMS, jets from ATLAS, dijets from CMS, charged hadrons from ALICE and CMS, and pions from ALICE. Some of them are published as absolute distributions and some as ratios. We refrain from directly using the absolute distributions as they are typically more sensitive to the free-proton PDFs and not so much to the nuclear modifications. In ratios of cross sections, the dependence on the free-proton PDFs usually becomes suppressed. The ideal observable would be the nuclear modification $\sigma(\text{p-Pb})/\sigma(\text{p-p})$. However, no direct p-p measurement exists yet at the same centre-of-mass energy and such a reference is sometimes constructed by the experimental collaborations from their results at $\sqrt{s} = 2.76$ TeV and $\sqrt{s} = 7$ TeV. This brings forth a non-trivial normalisation issue and, with the intention of avoiding it, we decided to use (whenever possible) ratios between different rapidity windows instead. This situation is expected to be largely improved in the near future thanks to the reference p-p run at $\sqrt{s} = 5.02$ TeV from LHC Run II. We note that, apart from the luminosity, no information on the correlated systematic uncertainties is given by the experimental collaborations. Thus, when constructing ratios of cross sections, we had no other option than adding all the uncertainties in quadrature. In the (frequent) cases where the systematic uncertainties dominate, this amounts to overestimating the uncertainties, which is sometimes reflected in an absurdly small logarithmic likelihood, $\chi^2/N_{\rm data} \ll 1$. The fact that the information on the correlations is not available undermines the usefulness of the data to constrain the theory calculations.
This is a clear deficiency of the measurements and we call for publishing the information on the correlations, as is usually done in the case of p-p and p-$\bar{\rm p}$ collisions. It is also worth noting that we (almost) only use minimum-bias p-Pb data. While centrality-dependent data are also available, it is known that any attempt to classify event centrality results in imposing a non-trivial bias on the hard-process observable in question; see e.g. Ref. [20].
Note that not all PDF+nPDF combinations will be shown in the figures to limit the number of plots. Moreover, the post-reweighting results are not shown when they become visually indistinguishable from the original ones.
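The quadrature treatment of the uncertainties in ratios, as described above, can be sketched as follows. The Python function below propagates the uncertainties of a numerator and a denominator into their ratio, treating every source as uncorrelated; the function name and the example numbers are hypothetical and serve only to illustrate the procedure.

```python
import math


def ratio_with_quadrature_errors(numerator, denominator,
                                 numerator_errors, denominator_errors):
    """Ratio of two measurements with all uncertainty sources treated as uncorrelated.

    numerator_errors / denominator_errors: iterables of absolute uncertainties
    (e.g. statistical and systematic) attached to each measurement.
    """
    ratio = numerator / denominator
    rel_num = math.sqrt(sum(e * e for e in numerator_errors)) / abs(numerator)
    rel_den = math.sqrt(sum(e * e for e in denominator_errors)) / abs(denominator)
    # Relative uncertainties of numerator and denominator add in quadrature.
    return ratio, abs(ratio) * math.hypot(rel_num, rel_den)


# Hypothetical forward and backward measurements with (stat, syst) uncertainties.
a_fb, a_fb_error = ratio_with_quadrature_errors(80.0, 100.0, (4.0, 8.0), (5.0, 10.0))
```

If the systematic component were a fully correlated relative uncertainty common to both measurements, it would cancel in the ratio and only the statistical part would remain, which is why the quadrature sum tends to overestimate the uncertainty of the ratio.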

Charged electroweak bosons
Charged electroweak bosons ($W^+$ and $W^-$) decaying into leptons have been measured by the ALICE [21] and CMS [22] collaborations. The theoretical values were computed at next-to-leading order (NLO) accuracy using the Monte Carlo generator MCFM [24], fixing all the QCD scales to the mass of the boson.
The preliminary ALICE data include events with charged leptons having $p_T > 10$ GeV at forward ($2.03 < y_{\rm c.m.} < 3.53$) and backward ($-4.46 < y_{\rm c.m.} < -2.96$) rapidities in the nucleon-nucleon centre-of-mass (c.m.) frame. From these, we constructed "forward-to-backward" ratios $A_{\rm F/B}$ by dividing the measurement in the forward interval by that in the backward interval. A data-versus-theory comparison is presented in Fig. 1. While the theoretical predictions do agree with the experimental values, the experimental error bars are quite large.

Table 1 Contribution of the $W^\pm$ data to the total $\chi^2$ before the reweighting. The numbers in parentheses are the amount of data points considered for each experiment.

The CMS collaboration has measured laboratory-frame pseudorapidity ($\eta_{\rm lab}$) dependent differential cross sections in the range $|\eta_{\rm lab}| < 2.4$ with the transverse momentum of the measured leptons $p_T > 25$ GeV. The measured forward-to-backward ratios are compared to the theory computations in Fig. 2 and the $\chi^2$ values are given in Table 1 (the right-hand columns). While the $W^+$ data are roughly compatible with all the PDF combinations, the $W^-$ data show a clear preference for nuclear corrections as implemented in EPS09 and DSSZ. These measurements probe the nuclear PDFs approximately in the range $0.002 \lesssim x \lesssim 0.3$ (from the most forward to the most backward bin), and the nuclear effects in the forward-to-backward ratio result from the sea-quark shadowing (small $x$) being divided by the antishadowing in valence quarks. While the impact of these data looks somewhat limited here, they may be helpful for constraining the flavour separation of nuclear modifications. However, as both EPS09 and DSSZ assume flavour-independent sea and valence quark modifications at the parametrisation scale (i.e. the initial scale for DGLAP evolution), the present analysis cannot address to which extent this may happen.

Fig. 2 Forward-to-backward asymmetries for $W^+$ (upper panels) and $W^-$ (lower panels) measured by the CMS collaboration [22], as a function of the charged-lepton pseudorapidity in the laboratory frame. The left-hand (right-hand) graphs correspond to the theoretical calculations with EPS09 (DSSZ) nPDFs. Results with no nuclear effects are included as dashed lines.

Z boson production
The Z boson production in its dilepton decay channel has been measured by three collaborations: CMS [26], ATLAS [27] and LHCb [28]. As in the case of $W^\pm$, the theoretical values were computed using MCFM, with all scales fixed to the invariant mass of the lepton pair.
In the case of CMS, the kinematic cuts are similar to the ones applied for W bosons: the leptons are measured within $|\eta_{\rm lab}| < 2.4$ with a slightly lower minimum $p_T$ for both leptons ($p_T > 20$ GeV), and $60~{\rm GeV} < M_{l^+l^-} < 120~{\rm GeV}$. The $A_{\rm F/B}$ data are binned as a function of $y^{l^+l^-}_{\rm c.m.}$ (rapidity of the lepton pair). Figure 3 presents a comparison between the data and theory values before the reweighting (NNE stands for no nuclear modification of parton densities but includes isospin effects) and Table 2 (the right-hand column) lists the $\chi^2$ values. The data appear to slightly prefer the calculations which include nuclear modifications. Similarly to the case of W production, the use of nuclear PDFs leads to a suppression in $A_{\rm F/B}$. The rapid fall-off of $A_{\rm F/B}$ towards large $y^{l^+l^-}_{\rm c.m.}$ comes from the fact that the lepton pseudorapidity acceptance is not symmetric in the nucleon-nucleon c.m. frame. Indeed, the range $|\eta_{\rm lab}| < 2.4$ translates to $-2.865 < \eta_{\rm c.m.} < 1.935$ and, since there is less open phase space in the forward direction, the cross sections at a given $y^{l^+l^-}_{\rm c.m.}$ tend to be lower than those at $-y^{l^+l^-}_{\rm c.m.}$. This is clearly an unwanted feature, since it gives rise to higher theoretical uncertainties (which we ignore in the present study) than if a symmetric acceptance (e.g. $-1.935 < \eta_{\rm c.m.} < 1.935$) had been used.
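The translation of the laboratory acceptance into the c.m. frame quoted above can be checked with a few lines of Python. The sketch assumes that both beams have the same magnetic rigidity, so that the momentum per nucleon of the lead beam is $Z/A$ times that of the proton beam, and it treats the beam particles and the leptons as massless. In this approximation the nucleon-nucleon c.m. frame moves with rapidity $\frac{1}{2}\ln(A/Z)$ in the proton-going direction, which is taken as positive. The function names are illustrative.

```python
import math

A_PB, Z_PB = 208, 82  # mass number and charge of the lead nucleus


def cm_rapidity_shift(mass_number=A_PB, charge=Z_PB):
    """Rapidity of the nucleon-nucleon c.m. frame in the laboratory frame."""
    return 0.5 * math.log(mass_number / charge)


def lab_to_cm(eta_lab, shift):
    """Convert a laboratory (pseudo)rapidity to the c.m. frame (proton direction positive)."""
    return eta_lab - shift


shift = cm_rapidity_shift()
eta_cm_min, eta_cm_max = lab_to_cm(-2.4, shift), lab_to_cm(2.4, shift)
print(f"shift = {shift:.3f}, acceptance: {eta_cm_min:.3f} < eta_cm < {eta_cm_max:.3f}")
```

Under these assumptions the shift evaluates to about 0.465 units, which reproduces the interval $-2.865 < \eta_{\rm c.m.} < 1.935$ given in the text.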
The ATLAS data correspond to the full phase space of the daughter leptons within $66~{\rm GeV} < M_{l^+l^-} < 116~{\rm GeV}$ and $|y^{Z}_{\rm c.m.}| < 3.5$. The data are only available as absolute cross sections from which we have constructed the forward-to-backward ratio $A_{\rm F/B}$. A comparison between the theoretical predictions (with and without nuclear modifications) and the experimental values before the reweighting can be seen in Fig. 4 and the $\chi^2$ values are given in Table 2 (the left-hand column). The calculations including the nuclear modifications are now clearly preferred. For the larger phase space, $A_{\rm F/B}$ is now significantly closer to unity than in Fig. 3.

Jets and dijets
Jet and dijet distributions were computed at NLO [29][30][31] and compared with the results from the ATLAS [32] and CMS [33] collaborations, respectively. The factorisation and renormalisation scales were fixed to half the sum of the transverse energy of all two or three jets in the event. For ATLAS jets we used the anti-$k_T$ algorithm [34] with $R = 0.4$. For the CMS dijets we used the anti-$k_T$ algorithm with $R = 0.3$; only jets within the acceptance $|\eta_{\rm jet}| < 3$ were accepted, and the hardest (1) and next-to-hardest (2) jet within the acceptance had to fulfil the conditions $p_{T\,{\rm jet},1} > 120$ GeV/c, $p_{T\,{\rm jet},2} > 30$ GeV/c and their azimuthal distance $\Delta\phi_{12} > 2\pi/3$. The ATLAS collaboration measured jets with transverse momentum up to 1 TeV in eight rapidity bins. Strictly speaking, these data are not minimum bias as they comprise the events within the 0-90 % centrality class. It is therefore somewhat hazardous to include them in the present analysis but, for curiosity, we do so anyway. The ATLAS data are available as absolute yields from which we have constructed the forward-to-backward asymmetries, adding all the uncertainties in quadrature. Let us remark that, by proceeding this way, we lose the most forward $2.1 < y^* < 2.8$ and central $-0.3 < y^* < 0.3$ bins. The results before the reweighting are presented in Fig. 5 and Table 3 (left-hand column). For EPS09 the forward-to-backward ratio tends to stay below unity since at positive rapidities the spectrum gets suppressed (gluon shadowing) and enhanced at negative rapidities (gluon antishadowing). For DSSZ, the effects are milder. The data do not appear to show any systematic tendency from one rapidity bin to another, which could be due to the centrality trigger imposed. Indeed, the best $\chi^2$ is achieved with no nuclear effects at all, but all values of $\chi^2/N_{\rm data}$ are very low.

[...] (left) and DSSZ (right). Upper panels: $0.3 < |y^*| < 0.8$. Middle panels: $0.8 < |y^*| < 1.2$. Lower panels: $1.2 < |y^*| < 2.1$.
This is probably due to overestimating the systematic uncertainties by adding all errors in quadrature. It is worth mentioning here that, contrary to the ATLAS data, the preliminary CMS inclusive jet data [35] (involving no centrality selection) do show a behaviour consistent with EPS09. Dijet production measured by the CMS collaboration [33] was the subject of study in [36], where sizeable mutual deviations between different nuclear PDFs were found. The experimental observable in this case is normalised to the total number of dijets and the proton reference uncertainties tend to cancel to some extent, especially around midrapidity. A better cancellation would presumably be attained by considering the forward-to-backward ratios, but this would again involve the issue of correlated systematic uncertainties mentioned earlier. Comparisons between the data and theoretical predictions are shown in Fig. 6 and the $\chi^2$ values are tabulated in Table 3 (right-hand column). The data clearly favour the use of EPS09 nPDFs, and in all other cases $\chi^2/N_{\rm data} = 3.8 \ldots 7.8$, which is a clear signal of incompatibility. The better agreement follows from the gluon antishadowing and EMC effect at large $x$ present in EPS09 but not in DSSZ. However, the significant dependence on the employed free-proton PDFs is a bit alarming: indeed, one observes around 50 % difference when switching from CT10 to MSTW2008. This indicates that the cancellation of proton PDF uncertainties is far from complete and that they must be accounted for (which we do not do here) if this observable is to be used as an nPDF constraint. The proton-proton reference data taken in Run II may improve the situation.

Charged-particle production
Now let us move to the analysis of charged-particle production. Here we consider both charged-hadron (ALICE [37] and CMS [38]) and pion (ALICE [39]) production. Apart from the PDFs, the particle production depends on the fragmentation functions (FFs), which are not well constrained. Indeed, it has been shown that none of the current FFs can give a proper description of the experimental results [40] on charged-hadron production. In the same reference, a kinematic cut $p_T > 10$ GeV was advocated to avoid contamination from mechanisms other than the independent parton-to-hadron fragmentation described by FFs. The same cut is applied here. Regarding the final-state pions, we relaxed the requirement to $p_T > 2$ GeV, since cuts like this have been used in the EPS09 and DSSZ analyses. The theoretical values were determined with the same code as in [41], using the fragmentation functions from DSS [42] for the charged hadrons. In the case of the DSSZ nPDFs, medium-modified fragmentation functions were used [43], in accordance with the way in which the RHIC pion data [44] were treated in the original DSSZ extraction. This is, however, not possible in the case of unidentified charged hadrons, as medium-modified fragmentation functions are available for pions and kaons only.
The use of CMS data [38] poses another problem since it is known that, at high $p_T$, the data show a 40 % enhancement that cannot currently be described by any theoretical model. However, it has been noticed that the forward-to-backward ratios are nevertheless more or less consistent with the expectations. While it is somewhat hazardous to use data in this way, we do so anyway, hoping that whatever causes the high-$p_T$ anomaly cancels in ratios. A comparison between these data and EPS09/DSSZ calculations is shown in Fig. 7 and the values of $\chi^2$ are listed in Table 4 (left-hand column). These data have a tendency to favour the calculations with DSSZ but with $\chi^2/N_{\rm data}$ being absurdly low.

Fig. 6 The CMS dijet data presented as differences between the data and the theory calculations. The dashed lines correspond to the nPDF uncertainty.

Fig. 7 Backward-to-forward ratios for charged-hadron production measured by the CMS collaboration. The theoretical curves were computed with EPS09 (left-hand plots) and DSSZ (right-hand plots).
The ALICE collaboration [37] took data relatively close to the central region and the data are available as backward-to-central ratios $A_{\rm B/C}$, with backward comprising the intervals $-1.3 < \eta_{\rm c.m.} < -0.8$ and $-0.8 < \eta_{\rm c.m.} < -0.3$. A theory-to-data comparison is shown in Fig. 8 and the corresponding $\chi^2$ values are in Table 4 (middle column). The data appear to slightly favour the use of EPS09/DSSZ but the $\chi^2/N_{\rm data}$ remain, again, always very low. Finally, we consider the preliminary pion data ($\pi^+ + \pi^-$) shown by ALICE [39]. In this case the measurement was performed only in the $|y| < 0.5$ region, so no $A_{\rm F/B}$ or any similar quantity could be constructed. For this reason we had to resort to the use of the $R_{\rm pPb}$ ratio, which involves a 6 % normalisation uncertainty. A comparison between data and theory before the reweighting can be seen in Fig. 9 and the values of $\chi^2$ are in Table 4 (right-hand column).
The very low values of $\chi^2/N_{\rm data}$ attained in these three measurements indicate that the uncertainties have been overestimated and these data are doomed to have a negligible constraining power. Notice that the uncertainties are dominated by the systematic errors, which we add in quadrature with the statistical ones in the absence of better experimental information.

Fig. 9 Ratio of minimum-bias $\pi^+ + \pi^-$ production in p-Pb and the same observable in p-p collisions measured by the ALICE collaboration. Theoretical values and uncertainties were calculated with EPS09 (left-hand panel) and DSSZ (right-hand panel).
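The link between overestimated uncertainties and a very low $\chi^2/N_{\rm data}$ can be illustrated with a toy calculation. The Python sketch below evaluates $\chi^2/N_{\rm data}$ for a set of residuals and a data covariance matrix, first with uncertainties that match the actual scatter and then with the same uncertainties inflated by a common factor. All numbers, including the inflation factor, are made up for illustration and do not correspond to any of the data sets discussed here.

```python
import numpy as np


def chi2_per_point(theory, data, covariance):
    """chi2 / N_data for residuals (theory - data) and a data covariance matrix."""
    residual = np.asarray(theory, dtype=float) - np.asarray(data, dtype=float)
    covariance = np.asarray(covariance, dtype=float)
    # Solve C x = r instead of inverting C explicitly.
    chi2 = residual @ np.linalg.solve(covariance, residual)
    return float(chi2 / residual.size)


# Hypothetical numbers: three points that scatter by about one "true" sigma.
data = np.array([1.05, 0.96, 1.02])
theory = np.ones(3)
true_sigma = 0.04
inflation = 3.0  # assumed overestimate of the quoted uncertainties

honest = chi2_per_point(theory, data, np.diag(np.full(3, true_sigma ** 2)))
inflated = chi2_per_point(theory, data,
                          np.diag(np.full(3, (inflation * true_sigma) ** 2)))
```

For a diagonal covariance matrix, inflating every uncertainty by a factor $k$ lowers $\chi^2/N_{\rm data}$ by $k^2$, so a value of order one with realistic uncertainties drops far below one once uncorrelated errors are overestimated by a factor of a few.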

Implications for nPDFs
The comparisons presented in the previous section demonstrate that many of the considered data (CMS W, CMS Z, ATLAS Z, CMS dijet) show sensitivity to the nuclear PDFs while others (ALICE W, ATLAS jets, CMS hadrons, ALICE hadrons, ALICE pions) remain inconclusive. Some of the considered observables (ATLAS jets, CMS hadrons) are also known to pose issues that are not fully understood, so the comparisons presented here should be taken as indicative.
The most stringent constraints are provided by the CMS dijet measurements, which alone would rule out all but EPS09.
However, upon summing all the $\chi^2$ values from the different measurements, this easily gets buried under the other data. This is evident from the total values of $\chi^2/N_{\rm data}$ shown in Table 5 (upper part): considering all the data, it would look like all the PDF combinations were in agreement with the data ($\chi^2/N_{\rm data} \sim 1$). However, excluding one of the dubious data sets (ATLAS jets), for which the number of data is large but $\chi^2/N_{\rm data}$ very small, the differences between different PDFs grow; see the lower part of Table 5. The effective number of replicas always remains quite high. The reason for the high $N_{\rm eff}$ is that the variation of the total $\chi^2$ within a given set of nPDFs (that is, the variation among the error sets) is small even if some of the data sets are not properly described at all (in particular, CMS dijets with DSSZ). We must also notice that, even though the initial $\chi^2$ values for EPS09 are lower than for DSSZ regardless of the proton PDF used, the final effective number of replicas is lower for the former. This is due to the fact that the DSSZ parameterisation does not allow for an ample variation of the partonic densities and therefore not many replicas fall far from the central prediction. Thus, $N_{\rm eff}$ alone should not be blindly used to judge whether a reanalysis is required. Given the tiny improvements in reweighted $\chi^2$ values, one expects no strong modifications to be induced in the nPDFs either. Indeed, the only noticeable effect, as can be seen in Fig. 10, is in the EPS09 gluons, for which the CMS dijet data place new constraints [45].

Fig. 10 Impact of the LHC Run I data on the nPDFs of EPS09 (left) and DSSZ (right) before (black/grey) and after the reweighting (red/light red), for valence (upper panels), sea (middle panels) and gluon (lower panels) distributions at $Q^2 = 1.69$ GeV$^2$, except the DSSZ gluons that are plotted at $Q^2 = 2$ GeV$^2$.
It should be recalled that, for technical reasons, in the EPS09 analysis the RHIC pion data were given a rather large additional weight and they still outweigh the $\chi^2$ contribution coming from the dijets. In a fit with no extra weights the dijet data would, on the contrary, give a larger contribution than the RHIC data. Therefore these data will have a different effect from what Fig. 10 would indicate. In the case of DSSZ the assumed functional form is not flexible enough to accommodate the dijet data and in practice nothing happens upon performing the reweighting. However, it is evident that these data will have a large impact on the DSSZ gluons if an agreement is required (see Fig. 6), so a refit appears mandatory.
The impact of the LHC p-Pb data is potentially higher than what is found here for a further reason: in the context of our study, it is impossible to say anything concerning the constraints that these data may provide for the flavour separation of the nuclear PDFs, which again calls for a refit. Another issue is the form of the fit functions, whose rigidity, especially at small x, significantly underestimates the true uncertainty. In this sense, our study should be seen merely as a preparatory work towards nPDF analyses including LHC data. More p-Pb data will also still appear (at least CMS inclusive jets and W production from ATLAS) and many of the data sets used here are only preliminary.

Summary
In the present work we have examined the importance of PDF nuclear modifications in describing some p-Pb results from Run I at the LHC, and the impact that the considered data have on the EPS09 and DSSZ global fits of nPDFs. We have found that while some data clearly favour the considered sets of nuclear PDFs, some sets are also statistically consistent with just proton PDFs. In this last case abnormally small values of $\chi^2/N_{\rm data}$ are obtained, however. The global picture therefore depends on what data sets are being considered. We have chosen to use, in our analysis, most of the available data from the p-Pb run. It should, however, be stressed that some of the considered data sets are suspicious in the sense that unrealistically small values of $\chi^2/N_{\rm data}$ are obtained and these sets, as we have shown, can easily twist the overall picture. Incidentally, these sets are the ones that have the smallest $\chi^2$ when no nuclear effects in PDFs are included. The small values of $\chi^2/N_{\rm data}$ are partly related to unknown correlations between the systematic uncertainties of the data but also, particularly in the case of ALICE pions, presumably to the additional uncertainty added to the interpolated p-p baseline. The p-p reference data at $\sqrt{s} = 5.02$ TeV, recently recorded at the LHC, may eventually improve this situation.
The considered data are found to have only a mild impact on the EPS09 and DSSZ nPDFs. This does not, however, necessarily mean that these data would be useless. Indeed, they may help to relax some rather restrictive assumptions made in the fits. An obvious example is the functional form for the DSSZ gluon modification, which does not allow for a gluon antishadowing similar to that of the EPS09 fit functions. This leads to a poor description of the CMS dijet data by DSSZ that the reweighting (being restricted to all assumptions made in the original analysis) cannot cure. Thus, in reality, these data are likely to have a large impact. In general, these new LHC data may allow one to implement more flexibility into the fit functions and also to release restrictions related to the flavour dependence of the quark nuclear effects. Also, the EPS09 analysis used an additional weight to emphasise the importance of the data set (neutral pions at RHIC) sensitive to the gluon nPDF. Now, with the use of the new LHC data, such artificial means are likely to be unnecessary. Therefore, for understanding the true significance of these data, new global fits including these and upcoming data are required.
Hence, both theoretical and experimental efforts, as explained above, are required to fully exploit the potential of both completed and future p-Pb runs at the LHC for constraining the nuclear modifications of parton densities.