Determination of the parton distribution functions of the proton from ATLAS measurements of differential $W^\pm$ and $Z$ boson production in association with jets

This article presents a new set of proton parton distribution functions, ATLASepWZVjet20, produced in an analysis at next-to-next-to-leading order in QCD. The new data sets considered are the measurements of $W^+$ and $W^-$ boson and $Z$ boson production in association with jets in $pp$ collisions at $\sqrt{s} = 8~\mathrm{TeV}$ performed by the ATLAS experiment at the LHC with integrated luminosities of $20.2~\mathrm{fb}^{-1}$ and $19.9~\mathrm{fb}^{-1}$, respectively. The analysis also considers the ATLAS measurements of differential $W^{\pm}$ and $Z$ boson production at $\sqrt{s} = 7~\mathrm{TeV}$ with an integrated luminosity of $4.6~\mathrm{fb}^{-1}$ and deep-inelastic-scattering data from $e^{\pm}p$ collisions at the HERA accelerator. An improved determination of the sea-quark densities at high Bjorken $x$ is shown, while confirming a strange-quark density similar in size to the up- and down-sea-quark densities in the range $x \lesssim 0.02$ found by previous ATLAS analyses.


Introduction
Precise knowledge of the content of colliding protons, the parton distribution functions (PDFs), is a necessary ingredient for accurate predictions of both Standard Model (SM) and Beyond Standard Model (BSM) cross sections at the Large Hadron Collider (LHC). In order to determine the PDFs to the required precision, data covering a wide range of negative squared four-momentum transfer (denoted by $Q^2$) and Bjorken $x$, the fraction of the proton's longitudinal momentum carried by the parton initiating the interaction, are required. This is facilitated by combining data from multiple experiments and measurements of various processes to constrain the $x$-dependence and flavour decomposition of the PDFs. While deep inelastic scattering (DIS) data from lepton-hadron collisions typically deliver the best constraints by utilising the lepton as a direct probe of the substructure of the hadron, a hadron-hadron experiment can provide valuable additional insight by introducing new processes which further distinguish contributions from different partons and span kinematic regions at higher $Q^2$.
Precision measurements by the HERA collaborations [1] of neutral current (NC) and charged current (CC) cross sections in $e^{\pm}p$ scattering constrain PDFs such that the HERA DIS data alone provide sufficient information to determine the PDF set referred to as HERAPDF2.0. However, they do have limitations. For example, they cannot distinguish quark flavour between the down-type sea quarks, $\bar{d}$ and $\bar{s}$. Global PDF analyses [2][3][4][5] use a range of data from other experiments together with the HERA data for further constraining power. For example, additional information about quarks and antiquarks at mid- to high $x$ comes from fixed-target DIS experiments, as well as measurements of $W$ and $Z$ boson production from the Tevatron and LHC experiments.
More information about high-$x$ quarks would be advantageous since a large fraction of the fixed-target DIS data is in a kinematic region where non-perturbative effects, such as those from higher twist, are important and must be computed from phenomenological models [6,7]. In many PDF analyses, tight cuts are applied to these data to avoid those effects. Furthermore, the interpretation of DIS data using deuteron or heavier nuclei as targets is subject to uncertain nuclear corrections. The $W^{\pm}$ asymmetry measurements performed using $p\bar{p}$ collisions at the Tevatron are free from these uncertainties, but there have historically been tensions between the results of the CDF [8] and DØ [9] collaborations, discussed in further detail by the MSTW group in Ref. [10] and the CTEQ group in Ref. [11].
Precision measurements from the ATLAS detector at the LHC, together with data from the HERA experiments, have been interpreted previously in a next-to-next-to-leading-order (NNLO) QCD analysis, resulting in the ATLASepWZ16 PDF set [12]. Differential $W^{\pm}$ and $Z$ boson cross-section measurements at $\sqrt{s} = 7~\mathrm{TeV}$ were used, thereby allowing the strange content of the sea to be fitted, rather than assumed to be a fixed fraction of the light sea as is required when fitting HERA inclusive data alone. It was found that these additional data were significantly better described by a strange sea unsuppressed relative to the up- and down-quark sea at $x \lesssim 0.05$, in contradiction to previous assumptions based on dimuon production data from muon-neutrino CC DIS with associated charm-quark production [13]. The finding of an unsuppressed strange PDF in this kinematic region is supported by the ATLAS measurement of $W$ boson production in association with a charm quark ($W + c$) at 7 TeV [14]; however, a recent analysis of CMS $W + c$ data at 7 and 13 TeV [15] has found a suppressed strange-quark density relative to the light sea, which is potentially in tension with these ATLAS findings.
Data on the production of a vector boson in association with jets at the LHC provide a novel source of input to PDF determination that is sensitive to partons at higher $x$ and $Q^2$ than can be accessed by $W$ and $Z$ boson data alone, thereby yielding a data set complementary to the inclusive $W$, $Z$ boson measurements [16]. The tree-level production modes of a vector boson in association with jets ($V$ + jets) have either quark-antiquark initial states with gluon radiation, or quark-gluon initial states. The process is therefore already sensitive to the gluon density of the proton at leading order in quantum chromodynamics (QCD), while providing constraints on the quark distributions in a similar way to inclusive production of a vector boson.
This paper presents a PDF analysis including data on $W^{\pm}$ + jets and $Z$ + jets production collected in $pp$ collisions at $\sqrt{s} = 8~\mathrm{TeV}$ by the ATLAS Collaboration [17,18] in combination with the previous inclusive $W^{\pm}$ and $Z$ measurements at $\sqrt{s} = 7~\mathrm{TeV}$ [12] and the inclusive combined HERA data [1]. The PDF fit is performed at NNLO in perturbative QCD, made possible by recent theoretical developments for vector-boson production in association with one jet [19,20], and accounts for the correlation of systematic uncertainties between data sets. The resulting PDF set is called ATLASepWZVjet20.

Input data sets
The final combined $e^{\pm}p$ cross-section measurements at HERA [1] cover the kinematic range of $Q^2$ from $0.045~\mathrm{GeV}^2$ to $50\,000~\mathrm{GeV}^2$ and of Bjorken $x$ from 0.65 down to $6 \times 10^{-7}$. Data below $x = 10^{-5}$ are excluded from this analysis by requiring $Q^2 > 10~\mathrm{GeV}^2$, motivated by the previously observed poorer fit quality in the excluded kinematic region compared to the rest of the HERA data [1]. Possible explanations for this include the need for resummation corrections at low $x$ [21] and the impact of higher-twist corrections at low $Q^2$. For the final HERA data set, there are 169 correlated sources of uncertainty. Total uncertainties are below 1.5% over the $Q^2$ range of $10 < Q^2 < 500~\mathrm{GeV}^2$ and below 3% up to $Q^2 = 3000~\mathrm{GeV}^2$.
The ATLAS $W^{\pm}$ and $Z$ differential cross sections are based on data recorded during $pp$ collisions with $\sqrt{s} = 7~\mathrm{TeV}$, and a total integrated luminosity of $4.6~\mathrm{fb}^{-1}$, in the electron and muon boson-decay channels [12]. The $W^{\pm}$ differential cross sections are measured as functions of the $W$-decay lepton pseudorapidity, $\eta_\ell$, split into $W^+$ and $W^-$ cross sections. The experimental precision is between 0.6% and 1.0%. Double-differential distributions of the dilepton rapidity, $y_{\ell\ell}$, in $Z$ boson decays are measured in three mass ranges: $46 < m_{\ell\ell} < 66~\mathrm{GeV}$, $66 < m_{\ell\ell} < 116~\mathrm{GeV}$ and $116 < m_{\ell\ell} < 150~\mathrm{GeV}$ in central ($|y_{\ell\ell}| < 2.4$) and forward ($1.2 < |y_{\ell\ell}| < 3.6$) rapidity selections, with an experimental precision of up to 0.4% for central rapidity and 2.3% for forward rapidity. The integrated luminosity of the data set used for the 7 TeV $W^{\pm}$ and $Z$ cross-section measurements is known to within 1.8%. There are a total of 131 sources of correlated systematic uncertainty across the $W^{\pm}$ and $Z$ data sets [12]. These data were used for the ATLASepWZ16 fit in a format in which the measurements of the electron and muon decay channels were combined, whereas for the PDF sets presented in this article the data before this combination are used. This choice was made because the uncombined data retain the physical origin of the sources of correlated uncertainties, thereby allowing these sources to be treated as correlated with those in other data sets.
The ATLAS $W^{\pm}$ + jets differential cross sections are based on data recorded during $pp$ collisions with $\sqrt{s} = 8~\mathrm{TeV}$ and a total integrated luminosity of $20.2~\mathrm{fb}^{-1}$, in the electron decay channel only [17]. Each event contains at least one jet with transverse momentum $p_\mathrm{T} > 30~\mathrm{GeV}$ and rapidity $|y| < 4.4$, where jets are defined using the anti-$k_t$ algorithm [22,23] with a radius parameter $R = 0.4$. The spectrum used is the transverse momentum of the $W$ boson ($p_\mathrm{T}^W$), in the range $25 < p_\mathrm{T}^W < 800~\mathrm{GeV}$, chosen because it provides the most constraining power. This is split into $W^+$ and $W^-$ cross sections, which have large correlations that are fully considered. The experimental uncertainty ranges from 8.2% to 22.1% [17]. There are 50 sources of correlated systematic uncertainty common to the $W^+$ and $W^-$ spectra, as well as three sources of uncorrelated uncertainties related to data statistics, background Monte Carlo (MC) simulation statistics and the statistical uncertainty of the data-driven multijet background estimation. Full information about the statistical bin-to-bin correlations in data, induced by the unfolding process, is available for each $W$ + jets spectrum.
The ATLAS $Z$ + jets double-differential cross sections are also based on data recorded during $pp$ collisions with $\sqrt{s} = 8~\mathrm{TeV}$. The total integrated luminosity of these data is $19.9~\mathrm{fb}^{-1}$, and the $Z \to e^+e^-$ decay channel is used [18]. The measurement is performed as a function of the absolute rapidity of inclusive anti-$k_t$ $R = 0.4$ jets, $|y^\mathrm{jet}|$, for several bins of the transverse momentum within $25~\mathrm{GeV} < p_\mathrm{T}^\mathrm{jet} < 1050~\mathrm{GeV}$. The experimental precision ranges from 4.7% to 37.1%. There are 42 sources of correlated systematic uncertainty and two sources of uncorrelated uncertainty related to the data and background MC simulation statistics.
The integrated luminosity of the data set used for the $W$ + jets and $Z$ + jets cross-section measurements is known to within 1.9%. Systematic uncertainties which contribute significantly, such as from the jet energy scale, are treated as correlated across data sets if they correspond to the same physical source. More details of the correlation model used in this analysis are given in Appendix A.

Fit framework
This determination of proton PDFs uses the xFitter framework, v2.0.1 [1,24,25]. This program interfaces to theoretical calculations directly or uses fast interpolation grids to make theoretical predictions for the considered processes. The program MINUIT [26] is used for the minimisation of the PDF fit. The results are cross-checked with an independent fit framework [27].
For the DIS processes, coefficient functions with massless quarks are calculated at NNLO in QCD as implemented in QCDNUM v17-01-13 [28]. The contributions of heavy quarks are calculated in the general-mass variable-flavour-number scheme of Refs. [29][30][31]. The renormalisation and factorisation scales for the DIS processes are taken as $\mu_\mathrm{r} = \mu_\mathrm{f} = \sqrt{Q^2}$.
For the differential $W$ and $Z$ boson cross sections, the theoretical framework is the same as that used in the ATLASepWZ16 analysis of Ref. [12]. The xFitter package uses outputs from the APPLGRID code [32] interfaced to the MCFM program [33,34] for fast calculation of the differential cross sections at NLO in QCD and LO in electroweak (EW) couplings. Corrections to higher orders are implemented using a $K$-factor technique, correcting on a bin-by-bin basis from NLO to NNLO in QCD and from LO to NLO for the EW contribution [35,36].
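The bin-by-bin $K$-factor technique amounts to multiplying each bin of a lower-order prediction by one or more correction factors. The following is a minimal sketch of that arithmetic only; the function name `apply_k_factors` and all numbers are hypothetical illustrations, not values or code from the analysis or from xFitter.

```python
def apply_k_factors(nlo_prediction, *k_factors):
    """Multiply a binned prediction by any number of bin-by-bin K-factors.

    nlo_prediction: list of cross-section values, one per bin.
    k_factors: lists of multiplicative corrections, each with one value per bin
               (for example an NNLO/NLO QCD ratio and an NLO/LO EW ratio).
    """
    corrected = [float(value) for value in nlo_prediction]
    for k in k_factors:
        if len(k) != len(corrected):
            raise ValueError("each K-factor needs exactly one value per bin")
        corrected = [c * k_i for c, k_i in zip(corrected, k)]
    return corrected


# Illustrative numbers only (hypothetical, not taken from the analysis).
nlo = [100.0, 50.0, 10.0]
k_qcd = [1.02, 1.05, 1.10]   # hypothetical NNLO/NLO QCD ratio per bin
k_ew = [1.00, 0.98, 0.90]    # hypothetical NLO/LO EW ratio per bin
corrected = apply_k_factors(nlo, k_qcd, k_ew)
```

Because the factors are applied multiplicatively and per bin, the fast NLO grid can be re-evaluated for each trial PDF while the $K$-factors stay fixed.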
Predictions for $W$ + jets and $Z$ + jets production are obtained similarly to the $W$ and $Z$ predictions to NLO in QCD and LO in EW couplings by using the APPLGRID code interfaced to the MCFM program. Higher-order corrections are implemented as $K$-factors. For the $W$ + jets data, the $N_\mathrm{jetti}$ program [19] is used to calculate and implement corrections to NNLO in QCD, while the non-perturbative hadronisation and underlying event QCD corrections are computed using the Sherpa v.2.2.1 MC simulation [39][40][41]. The bin-by-bin $K$-factors are derived as the ratio of the NNLO to the NLO calculation from $N_\mathrm{jetti}$ with the same fiducial selection as the $W$ + jets data, multiplied by the non-perturbative correction. The renormalisation and factorisation scales are set to $\mu_\mathrm{r} = \mu_\mathrm{f} = \sqrt{m_W^2 + \sum (p_\mathrm{T}^{j})^2}$, where $m_W$ is the mass of the $W$ boson and the second term in the square root is the scalar sum of the squared transverse momenta of the jets. More details about the predictions are given in the respective ATLAS publication [17]. In addition to these predictions, NLO EW corrections inclusive of QED radiation effects are computed using Sherpa v.2.2.10 by the authors of Refs. [39][40][41] and applied as additional bin-by-bin multiplicative $K$-factors.
Predictions for $Z$ + jets production to NNLO in QCD and LO in EW couplings are calculated by the authors of Ref. [20], and the $K$-factor is calculated as the ratio of NNLO to NLO predictions. The renormalisation and factorisation scales are set to $\mu_\mathrm{r} = \mu_\mathrm{f} = \frac{1}{2}\left(\sum p_\mathrm{T,partons} + \sqrt{m_{\ell\ell}^2 + p_{\mathrm{T},\ell\ell}^2}\right)$, where $m_{\ell\ell}$ is the electron-pair invariant mass, $p_{\mathrm{T},\ell\ell}$ is the transverse momentum of the electron pair and $\sum p_\mathrm{T,partons}$ is the sum of the transverse momenta of the outgoing partons. Corrections for QED radiation effects and non-perturbative QCD corrections are each calculated using the Sherpa v.1.4.5 MC simulation, as discussed in the publication describing the ATLAS measurement [18], and each provided as a set of bin-by-bin multiplicative $K$-factors. Corrections for NLO EW effects excluding QED radiation are computed using Sherpa v.2.2.10 and applied as additional bin-by-bin $K$-factors. The $K$-factors for both $W$ + jets and $Z$ + jets production are typically within 10% of unity, except for the NLO EW corrections for the $W$ + jets predictions, which are as large as 20% at high $p_\mathrm{T}^W$.
The DGLAP evolution equations of QCD yield the proton PDFs at any value of $Q^2$ given that they are parameterised as functions of $x$ at an initial scale $Q_0^2$. In this analysis, the initial scale is chosen to be $Q_0^2 = 1.9~\mathrm{GeV}^2$ such that it is below the charm-mass matching scale, $\mu_c^2$, which is set equal to the charm mass, $\mu_c = m_c$. The heavy-quark masses are set to their pole masses as determined by a combined analysis of HERA data on inclusive and heavy-flavour DIS processes [1,42], $m_c = 1.43~\mathrm{GeV}$ and $m_b = 4.5~\mathrm{GeV}$, and the strong coupling constant is fixed to $\alpha_\mathrm{s}(m_Z) = 0.118$. These choices follow those of the HERAPDF2.0 fit [1].
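The dynamic scale choices for the $V$ + jets predictions can be illustrated with a short sketch. It assumes the forms $\mu = \sqrt{m_W^2 + \sum (p_\mathrm{T}^{j})^2}$ for $W$ + jets and $\mu = \frac{1}{2}\left(\sum p_\mathrm{T,partons} + \sqrt{m_{\ell\ell}^2 + p_{\mathrm{T},\ell\ell}^2}\right)$ for $Z$ + jets, as read from the descriptions in the text; the function names and the input values are hypothetical.

```python
import math


def w_jets_scale(m_w, jet_pts):
    """Assumed W + jets scale: sqrt(m_W^2 + sum of squared jet pT), in GeV."""
    return math.sqrt(m_w ** 2 + sum(pt ** 2 for pt in jet_pts))


def z_jets_scale(m_ll, pt_ll, parton_pts):
    """Assumed Z + jets scale: half of (sum of parton pT + dilepton transverse mass term)."""
    return 0.5 * (sum(parton_pts) + math.sqrt(m_ll ** 2 + pt_ll ** 2))


# Hypothetical kinematics chosen so that the arithmetic is easy to check by hand.
mu_w = w_jets_scale(m_w=80.0, jet_pts=[60.0])                       # sqrt(6400 + 3600)
mu_z = z_jets_scale(m_ll=60.0, pt_ll=80.0, parton_pts=[30.0, 20.0])  # 0.5 * (50 + 100)
```

Both scales grow with the hardness of the event, so the scale is set event by event instead of being fixed to the boson mass.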
The quark distributions at the initial scale are assumed to behave according to the following parameterisation, also used by the HERAPDF2.0 and ATLASepWZ16 fits [1,12]:
$$x q_i(x) = A_i x^{B_i} (1 - x)^{C_i} P_i(x), \tag{1}$$
where $P_i(x) = (1 + D_i x + E_i x^2)\,\mathrm{e}^{F_i x}$. The parameterised quark distributions, $x q_i$, are chosen to be the valence-quark distributions ($x u_v$, $x d_v$) and the light-antiquark distributions ($x\bar{u}$, $x\bar{d}$, $x\bar{s}$). The gluon distribution is parameterised with the more flexible form
$$x g(x) = A_g x^{B_g} (1 - x)^{C_g} P_g(x) - A'_g x^{B'_g} (1 - x)^{C'_g},$$
where $C'_g$ is fixed to a value of 25 to suppress negative contributions from the primed term at high $x$, as in Ref. [10]. The parameters $A_{u_v}$ and $A_{d_v}$ are constrained using the quark counting rules, and $A_g$ is constrained using the momentum sum rule. The normalisation and slope parameters, $A$ and $B$, of the $\bar{u}$ and $\bar{d}$ PDFs are set equal such that $x\bar{u} = x\bar{d}$ as $x \to 0$. The strange PDF $x\bar{s}$ is parameterised as in Eq. (1), with $P_{\bar{s}} = 1$ and $B_{\bar{s}} = B_{\bar{d}}$, leaving two free parameters for the strange PDF, $A_{\bar{s}}$ and $C_{\bar{s}}$. It is assumed that $x s = x\bar{s}$ as the data used are not sufficient to distinguish between the two.
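As an illustration of how such a parameterisation behaves, the sketch below evaluates a HERAPDF-style functional form, assumed here to be $x q(x) = A x^{B} (1-x)^{C} (1 + D x + E x^2)\,\mathrm{e}^{F x}$ for quarks, with an additional subtracted term $A' x^{B'} (1-x)^{C'}$ for the gluon. The function names and all parameter values are hypothetical and are not fitted values from this analysis.

```python
import math


def quark_pdf(x, A, B, C, D=0.0, E=0.0, F=0.0):
    """Assumed form: x*q(x) = A x^B (1-x)^C (1 + D x + E x^2) exp(F x)."""
    if not 0.0 < x < 1.0:
        raise ValueError("Bjorken x must lie strictly between 0 and 1")
    polynomial = (1.0 + D * x + E * x * x) * math.exp(F * x)
    return A * x ** B * (1.0 - x) ** C * polynomial


def gluon_pdf(x, A, B, C, A_prime, B_prime, C_prime=25.0, D=0.0, E=0.0, F=0.0):
    """Assumed form: quark-like term minus A' x^B' (1-x)^C', with C' = 25 by default."""
    primed = A_prime * x ** B_prime * (1.0 - x) ** C_prime
    return quark_pdf(x, A, B, C, D, E, F) - primed


# Hypothetical parameters: a larger C suppresses the distribution at high x.
soft = quark_pdf(0.5, A=1.0, B=0.0, C=10.0)
hard = quark_pdf(0.5, A=1.0, B=0.0, C=2.0)
```

The comparison of `soft` and `hard` shows why the $C$ parameter of a sea-quark distribution controls its high-$x$ behaviour: at $x = 0.5$ the factor $(1-x)^{C}$ changes by more than two orders of magnitude between $C = 2$ and $C = 10$.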
The $D$, $E$ and $F$ terms in the expression $P_i(x)$ are used only if required by the data, following the procedure described in Ref. [1]. For the ATLASepWZVjet20 fit, this results in the usage of two additional parameters. In total, 16 free parameters are used in the central fit.
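The level of agreement of the data with the predictions from a PDF parameterisation is quantified with a $\chi^2$. The definition of the $\chi^2$ without statistical correlations between data points is as follows:
$$\chi^2 = \sum_i \frac{\left[D_i - T_i\left(1 - \sum_j \gamma_{ij} b_j\right)\right]^2}{\delta_{i,\mathrm{uncor}}^2 T_i^2 + \delta_{i,\mathrm{stat}}^2 D_i T_i} + \sum_j b_j^2 + \sum_i \log\frac{\delta_{i,\mathrm{uncor}}^2 T_i^2 + \delta_{i,\mathrm{stat}}^2 D_i T_i}{\delta_{i,\mathrm{uncor}}^2 D_i^2 + \delta_{i,\mathrm{stat}}^2 D_i^2}, \tag{2}$$
where $D_i$ represent the measured data, $T_i$ represent the corresponding theoretical prediction, $\delta_{i,\mathrm{uncor}}$ and $\delta_{i,\mathrm{stat}}$ are the uncorrelated systematic and statistical uncertainties in $D_i$, and correlated systematic uncertainties, described by $\gamma_{ij}$, are accounted for using the nuisance parameters $b_j$. The summation over $i$ runs over all data points and the summation over $j$ runs over all sources of correlated systematic uncertainties. For each data set, the first term gives the partial $\chi^2$ and the second term gives the correlated $\chi^2$. The third term is a bias correction term arising from the transition of the likelihood to $\chi^2$ when the scaling of errors is applied, referred to as the log penalty. For the $W$ + jets data, the bin-to-bin statistical correlations are significant in contrast to the other data sets and incorporated into the $\chi^2$ definition as follows:
$$\chi^2 = \sum_{i,k} \left[D_i - T_i\left(1 - \sum_j \gamma_{ij} b_j\right)\right] C^{-1}_{\mathrm{stat},ik} \left[D_k - T_k\left(1 - \sum_j \gamma_{kj} b_j\right)\right] + \sum_j b_j^2 + \sum_i \log\frac{\delta_{i,\mathrm{uncor}}^2 T_i^2 + \delta_{i,\mathrm{stat}}^2 D_i T_i}{\delta_{i,\mathrm{uncor}}^2 D_i^2 + \delta_{i,\mathrm{stat}}^2 D_i^2}, \tag{3}$$
in which the first term of Eq. (2) has been replaced with one which takes into account the diagonal and off-diagonal elements of the data statistical covariance matrix between bins $i$ and $k$, $C_{\mathrm{stat},ik}$.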
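The three contributions to the $\chi^2$ (partial, correlated and log penalty) can be made concrete with a small sketch. It assumes the HERAPDF-style definition with relative uncertainties, in which each prediction $T_i$ is shifted by $-T_i \sum_j \gamma_{ij} b_j$ and the variance is $\delta_{i,\mathrm{uncor}}^2 T_i^2 + \delta_{i,\mathrm{stat}}^2 D_i T_i$; this form, the function name and the numbers are assumptions made for illustration, not the xFitter implementation.

```python
import math


def chi2_terms(data, theory, gamma, b, d_stat, d_uncor):
    """Return (partial, correlated, log_penalty) for uncorrelated data points.

    data, theory : measured values D_i and predictions T_i
    gamma[i][j]  : relative effect of correlated source j on point i
    b[j]         : nuisance parameter of source j
    d_stat, d_uncor : relative statistical and uncorrelated uncertainties
    """
    partial = 0.0
    log_penalty = 0.0
    for i, (D, T) in enumerate(zip(data, theory)):
        shift = sum(gamma[i][j] * b[j] for j in range(len(b)))
        variance = d_uncor[i] ** 2 * T ** 2 + d_stat[i] ** 2 * D * T
        partial += (D - T * (1.0 - shift)) ** 2 / variance
        # Bias-correction term from scaling the uncertainties with the prediction.
        unscaled = (d_uncor[i] ** 2 + d_stat[i] ** 2) * D ** 2
        log_penalty += math.log(variance / unscaled)
    correlated = sum(b_j ** 2 for b_j in b)
    return partial, correlated, log_penalty


# Hypothetical single-bin example: a 10% correlated source pulled by one unit.
terms = chi2_terms(data=[100.0], theory=[100.0], gamma=[[0.1]], b=[1.0],
                   d_stat=[0.1], d_uncor=[0.0])
```

In a fit, the nuisance parameters $b_j$ are minimised together with the PDF parameters, so a pull away from zero is accepted only if it reduces the partial $\chi^2$ by more than it adds to the correlated $\chi^2$.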

Results
In this section, the ATLASepWZVjet20 PDF set is presented and compared with an equivalent fit performed without the $V$ + jets data, where the latter is named the ATLASepWZ20 PDF set. These PDFs differ from the ATLASepWZ16 analysis by an additional parameter, a tighter selection criterion of $Q^2 > 10~\mathrm{GeV}^2$ and the use of ATLAS 7 TeV $W$ and $Z$ data in which the electron and muon channels are not combined. The result is very similar except for a larger total uncertainty resulting from the use of more parameterisation variations. It was verified that the use of 7 TeV $W$ and $Z$ data with the electron and muon channels combined provides a fit with very similar central values and uncertainties. Rather than being intended to supersede the ATLASepWZ16 PDF, the ATLASepWZ20 fit is provided to allow a more meaningful comparison with the ATLASepWZVjet20 fit by having a PDF set that differs only in the addition of the $V$ + jets data.
Figures 1-3 show a comparison of the $W$ + jets and $Z$ + jets differential cross-section measurements with the predictions of the ATLASepWZ20 and ATLASepWZVjet20 fits. Adding the $V$ + jets data to the fit improves the $W$ + jets description significantly, particularly in the $W^+$ spectrum, where agreement with data improves by approximately 20% at high $p_\mathrm{T}^W$. The difference in partial $\chi^2$ between the predictions of the ATLASepWZ20 and ATLASepWZVjet20 PDF sets for the $W$ + jets and the $Z$ + jets data is 32 and 7 units, respectively.
The total $\chi^2$ per degree of freedom ($\chi^2$/NDF) for the ATLASepWZVjet20 fit, along with the partial $\chi^2$ per data point ($\chi^2$/NDP) and correlated $\chi^2$ for each data set entering the fit, is given in Table 1. The partial $\chi^2$ for the HERA and ATLAS inclusive $W$ and $Z$ data in the ATLASepWZVjet20 fit is similar to those obtained in the ATLASepWZ20 fit, not showing any tension between these data and the $V$ + jets data. The partial $\chi^2$ of the $W$ + jets and $Z$ + jets data is reasonable, and neither the HERA nor ATLAS correlated $\chi^2$ is observed to increase significantly with the inclusion of these data.

Goodness of fit and parton distributions
Additional uncertainties in the PDFs are estimated and classified as either model or parameterisation uncertainties. Model uncertainties comprise variations of the charm-quark mass ($m_c$) and bottom-quark mass ($m_b$), variations of the minimum $Q^2$ cut, $Q^2_\mathrm{min}$, and the starting scale at which the PDFs are parameterised, $Q_0^2$. The variations in charm-quark mass and starting scale are performed simultaneously to fulfil the condition $Q_0^2 < m_c^2$ such that the charm PDF is calculated perturbatively. Each of these variations follows that of the ATLASepWZ16 analysis [12]. The parameterisation uncertainties are estimated through variations which include a single further parameter in the polynomial $P_i(x)$ or relaxed constraints on the low-$x$ sea quarks. In each variation, listed with its respective total $\chi^2$ per degree of freedom in Table 2, the uncertainty is calculated as the difference between the alternative extracted PDF and the nominal PDF at each value of $x$ and $Q^2$. Whereas the model variations are treated independently and the model uncertainty is calculated as the sum in quadrature of the variations, the parameterisation uncertainty is taken as the envelope of the parameterisation variations. The total uncertainty is calculated as the sum in quadrature of the experimental, model and parameterisation uncertainties. While the total uncertainty does give an estimate of the total variability of the fit, only the experimental uncertainty is interpretable similarly to a statistical standard deviation.
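The combination rule described above (model variations in quadrature, parameterisation variations as an envelope, then everything in quadrature) can be sketched for a single point in $x$ and $Q^2$. The sketch is a simplification that treats all variations as symmetric; the function name and the numbers are hypothetical, and the handling of upward and downward variations in the actual analysis may differ.

```python
import math


def combine_uncertainties(experimental, model_shifts, parameterisation_shifts):
    """Combine PDF uncertainties at one (x, Q^2) point.

    experimental            : experimental uncertainty of the nominal fit
    model_shifts            : (alternative - nominal) for each model variation
    parameterisation_shifts : (alternative - nominal) for each parameterisation variation
    Returns (model, parameterisation, total).
    """
    # Model variations are independent: sum in quadrature.
    model = math.sqrt(sum(shift ** 2 for shift in model_shifts))
    # Parameterisation variations: envelope, i.e. the largest deviation.
    parameterisation = max((abs(s) for s in parameterisation_shifts), default=0.0)
    total = math.sqrt(experimental ** 2 + model ** 2 + parameterisation ** 2)
    return model, parameterisation, total


# Hypothetical shifts in arbitrary units.
model, parameterisation, total = combine_uncertainties(
    experimental=2.0, model_shifts=[1.0, 2.0], parameterisation_shifts=[-1.0, 0.5]
)
```

Taking an envelope instead of a quadrature sum avoids double counting when several parameterisation variations move the PDF in the same direction for the same underlying reason.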
The impact of theoretical uncertainties in the $V$ + jets predictions on the fit results is cross-checked. Variations of the NNLO QCD calculations are defined from the variations of factorisation and renormalisation scales by factors of two up and down and taking the envelope of these predictions. In the fit, the corresponding $K$-factors are varied for the $W$ + jets and $Z$ + jets prediction upward and downward both simultaneously and individually. Each of these variations results in PDFs well within the experimental uncertainties of the nominal ATLASepWZVjet20 set.
Table 2: Total $\chi^2$/NDF for each parameterisation and model variation contributing to the parameterisation and model uncertainties, respectively, of the ATLASepWZVjet20 fit. Where a $D$, $E$ or $F$ parameter is referred to, this means that the respective parameter is not constrained to zero in that variation. Where two $A$ or $B$ parameters are referred to in an inequality, this means that the respective two parameters are free to vary independently of each other in a fit.
Figures 4 and 5 show the ATLASepWZVjet20 PDFs overlaid with the ATLASepWZ20 PDFs, each evaluated at the starting scale $Q_0^2$, for comparison. The experimental and total uncertainties are displayed separately in each case. The ATLASepWZVjet20 $x\bar{d}$ distribution is notably higher in the range $x \gtrsim 0.02$ compared to the ATLASepWZ20 fit. In contrast, the $x\bar{s}$ distribution of the ATLASepWZVjet20 fit in the same region is lower. Together, the differences observed between the ATLASepWZ20 and ATLASepWZVjet20 PDFs allow for an increase in the $W^+$ cross section, as depicted in Figure 1, while keeping the total down-type sea $x\bar{D} = x\bar{d} + x\bar{s}$ distribution almost unchanged up to $x \sim 0.1$. Additionally, the $x d_v$ distribution is reduced in the ATLASepWZVjet20 fit at high $x$ and increased at low $x$, compensating for the changes in the other PDFs and resulting in an $xD = x d_v + x\bar{d} + x\bar{s}$ distribution which is similar at high $x$. The up-type quark and gluon distributions are similar between the two fits.

The high-$x$ sea-quark distributions
The difference between the $x\bar{d}$ and $x\bar{u}$ PDFs at high $x$ has been a topic of debate over recent decades. A measurement by the E866 Collaboration of the Drell-Yan cross-section ratios from an 800 GeV proton beam incident on liquid hydrogen and deuterium targets found the proton $x(\bar{d} - \bar{u})$ distribution to be positive at high $x$, peaking at $x(\bar{d} - \bar{u}) \sim 0.04$ at $x \sim 0.1$ [13]. In contrast, the ATLASepWZ16 PDF set gives a negative central distribution with its lowest value at $x(\bar{d} - \bar{u}) \sim -0.035$ for $x \sim 0.1$, although the uncertainties are such that it is compatible with zero within two standard deviations.
The $x(\bar{d} - \bar{u})$ distribution as a function of Bjorken $x$ at $Q^2 = 1.9~\mathrm{GeV}^2$ is shown in Figure 6, with a comparison between ATLASepWZVjet20 and ATLASepWZ20 displaying the direct effect of the $V$ + jets data, and with the experimental, model and parameterisation uncertainties plotted separately. The impact of the $V$ + jets data is to place significant constraints on the total uncertainty at high $x$, with an overall positive distribution of central values driven by the increase in the high-$x$ $x\bar{d}$ distribution, as discussed in Section 4.1.
To understand the effect of the different data sets on the high-$x$ $x\bar{d}$ distribution, a scan of $\chi^2$ is performed through the parameter controlling the behaviour in this region, $C_{\bar{d}}$. A high $C_{\bar{d}}$ value of $\sim 10$ corresponds to a lower $x\bar{d}$ distribution at high $x$, as exhibited by the ATLASepWZ20 fit. Conversely, a low $C_{\bar{d}}$ value of $\sim 2$ corresponds to the higher $x\bar{d}$ distribution at high $x$ as exhibited by the ATLASepWZVjet20 fit.
In Figure 7(a), this scan is shown for each of the presented PDF fits, where the $\chi^2$ is evaluated as a function of the scanned parameter, $C_{\bar{d}}$. At each point, all other parameters (including nuisance parameters associated with experimental uncertainties) are re-fitted and the minimum $\chi^2$ of the scan, $\chi^2_\mathrm{min}$, is subtracted for comparison between fits. The $\chi^2$ of the ATLASepWZ20 fit is smallest at a value of $C_{\bar{d}} = 10 \pm 1$, whereas the $\chi^2$ of the ATLASepWZVjet20 fit is smallest at a lower $C_{\bar{d}} = 1.6 \pm 0.3$, corresponding to a higher $x\bar{d}$ distribution at $x \gtrsim 0.1$ consistent with the PDFs presented in Section 4.1. Another shallow minimum is observed for the ATLASepWZ20 fit at $C_{\bar{d}} \sim 3$, corresponding to a solution similar to that of the ATLASepWZVjet20 fit; however, it exhibits a $\chi^2$ approximately two units larger than in the best fit. The ATLASepWZVjet20 fit fails to converge for values of $C_{\bar{d}} \gtrsim 12$ and no second minimum is observed.
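Once the $\chi^2$ has been obtained at each scan point (after re-fitting all other parameters, which is not reproduced here), the post-processing is simple: subtract the minimum of the scan and read off the best value and the interval where $\Delta\chi^2 = 1$. The sketch below does this with a parabola through the three points around the minimum. It assumes equally spaced scan points, and the toy $\chi^2$ curve is an exact parabola constructed for illustration, not the scan shown in Figure 7.

```python
import math


def parabolic_minimum(values, chi2):
    """Estimate (best_value, one_sigma, delta_chi2) from an equally spaced 1-D scan."""
    i = min(range(len(chi2)), key=chi2.__getitem__)
    if i == 0 or i == len(chi2) - 1:
        raise ValueError("minimum lies on the edge of the scan; extend the range")
    h = values[i + 1] - values[i]
    y_lo, y_0, y_hi = chi2[i - 1], chi2[i], chi2[i + 1]
    # Local parabola: chi2 = curvature*(v - v_i)^2 + slope*(v - v_i) + y_0
    curvature = (y_hi + y_lo - 2.0 * y_0) / (2.0 * h * h)
    slope = (y_hi - y_lo) / (2.0 * h)
    best = values[i] - slope / (2.0 * curvature)
    one_sigma = 1.0 / math.sqrt(curvature)      # where delta chi2 = 1
    chi2_min = min(chi2)
    delta_chi2 = [y - chi2_min for y in chi2]   # scan with its minimum subtracted
    return best, one_sigma, delta_chi2


# Toy scan: an exact parabola with its minimum at 1.6 and width 0.3 (illustrative only).
scan_values = [1.0, 1.5, 2.0, 2.5]
toy_chi2 = [50.0 + ((v - 1.6) / 0.3) ** 2 for v in scan_values]
best, one_sigma, delta_chi2 = parabolic_minimum(scan_values, toy_chi2)
```

A parabolic estimate is only meaningful near a single well-defined minimum; a scan with a second shallow minimum, as described for the ATLASepWZ20 fit, needs to be inspected point by point.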
In Figure 7(b), these $\chi^2$ distributions are decomposed into contributions from the HERA and ATLAS data. These contributions include the partial, correlated and log penalty $\chi^2$, which are discussed in Section 3. In each fit, the ATLAS data favour a low $C_{\bar{d}}$, including in the ATLASepWZ20 fit, where the overall result is a higher $C_{\bar{d}}$. In contrast, the HERA data favour the higher $C_{\bar{d}}$ value exhibited by the ATLASepWZ20 fit. The $V$ + jets data provide sufficient constraining power in addition to the inclusive $W$ and $Z$ data to dominate the result and tightly constrain the $C_{\bar{d}}$ parameter to a low value, while the ATLASepWZ20 fit lacks the necessary information.
Figure 4: PDFs multiplied by Bjorken $x$ at the scale $Q^2 = 1.9~\mathrm{GeV}^2$ as a function of Bjorken $x$ obtained for the (a)-(b) valence quarks and (c)-(d) up and down sea quarks when fitting $W$ + jets, $Z$ + jets, inclusive $W$ and $Z$, and HERA data (ATLASepWZVjet20, blue bands), compared with a similar fit without $W$ + jets or $Z$ + jets data (ATLASepWZ20, green bands). Inner error bands indicate the experimental uncertainty, while outer error bands indicate the total uncertainty, including parameterisation and model uncertainties. The relative uncertainties around the nominal value of each PDF centred on 1 are displayed in the bottom panel in each case.
Figure 5: PDFs multiplied by Bjorken $x$ at the scale $Q^2 = 1.9~\mathrm{GeV}^2$ as a function of Bjorken $x$ obtained for the (a) strange sea quark, (b) gluon, (c) the total of the down-type quarks and (d) the total of the anti-down-type quarks when fitting $W$ + jets, $Z$ + jets, inclusive $W$ and $Z$, and HERA data (ATLASepWZVjet20, blue bands), compared with a similar fit without $W$ + jets or $Z$ + jets data (ATLASepWZ20, green bands). Inner error bands indicate the experimental uncertainty, while outer error bands indicate the total uncertainty, including parameterisation and model uncertainties. The relative uncertainties around the nominal value of each PDF centred on 1 are displayed in the bottom panel in each case.

Strange-quark density
The fraction of the strange-quark density in the proton can be characterised by the quantity $R_s$, defined as the ratio $R_s = (s + \bar{s})/(\bar{u} + \bar{d})$, which uses the sum of $\bar{u}$ and $\bar{d}$ as a reference point for the strange-sea density.
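As a worked illustration of the ratio $R_s = (s + \bar{s})/(\bar{u} + \bar{d})$, the sketch below evaluates it from PDF values at a single $x$ and $Q^2$. The function name and the input values are hypothetical; the common factor of $x$ cancels in the ratio, so $x$-weighted densities can be passed directly.

```python
def strange_ratio(x_s, x_sbar, x_ubar, x_dbar):
    """R_s = (s + sbar) / (ubar + dbar), evaluated from x-weighted densities."""
    denominator = x_ubar + x_dbar
    if denominator <= 0.0:
        raise ValueError("the light-antiquark sum must be positive")
    return (x_s + x_sbar) / denominator


# Hypothetical densities at one (x, Q^2) point, with s = sbar as assumed in the fit.
unsuppressed = strange_ratio(0.10, 0.10, 0.10, 0.10)   # strange equal to ubar and dbar
suppressed = strange_ratio(0.05, 0.05, 0.10, 0.10)     # strange at half the light sea
```

A value near one therefore corresponds to a strange sea of the same size as the up and down seas, and a value near 0.5 to the suppression assumed in fits driven by neutrino dimuon data.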
Before the first LHC precision $W$ and $Z$ boson data, it was widely assumed, motivated by previous analyses of dimuon production in neutrino scattering [43][44][45][46][47], that the strange sea-quark density is suppressed uniformly relative to the up and down sea over the full range of $x$. Best fits to these neutrino scattering data resulted in a value of $R_s \sim 0.5$ at $Q^2 = 1.9~\mathrm{GeV}^2$ [2][3][4][48].
The QCD analysis of the inclusive $W$ and $Z$ measurements by ATLAS which formed the ATLASepWZ16 PDF set led to the observation that strangeness is unsuppressed at low $x$ ($\lesssim 0.023$) for $Q^2 = 1.9~\mathrm{GeV}^2$. This was the case for the ATLASepWZ16 fit for every parameterisation variation used. Furthermore, a Hessian profiling exercise of the global PDFs MMHT14 [4] and CT14 [49] demonstrated that the data constrain and increase the ratio of the strange to the total up and down sea [12]. Although profiling the PDFs does not necessarily give the same result as including the data in a fit, this effect is indeed found when the data are added to the CT18 fit, resulting in the CT18A set of PDFs [3]. It is therefore of particular interest to check the impact of the new $V$ + jets data on the strange-quark density.
The $R_s$ distribution plotted as a function of $x$ evaluated at $Q^2 = 1.9~\mathrm{GeV}^2$ is shown in Figure 8, with a comparison between ATLASepWZVjet20 and ATLASepWZ20 showing the direct effect of the $V$ + jets data, and with the experimental, model and parameterisation uncertainties of ATLASepWZVjet20 shown separately. The effect of the $V$ + jets data is most significant in the kinematic region $x > 0.02$, where the uncertainty is significantly reduced. Whereas the $R_s$ distribution of the ATLASepWZ20 PDFs maintained an unsuppressed strange-quark density over a wide range in $x$, the ATLASepWZVjet20 PDFs exhibit an $R_s$ distribution falling from near-unity at $x \sim 0.01$ to approximately 0.5 at $x = 0.1$, driven by the increase in the high-$x$ $x\bar{d}$ PDF and the complementary decrease in the high-$x$ $x\bar{s}$ PDF shown in Section 4.1. At low $x \lesssim 0.023$ and $Q^2 = 1.9~\mathrm{GeV}^2$, the fit with the $V$ + jets data maintains an unsuppressed strange-quark density compatible with the ATLASepWZ16 fit. Fitted values of $R_s$, evaluated at $x = 0.023$ and $Q^2 = 1.9~\mathrm{GeV}^2$, are given in Table 3. At the scale $Q^2 = 1.9~\mathrm{GeV}^2$, this corresponds to $x = 0.023$ through DGLAP evolution.

Comparison with global PDFs
The ATLASepWZVjet20 $R_s$ distribution is shown in Figure 9 in comparison with the global PDF sets ABMP16 [2], CT18, CT18A [3], MMHT14 [4] and NNPDF3.1 [5]. An additional comparison in the figures is made with a recent update of the NNPDF3.1 fit with some additional data including the full ATLAS 7 TeV data set, labelled NNPDF3.1_strange [48]. Tension between the ATLASepWZVjet20 fit and the global analyses is reduced compared to the ATLASepWZ16 and ATLASepWZ20 PDF sets, but persists to multiple standard deviations in the range $10^{-2} \lesssim x \lesssim 10^{-1}$ for the global analyses which do not use the full ATLAS 7 TeV data set. This is highlighted in summary plots of $R_s$ evaluated at $x = 0.023$, $Q^2 = 1.9~\mathrm{GeV}^2$ and at $x = 0.013$, $Q^2 = m_Z^2$ in Figure 10. Better agreement is observed with the CT18A PDF set, which includes both the data used in the CT18 fit and the ATLAS 7 TeV data, although tension remains with the NNPDF3.1_strange PDF set, which also uses these data. At high $x \gtrsim 0.02$, the $R_s$ distribution of the ATLASepWZVjet20 fit falls more steeply than the distribution in global analyses and is approximately zero at $x \gtrsim 0.2$. The uncertainties of the NNPDF sets are large at $x > 0.3$ where the data give no constraint.
In Figure 11 the extracted $x(\bar{d} - \bar{u})$ distribution at $Q^2 = 1.9~\mathrm{GeV}^2$ is shown in comparison with the results of the latest global PDF sets, all of which use E866 data. The ATLASepWZVjet20 PDF set is consistent with these global PDF sets up to $x \sim 0.1$, but deviates from them for $x > 0.1$, where the $W$ + jets and $Z$ + jets data are most sensitive and demonstrate a preference for a higher $x\bar{d}$ distribution as discussed in Section 4.1. A new result from the SeaQuest/E906 Collaboration has recently become available [51], which may also be in tension with the E866 data. Whereas the $R_s$ distribution of the CT18A and NNPDF3.1_strange fits is affected by the ATLAS data and the tension between these and the ATLAS fits is reduced, as shown in Figures 9 and 10, this is not replicated in the $x(\bar{d} - \bar{u})$ distribution in either case.
Figure 9: The $R_s = (s + \bar{s})/(\bar{u} + \bar{d})$ distribution evaluated at $Q^2 = 1.9~\mathrm{GeV}^2$ as a function of Bjorken $x$, for the ATLASepWZVjet20 PDF set in comparison with global PDFs (a) ABMP16 and CT18, (b) MMHT14 and NNPDF3.1, and in additional comparisons with (c) CT18 and CT18A, and (d) NNPDF3.1 and NNPDF3.1_strange [2][3][4][5][48]. The experimental and total uncertainty bands are plotted separately for the ATLASepWZVjet20 results. Each global PDF set is taken at $\alpha_\mathrm{s}(m_Z) = 0.1180$ except for ABMP16 which uses the fitted value $\alpha_\mathrm{s}(m_Z) = 0.1147$. All global PDF uncertainty bands are at 68% confidence level, evaluated for the CT18 PDFs through scaling by 1.645 as recommended by the PDF4LHC group [50].
[Figure caption, beginning not recovered] ... [2][3][4][5][48], and the ATLASepWZ16 and ATLASepWZ20 sets. The experimental, model and parameterisation uncertainty bands are plotted separately for the ATLASepWZVjet20 results. Each global PDF set is taken at $\alpha_\mathrm{s}(m_Z) = 0.1180$ except for ABMP16 which uses the fitted value $\alpha_\mathrm{s}(m_Z) = 0.1147$. All uncertainty bands are at 68% confidence level, evaluated for the CT18 PDFs through scaling by 1.645 as recommended by the PDF4LHC group [50].

Conclusion
This paper presents the impact of measurements, performed by the ATLAS experiment at the LHC, of vector-boson production in association with at least one jet on the parton distribution functions of the proton, resulting in a new ATLASepWZVjet20 PDF set. The $V$ + jets data were obtained from $pp$ collisions at $\sqrt{s} = 8~\mathrm{TeV}$ corresponding to approximately $20~\mathrm{fb}^{-1}$ of integrated luminosity. The data were fitted along with the data sets used for the previous ATLASepWZ16 fit, i.e. the full combined inclusive data set from HERA and the ATLAS inclusive $W$ and $Z$ production data recorded at $\sqrt{s} = 7~\mathrm{TeV}$. For the new ATLASepWZVjet20 PDF set, correlations between all significant systematic uncertainties across different data sets were considered.
The resulting PDF set is similar to the ATLASepWZ16 set for the up-type quarks and gluon. The down and strange sea-quark distributions exhibit significantly smaller experimental and parameterisation uncertainties at high Bjorken $x$. As a result, the ratio of the strange-quark to light-quark densities, $R_s$, is better constrained and found to fall more steeply at high $x$. The $x(\bar{d} - \bar{u})$ difference is positive, in better agreement with the global PDF analyses which use E866 Drell-Yan data up to $x \sim 0.1$, but differs at higher values of $x$ by up to two standard deviations. At low $x \lesssim 0.023$, the fit confirms the unsuppressed strange PDF as observed in the ATLASepWZ16 PDF set, while it maintains a positive $x(\bar{d} - \bar{u})$ distribution at high $x$.
(Taiwan), RAL (UK) and BNL (USA), the Tier-2 facilities worldwide and large non-WLCG resource providers. Major contributors of computing resources are listed in Ref. [52].

A Correlations between data sets
The correlation model used for the ATLAS data is summarised in Table 4, where the labels used are the same as those in the HEPData entries of the respective ATLAS $W$ + jets [17,53] and $Z$ + jets [18,54] publications.
Table 4: Correlation model for the systematic uncertainties of the ATLAS measurements of $W$ + jets and $Z$ + jets at 8 TeV and inclusive $W$ and $Z$ at 7 TeV. Each row corresponds to one source of systematic uncertainty treated as fully correlated both within and across data sets. The respective ATLAS publication describing each of these sources in detail is given. Sources in different rows are uncorrelated with each other. Each source is reported with the label of the systematic uncertainty used in the respective data set. Where entries are omitted, that systematic uncertainty either does not exist for that data set (denoted by a '-') or it was left decorrelated from the others (denoted by a '*'). For the row of source "$E_{\mathrm{T}}^{\mathrm{miss}}$ resolution", where the "MetRes" label in one column corresponds to "MetResLong" and "MetResTrans" labels in the other, this indicates that the "MetResLong" and "MetResTrans" uncertainties were combined in quadrature for one-to-one correlation. Additional uncertainties not reported are treated as uncorrelated between different data sets.
Systematic uncertainties related to the jet energy scale, jet energy resolution, rejection of jets from pile-up (JVF), missing transverse momentum ($E_{\mathrm{T}}^{\mathrm{miss}}$) scale, $E_{\mathrm{T}}^{\mathrm{miss}}$ resolution, electron energy scale and electron energy resolution are taken to have a one-to-one correlation between the 8 TeV data sets. Additionally, systematic uncertainties related to electron efficiency scale factors (trigger, reconstruction and isolation) as well as luminosity are considered fully correlated between data sets of the same centre-of-mass energy, but are uncorrelated between 7 TeV and 8 TeV data.
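The quadrature combination of the "MetResLong" and "MetResTrans" uncertainties mentioned in the caption of Table 4 can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the fit: the function name and the per-bin numbers are hypothetical, and the combined component is taken to be a positive magnitude, so any sign information carried by the individual sources is discarded.

```python
import math


def combine_in_quadrature(*sources):
    """Merge several per-bin uncertainty sources into a single component.

    Each argument is a list with one uncertainty per bin; all lists must
    have the same length. The result is sqrt(sum of squares) in each bin.
    """
    n_bins = len(sources[0])
    if any(len(src) != n_bins for src in sources):
        raise ValueError("all sources must cover the same bins")
    return [
        math.sqrt(sum(src[i] ** 2 for src in sources))
        for i in range(n_bins)
    ]


if __name__ == "__main__":
    # Hypothetical relative uncertainties (in percent) for two bins.
    met_res_long = [3.0, 0.0]
    met_res_trans = [4.0, 1.0]
    print(combine_in_quadrature(met_res_long, met_res_trans))
```

The single merged component can then be matched one-to-one with a data set that provides only one label for the same physical source.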
The systematic uncertainty related to the top-background cross section in the $W$ + jets data is taken as correlated with the top-quark pair production cross-section uncertainty in the $Z$ + jets data, as this is the largest of the top-quark-related background contributions in both data sets. Similarly, the systematic uncertainty related to the total diboson cross section in the $W$ + jets data is taken as correlated with the systematic uncertainty in the $Z$ + jets data related to the production of two bosons, as this is the highest background contribution. In contrast, systematic uncertainties related to the multijet background are estimated independently in each measurement, and are therefore left uncorrelated. The two systematic uncertainties in each of the $W$ + jets and $Z$ + jets spectra related to unfolding (labelled in HEPData as "UnfoldOtherGen" and "UnfoldReweight" in the $W$ + jets data, and "ATL_unfold_Data" and "ATL_unfold_MC" in the $Z$ + jets data) are fully decorrelated between spectra and between bins within a single spectrum (in addition to the aforementioned statistical uncertainties), as they contain a large statistical component in both data sets owing to MC simulation statistics. Treating this source of uncertainty as correlated between all $W$ + jets bins, for example, increases the $\chi^2$ by approximately 200 for 30 data points, despite insignificant changes to the resulting PDFs. Two systematic uncertainties in the $W$ + jets data set related to the $E_{\mathrm{T}}^{\mathrm{miss}}$ resolution are summed into a single component for one-to-one correlation with the $E_{\mathrm{T}}^{\mathrm{miss}}$ resolution systematic uncertainty in the 7 TeV data set. A further ten systematic uncertainties in the $V$ + jets data correspond to a single component related to the jet energy scale in the 7 TeV data; it is preferred to keep the more detailed model of ten sources in the 8 TeV data as they have a significant impact on the $V$ + jets measurements and the impact of the single source in the 7 TeV data is small.
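To illustrate why the choice between a correlated and a decorrelated treatment of a statistically dominated source can change the $\chi^2$ substantially, the toy sketch below compares the two treatments for a single systematic source. It is not the fitting machinery used for the ATLASepWZVjet20 analysis: the function names and numbers are hypothetical, and the model assumes a covariance matrix $C = \mathrm{diag}(\sigma_i^2) + \beta\beta^{T}$ for the correlated case, inverted analytically with the Sherman-Morrison formula.

```python
def chi2_decorrelated(residuals, stat, syst):
    """Chi2 with the source added in quadrature to each bin independently."""
    return sum(
        r * r / (s * s + b * b)
        for r, s, b in zip(residuals, stat, syst)
    )


def chi2_correlated(residuals, stat, syst):
    """Chi2 with the source treated as fully correlated across bins.

    Uses C = diag(stat^2) + syst syst^T and the Sherman-Morrison identity:
    r^T C^-1 r = A - B^2 / (1 + D), with
    A = sum r^2/stat^2, B = sum r*syst/stat^2, D = sum syst^2/stat^2.
    """
    a = sum(r * r / (s * s) for r, s in zip(residuals, stat))
    b = sum(r * beta / (s * s) for r, s, beta in zip(residuals, stat, syst))
    d = sum(beta * beta / (s * s) for s, beta in zip(stat, syst))
    return a - b * b / (1.0 + d)


if __name__ == "__main__":
    # Hypothetical residuals (data minus theory) that scatter in sign,
    # as expected when the source is dominated by statistical noise.
    residuals = [1.0, -1.0, 1.0, -1.0]
    stat = [1.0, 1.0, 1.0, 1.0]
    syst = [1.0, 1.0, 1.0, 1.0]
    print("decorrelated:", chi2_decorrelated(residuals, stat, syst))
    print("correlated:  ", chi2_correlated(residuals, stat, syst))
```

In this toy, a fully correlated source can only absorb a coherent shift of all bins, so residuals that alternate in sign are penalised more strongly than in the decorrelated treatment.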
Additionally, two systematic uncertainties related to pile-up dependence of the jet energy scale, one systematic uncertainty related to jet energy resolution and one further related to the $E_{\mathrm{T}}^{\mathrm{miss}}$ scale, are also correlated between 7 TeV and 8 TeV data sets, for a total of five correlated components. Cross-checks have been performed demonstrating that alternative models, for example partially correlating the luminosity uncertainties between 7 and 8 TeV data or leaving all systematic uncertainties uncorrelated between 7 and 8 TeV data and using combined 7 TeV $W$ and $Z$ data, provide similar resulting PDFs.