Parton distributions in the SMEFT from high-energy Drell-Yan tails

The high-energy tails of charged- and neutral-current Drell-Yan processes provide important constraints on the light quark and anti-quark parton distribution functions (PDFs) in the large-x region. At the same time, short-distance new physics effects such as those encoded by the Standard Model Effective Field Theory (SMEFT) would induce smooth distortions to the same high-energy Drell-Yan tails. In this work, we assess for the first time the interplay between PDFs and EFT effects for high-mass Drell-Yan processes at the LHC and quantify the impact that the consistent joint determination of PDFs and Wilson coefficients has on the bounds derived for the latter. We consider two well-motivated new physics scenarios: $1)$ electroweak oblique corrections $(\hat W, \hat Y)$ and $2)$ four-fermion interactions potentially related to the LHCb anomalies in $R(K^{(*)})$. We account for available Drell-Yan data, both from unfolded cross sections and from searches, and carry out dedicated projections for the High-Luminosity LHC. Our main finding is that, while the interplay between PDFs and EFT effects remains moderate for the current dataset, it will become a significant challenge for EFT analyses at the HL-LHC.


Introduction
The Large Hadron Collider (LHC) at CERN is currently leading the exploration of the high-energy frontier. Thanks to a wealth of precision experimental measurements, combined with the staggering progress in precise theoretical calculations and simulations, the Standard Model (SM) of particle physics is being stress-tested at high energies to an unprecedented level. The hunt for deviations from the SM predictions, which could signal the long-sought evidence for beyond-the-SM (BSM) dynamics, is being pursued at the LHC with two complementary strategies. On the one hand, by means of direct searches for new particles that might be light enough to be produced on shell. On the other hand, with indirect searches aiming to identify a consistent pattern of deviations in the properties of known particles, such as their coupling strengths, which would be induced by new particles and interactions whose characteristic energy scale lies above the direct reach of the LHC.
In the case of direct searches, the target is identifying an abrupt deviation from the SM, for example in terms of a resonance peak on top of a smoothly falling background, or with a systematic mismatch in the number of measured events as compared to the SM expectations for non-resonant searches. Indirect searches instead often aim at pinning down subtle distortions with respect to the SM predictions, such as a few-percent variation in the value of some production cross sections or decay rates, induced by yet-unknown particles and interactions. A powerful framework to identify, parametrise, and correlate in a model-independent manner the possible deviations from the SM predictions that arise in such indirect searches is provided by the effective field theory (EFT) framework, such as the Standard Model EFT (SMEFT) (e.g. [1][2][3][4][5][6][7][8][9][10][11][12][13][14]) or the Higgs EFT (HEFT) (e.g. [15][16][17][18][19][20][21][22]).
It is a common misconception that indirect BSM investigations based on EFTs are relevant only for low-energy phenomena, while instead direct searches dominate the sensitivity for high-p T observables. On the contrary: accurate measurements of observables at high energies offer one of the most promising avenues towards a discovery of BSM physics at the LHC. While the collider has now (almost) reached the design energy, its integrated luminosity continues to grow steadily, thus greatly facilitating dedicated studies of the (currently statistics-limited) high-energy tails of distributions. With this motivation, several high-p T LHC cross sections have been studied as indirect probes of new physics in the EFT framework, from the (di)lepton distributions in neutral-and charged-current Drell-Yan (DY) distributions  (the main subject of this work) to inclusive jets and dijets [49][50][51][52], top quark pair distributions [53][54][55][56][57][58][59][60][61], gauge boson pair production [61][62][63][64][65][66][67], and vector boson scattering [67][68][69][70], among several others.
Furthermore, many higher-dimensional EFT operators induce deviations from the SM which grow with the energy of the partonic collision, enhancing the BSM sensitivity of these high-energy tails. For instance, in the Drell-Yan process, the naïve scaling of the ψψ → ψψ scattering amplitude in an underlying EFT description leads to A ∝ E 2 /Λ 2 , where E is the energy of the process and Λ is the new physics scale. Thus, rather generically, a less precise measurement of a high-mass tail can compete with a low-energy precision measurement due to this energy enhancement, which can be traced back to the preservation of unitarity. This property leads to the somehow unintuitive result that the study of the high-energy lepton tails in the Drell-Yan process represents a competitive probe of BSM dynamics compared to electroweak precision tests and low-energy flavour physics test.
In order for the EFT interpretation of these high-p T cross sections to convincingly uncover a BSM effect, it becomes crucial to ensure full control over the SM inputs and their uncertainties, such as those associated to the hard-scattering (partonic) cross sections or the parton distribution functions (PDFs) of the nucleon [71]. In this context, one of the key challenges hampering the applicability of the EFT programme to the LHC high-energy tails is indeed related to the treatment of the PDFs. These are determined from experimental data assuming the validity of the SM, often using exactly the same processes that enter EFT analyses. This observation naturally prompts the question of how can one make sure that eventual BSM deviations arising in high-p T tails are not being inadvertedly reabsorbed into the PDFs.
A first take on this challenge was presented in a proof-of-concept study [72] where the simultaneous determination of PDFs and EFT coefficients from deep inelastic scattering (DIS) structure functions was demonstrated. There, it was found that the EFT corrections can indeed be partially reabsorbed into the PDFs but also that it is possible to robustly disentangle QCD and BSM effects by exploiting their different energy scaling. The main goal of the present work is to extend this approach to LHC processes, specifically with the joint determination of PDFs and EFT coefficients from DIS and Drell-Yan data. Drell-Yan processes in general, and high-mass measurements in particular, provide information on the light quark and anti-quark PDFs in a broad region of x representing an important ingredient in modern global PDF fits [73][74][75][76]. Furthermore, high-mass Drell-Yan data will be instrumental at the High-Luminosity LHC (HL-LHC) to pin down the large-x PDFs [77]. Considering that EFT signals can lead to significant deviations from the SM in these same high-energy DY tails, one would like to assess to what extent they can be reabsorbed into the PDFs and to define strategies to separate QCD from BSM effects.
In this work, in order to interpret the Drell-Yan data in the EFT framework we formulate simple, yet motivated, BSM benchmark scenarios, which are chosen to represent a wide class of UV-complete theories. In the first scenario [32], we consider theŴ andŶ electroweak parameters generated in universal theories that modify the electroweak gauge boson propagators and lead to flavour-universal deviations which grow with the invariant mass. In the second benchmark [27], we consider a flavour-specific scenario motivated by the evidence of lepton flavour universality (LFU) violation in B-meson decays recently reported by the LHCb collaboration [78][79][80][81]. Our analysis accounts for available unfolded high-mass Drell-Yan cross section data, detector-level searches based on the full Run II luminosity, and dedicated HL-LHC projections. We do not consider other data beyond the DIS and DY processes, in order to ensure a theoretically consistent description of the parton distributions in the SMEFT. The price to pay for this reduced dataset is the information loss on some flavour combinations, specifically on the gluon PDF.
The structure of this paper is as follows. First of all, in Sect. 2 we discuss the EFT benchmark scenarios that will be used in the fits. Then in Sect. 3 we summarise the datasets used in our analysis and the corresponding theoretical calculations, both in the SM and in the SMEFT, and we discuss their impact on a PDF fit. We also describe the methodology we use to simultaneously fit PDFs and SMEFT coefficients, and how we deal with several sources of uncertainties in the fits. In Sect. 4 we present the results for the simultaneous determination of the SMEFT coefficients and the PDFs from the available high-mass DY data from LHC Run I and Run II, in the two scenarios presented in Sect. 2 and assess how they modify the interpretation of BSM searches based on the SM PDFs. In Sect. 5 we present a summary of the constraints we find on the two scenarios we consider and we assess the outcome of a joint PDF and EFT analysis at the HL-LHC. Finally, we will outline our main conclusions and possible future developments in Sect. 6.
More technical discussions are collected in the appendices, and include detailed comparisons of the SM PDF fits produced in this work with previous NNPDF global fits (App. A), the quantitative assessment of the fit quality to the various input datasets (App. B), a benchmarking study for the calculation of EFT cross sections (App. C), and a study of the flavour dependence of the SMEFT PDFs (App. D).

SMEFT benchmark scenarios
In this section we present the two SMEFT benchmark scenarios that will be used in this work to interpret the LHC Drell-Yan processes. The first scenario belongs to the class of electroweak precision tests and is sensitive to a broad range of UV-complete theories proposed in the literature. The second benchmark represents a consistency check of the existing hints of lepton universality violation in rare B-meson decays reported by the LHCb collaboration. Both scenarios highlight the interplay between the PDFs and the EFT dynamics, illustrating in particular how the former changes and how constraints to the latter are modified.

Benchmark I: oblique correctionsŴ andŶ
The oblique corrections, as originally proposed in [82,83], play a key role in testing theories beyond the Standard Model. They parametrise the self-energy Π V (q 2 ) of the electroweak gauge bosons W a µ and B µ , where V = W 3 W 3 , BB, W 3 B, and W + W − . Truncating the momentum expansion at order q 4 , while imposing proper normalisation and symmetry constraints, one concludes that there are only four oblique parameters which can be identified with dimension-six operators in the SMEFT. These are the well-knownŜ,T ,Ŵ , andŶ parameters [84]. The parametersŜ andT are well constrained from precision LEP measurements [84] and grow slowly with q 2 , whileŴ andŶ scale faster implying that their effects will be enhanced for the high-energy dilepton tails at the LHC [32]. Whilê T = O(q 0 ) andŜ = O(q 2 ) , instead one has thatŴ ,Ŷ = O(q 4 ). In the universal basis (see e.g. [12]), theŴ andŶ parameters are the Wilson coefficients associated to the following two operators: where m W indicates the W -boson mass, and D ρ is the covariant derivative. The physical effects of the operators in Eq. (2.1) on the Drell-Yan process arise from the difference in the propagators through the self-energy modifications, see Eq. (1) of Ref. [32]. Alternatively, using the equations of motion, the same operators can be rotated to a basis in which the modifications to the Drell-Yan cross sections are instead captured by four-fermion contact interactions, where J L and J Y are SU (2) L and U (1) Y conserved fermionic currents, Here q, l are the SM quark and lepton left-handed doublets, while u, d, e are the righthanded singlets. Also, g and g Y are the corresponding electroweak gauge couplings, while σ a are the Pauli matrices, and the hypercharge Y f = 1/6, −1/2, 2/3, −1/3, and −1 for q, l, u, d, e, respectively. Summation over flavour indices is assumed, which implies that in this scenario the fermionic currents respect the U (3) 5 global flavour symmetry. Expanding Eq. (2.2), one can relate theŴ andŶ parameters to the coefficients of dimension-six operators in the Warsaw basis [5]. There, the operators relevant to the description of the Drell-Yan process are given by lq = (lσ a γ µ l)(qσ a γ µ q) .

(2.4)
Again, the flavour indices are contracted within the brackets, for example (lγ µ l) ≡ (l 1 γ µ l 1 + l 2 γ µ l 2 +l 3 γ µ l 3 ). Taking into account this matching between theŴ andŶ parameters and the corresponding Wilson coefficients in the Warsaw basis, we can express the SMEFT Lagrangian in this scenario, Eq. (2.2), as follows The parametrisation in Eq. (2.4) has been implemented using the SMEFTsim package [85] and cross-checked against the reweighting method used in Ref. [27] (see also [37]), as will be discussed in Sect. 3.2 and in Appendix C. The analysis in the Ref. [32] reports the following 95% confidence level intervals onŴ assumingŶ = 0,Ŵ These bound have been computed by assuming SM PDFs. In our analysis, for this benchmark scenario, we see how the limits based on SM PDFs are modified once a consistent determination of the SMEFT PDFs is done, requiring a simultaneous fit of the PDFs together with theŴ andŶ parameters from the high-mass Drell-Yan distributions.
where v ≈ 246 GeV is the Higgs vacuum expectation value and C U µ ij and C Dµ ij represent matrices of Wilson coefficients. In Eq. (2.8), i, j = 1, 2, 3 indicate quark flavour indices, and we have chosen to focus on those operators that couple the quark fields exclusively to the second lepton family.
The operators highlighted in Eq. (2.8) have received a lot of attention recently in the context of the LHCb anomalies reported in rare B-meson decays [78][79][80][81]. The reason is that the CKM-like flavour structure relates the b → sµ + µ − decays to the neutral-current Drell-Yan process at the LHC p p → µ + µ − [27]. The explicit models which successfully describe the LHCb anomalies, based on the U (2) flavour symmetry and dominant dynamics with the third generation fermions [88][89][90][91], predict that the flavour channel dominating EFT effects in the Drell-Yan production is bb → µ + µ − (see [92] for an explicit model example). The direct bs production channel is suppressed by V ts and is therefore irrelevant. If the observed deviations in R(K ( * ) ) are due to new physics, in this class of models we generically expect |C Dµ 33 | 0.001. The ATLAS dimuon search reported in Ref. [93] is recast in Ref. [27] to set the limit on this scenario. In particular, the reported 95% confidence level interval is For this second benchmark scenario we will assume that, out of the operators listed in Eq. (2.8), only a single Wilson coefficient is allowed to be non-zero. Specifically, we allow C Dµ 33 = 0, while setting to zero all the coefficients of the other four-fermion operators. This assumption implies that in this scenario the SMEFT Lagrangian is reduced to In contrast to the previous benchmark, now the electron channel is SM-like. This feature provides a useful handle to separate PDF and EFT effects in the Drell-Yan process, by using electron data to determine the former and muon data to constrain both. Another difference with respect to the first benchmark is that here the leading new physics effects arise at the dimension-six squared level, since the interference of the operator in Eq. (2.10) with the SM is subleading [27]. (Dijet production is another prominent process relevant for flavour physics [94] that enters into PDF fits. A detail study of the interplay is left for future work.) To summarise, in this second benchmark scenario there is a single non-zero Wilson coefficient, C Dµ 33 . Therefore, the determination of the SMEFT PDFs from Drell-Yan data requires a simultaneous fit of the PDFs together with the C Dµ 33 parameter. One should also note that this operator enters in the description of the DIS neutral-current structure functions via µ b scattering, though this contribution is highly suppressed due to the smallness of the bottom PDFs and the low energy scale probed by DIS data.
3 Experimental data, theory predictions, and fit settings In Sect. 3.1 we present the LHC experimental data that will be used in the present analysis for the simultaneous determination of the PDFs and the EFT coefficients from high-mass Drell-Yan cross sections. We then describe in Sect. 3.2 the corresponding theoretical calculations, both in the SM and in the two SMEFT benchmark scenarios described in Sect. 2. In Sect. 3.3 we discuss the settings of the baseline SM PDF fit and assess the specific impact of the Run I and Run II high-mass Drell-Yan data on PDFs. Finally, in Sect. 3.4 we outline the fitting methodology adopted for the determination of the PDFs in the SMEFT, along with their simultaneous determination with the EFT Wilson coefficients.

Experimental data
The present analysis is based on the DIS and DY measurements which were part of the strangeness study of [95], which in turn was a variant of the NNPDF3.1 global PDF determination [74], extended with additional high-mass DY cross sections. The DIS structure functions include the same legacy HERA inclusive combination [96] used in the DIS-only joint fit of PDF and EFT effects of [72].
No other datasets beyond DIS and DY are considered. In particular, the inclusive jet and top quark production measurements used in [95] are excluded from the present analysis. The rationale behind this choice is the following. SMEFT at dimension-6 level introduces 2499 independent parameters, many of which contribute to the processes used to extract the parton distribution functions. The full PDF fit in the SMEFT (with the consistent power counting in the inverse powers of the new physics scale) is the ultimate future goal of this line of research. Before that, we are forced to make assumptions about the subset of operators and processes involved. The restricted choice of DIS and DY is motivated by the idea that other datasets, such as inclusive jet, could potentially receive corrections from other SMEFT operators, e.g. four-quark operators while being insensitive to the semi-leptonic operators. Including all datasets to effectively determine PDF, while considering one or two operators able to impact a subset of processes, would misrepresent the realistic case.
For the purposes of our study, the DY data can be classified into low-mass, on-shell, and high-mass datasets. Table 3.1 summarises the low-mass and on-shell datasets, where in each case we indicate the experiment, the centre-of-mass energy √ s, the publication reference, the physical observable, and the number of data points. The only difference as compared to [95] is the removal of the W → eν asymmetry measurements from D0 [97], which were found to be inconsistent with the rest of the Drell-Yan data. In Table 3.2 we provide the same information as in Table 3.1 but for the neutral-current high-mass Drell-Yan datasets. In Table 3  distribution is 1D or 2D (thus differential only in the lepton invariant mass or differential in the lepton invariant mass and rapidity), the integrated luminosity L, and the values of the dilepton invariant mass m for the most energetic bin. We note that while the ATLAS and CMS measurements at √ s = 7 TeV [120,121] were already part of the strangeness study of [95], the corresponding 8 TeV and 13 TeV measurements from [86,87,122] were not and are being considered for the first time in this analysis. For those datasets where data are available in terms of both Born and dressed leptons, the ATLAS 7 TeV analysis being an example thereof, we use the Born data so that it is not necessary to supplement   Table 3.1 for the neutral-current high-mass Drell-Yan datasets considered in this work. We also indicate the final-state, whether the distribution is 1D (which are differential in the invariant mass, m , of the final-state leptons) or 2D (which are differential in both the invariant mass of the leptons, m , and in their rapidity, y ), and the values of m for the most energetic bin. Datasets indicated with (*) are used for the first time in this analysis in comparison with [95].
our fixed-order predictions with final-state QED radiation corrections. The CMS 13 TeV data on the other hand are only provided in terms of dressed leptons. In total, there are either 270 or 313 data points in this high-mass category, depending on whether the 13 TeV CMS data are included in the combined channel or in the separate electron and muon channels. From Table 3.2 one can observe that, with the exception of the CMS 13 TeV data, only one specific leptonic final state is available to be used in the fit. For the CMS 13 TeV measurement instead, one can select between the combined channel or the individual electron and muon final states, which are statistically independent. The separate use of the electron and muon channels is potentially beneficial when considering BSM effects that are not lepton-flavour universal. For example, in benchmark scenario II described in Sect. 2, the theoretical predictions for the DY electron data would be those of the SM while those of the muon data should include EFT corrections. On the other hand in the (flavouruniversal)Ŵ andŶ scenario, it is more convenient to include the data from the combined channel, which displays reduced systematic uncertainties.

Theoretical predictions
We now discuss the settings of the theoretical calculations, both in the SM and in the SMEFT. Appendix C contains further information regarding the computation and benchmarking of the SMEFT corrections for both the DIS structure functions, for which the effect in both scenarios is negligible, and the DY cross sections, for which the impact of SMEFT corrections is much more sizeable.
SM cross sections. The SM cross sections are computed at next-to-next-to-leading order (NNLO) in QCD and include next-to-leading order (NLO) EW corrections, the latter being especially significant in the high-mass region relevant for this study. In particular, the DIS reduced cross sections (combinations of structure functions) are evaluated at NNLO in the FONLL-C general-mass variable flavour number scheme [123] with APFEL [124] interfaced to APFELgrid [125]. The Drell-Yan differential distributions are computed using MCFM [126] and MadGraph5 aMC@NLO [127] interfaced to APPLgrid [128] and APFELgrid to generate fast NLO interpolation tables which are then supplemented by bin-by-bin K-factors to account for the NNLO QCD and NLO EW corrections. These K-factors are defined as where ⊗ represents the standard convolution product, dσ pp (d σ ij ) is the short-hand notation for the bin-by-bin hadronic cross section (partonic cross section for partons i, j) differential in m (in case of neutral-current (NC) Drell-Yan) or m T (in case of charged-current (CC) Drell-Yan) and the partonic luminosities L ij are defined as where m = m in the NC case and m = m T in the CC case and are evaluated at NNLO. The QCD and EW K-factors are defined as The NNLO QCD K-factors have been computed using either MATRIX [129] or FEWZ [130] and cross-checked with the analytic computations of [131,132]. The NLO EW K-factors have been evaluated with MadGraph5 aMC@NLO [127]. Eq. (3.4) accounts also for photon-initiated contributions (using the NNPDF3.1QED PDF set [133]) and final-state radiation effects, except when the latter has already been subtracted in the corresponding experimental analysis. Fig. 3.1 displays a comparison between the CMS Drell-Yan distributions at 13 TeV and the corresponding theoretical predictions as a function of the dilepton invariant mass m , separately for the dielectron and dimuon final states. The theory calculations are presented at NLO QCD, NNLO QCD, and NNLO QCD combined with NLO EW corrections, in all cases with NNPDF3.1QED nnlo as 0118 as input PDF set, to illustrate the effect of the K-factors of Eq. (3.3) and (3.4). The CMS data are provided in terms of dressed leptons, and hence final state radiation (FSR) QED effects must be included in the electroweak corrections. Accounting for these effects is essential to improve the agreement between theory and data in the region below the Z-mass peak. NLO electroweak corrections are also important in the high-energy tail in m , where they are driven by the interplay between (negative) virtual EW effects and (positive) photon-initiated contributions.
A quantitative assessment of the agreement between theoretical predictions and experimental data for the high-mass DY datasets listed in Table 3.2 is presented in Table 3.3,  which collects the values of the χ 2 per data point evaluated using the full information on correlated systematics provided by the experimental covariance matrix (3.5) where T i are the theoretical predictions, D i the central value of the experimental data and where the multiplicative uncertainties in the experimental covariance matrix (cov ij ) are treated as explained in [134,135]. One can observe how in general the NNLO QCD corrections are relatively small and that the NLO electroweak ones can be significant, especially for observables presented in terms of dressed leptons (such as the CMS 13 TeV ones) and are required to achieve a good description of the Drell-Yan data in the whole kinematical range available. Note that the input PDF sets used for these calculations include only a subset of these Drell-Yan measurements, in particular only the 7 TeV measurements, for which the data-theory agreement is comparable to the one observed in [74]. The data-theory agreement before including the 8 TeV and 13 TeV data in the PDF fit is generally good, once EW corrections are included, with the exception of the CMS 13 TeV data in the e + e − channel, for which the χ 2 per data point remains above 2. As can be observed in Fig. 3.1, the dielectron invariant mass distribution in this channel presents dips at about 500 GeV and 900 GeV which are not present in the µ + µ − channel. These dips are the origin of this worse data-theory agreement, which is partially reduced once the dataset is included in the fit (see Sect. 3.3). We have verified that excluding this dataset from the fit does not change the results of the analysis, and therefore decided to keep it. Further experimental analysis based on the full Runs II and III datasets will tell whether the dips in the distributions in the electron invariant mass will stay.
SMEFT cross sections. In this work, we augment the SM calculations of the high-Q 2 DIS reduced cross sections discussed in Ref. [72] and the high-mass Drell-Yan cross sections listed in Table 3.2 with the effects of dimension-six SMEFT operators following the two benchmark scenarios presented in Sect. 2. These corrections are negligible for dilepton invariant masses of m ≤ 200 GeV and for DIS structure functions with Q ≤ (120) GeV, and hence there can safely adopt the SM calculations. In a similar manner as for higherorder QCD and EW corrections, we can define correction factors that encapsulate the linear and quadratic modifications induced by the dimension-six SMEFT operators. Adopting an operator normalisation such that with n op indicating the number of operators that contribute to a given benchmark scenario and c n being the (dimensionless) Wilson coefficient associated to O n , the linear EFT corrections can be parametrised as with L NNLO ij being the usual partonic luminosity evaluated at NNLO QCD, d σ ij,SM the binby-bin partonic SM cross section, and d σ ij,SMEFT the corresponding partonic cross section associated to the interference between O n and the SM amplitude A SM when setting c n = 1. Likewise, the ratio encapsulating the quadratic effects is defined as which allow us to express a general Drell-Yan or DIS cross sections accounting for the dimension-six operators in Eq. (3.6) as where the dσ SM is the state-of-the-art SM prediction including NNLO QCD and NLO EW corrections. In this approach, the SMEFT predictions inherit factorisable higher-order radiative correction [27,37]. The SMEFT K-factors in Eq. (3.9) are precomputed before the fit using a reference SM PDF set and then kept fixed. The effect of varying the input NNLO PDF in Eqns. (3.7) and (3.8) is quantitatively assessed in Appendix C and it is found to be at the permil level in Scenario I and slightly more significant but still at most at the percent level in Scenario II. As a result, this effect will be neglected in the following. Further details about the implementation and validation of these K-factors can be found in Appendix C. Fig. 3.2 illustrates the size of the EFT corrections in benchmark scenario I from Sect. 2 by comparing (K EFT −1) with the relative experimental uncertainties for the ATLAS 7 TeV, CMS 8 TeV, and CMS 13 TeV Drell-Yan m distributions. We provide results for two representative points in the (Ŵ ,Ŷ ) parameter space, namely (Ŵ ,Ŷ ) = (10 −3 , 0) and (0, 10 −3 ). One can observe how for these values of (Ŵ ,Ŷ ), and particularly for the ATLAS 8 TeV data, the SMEFT corrections to the Drell-Yan cross sections become comparable with the experimental uncertainties, increasing steadily with m .

Baseline SM PDFs
We now describe the fit settings used to assemble the PDF set that represents the baseline SM PDFs in this work. In other words, the results that we present in this section correspond  to PDFs extracted from the experimental data using the SM theoretical predictions, and then in the next section we will asses how these PDFs are modified once EFT corrections to the DIS and DY cross sections are accounted for in the fit of the PDFs.
The settings for this baseline SM PDF fit are the same as those used in the strangeness study of [95], itself a variant of NNPDF3.1 [74]. As described in Sect. 3.1, in this work we consider only DIS and Drell-Yan datasets, with the latter augmented as compared to [95] with the new high-mass measurements indicated in Table 3.2. A detailed comparison between this baseline SM PDF and the NNPDF3.1 str set from [95] is provided in Appendix A, while Table B.2 in App. B details the breakdown of the χ 2 for all the datasets that enter the fit. In general, the fit quality of the baseline SM PDF set is similar to that of the global fit of [95], although the description of the CMS 13 TeV invariant mass distribution in the combined electron and muon channels remains sub-optimal. Fig. 3.3 displays a comparison between this baseline SM PDF set, labelled "DIS+DY", with the same fit but without any datapoints from the high-mass DY datasets listed in Table 3.2, labelled "DIS+DY (no HM)". We show results at Q = 100 GeV both for the PDFs normalised to the central value of the baseline and for the relative PDF uncertainties. In the latter case we also display the PDF uncertainties from a corresponding DIS-only fit. The latter comparison shows that the DY cross sections significantly reduce the PDF uncertainties of the DIS-only fit. The addition of the high-mass DY data leads to a visible  Comparison between the baseline SM PDF set of this work, labelled "DIS+DY", with the corresponding fit without high-mass DY data. We show results at Q = 100 GeV for PDFs normalised to the central value of the baseline (upper) and for the relative PDF uncertainties (lower panels). In the latter case, we also display the PDF uncertainties from the DIS-only fit. uncertainty reduction in the 0.005 x 0.3 region as compared to the "DIS+DY(noHM)" reference as well an upwards shift of the up and down quarks and antiquark PDF.
We therefore find that the available high-mass DY data can have an appreciable impact on the light quark and antiquark PDFs, despite the fact that in terms of Run II data our analysis is restricted to a single low-luminosity high-mass DY dataset. Yet more stringent constraints on the PDFs are expected from the measurements based on the full Runs II and III datasets, as well as from those to be provided by the HL-LHC [77]. We study the anticipated impact of the HL-LHC measurements in Sect. 5.

Methodology for the simultaneous PDF and EFT fits
Let us denote by c = (c 1 , c 2 , . . . , c Nop ) the array containing the Wilson coefficients associated to the N op dimension-six operators contributing to a given SMEFT scenario, where c n are defined as in Eq. (3.6). For each point c i in the scan of the EFT parameter space, we evaluate the Drell-Yan and the DIS cross sections as described in Sect. 3.2. Subsequently, we determine the best-fit PDFs associated to c i by means of the standard NNPDF methodology, which determines the minimum of the χ 2 in the space of the PDF parameters (subject to cross-validation, to avoid overlearning). We note that this χ 2 , defined in Eq. (3.5), keeps fully into account the experimental systematic correlations among all the measurements D i included in the PDF analysis.
This procedure results in a sampling of the χ 2 values in the EFT parameter space, which we denote by χ 2 eftp (c i ) (as in: EFT-PDFs). Alternatively, one could also evaluate the same DIS and DY cross sections using instead the baseline SM PDF set, ending up with χ 2 values which we denote by χ 2 smp (c i ) (as in: SM-PDFs). The comparison between the resulting bounds on the EFT coefficients obtained from χ 2 eftp (c i ) and from χ 2 smp (c i ) quantifies the relevance of producing consistent joint determinations of PDFs and Wilson coefficients when studying EFTs in high-energy tails. This strategy follows the one adopted in our proof-of-concept DIS-only study [72], now extended to LHC processes.
Close enough to a local minimum χ 2 0 = χ 2 c (0) associated with best-fit values c (0) , the χ 2 as a function of the EFT coefficients can be approximated by a quadratic form with H nm being the usual Hessian matrix in the EFT parameter space. Restricting the EFT calculations to their linear, O Λ −2 , contributions, Eq. (3.11) becomes exact in the case of χ 2 smp (c i ) (where cross sections are evaluated with SM PDFs). The reason is that in this case all dependence on the EFT coefficients is encoded in the partonic cross sections.
However, this is not true for χ 2 eftp (c i ), since now there will be a (non-linear) EFT back-reaction onto the PDFs and hence Eq. (3.11) is only valid up to higher orders in the EFT expansion, even if the EFT cross sections themselves are evaluated in the linear approximation. Eq. (3.11) can thus be only considered a reasonable approximation in the case that the SMEFT PDFs are not too different from their SM counterparts.
Hence, if we work with linear EFT calculations, provided the sampling in the EFT parameter space is sufficiently broad and fine-grained, and that the EFT-induced distortion on the PDFs is moderate, we can extract the parameters χ 2 0 and c (0) and the Hessian matrix H using least-squares regression from Eq. (3.11), using χ 2 smp for the SM PDFs and χ 2 eftp for the SMEFT PDFs. The associated confidence level contours are determined by imposing where this constant depends on the number of degrees of freedom. For linear EFT twoparameter fits, such as those for benchmark scenario I in the context of HL-LHC projections, imposing Eq. (3.12) leads to elliptic contours in the (Ŵ ,Ŷ ) plane.
In the case of fits to χ 2 profiles obtained from EFT calculations which include both linear, O Λ −2 , and quadratic, O Λ −4 , contributions, such as those arising in benchmark scenario II, rather than working in the Hessian approximation we instead carry out a (onedimensional) quartic fit of the form with the χ 2 values being χ 2 smp (χ 2 eftp ) for the SM (SMEFT) PDFs, and then determine confidence level intervals by imposing ∆χ 2 (c) = χ 2 i (c) − χ 2 0 = constant. We determine this constant numerically by finding the likelihood contour, L ∝ exp(−χ 2 /2), containing 95% of the total probability (for the 95% CL intervals).
To conclude this section, we give details on how we account for PDF uncertainties and the statistical uncertainty associated to the finite replica sample of the NNPDF Monte Carlo sets that we use here.
PDF uncertainty. In Sects. 4.1, 4.2 and 5.3 we will present bounds on the EFT parameters using the SM PDFs with and without the PDF uncertainties being accounted for. In order to estimate these, we follow the procedure detailed above to determine the confidence level intervals for the EFT parameters but now using the kth Monte Carlo replica of the PDF set, rather than the central replica k = 0 as done when PDF uncertainties are neglected. One ends up with N rep values of the upper and lower bounds: and then the outermost bounds in the 68% envelope are considered to be the bounds on the EFT parameters c, now including the 1σ-PDF uncertainty. This is very important to account for, given that in the case of the bounds determined using χ 2 eftp , the PDF uncertainty is already included by construction, given that the Wilson coefficients are determined from the global set of PDFs, exactly as in the case of the α s determination from a global set of PDFs of [136,137]. A more sophisticated way to extract parameters such as α s of the Wilson coefficients from a global fit of PDFs, that includes the correlations between these parameters and the PDFs, is given by the correlated replica method proposed in the more recent α s determination in [138]. The latter would allow better accounting of the correlations between Wilson coefficients and PDFs. However we do not use it here due to the fact that these correlations of the PDFs with the Wilson coefficients are much smaller than those with the strong coupling constant and due to its large computational cost. We endeavour to address this issue in future work.
Methodological uncertainty. In a simultaneous fit of PDFs and EFT coefficients, for each set of Wilson coefficients c i one has a PDF fit composed of N rep Monte Carlo replicas.
The major methodological uncertainty is associated to finite-N rep effects can be estimated by bootstrapping across the replicas, as explained in the α s (m Z ) extraction of [138]. Specifically, for each value of c i we perform N res re-samples of all N rep replicas with replacement, and compute the theory predictions: such that there are N res re-samples each composed of an N rep -sized array of theory predictions. Since this re-sampling is done with replacement, it differs from the original sample in that it contains duplicates and missing values. The average theory prediction is then obtained for each of these bootstrapped sets: These bootstrapped theory predictions T i,l are used to evaluate the χ 2 to data, with the finite-size uncertainty given by the standard deviation across each bootstrap re-sample: A value of N res 10 4 re-samples is found to be sufficient to achieve stable results for the estimate of the finite-size uncertainties defined by Eq. (3.17).

Results
In this section, we start by presenting results for the SMEFT PDFs extracted from DIS and Drell-Yan data in benchmark scenario I. We compare them with their SM counterparts at the level of partonic luminosities and assess how the bounds obtained on theŴ andŶ parameters in this simultaneous SMEFT and PDF fit compare to those based on assuming SM PDFs. We then investigate the sensitivity of available high-mass Drell-Yan data to benchmark scenario II, where only the dimuon final state is modified by EFT effects. Finally, we quantify the impact that the consistent use of SMEFT PDFs has on the reinterpretation of high-mass dilepton BSM searches.

PDF and EFT interplay in current high-mass Drell-Yan data
By deploying the methodology described in Sect. 3.4, we have extended the PDF analysis based on SM predictions presented in Sect. 3.3 to account for the effects of non-zero EFT coefficients within benchmark scenario I defined in Sect. 2.1. Here, we present results for one-dimensional fits where only one of theŴ or theŶ parameter is allowed to be non-zero. The reason for this choice is that, in a fit including only high-mass neutral-current Drell-Yan processes, there exists a flat direction whenŴ andŶ are varied simultaneously, since both operators scale as q 4 and thus cannot both be constrained by a single 1D distribution. This degeneracy can only be lifted once high-mass charged-current DY data is included in the fit. As we demonstrate in Sect. 5, thanks to the HL-LHC it will be possible to carry out a simultaneous fit of the PDFs and the two EFT parameters (Ŵ ,Ŷ ).
Taking into account the existing bounds reported in Sect. 2, as well as the sensitivity of available high-mass Drell-Yan data to the EFT coefficients illustrated by Fig. 3.2, here we have adopted the following sampling ranges for theŴ andŶ parameters: 20,20] .  Table 3.1 and 3.2 that receive non-zero EFT corrections, namely the DIS datasets that have a reach in Q 2 above (120) 2 GeV 2 (namely HERA and NMC), and the ATLAS and CMS high-mass Drell-Yan measurements in Table 3.2. The use of such a partial χ 2 rather than the global χ 2 is a necessary approximation due to the limitation of our current methodology. The statistical fluctuations of the global χ 2 are significantly larger than those of the partial χ 2 and can only be tamed by running a very large batch of replicas for each benchmark point inŴ andŶ and by increasing the density of benchmark points in the region that is explored, as it was done for the scan of α s in Ref. [138]. However, while for the scan of α s all processes contribute to the parabolic behaviour of the ∆χ 2 , in this case the dominant contributions to χ 2 eftp come by far from the SMEFT corrections to the hard cross section of these processes, and from the changes in the PDFs induced by non-zero Wilson coefficients. The latter changes in PDFs are confined to the large-x light quark and antiquark distributions, which affect the high-mass Drell-Yan data. The analysis of the χ 2 eftp computed on the subset of data captures the dominant effects, while minimising the level of statistical fluctuations. A further approximation is given by the fact that only linear EFT effects are included in the calculation of the DIS and DY cross sections, while the (subleading) quadratic corrections are neglected in this scenario. The error bars in the ∆χ 2 i points of Fig. 4.1 indicate the methodological finite-size uncertainties evaluated with the bootstrapping method described in Sect. 3.4 and the horizontal line corresponds to the ∆χ 2 = 4 condition associated to a 95% CL interval. We also show in Fig. 4.1 the results of the associated parabolic fits, and likewise for ∆χ 2 (Ŷ ). From the results in Fig. 4.1, one observes that both theŴ and Y parameters agree with the SM expectation within uncertainties. Fig. 4.2 then compares the results of the parabolic fits based on the SMEFT PDFs as displayed in Fig. 4.1 with their counterparts obtained in the case of the SM PDFs. That is, in the latter case one carries out parabolic fits to the χ 2 smp values, as is customary in the literature for the EFT analyses. The insets highlight the region close to ∆χ 2 0. For theŴ parameter, the consistent use of SMEFT PDFs leaves the best-fit value essentially unchanged but increases the coefficient uncertainty δŴ , leading to a broader parabola.    Similar observations can be derived for theŶ parameter, though here one also finds a upwards shift in the best-fit values by ∆Ŷ 2 × 10 −4 in addition to a parabola broadening, when SMEFT PDFs are consistently used. We note that the SM PDF parabolas in Fig. 4.2 are evaluated using the central PDF replica and hence do not account for PDF uncertainties.  are consistently used instead of the SM PDFs (either without or with PDF uncertainties): and likewise for theŶ parameter.
In the specific case of the SM PDF results, Table 4.1 indicates the bounds obtained without (upper) and with (lower entry) PDF uncertainties accounted for; recall that the SMEFT PDF bounds already include PDF uncertainties by construction (see Sect. 3.4). The methodological (finite-size) uncertainty is included according to the approach described in Sect. 3.4 and it amounts to 4.7 · 10 −5 in the case ofŴ and 1.0 · 10 −4 in the case of Y , corresponding to 4% and 5% respectively of the 95% C.L. bounds for theŴ andŶ coefficients.
By comparing the bounds obtained when PDF uncertainties are accounted for to those neglecting PDF uncertainty, one observes a systematic broadening of the bounds from both the lower and upper limits, as was also reported in [72].
When PDF uncertainties are neglected (accounted for) when using the SM PDFs to constrain the EFT parameters, the consistent use of the SMEFT PDFs leads to both a shift in the best-fit values of magnitude ∆Ŵ = −2 × 10 −5 and ∆Ŷ = +1.6 × 10 −4 as well as to an increase (decrease) of the fit parameter uncertainties, with δŴ and δŶ growing by 15% and 12% (decreasing by 11% and 13%) respectively. This result shows that, given available Drell-Yan data and once PDF uncertainties are accounted for, the bounds on the EFT parameters are actually improved once SMEFT PDFs are adopted.
All in all, the effect of the consistent treatment of the SMEFT PDFs in the interpretation of high-mass DY cross sections is moderate but not negligible, either loosening or tightening up the obtained bounds on the EFT parameters (depending on whether or not PDF uncertainties are accounted for to begin with) by up to 15% and, in the case ofŶ parameter, shifting its central value by one-third of the 68% CL parameter uncertainty. Such a relatively moderate effect can be partly understood from the limited availability of high-mass DY measurements for EFT interpretations, with a single dataset at 13 TeV, and even in this case, with it being restricted to a small fraction of the Run II luminosity. As we will demonstrate in Sect. 5, the impact of SMEFT PDFs becomes much more significant once higher-statistics measurements of the NC and CC Drell-Yan tails become available at the HL-LHC, loosening the bounds onŴ andŶ by up to a factor 5.
Comparing the limits on theŴ andŶ parameters from Table 4.1 with those of Ref. [32] and reported in Sect. 2, we observe that our bounds are more stringent. There are two main reasons that could explain this difference. On the one hand, on top of the ATLAS and CMS high-mass DY cross sections at 8 TeV, we also include the corresponding 7 and 13 TeV data that provide additional weight to the high invariant mass region of the spectrum in the fit. On the other hand, in our analysis we fit the whole invariant mass spectrum and do not cut away the low m region below 120 GeV, thus we do not ignore the correlations between the low and high ends of the spectrum which are important even if the former is not affected by SMEFT corrections.
We now move to assess how the SMEFT PDFs relate to their SM counterparts, and determine the extent to which it is possible to reabsorb EFT effects into the PDFs.  In all cases, one finds that the EFT-induced shifts on the luminosities are smaller than their standard deviation. The biggest differences, relative to uncertainties, are observed in the quark-antiquark luminosities for m X 500 GeV. This finding can be understood from the fact that the NC Drell-Yan cross section is proportional to the uū and dd combinations at leading order, but the up and down quark PDFs are already well constrained by lowerenergy DIS measurements. Furthermore, we have verified that the size PDF uncertainties is unchanged in the SMEFT fits. The results of Fig. 4.3 are consistent with those of Table 4.1 and demonstrate that, with current data, the interplay between EFT effects and PDFs in the high-mass Drell-Yan tails is appreciable but remains subdominant as compared to other sources of uncertainty.
One important question in this context concerns how one could disentangle the EFTinduced shifts in the PDF luminosities displayed in Fig. 4.3 (see also the corresponding   PDF comparisons in Fig. D.1) from other possible sources of deviations, such as internal inconsistencies in some datasets or missing higher orders in the SM calculations. An attractive strategy in this respect is based on exploiting the energy-growing effects associated to the higher-dimensional EFT operators, which translate into an enhanced sensitivity to thê W andŶ parameters for large values of the dilepton invariant mass m . To this purpose, it is useful to define the following ratio: where m (max) is the upper bound on the value of the dilepton invariant mass bins that enter the χ 2 calculation. In Eq. (4.5), both the numerator and the denominator are evaluated using either χ 2 smp (for the SM PDFs) or χ 2 eftp (for the SMEFT PDFs), and the denominator corresponds to the χ 2 value (per data point) in the kinematic region for which EFT effects are negligible. 1 The R χ 2 estimator defined in Eq. (4.5) allows for the isolation of the contribution to the total χ 2 that arises from the high-m bins that dominate the overall sensitivity to thê W andŶ parameters. For small values of m (max) , say 200 GeV, one is cutting away all m bins with EFT sensitivity and hence one expects R χ 2 1. As m (max) is increased, the χ 2 will include the contributions from the m bins more sensitive to EFT effects, and , 0, 0)SMPDFs  thus one expects to find a large deviation with respect to the R χ 2 1 reference value. Furthermore, EFT effects should induce an approximately monotonic growth of R χ 2 with m (max) , which would instead be absent from other possible sources of PDF distortion and thus represent a smoking gun for BSM physics in the high-energy DY tails. These expectations are verified in Fig. 4.4, which displays the R χ 2 estimator (normalised to its SM value) as a function for m (max) for representative values of theŴ andŶ parameters both for the SM and the SMEFT PDFs, where the horizontal line indicates its reference SM value. Indeed we observe an approximately monotonic growth of R χ 2 arising from the energy-growing effects in the EFT. Due to the limited experimental information the binning in m is rather coarse, explaining the observed fluctuations. In the specific case of theŴ parameter, the SMEFT PDF curve lies slightly below the SM PDF one, highlighting how EFT effects are being partially (but not completely) reabsorbed into the PDFs.

EFT constraints on scenario II from current high-mass Drell-Yan data
In contrast to benchmark scenario I, which is flavour universal, the second SMEFT scenario to be explored in this work and described in Sect. 2.2 contains a four-fermion interaction involving muons but not electrons, which therefore modifies the rates of the dilepton process pp → µ + µ − but not those of pp → e + e − . This property implies that, without introducing further assumptions, the Wilson coefficient C Dµ 33 can be only constrained from DY measurements carried out in the dimuon (rather than in the dielectron or in the combined) final state. As indicated in Table 3.2, only the CMS data at 7 TeV and 13 TeV include DY distributions in the dimuon final state.
Due to these restrictions in the input dataset, the interplay between PDFs and SMEFT effects is expected to be milder as compared to the results presented in Sect. 4.1. For this reason, here we do not attempt to perform a joint determination of the PDFs and the C Dµ  , where here χ 2 smp includes only the contributions from the two available DY measurements in the dimuon final state. We present fits based on cross sections that account only for the linear, only for the quadratic, and for both the linear and quadratic terms in the EFT expansion. In all cases, these cross sections are computed using the baseline SM PDF set. The inset displays the outcome of the linear EFT fit with an enlarged x-axis range.
The results of Fig. 4.5 indicate that C Dµ 33 is essentially unconstrained at the linear EFT level, and only once quadratic corrections O(Λ −4 ) are accounted for is one able to obtain reasonable bounds on this coefficient. The reason for this behaviour is that for this operator the interference with the SM amplitude is suppressed, and hence the leading EFT effects arise at the quadratic level from the square of the EFT amplitude, thus being proportional to C Dµ 33 2 [27]. In the case of the polynomial fit to the χ 2 profile evaluated on the full quadratic EFT cross sections, we find the following 95% CL limits on this Wilson coefficient: which can be compared with the bounds on the same operator obtained in [27] from recasting the ATLAS dilepton search data of [93], given by Eq. (2.9). The fact that our bound in Eq. (4.6) is around a factor three looser than in Eq. (2.9) is explained because  The data (number of events per 10 GeV bin) from the ATLAS Z search from [139] in the di-electron (left) and di-muon (right) channels. We also display the theoretical predictions associated to the contributions from Drell-Yan and from the rest of the backgrounds, taken from the ATLAS publication.
the dilepton search data from [93] benefits from an extended coverage in m as compared to the available unfolded DY cross sections. The same result, this time for theŴ andŶ parameters, will be obtained in the next section where we assess the impact of the SMEFT PDFs in the EFT interpretation of the ATLAS dilepton search dataset.

On the EFT interpretation of high-mass dilepton searches
As mentioned above, a single high-mass DY cross section measurement is available at 13 TeV, and even in this case it is only based on a small subset of the Run II luminosity. As a consequence, the highest energy bin of this dataset is rather wide, m ∈ [1.5, 3.0] TeV. This implies limited sensitivity to deviations in the tails of DY distributions, for which using a large number of narrow bins is most beneficial to constrain heavy resonances, for instance.
Here we would like to quantify the interplay between PDF and EFT effects at the level of a recent ATLAS 13 TeV search for Z bosons in the dilepton channel [139] based on the complete Run II luminosity of L = 139 fb −1 . Since these are detector-level measurements, which cannot therefore be included in a PDF analysis, our aim is to use the SMEFT PDFs to investigate how the bounds on BSM physics are modified as compared to the standard approach based on computing theory predictions using SM PDFs.
The ATLAS data, displayed in Fig. 4.6, consist of event counts in 100 dilepton invariant mass, m , bins in both the dimuon and dielectron channels in the range m ∈ (225, 6000) GeV. We take this data from HEPdata [140] and denote the event count in the i th bin by n i . The narrow binning and broad m coverage allowed ATLAS to constrain Z masses to M Z 4 TeV. This is a much higher reach than the DY cross section measurements used for the SMEFT PDF fits in the previous subsections, and should therefore provide stronger constraints on the EFT benchmark scenarios described in Sect. 2. By including this search data in our study, we can investigate whether such strong constraints are sensitive to the EFT-induced modifications in the PDF luminosities highlighted in Fig. 4.3.
In order to constrain theŴ andŶ parameters in benchmark scenario I from the ATLAS dilepton search data, for each bin we compute a theory prediction y i = y i (Ŵ ,Ŷ ) given by the sum of background b i (top, diboson) and signal s i (Ŵ ,Ŷ ) (Drell-Yan) components. The ATLAS search provides an estimate of the total SM contribution (sum of top, diboson, and DY) without a breakdown into components. This estimate is provided as a continuous function of m . We thus estimate our background (top and diboson) by subtracting our own DY simulation from the estimated total SM event counts found by evaluating this function at each bin centre. We compute the DY signal in each bin as where s i,SM indicates the detector-level prediction for the i th bin of the m distribution evaluated at NLO QCD using MadGraph5 aMC@NLO [141], Pythia [142] and Delphes [143] and using the NNPDF31 nnlo as 0118 set as PDF input set. In Eq. (4.7), K(Ŵ ,Ŷ ) is the K-factor calculated as the ratio of cross sections in each bin, accounting for the impact of non-zero EFT correctionsŴ ,Ŷ = 0 both in the partonic cross section and in the PDFs where the integration in τ goes from τ min to τ max in each bin. For comparison, we will also present results where the K-factor is instead evaluated as usual in terms of the SM PDFs, The likelihood L is defined as the product of Poisson probabilities in each bin,  PDF uncertainties can be included by taking the confidence level intervals on the bounds given by the NNPDF Monte Carlo replicas, as it is explained in Sect. 3.4.
The results for the fits obtained by using SM PDFs in the K-factors, Eq. (4.9), compared to those obtained by using SMEFT PDFs, Eq. (4.8), are displayed in Fig. 4.7, with the corresponding bounds being provided in Table 4.2. We find that inclusion of PDF uncertainties has a much smaller impact on the parabolic fit than in the previous analysis and thus only the parabola including PDF uncertainties is displayed. Secondly, we observe that the shift in the bounds that one has using SM versus SMEFT PDFs is not entirely negligible. This finding indicates that it is important to use consistent PDFs determined with the same settings as the theoretical predictions in the partonic cross section. We can similarly recast the ATLAS search data to constrain the scenario II Wilson coefficient C Dµ 33 , finding the following constraints at 95% CL: 12) in good agreement with the results reported in Eq. (4.6).

Overview of current constraints
In order to summarise the results obtained in this section, Fig. 4.8 displays the 95% CL bounds derived on the EFT parametersŴ andŶ (in scenario I) and on C Dµ 33 (in scenario II), both from the high-mass DY cross section measurements ( the bounds derived in [32] for theŴ andŶ parameters from the ATLAS 8 TeV data and in [27] for the C Dµ 33 coefficient from the same ATLAS Z search data. As discussed above, our main findings are that the consistent simultaneous determination of the PDFs together with the EFT parameters leads to a moderate increase in the uncertainties (in this case, up to 10%) as well as to a small shift in their central values. As we demonstrate in the next section, the interplay between PDFs and EFT coefficients becomes much more marked in the case of the high-mass DY measurements that will become available at the HL-LHC.

Projections for the High-Luminosity LHC
The results presented in the previous section indicate that, given the available unfolded Drell-Yan measurements, the impact of a simultaneous determination of the PDFs together with the EFT parameters remains moderate. However, it is conceivable that this interplay between PDFs and BSM effects in the high-energy tails of Drell-Yan cross sections will become more significant once more data are accumulated. With this motivation, we revisit the analysis of Sect. 4 now accounting for the impact of projected High-Luminosity LHC pseudo-data generated for the present study. We demonstrate that in the scenario under consideration, in which no other data apart from the high-mass Drell-Yan constrain the large-x quark and antiquark distributions, a consistent joint determination of PDFs is crucial for EFT studies at the HL-LHC. We will also discuss how the inclusion of further Overview of the results obtained in this section concerning the EFT parametersŴ andŶ (in scenario I) and C Dµ 33 (in scenario II). We compare the 95% CL bounds derived in [32] with those obtained in this work from the high-mass DY cross section measurements (Table 4.1) and from the ATLAS Z search data (Table 4.2), in both cases displaying the results obtained with either the SM or the SMEFT PDFs. In the former case, we indicate the results that account for PDF uncertainties; these are included by construction for the SMEFT PDFs.
LHC data, which can constrain the large-x region without being affected by potential energy-growing new physics effects, can soften the interplay observed in this study and disentangle new-physics effects.

Generation of HL-LHC pseudo-data
Following the strategy adopted in [77] to estimate the ultimate PDF reach of the HL-LHC measurements (see also [144,145]), here we generate HL-LHC pseudo-data for NC and CC high-mass Drell-Yan cross sections at √ s = 14 TeV and for a total integrated luminosity of L = 6 ab −1 (from the combination of ATLAS and CMS, which provide L = 3 ab −1 each). For these projections, theoretical predictions are evaluated at NNLO in QCD including NLO EW corrections, as is explained in detail in Sect. 3.2. The PDF set used as an input to generate the theoretical prediction is the DIS+DY baseline that was presented in Sect. 3.3. For the generation of the NC pseudo-data, we adopt as reference the CMS measurement at 13 TeV [122] based on L = 2.8 fb −1 . The dilepton invariant mass distribution m is evaluated using the same selection and acceptance cuts of [122] but now with an extended binning in the m to account for the increase in luminosity. We assume equal cuts for electrons and muons and impose |η | ≤ 2.4, p lead T ≥ 20 GeV, and p sublead T ≥ 15 GeV for the two leading charged leptons of the event. In the case of the CC pseudo-data, the lack of unfolded measurements of the m T distribution at 13 TeV to be used as reference forces us to base our projections on the ATLAS search for W bosons in the dilepton channel [146]. As in the case of the NC projections, theory predictions for the m T distribution at highmass are generated using the same selection and acceptance cuts as in [146] but now using an extended coverage in m T . Both in the case of NC and CC Drell-Yan cross sections, we restrict ourselves to events with either m or m T greater than 500 GeV. Otherwise, the total experimental uncertainty would be limited by our modelling of the expected systematic errors and thus our projections could become unreliable. Furthermore, we require that the expected number of events per bin is bigger than 30 to ensure the applicability of Gaussian statistics. Taking into account these considerations, our choice of binning for the m (m T ) distribution at the HL-LHC is displayed in Fig. 5.1 (Fig. 5.2), with the highest energy bins reaching m 4 TeV (m T 3.5 TeV) for neutral-current (charged-current) scattering.
The percentage statistical and systematic uncertainties associated to the HL-LHC pseudo-data are displayed in the lower panels of Figs. 5.1 and 5.2 and have been estimated as follows. Let us denote by σ th i the theoretical prediction for the DY cross section, including all relevant selection cuts as well as the leptonic branching fractions. The expected number of events in this bin and the associated (relative) statistical uncertainty δ stat i are given by Note that this bin-by-bin relative statistical uncertainty is the same both at the level of number of events and at the level of fiducial cross sections. The HL-LHC systematic uncertainties are also estimated from the same reference measurements. If δ sys i,j denotes the j th relative systematic uncertainty associated to the i th bin of the reference measurement, and if this bin contains N th i events, then for our projections we assume that the same systematic error associated to a bin with a similar number of expected events will be given by f red,j δ sys i,j , where f red,j is the expected reduction in systematic errors foreseen at the HL-LHC. 2 This assumption is justified since most systematic errors improve with the sample size thanks to e.g. better calibration.
Adding in quadrature systematic uncertainties with the statistical error, the total relative uncertainty for the ith bin of our HL-LHC projections is where λ, r i are univariate Gaussian random numbers, δ exp tot,i is the total (relative) experimental uncertainty corresponding to this specific bin (excluding the luminosity and normalisation uncertainties), and δ exp L is the luminosity uncertainty, which is fully correlated amongst all the pseudo-data bins of the same experiment. We take this luminosity uncertainty to be δ exp L = 1.5% for both ATLAS and CMS, as done in Ref. [77].
Here we adopt the baseline SM PDF set described in Sect. 4, which is denoted as "DIS+DY", to evaluate the σ th i cross sections entering Eq. (5.3). We have verified that, both at the pre-and post-fit levels, the fit quality to the HL-LHC pseudo-data satisfies χ 2 /n bin 1 in the case of the SM PDFs as expected. Furthermore, we assume f red,j = 0.2 for all systematic sources, as done in the optimistic scenario of Ref. [77]. We note that more conservative values for the reduction of systematic errors, such as f red,j = 0.5, are not expected to qualitatively modify our results. The reason is that, as indicated by the bottom panels of Figs. 5.1 and 5.2, for the highest energy bins (which dominate the EFT sensitivity), specifically above m ≈ 1.7 TeV and m T ≈ 1.5 TeV, the measurement will be limited by statistical uncertainties.

Impact on PDF uncertainties
From Figs. 5.1 and 5.2, one can observe that the PDF uncertainties in the SM PDF baseline used to generate the pseudo-data are either comparable or larger than the corresponding projected experimental uncertainties at the HL-LHC. Specifically, for the highest m bin of the NC distribution the PDF errors are twice the experimental ones, while in the CC case the associated PDF errors become clearly larger than the experimental ones starting at m T 2 TeV. This comparison suggests that one should expect a significant uncertainty reduction once the HL-LHC pseudo-data is included in the PDF fit.
To validate this expectation, Fig. 5.3 displays the impact of the HL-LHC pseudodata on the quark-antiquark luminosity L qq as a function of the final state invariant mass m X at √ s = 14 TeV. We compare L qq for the SM PDF baseline fit (DIS+DY) with the same quantity from the corresponding fits including the HL-LHC pseudo-data, either only NC or also with CC cross sections. The right panel displays the associated relative PDF uncertainties. We find a significant reduction of the PDF uncertainties affecting the quarkantiquark luminosity (and hence the Drell-Yan cross sections) in the high mass (m X 1 TeV) region once the HL-LHC pseudo-data constraints are accounted for. For instance, at m X 2 TeV, PDF uncertainties on L qq decrease from 5% in the baseline down to 2.5% ( 1.5%) once the NC (NC+CC) HL-LHC pseudo-data is included in the fit. The effect of the inclusion of HL-LHC projections becomes more dramatic as m X increases.
On the other hand, other partonic luminosities such as the quark-quark and gluon-gluon ones are essentially unaffected by the HL-LHC constraints. In terms of fit quality, the only noticeable effect is a mild improvement in the χ 2 of the high-mass DY datasets listed in Table 3.2.

PDF and EFT interplay at the HL-LHC
The finding that the projected HL-LHC pseudo-data has a significant impact on the quarkantiquark PDF luminosity, summarised in Fig. 5.3, suggests that the interplay between PDFs and EFT effects in the high-energy DY tails should become enhanced as compared to the results reported in the previous section. With this motivation, we first of all repeat the joint determination of PDFs and theŴ ,Ŷ coefficients from EFT scenario I presented in Sect. 4.1 now accounting for the constraints of the HL-LHC pseudo-data. An important difference in this case is that the inclusion of CC data lifts the flat direction in the (Ŵ ,Ŷ ) plane, making a full two-dimensional fit possible. Secondly, the availability of the HL-LHC pseudo-data allows us to assess the interplay between the PDFs and the EFT coefficient C Dµ 33 from benchmark scenario II, whose analysis in Sect. 4.2 was restricted to fixed SM PDFs.
Scenario I. For the simultaneous determination of PDFs and theŴ ,Ŷ coefficients accounting for the constraints provided by the HL-LHC pseudo-data, we use 35 sampling values of (Ŵ i ,Ŷ i ), 25 of which are equally spaced in eitherŴ ∈ (−1.6, 1.6) × 10 −5 or Y ∈ (−8, +8) × 10 −5 (hence in steps of ∆Ŵ = 0.8 × 10 −6 and ∆Ŷ = 4 × 10 −6 respectively), and then 10 additional points along the diagonals. In order to assess the robustness of the results, we added 12 more sampling values, 8 further away from the origin and 4 more along theŴ = 0 andŶ = 0 axes, and verified that the confidence level countours are stable upon their addition.
We find that the constraints on the (Ŵ ,Ŷ ) parameters are completely dominated by the HL-LHC projections and that current data exhibit a much smaller pull, consistent with the findings of previous studies [32,37]. Also, the χ 2 eftp contour is more stable and requires less replicas if only the HL-LHC projections are included in the computation of the χ 2 . The corresponding marginalised bounds onŴ andŶ are reported in Table 5.1 using the same format as in Table 4.1.
From Table 5.1, one can observe how including high-mass data at the LHC both in a fit of PDFs and in a fit of SMEFT coefficients and neglecting the interplay between them could result in a significant underestimate of the uncertainties associated to the EFT parameters. Indeed, the marginalised 95% CL bound on theŴ (Ŷ ) parameter becomes looser once SMEFT PDFs are consistently used, with a broadening, defined in Eq. (4.4), of 500% (110%), even once PDF uncertainties are fully accounted for. This effect would have been even more marked if PDF uncertainties had not been accounted for in EFT fits based on SM PDFs, where the same broadening factors would be 940% and 190% respectively.
A further important question is whether the bounds obtained with SM PDFs appearing on the left column of   high-mass Drell-Yan sets (neither the HL-LHC projections nor the current datasets listed in Table. 3.2) and compare the bounds obtained using this set of PDFs to those obtained consistently using SMEFT PDFs. We observe that, once this set of conservative PDF is used as an input PDF set and the PDF uncertainty is included in the computation of the bounds, the latter increases as compared to the bounds in Table 5.1. As a result, the size of the bounds obtained by keeping fixed SM PDFs is closer to the size obtained from the simultaneous fits, although still slightly underestimated. At the same time, the shift in the best-fit becomes more marked.
Results are graphically displayed in Fig. 5.4, where the 95% confidence level contours in the (Ŵ ,Ŷ ) plane obtained from the DIS+DY fits that include the high-mass Drell-Yan HL-LHC pseudo-data when using either SM PDFs, SM conservative PDFs or SMEFT PDFs are compared. All solid countours include PDF uncertainties, while the dashed contours that do not include PDF uncertainties are also indicated to visualise the impact of the inclusion of the PDF uncertainties.
To conclude, we should also emphasise that, while in this work we use pseudo-data and hence the best-fit values are by construction unchanged, this would not necessarily be the case in the analysis of real data, where improper treatment of PDFs could result in a spurious EFT 'signal', or even missing a signal which is indeed present in the data. A detailed study aimed at a precise definition of 'conservative' PDFs in a more general scenario is beyond the scope of this paper and will be the topic of future work; a thorough comparison of the consistent simultaneous approach, versus the use of conservative PDF sets, will be of particular interest in cases of EFT manifestations of new physics.
The increased role that the interplay between PDFs and EFT coefficients will play at the HL-LHC can also be illustrated by comparing the expected behaviour of the quarkantiquark luminosity, displayed in Fig. 5.5, for the SMEFT PDFs corresponding to representative values of theŴ andŶ parameters of benchmark scenario I as compared to the SM PDFs. Note that the corresponding comparison for L qq in the fits to available Drell-Yan data was displayed in Fig. 4.3. Indeed, the central value of the quark-antiquark luminosity for SMEFT PDFs corresponding to values of (Ŵ i ,Ŷ i ) selected along the grid used to derive Fig. 5.5 changes greatly, well outside the one-sigma error band of the SM PDFs, while the PDF uncertainties themselves are unchanged. This change in central value of the large-x PDFs partially reabsorbs the effects in the partonic cross section induced by the SMEFT operators and leads to better χ 2 values as compared to those obtained with the SM PDFs.
Even neglecting SMEFT PDF effects, we note that our marginalised bounds on theŴ andŶ coefficients from HL-LHC pseudo-data using SM PDFs turn out to be more stringent than those reported in [37] by around a factor of 4 forŴ and a factor 2 for (Ŷ ). This is due to a combination of factors. First of all we use the 13 TeV measurements as reference to produce the HL projections. Furthermore we assume a total integrated luminosity of L = 6 fb −1 (from the combination of ATLAS and CMS) rather than 3 fb −1 as well as a more optimistic scenario concerning the reduction of the experimental systematic uncertainties. Fig. 5.6 then displays the R χ 2 estimator, defined in Eq. (4.5) and shown in Fig. 4.4 for the case of available LHC data, now evaluated from the fits including the HL-LHC pseudodata. In this case, the m (max) cut applies to m for the neutral-current distributions and to the transverse mass m T for the charged-current ones. As in the case of Fig. 4.4, we observe an approximately monotonic growth of R χ 2 for the SM PDFs arising from the energy-growing EFT corrections that dominate the high-energy DY tails, an effect which is now rather larger thanks to the presence of the HL-LHC pseudo-data. The most striking difference as compared to Fig. 4.4 is that now the SMEFT PDF curve is much flatter, indicating that EFT effects are being almost totally reabsorbed into the PDFs. The findings summarised by Fig. 5.6 demonstrate that, at the HL-LHC, EFT-induced deviations could be indeed inadvertently "fitted away" into a PDF redefinition, explaining the large broadenings reported in Fig. 5.5, and highlight the need to devise novel strategies to disentangle the effects of PDFs and EFT contributions from the high-energy tails of LHC cross-sections. Such strategies could exploit, for instance, the availability of measurements sensitive to large-x PDFs but not to high scales, such as forward electroweak gauge boson production by LHCb [77].
Scenario II. We now turn to present the corresponding results of the simultaneous fits of the PDFs and the EFT coefficients including the HL-LHC pseudo-data for the case of benchmark scenario II. As motivated in Sect. 2.2, a non-zero value of the C Dµ 33 coefficient affects only the NC and CC muon final states, while the electron ones remain described by the SM calculations. This property implies that, in fits presented below, the EFT corrections modify only the shapes of the HL-LHC distributions in the muon channel, the right panels in Figs. 5.1 and 5.2, but not those of the electron pseudo-data.  sampling is constituted by 21 points uniformly distributed in C Dµ 33 ∈ [−0.02, 0.02]. As in Fig. 4.1, the error bars indicate the uncertainties associated to the finite number of Monte Carlo replicas used for each value of C Dµ 33 . The profile in ∆χ 2 exhibits a double minimum structure (bimodal distribution), explained by the fact that in this scenario it is the quadratic rather than the linear terms in the EFT expansion that dominate. The corresponding quartic polynomial fit using Eq. (4.2) can be seen to successfully reproduce the ∆χ 2 values obtained in this joint analysis. The right panel of Fig. 5.7 then compares the polynomial fit obtained with the SMEFT PDFs with the corresponding one when using instead fixed SM PDFs to determine the ∆χ 2 values, with the inset focusing on the region close to ∆χ 2 0. The associated 68% and 95% CL bounds are then reported in Table 5.3, where we note that since the 68% CL interval is disjoint we evaluate the shift and broadening only for the 95% CL bounds.
Inspection of Fig. 5.7 and Table 5.3 indicates that, even at the HL-LHC, the interplay between PDFs and EFT coefficients remains moderate in this particular scenario. Indeed, in contrast with the marked effects in scenario I (Fig. 5.4), where the bounds on theŴ andŶ worsened by up to an order of magnitude when the SMEFT PDFs were consistently used, in scenario II the obtained bounds on C Dµ 33 would only loosen by around 30%. The origin of this rather different behaviour can be traced back to the fact that in scenario II the electron channel data do not receive EFT corrections, and hence all the information EFT parameter taking into account HL-LHC pseudo-data. that they provide makes it possible to exclusively constrain the PDFs. The muon channel distributions then determine the allowed range for C Dµ 33 , restricted by the well-constrained large-x quarks and antiquark PDFs from the electron data. This finding demonstrates how the availability of measurements in separate leptonic final states is of utmost importance to test BSM scenarios that account for violations of Lepton Flavour Universality.
In the same manner as in Fig Table 5.3. The result that the two values lead to the same effect on L qq follows from the dominance of the quadratic EFT terms in this scenario. One finds that the shift in the central values of the quark-antiquark luminosity induced by a non-zero value of C Dµ 33 is well within PDF uncertainties. This is consistent with the result of Fig. 5.7 indicating that bounds on C Dµ 33 obtained with SM and with SMEFT PDFs are relatively similar in this scenario even after accounting for the HL-LHC constraints.

Conclusions and outlook
Indirect searches for new physics beyond the SM, such as those carried out in the SMEFT framework, often aim at pinning down subtle distortions with respect to the SM predictions, such as a few-percent deviation in the value of a given production cross section or decay rate. Exploiting the full potential of current and future precision measurements at the LHC for these indirect BSM searches requires the development of novel data interpretation frameworks that are able to account for hitherto ignored effects that can no longer be neglected. A pressing example of this is the interplay between PDF and EFT effects in the high-energy tails of LHC distributions. Indeed, the very same datasets are being used both to determine the parton distributions (assuming SM cross sections) and, independently, to constrain EFT coefficients (assuming SM PDFs). Given that these LHC processes provide significant information for both PDF and EFT fits, it is of paramount importance to ascertain the extent for which eventual EFT signals can be reabsorbed into the PDFs, as well as how current bounds on the EFT coefficients are modified within a consistent simultaneous determination together with the PDFs.
In this work, building upon our previous DIS-only study [72], we have presented a first simultaneous determination of PDFs and EFT coefficients from high-energy LHC data, specifically from high-mass Drell-Yan cross sections. Our analysis has considered available unfolded measurements, detector-level searches based on the full Run II luminosity, and tailored HL-LHC projections. The EFT interpretation of the Drell-Yan data is formulated in terms of two benchmark scenarios, first a flavour universal one leading to modifications of theŴ andŶ electroweak parameters [32], and second a flavour-specific scenario motivated by the recent evidence for lepton flavour universality violation in B-meson decays [27].
The main findings of this work are summarised in Fig. 6.1. We demonstrate how, for the analysis of all available unfolded Drell-Yan data, the consistent simultaneous extraction of the PDFs together with the EFT parameters leads to a modest increase in the uncertainties of the latter (up to 15%, in the case of theŴ andŶ parameters), as well as to a shift in their central values by up to a third of a sigma. Furthermore, while our results indicate that for current data the interplay between PDF and EFT effects remains moderate, the impact of their cross-talk will become much larger at the HL-LHC: using SM rather than SMEFT PDFs would lead to artificially precise bounds, even mimicking new physics effects. This result indicates that including high-energy data in PDF fits should be done with care, as PDFs can actually absorbe the effects of new physics. On the other hand, we have seen that in this simple case, using a conservative set of PDFs that does not include any of the high-mass Drell-Yan data and accounting for the large contribution of the PDF uncertainty on the bounds inflates them and makes them of the same order of magnitude as those obtained in a simultaneous fit of PDFs and SMEFT coefficients.
At the same time, once real data at HL-LHC are considered, neglecting the PDF interplay and simply using conservative sets of PDFs might miss EFT manifestations of new physics or misinterpret them. One should also emphasise that estimators such as those shown in Fig. 5.6 only become available in the joint PDF+EFT fit, and cannot be defined in the "conservative PDF" approach. In particular, they provide information on the kinematic dependence of any possible deviation between the data and the SM predictions, and the extent to which this can be reabsorbed into the PDFs. Hence, they represent a powerful diagnosic tool to separate QCD effects from genuine BSM deviations. A complementary strategy to disentangle QCD effects from BSM effects would be to account for the constraints on the large-x PDFs arising from other processes for which EFT corrections can be neglected, such as forward W, Z production at LHCb. This way the uncertainties associated to the PDFs at large-x would be reduced and the indirect signal for new physics could be more easily disentangled. A detailed study aimed at a definition of conservative PDFs in a more general scenario is beyond the scope of this paper, and will be the topic of future work.
Furthermore, concerning the next steps in this program, it would be interesting to consider other more general EFT benchmark scenarios, as well as accounting for new 13 TeV measurements based on increased luminosity. One could also envisage complementing the EFT analysis of inclusive DY cross sections with related processes, such as dilepton production in association with extra jets, which provides sensitivity to different combinations of dimension-six operators.
In addition to increasing the dataset, one would need to address a major challenge of the current fitting methodology for the joint determination of PDFs and EFT coefficients, namely that for each point in the EFT parameter space one needs to carry out a fullfledged NNPDF fit, which requires intensive computing resources. To tame the instability of the χ 2 , once all datasets are kept into account, one needs to run a very large number of replicas. Furthermore, in the simultaneous fit of EFT coefficients and PDFs, we currently ignore the correlation between the two sets of parameters. The kind of analysis presented in this work could then be streamlined and made more efficient by adapting the fitting methodology to exploit the relatively simple dependence of hadronic cross sections on the EFT coefficients, which is either linear at O(Λ −2 ) or quadratic at O(Λ −4 ). Such methodological developments would make it possible to simultaneously fit the PDFs with a large number of EFT coefficients, something that is currently infeasible but that is being developed.
Finally, beyond single gauge boson production, it would also be important to ascertain the interplay that arises between PDFs and EFT effects for the interpretation of gluondominated LHC processes, such as top-quark pair production and inclusive jet and dijet (or even multijet) production. The reason is that these two groups of processes have been shown to provide crucial information for, on the one hand, pinning down the gluon PDF over a broad range of x values [147,148], and on the other hand, constraining a large number of EFT dimension-six operators [50,55,56,149] which cannot be accessed by other probes. Given that in modern global PDF analyses the gluon for x 10 −2 is almost entirely determined by these and related high-energy LHC processes, it is conceivable that the PDF and EFT interplay there could be more significant than for the inclusive DY processes studied in this work, even just accounting for the already available measurements.

Acknowledgments
We thank Emanuele Mereghetti, Tevong You and Celine Degrande for insightful discussions about the project. We thank Claude Duhr and Bernhard Mistlberger for kindly sending us the NNLO and N3LO QCD corrections for the Drell-Yan invariant and transverse mass distributions. We thank Andrea Wulzer and Lorenzo Ricci for benchmarking the charged current K-factors and for suggesting to add the results obtained by using conservative PDFs. M. U. and Z.

A Detailed SM PDF comparisons
In this appendix we present detailed comparisons between different sets of SM PDFs to complement the discussions in Sect. 3. To begin with, we compare the baseline SM PDF of this work, based on DIS+DY data, with the recent global NNPDF3.1 str fit obtained in the context of the proton strangeness study of [95]. Fig. A.1 is the counterpart of Fig. 3.3, now displaying the gluon, singlet, up, anti-up, down, and anti-down quark PDFs at Q = 100 GeV both for the baseline SM PDF (labelled "DIS+DY") and for the global NNPDF3.1 str determination.
We observe an overall good compatibility between our DIS+DY baseline and the NNPDF3.1 str global fit, with PDFs in agreement at the one-sigma level in all cases except for the quark singlet Σ in the region 0.01 x 0.1. As discussed in Sect. 3.3, the new high-mass DY data included in this analysis as compared to [95] are responsible for the bulk of the differences observed in Fig. A.1, both in terms of central values and uncertainties, for the quark and anti-quark PDFs. Specifically, the upwards shift in the central values of the quark and anti-quark PDFs in this x-region for the DIS+DY baseline as compared to the NNPDF3.1 str determination is consistent with the comparisons in Fig. 3.3 illustrating the impact of the high-mass Drell-Yan data in the fit, and the same applies for the associated reduction of the quark and anti-quark PDF uncertainties.
As is well known, the PDF uncertainties on the gluon become rather enlarged in the DIS+DY baseline due to the lack of information from the top and jet cross sections. However, this does not impact the results of the present joint PDF and EFT interpretation,  given that gluon-induced contributions to inclusive Drell-Yan processes enter only starting at NLO. Furthermore, we also find somewhat larger uncertainties in the strangeness of the DIS+DY baseline as compared to NNPDF3.1 str due to the missing constraints from the NOMAD neutrino dimuon cross sections. All in all, with the exception of gluon-initiated processes, we can conclude that the DIS+DY baseline to be used in this work is competitive with a full-fledged global PDF determination.
Next, we display in Fig. A.2 the corresponding comparison between the baseline SM PDF set based on DIS and DY data (dubbed "DIS+DY") with the same fit but only including DIS structure functions. Note that the comparison between the PDF uncertainties in these two fits was already displayed in the lower panels of Fig. 3.3. One can observe how in general there is excellent consistency between the two fits. Indeed, PDFs are in agreement at the one-sigma level except for very specific cases, such as the up quark PDF at x 0.05, but even there the differences are at most at the 1.5σ level. The very marked reduction of PDF errors is also appreciable in the DIS+DY fit as compared to the DIS-only fit, highlighting the importance of the DY data in the global PDF fit to constrain the light quark and antiquark PDFs in a broad range of x.
Finally, in Fig. A.3 we compare the PDF luminosities in the DIS+DY baseline with those from the same fit excluding all the data of the high-mass DY datasets listed in Table 3.2. The corresponding comparisons at the PDF level was shown in Fig. 3. 3 We focus on the gluon-gluon, quark-antiquark, and quark-quark luminosities at √ s = 14 TeV as a function of the invariant mass m X of the produced final state, and display both the luminosity ratio to the reference as well as the relative PDF uncertainties in each case. Again, one finds that the high-mass DY measurements constrain the luminosities in the range 100 GeV m X 2 TeV, consistent with the kinematic coverage in m of the data used in the fit. Their main effects are a reduction of the qq uncertainty for m X between 500 GeV and 2 TeV and an upwards (downwards) shift in the central values of the qq (gg) luminosities within this m X region. Th uncertainty of L qq is barely changed and its central value i shifted within 1σ PDF uncertainties once the high-mass Drell-Yan datasets of Table. 3.2 are included in the fit. This comparison further highlights how the high-mass DY data provide useful information for constraining the PDF luminosities and in turn the high-p T processes relevant for both direct and indirect BSM searches at the LHC.

B Fit quality for SM and SMEFT PDFs
In this appendix, we provide detailed information about the PDF fit quality as quantified by the figure of merit used in the fits: the χ 2 per data point, defined in Eq. (3.5) and   evaluated with the t 0 prescription described in Ref. [134]. We will do this both for the SM PDFs based on different datasets and for the SMEFT PDFs from the fits with the baseline dataset and for different values of the EFT parametersŴ andŶ . For completeness, we also provide here the χ 2 values obtained when using NNPDF3.1 str as the input PDF set, with all other settings such as the partonic matrix elements unchanged. Note that here the kinematical cuts are slightly different as compared to [95], the differences being summarised in Table B.1. The rationale behind having different cuts is that in this work we include electroweak corrections to the high-mass DY cross sections, thus the m ≤ 200 GeV restriction applied in NNPDF3.1 is not necessary anymore. With the current cuts, essentially all high-mass DY data points can be included in the fits. The exception is a subset of points from the CMS 7 TeV and ATLAS 8 TeV datasets, where we restrict ourselves to the tree-level kinematic condition |y | ≤ ln( √ s/m ). The reason is that our calculation of the EFT corrections is based on tree-level SM cross sections which must satisfy this requirement. Furthermore, for the CMS 7 TeV dataset the last rapidity bin is excluded for all m µµ bins, since it is found to deviate from the SM predictions by a large amount suggesting the need to account for threshold resummation effects [150]. Henceforth, here we evaluate the predictions based on NNPDF3.1 str for the same set of kinematical cuts as in this work. Table B.2 summarises the values of the χ 2 for the baseline SM PDF fit labelled "DIS+DY" compared to the most recent NNPDF global fit NNPDF3.1 str, as well as for the corresponding fits based on reduced datasets, namely the DIS-only fit and the fit excluding the high-mass DY data. The entries in italic indicate the datasets that do not enter the corresponding fit. We observe that the quality of the description of the DIS data is similar across all fits considered. As far as hadronic data are concerned, we observe that the fit quality of the LHCb data slightly deteriorates when the high-mass Drell-Yan data are included. Also, the description of the CMS 13 TeV invariant mass distribution in the combined electron and muon channels is not optimal, even after including the data in the fit. However, the overall χ 2 is statistically equivalent to the most recent NNPDF3.1 set.
Then in Table B.3 we list again the χ 2 values in the SM PDF fit (same as the "DIS+DY" column of Table B First of all, we observe that as expected the addition of the EFT corrections does not affect the description of the DIS structure functions. Differences are also small for the lowmass and on-shell DY data, and slightly larger for the HM measurements. For instance, the χ 2 to the high-mass Drell-Yan datasets is 1.471 forŴ = 0.0006 to be compared with 1.478 for the SM PDFs. In any case, the differences at the level of χ 2 between the SM and SMEFT PDFs are reasonably small, consistent with the finding that the best-fit values of theŴ andŶ parameters are close to the SM expectation.

C Validation of the SMEFT K-factors
As described in Sect. 3.2, in this work the effect of the dimension-six SMEFT operators considered in the two benchmark scenarios is accounted for at the level of cross sections via the K-factor approach, Eq. (3.10). In this appendix, we provide further details about the calculation and validation of these EFT K-factors. Specifically, we compare the numerical values for these K-factors, which have been obtained using SMEFTsim [85,151] interfaced with MadGraph5 aMC@NLO, with the analytic calculation presented in [27]. DIS structure functions. SMEFT corrections to the neutral-current deep-inelastic structure functions F 2 , F 3 in the benchmark scenario I of Sect. 2 are obtained by means of a direct calculation in perturbation theory 3 . In order to determine these corrections, we rewrite Eq. (2.5) as the linear combination of four-fermion operators of the form q λ γ µ q λ¯ λ γ µ λ , where q λ is a quark field of helicity λ (with λ = +1 for a right-handed field and λ = −1 for a left-handed field) and λ is a lepton field of helicity λ . The relevant operators for theŶ parameter are already of this form in Eq. (2.5). For theŴ parameter, the associated operators can be expanded explicitly as: where the index i runs over generations, and the flavour-changing contributions have been dropped since they only contribute to low energy CC structure functions. The EFT corrections to the DIS structure functions induced by a specific four-fermion operator of the form can be shown to be given by where e is the positron charge, e q is the charge on the quark q in units of the positron charge, θ W is the Weinberg angle, and K Z = Q 2 / sin 2 (2θ W )(Q 2 + m 2 Z ). The vector and axial couplings are given by V e = − 1 2 + 2 sin 2 (θ W ), A e = − 1 2 , V q = I q 3 − 2 sin 2 (θ W )e q and A q = I q 3 , where I q 3 is the third component of the quarks' weak isospin. These formulae are the natural generalisations of those derived in [72], where only right-handed four-fermion operators were considered. Taking combinations of these DIS structure-function corrections according to Eq. (2.5) for theŶ parameter and to Eq. (C.1) for theŴ parameter yields the sought-for EFT corrections for DIS observables.
This calculation has been implemented in APFEL [124] following the strategy presented in [72]. FK tables are produced from APFEL [125] and then used to evaluate the DIS Kfactors defined in Eq. (3.10). Furthermore, we have used APFEL to include the higher-order QCD corrections in the SMEFT sector, so that in fact Eq. (3.10) holds exactly for the DIS K-factors in our study. Y = −10 −4 . These maps should be compared with Fig. 1 of [72], which considered different EFT scenarios. We find that the overall effect of non-zeroŴ andŶ parameters is rather small, well below the percent level even for the highest bins in Q 2 covered by the HERA data. This comparison highlights how, in this benchmark EFT scenario, the constraints on theŴ andŶ parameters will be completely dominated by the high-mass Drell-Yan cross sections.
Drell-Yan distributions. As demonstrated in Ref. [27], in the case of the dilepton m distribution in neutral-current Drell-Yan, one can derive the following analytic expression where F ql (m ll , q ) represents a form factor that depends on the values of the SMEFT coefficients One can then match the Warsaw-basis parametrisation of Eq. (2.4) to the contact terms q ijkl ≡ q δ ij δ kl in the above equation, where i, j (k, l) are the quark (lepton) flavour indices. There are four combinations for quarks q = u L , u R , d L , d R and two combinations for charged leptons = e L , e R relevant for the description of the neutral-current Drell-Yan process. Specifically, the matching is SMEFTsim k-factor with cuts SMEFTsim k-factor Analytic k-factor An analogous expression involving the transverse mass m T can be derived in the CC case. These K-factors can be matched to the experimental data kinematics by integrating over the suitable ranges in m (m T ) and rapidity y (y ).
1D distributions. For the benchmarking between the analytical and numerical calculations, we use the relation between theŴ coefficient and the c lq = 1.65 × 10 −2 ) and the kinematics of the ATLAS 7 TeV DY data. The label "cuts" indicates that we impose acceptance requirements of p T ≥ 25 GeV and |η | ≤ 2.5 in the numerical (SMEFTsim) calculation on the final-state leptons; these cuts cannot be applied in the analytical calculation. The right panel shows the relative difference in these K-factors, with the analytical calculation as a reference.
From this comparison we observe, first of all, the perfect agreement between the analytical and numerical K-factors in the case of no acceptance cuts, and second, that the acceptance cuts on the leptonic variances leave the K-factor value essentially unchanged. Furthermore, we have verified that the same level of agreement in the calculation of the EFT K-factors between the numerical and analytical approaches is obtained in the case SMEFTsim k-factor with cuts SMEFTsim k-factor Analytic k-factor of theŶ parameter, as well as once the quadratic EFT corrections are accounted for. In addition, the calculation of the EFT K-factors for the neutral current DY pseudo-data used in the HL-LHC projections of Sect. 5 has been validated by ensuring that the analytical and SMEFTsim calculations are in perfect agreement. As discussed in [27], far above the Z peak it is sufficient to include higher-order corrections to the SM cross section in Eq. (C.3) to achieve good theoretical accuracy in the SMEFT, due to the (approximate) factorisation of higher-order QCD corrections in this region. We also note that renormalisation group evolution effects [6,7] are not required in this calculation, since for the operators considered in our benchmark scenarios the corresponding anomalous dimensions are either Yukawa-suppressed or suppressed by NLO electroweak contributions.
2D distributions. In Drell-Yan datasets such as the CMS 7 TeV data of [121], the measurement is presented as a double-differential distribution in the dilepton invariant mass m (or equivalently τ = m 2 /s) and rapidity |y |. Also in this case, bin-by-bin EFT K-factors can be computed using the prescription of Eq. (3.10). also forŴ = −10 −3 ) now for the EFT K-factors for the double-differential Drell-Yan cross sections in m and |y |, for the highest m bin of the ATLAS 7 TeV DY measurement. Note that we only compute these K-factors for the |y | that satisfy the LO kinematics. As in the case of the 1D distributions, we find good agreement between the analytical and numerical calculations, with differences well below the absolute magnitude of the K-factors, and also that the impact of the acceptance cuts in the leptonic variables is negligible.
EFT K-factors for CC Drell-Yan at the HL-LHC. The left panel of Fig. C.4 displays the EFT K-factor for high-mass charged-current DY production at the HL-LHC as a function of m T , the transverse mass of the neutrino-lepton pair, for a parameter value of W = −10 −3 . We find rather large EFT corrections, with K-factors as large as K EFT 5 for the highest m T bin. To validate this SMEFTsim-based calculation, we show a compari- son between k EF T and the linear k-factor k Ricci et. al provided by the authors of [37]. The quadratic k-factors are also shown in the left panel. In the right panel we plot the relative k-factor k EF T k Ricci et.et − 1, finding good agreement between the two calculations. In addition, one can observe that the quadratic EFT corrections become larger than the linear ones for m T 3 TeV. Nevertheless, we point out that in our projections for the HL-LHC data from Sect. 5 we restrict ourselves to the linear approximation in benchmark scenario I.
Effect of varying PDFs in the computation of the SMEFT K-factors. The SMEFT K-factors in Eq. (3.9) are precomputed before the fit using a reference SM PDF set and then kept fixed. Here, we quantitatively assess the effect of varying the input NNLO PDF in Eqns. (3.7) and (3.8).
Note that the first term of Eqs. (C.6) and (C.7) are equal, thus we can express the approximation asσŴ ,ij ⊗ L SMEFT,iĵ σ SM,ij ⊗ L SMEFT,ij ≈σŴ ,ij ⊗ L SM,iĵ σ SM,ij ⊗ L SM,ij , (C. 8) or, equivalently, as RŴ (SMEFT) ≈ RŴ (SM) (C.9) where R is defined in Eqns. (3.7) and (3.8) and is computed either with SMEFT PDFs or fixed SM PDFs. In what follows we test whether R(SM) is a good approximation for R(SMEFT), taking each of the coefficients of Scenario I and II at a time.
We will first consider Scenario I and theŴ parameter. We use the PDFs including HL-LHC pseudodata and calculate RŴ for the HL-LHC NC Drell-Yan bins outlined in Section 5. In Fig. C.5 we observe that, using fixed SM PDFs in the computation of RŴ yields a 2% deviation in the highest invariant mass bin. In the same figure we assess the impact of these differences on the SMEFT K-factors themselves, KŴ = 1 +Ŵ RŴ , calculated at each of the benchmark pointsŴ = ±4 · 10 −5 and we observe that the difference at the level of the observable is completely negligible, at the permil level. In the case of theŶ parameter we observe a rather larger deviation in the highest invariant mass bin, which reaches to 4%, as shown in Fig C.6. However, as in the case of W , the impact of these discrepancies on the SMEFT K-factors, calculated at each of the benchmark pointsŶ = ±1.2 · 10 −4 , thus on the observable is below the percent level, which is still negligible compared to the experimental and theoretical uncertainties associated to the last bin if the invariant mass distribution. Finally, we turn to Scenario II, in which we expect to observe the largest deviation between our approximation, based on using fixed SM PDFs in the computation of the SMEFT K-factors, and the full calculation. This is expected because we include both the linear and quadratic terms in the EFT expansion, and because of the flavour nonuniversal structure of Scenario II. Here we will denote the K-factor by K = 1 + C Dµ 33 R lin + (C Dµ 33 ) 2 R quad and calculate the dependence of each R lin , R quad on the SMEFT PDFs. We observe a 10% deviation in the computation of R in the highest invariant mass bin, as shown in Fig C.7. However, the impact of such differences on the actual SMEFT K-factors, thus on the observables, calculated at each of the benchmark points C Dµ 33 = ±0.014, is still at the percent level, which is acceptable compared to the experimental and theoretical uncertainties.

D Flavour dependence of the SMEFT PDFs
In Sect. 4, when discussing the results of the joint PDF and EFT fits to available Drell-Yan cross-section data, we presented the comparison between the SM and SMEFT PDFs in benchmark scenario I for different values of theŴ andŶ parameters in terms of the partonic luminosities, Fig. 4.3. In this appendix we present the corresponding comparisons between SM and SMEFT PDFs at the level of individual PDF flavours. The impact of the SMEFT PDFs on R lin , R quad (above) and the SMEFT Kfactor K = 1 + C Dµ 33 R lin + (C Dµ 33 ) 2 R quad (below) calculated at the Scenario II benchmark points C Dµ 33 = ±0.014.
are chosen to be close to the upper and lower limits of the 68% CL intervals reported in Table 4.1. The error band in the SM PDFs corresponds to the 68% CL PDF uncertainty, while for the SMEFT PDFs only the central values are shown. In all cases, one finds that the EFT-induced shifts on the PDFs are smaller than their uncertainties, though in some cases these shifts can represent up to one-third of a standard deviation. In particular, the up and down antiquarks in the region x 10 −2 are the PDF flavours most affected by the EFT effects. This finding can be understood from the fact that the NC Drell-Yan cross section is proportional to the uū and dd combinations at leading order, but the up and down quark PDF are already well constrained by lower-energy DIS measurements. Furthermore, we have verified that the PDF uncertainties themselves are unchanged in the SMEFT fits. The results of Fig. D.1 are consistent with those of Table 4.1 and demonstrate that, with current data, the interplay between EFT effects and PDFs in the high-mass Drell-Yan tails is appreciable but remains subdominant as compared to other sources of uncertainty.  Table 4.1.