A POWHEG generator for deep inelastic scattering

We present a new event generator for the simulation of both neutral- and charged-current deep inelastic scattering (DIS) at next-to-leading order in QCD matched to parton showers using the POWHEG method. Our implementation builds on the existing POWHEG BOX framework originally designed for hadron-hadron collisions, supplemented by considerable extensions to account for the genuinely different kinematics inherent to lepton-hadron collisions. In particular, we present new momentum mappings that conserve the special kinematics found in DIS, which we use to modify the POWHEG BOX implementation of the Frixione-Kunszt-Signer subtraction mechanism. We compare our predictions to fixed-order and resummed predictions, as well as to data from the HERA ep collider. Finally we study a few representative distributions for the upcoming Electron Ion Collider.

Electron-proton (ep) colliders are powerful tools to perform high-precision studies of quantum chromodynamics (QCD) and act as microscopes to probe the internal structure of the proton. Particularly well suited to that end, are deep inelastic scattering (DIS) processes where a photon or massive vector boson of high virtuality is exchanged between the lepton and the partonic constituents of the proton. In fact, in such a reaction, the space-like vector boson exchanged in the t-channel probes the charged constituents of the protons through the electromagnetic and weak interaction in the cleanest possible environment. From the external momenta of the incoming and outgoing leptons (p l and p ′ l ) one can determine the internal hard space-like momentum q which probes the proton structure, The Hadron Electron Ring Accelerator (HERA) at the Deutsches Elektronen Synchrotron (DESY) was the first dedicated high centre-of-mass energy ep collider. HERA operated in two phases -HERA I, from 1991 to 2000, and HERA II from 2002 to 2007, colliding protons up to energies of 920 GeV and electrons (or positrons) at 27.5 GeV, spanning several orders of magnitude in Q 2 , thereby probing the proton structure at the attometer level. Besides measurements of exclusive reactions and diffraction, the main legacy results from HERA collisions as measured by the H1 and ZEUS collaborations include precise determinations of parton distribution functions (PDFs) resulting in the HERAPDF family [1][2][3][4], a range of precision QCD studies [5][6][7][8][9][10][11][12][13][14] and constraints on physics beyond the Standard Model [15][16][17][18]. Proton PDFs from HERA played a crucial role for physics studies at the Tevatron and at the Large Hadron Collider (LHC). In particular, the fast discovery of the top quark at the Tevatron would not have been possible without the knowledge of proton distribution functions determined using data collected by the H1 and ZEUS collaborations. HERA data are still included in global fits of PDFs, though more recent PDF determinations rely more and more on LHC data (see e.g. ref. [19] and references therein). This is particularly the case for the gluon distribution function which is mostly probed indirectly at HERA, through the precise measurement of the evolution of the quark distribution functions via the DGLAP equations [20][21][22].
In June 2021, the U.S. Department of Energy has authorised the start of the project execution phase of a new electron-ion collider (EIC), with construction planned to start in 2024 at Brookhaven National Laboratory (BNL). 1 Other possible lepton-hadron colliders included in the European Strategy for Particle Physics [23] are a Large Hadron electron Collider (LHeC) at CERN and a Future Circular electron-hadron Collider (FCC-eh). These new-generation lepton-hadron colliders will enable experimentalists to collect much higher luminosity compared to HERA, and they will open up the possibility to explore an even larger range in energy scales.
The EIC will collide 5 to 18 GeV electron beams with proton beams spanning the energies from 41 to 275 GeV, with the possibility to have both the electron and the proton beams polarised. An electron-proton peak luminosity of 10 34 cm −2 s −1 at 105 GeV centreof-mass energy is foreseen. Furthermore, a rich heavy ion program is planned, including nomenological results at HERA (Sec. 4.1) and at the EIC (Sec. 4.2). We present our summary and outlook in Sec. 5. Technical details regarding the phase-space parametrisation are provided in App. A, the generation of final-and initial-state radiation in App. B.1 and B.2, respectively, and the matching to the Pythia parton shower is described in detail in App. C.

Details of the implementation
In this section, we provide a detailed description of the process considered in this work and elaborate on the extensions made to the POWHEG BOX RES framework for its implementation. Specifically, we present comprehensive details regarding three key aspects: the phase-space generation, the generation of radiation and the treatment of real-radiation damping.

The DIS process
To set the stage it is useful to first recall the leading order (LO) kinematics of DIS. We consider the scattering of a massless (anti-)quark q off a massless (anti-)lepton l via the exchange of a photon or electroweak gauge boson V of virtuality Q 2 . In our notation, the external four-momenta are given by k i (incoming lepton), k f (outgoing lepton), p i (incoming quark), and p f (outgoing quark).
It is customary to define a set of DIS variables x B , Q 2 , and y DIS , given by where P is the proton four-momentum. At LO, neglecting the proton mass, the Bjorken x B variable coincides with the longitudinal momentum fraction x carried by the incoming quark, p i = xP . The LO phase space is (2.2) and the differential partonic cross section (for photon exchange), 3 after integrating over the azimuthal angle of the lepton, is given by where we have used that Q 2 = x B y DIS S, with S = 2P · k i the total squared centre-of-mass energy. At next-to-leading order (NLO) the process receives both virtual loop corrections and real emission tree-level corrections. The full three-particle DIS phase space dΦ 3 for the real correction is given by

POWHEG ingredients
Before describing the modifications we made to handle DIS, we briefly summarise the main ingredients of the POWHEG method [36] as implemented in the POWHEG BOX [37]. A building block of the POWHEG cross section is the inclusive NLO cross section whereΦ n denotes the phase space of the underlying Born configuration, f b labels the partonic subprocess contributing at LO, and f r is summed over the partonic subprocesses entering the real contribution. B f b corresponds to the Born matrix element (including luminosity and flux factors), V f b corresponds to the UV-renormalised virtual corrections, and R fr is the real matrix element. The real cross section is partitioned in several contributions, labelled with the index α, each of them associated with a singular region. The notation "α ∈ f r → f b " means that all the singular regions leading to the underlying Born subprocess f b are considered. This writing assumes that the phase space for the real contribution can be written in a factorised form dΦ α n+1 = dΦ n dΦ α rad . (2.6) The radiation phase space Φ α rad is parameterised in terms of three variables, an energy fraction ξ, the cosine of the angle between two partons that can become collinear y, and an azimuthal angle ϕ, according to the Frixione-Kunszt-Signer (FKS) [57] subtraction technique. The exact expression of Φ α rad depends on the singular region. In the DIS case, there are two singular regions, one associated with initial-state radiation, one with finalstate radiation.
The POWHEG cross section reads is a quantity used to measure the hardness of an emission, that depends on the radiation variables ξ and y, and becomes equal to the transverse momentum of the emission in the soft-collinear limit, µ 0 is an infrared scale of the order of 1 GeV, below which real radiation is considered unresolved, and is the Sudakov form factor. After integrating over the radiation phase space, the squared bracket appearing in Eq. (2.7) yields 1.For this reason, preserving the DIS invariants when building the radiation phase space ensures that one exactly reproduces the NLO distributions for x DIS , y DIS and Q 2 DIS , and does not introduce spurious higher-order corrections in inclusive quantities. In Secs. 2.2.2 and 2.2.3 we present new parametrisations of the radiation phase space, for ISR and final state radiation (FSR) respectively, that enable one to preserve the DIS invariants. We also need to modify our definition of the hardness variable k α T (ξ, y), as detailed in Sec. 2.2.4. One of the features of Eq. (2.7), is that it can significantly depart from the fixedorder NLO calculation when considering non-inclusive observables (i.e. observables that are vanishing at LO) even in the limit in which the radiation is very hard. This is due to the ratioB/B, and to higher-order effects encoded in the Sudakov form factor of Eq. (2.7) (e.g. related to the treatment of the QCD coupling constant, which is modified to include the dominant logarithmically-enhanced corrections at all orders [58]). To remedy this, one can introduce a monotonic function h(k T ), such that and separate the real cross section into a singular (s) and a finite (f ) contribution, In the POWHEG BOX, this procedure is dubbed the hdamp mechanism. In the POWHEG BOX, it is also possible to use the Bornzerodamp mechanism, which moves to R (f ) α all the configurations where the real matrix element departs significantly from its soft or collinear approximation. 5 The impact of the damping functions is discussed in App. D. If regular contributions (i.e. those not associated with any singularity) are present, those are also treated alongside the remnant contributions.

Phase-space parameterisation for initial-state radiation
In order to evaluate the phase-space of Eq. (2.4) for the case of ISR, we write the centreof-mass momenta in the final state as where ξ, y and ϕ are the FKS variables that are used to parametrise the real-radiation phase space. After some algebra one may express the three-particle phase space, dΦ 3 , of Eq. (2.4) in terms of the two-particle phase space in Eq. (2.2) as follows: where ∆ϕ = ϕ − ϕ k and, as in Ref. [37], we use the bar to indicate underlying Born quantities, like the squared Born centre-of-mass energys = x B S. The two δ-functions arise due to energy conservation, which gives rise to a quadratic equation in λ =x/x. The two solutions are given by is not yet suitable for numerical implementation in the POWHEG BOX due to the presence of the δ-functions and the additional associated integration over λ.
Schematically, the ξ and λ integrations of a generic function f (λ, ξ, y, ϕ) can then be written as (2.17) where f ± (ξ) = f (λ ± , ξ, y, ϕ) and the limits in the ξ integrations are set by requiring that the λ ± solutions are physical. The explicit expressions for ξ 0 and ξ max are given in App. A. As shown in that appendix the integral in the above equation can then be written as Since the λ integration has been eliminated and the ξ integral has been remapped into a single integral between 0 and 1, one can evaluate the radiation phase space as usual in the POWHEG BOX.

Phase-space parameterisation for final-state radiation
The starting point for the FSR phase-space derivation is the same as the one given in Eq. (2.4). We first introduce the momentum sum k = p f + p r of the two outgoing QCD partons. The radiation variables are then given as in the POWHEG BOX, i.e.
where ⃗ η is an arbitrary direction that serves to define the origin of the azimuthal angle. In this case, after some algebra, one arrives at an expression in terms of the Born phase space and the FKS radiation variables that can be integrated numerically, given by , (2.21) where the value of λ 0 can be found in Eq. (A.56).

Generation of radiation
The standard POWHEG BOX generation of FSR radiation is discussed in detail in Ref. [37] and recalled explicitly in App. B.1.1. In the case of DIS, since the energy of the incoming lepton is fixed, the energy of the incoming parton is reduced even in the case of FSR, by an amount equal to .
(2.22) -8 -Since in the soft or collinear limits λ → 1, and hence s ≈s, one can use as ordering variable which now involves explicitly the underlying Born centre-of-mass energy. The upper bound for κ 2 t is in this case simplys. One can then generate a radiation phase-space point in the usual way in the POWHEG BOX, and accept or reject it using the standard hit-and-miss technique with an upper bound of the form (see App. C of Ref. [59]) dξdy . (2.24) In the case of ISR, the standard POWHEG code in the default setup handles the two collinear regions along the beam together. In our case instead we only have one collinear region. For example, if the collinear region is for y → +1, our upper-bound is identical to the one in Eq. (2.24). Furthermore one can use as ordering variable which, as shown in the App. B.2, is also bounded from above by κ 2 t <s. In the soft (ξ → 0) or collinear limit (y → 1) is it easy to verify that κ 2 t → s 2 ξ 2 (1 − y), i.e. it corresponds to the transverse momentum of the emission.

Code validation and comparison to existing predictions
In the following we present a validation of our code and a comparison to other existing theory predictions, both for inclusive observables, as well as for observables related to the jet kinematics. All the POWHEG BOX results presented in this section have been obtained using the Bornzerodamp mechanism, described in Sec. 2.2.1, to separate the singular and non-singular contributions in the real cross section. Alternative choices of the damping functions are discussed in App. D.

Inclusive observables
One of the defining features of an NLO+PS generator is that it should reproduce quantities that are inclusive in radiation not present at Born level, with NLO accuracy [60]. In DIS one typically decomposes the inclusive cross section in terms of the three proton structure functions F 1 (or F L ), F 2 , and F 3 [61], where x B , y DIS , and Q 2 are the usual DIS variables as defined in Eq. (2.1), and α is the electromagnetic fine-structure constant. The structure functions themselves depend on both x B and Q 2 . At leading order F 2 = 2x B F 1 and F 3 only receives contributions -9 -from diagrams with a Z boson. The above equation is therefore often recast using F L = where we have suppressed again the arguments of the structure functions. One then defines the (dimensionless) reduced cross section by [62] σ R (x, which has the property that at leading order it is equal to F 2 when considering only photon exchange. In this validation section, we use the reduced cross section to investigate inclusive predictions of the new POWHEG BOX generator. For the numerical results presented here we will consider positron and proton collisions at energies of 27.6 GeV and 920 GeV, respectively. We will use the NNLO PDF set NNPDF30_nnlo_as_0118_hera [63] based on HERA data with the associated strong coupling α s (M Z ) = 0.118 as implemented in LHAPDF v6.5.3 [64]. We set the central renormalisation, µ R , and factorisation, µ F , scales equal to Q, and to estimate the perturbative uncertainty we do a standard 7-point scale variation by a factor of two around these values. The number of active flavours is set to N f = 5.
In Figs. 1 and 2 we show the reduced double differential cross section, as defined in Eq. (3.3). The results are either binned in Q 2 and plotted in log x B , or binned in x B and plotted in log Q 2 , where Q is measured in GeV. In particular, we compare our NLO+PS result to an NNLO calculation obtained with a private code that uses the DIS structure function branch of HOPPET [65] which was developed in the context of the proVBFH programs [33,[66][67][68] and uses the NNLO DIS coefficient functions computed in Refs. [69,70]. We obtain NLO+PS results using two different versions of the Pythia8 6 showers [43], namely the default Pythia8 and the dipole like Pythia8 shower introduced in Ref. [71]. For the purposes of the comparison in this section, the main difference between the two is that the default shower does not preserve the lepton kinematics, whereas the dipole variant does. In these plots we only run the shower phase of Pythia8, i.e. we do not include hadronisation and underlying event simulation. More details on the matching procedure with Pythia8 are given in App. C. As explained in Sec. 2, our POWHEG BOX implementation is constructed such that the radiation mappings preserve the underlying DIS kinematics, i.e. the lepton kinematics. For this reason the reduced cross sections obtained at pure NLO and at the level of the unshowered Les Houches Event (LHE) [72] file are in perfect agreement, i.e. there are no spurious higher-order terms induced by the POWHEG Sudakov for this observable. 7 We note that this would not have been the case if our mappings did the POWHEG BOX the weight of events with negative PDF values is set to zero. We have checked that the discrepancy reduces when one uses a PDF set exhibiting fewer negative values. Small differences between the LHE and the NLO distributions also arise in the very small Q region due to the momentum reshuffling procedure to introduce mass effects for final-state particles.
-10 - NNLO pythia8 dipole pythia8 default Q 2 < 15000 GeV 2 , with Q n < Q < Q n+1 Figure 1: The reduced double differential cross section, defined in Eq. (3.3), binned in Q 2 and plotted as a function of log x B . For a given n the lower bin-edge in Q 2 is given by the printed Q 2 n and the upper limit is Q 2 n+1 . For the last bin with n = 25 the upper edge is given by Q 2 = 15000 GeV 2 . Note that in order to plot all curves in the same panel they have been multiplied by 2 25−n . Plotted are fixed-order NNLO results (dark purple), POWHEG events showered with the dipole Pythia8 shower (blue), and POWHEG events showered with the default Pythia8 shower (red), both at parton level.
-11 - and plotted as a function of log Q 2 . For a given n the lower bin-edge in x B is given by the printed x n and the upper limit is x n+1 . For the last bin with n = 16 the upper edge is given by x B = 0.5. Note that in order to plot all curves in the same panel they have been multiplied by 2 16−n . Plotted are fixed-order NNLO results (dark purple), POWHEG events showered with the dipole Pythia8 shower (blue), and POWHEG events showered with the default Pythia8 shower (red), both at parton level.  not preserve the DIS kinematics. Furthermore, with a parton shower which preserves the DIS invariants, such as the dipole Pythia8 shower [71], the reduced cross-sections after parton shower are also identical. For this reason in Figs. 1 and 2 we do not show explicitly the NLO and LHE curves, as they are almost identical to the dipole (POWHEG+)Pythia8 shower results. On the other hand, the default Pythia8 shower does not preserve the DIS kinematics, and therefore one can expect modifications from the shower to the reduced cross sections, as is evident from Figs. 1 and 2, even though the hardest emission event generated by POWHEG does preserve the DIS kinematics. In particular, there are significant deviations for larger values of x B together with small to moderate values of Q 2 . It is interesting to note that in this kinematic region the NNLO prediction is very close to the dipole Pythia8 prediction (i.e. it is very close to the NLO result), and hence true NNLO corrections are tiny in these regions. The discrepancies between the two Pythia8 showers can therefore be seen as spurious effects. This was already pointed out in Ref. [71] where it was found that the dipole Pythia8 shower correctly reproduces the singular limits of LO DIS matrix elements, whereas the default Pythia8 shower does not. To make the effect more visible, we select a few representative bins from the above two figures, and plot the ratios of the various predictions to the respective fixed-order NLO results in Figs. 3-5. Here it can be seen very clearly that for large values of x B there is a discrepancy between the two Pythia8 showers which is not accounted for by scale variations or true NNLO corrections. We do not include the scale variation band for the reference NLO prediction, as it is near identical to the band obtained with the Pythia8 dipole shower.

Impact of alternative momentum mappings
While the above discussion clearly demonstrates that our POWHEG implementation achieves NLO accuracy for inclusive quantities, it is instructive to explore how an implementation using mappings closer to the standard POWHEG BOX mappings would look like. The main     Fig. 3, but for the bin 3500 GeV 2 < Q 2 < 15000 GeV 2 as a function of log x B (left) and for the bin 0.08 < x B < 0.13 as a function of log 10 Q 2 (right).
kinematical difference between hadron-hadron collisions and DIS is that the incoming lepton momentum in DIS is fixed, whereas the incoming partons in a hadronic collision are sampled in their energy fractions. In order to simulate DIS it is therefore necessary that the mappings used by POWHEG preserve the incoming lepton momentum. The standard FSR mapping in POWHEG already does this, as it preserves both x 1 and x 2 (but we stress that these mappings do not preserve the DIS variables). In contrast, the ISR mappings modify both x 1 and x 2 .
It is, however, straightforward to adapt the ISR mapping such that it only modifies the incoming parton momentum, but leaves the incoming lepton untouched. The minimally modified POWHEG map for ISR can be straightforwardly obtained by applying an additional longitudinal boost along the direction of the incoming electron, in order to restore its original value of the energy, to the kinematic reconstruction detailed in Sec. 5.1.1 of Ref. [37]. This boost changes the value of the x fraction associated with the incoming parton, that, at variance with Eq. (5.7) of Ref. [37], now becomes x =x 1−ξ , but does not alter the values of the variable ξ, y and ϕ, which are defined in the partonic centre-of-mass frame. The only -14 - pythia8 dipole NNLO pythia8 dipole (minimal POWHEG) Q 2 < 15000 GeV 2 , with Q n < Q < Q n+1 other necessary modifications are then the expression for upper bound for ξ, which now becomes ξ <x 2 , and for κ ISR t , which is now bounded by S−s 4 √ S , being √ S the total hadronic centre-of-mass energy, and √s the underlying-Born centre-of-mass energy, which coincides with the mass of the recoiling system. However, with this minimally modified mapping, the outgoing lepton still takes recoil and hence the DIS variables are not conserved.
-15 -  In Figs. 6-10 we show plots similar to the above, but now we use the minimally modified momentum mappings (red lines, labelled "minimal POWHEG" in the plots). For reference we show the POWHEG+Pythia8 dipole prediction (blue), using our new mappings from the above plots, as well as NNLO results (purple). As can be seen clearly, the predictions obtained with these minimally modified mappings exhibit sizable differences compared to  the new mappings for inclusive quantities. In particular, we observe very large deviations for small Q and x B . It can also be seen that the deviations do not approximate the true NNLO corrections well.
It is interesting to note that LHC processes involving the exchange of colourless particles in the t-channel, like vector boson fusion (VBF) and single top production, which have been implemented in the POWHEG BOX in Refs. [73][74][75][76], exhibit kinematics that are essentially double-DIS like. For this reason one would expect that these processes could benefit from a different momentum mapping. For instance, for VBF, it was observed in Ref. [33] that the description of the rapidity separation between the two hardest jets as predicted with POWHEG has a very different shape compared to both NLO and NNLO predictions. It would be interesting to see if this tension can be resolved with our new mappings. We leave this question for future work.   Figure 10: Similar to Fig. 5, but showing the minimally modified momentum mappings that do not preserve DIS kinematics in red. We do not show the default Pythia8 here and we normalise to our default "pythia8 dipole" result here in blue.

Exclusive observables and comparison with resummation
In addition to the inclusive cross section considered above, information on the physics of DIS reactions can be gained from exclusive observables. In order to explore the complementarity of fixed-order perturbative calculations, resummed predictions and NLO+PS simulations, we performed a comparison of these approaches for two event shape variables. Following the Breit-frame 8 definition of event shapes employed in the experimental analyses, we can distinguish between the remnant and current hemisphere, where particles in the remnant (current) hemisphere have positive (negative) pseudo-rapidities when the incoming photon has negative rapidity. Event shapes are formulated only in terms of particles in the current hemisphere. In particular, in this section we consider the thrust distribution τ z,Q and broadening B z,E , which are defined as where the momenta are defined in the Breit frame, the photon three-momentum determines the z-direction, and h denotes all the hadrons in the current hemisphere. These observables are continuously-global and we can obtain resummed predictions at NLL accuracy, e.g. from CAESAR [50], according to the practical details in section 3.2 of Ref. [78]. In particular, the matching of resummation and fixed order is performed with the mod-R scheme defined in that same reference. Fig. 11 depicts τ z,Q and B z,E for the photon-exchange contribution to e − p → e − X with E p = 904.5 GeV, E e = 27.6 GeV. We consider fixed-underlying Born kinematics, 8 The Breit frame is defined by 2xB ⃗ P + ⃗ q = 0, where ⃗ P denotes the incoming proton momentum and ⃗ q the momentum of the virtual boson characterizing the DIS topology. In this frame, the exchanged photon has no energy and is anti-aligned to the incoming parton. We follow Appendix 7.11 in Ref. [77] for the actual frame transformation.
As discussed in Ref. [80], starting from order α 2 s there can be configurations where the current hemisphere is populated only by soft large-angle radiation from partons in the remnant hemisphere. It is then necessary to introduce a cut on E curr to remove sensitivity to these soft emissions in event shapes normalised with respect to E curr (or to h |⃗ p z,h |), that would be otherwise infrared unsafe.
In the parton-level Pythia8 curve, we dress POWHEG events with the QCD Pythia8 dipole shower. In the hadron-level curve we also include hadronisation and beam remnants effects. The fixed-order predictions are obtained with DISENT [46] 9 . Since the event shapes shown here vanish at Born-level, the predictions shown here are effectively LO and NLO accurate even though we label the predictions by their inclusive accuracy, i.e. NLO and NNLO.
At NLO, the distributions steeply increase towards small values. This behaviour reflects the distinguished kinematics of a LO DIS configuration with exactly one final-state parton, which at NLO can only be altered by the emission of a single additional parton. One more parton can arise at NNLO where we observe a peak at small values of τ z,Q 9 The version of DISENT used here and below includes the bug fix reported in Refs. [81,82]. anti-k t , R=0.4, p jet T > 30 GeV, -4.5 < η jet < -1.5 e -(300 GeV) p (6500 GeV) → e -+ X NNPDF30_nnlo_as_0118_hera μ R = μ F = Q 7-point variation 30   and B z,E , which is due to the distributions turning negative (and diverging) due to large negative virtual corrections.
The divergences of the fixed-order calculations are removed by the resummation of all-order leading (and next-to-leading) logarithmically enhanced terms. In the NLO+PS results the parton shower has a similar effect.
We note that, away from the divergence, our POWHEG prediction is much closer to NNLO than to NLO at the parton level, indicating that higher order terms induced by the shower and matching approximate well the true NNLO corrections.
We also notice that, while for the broadening distribution (right panel), both the NLO+PS (parton) and the NNLO+NLL curves are peaked around the same value of B z,E . For τ z,Q (left panel), the NNLO+NLL Sudakov peak is at a much lower value (i.e. τ z,Q ≲ 0.02) and it is not visible in the parton shower predictions because of the PS cutoff µ min ≈ 0.5 GeV.
Hadronisation effects are sizable for small values of the event shapes. In case of the thrust distribution, they correspond to a roughly constant shift, while for the broadening this shift depends on the value of B z,E .

Jet and VBF related observables
Although the full implementation and study of VBF or single top production is beyond the scope of this paper, it is possible to mimic the kinematics of jets produced in these processes at the LHC. As discussed above, our POWHEG BOX implementation reproduces the DIS structure functions exactly at NLO since the momentum mappings preserve the DIS variables, or equivalently they leave the lepton momenta untouched. However, if we instead turn to the kinematics of the DIS jet, we see that it is clearly modified by the extra radiation. The POWHEG method still guarantees that we describe the hardest jet with NLO -20 - anti-k t , R=0.4, p jet T > 30 GeV, -4.5 < η jet < -1.5 e -(300 GeV) p (6500 GeV) → e -+ X NNPDF30_nnlo_as_0118_hera μ R = μ F = Q 7-point variation 30 Figure 13: Same as Fig. 12, but now showing the rapidity of the hardest jet satisfying p jet T > 30 GeV.
accuracy, but with higher-order terms in α s induced by the POWHEG Sudakov form factor (and the subsequent showering).
Typical VBF analyses at the LHC look for two well-separated jets with a large invariant mass. Therefore, to mimic VBF like topologies, we produce events in proton-electron collisions, with a proton energy of 6500 GeV and an electron energy of 300 GeV. Additionally we require that The cut on x B ensures a large invariant mass of the colliding system, and the range in Q is close to the typical momentum transfer in VBF of the order of the W mass. We also only include photon-mediated DIS to allow for a comparison with a fixed order code. We cluster the events in the laboratory frame with the anti-k T algorithm [83] with R = 0.4 using FastJet v3.4.0 [84]. We then look for the jet of hardest transverse momentum with pseudo-rapidity −4.5 < η jet < −1.5 (the incoming parton is in the minus z-direction). We require this jet to have p jet T > 30 GeV. We stress that these quantities are defined in the lab frame, not in the Breit frame, hence they are non-vanishing already at LO.
In Figs. 12-13 we show the transverse momentum and the rapidity, respectively, of the hardest jet. In addition to our new POWHEG implementation (shown in red) we also show fixed-order predictions up to NNLO and events generated with the minimal POWHEG implementation described in the previous section (orange). We use the Pythia8 dipole shower to shower both sets of POWHEG events. The bands are obtained through the usual 7-point scale variation around the central scales µ R/F = Q. The fixed order predictions are obtained with a private code using DISENT [46] and the projection-to-Born method [33] together with the NNLO DIS coefficient functions [69,70] as implemented in the DIS structure function branch of HOPPET [65].
We note that the minimal POWHEG implementation is typically further away from the fixed order NLO curve than the new implementation presented here. The differences between the two POWHEG generators are, however, compatible in size with the scale uncertainty -21 -  Figure 14: Thrust distribution (left) and broadening (right) for different bins in Q, at the hadron level for the dipole (red), and Vincia (blue) showers, and at the parton level (i.e. without hadronisation and beam-remnant effects) for the dipole shower (magenta), together with the H1 data of Ref. [79]. For a given Q-bin n, the average value of Q is denoted by ⟨Q n ⟩, and the corresponding curve is multiplied by a factor of 50 2(6−n) for better readability.
band, although the true NNLO corrections (in purple) tend to favour our new POWHEG implementation. It will be interesting to see if this pattern persists if one were to use our new mappings in VBF production or single top. The size of the scale variation in both POWHEG implementations deserves some comments. Naively one would expect the scale variation bands to be commensurate in size with those of the NLO calculation. However, in POWHEG when one varies the renormalisation and factorisation scales this only impacts theB function. The scale at which α s is evaluated in the Sudakov form factor remains the same (the transverse momentum of the emission). TheB function, introduced in Sec. 2.2.1, is essentially the NLO inclusive cross section, which has tiny scale variations for the values of Q probed here. As a consequence the scale variation is dramatically underestimated -even more than in the fixed order prediction. In App. D we study this effect in more detail. Although the scale variation band is always underestimated compared to the fixed order NLO band, including more damping can ameliorate this effect somewhat.

Phenomenological studies
In this section we present hadron-level predictions for the HERA and EIC experiments. In particular, we interface our NLO+PS generator with Pythia8 [44], which also provides hadronisation and beam-remnant effects. The hadron-level predictions are obtained considering both the dipole shower [71], which we have employed in the previous section, as -22 -  well as the Pythia8 implementation of the default antenna Vincia shower [85]. We do not include QED radiation or hadron-decay effects. 10

Comparison to HERA data
We now compare predictions obtained with our new POWHEG BOX implementation with HERA data analyzed by the H1 Collaboration in Ref. [79] corresponding to an integrated luminosity of L int = 106 pb −1 . Following their study, we consider collisions of electrons or positrons of energy E e = 27.6 GeV and protons of energy E p = 820 GeV or E p = 920 GeV resulting in centre-of-mass energies √ s of 301 GeV and 319 GeV, respectively. Electron and positron samples are generated independently and combined in the results discussed below corresponding to the composition of the data sample of Ref. [79], i.e. e + p: are taken into account. Furthermore, following the H1 analysis, we accept events where the energy in the current hemisphere exceeds a minimum value ϵ lim , according to Eq. (3.6). As 10 The implementation interface between our code and the Vincia showers largely relies on the works of Refs. [86,87].
-23 -  Figure 16: Same as Fig. 15, but for the broadening distribution. discussed in Sec. 3.2, this cutoff ensures the collinear and infrared safety of event shapes which are normalised with respect to E curr (or h |⃗ p z,h |), by removing events with no hard, but only arbitrarily soft partons in the current hemisphere. Event shape variables constitute a class of quantities particularly suited to probe the interplay of the hard scattering and the hadronisation mechanism governing DIS processes (see e.g. Ref. [88]). Following Ref. [79], we introduce the thrust variable τ z,E is defined by where the summations run over all hadronic objects h in the current hemisphere, ⃗ p h denotes the Breit-frame three-momentum of parton h and ⃗ p z,h its component along the z axis, which is chosen along the direction of the virtual boson. 11 The thrust variable is a measure of the momentum components of the hadronic system parallel to the z axis in the Breit frame. For the broadening we use the definition of Eq. (3.5).
In Fig. 14 we show τ z,E and B z,E , respectively, for various Q bins at the same time, together with the H1 data of [79]. To assess the impact of soft-physics effects, we also produce parton-level predictions in which hadronisation and beam-remnant effects are not included. We do so only for the dipole shower, as the modelling of these effects is the same for Vincia predictions. In Figs. 15 and 16 we consider the same event shapes for selected ranges of Q together with the 7-point variation of µ R and µ F . We select the lowest Q-bin, 14 GeV < Q < 16 GeV, which is dominated by photon exchange contributions, one with intermediate values of Q, and one including the value where Q coincides with the mass of the Z boson.
For τ z,E we find good agreement of our hadron-level predictions with H1 data. Especially at low values of Q hadronisation effects are crucial for a reasonable description of 11 Note that this definition of thrust differs from the one for τ z,Q of Eq. (3.4) used in the previous section.
- 24 - Figure 17: Same as Fig. 14, but for the squared jet mass (left) and the C-parameter (right).
data. As expected, at higher values of Q the impact of these effects becomes less relevant. While agreement between predictions and data is generally worse for the broadening, a similar trend as in the thrust distribution can be observed, with hadronisation effects being particularly important at low values of Q and differences between the dipole and the antenna showers being small throughout. Scale uncertainties are generally small for both distributions, but largest towards their respective upper ends, which reflects the relevance of higher-order corrections in these kinematic regions. The smallness of the scale variation in these plots can mostly be attributed to the fact that the plots show normalised distributions.
Similarly to the event shapes of Eqs. (4.2) and (3.5), the squared jet mass ρ and the C-parameter are defined as where h and h ′ are two different hadronic objects in the current hemisphere separated by an angle θ hh ′ . In Fig 17 we   We note that a large impact of non-perturbative effects has also been reported in [89] for the so-called 1-jettiness distribution which is closely related to the thrust distribution.

Predictions for the EIC
After employing our new POWHEG BOX implementation for the description of H1 legacy results, we turn to DIS at the future EIC. We consider electron-proton collisions with E e = 18 GeV, E p = 275 GeV, both in the neutral current (NC) and charged current (CC) modes with the incoming lepton either remaining intact or being converted into a neutrino.
Following Ref. [90], the DIS kinematics are restricted by 25 GeV 2 < Q 2 < 1000 GeV 2 , 0.04 < y DIS < 0.95 . (4.5) In contrast to the settings used in the HERA analysis of Sec. 4.1, for our EIC predictions we employ the PDF4LHC15_nlo_100_pdfas parton distribution set [91] to account for LHC constraints on the proton structure. Jets are reconstructed in the laboratory frame with the anti-k T algorithm [83] using an R-parameter of R = 0.8 and restrictions on transverse momentum and pseudorapidity, 12  LO results in a non-uniform way, slightly shifting the Q 2 distribution to larger values. Also the shape of the x B distribution is modified by NLO corrections with a tendency to smaller x B values at LO. Both the Vincia and the dipole showers preserve the lepton kinematics, hence the NLO+PS results agree with the NLO result. 13 For the NLO+PS results, we also performed a 7-point scale variation, modifying the renormalisation and factorisation scales independently by factors of two around their central value Q.
In Fig. 21 we display the transverse-momentum and pseudorapidity distributions of the hardest jet reconstructed with cuts of Eqs. (4.6)-(4.5). We remind the reader that these quantities are defined in the lab frame, hence they are non-vanishing already at LO. At LO, the accessible range of transverse momentum is limited by the upper limit on Q 2 of 1000 GeV 2 , p jet T < Q ∼ 32 GeV. Beyond LO, the transverse momentum available for the hadronic system can instead be distributed among various final-state partons resulting in non-vanishing contributions to the respective cross sections beyond this threshold. Here NLO predictions are effectively leading order accurate, which is reflected by the larger scale uncertainly bands. We observe that the NLO corrections considerably reduce the p jet T distribution at low values, and the parton shower slightly enhances that effect. The shape of the pseudorapidity distribution is modified by NLO corrections in an asymmetric way with largest effects at high values of |η jet |. In this range an additional, though smaller shape distortion is caused by the parton shower. The impact of the shower is large in kinematic 13 The 1% difference at very small Q 2 values is induced by the reshuffling procedure that POWHEG applies to introduce heavy-quark mass effects in the momenta that are written in the LHE files, and is not related to the parton shower or non-perturbative effects. regions that are not accessible at LO, but require the presence of additional radiation radiation, such as the large transverse-momentum region, or for very negative values of η jet . In particular, we find that the dipole and the Vincia shower agree remarkably well with each other, except for η jet ≲ −2, where differences between the two shower models reach 10-15%. Scale uncertainties are generally smaller than differences between fixed-order and NLO+PS results. We now consider the CC case. The main difference between the NC and the CC processes is that the former can proceed via the exchange of a virtual photon or a Z boson between the scattering electron and proton, and so is divergent for small Q 2 or p jet T values, while the CC cross section is entirely due to weak boson exchange contributions, which leads to a finite cross section also for vanishing Q 2 . Nonetheless, in the most characteristic distributions, radiative corrections display similar features as in the NC case. The Q 2 and x B distributions depicted in Fig. 22 exhibit negative NLO corrections of about 5% to 10% over the entire range of Q 2 . At small values of x B the NLO corrections are small and negative, while they become positive beyond x B ≈ 0.3 and reach values of almost 25% at large x B . Parton-shower effects are small in each case.
The transverse-momentum and pseudorapidity distributions illustrated in Fig. 23 turn out to be less sensitive to NLO corrections at low p jet T in the CC than in the NC case, but receive small negative NLO corrections at intermediate transverse momenta. The different behaviour of this jet distribution at low p jet T can be traced back to the presence of photonexchange contributions in the NC case. The pseudorapidity distribution, which is most sensitive to perturbative corrections at large values of |η jet | where the cross section itself is small, exhibits a similar behaviour as in the NC case.

Summary and conclusions
In this paper we have presented the first implementation of an NLO+PS event generator for DIS in the POWHEG BOX. The code will be made publically available there. While the POWHEG BOX allows for almost automated generation of hadron-hadron collisions, the kinematics of the DIS process required us to address a number of problems.
In particular we had to modify the FKS momentum mappings that are used in the POWHEG BOX, in order to preserve both the incoming and outgoing lepton momenta. The standard ISR map in the POWHEG BOX is such that it modifies the kinematics of both the incoming legs. Although the FSR map does not modify the incoming energy fractions, it does not preserve the DIS variables, x B , y DIS , and Q. We have shown that if one makes only minimal modifications to these mappings, such that the ISR map conserves the incoming lepton momentum, but the FSR maps is untouched, the resulting POWHEG generator substantially modifies even very inclusive distributions, even outside the NLO scale uncertainty. On the other hand, with our new and significantly different mappings, which do preserve DIS kinematics, NLO accuracy is numerically retained. Since the momentum mapping introduced preserves the DIS variables, it is possible to be fully differential in x B and Q 2 or to consider specific ranges in x B and Q 2 . We have presented several phenomenological studies. Firstly, we compared our results to event-shape distributions measured by H1. Overall, we observe a reasonable agreement, but for certain event shapes, there are discrepancies between the shapes of our theoretical predictions and the data. This is not unexpected as event shapes are described only at LO+PS in our generator, starting at O(α s ). In the future, we plan to extend the description of DIS processes in POWHEG to include DIS + one jet. This extension would allow us to achieve NLO accuracy for both inclusive and one-jet quantities within the MINLO framework. By doing so, we aim to improve the precision and reliability of our predictions for a broader range of observables in DIS and in particular to achieve NLO accuracy both for inclusive and one-jet quantities.
We then considered a possible future setup at the future EIC. We find that NLO corrections can be important and must be included to have an accurate description of this process. This is the case both for the Q 2 and and x B dependence of the inclusive cross section, as well as for the transverse momentum and rapidity distribution of the leading jet, where NLO corrections give rise to sizable shape difference compared to LO and parton shower effects are also important.
Although the present study focused on DIS, our work has implications also for LHC processes which involve the exchange of colourless particles in the t-channel like VBF and single top production. We leave it for future work to investigate the impact of the new momentum mappings in these processes. Very recently a family of NLL-accurate parton showers for DIS and VBF was presented in Ref. [92]. It will be also interesting in the future to investigate the matching of these showers to our POWHEG generator. Our code can be downloaded from the following SVN repository: svn://powhegbox.mib.infn.it/trunk/User-Processes-RES/DIS For the two zeros of the argument we find where p 0 f is given in Eq. For the allowed range of the variables λ, y DIS , ξ, the left-hand side of this equation is always positive. The right-hand side of the equation is clearly also always positive, therefore both sides can be squared without generating spurious solutions, resulting in In this equation, the sign of the right-hand side is determined by the sign of the factor cos ∆ϕ. Therefore only solutions for λ ± are allowed, where the left-hand side has the same sign as cos ∆ϕ. The two solutions have the property that λ + (λ − ) becomes equal to λ − (λ + ) when the sign of cos ∆ϕ is changed. By inserting λ ± in the l.h.s. of the above equation one finds that in the case of cos ∆ϕ > 0 only the λ − solution gives the correct sign up to a maximum value of ξ equal to ξ 0 = 2y DIS 1−yy DIS +y DIS +y . On the other hand, in the case of cos ∆ϕ < 0 the λ − solution is always a correct solution and the λ + solution is only correct if ξ > ξ 0 .
-34 -Rewriting x = x B /λ, for the phase-space integral of Eq. (A.14) can be rewritten as This three-particle phase-space integral can be expressed in terms of the two-particle phasespace integral in Eq. (A.5), and the radiation variables ξ, ϕ, y as In the above equation, the λ integral is now constrained to the rangex < λ < 1, for each λ ± solution, and the ξ integral is constrained as described above. One can observe that in the collinear and soft limits only λ − is a valid solution. Schematically, the ξ integration can then be written as where ξ 0 is given above and ξ max depends on the value ξ 1 where λ + = λ − and A in Eq. (A. 21) vanishes. Explicitly, one has Therefore, in the case where λ + is not a valid solution ξ 0 = ξ max . The integral can then be written as with ξ ′ max = 2ξ max − ξ 0 . Lastly, one can make the transformation toξ = ξ/ξ ′ max , to obtain

A.2 Phase-space parameterisation for final-state radiation
Here, we use the same notation as in the previous section and work in the centre-of-mass frame. We write the phase space as We introduce k, the sum of the momenta of the two outgoing partons, We parameterisation k as where k 0 = p 0 f + p 0 r . If we use k instead of p f , the phase space becomes Next, we need rotate k along the z-axis, such that By doing that rotation we are constricting the two integration variables y k andφ. Hence, we have to transform them into new angles. To that end, we choose the corresponding angles of the incoming parton. The rotation matrix R can be written explicitly as In the rotated frame one has Note that c ψ and ϕ r are not the FKS variables. One can perform a change of variables y k → −c p andφ → ϕ p to obtain the usual angles of the incoming parton in the rotated frame.
We can integrate over k f to obtain k f = −k and k 0 f = k. Thereby, the momentum of the outgoing lepton in the rotated frame is simply The phase space now becomes where we used that p 0 f = (p 0 r ) 2 + k 2 − 2c ψ p 0 r k. We now computed the DIS variables in the rotated frame In order to express the phase space as a function of the underlying Born one, we replace k and c p by DIS variables using (A.51) We also replace the Q 2 integral introducing λ = Q 2 /(sy DIS ) Additionally, we transform p (R) r into spherical coordinates as indicated above. The phase space becomes -37 -Next we remove the last delta distribution by integrating over λ. The root of the argument of the delta distribution is and we get the additional Jacobian factor of Therefore, the integration over λ yields .
The last step is to transform c ψ and ϕ r into the FKS variables y and ϕ. Here, y is the cosine of the angle between the emitter and the radiation, while ϕ denotes the azimuthal angle of p r around k, where k i , i.e. the z-axis of the usual centre-of-mass frame, serves as origin for the angle. We can transform the momenta back into the usual centre-of-mass frame using R −1 of (A.40). For the FKS variables we get which leads to Therefore the phase space becomes .
Finally, we change variable to from x to x B = λ 0 x and factor out the Born phase space to get . (A.63)

B Generation of radiation
In this appendix we describe how we modify the default implementation of the event generation for radiation, in order to be able to handle DIS. In practice, for every singular region α (associated with a given underlying Born f b ), we want to find a function U α (ξ, y) such that We also need to introduce a dimensioned variable κ α t (ξ, y) that approaches the transverse momentum of the emission in the soft-collinear limit, so that we can integrate analytically In practice, we generate a random number r, and we determine k T by solving ∆ (U ) α (k T ) = r. We then generate ξ uniformly in U α (ξ, y(ξ, k T ), while ϕ is generated uniformly between 0 and 2π. The emission is then accepted with probability .
To handle DIS we need two upper bound functions, one for FSR and one for ISR.

B.1.1 The standard POWHEG BOX implementation
For final state radiation (FSR) in general the cross section can have a logarithmic divergence in the soft (ξ → 0) or collinear (y → 1) limit, therefore it is convenient to parameterise the upper bound for the generation of radiation as follows (see App. C of Ref. [59]) where κ t can be seen as the POWHEG evolution variable and reads Notice that for FSR, s does not change between Born and real contributions. POWHEG chooses convenient values forb 0 andΛ such that The emitted parton can carry at most an energy fraction where M rec is the mass of the recoiling system, which coincides with the final-state lepton in our case. Since y ≥ −1, Eq. (B.5) implies that κ 2 t ≤ ξ 2 s. So we have (notice that we -39 -absorbed some constants in N ) S(κ 2 t ) = U (ξ, y) dξ dy dϕ = 2πN ξmax 0 dξ ξ scalup as starting scale for the showers. To achieve this task, for both the Vincia and simple Pythia time-and space-like showers we need to set pTmaxMatch = 1. Alternatively, we can start the shower at the maximum kinematical limit (pTmaxMatch = 2), calculate the transverse momentum of each emission, and veto emissions harder than the original LHE event hardness. To enable the veto, which is performed by the PowhegHooks and PowhegHooksVincia classes, which are part of Pythia8.3, we need to set POWHEG:veto = 1.
We use scalup as event hardness, and the transverse momentum of every emission is computed using Eq. (C.1). 16 This is achieved via the settings POWHEG:pThard = 0 POWHEG:pTdef = 1.
We have re-implemented the function pTpowheg contained in the PowhegHooks and the PowhegHooksVincia classes, which are part of Pythia8.3. Notice that in case of g → gg and g → qq final-state splittings, we use as ξ the energy fraction of the softer of the two partons.
Pythia8 (dipole and Vincia) showers are only formally LL accurate, so all the above options preserve their logarithmic accuracy. However, since these showers also capture many NLL effects, providing e.g. the correct NLL DGLAP evolution of initial-state partons [93], we believe that the last option for matching should be used as default. Furthermore it was shown in Ref. [94] that not accounting correctly for the veto can lead to a breakdown of exponentiation, which indicates failure at the LL level already.

D Real radiation damping
In this section we compare three options for the definition of the singular real contribution R (s) , that we have introduced in Sec. 2.2.1. In particular, we consider 1. no damping, i.e. R (s) = R, with R being the whole real cross section; 2. Bornzerodamp mechanism on; 3. Bornzerodamp and hdamp mechanisms simultaneously activated.
For the latter option, we have modified the implementation of the damping function h(k T ) of Eq. (2.10) to be h(k T ) = Q 2 α h k 2 T + Q 2 , (D.1) 16 Notice that we rely on the Pythia8.3 definition ofs, i.e. the centre-of-mass energy before the last emission, as we have not yet generalised our mappings beyond the first emission.
-46 -   where α h is a parameter that can be varied. In our study we considered α h = 1. Notice that in the POWHEG BOX RES, the hdamp mechanism is applied only for ISR. However, in our implementation, we use it also for FSR.
In this appendix we consider only the photon-exchange contribution to e − p → e − X with E p = 904.5 GeV, E e = 27.6 GeV, and we fix the underlying Born kinematics to be x B = 0.116 and Q = 57.6 GeV. Events are required to have E curr > Q/ 10. In Figs. 24 and 25 we compare NLO predictions with distributions obtained from unshowered LHE events, produced using these three definitions of the singular contribution entering the POWHEG cross section dσ PWG of Sec. 2.2.1 for τ z,Q of Eq. (3.4) and B z,E of Eq. (3.5). For small values of the event shapes, all the LHE level distributions agree with each other, and the presence of the POWHEG Sudakov form factor of Eq. (2.9) regulates the divergent behaviour, which is instead present at NLO.
We observe that if we do not introduce any damping factor, the LHE distribution overshoots the NLO one by roughly 10% in the tail. This enhancement is within the scaleuncertainty band of the NLO result. However, we notice the scale-variation band for the LHE curve is almost absent. This is due to the fact that the hardest radiation is always generated using the transverse momentum as scale entering the emission probability [37], so that the factorisation and renormalisation scale variation affects only the total weight, but not the differential distribution. While the NLO cross section is smaller than the LO one (i.e.B/B ∼ 0.965 < 1), the increase of the distribution in the tail, is due to higherorder corrections such as those coming from the treatment of the running coupling [58] in the squared bracket of Eq. (2.7), which can capture the bulk of NLL corrections arising from subsequent unresolved emissions, or due to the scale choice in the PDF. The inclusion of a damping function does instead ensure that such corrections are not applied for large values of the event shapes. In this case the central value of the LHE curve aligns with the NLO one. We also notice that in this case, scale-variation bands are larger, as the argument of the PDFs and the coupling constant appearing in the remnant cross section are varied accordingly. The inclusion of the hdamp mechanism, on top of the Bornzerodamp one, leaves the central value of the curve almost unaffected, but increases the size of the uncertainty band in the tail of the distribution. In all cases, however, the scale-uncertainty band produced by LHE-level distributions is much smaller than the NLO one.