1 Introduction

Deep inelastic scattering (DIS) of electrons (footnote 1) on protons at centre-of-mass energies of up to \(\sqrt{s} \simeq 320\,\)GeV at HERA has been central to the exploration of proton structure and quark–gluon dynamics as described by perturbative Quantum Chromodynamics (pQCD) [1]. The two collaborations, H1 and ZEUS, have explored a large phase space in Bjorken x, \(x_\mathrm{Bj}\), and negative four-momentum-transfer squared, \(Q^2\). Cross sections for neutral current (NC) interactions have been published for \(0.045 \le Q^2 \le 50{,}000 \) GeV\(^2\) and \(6 \times 10^{-7} \le x_\mathrm{Bj} \le 0.65\) at values of the inelasticity, \(y = Q^2/(sx_\mathrm{Bj})\), between 0.005 and 0.95. Cross sections for charged current (CC) interactions have been published for \(200 \le Q^2 \le 50{,}000 \) GeV\(^2\) and \(1.3 \times 10^{-2} \le x_\mathrm{Bj} \le 0.40\) at values of y between 0.037 and 0.76.

HERA was operated in two phases: HERA I, from 1992 to 2000, and HERA II, from 2002 to 2007. From 1994 onwards, and for all data used here, HERA operated with an electron beam energy of \(E_e \simeq 27.5\) GeV. For most of HERA I and II, the proton beam energy was \(E_p = 920\) GeV, resulting in the highest centre-of-mass energy of \(\sqrt{s} \simeq 320\,\)GeV. During the HERA I period, each experiment collected about 100 pb\(^{-1}\) of \(e^+p\) and 15 pb\(^{-1}\) of \(e^-p\) data. These HERA I data were the basis of a combination and pQCD analysis published previously [2]. During the HERA II period, each experiment added about 150 pb\(^{-1}\) of \(e^+p\) and 235 pb\(^{-1}\) of \(e^-p\) data. As a result, the H1 and ZEUS collaborations collected total integrated luminosities of approximately 500 pb\(^{-1}\) each, divided about equally between \(e^+p\) and \(e^-p\) scattering. The paper presented here is based on the combination of all published H1 [3–10] and ZEUS [11–24] measurements from both HERA I and II on inclusive DIS in NC and CC reactions. This includes data taken with proton beam energies of \(E_p = 920\), 820, 575 and 460 GeV corresponding to \(\sqrt{s}\simeq \) 320, 300, 251 and 225 GeV. During the HERA II period, the electron beam was longitudinally polarised. The data considered in this paper are cross sections corrected to zero beam polarisation as published by the collaborations.

The combination of the data and the pQCD analysis were performed using the packages HERAverager [25] and HERAFitter [26, 27]. The method [3, 28] also allowed a model-independent demonstration of the consistency of the data. The correlated systematic uncertainties and global normalisations were treated such that one coherent data set was obtained. Since H1 and ZEUS employed different experimental techniques, using different detectors and methods of kinematic reconstruction, the combination also led to a significantly reduced systematic uncertainty.

Within the framework of pQCD, the proton is described in terms of parton density functions, f(x), which provide the probability to find a parton, either gluon or quark, with a fraction x of the proton’s momentum. This probability is predicted to depend on the scale at which the proton is probed, called the factorisation scale, \(\mu _\mathrm{f}^2\), which for inclusive DIS is usually taken as \(Q^2\). These functions are usually presented as parton momentum distributions, xf(x), and are called parton distribution functions (PDFs). The PDFs are convoluted with the fundamental point-like scattering cross sections for partons to calculate cross sections. Perturbative QCD provides the framework to evolve the PDFs to other scales once they are provided at a starting scale. However, pQCD does not predict the PDFs at the starting scale. They must be determined by fits to data using ad hoc parameterisations.

The name HERAPDF stands for a pQCD analysis within the DGLAP [29–33] formalism. The \(x_\mathrm{Bj}\) and \(Q^2\) dependences of the NC and CC DIS cross sections from both the H1 and ZEUS collaborations are used to determine sets of quark and gluon momentum distributions in the proton. The set of PDFs denoted as HERAPDF1.0 [2] was based on the combination of all inclusive DIS cross sections obtained from HERA I data. A preliminary set of PDFs, HERAPDF1.5 [34], was obtained using HERA I and selected HERA II data, some of which were still preliminary. In this paper, a new set of PDFs, HERAPDF2.0, is presented, based on combined inclusive DIS cross sections from all of HERA I and HERA II.

Several groups, JR [35], MSTW/MMHT [36, 37], CTEQ/CT [38, 39], ABM [40–42] and NNPDF [43, 44], provide PDF sets using HERA, fixed-target and hadron-collider data. The strength of the HERAPDF approach is that a single coherent high-precision data set containing NC and CC cross sections is used as input. The new combined data used for the HERAPDF2.0 analysis span four orders of magnitude in \(Q^2\) and \(x_\mathrm{Bj}\). The availability of precision NC and CC cross sections over this large phase space allows HERAPDF to use only ep scattering data and thus makes HERAPDF independent of any heavy nuclear (or deuterium) corrections. The difference between the NC \(e^+p\) and \(e^-p\) cross sections at high \(Q^2\), together with the high-\(Q^2\) CC data, constrains the valence-quark distributions. The CC \(e^+p\) data especially constrain the valence down-quark distribution in the proton without assuming strong isospin symmetry as done in the analysis of deuterium data. The lower-\(Q^2\) NC data constrain the low-x sea-quark distributions and through their precisely measured \(Q^2\) variations they also constrain the gluon distribution. A further constraint on the gluon distribution comes from the inclusion of NC data at different beam energies such that the longitudinal structure function is probed through the y dependence of the cross sections [45].

The consistency of the input data allowed the determination of the experimental uncertainties of the HERAPDF2.0 parton distributions using rigorous statistical methods. The uncertainties resulting from model assumptions and from the choice of the parameterisation of the PDFs were considered separately.

Both H1 and ZEUS also published charm production cross sections, some of which were combined and analysed previously [46], and jet production cross sections [47–51]. These data were included to obtain the variant HERAPDF2.0Jets. The inclusion of jet cross sections allowed for a simultaneous determination of the PDFs and the strong coupling constant.

The paper is structured as follows. Section 2 gives an introduction to the connection between cross sections and the partonic structure of the proton. Section 3 introduces the data used in the analyses presented here. Section 4 describes the combination of data while Sect. 5 presents the results of the combination. Section 6 describes the pQCD analysis to extract PDFs from the combined inclusive cross sections. The PDF set HERAPDF2.0 and its variants are presented in Sect. 7. In Sect. 8, results on electroweak unification as well as scaling violations and the extraction of \(xF_3^{\gamma Z}\) are presented. The paper closes with a summary.

2 Cross sections and parton distributions

The reduced NC deep inelastic \(e^{\pm }p\) scattering cross sections are given by a linear combination of generalised structure functions. For unpolarised \(e^{\pm }p\) scattering, reduced cross sections after correction for QED radiative effects may be expressed in terms of structure functions as

$$\begin{aligned} \sigma _{r,{\mathrm{NC}}}^{\pm }=\frac{\textstyle \mathrm{d^2} \sigma ^{e^{\pm }p}_{\mathrm{NC}}}{\textstyle \mathrm{d}x_\mathrm{Bj}\mathrm{d} Q^2}\cdot \frac{Q^4 x_\mathrm{Bj}}{2\pi \alpha ^2 Y_+} = \tilde{F_2} \mp \frac{Y_-}{Y_+} x\tilde{F_3} -\frac{y^2}{Y_+} \tilde{F_\mathrm{L}}, \end{aligned}$$
(1)

where the fine-structure constant, \(\alpha \), which is defined at zero momentum transfer, the photon propagator and a helicity factor are absorbed in the definitions of \( \sigma _{r,{\mathrm{NC}}}^{\pm }\) and \(Y_{\pm }=1 \pm (1-y)^2\). The overall structure functions, \(\tilde{F}_2\), \(\tilde{F}_\mathrm{L}\) and \(x\tilde{F}_3\), are sums of structure functions, \({F}_\mathrm{X}\), \({F}_\mathrm{X}^{\gamma Z}\) and \({F}_\mathrm{X}^Z\), relating to photon exchange, photon–Z interference and Z exchange, respectively, and depend on the electroweak parameters as [52]

$$\begin{aligned}&\tilde{F}_2= F_2 - \kappa _Z v_e \cdot F_2^{\gamma Z} + \kappa _Z^2 (v_e^2 + a_e^2 ) \cdot F_2^Z, \nonumber \\&\tilde{F}_\mathrm{L}= F_\mathrm{L} - \kappa _Z v_e \cdot F_\mathrm{L}^{\gamma Z} + \kappa _Z^2 (v_e^2 + a_e^2 ) \cdot F_\mathrm{L}^Z, \\&x\tilde{F}_3= - \kappa _Z a_e \cdot xF_3^{\gamma Z} + \kappa _Z^2 \cdot 2 v_e a_e \cdot xF_3^Z,\nonumber \end{aligned}$$
(2)

where \(v_e\) and \(a_e\) are the vector and axial-vector weak couplings of the electron to the Z boson, and \(\kappa _Z(Q^2) = Q^2 /[(Q^2+M_Z^2)(4\sin ^2 \theta _W \cos ^2 \theta _W)]\). In the analysis presented here, electroweak effects were treated at leading order. The values of \(\sin ^2 \theta _W=0.23127\) and \(M_Z=91.1876\) GeV were used for the electroweak mixing angle and the Z-boson mass [52].
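The size of the Z-exchange terms in Eq. 2 is governed by \(\kappa _Z(Q^2)\). The following is a minimal sketch that evaluates it with the values of \(\sin ^2 \theta _W\) and \(M_Z\) quoted above; the function and constant names are illustrative and not taken from HERAFitter.

```python
SIN2_THETA_W = 0.23127  # electroweak mixing angle, value quoted in the text
M_Z = 91.1876           # Z-boson mass in GeV, value quoted in the text


def kappa_z(q2):
    """kappa_Z(Q^2) = Q^2 / [(Q^2 + M_Z^2) * 4 sin^2(theta_W) cos^2(theta_W)].

    q2 is Q^2 in GeV^2.
    """
    cos2_theta_w = 1.0 - SIN2_THETA_W
    return q2 / ((q2 + M_Z ** 2) * 4.0 * SIN2_THETA_W * cos2_theta_w)


if __name__ == "__main__":
    for q2 in (10.0, 1000.0, M_Z ** 2, 50000.0):
        print(f"Q^2 = {q2:10.1f} GeV^2   kappa_Z = {kappa_z(q2):.4f}")
```

The factor vanishes for \(Q^2 \rightarrow 0\) and becomes of order one only when \(Q^2\) approaches \(M_Z^2\), which is why the interference and Z-exchange terms matter only at high \(Q^2\).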

At low \(Q^2\), i.e. \(Q^2 \ll M_Z^2\), the contribution of Z exchange is negligible and

$$\begin{aligned} \sigma _{r,{\mathrm{NC}}}^{\pm }= F_2 - \frac{y^2}{Y_+} F_\mathrm{L}. \end{aligned}$$
(3)

The contribution of the term containing the longitudinal structure function \(\tilde{F_\mathrm{L}}\) is only significant for values of y larger than approximately 0.5.
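The statement about y can be made concrete by evaluating the kinematic weight \(y^2/Y_+\) that multiplies \(F_\mathrm{L}\) in Eq. 3. This is a small illustrative sketch; the function names and the input values used in the checks are hypothetical.

```python
def y_plus(y):
    """Y_+ = 1 + (1 - y)^2."""
    return 1.0 + (1.0 - y) ** 2


def fl_weight(y):
    """Kinematic weight y^2 / Y_+ of the longitudinal structure function."""
    return y * y / y_plus(y)


def sigma_r_nc_low_q2(f2, fl, y):
    """Reduced NC cross section of Eq. 3, valid for Q^2 << M_Z^2."""
    return f2 - fl_weight(y) * fl


if __name__ == "__main__":
    for y in (0.05, 0.1, 0.3, 0.5, 0.8):
        print(f"y = {y:4.2f}   y^2/Y_+ = {fl_weight(y):.4f}")
```

At y = 0.1 the weight is below one percent, while at y = 0.5 it reaches 0.2, so the \(F_\mathrm{L}\) term becomes visible only at high y.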

In the analysis presented in this paper, the full formulae of pQCD at the relevant order in the strong coupling, \(\alpha _s\), are used. However, to demonstrate the sensitivity of the data, it is useful to discuss the simplified equations of the Quark Parton Model (QPM), where gluons are not present and \(\tilde{F}_\mathrm{L}=0\) [53]. In the QPM, the kinematic variable \(x_\mathrm{Bj}\) is equal to the fractional momentum of the struck quark, x. The structure functions in Eq. 2 become

$$\begin{aligned} \begin{aligned}&\begin{aligned} (F_2, F_2^{\gamma Z}, F_2^Z)&\approx [(e_u^2, 2e_uv_u, v_u^2+a_u^2)(xU+ x\bar{U})\\&\quad + (e_d^2, 2e_dv_d, v_d^2+a_d^2)(xD+ x\bar{D})], \\ \end{aligned}\\&\begin{aligned} (xF_3^{\gamma Z}, xF_3^Z)&\approx 2 [(e_ua_u, v_ua_u) (xU-x\bar{U})\\&\quad + (e_da_d, v_da_d) (xD-x\bar{D})], \end{aligned} \end{aligned} \end{aligned}$$
(4)

where \(e_u\) and \(e_d\) denote the electric charge of up- and down-type quarks, while \(v_{u,d}\) and \(a_{u,d}\) are the vector and axial-vector weak couplings of the up- and down-type quarks to the Z boson. The terms xU, xD, \(x\bar{U}\) and \(x\bar{D}\) denote the sums of parton distributions for up-type and down-type quarks and anti-quarks, respectively. Below the b-quark mass threshold, these sums are related to the quark distributions as follows

$$\begin{aligned} \begin{aligned}&xU = xu + xc, \quad x\bar{U}= x\bar{u}+ x\bar{c}, \\&xD = xd + xs, \quad x\bar{D}= x\bar{d}+ x\bar{s}, \end{aligned} \end{aligned}$$
(5)

where xs and xc are the strange- and charm-quark distributions. Assuming symmetry between the quarks and anti-quarks in the sea, the valence-quark distributions can be expressed as

$$\begin{aligned} xu_v = xU -x\bar{U}, \quad xd_v = xD -x\bar{D}. \end{aligned}$$
(6)

It follows from Eq. 1 that the structure function \(x\tilde{F_3}\) can be determined from the difference between the \(e^+p\) and \(e^-p\) reduced cross sections:

$$\begin{aligned} x\tilde{F_3} = \frac{Y_{+}}{2Y_{-}}( \sigma ^{-}_{r,\mathrm{NC}} - \sigma ^{+}_{r,\mathrm{NC}}). \end{aligned}$$
(7)

Equations 2, 4 and 6 demonstrate that in the QPM, \(x{\tilde{F_3}}\) is directly related to the valence-quark distributions. In the HERA kinematic range, its dominant contribution is from the photon–Z exchange interference and the simple relation

$$\begin{aligned} xF_3^{\gamma Z} \approx \frac{x}{3}(2u_{v} + d_{v}) \end{aligned}$$
(8)

emerges. The measurement of \(xF_3^{\gamma Z}\) therefore provides access to the lower-x behaviour of the valence-quark distribution, under the assumption that sea-quark and anti-quark distributions are the same.
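The following sketch checks Eq. 7 numerically by building \(\sigma ^{\pm }_{r,\mathrm{NC}}\) from Eq. 1 and recovering \(x\tilde{F_3}\) from their difference. It also reproduces the coefficients of Eq. 8 from Eq. 4, assuming the standard quark charges and axial couplings \(e_u = 2/3\), \(a_u = 1/2\), \(e_d = -1/3\), \(a_d = -1/2\), which are not quoted in the text. All function names are illustrative.

```python
from fractions import Fraction

# Standard-model quark charges and axial couplings (assumed, not quoted in the text).
E_U, A_U = Fraction(2, 3), Fraction(1, 2)
E_D, A_D = Fraction(-1, 3), Fraction(-1, 2)


def y_pm(y):
    """Return (Y_+, Y_-) with Y_pm = 1 +- (1 - y)^2."""
    return 1.0 + (1.0 - y) ** 2, 1.0 - (1.0 - y) ** 2


def sigma_r_nc(f2, xf3, fl, y, lepton_charge):
    """Eq. 1; lepton_charge is +1 for e+p and -1 for e-p."""
    yp, ym = y_pm(y)
    return f2 - lepton_charge * (ym / yp) * xf3 - (y * y / yp) * fl


def xf3_from_difference(sigma_minus, sigma_plus, y):
    """Eq. 7: x F3 from the e-p and e+p reduced cross sections (requires y > 0)."""
    yp, ym = y_pm(y)
    return yp / (2.0 * ym) * (sigma_minus - sigma_plus)


def xf3_gamma_z_coefficients():
    """Coefficients of x u_v and x d_v in x F3^{gamma Z}, from Eq. 4 with Eq. 6."""
    return 2 * E_U * A_U, 2 * E_D * A_D
```

With these couplings the coefficients are 2/3 and 1/3, which is the combination \(\frac{x}{3}(2u_v + d_v)\) of Eq. 8.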

The reduced cross sections for inclusive unpolarised CC \(e^{\pm } p\) scattering are defined as

$$\begin{aligned} \sigma _{r,\mathrm{CC}}^{\pm }= \frac{2 \pi x_\mathrm{Bj}}{G_F^2} \left[ \frac{M_W^2+Q^2}{M_W^2} \right] ^2 \frac{\textstyle \mathrm{d^2} \sigma ^{e^{\pm }p}_{\mathrm{CC}}}{\textstyle \mathrm{d}x_\mathrm{Bj}\mathrm{d} Q^2}. \end{aligned}$$
(9)

In HERAFitter, the values of \(G_F=1.16638\times 10^{-5} \) GeV\(^{-2}\) and \(M_W=80.385\) GeV [52] were used for the Fermi constant and W-boson mass. In analogy to Eq. 1, CC structure functions are defined such that

$$\begin{aligned} \sigma _{r,\mathrm{CC}}^{\pm }= \frac{Y_+}{2}W_2^\pm \mp \frac{Y_-}{2} xW_3^\pm - \frac{y^2}{2} W_\mathrm{L}^\pm . \end{aligned}$$
(10)

In the QPM, \(W_\mathrm{L}^\pm = 0\) and \(W_2^\pm \), \(xW_3^\pm \) represent sums and differences of quark and anti-quark distributions, depending on the charge of the lepton beam:

$$\begin{aligned} \begin{aligned}&W_2^{+} \approx x\bar{U}+xD,\quad xW_3^{+} \approx xD-x\bar{U},\\&W_2^{-} \approx xU+x\bar{D},\quad xW_3^{-} \approx xU-x\bar{D}. \end{aligned} \end{aligned}$$
(11)

From these equations, it follows that

$$\begin{aligned}&\sigma _{r,\mathrm{CC}}^{+}\approx (x\bar{U}+ (1-y)^2xD),\nonumber \\&\quad \sigma _{r,\mathrm{CC}}^{-}\approx (xU +(1-y)^2 x\bar{D}). \end{aligned}$$
(12)

The combination of NC and CC measurements makes it possible to determine both the combined sea-quark distributions, \(x\bar{U}\) and \(x\bar{D}\), and the valence-quark distributions, \(xu_v\) and \(xd_v\).
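As a consistency check, Eq. 12 can be reproduced numerically by inserting the QPM expressions of Eq. 11 into Eq. 10. The sketch below does this for arbitrary input values; the function names and the numbers used in the checks are hypothetical and carry no physical meaning.

```python
def sigma_r_cc(w2, xw3, wl, y, lepton_charge):
    """Eq. 10; lepton_charge is +1 for e+p and -1 for e-p."""
    yp = 1.0 + (1.0 - y) ** 2
    ym = 1.0 - (1.0 - y) ** 2
    return 0.5 * yp * w2 - lepton_charge * 0.5 * ym * xw3 - 0.5 * y * y * wl


def sigma_r_cc_qpm(x_u, x_ubar, x_d, x_dbar, y, lepton_charge):
    """Eq. 10 with the QPM structure functions of Eq. 11 and W_L = 0."""
    if lepton_charge > 0:
        w2, xw3 = x_ubar + x_d, x_d - x_ubar
    else:
        w2, xw3 = x_u + x_dbar, x_u - x_dbar
    return sigma_r_cc(w2, xw3, 0.0, y, lepton_charge)
```

The result agrees with \(x\bar{U} + (1-y)^2 xD\) for \(e^+p\) and with \(xU + (1-y)^2 x\bar{D}\) for \(e^-p\), showing that the \(e^+p\) data are sensitive to the down-type quarks with a helicity suppression at high y.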

The relations within the QPM illustrate in a simple way which data contribute which information. However, the parton distributions are determined by a fit to the \(x_\mathrm{Bj}\) and \(Q^2\) dependence of the new combined data using the linear DGLAP equations [29–33] at leading order (LO), next-to-leading order (NLO) and next-to-next-to-leading order (NNLO) in pQCD. These are convoluted with coefficient functions (matrix elements) at the appropriate order [54, 55]. Already at LO, the gluon PDF enters the equations giving rise to logarithmic scaling violations which make the parton distributions depend on the scale of the process. This factorisation scale, \(\mu _\mathrm{f}^2\), is taken as \(Q^2\) and the experimentally measured scaling violations determine the gluon distribution (footnote 2).

3 Measurements

3.1 Detectors

The H1 [56–58] and ZEUS [59–62] detectors were both multi-purpose detectors with an almost \(4\pi \) hermetic coverage (footnote 3). They were built following similar physics considerations but the collaborations opted for different technical solutions resulting in slightly different capabilities [63]. The discussion here focuses on general ideas; details of the construction and performance are not discussed.

In both detectors, the calorimeters had an inner part to measure electromagnetic energy and identify electrons and an outer, less-segmented, part to measure hadronic energy and determine missing energy. Both main calorimeters were divided into barrel and forward sections. The H1 collaboration chose a liquid-argon calorimeter while the ZEUS collaboration opted for a uranium–scintillator device. These choices are somewhat complementary. The liquid-argon technology allowed a finer segmentation and thus the identification of electrons down to lower energies. The uranium–scintillator calorimeter was intrinsically “compensating” making jet studies easier. In the backward region, ZEUS also opted for a uranium–scintillator device. The H1 collaboration chose a lead–scintillating fibre or so-called “spaghetti” calorimeter. The backward region is particularly important to identify electrons in events with \(Q^2 < 100\) GeV\(^2\).

Both detectors were operated with a solenoidal magnetic field. The field strength was 1.16 and 1.43 T within the tracking volumes of the H1 and ZEUS detectors, respectively. The main tracking devices were in both cases cylindrical drift chambers. The H1 device consisted of two concentric drift chambers while ZEUS featured one large chamber. Both tracking systems were augmented with special devices in the forward and backward region. Over time, both collaborations upgraded their tracking systems by installing silicon microvertex detectors to enhance the capability to identify events with heavy-quark production. In the backward direction, the vertex detectors were also important to identify the electrons in low-\(Q^2\) events.

During the HERA I running period, special devices to measure very backward electrons were operated and events with very low \(Q^2\) were reconstructed. This became impossible after the luminosity upgrade for HERA II due to the placement of final-focus magnets further inside the detectors. This also required some significant changes in both main detectors. Detector elements had to be retracted, and as a result the acceptance for low-\(Q^2\) events in the main detectors was reduced.

Both experiments measured the luminosity using the Bethe–Heitler reaction \( ep \rightarrow e\gamma p\). In HERA I, H1 and ZEUS both had photon taggers positioned about 100 m down the electron beam line. For the higher luminosity of the HERA II period, both H1 [8, 64, 65] and ZEUS [66–68] had to upgrade their luminosity detectors and analysis methods. The uncertainties on the integrated luminosities were typically about 2 %.

3.2 Reconstruction of kinematics

The usage of different reconstruction techniques, due to differences in the strengths of the detector components of the two experiments, contributes to the reduction of systematic uncertainties when combining data sets. The choice of the most appropriate kinematic reconstruction method for a given phase-space region and experiment is based on resolution, possible biases of the measurements and effects due to initial- or final-state radiation. The different methods are described in the following.

The deep inelastic ep scattering cross sections of the inclusive neutral and charged current reactions depend on the centre-of-mass energy, \(\sqrt{s}\), and on the two kinematic variables \(Q^2\) and \(x_\mathrm{Bj}\). The variable \(x_\mathrm{Bj}\) is related to y, \(Q^2\) and s through the relationship \(x_\mathrm{Bj}=Q^2/(sy)\). The HERA collider experiments were able to determine the NC event kinematics from the scattered electron, e, or from the hadronic final state, h, or from a combination of the two.

The “electron method” was applied to NC scattering events for which the quantities y and \(Q^2\) were calculated using only the variables measured for the scattered electron:

$$\begin{aligned} y_e = 1-\frac{\Sigma _e}{2 E_e}, \quad Q^2_e = \frac{P_{T,e}^2}{1 - y_e}, \quad x_e = \frac{Q^2_e}{s y_e}, \end{aligned}$$
(13)

where \(\Sigma _e = E'_e(1-\cos \theta _e)\), \(E'_e\) is the energy of the scattered electron, \(\theta _e\) is its angle with respect to the proton beam, and \(P_{T,e}\) is its transverse momentum.
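A minimal sketch of the electron method of Eq. 13 follows. The function name, argument names and the example event are hypothetical; angles are in radians and energies in GeV.

```python
import math


def electron_method(e_beam, e_scattered, theta_e, s):
    """Reconstruct (y, Q^2, x) from the scattered electron only (Eq. 13).

    theta_e is the polar angle with respect to the proton beam direction.
    The formulae are singular for y -> 0 and y -> 1.
    """
    sigma_e = e_scattered * (1.0 - math.cos(theta_e))
    pt_e = e_scattered * math.sin(theta_e)
    y = 1.0 - sigma_e / (2.0 * e_beam)
    q2 = pt_e ** 2 / (1.0 - y)
    x = q2 / (s * y)
    return y, q2, x


if __name__ == "__main__":
    # Hypothetical event: 20 GeV electron scattered at 150 degrees, sqrt(s) = 318 GeV.
    y, q2, x = electron_method(27.5, 20.0, math.radians(150.0), 318.0 ** 2)
    print(f"y = {y:.4f}, Q^2 = {q2:.1f} GeV^2, x = {x:.5f}")
```

Inserting the definitions shows that \(Q^2_e\) is equivalent to \(2E_eE'_e(1+\cos \theta _e)\), which is used as a cross-check in the test of this sketch.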

The “hadron method” was applied to CC scattering events. The reconstruction of the hadronic final state h allowed the usage of similar relations [69]:

$$\begin{aligned} y_h = \frac{\Sigma _h}{2 E_e},\quad Q^2_h = \frac{P_{T,h}^2}{1 - y_h},\quad x_h = \frac{Q^2_h}{s y_h}, \end{aligned}$$
(14)

where \(\Sigma _h = (E-P_\mathrm{Z})_h=\sum _i{(E_i-p_{\mathrm{Z},i})}\) is the hadronic \(E-P_\mathrm{Z}\) variable with the sum extending over the energies, \(E_i\), and the longitudinal components of the momentum, \(p_{\mathrm{Z},i}\) of the reconstructed hadronic final-state particles, i. The quantity \(P_{T,h} = \left| \sum _i \varvec{p}_{T,i} \right| \) is the total transverse momentum of the hadronic final state with \(\varvec{p}_{T,i}\) being the transverse-momentum vector of the particle i. A hadronic scattering angle, \(\theta _h\), was defined as

$$\begin{aligned} \tan \frac{\theta _h}{2} = \frac{\Sigma _h}{P_{T,h}}. \end{aligned}$$
(15)

In the framework of the QPM, \(\theta _h\) corresponds to the direction of the struck quark.
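The hadron method of Eqs. 14 and 15 can be sketched as follows, assuming the hadronic final state is available as a list of four-vectors \((E, p_x, p_y, p_z)\) in GeV. This input format and all names are illustrative assumptions, not the experiments' data model.

```python
import math


def hadron_method(particles, e_beam, s):
    """Reconstruct (y, Q^2, x, theta_h) from the hadronic final state.

    particles: list of (E, px, py, pz) tuples, with z along the proton beam.
    """
    sigma_h = sum(e - pz for e, _px, _py, pz in particles)
    px_sum = sum(px for _e, px, _py, _pz in particles)
    py_sum = sum(py for _e, _px, py, _pz in particles)
    pt_h = math.hypot(px_sum, py_sum)      # magnitude of the vector sum
    y = sigma_h / (2.0 * e_beam)           # Eq. 14
    q2 = pt_h ** 2 / (1.0 - y)
    x = q2 / (s * y)
    theta_h = 2.0 * math.atan2(sigma_h, pt_h)  # Eq. 15
    return y, q2, x, theta_h
```

Particles lost in the forward beam pipe have \(E_i \approx p_{\mathrm{Z},i}\) and therefore contribute little to \(\Sigma _h\), which makes this variable robust against such losses.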

In the “sigma method” [70], the total \(E-P_\mathrm{Z}\) variable,

$$\begin{aligned} E-P_\mathrm{Z} = E'_e (1-\cos {\theta _e}) + \sum _i (E_i - p_{\mathrm{Z},i}) = \Sigma _e + \Sigma _h , \end{aligned}$$
(16)

was introduced. For events without initial- or final-state radiation, the relation \(E-P_\mathrm{Z} = 2E_e\) holds. Thus, Eqs. 13 and 14 become

$$\begin{aligned} y_{\Sigma } = \frac{\Sigma _h}{E-P_\mathrm{Z}}, \quad Q^2_{\Sigma }=\frac{P^2_{T,e}}{1-y_{\Sigma }},\quad x_{\Sigma } = \frac{Q^2_{\Sigma }}{s y_{\Sigma }}. \end{aligned}$$
(17)

An extension of the sigma method [3, 4] introduced the variables

$$\begin{aligned}&y_{\Sigma '} = y_{\Sigma }, \quad Q^2_{\Sigma '}=Q^2_{\Sigma },\nonumber \\&\quad x_{\Sigma '} = \frac{Q^2_{\Sigma }}{2 E_p (E-P_\mathrm{Z}) y_{\Sigma }} = \frac{Q_{\Sigma }^2}{2 E_p \Sigma _h}. \end{aligned}$$
(18)

This method allowed radiation at the lepton vertex to be taken into account by replacing the electron beam energy in the calculation of \(x_{\Sigma '}\) in a way similar to its replacement in the calculation of \(y_{\Sigma }\).

In the hybrid “e-sigma method” [5, 12, 70], \(Q^2_e\) and \(x_\Sigma \) are used to reconstruct the event kinematics as

$$\begin{aligned} y_{e\Sigma } = \frac{Q^2_e}{s x_{\Sigma }} = \frac{2E_e}{E-P_\mathrm{Z}}\,y_{\Sigma },\quad Q^2_{e\Sigma } = Q^2_e,\quad x_{e\Sigma } = x_{\Sigma }. \end{aligned}$$
(19)
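The sigma-type methods of Eqs. 17 to 19 can be compared in a small sketch. It assumes \(s = 4E_eE_p\) (particle masses neglected) and uses an idealised toy event with perfect detector response; both are simplifying assumptions, and all names are hypothetical.

```python
import math


def sigma_family(sigma_e, pt_e, sigma_h, e_beam, e_proton):
    """Sigma, extended-sigma and e-sigma variables (Eqs. 17, 18 and 19)."""
    s = 4.0 * e_beam * e_proton                  # massless approximation (assumption)
    e_minus_pz = sigma_e + sigma_h               # Eq. 16
    y_sigma = sigma_h / e_minus_pz               # Eq. 17
    q2_sigma = pt_e ** 2 / (1.0 - y_sigma)
    x_sigma = q2_sigma / (s * y_sigma)
    x_sigma_prime = q2_sigma / (2.0 * e_proton * sigma_h)   # Eq. 18
    q2_e = pt_e ** 2 / (sigma_e / (2.0 * e_beam))            # Eq. 13, 1 - y_e = Sigma_e / (2 E_e)
    y_e_sigma = q2_e / (s * x_sigma)                         # Eq. 19
    return {"y_sigma": y_sigma, "q2_sigma": q2_sigma, "x_sigma": x_sigma,
            "x_sigma_prime": x_sigma_prime, "y_e_sigma": y_e_sigma}


def toy_event(y, q2, e_lepton):
    """Return (Sigma_e, P_T,e, Sigma_h) for an ideal event with lepton energy e_lepton."""
    return 2.0 * e_lepton * (1.0 - y), math.sqrt(q2 * (1.0 - y)), 2.0 * e_lepton * y
```

In this toy picture, an electron that radiates a 5 GeV collinear photon before the interaction enters with an effective energy of 22.5 GeV. Feeding such an event to `sigma_family` with the nominal 27.5 GeV beam energy still returns the true y and \(Q^2\), while only \(x_{\Sigma '}\) reproduces the true x.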

The “double-angle method” [71, 72] is used to reconstruct \(Q^2\) and \(x_\mathrm{Bj}\) from the electron and hadronic scattering angles as

$$\begin{aligned}&y_{DA} = \frac{\tan {(\theta _h/2)}}{\tan {(\theta _e/2)} + \tan {(\theta _h/2)}}, \nonumber \\&Q^2_{DA}= 4 E_e^{~2} \times \frac{\cot {(\theta _e/2)}}{\tan {(\theta _e/2)} + \tan {(\theta _h/2)}},\\&x_{DA} = \frac{Q^2_{DA}}{s y_{DA}}.\nonumber \end{aligned}$$
(20)

This method is largely insensitive to hadronisation effects. To first order, it is also independent of the detector energy scales. However, the hadronic angle is experimentally not as well determined as the electron angle due to particle loss in the beampipe.
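A minimal sketch of the double-angle reconstruction of Eq. 20 is given below. The function name and the example values are illustrative; angles are in radians.

```python
import math


def double_angle(theta_e, theta_h, e_beam, s):
    """Reconstruct (y, Q^2, x) from the electron and hadronic angles (Eq. 20)."""
    tan_e = math.tan(theta_e / 2.0)
    tan_h = math.tan(theta_h / 2.0)
    y = tan_h / (tan_e + tan_h)
    q2 = 4.0 * e_beam ** 2 * (1.0 / tan_e) / (tan_e + tan_h)
    x = q2 / (s * y)
    return y, q2, x
```

Only the beam energy and two angles enter, so a common miscalibration of the calorimeter energy scale cancels to first order, as stated above.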

In the “PT method” of reconstruction [73], the well-measured electron variables are used to obtain a good event-by-event estimate of the loss of hadronic energy by employing \(\delta _{PT}=P_{T,h}/P_{T,e}\). This improves both the resolution and uncertainties on the reconstructed y and \(Q^2\). The PT method uses all measured variables to optimise the resolution over the entire kinematic range measured. A variable \(\theta _{PT}\) is introduced as

$$\begin{aligned}&\tan {\frac{\theta _{PT}}{2}} = \frac{\Sigma _{PT}}{P_{T,e}},\mathrm{~~~where}\nonumber \\&\quad \Sigma _{PT} = 2E_e\frac{{C(\theta _h,P_{T,h},\delta _{PT})}\cdot \Sigma _h}{\Sigma _e+{C(\theta _h,P_{T,h},\delta _{PT})}\cdot \Sigma _h}. \end{aligned}$$
(21)

The variable \(\theta _{PT}\) is then substituted for \(\theta _h\) in the formulae for the double-angle method to determine \(x_\mathrm{Bj}\), y and \(Q^2\). The detector-specific function, C, is calculated using Monte Carlo simulations as \(\Sigma _{\mathrm{true},h}/\Sigma _{h}\), depending on \(\theta _h\), \(P_{T,h}\) and \(\delta _{PT}\).

3.3 Inclusive data samples

A summary of the 41 data sets used in the combination is presented in Table 1. From 1994 onwards, HERA was operated with an electron beam energy of \(E_e \simeq 27.5\) GeV. In the first years, until 1997, the proton beam energy, \(E_p\), was set to 820 GeV. In 1998, it was increased to 920 GeV. In 2007, it was lowered to 575 GeV and 460 GeV. The values for the centre-of-mass energies given in Table 1 are those for which the cross sections are quoted in the individual publications. The two collaborations did not always choose the same reference values for \(\sqrt{s}\) for the same \(E_p\). The methods of reconstruction used by H1 and ZEUS for the individual data sets are also given in the table. The integrated luminosities for a given period as provided by the collaborations can be different. One reason is the fact that H1 quotes luminosities for the data within the Z-vertex acceptance and ZEUS luminosities are given without any acceptance cut.

Table 1 The 41 data sets from H1 and ZEUS used for the combination. The marker [2] in the column “Data Set” indicates that the data are treated as two data sets in the analysis. The markers \(^{1.5p}\) and \(^{1.5}\) in the column “Data Set” indicate that the data were already used for HERAPDF1.5, see Appendix A. The p in \(^{1.5p}\) denotes that the cross-section measurements were preliminary at that time. The markers \(^{*y.5}\) and \(^{*y}\) in the column “Data Set” are explained in Sect. 4.1. The marker \(^1\) for [8] indicates that the published cross sections were scaled by a factor of 1.018 [65]. Integrated luminosities are quoted as given by the collaborations. The equations used for the reconstruction of \(x_\mathrm{Bj}\) and \(Q^2\) are given in Sect. 3.2.

The very low-\(Q^2\) region is covered by data from both experiments taken during the HERA I period. The lowest, \(Q^2 \ge 0.045\) GeV\(^2\), data come from measurements with the ZEUS detector using special tagging devices. They are named ZEUS BPT in Table 1. During the course of this analysis, it was discovered that in the HERA I analysis [2], values given for \(F_2\) were erroneously treated as reduced cross sections. This was corrected for the analysis presented in this paper. All other individual data sets from HERA I were used in the new combination exactly as in the previously published combination [2].

The \(Q^2\) range from 0.2 to 1.5 GeV\(^2\) was covered using special HERA I runs, in which the interaction vertex position was shifted forward, bringing backward scattered electrons with small scattering angles into the acceptance of the detectors [3, 13, 74]. The lowest-\(Q^2\) values for these shifted-vertex data were reached using events in which the electron energy was reduced by initial-state radiation [3].

The \(Q^2 \ge 1.5\) GeV\(^2\) range was covered by HERA I and HERA II data in various configurations. The high-statistics HERA II data sets increase the accuracy at high \(Q^2\), particularly for \(e^-p\) scattering, for which the integrated luminosity for HERA I was very limited.

The 2007 running periods with lowered proton energies [9, 10, 24] were included in the combination and provide data with reduced \(\sqrt{s}\) and \(Q^2\) up to 800 GeV\(^2\). These data were originally taken to measure \(F_\mathrm{L}\).

3.4 Data on charm, beauty and jet production

The QCD analyses presented in Sect. 6 also used selected results on heavy-quark and jet production.

The charm production cross sections were taken from a publication [46] in which data from nine data sets published by H1 and ZEUS, covering both the HERA I and II periods, were combined. The beauty production cross sections were taken from two publications, one from ZEUS [75] and one from H1 [76]. The heavy-quark events form small subsets of the inclusive data. Correlations between the charm and the inclusive data are small and were not taken into account.

The data on jet production cross sections were taken from selected publications: ZEUS inclusive-jet production data from HERA I [47], ZEUS dijet production data from HERA II [48], H1 inclusive-jet production data at low \(Q^2\) [49] and high \(Q^2\) from HERA I [50] and HERA II [51]. The HERA II H1 publication provides inclusive-jet, dijet and trijet cross sections normalised to the inclusive NC DIS cross sections in the respective \(Q^2\) range. This largely reduces the correlations with the H1 inclusive DIS reduced cross sections. The HERA I H1 high-\(Q^2\) jet data are similarly normalised. The other ZEUS and H1 jet data sets are small subsamples of the respective inclusive sample; correlations are small and are thus ignored.

For the heavy-quark and jet data sets used, the statistical, uncorrelated systematic and correlated systematic uncertainties were used as published.

4 Combination of the inclusive cross sections

In order to combine the published cross sections from the 41 data sets listed in Table 1, they were translated onto common grids and averaged.

4.1 Common \({\sqrt{s}}\) values, common \({(x_\mathrm{Bj},Q^2)}\) grids and translation of data

The data were taken with several \(E_p\) values and the double-differential cross sections were published by the two experiments for different reference \(\sqrt{s}\) and \((x_\mathrm{Bj},Q^2)\) grids. In order to average a set of data points, the points had to be translated to common \(\sqrt{s}_{\mathrm{com}}\) values and common \((x_\mathrm{Bj,grid},Q^2_\mathrm{grid})\) grids. The following choices were made.

Three common centre-of-mass values, \(\sqrt{s}_{\mathrm{com},i}\), were chosen to combine data onto two common grids:

$$\begin{aligned}&E_p=920 \,\mathrm{GeV} \rightarrow \sqrt{s}_{\mathrm{com},1}=318 \,\mathrm{GeV} \rightarrow \mathrm{grid}~1,\\&E_p=820 \,\mathrm{GeV} \rightarrow \sqrt{s}_{\mathrm{com},1}=318 \,\mathrm{GeV} \rightarrow \mathrm{grid}~1,\\&E_p=575 \,\mathrm{GeV} \rightarrow \sqrt{s}_{\mathrm{com},2}=251 \,\mathrm{GeV} \rightarrow \mathrm{grid}~2,\\&E_p=460 \,\mathrm{GeV} \rightarrow \sqrt{s}_{\mathrm{com},3}=225 \,\mathrm{GeV} \rightarrow \mathrm{grid}~2. \end{aligned}$$

Exceptions were made for data with \(E_p=820\) GeV and \(y \ge 0.35\). These cross sections were not translated to \(\sqrt{s}_{\mathrm{com},1}\), but were kept separately in grid 1 in order to retain their y dependence.
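The nominal centre-of-mass energies follow from the beam energies. The sketch below assumes head-on collisions and neglects the particle masses, so that \(s = 4E_eE_p\); the function name is illustrative.

```python
import math

E_ELECTRON = 27.5  # electron beam energy in GeV, as quoted in the text


def sqrt_s(e_proton, e_electron=E_ELECTRON):
    """Centre-of-mass energy in GeV for head-on massless beams: sqrt(4 E_e E_p)."""
    return math.sqrt(4.0 * e_electron * e_proton)


if __name__ == "__main__":
    for e_p in (920.0, 820.0, 575.0, 460.0):
        print(f"E_p = {e_p:5.0f} GeV  ->  sqrt(s) = {sqrt_s(e_p):6.1f} GeV")
```

The values obtained are close to 318, 300, 251 and 225 GeV. The data taken at \(E_p = 820\) GeV therefore have a native \(\sqrt{s}\) of about 300 GeV and require a translation to reach \(\sqrt{s}_{\mathrm{com},1}=318\) GeV.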

The two grids have a different structure in y such that the corrections due to translation were minimised. The grids are depicted in Fig. 1. For a given data point with \(\sqrt{s}_{\mathrm{com},1}\), the grid point was in general chosen to be closest in \(Q^2\) and then in \(x_\mathrm{Bj}\). However, for some data points, the grid point closest in y was chosen. This occurs for data sets marked with \(^{*y}\) or \(^{*y.5}\) in Table 1. The markers indicate that it happens for all y or \(y>0.5\), respectively. For a given data point at \(\sqrt{s}_{\mathrm{com},2}\) or \(\sqrt{s}_{\mathrm{com},3}\), the grid point closest in \(Q^2\) and then closest in y was always chosen.

Fig. 1

The points of the two grids used for the combination. Grid 1 (open circles) was used for data with \(\sqrt{s}_{\mathrm{com},1}=318\) GeV. Grid 2 (dots) was used for data with \(\sqrt{s}_{\mathrm{com},2}=251\) GeV or \(\sqrt{s}_{\mathrm{com},3}=225\) GeV. The latter grid has a finer binning in \(x_\mathrm{Bj}\) in accordance with its special structure in y

In most of the phase space, separate measurements from the same data set were not translated to the same grid point. Only 9 out of 1307 grid points accumulated two and in one case three points from the same data set. Up to 10 data sets were available for a given process. The vast majority of grid points accumulated data from both H1 and ZEUS measurements; the typical case is six measurements from six different data sets. However, 22 % of all grid points have only one measurement, predominantly at low \(Q^2\). For \(Q^2\) above 3.5 GeV\(^2\), only 13 % of the grid points have only one measurement.

For the translation of the cross-section values, predictions for the ratios of the double-differential cross section at the \((x_\mathrm{Bj},Q^2)\) and \(\sqrt{s}\) where the measurements took place, and the \((x_\mathrm{Bj,grid},Q^2_\mathrm{grid})\) to which they were translated, were needed. These predictions, \(T_\mathrm{grid}\), were obtained from the data themselves by performing fits to the data using the HERAFitter [26, 27] tool. For \(Q^2 \ge 3\) GeV\(^2\), a next-to-leading-order QCD fit using the DGLAP formalism was performed (footnote 4). In addition, a fit using the fractal model (footnote 5) [3, 77] was performed for \(Q^2 \le 4.9\) GeV\(^2\). For \(Q^2 < 3\) GeV\(^2\), the fit to the fractal model was used (footnote 6) to obtain factors \(T_\mathrm{grid,FM}\). For \(Q^2 > 4.9\) GeV\(^2\), the QCD fit was used to provide \(T_\mathrm{grid,QCD}\). For \(3 \le Q^2 \le 4.9\) GeV\(^2\), the factors were averaged as \(T_\mathrm{grid} = T_\mathrm{grid,FM} (1-(Q^2-3)/1.9) + T_\mathrm{grid,QCD} (Q^2-3)/1.9 \) where \(Q^2\) is in GeV\(^2\). The upper edge of the application of the fractal fit was varied between 3 GeV\(^2\) and 5 GeV\(^2\); the effect was negligible.
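The piecewise definition of \(T_\mathrm{grid}\) amounts to a linear cross-fade between the two fits. A minimal sketch follows, assuming the two factors have already been obtained from the fractal-model and QCD fits; the function name and the numerical inputs used in the checks are hypothetical.

```python
def blend_translation_factor(q2, t_fm, t_qcd, q2_low=3.0, q2_high=4.9):
    """Combine the fractal-model and QCD translation factors.

    q2 is in GeV^2. Below q2_low only the fractal-model factor is used,
    above q2_high only the QCD factor, and in between the two are
    averaged with weights that vary linearly in Q^2.
    """
    if q2 < q2_low:
        return t_fm
    if q2 > q2_high:
        return t_qcd
    w_qcd = (q2 - q2_low) / (q2_high - q2_low)
    return t_fm * (1.0 - w_qcd) + t_qcd * w_qcd
```

The weights are continuous at both edges, so the translation factor has no step at 3 GeV\(^2\) or 4.9 GeV\(^2\).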

4.2 Averaging cross sections

The original double-differential cross-section measurements were published with their statistical and systematic uncertainties. The systematic uncertainties were classified as either point-to-point correlated or point-to-point uncorrelated. For each data set, all uncorrelated systematic uncertainties were added in quadrature before averaging. Correlated systematic uncertainties were kept separately. Some of the systematic uncertainties were originally reported as asymmetric. They were symmetrised by the collaborations before entering the averaging procedure.

The averaging of the data points was performed using the HERAverager [25] tool which is based on a \(\chi ^2\) minimisation method [3]. This method imposes that there is one and only one correct value for the cross section of each process at each point of the phase space. These values are estimated by optimising a vector, \(\varvec{m}\), which is the result of the averaging for the cross sections. The \(\chi ^2\) definition used takes into account the correlated and uncorrelated systematic uncertainties of the H1 and ZEUS cross-section measurements and allows for shifts of the data to accommodate the correlated uncertainties. For a single data set, ds, the \(\chi ^2\) is defined as

$$\begin{aligned}&\chi ^2_{\mathrm{exp},ds}\left( \varvec{m},\varvec{b}\right) \nonumber \\&\quad = \sum _{i}^{ds} + \sum _{j}^{b}\nonumber \\&\quad = \sum _i \frac{\left[ m^i - \sum _j \gamma ^{i,ds}_j m^i b_j - \mu ^{i,ds} \right] ^2}{ \textstyle \delta ^2_{i,ds,\mathrm{stat}}\,{\mu ^{i,ds}} \left( m^i - \sum _j \gamma ^{i,ds}_j m^i b_j\right) + (\delta _{i,ds,\mathrm{uncor}}\, m^i)^2}\nonumber \\&\qquad + \sum _j b^2_j, \end{aligned}$$
(22)

where \({\mu ^{i,ds}}\) is the measured value at the point i and \(\gamma ^{i,ds}_j \), \(\delta _{i,ds,\mathrm{stat}} \) and \(\delta _{i,ds,\mathrm{uncor}}\) are the relative correlated systematic, relative statistical and relative uncorrelated systematic uncertainties, respectively. For the reduced cross-section measurements, \({\mu ^{i,ds}} = \sigma _r^{i,ds}\), i runs over all points on the \((x_\mathrm{Bj,\mathrm{grid}},Q^2_\mathrm{grid})\) plane for which a measurement exists in ds. The components \(b_j\) of the vector \(\varvec{b}\) represent correlated shifts of the cross sections in units of sigma of the respective correlated systematic uncertainties; the summations over j extend over all correlated systematic uncertainties.

The leading systematic uncertainties on the cross-section measurements used for the combination arose from the uncertainties on the acceptance corrections and luminosity determinations. Thus, both the correlated and uncorrelated systematic uncertainties are multiplicative in nature, i.e. they increase proportionally to the central values. In Eq. 22, the multiplicative nature of these uncertainties is taken into account by multiplying the relative errors \(\gamma ^{i,ds}_j\) and \(\delta _{i,ds,\mathrm{uncor}}\) by the estimate \(m^i\). The denominator in the first right-hand-side term in Eq. 22 contains an estimate of the squared statistical uncertainty of the cross-section measurement, \(\delta _{i,ds,stat}^2 \mu ^{i,ds} (m^{i} - \sum _j \gamma ^{i,ds}_j m^i b_j)\), which is assumedFootnote 7 to scale with the expected number of events in bin i, as calculated from \(m^{i}\). Corrections due to the shifts to accommodate the correlated systematic uncertainties are introduced through the term \(\sum _{j} \gamma ^{i,ds}_j m^i b_j\).
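To make the structure of Eq. 22 concrete, the following sketch evaluates it for a single data set. It is an illustration under stated assumptions and not the HERAverager implementation: the function name and the plain-list data layout are hypothetical, and all uncertainties are passed as relative values, as in the text.

```python
def chi2_single_dataset(m, mu, delta_stat, delta_uncor, gamma, b):
    """Evaluate Eq. 22 for one data set.

    m, mu, delta_stat, delta_uncor : lists indexed by point i
    gamma : gamma[i][j], relative correlated uncertainty of source j at point i
    b     : list of correlated shifts b_j (in units of sigma)
    """
    chi2 = 0.0
    for i in range(len(mu)):
        # Expectation after the correlated (multiplicative) shifts
        shifted = m[i] - sum(gamma[i][j] * m[i] * b[j] for j in range(len(b)))
        # Statistical variance scales with the expected number of events;
        # the uncorrelated systematic part scales with m^i
        variance = delta_stat[i] ** 2 * mu[i] * shifted + (delta_uncor[i] * m[i]) ** 2
        chi2 += (shifted - mu[i]) ** 2 / variance
    # Penalty term for the nuisance parameters
    return chi2 + sum(bj ** 2 for bj in b)
```

For \(m^i = \mu ^{i,ds}\) and \(\varvec{b} = 0\) the function returns zero, as expected from Eq. 22.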

For several data sets, a total \(\chi ^2\) function is defined as

$$\begin{aligned} \chi ^2_\mathrm{tot} = \sum _{ds} \sum _{i}^{ds} + \sum _{j}^{b} , \end{aligned}$$
(23)

with \(\sum _{i}^{ds}\) and \(\sum _{j}^{b}\) as introduced for a single measurement in Eq. 22. The total \(\chi ^2\) function in Eq. 23 can be approximated by

$$\begin{aligned} \chi ^2_\mathrm{tot}\approx & {} \chi ^2_\mathrm{min} \nonumber \\&+ \sum _{i=1,N_M}\frac{\left[ m^i - \sum _j \gamma ^{i}_j m^i b'_j - {\mu ^{i}} \right] ^2}{ \textstyle \delta ^2_{i,\mathrm{stat}}\, \mu ^{i}\left( m^i - \sum _j \gamma ^{i}_j m^i b'_j\right) + (\delta _{i,\mathrm{uncor}}\, m^i)^2}\nonumber \\&+ \sum _j (b'_j)^2, \end{aligned}$$
(24)

where \(\chi ^2_\mathrm{min}\) is the minimum of \(\chi ^2_\mathrm{tot}\), \(N_M\) is the number of combined measurements, \(\mu ^{i}\) is the average value at point i, and \(\gamma ^{i}_j \), \(\delta _{i,\mathrm{stat}} \) and \(\delta _{i,\mathrm{uncor}}\) are its relative correlated systematic, relative statistical and relative uncorrelated systematic uncertainties, respectively. To determine the average of the data as defined in Eq. 24, an iterative procedure is used. For the first iteration, for all terms in Eqs. 22 and 24 related to uncertainties or correlated shifts of the data, the expectation values \(m^i\) are replaced by \(\mu ^{i,ds}\) and the term \(\sum _j \gamma ^{i}_j m^i b'_j\) is set to zero for the calculation of the statistical uncertainty.Footnote 8 The average values \(\mu ^i\) and systematic shifts \(b_j\) are determined analytically from a system of linear equations \(\partial \chi ^2_\mathrm{tot} / \partial m^i = 0\) and \(\partial \chi ^2_\mathrm{tot} / \partial b_j = 0\). For the next iterations, the average values \(\mu ^i\) from the previous iteration are used.Footnote 9 The procedure converges after two iterations. The shifts \(b'_j\), also called nuisance parameters, are related to the original shifts \(b_j\) through an orthogonal transformation which is also used to determine \(\gamma ^{i}_j\) [2].
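A single linearised step of such an average can be sketched as follows. This is a simplified illustration and not the HERAverager code: it assumes that all uncertainties have already been frozen as absolute numbers (as when the expectation values are replaced by measured values), so that the \(\chi ^2\) is quadratic and the minimum follows from a linear system. The names `solve_linear` and `average_once` and the data layout are hypothetical.

```python
def solve_linear(a, y):
    """Solve a x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    aug = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def average_once(point, mu, delta, gamma_abs, n_points):
    """Minimise sum_k (m_p(k) - sum_j G_kj b_j - mu_k)^2 / delta_k^2 + sum_j b_j^2.

    point     : grid-point index of each measurement k
    mu        : measured values
    delta     : absolute stat. and uncorrelated uncertainty of each measurement
    gamma_abs : gamma_abs[k][j], absolute correlated shift of source j
    Returns (averaged values m, shifts b).
    """
    n_sys = len(gamma_abs[0]) if gamma_abs else 0
    n = n_points + n_sys
    h = [[0.0] * n for _ in range(n)]   # matrix of the normal equations
    g = [0.0] * n                       # right-hand side
    for k in range(len(mu)):
        row = [0.0] * n
        row[point[k]] = 1.0
        for j in range(n_sys):
            row[n_points + j] = -gamma_abs[k][j]
        w = 1.0 / delta[k] ** 2
        for r in range(n):
            if row[r] == 0.0:
                continue
            g[r] += w * row[r] * mu[k]
            for c in range(n):
                h[r][c] += w * row[r] * row[c]
    for j in range(n_sys):
        h[n_points + j][n_points + j] += 1.0   # penalty term sum_j b_j^2
    x = solve_linear(h, g)
    return x[:n_points], x[n_points:]
```

Without correlated sources the result reduces to the usual inverse-variance weighted mean at each grid point; with correlated sources, the shifts couple all points that share a source.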

The ratio of \(\chi ^2_\mathrm{min}\) and the number of degrees of freedom, \(\chi ^2_\mathrm{min}/\mathrm{d.o.f.}\), is a measure of the consistency of the data sets. The number \(\mathrm{d.o.f.}\) is the difference between the total number of measurements and the number of averaged points \(N_M\).

Some systematic uncertainties \(\gamma _j^i\), which were treated as having point-to-point correlations, may be common for several data sets. A full table of the correlations of the systematic uncertainties across the data sets can be found elsewhere [79]. The systematic uncertainties were in general treated as independent between H1 and ZEUS. However, an overall normalisation uncertainty of \(0.5\,\%\), due to uncertainties on higher-order corrections to the Bethe–Heitler cross-section calculations, was assumed for all data sets which were normalised with data from the luminosity monitors.

All the NC and CC cross-section data from H1 and ZEUS are combined in one simultaneous minimisation. Therefore, the resulting shifts of the correlated systematic uncertainties propagate coherently to both NC and CC data. Even in cases where there are data only from a single data set, the procedure can still produce shifts with respect to the original measurement due to the correlation of systematic uncertainties.

4.3 Combination procedure

The combination procedure is iterative. Each iteration has two steps:

  1. the data are translated to the common \(\sqrt{s}_\mathrm{com}\) values and \((x_{\mathrm{Bj,grid}},Q^2_\mathrm{grid})\) grids as described in Sect. 4.1;

  2. the data are averaged as described in Sect. 4.2.

In the first iteration, the fits to provide the predictions needed for the translation were performed on the uncombined data. Starting with the second iteration, the fits were performed on combined data. The process was stopped after the third iteration. An investigation showed that further iterations did not induce significant changes in the resulting averaged cross sections.
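The control flow of this iteration can be summarised in a short sketch. The callables `fit`, `translate` and `average` are hypothetical stand-ins for the fits, the translation of Sect. 4.1 and the averaging of Sect. 4.2; the sketch also assumes that the translation is always applied to the original measurements, with only the fit input changing between iterations.

```python
def combine(raw_data, fit, translate, average, n_iterations=3):
    """Iterate translation and averaging.

    fit(data)                    -> predictions used for the translation
    translate(data, predictions) -> data moved to the common grids
    average(data_on_grid)        -> combined cross sections
    """
    fit_input = raw_data          # first iteration: fit the uncombined data
    combined = None
    for _ in range(n_iterations):
        predictions = fit(fit_input)
        on_grid = translate(raw_data, predictions)
        combined = average(on_grid)
        fit_input = combined      # later iterations: fit the combined data
    return combined
```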

4.4 Consistency of the data

The 2927 published cross sections were combined into 1307 cross-section measurements. For the resulting 1620 degrees of freedom, a \(\chi ^2_\mathrm{min} = 1687\) was obtained. For data points k contributing to point i on the \((x_\mathrm{Bj, grid},Q^2_\mathrm{grid})\) plane, pulls \(\mathrm{p}^{i,k}\) were defined as

$$\begin{aligned} \mathrm{p}^{i,k} = \frac{\mu ^{i,k} - \mu ^{i}\left( 1- \sum _j \gamma ^{i,k}_j b'_{j}\right) }{\sqrt{\Delta _{i,k}^2 - \Delta _{i}^2}}, \end{aligned}$$
(25)

where \(\Delta _{i,k}\) and \(\Delta _{i}\) are the statistical and uncorrelated systematic uncertainties added in quadrature for the point k and the average, respectively. The pull distributions for the different data sets are shown in Fig. 2. The RMS values of these distributions are close to unity, indicating good consistency of all data.
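The pull of Eq. 25 can be written as a small function. This is an illustrative sketch; the function name is hypothetical, and \(\Delta _{i,k}\) and \(\Delta _{i}\) are assumed to be given as absolute uncertainties in the same units as the cross sections.

```python
import math


def pull(mu_ik, mu_i, gamma_ik, b_prime, delta_ik, delta_i):
    """Pull of measurement k at grid point i, following Eq. 25.

    mu_ik    : individual measurement
    mu_i     : averaged value
    gamma_ik : relative correlated uncertainties gamma_j^{i,k}
    b_prime  : nuisance-parameter shifts b'_j
    delta_ik, delta_i : stat. and uncorrelated uncertainties in quadrature
                        (assumed absolute) of the measurement and the average
    """
    shift = sum(g * b for g, b in zip(gamma_ik, b_prime))
    return (mu_ik - mu_i * (1.0 - shift)) / math.sqrt(delta_ik ** 2 - delta_i ** 2)
```

The denominator uses the difference of the squared uncertainties because each measurement contributes to the average and is therefore correlated with it.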

Fig. 2

Distributions of pulls \(\mathrm p\) for: a NC \(e^+p\) for \(Q^2 \le 3.5\) GeV\(^2\); b NC \(e^+p\) for \(3.5 < Q^2 \le 100\) GeV\(^2\); c NC \(e^+p\) for \(100 < Q^2 \le 50{,}000\) GeV\(^2\); d NC \(e^-p\) for \(60 \le Q^2 \le 50{,}000\) GeV\(^2\); e CC \(e^+p\) for \(300 \le Q^2 \le 30{,}000\) GeV\(^2\); and f CC \(e^-p\) for \(300 \le Q^2 \le 30{,}000\) GeV\(^2\). There are no entries outside the histogram ranges. The root mean square, RMS, of each distribution is given

4.5 Procedural uncertainties

Procedural uncertainties are introduced by the choices made for the combination. Three kinds of such uncertainties were considered.

4.5.1 Multiplicative versus additive treatment of systematic uncertainties

The \(\chi ^2\) definition from Eq. 22 treats all systematic uncertainties as multiplicative, i.e. their size is expected to be proportional to the “true” values \(\varvec{m}\). While this is a good assumption for normalisation uncertainties, this might not be the case for other uncertainties. Therefore an alternative combination was performed, in which only the normalisation uncertainties were taken as multiplicative, while all other uncertainties were treated as additive. The differences between this alternative combination and the nominal combination were defined as correlated procedural uncertainties \(\delta _\mathrm{rel}\). This is a conservative approach but still yields quite small uncertainties. The typical values of \(\delta _\mathrm{rel}\) for the \(\sqrt{s}_\mathrm{com,1}=318\) GeV (\(\sqrt{s}_\mathrm{com,2/3}\)) combination were below 0.5 % (1 %) for medium-\(Q^2\) data, increasing to a few percent for low- and high-\(Q^2\) data.

4.5.2 Correlations between systematic uncertainties on different data sets

Similar methods were often used to calibrate different data sets obtained by one or by both collaborations. In addition, the same Monte Carlo simulation packages were used to analyse different data sets. These similar approaches might have led to correlations between data sets from one or both collaborations. This was investigated in depth for the combination of HERA I data [2]. The important correlations for this period were found to be related to the background from photoproduction and the hadronic energy scales. The correlations for the HERA I period were taken into account as before [2].

The correlations between the experiments for the HERA II period were considered much less important, because both experiments developed different methods to address calibration and normalisation. In the case of H1, some potential correlations between the data from the HERA I and HERA II periods were identified. In the case of ZEUS, no such correlations were found; this is due to significant changes in the detector and in the data processing.

The differences between the nominal combination and the combinations, in which systematic sources for the photoproduction background and hadronic energy scale were taken as correlated across data sets, were defined as additional signed procedural uncertainties \(\delta _{\gamma p}\) and \(\delta _\mathrm{had}\). Typical values of \(\delta _{\gamma p}\) and \(\delta _\mathrm{had}\) are below \(1\,\%\) (0.5 %) for NC (CC) scattering. For the data at low \(Q^2\), they can reach a few percent.

4.5.3 Pull distribution of correlated systematic uncertainties

There are in total 162 sources of correlated systematic uncertainty including global normalisations characterising the separate data sets. In the procedure applied, all these sources were assumed to be fully point-to-point correlated. None of these sources was shifted by more than \(2.4\,\sigma \) from its nominal value in the combination procedure. The pull on any such source j is defined as \( \mathrm{p}_j = b'_{j}/(1 - \Delta ^2_{b'_{j}})^{1/2}, \) where \(\Delta _{b'_{j}}\) is the uncertainty on the source j after the averaging. The distribution of \(\mathrm{p}_{j}\) is shown in Fig. 3. Some large values for \(|\mathrm{p}_{j}|\) are observed. They are connected to small relative uncertainties, below 1 %, for which there is only a small reduction in the uncertainty. Such cases are, for example, expected if the point-to-point correlation within a data set is not 100 % as was assumed.
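The behaviour of this pull definition can be explored with a short sketch (the function name is a hypothetical choice). It shows why a source whose uncertainty is hardly reduced by the averaging, \(\Delta _{b'_{j}}\) close to one, can have a large pull even for a moderate shift.

```python
import math


def source_pull(b_prime, delta_b):
    """Pull p_j = b'_j / sqrt(1 - Delta_{b'_j}^2) of a correlated source.

    b_prime : shift of the source in units of sigma
    delta_b : uncertainty of the source after averaging (in units of sigma, < 1)
    """
    return b_prime / math.sqrt(1.0 - delta_b ** 2)
```

For example, with these illustrative inputs a shift of 0.3 gives a pull of about 2.1 if the uncertainty is only reduced to 0.99 of its original size, whereas the same shift gives a pull close to 0.3 if the source is strongly constrained.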

Fig. 3

Distribution of pulls \(\mathrm{p}_{j}\) for the correlated systematic uncertainties including global normalisations. There are no entries outside the histogram range. The root mean square, RMS, of the distribution is given

The distribution of pulls shown in Fig. 3 is not Gaussian; it has a root-mean-square value of 1.34. Out of the 162 point-to-point correlated uncertainties, 40 were identified with \( |\mathrm{p}_j| > 1.3\). This might indicate that these uncertainties were either underestimated or do not fulfil the implicit assumptions of the linear procedure applied. Scaling these 40 uncertainties by a factor of two would reduce the root-mean-square value to 1.03 and the \(\chi ^2_\mathrm{min}\) of the combination would be reduced from 1687 to 1614 for the 1620 degrees of freedom.
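As a simple arithmetic illustration of these numbers, the ratio \(\chi ^2_\mathrm{min}/\mathrm{d.o.f.}\) before and after such a scaling would be

$$\begin{aligned} \frac{1687}{1620} \approx 1.041 \quad \longrightarrow \quad \frac{1614}{1620} \approx 0.996 . \end{aligned}$$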

Each of these 40 uncertainties could give rise to an individual procedural uncertainty if scaled. However, an extensive study revealed cross correlations between them. These cross correlations were used to form four groups related to

  1. very low-\(Q^2\) data from HERA I (14 uncertainties);

  2. low-\(Q^2\) data from HERA II with lowered proton beam energies (10 uncertainties);

  3. medium- and high-\(Q^2\) data from HERA I and II (11 uncertainties);

  4. normalisation issues from HERA I and II (5 uncertainties).

The normalisation related uncertainties were also found to be correlated to some of the uncertainties in the other groups but they were kept separate. Signed procedural uncertainties \(\delta _{(1,2,3,4)}\) were assigned to the four groups by scaling the uncertainties within each group by a factor of two and taking the difference between the result of this combination and that of the nominal combination as the uncertainty. Such cross correlations as observed here between different systematic uncertainties are not unexpected, even though different methods were used for different regions of phase space by two different experiments. Both experiments contribute about equally to the 40 sources discussed.

Since H1 and ZEUS used, as described for example in Sect. 3.2, different reconstruction methods, similar systematic sources influence the measured cross section differently as a function of \(x_\mathrm{Bj}\) and \(Q^2\). Therefore, requiring the cross sections to agree at all \(x_\mathrm{Bj}\) and \(Q^2\) constrains the systematics efficiently. In addition, for certain regions of the phase space, one of the two experiments has superior precision compared to the other. For these regions, the less precise measurement is fitted to the more precise measurement, with a simultaneous reduction of the correlated systematic uncertainty. This reduction propagates to the other points, including those which are based solely on the measurement from the less precise experiment. However, over most of the phase space, the precision of the H1 and ZEUS measurements is very similar and the systematic uncertainties are reduced uniformly.

Fig. 4

The combined HERA data for the inclusive NC \(e^+p\) reduced cross sections as a function of \(Q^2\) for six selected values of \(x_\mathrm{Bj}\) compared to the individual H1 and ZEUS data. The individual measurements are displaced horizontally for better visibility. Error bars represent the total uncertainties. The two labelled entries at \(x_\mathrm{Bj}=0.008\) and 0.08 come from data which were taken at \(\sqrt{s}=300\) GeV and \(y<0.35\) and were translated to \(\sqrt{s}=318\) GeV, see Sect. 4.1

5 Combined inclusive \(\varvec{e^{\pm }p}\) cross sections

The combined reduced cross sections for NC and CC ep scattering together with their statistical, uncorrelated and total correlated systematic uncertainties, as well as procedural uncertainties as defined in Sect. 4, are given in Appendix C.Footnote 10 The new values supersede those published previously [2].

Fig. 5

The combined HERA data for the inclusive NC \(e^+p\) reduced cross sections as a function of \(Q^2\) for six selected values of \(x_\mathrm{Bj}\) compared to the results from HERA I alone [2]. The two measurements are displaced horizontally for better visibility. Error bars represent the total uncertainties. The two labelled entries at \(x_\mathrm{Bj}=0.008\) and 0.08 come from data which were taken at \(\sqrt{s}=300\) GeV and \(y<0.35\) and were translated to \(\sqrt{s}=318\) GeV, see Sect. 4.1

Fig. 6

The combined HERA data for the inclusive NC \(e^-p\) reduced cross sections as a function of \(Q^2\) for four selected values of \(x_\mathrm{Bj}\) compared to the individual H1 and ZEUS data. The individual measurements are displaced horizontally for better visibility. Error bars represent the total uncertainties

Fig. 7

The combined HERA data for the inclusive NC \(e^-p\) reduced cross section as a function of \(Q^2\) for four selected values of \(x_\mathrm{Bj}\) compared to the results from HERA I alone [2]. The two measurements are displaced horizontally for better visibility. Error bars represent the total uncertainties

The total uncertainties are below 1.5 % over the \(Q^2\) range of \(3 \le Q^2 \le 500\) GeV\(^2\) and below 3 % up to \(Q^2 = 3000\) GeV\(^2\). Cross sections are provided for values of \(Q^2\) between \(Q^2=0.045\) GeV\(^2\) and \(Q^2=50{,}000\) GeV\(^2\) and values of \(x_\mathrm{Bj}\) between \(x_\mathrm{Bj}=6\times 10^{-7}\) and \(x_\mathrm{Bj}=0.65\). The events have a minimum invariant mass of the hadronic system, W, of 15 GeV.

In Fig. 4, the individual and the combined reduced cross sections for NC \(e^+p\) DIS are shown as a function of \(Q^2\) for selected values of \(x_\mathrm{Bj}\). The improvement due to combination is clearly visible. In Fig. 5, a comparison between the new combination and the combination of HERA I data alone is shown. The improvement is especially significant at high \(Q^2\). The results for NC \(e^-p\) scattering are depicted in Figs. 6 and 7. As the integrated luminosity for \(e^-p\) scattering was very limited for the HERA I period, the improvements due to the new combination are even more substantial than for \(e^+p\) scattering.

The results of the combination of the data with lower proton beam energies are shown in Figs. 8 and 9 as a function of \(x_\mathrm{Bj}\) in selected bins of \(Q^2\). These data augment the data with standard proton energy to provide increased sensitivity to the gluon density in the proton.

The combined NC \(e^+p\) data for very low \(Q^2\) with proton beam energies of 920 and 820 GeV are shown in Figs. 10 and 11. These data were taken during the HERA I period, but due to the systematic shifts introduced by the combination with HERA II data, the numbers are not always the same as in the old HERA I combination.

The combined CC cross sections are shown in Figs. 12, 13, 14 and 15, together with the input data from H1 and ZEUS and the comparison to the HERA I combination results for \(e^+p\) and \(e^-p\) scattering. As for the NC data, the power of combination and the improved precision due to the high statistics data from HERA II are demonstrated.

Fig. 8

The combined HERA data for the inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 251\) GeV as a function of \(x_\mathrm{Bj}\) for five selected values of \(Q^2\) compared to the individual H1 and ZEUS data. The individual measurements are displaced horizontally for better visibility. The ZEUS points at the same \(x_\mathrm{Bj}\) and \(Q^2\) values are from two different data sets. Error bars represent the total uncertainties

Fig. 9

The combined HERA data for the inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 225\) GeV as a function of \(x_\mathrm{Bj}\) for five selected values of \(Q^2\) compared to the individual H1 and ZEUS data. The individual measurements are displaced horizontally for better visibility. The ZEUS points at the same \(x_\mathrm{Bj}\) and \(Q^2\) values are from two different data sets. Error bars represent the total uncertainties

The high-precision DIS cross sections provided here form a coherent set spanning six orders of magnitude, both in \(Q^2\) and \(x_\mathrm{Bj}\). They are a major legacy of HERA.

6 QCD analysis

In this section, the pQCD analysis of the combined data resulting in the PDF set HERAPDF2.0 and its released variants is presented. The framework established for HERAPDF1.0 [2] was followed in this analysis. A breakdown of pQCD is expected for \(Q^2\) approaching 1 GeV\(^2\). To safely remain in the kinematic region where pQCD is expected to be applicable, only cross sections for \(Q^2\) starting from \(Q^2_\mathrm{min} = 3.5\) GeV\(^2\) were used in the analysis. In this kinematic region, target-mass corrections are expected to be negligible. Since the centre-of-mass energy at the \(\gamma p\) vertex W is above 15 GeV for all the data, large-\(x_\mathrm{Bj}\) higher-twist corrections are also expected to be negligible. The \(Q^2\) range of the cross sections entering the fit is \(3.5 \le Q^2 \le 50{,}000\,\)GeV\(^2\). The corresponding \(x_\mathrm{Bj}\) range is \(0.651 \times 10^{-4} \le x_\mathrm{Bj} \le 0.65 \).

In addition to experimental uncertainties, model and parameterisation uncertainties were also considered. The latter were evaluated by variations of the values of various input settings at the starting scale and the form of the parameterisation.

6.1 Theoretical formalism and settings

Predictions from pQCD are fitted to data. These predictions were obtained by solving the DGLAP evolution equations [29–33] at LO, NLO and NNLO in the \(\overline{\mathrm{MS}}\) scheme [80]. This was done using the programme QCDNUM [81] within the HERAFitter framework [26, 27] and an independent programme, which was already used to analyse the combined HERA I data [2]. The results obtained by the two programmes were in excellent agreement, well within fit uncertainties. The numbers on fit quality and resulting parameters given in this paper were obtained using HERAFitter.

The DGLAP equations yield the PDFs at all scales \(\mu _\mathrm{f}^2\) and x, if they are provided as functions of x at some starting scale, \(\mu ^2_\mathrm{f_{0}}\). In variable-flavour schemes, this scale has to be below the charm-quark mass parameter, \(M_c\), squared. It was chosen to be \(\mu ^2_\mathrm{f_{0}} = 1.9\,\)GeV\(^2\) as for HERAPDF1.0. The renormalisation and factorisation scales were chosen to be \(\mu ^2_\mathrm{r} = \mu ^2_\mathrm{f} = Q^2\). The predictions for the structure functions [1] which appear in the calculation of the cross sections, see Eq. 1, were obtained by convoluting the parton distribution functions with coefficient functions appropriate to the order of the calculation. The light-quark coefficient functions were calculated using QCDNUM. The heavy-quark coefficient functions were calculated in the general-mass variable-flavour-number scheme called RTOPT [82–84] for the NC structure functions. For the CC structure functions, the zero-mass approximation was used, since all HERA CC data have \(Q^2 \gg M_b^2\), where \(M_b\) is the beauty-quark mass parameter in the calculation.

Fig. 10

The combined HERA data for the inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 318\) GeV at very low \(Q^2\). Error bars represent the total uncertainties

Fig. 11

The combined HERA data for the inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 300\) GeV at very low \(Q^2\). Error bars represent the total uncertainties

The value of \(M_c\) was chosen after performing \(\chi ^2\) scans of NLO and NNLO pQCD fits to the combined inclusive data from the analysis presented here and the HERA combined charm data [46]. The procedure is described in detail in the context of the combination of the reduced charm cross-section measurements [46]. All correlations of the inclusive and of the charm data were considered in the fits. Figure 16 shows the \(\Delta \chi ^2 = \chi ^2 -\chi ^2_\mathrm{min}\), where \(\chi ^2_\mathrm{min}\) is the minimum \( \chi ^2\) obtained, of these fits versus \(M_c\) at NLO and NNLO. As a result, the value of \(M_c\) was chosen as \(M_c=1.47\,\)GeV at NLO and \(M_c=1.43\,\)GeV at NNLO. The settings for LO were chosen as for NLO unless otherwise stated.
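One common way to read off the minimum of such a \(\chi ^2\) scan is a parabolic interpolation around the lowest scan point. The sketch below illustrates this idea only; it is not necessarily the procedure used for the scans described here, and the function names are hypothetical.

```python
def delta_chi2(chi2_values):
    """Return Delta chi^2 = chi^2 - chi^2_min for a list of scan results."""
    chi2_min = min(chi2_values)
    return [v - chi2_min for v in chi2_values]


def parabola_minimum(x, y):
    """Position of the vertex of the parabola through the lowest interior
    scan point and its two neighbours (x must be sorted, len(x) >= 3)."""
    k = min(range(1, len(x) - 1), key=lambda i: y[i])
    x0, x1, x2 = x[k - 1], x[k], x[k + 1]
    y0, y1, y2 = y[k - 1], y[k], y[k + 1]
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    return x1 - 0.5 * num / den
```

Applied to scan points that lie exactly on a parabola, the function returns the position of its minimum; for real scans it gives an estimate whose quality depends on the spacing of the scan points.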

The value of the beauty-quark mass parameter \(M_b\) was chosen after performing \(\chi ^2\) scans of NLO and NNLO pQCD fits using the combined inclusive data and data on beauty production from ZEUS [75] and H1 [76]. The \(\chi ^2\) scans are shown in Fig. 17. The value of \(M_b\) was chosen to be \(M_b=4.5\) GeV at LO, NLO and NNLO. The value of the top-quark mass parameter was chosen to be 173 GeV [52] at all orders.

The value of the strong coupling constant was chosen to be \(\alpha _s(M_Z^2)= 0.118\) [52] at both NLO and NNLO and \(\alpha _s(M_Z^2)= 0.130\) [38] for the LO fit.

6.2 Parameterisation

In the approach of HERAPDF, the PDFs of the proton, xf, are generically parameterised at the starting scale \(\mu ^2_\mathrm{f_{0}}\) as

$$\begin{aligned} xf(x) = A x^{B} (1-x)^{C} (1 + D x + E x^2), \end{aligned}$$
(26)

where x is the fraction of the proton’s momentum taken by the struck parton in the infinite momentum frame. The PDFs parameterised are the gluon distribution, xg, the valence-quark distributions, \(xu_v\), \(xd_v\), and the u-type and d-type anti-quark distributions, \(x\bar{U}\), \(x\bar{D}\). The relations \(x\bar{U} = x\bar{u}\) and \(x\bar{D} = x\bar{d} +x\bar{s}\) are assumed at the starting scale \(\mu ^2_\mathrm{f_{0}}\).

The central parameterisation is

$$\begin{aligned}&xg(x) = A_g x^{B_g} (1-x)^{C_g} - A_g' x^{B_g'} (1-x)^{C_g'} , \end{aligned}$$
(27)
$$\begin{aligned}&xu_v(x) = A_{u_v} x^{B_{u_v}} (1-x)^{C_{u_v}}\left( 1+E_{u_v}x^2 \right) , \end{aligned}$$
(28)
$$\begin{aligned}&xd_v(x) = A_{d_v} x^{B_{d_v}} (1-x)^{C_{d_v}} , \end{aligned}$$
(29)
$$\begin{aligned}&x\bar{U}(x) = A_{\bar{U}} x^{B_{\bar{U}}} (1-x)^{C_{\bar{U}}}\left( 1+D_{\bar{U}}x\right) , \end{aligned}$$
(30)
$$\begin{aligned}&x\bar{D}(x) = A_{\bar{D}} x^{B_{\bar{D}}} (1-x)^{C_{\bar{D}}} . \end{aligned}$$
(31)

The gluon distribution, xg, is an exception to Eq. 26, for which an additional term of the form \(A_g'x^{B_g'}(1-x)^{C_g'}\) is subtracted.Footnote 11 This additional term was added to make the parameterisation more flexible at low x, such that it is not controlled by the single power \(B_g\) as x approaches zero [36]. This requires that the powers \(B_g\) and \(B_g'\) are different. Therefore a restriction was placed on \(B_g'\), such that \(B_g'\) values in the range \( 0.95 < B_g'/B_g < 1.05 \) were excluded for all PDFs released. The term \(A_g'x^{B_g'}(1-x)^{C_g'}\) was subtracted at NLO and NNLO, but not at LO, since such a term could lead to xg(x) becoming negative which is not physical at LO, because the LO gluon distribution function at low x is directly related to the observable longitudinal structure function \(\tilde{F_\mathrm{L}}\) [45].
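The functional forms of Eqs. 26 and 27 and the restriction on \(B_g'\) can be written down directly. The sketch below is illustrative; the function names are hypothetical and no fitted parameter values are implied.

```python
def xf(x, a, b, c, d=0.0, e=0.0):
    """Generic form of Eq. 26: A x^B (1-x)^C (1 + D x + E x^2)."""
    return a * x ** b * (1.0 - x) ** c * (1.0 + d * x + e * x ** 2)


def xg(x, a_g, b_g, c_g, a_gp=0.0, b_gp=0.0, c_gp=0.0):
    """Gluon form of Eq. 27: the generic term minus the primed term.
    With a_gp = 0 (as at LO) the subtraction vanishes."""
    return xf(x, a_g, b_g, c_g) - xf(x, a_gp, b_gp, c_gp)


def bg_prime_allowed(b_g, b_gp):
    """True if B_g'/B_g lies outside the excluded range (0.95, 1.05)."""
    ratio = b_gp / b_g
    return not (0.95 < ratio < 1.05)
```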

Fig. 12

The combined HERA data for the inclusive CC \(e^+p\) reduced cross sections as a function of \(x_\mathrm{Bj}\) for the ten different values of \(Q^2\) compared to the individual H1 and ZEUS data. The individual measurements are displaced horizontally for better visibility. Error bars represent the total uncertainties

Fig. 13

The combined HERA data for the inclusive CC \(e^+p\) reduced cross sections as a function of \(x_\mathrm{Bj}\) for the ten different values of \(Q^2\) compared to the results from HERA I alone [2]. The individual measurements are displaced horizontally for better visibility. Error bars represent the total uncertainties

Fig. 14

The combined HERA data for the inclusive CC \(e^-p\) reduced cross sections as a function of \(x_\mathrm{Bj}\) for the ten different values of \(Q^2\) compared to the individual H1 and ZEUS data. The individual measurements are displaced horizontally for better visibility. Error bars represent the total uncertainties

Fig. 15

The combined HERA data for the inclusive CC \(e^-p\) reduced cross sections as a function of \(x_\mathrm{Bj}\) for the ten different values of \(Q^2\) compared to the results from HERA I alone [2]. The individual measurements are displaced horizontally for better visibility. Error bars represent the total uncertainties

Fig. 16

The \(\Delta \chi ^2 = \chi ^2 - \chi ^2_\mathrm{min}\) versus the charm mass parameter \(M_c\) for NLO and NNLO fits based on the combined data on charm production in addition to the combined inclusive data

Fig. 17

The \(\Delta \chi ^2= \chi ^2 - \chi ^2_\mathrm{min}\) versus the beauty mass parameter \(M_b\) for NLO and NNLO fits based on H1 and ZEUS data on beauty production in addition to the combined inclusive data

Fig. 18

Comparison of the PDF uncertainties as determined by the Hessian and Monte Carlo (MC) methods at NNLO for the valence distributions \(xu_v\) and \(xd_v\), the gluon distribution xg and the sea distribution, \(xS=2x(\bar{U}+\bar{D})\), at the scale \(\mu _\mathrm{f}^{2} = 10\,\text {GeV}^{2}\)

The normalisation parameters, \(A_{u_v}, A_{d_v}, A_g\), are constrained by the quark-number sum rules and the momentum sum rule. The B parameters \(B_{\bar{U}}\) and \(B_{\bar{D}}\) were set as equal, \(B_{\bar{U}}=B_{\bar{D}}\), such that there is a single B parameter for the sea distributions. The strange-quark distribution is expressed as an x-independent fraction, \(f_s\), of the d-type sea, \(x\bar{s}= f_s x\bar{D}\) at \(\mu ^2_\mathrm{f_{0}}\). The value \(f_s=0.4\) was chosen as a compromise between the determination of a suppressed strange sea from neutrino-induced di-muon production [36, 85] and a recent determination of an unsuppressed strange sea, published by the ATLAS collaboration [86]. A further constraint was applied by setting \(A_{\bar{U}}=A_{\bar{D}} (1-f_s)\). This, together with the requirement \(B_{\bar{U}}=B_{\bar{D}}\), ensures that \(x\bar{u} \rightarrow x\bar{d}\) as \(x \rightarrow 0\).
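How a quark-number sum rule fixes a valence normalisation can be illustrated analytically. For \(xq_v = A x^{B}(1-x)^{C}(1+Dx+Ex^2)\), the number of valence quarks \(\int _0^1 q_v\,\mathrm{d}x\) (2 for \(u_v\), 1 for \(d_v\)) is a sum of Euler Beta functions, provided \(B > 0\). The sketch below uses this relation; the function names are hypothetical, and the treatment of the momentum sum rule for \(A_g\) is not covered.

```python
import math


def beta(p, q):
    """Euler Beta function B(p, q) for p, q > 0."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


def valence_normalisation(n_quarks, b, c, d=0.0, e=0.0):
    """Normalisation A such that the integral of q_v over x equals n_quarks.

    Integral of x^(B-1) (1-x)^C (1 + D x + E x^2) over [0, 1]:
    B(B, C+1) + D B(B+1, C+1) + E B(B+2, C+1).
    """
    integral = beta(b, c + 1.0) + d * beta(b + 1.0, c + 1.0) + e * beta(b + 2.0, c + 1.0)
    return n_quarks / integral


def a_ubar(a_dbar, f_s=0.4):
    """Constraint A_Ubar = A_Dbar (1 - f_s) applied at the starting scale."""
    return a_dbar * (1.0 - f_s)
```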

The parameters appearing in Eqs. 27–31 were selected by first fitting with all D and E parameters and \(A_g'\) set to zero. This left 10 free parameters. The other parameters were then included in the fit one at a time. The improvement of the \(\chi ^2\) of the fits was monitored and the procedure was ended when no further improvement in \(\chi ^2\) was observed. This led to a 15-parameter fit at NLO and a 14-parameter fit at NNLO. A common parameterisation with 14 parameters was chosen as “central”, both at NLO and at NNLO, such that any differences between these fits reflect only the change in order. The central fits satisfy the criterion that all the PDFs are positive in the measured region. The 15-parameter NLO fit was used as a parameterisation variation, see Sect. 6.5.

6.3 Definition of \(\varvec{\chi ^2}\)

The pQCD predictions were fit to the data using a \(\chi ^2\) method similar to that described in Sect. 4.2. The definition of \(\chi ^2\) is

$$\begin{aligned} \chi ^2_\mathrm{exp}\left( \varvec{m},\varvec{s}\right)= & {} \sum _i \frac{\left[ m^i - \sum _j \gamma ^i_j m^i s_j - {\mu ^i} \right] ^2}{ \textstyle \delta ^2_{i,\mathrm{stat}}\,{\mu ^i} m^i + \delta ^2_{i,\mathrm{uncor}}\, (m^i)^2}\nonumber \\&+ \sum _j s^2_j + \sum _i \ln \frac{ \delta ^2_{i,\mathrm{stat}} \mu ^i m^i + (\delta _{i,\mathrm{uncor}} m^i)^2}{ (\delta ^2_{i,\mathrm{stat}} +\delta ^2_{i,\mathrm{uncor}})(\mu ^i)^2} , \end{aligned}$$
(32)

where the notation is equivalent to that in Eq. 22; here \(\varvec{s}\) is used to denote systematic shifts. The additional logarithmic term in Eq. 32 compared to Eq. 22 was introduced to minimise biases [8].
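For orientation, Eq. 32 can be evaluated term by term with NumPy. The sketch below assumes that \(m^i\) denotes the prediction and \(\mu ^i\) the measured value at point i, and that \(\gamma ^i_j\), \(\delta _{i,\mathrm{stat}}\) and \(\delta _{i,\mathrm{uncor}}\) are relative uncertainties; the function name and array layout are illustrative choices and not part of any fitting package.

```python
import numpy as np


def chi2_exp(m, mu, gamma, s, d_stat, d_uncor):
    """Evaluate Eq. 32 for given predictions and systematic shifts.

    m       -- predictions m^i, shape (N,)
    mu      -- measured values mu^i, shape (N,)
    gamma   -- relative correlated systematic uncertainties gamma^i_j, shape (N, K)
    s       -- systematic shifts s_j, shape (K,)
    d_stat  -- relative statistical uncertainties, shape (N,)
    d_uncor -- relative uncorrelated systematic uncertainties, shape (N,)
    """
    m, mu = np.asarray(m, float), np.asarray(mu, float)
    gamma, s = np.asarray(gamma, float), np.asarray(s, float)
    d_stat, d_uncor = np.asarray(d_stat, float), np.asarray(d_uncor, float)

    shifted = m - (gamma @ s) * m                  # m^i - sum_j gamma^i_j m^i s_j
    var = d_stat**2 * mu * m + d_uncor**2 * m**2   # denominator of the first term
    residual_term = np.sum((shifted - mu) ** 2 / var)
    penalty_term = np.sum(s**2)                    # sum_j s_j^2
    log_term = np.sum(np.log(var / ((d_stat**2 + d_uncor**2) * mu**2)))
    return residual_term + penalty_term + log_term
```

When the predictions equal the measurements and all shifts vanish, the three terms are each zero, which provides a quick sanity check of an implementation.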

Correlated systematic uncertainties were treated as for the combination of data, see Sect. 4.2. For the combined inclusive data, the correlated systematic uncertainties are smaller than or comparable to the statistical and uncorrelated uncertainties. Nevertheless, the remaining correlations are significant and thus the 162 systematic uncertainties present for the H1 and ZEUS data sets plus the seven sources of procedural uncertainty which resulted from the combination procedure, see Sect. 4.5, were all individually treated as correlated uncertainties.

6.4 Experimental uncertainties

Experimental uncertainties were determined using the Hessian method with the criterion \(\Delta \chi ^2=1\). The use of a consistent input data set with common correlations justifies this approach.
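As an illustration of the Hessian method, the uncertainty on a derived quantity F can be propagated from the matrix of second derivatives of \(\chi ^2\). The sketch below assumes a quadratic \(\chi ^2\) near the minimum and linear error propagation; the function name and inputs are hypothetical.

```python
import numpy as np


def hessian_uncertainty(hessian, grad_F, delta_chi2=1.0):
    """Symmetric uncertainty on a quantity F from the chi2 Hessian.

    hessian    -- second derivatives of chi2 w.r.t. the fit parameters, shape (P, P)
    grad_F     -- gradient of F w.r.t. the same parameters at the minimum, shape (P,)
    delta_chi2 -- tolerance criterion (1 in the analysis described here)

    Near the minimum chi2 ~ chi2_min + 0.5 * dp^T H dp, so the parameter
    covariance matrix for a given tolerance is 2 * delta_chi2 * H^-1.
    """
    cov = 2.0 * delta_chi2 * np.linalg.inv(np.asarray(hessian, float))
    g = np.asarray(grad_F, float)
    return float(np.sqrt(g @ cov @ g))


# One-parameter check: chi2 = ((p - p0) / sigma)^2 has H = 2 / sigma^2.
sigma = 0.5
print(hessian_uncertainty([[2.0 / sigma**2]], [1.0]))
```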

Table 2 Input parameters for HERAPDF2.0 fits and the variations considered to evaluate model and parameterisation (\(\mu _{f_{0}}\)) uncertainties

A cross check was performed using the Monte Carlo method [87, 88]. It is based on analysing a large number of pseudo data sets called replicas. For this cross check, 1000 replicas were created by taking the combined data and fluctuating the values of the reduced cross sections randomly within their given statistical and systematic uncertainties taking into account correlations. All uncertainties were assumed to follow Gaussian distributions. The PDF central values and uncertainties were estimated using the mean and RMS values over the replicas.
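The replica generation can be sketched as follows, assuming relative uncertainties and one Gaussian shift per correlated source shared by all points. In the actual cross check each replica is fitted and the PDFs are averaged; here the mean and RMS are taken directly over the pseudo data to keep the example short. All input numbers are hypothetical.

```python
import numpy as np


def make_replicas(values, d_stat, d_uncor, gamma, n_rep=1000, seed=0):
    """Generate Gaussian pseudo-data replicas of a set of cross sections.

    values          -- central values, shape (N,)
    d_stat, d_uncor -- relative statistical and uncorrelated uncertainties, shape (N,)
    gamma           -- relative correlated systematic uncertainties, shape (N, K)
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, float)
    gamma = np.asarray(gamma, float)
    d_point = np.sqrt(np.asarray(d_stat, float) ** 2 + np.asarray(d_uncor, float) ** 2)
    n, k = gamma.shape
    r_point = rng.standard_normal((n_rep, n))  # independent fluctuation per point
    r_syst = rng.standard_normal((n_rep, k))   # one shift per source, common to all points
    return values * (1.0 + d_point * r_point + r_syst @ gamma.T)


# Hypothetical toy input: three points and one correlated source.
values = np.array([1.0, 0.8, 0.5])
d_stat = np.array([0.02, 0.02, 0.02])
d_uncor = np.array([0.01, 0.01, 0.01])
gamma = np.array([[0.01], [0.01], [0.02]])

replicas = make_replicas(values, d_stat, d_uncor, gamma)
mean = replicas.mean(axis=0)  # central values over the replicas
rms = replicas.std(axis=0)    # RMS spread over the replicas
```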

The uncertainties obtained by the Monte Carlo method and the Hessian method were consistent within the kinematic reach of HERA. This is demonstrated in Fig. 18 where experimental uncertainties obtained for HERAPDF2.0 NNLO by the Hessian and Monte Carlo methods are compared for the valence, the gluon and the total sea-quark distributions. The RMS values taken as Monte Carlo uncertainties tend to be slightly larger than the standard deviations obtained in the Hessian approach.

6.5 Model and parameterisation uncertainties

For the NLO and NNLO PDFs, the uncertainties on HERAPDF2.0 due to the choice of model settings and the form of the parameterisation were evaluated by varying the assumptions. A summary of the variations on model parameters is given in Table 2. The variations of \(M_c\) and \(M_b\) were chosen in accordance with the \(\chi ^2\) scans related to the heavy-quark mass parameters as shown in Figs. 16 and 17. The data on heavy-quark production from HERA II led to a considerably reduced uncertainty on the heavy-quark mass parameters compared to the HERAPDF1.0 and HERAPDF1.5 analyses, see Appendix A.

The variation of \(f_s\) was chosen to span the ranges between a suppressed strange sea [36, 85] and an unsuppressed strange sea [86]. In addition to this, two more variations of the assumptions about the strange sea were made. Instead of assuming that the strange contribution is a fixed fraction of the d-type sea, an x-dependent shape, \(x\bar{s}= f_s' \, 0.5 \tanh (-20(x-0.07))\, x\bar{D}\), was used in which high-x strangeness is highly suppressed. This was suggested by measurements published by the HERMES collaboration [89, 90]. The normalisation of \(f_s'\) was also varied between \(f_s'=0.3\) and \(f_s'=0.5\).

In addition to these model variations, \(Q^2_\mathrm{min}\) was varied as for the HERAPDF1.0 and HERAPDF1.5 analyses, see Appendix A. The differences between the central fit and the fits corresponding to the variations of \(Q^2_\mathrm{min}\), \(f_s\), \(M_c\) and \(M_b\) are added in quadrature, separately for positive and negative deviations, and represent the model uncertainty of the HERAPDF2.0 sets.
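The separate quadrature sums for positive and negative deviations can be written compactly. This is a minimal sketch for a single quantity, with hypothetical numbers for the central value and the variations.

```python
import math


def model_uncertainty(central, variations):
    """Return (up, down): deviations added in quadrature, separately by sign."""
    up = math.sqrt(sum((v - central) ** 2 for v in variations if v > central))
    down = math.sqrt(sum((v - central) ** 2 for v in variations if v < central))
    return up, down


# Hypothetical values of one quantity for the central fit and four variations.
up, down = model_uncertainty(1.0, [1.03, 0.96, 1.04, 1.0])
print(f"+{up:.3f} -{down:.3f}")
```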

Table 3 Input parameters for HERAPDF2.0FF fits. All other parameters were set as for the standard HERAPDF2.0 NLO fit
Table 4 The values of \(\chi ^2\) per degree of freedom for HERAPDF2.0 and its variants
Fig. 19

The dependence of \(\chi ^2/\mathrm{d.o.f.}\) on \(Q^2_\mathrm{min}\) of the LO, NLO and NNLO fits to the HERA combined inclusive data. Also shown are values for an NLO fit to the combined HERA I data [2]. All fits were performed using the RTOPT heavy-flavour scheme

Fig. 20

The dependence of \(\chi ^2/\mathrm{d.o.f.}\) on \(Q^2_\mathrm{min}\) for HERAPDF2.0 fits using (a) the RTOPT [84], FONLL-B [91], ACOT [110] and fixed-flavour (FF) schemes at NLO and (b) the RTOPT and FONLL-B/C [92] schemes at NLO and NNLO. The \(F_\mathrm{L}\) contributions are calculated using matrix elements of the order of \(\alpha _s\) indicated in the legend. The number of degrees of freedom drops from 1148 for \(Q^2_\mathrm{min}=2.7\) GeV\(^2\) to 1131 for the nominal \(Q^2_\mathrm{min}=3.5\) GeV\(^2\) and to 868 for \(Q^2_\mathrm{min}=25\) GeV\(^2\)

Fig. 21

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0 NLO at \(\mu _\mathrm{f}^{2} = 10\,\)GeV\(^{2}\). The gluon and sea distributions are scaled down by a factor of 20. The experimental, model and parameterisation uncertainties are shown. The dotted lines represent HERAPDF2.0AG NLO with the alternative gluon parameterisation, see Sect. 6.8

Two kinds of parameterisation uncertainties were considered, the variation in \(\mu ^2_\mathrm{f_{0}}\) and the addition of parameters D and E, see Eq. 26. The variation in \(\mu ^2_\mathrm{f_{0}}\) mostly increased the PDF uncertainties of the sea and gluon at small x. The parameters D and E were added separately for each PDF. The only significant difference from the 14-parameter central fit came from the 15-parameter fit, for which \(D_{u_v}\) was non-zero. This affected the shape of the U-type sea as well as the shape of \(u_v\). The final parameterisation uncertainty for a given quantity is taken as the largest of the uncertainties. This uncertainty is valid in the x-range covered by the QCD fits to HERA data.

6.6 Total uncertainties

The total PDF uncertainty is obtained by adding in quadrature the experimental, the model and the parameterisation uncertainties described in Sects. 6.4 and 6.5. Differences arising from using alternative values of \(\alpha _s(M_Z^2)\), alternative forms of parameterisations, different heavy-flavour schemes or a very different \(Q^2_\mathrm{min}\) are not included in these uncertainties. Such changes result in the different variants of the PDFs to be discussed in the subsequent sections.
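The combination of the three components can be sketched for a single quantity. The sketch assumes that the experimental uncertainty is symmetric, that the model and parameterisation uncertainties are given as (up, down) pairs, and that the parameterisation uncertainty is the largest deviation among the variations on each side; function names and numbers are hypothetical.

```python
import math


def parameterisation_uncertainty(central, variations):
    """Largest upward and downward deviation among the variations."""
    up = max([v - central for v in variations if v > central], default=0.0)
    down = max([central - v for v in variations if v < central], default=0.0)
    return up, down


def total_uncertainty(exp, model, param):
    """Add experimental, model and parameterisation parts in quadrature.

    exp   -- symmetric experimental uncertainty
    model -- (up, down) model uncertainty
    param -- (up, down) parameterisation uncertainty
    """
    up = math.sqrt(exp**2 + model[0] ** 2 + param[0] ** 2)
    down = math.sqrt(exp**2 + model[1] ** 2 + param[1] ** 2)
    return up, down


param = parameterisation_uncertainty(1.0, [1.02, 0.99, 1.05])
print(total_uncertainty(0.03, (0.04, 0.02), param))
```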

6.7 Alternative values of \(\varvec{\alpha _s(M_Z^2)}\)

The HERAPDF2.0 NLO and NNLO standard fits were additionally made for a series of \(\alpha _s(M_Z^2)\) values from \(\alpha _s(M_Z^2)=0.110\) to \(\alpha _s(M_Z^2)=0.130\) in steps of 0.001. These variants are also released. They can be used to assess the uncertainty on any predicted cross section due to the choice of \(\alpha _s(M_Z^2)\) and for \(\alpha _s(M_Z^2)\) determinations using independent data.
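The released series corresponds to 21 values of \(\alpha _s(M_Z^2)\). A minimal sketch of how such a series might be used is given below; the dictionary of predictions is a hypothetical input, and the choice of a central value of 0.118 with a variation of \(\pm 0.001\) is an assumption made for illustration.

```python
# 0.110 to 0.130 in steps of 0.001, rounded to avoid floating-point drift.
ALPHA_S_VALUES = [round(0.110 + 0.001 * i, 3) for i in range(21)]


def alpha_s_sensitivity(predictions, central=0.118, delta=0.001):
    """Half the difference of a prediction between central+delta and central-delta.

    predictions -- hypothetical dict: alpha_s(M_Z^2) value -> predicted cross section
    """
    lo = predictions[round(central - delta, 3)]
    hi = predictions[round(central + delta, 3)]
    return 0.5 * abs(hi - lo)


# Toy predictions that depend linearly on alpha_s (illustration only).
toy_predictions = {a: 100.0 * a for a in ALPHA_S_VALUES}
print(alpha_s_sensitivity(toy_predictions))
```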

Fig. 22

The flavour breakdown of the sea distribution of HERAPDF2.0 NLO at \(\mu _\mathrm{f}^{2}\) = 10 GeV\(^{2}\). Shown are the distributions \(x\bar{u}\), \(x\bar{d}\), \(x\bar{c}\) and \(x\bar{s}\) together with their experimental, model and parameterisation uncertainties. The fractional uncertainties are also shown

6.8 Alternative forms of parameterisation

An “alternative gluon parameterisation”, AG, was considered at NNLO and NLO. The value of \(A_g'\) in Eq. 27 was set to zero and a polynomial term for xg(x) as in Eq. 26 was substituted. This potentially resulted in a different 14-parameter fit. However, in practice a 13-parameter fit with a non-zero \(D_g\) was sufficient for the AG parameterisation, since there was no improvement in \(\chi ^2\) for a non-zero \(E_g\). Note that AG was the only parameterisation considered at LO.

The standard parameterisation fits the HERA data better; however, especially at NNLO, it produces a negative gluon distribution for very low x, i.e. \(x < 10^{-4}\). This is outside the kinematic region of the fit, but may cause problems if the PDFs are used at very low x within the conventional formalism. Therefore, a variant HERAPDF2.0AG using the alternative gluon parameterisation is provided for predictions of cross sections at very low x, such as very high-energy neutrino cross sections.

HERAPDF has a certain ansatz for the parameterisation of the PDFs, see Sect. 6.2. Different ways of using the polynomial form, such as parameterising xg, \(xu_v\), \(xd_v\), \(x\bar{d}+x\bar{u}\) and \(x\bar{d}-x\bar{u}\) or xg, xU, xD, \(x\bar{U}\) and \(x\bar{D}\) were investigated. The resulting PDFs agreed with the standard PDFs within uncertainties and no improvement of fit quality resulted. Therefore, these alternative parameterisations were not pursued further.

6.9 Alternative heavy-flavour schemes

The standard choice of heavy-flavour scheme for HERAPDF2.0 is the variable-flavour-number scheme RTOPT [84]. Investigations using other heavy-flavour schemes were also carried out.

Two other variable-flavour-number schemes, FONLL [91, 92] and ACOT [93], were considered, as implemented in HERAFitter at the time of the analysis. The FONLL scheme is implemented via an interface to the APFEL program [94] and was used at NLO and NNLO. The ACOT scheme is implemented using k-factors for the NLO corrections. The three heavy-flavour schemes differ in the order at which \(F_\mathrm{L}\) is evaluated. At NLO, the massless contribution to \(F_\mathrm{L}\) is evaluated to \(\mathcal{O} (\alpha _s^2)\) for RTOPT and to \(\mathcal{O}(\alpha _s)\) for FONLL-B and ACOT. At NNLO, the massless contribution to \(F_\mathrm{L}\) is evaluated to \(\mathcal{O} (\alpha _s^3)\) for RTOPT and to \(\mathcal{O}(\alpha _s^2)\) for FONLL-C. Fixed-flavour-number schemes were also investigated. In such schemes, the number of (massless) light flavours in the PDFs remains fixed across “flavour thresholds” and (massive) heavy flavours only occur in the matrix elements.

For some calculations, e.g. charm production at HERA, the availability of fixed-flavour variants of the PDFs is useful or even mandatory. Many PDF groups provide either fixed-flavour fits only, or variable-flavour fits only, with a fixed-flavour variant calculated from the variable-flavour parton distributions at the starting scale using theory. For HERAPDF2.0, fixed-flavour variants are provided which were actually fitted to the data.

Two schemes with three active flavours in the PDFs, FF3A and FF3B, were considered:

  • scheme FF3A:

    • Three-flavour running of \(\alpha _s\);

    • \(F_\mathrm{L}\) calculated to \(\mathcal{O}(\alpha _s^2)\);

    • pole masses for charm, \(m_c^\mathrm{pole}\), and beauty, \(m_b^\mathrm{pole}\);

  • scheme FF3B:

    • Variable-flavour running of \(\alpha _s\) [95]. This is sometimes called the “mixed scheme” [81];

    • massless (light flavour) part of the \(F_\mathrm{L}\) contribution calculated to \(\mathcal{O}(\alpha _s)\);

    • \(\overline{\mathrm{MS}}\) [80] running masses for charm, \(m_c(m_c)\), and beauty \(m_b(m_b)\).

The input parameters to the fits are given in Table 3.
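The differences between the two fixed-flavour schemes listed above can be captured in a small data structure. This is a sketch for bookkeeping purposes only; the class and field names are hypothetical and do not correspond to HERAFitter steering options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedFlavourScheme:
    name: str
    alpha_s_running: str   # "three-flavour" or "variable-flavour"
    fl_order: int          # power of alpha_s to which F_L is calculated
    mass_definition: str   # "pole" or "MSbar running"


FF3A = FixedFlavourScheme("FF3A", "three-flavour", 2, "pole")
FF3B = FixedFlavourScheme("FF3B", "variable-flavour", 1, "MSbar running")

for scheme in (FF3A, FF3B):
    print(scheme)
```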

The fits providing the variants HERAPDF2.0FF3A and HERAPDF2.0FF3B were obtained using the OPENQCDRAD [96] package as implemented in HERAFitter, partially interfaced to QCDNUM. This was proven to be consistent with the standalone version of OPENQCDRAD and, in the case of the A variant, with the FFNS definition used by the ABM [40–42] fitting group. The HERAFitter implementation allows an external steering of the order of \(\alpha _s\) in \(F_\mathrm{L}\), as listed in Table 3.

6.10 Adding data on charm production to the HERAPDF2.0 fit

The data on charm production described in Sect. 3.4 were used to find the optimal value of \(M_c\) for the HERAPDF2.0 fits as described in Sect. 6.1.

The impact of adding charm data to inclusive data as input to NLO QCD fits has been extensively discussed in a previous publication [46]. This previous analysis was based on the HERA I combined inclusive data and combined charm data. It was established that the main impact of the charm data on the PDF fits is a reduction of the uncertainty on \(M_c\). It was also established that the optimal value of \(M_c\) can differ according to the particular general-mass variable-flavour-number scheme chosen for the fit. The fits for all schemes considered were of similar quality.

For the HERAPDF2.0 analysis, a total of 47 data points on charm production [46] with \(Q^2\) larger than \(Q^2_\mathrm{min} = 3.5\,\)GeV\(^2\) were added as input to the NLO fits. The 42 sources of correlated systematic uncertainty from the H1 and ZEUS data sets on charm production and two additional sources due to the combination procedure were taken into account. The correlations between the normalisation of the inclusive data and the normalisation of the charm data were not taken into account in the PDF fits but it was verified that this has a negligible effect.

Table 5 Central values of the HERAPDF2.0 parameters at NLO
Fig. 23

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0 NNLO at \(\mu _\mathrm{f}^{2} = 10\,\)GeV\(^{2}\). The gluon and sea distributions are scaled down by a factor of 20. The experimental, model and parameterisation uncertainties are shown. The dotted lines represent HERAPDF2.0AG NNLO with the alternative gluon parameterisation, see Sect. 6.8

Table 6 Central values of the HERAPDF2.0 parameters at NNLO
Fig. 24

The flavour breakdown of the sea distribution of HERAPDF2.0 NNLO at \(\mu _\mathrm{f}^{2}\) = 10 GeV\(^{2}\). Shown are the distributions \(x\bar{u}\), \(x\bar{d}\), \(x\bar{c}\) and \(x\bar{s}\) together with their experimental, model and parameterisation uncertainties. The fractional uncertainties are also shown

The inclusion of the charm data had little influence on the result of the fit. This was not unexpected, since the main effect of the charm data, i.e. to constrain \(M_c\), was already used for the fit to the inclusive data. The charm data were proven to be consistent with the inclusive data, but only a marginal reduction in the uncertainty on the low-x gluon PDF was obtained. The situation is similar at NNLO. Therefore no HERAPDF2.0 variants with only the addition of data on charm production are released.

6.11 Adding data on jet production to the HERAPDF2.0 fit

In pQCD fits to inclusive DIS data only, the gluon PDF is determined via the DGLAP equations using the observed scaling violations. This results in a strong correlation between the shape of the gluon distribution and the value of \(\alpha _s(M_Z^2)\). In most PDF fits, the value of \(\alpha _s(M_Z^2)\) is not fitted but taken from external information [52]. The uncertainty on the gluon distribution is reduced for fits with fixed \(\alpha _s(M_Z^2)\) compared to fits with free \(\alpha _s(M_Z^2)\). Data on jet production cross sections provide an independent measurement of the gluon distribution. They are sensitive to \(\alpha _s(M_Z^2)\) and already at LO to the gluon distribution at lower \(Q^2\) and to the valence-quark distribution at higher \(Q^2\). Therefore the inclusion of jet data not only reduces the uncertainty on the high-x gluon distribution in fits with fixed \(\alpha _s(M_Z^2)\) but also allows the accurate simultaneous determination of \(\alpha _s(M_Z^2)\) and the gluon distribution.

Fig. 25

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0 NLO at \(\mu _\mathrm{f}^{2} = 10\,\)GeV\(^{2}\) compared to those of HERAPDF2.0 NNLO on logarithmic (top) and linear (bottom) scales. The bands represent the total uncertainties

Fig. 26

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0AG LO at \(\mu _\mathrm{f}^{2} = 10\,\)GeV\(^{2}\) compared to those of HERAPDF2.0AG NLO. The bands represent the experimental uncertainties only

The jet data were included in the fits at NLO by calculating predictions for the jet cross sections with NLOjet++ [97, 98], which was interfaced to FastNLO [99–101] in order to achieve the speed necessary for iterative PDF fits. The predictions were multiplied by corrections for hadronisation and \(Z^0\) exchange before they were used to fit the data [47–51]. A running electro-magnetic \(\alpha \) as implemented in the 2012 version of the programme EPRC [102] was used for the treatment of jet cross sections when they were included in the PDF fits. The factorisation scale was chosen as \(\mu _\mathrm{f}^2 = Q^2\), while the renormalisation scale was linked to the transverse momenta, \(p_T\), of the jets by \(\mu _\mathrm{r}^2 = (Q^2 + p_{T}^2)/2\). Jet data could not be included at NNLO for the analysis presented here, because the matrix elements were not available at the time of writing.
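The scale choices for the jet predictions translate directly into code. The function name below is hypothetical; the formulas are those quoted in the text.

```python
def jet_scales(q2, pt):
    """Return (mu_f^2, mu_r^2) in GeV^2 for a given Q^2 (GeV^2) and jet p_T (GeV)."""
    mu_f2 = q2                    # factorisation scale: mu_f^2 = Q^2
    mu_r2 = 0.5 * (q2 + pt**2)    # renormalisation scale: mu_r^2 = (Q^2 + p_T^2) / 2
    return mu_f2, mu_r2


print(jet_scales(150.0, 10.0))
```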

The normalisations of the ZEUS jet data [47, 48] and the H1 low-\(Q^2\) jet data [49] are correlated with the inclusive samples but because of the combination procedure these correlations cannot be recovered. Thus they are treated conservatively as uncorrelated. However, cross checks performed by using the uncombined H1 and ZEUS inclusive data have shown that this does not have a significant impact on the result. In the case of the H1 high-\(Q^2\) jet data [50, 51], the correlations due to the uncertainty on the integrated luminosity are accounted for by the normalisation of the jet cross sections to the inclusive cross sections. The statistical correlations present between the jet data and the inclusive data were neglected, with no significant impact on the result.

Fig. 27

The combined high-\(Q^2\) HERA inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions from HERAPDF2.0 NNLO. The bands represent the total uncertainties on the predictions

Fig. 28

The combined high-\(Q^2\) HERA inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions of the HERAPDF2.0 NLO. The bands represent the total uncertainties on the predictions

Fig. 29

The combined high-\(Q^2\) HERA inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions of the HERAPDF2.0AG LO. The bands represent the experimental uncertainties on the predictions

Fits including these jet data and including the combined charm data were performed with \(\alpha _s(M_Z^2)=0.118\) fixed and with \(\alpha _s(M_Z^2)\) as a free parameter in the fit. The resulting HERAPDF variant with free \(\alpha _s(M_Z^2)\) is called HERAPDF2.0Jets. A full uncertainty analysis was performed for the HERAPDF2.0Jets variant, including model and parameterisation uncertainties and additional hadronisation uncertainties on the jet data as evaluated for the original publications [47–51].

6.12 The \(\chi ^2\) values of the HERAPDF2.0 fits and alternative \(Q^2_\mathrm{min}\)

The \(\chi ^2/\mathrm{d.o.f.}\) of the fits for HERAPDF2.0 and its variants are listed in Table 4. These values are somewhat large, typically around 1.2. The dependence of \(\chi ^2\) on \(Q^2_\mathrm{min}\) was investigated in detail. Figure 19 shows the \(\chi ^2/\mathrm{d.o.f.}\) values for the LO, NLO and NNLO fits versus \(Q^2_\mathrm{min}\). The \(\chi ^2/\mathrm{d.o.f.}\) values drop steadily until \(Q^2_\mathrm{min} \approx 10\,\)GeV\(^2\). Also shown are \(\chi ^2\) values obtained for an NLO fit to HERA I data only. These values are substantially closer to one, but they show the same trend as seen for HERAPDF2.0.

The \(\chi ^2/\mathrm{d.o.f.}\) values rise again for \(Q^2_\mathrm{min} > 15\,\)GeV\(^2\). If only data with \(Q^2\) between \(Q^2=15\,\)GeV\(^2\) and \(Q^2=150\,\)GeV\(^2\) were used, the \(\chi ^2/\mathrm{d.o.f.}\) became close to unity. The addition of either data with lower or higher \(Q^2\) increased the \(\chi ^2/\mathrm{d.o.f.}\). The lower- and middle-\(Q^2\) data clearly show tension. The higher-\(Q^2\) data generally cannot be fitted very well. It was not possible to attribute this to any particular region in \(x_\mathrm{Bj}\) or a particular NC or CC process. For the standard value \(Q^2_\mathrm{min} = 3.5\,\)GeV\(^2\), the data between \(Q^2=3.5\,\)GeV\(^2\) and \(Q^2=15\,\)GeV\(^2\) create about one third of the excess \(\chi ^2/\mathrm{d.o.f.}\) while two thirds originate from the data with \(Q^2>150\,\)GeV\(^2\).

Fig. 30

The combined high-\(Q^2\) HERA inclusive NC \(e^+p\) reduced cross sections as partially shown already in Fig. 5 with overlaid predictions of HERAPDF2.0 NLO and NNLO. The two differently shaded bands represent the total uncertainties on the two predictions

Fig. 31

The combined HERA inclusive NC \(e^-p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions from HERAPDF2.0 NNLO. The bands represent the total uncertainties on the predictions

Fig. 32

The combined HERA inclusive NC \(e^-p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions from HERAPDF2.0 NLO. The bands represent the total uncertainties on the predictions

Fig. 33

The combined HERA inclusive NC \(e^-p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions from HERAPDF2.0AG LO. The bands represent the experimental uncertainties on the predictions

Fig. 34

The combined low-\(Q^2\) HERA inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions from HERAPDF2.0 NNLO. The bands represent the total uncertainties on the predictions. Dotted lines indicate extrapolation into kinematic regions not included in the fit

Fig. 35

The combined low-\(Q^2\) HERA inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} =318\) GeV with overlaid predictions from HERAPDF2.0 NLO. The bands represent the total uncertainties on the predictions. Dotted lines indicate extrapolation into kinematic regions not included in the fit

Fig. 36

The combined low-\(Q^2\) HERA inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions from HERAPDF2.0AG LO. The bands represent the experimental uncertainties on the predictions. Dotted lines indicate extrapolation into kinematic regions not included in the fit

Fig. 37

The combined HERA inclusive CC \(e^+p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions from HERAPDF2.0 NNLO. The bands represent the total uncertainties on the predictions

Fig. 38

The combined HERA inclusive CC \(e^+p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions from HERAPDF2.0 NLO. The bands represent the total uncertainties on the predictions

Fig. 39

The combined HERA inclusive CC \(e^-p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions from HERAPDF2.0 NNLO. The bands represent the total uncertainties on the predictions

Fig. 40

The combined HERA inclusive CC \(e^-p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions of the HERAPDF2.0 NLO. The bands represent the total uncertainties on the predictions

Fig. 41

The combined low-\(Q^2\) HERA inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 300\) GeV with overlaid predictions of HERAPDF2.0 NLO. The bands represent the total uncertainties on the predictions. Dotted lines indicate extrapolation into kinematic regions not included in the fit

Fig. 42

The combined high-\(Q^2\) HERA inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 300\) GeV with overlaid predictions of HERAPDF2.0 NLO. The bands represent the total uncertainties on the predictions

Fig. 43

The combined low-\(Q^2\) HERA inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 251\) GeV with overlaid predictions from HERAPDF2.0 NLO. The bands represent the total uncertainties on the predictions. Dotted lines indicate extrapolation into kinematic regions not included in the fit

Fig. 44

The combined high-\(Q^2\) HERA inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 251\) GeV with overlaid predictions from HERAPDF2.0 NLO. The bands represent the total uncertainties on the predictions

Fig. 45

The combined low-\(Q^2\) HERA inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 225\) GeV with overlaid predictions from HERAPDF2.0 NLO. The bands represent the total uncertainties on the predictions. Dotted lines indicate extrapolation into kinematic regions not included in the fit

Fig. 46

The combined high-\(Q^2\) HERA inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 225\) GeV with overlaid predictions from HERAPDF2.0 NLO. The bands represent the total uncertainties on the predictions

The influence of the choice of heavy-flavour scheme, and the order at which the massless contribution to \(F_\mathrm{L}\) is evaluated, on the \(\chi ^2/\mathrm{d.o.f.}\) behaviour was also investigated. Scans at NLO and NNLO of the \(\chi ^2/\mathrm{d.o.f.}\) versus \(Q^2_\mathrm{min}\) for fits done with the heavy-flavour schemes described in Sect. 6.9 are illustrated in Fig. 20. The decrease of the \(\chi ^2/\mathrm{d.o.f.}\) with increasing \(Q^2_\mathrm{min}\) is observed for every scheme. At NLO and low \(Q^2_\mathrm{min}\), all fits using schemes for which the \(F_\mathrm{L}\) contributions are calculated using matrix elements of the order of \(\alpha _s\) result in slightly lower \(\chi ^2/\mathrm{d.o.f.}\) than fits for schemes using matrix elements of the order of \(\alpha _s^2\). The increase of \(\chi ^2/\mathrm{d.o.f.}\) for lower \(Q^2_\mathrm{min}\) is also less pronounced for fits using the “\(\mathcal{O}(\alpha _s)\)-schemes”. However, at NNLO, the trend reverses and RTOPT, which uses matrix elements of order \(\alpha _s^3\) in the calculation of \(F_\mathrm{L}\), results in lower \(\chi ^2/\mathrm{d.o.f.}\) than the FONLL scheme, for which matrix elements of order \(\alpha _s^2\) are used. The \(\chi ^2/\mathrm{d.o.f.}\) values for fits with the RTOPT scheme are quite similar at NLO and NNLO.

The two fixed-flavour-number schemes considered, see Sect. 6.9, also differ in using light-flavour matrix elements of order \(\alpha _s\) (FF3B) and \(\alpha _s^2\) (FF3A). The FF3A fit variant results in \(\chi ^2/\mathrm{d.o.f.}\) values very similar to the values from the standard fit using RTOPT while the values for the FF3B variant closely follow the results for fits using the FONLL scheme. This suggests that the determining factor for the \(\chi ^2\) of the fits is the order of \(\alpha _s\) of the matrix elements used to calculate the massless \(F_\mathrm{L}\) contribution. Other differences between FF3A and FF3B as well as differences [103] between different variable-flavour-number schemes, and differences between fixed-flavour-number and variable-flavour-number schemes, seem to have less influence on \(\chi ^2\).

At HERA, the low-\(Q^2\) data are also dominantly at low \(x_\mathrm{Bj}\). Some of the poor \(\chi ^2\) values in this kinematic region could be due to low-\(x_\mathrm{Bj}\) physics not accounted for in the current framework [1, 104]. This could mean that the inclusion of low-\(x_\mathrm{Bj}\), low-\(Q^2\) data into the fits introduces bias. To study this, NLO and NNLO fits with \(Q^2_\mathrm{min} = 10\,\)GeV\(^2\) were also fully evaluated. This variant is called HERAPDF2.0HiQ2. As part of the evaluation, the settings were reexamined. No significant changes for the optimal parameterisation or for the optimal value of \(M_c\) or \(M_b\) were observed. Model and parameterisation variations were also performed in order to better assess possible bias. For the NLO fits, the \(\chi ^2/\mathrm{d.o.f.}\) of 1156 / 1002 for the \(Q^2_\mathrm{min} = 10\,\)GeV\(^2\) fit can be compared to the 1357 / 1131 for the \(Q^2_\mathrm{min} = 3.5\,\)GeV\(^2\) fit. This is a significant improvement, but still larger than observed for HERAPDF1.0. The values are similar at NNLO, see Table 4. In particular, the NNLO fit does not fit the lower-\(Q^2\) data better than the NLO fit, see Fig. 19, just as, at NLO, the higher-order evaluation of \(F_\mathrm{L}\) does not fit these data better, see Fig. 20.
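The two NLO values quoted above can be compared directly. The short sketch below only performs the arithmetic on the numbers given in the text; the dictionary labels are illustrative.

```python
# (chi2, degrees of freedom) for the NLO fits, as quoted in the text.
NLO_FITS = {
    "Q2min = 3.5 GeV^2": (1357, 1131),
    "Q2min = 10 GeV^2": (1156, 1002),
}

ratios = {label: chi2 / dof for label, (chi2, dof) in NLO_FITS.items()}
for label, (chi2, dof) in NLO_FITS.items():
    # "excess" is chi2 minus the number of degrees of freedom.
    print(f"{label}: chi2/d.o.f. = {ratios[label]:.3f}, excess = {chi2 - dof}")
```

The ratio drops from about 1.20 to about 1.15 when the cut is raised, consistent with the statement that the improvement is significant but leaves the value above unity.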

Fits were also performed with the alternative gluon parameterisation and \(Q^2_\mathrm{min} = 10\,\)GeV\(^2\). The \(\chi ^2/\mathrm{d.o.f.}\) was always worse than for the standard parameterisation, see Table 4.

The \(\chi ^2/\mathrm{d.o.f.}\) values obtained for HERAPDF2.0Jets, both for fixed and for free \(\alpha _s(M_Z^2)\), are better than the value for the standard HERAPDF2.0 NLO fit, see Table 4. The partial \(\chi ^2\) for the jet data is 161 for 162 data points, while it is 41 for 47 data points for the charm data. The partial \(\chi ^2\) for the inclusive data remains practically the same as for HERAPDF2.0 NLO. This demonstrates the compatibility of the data on charm and jet production with the inclusive data.

7 HERAPDF2.0

The analysis described in Sect. 6 resulted in a set of PDFs called HERAPDF2.0. The HERAPDF2.0 analysis has the following notable features:

  • the data include four different processes, NC and CC for \(e^+p\) and \(e^-p\) scattering, such that there is sufficient information to extract the \(xd_v\), \(xu_v\), \(x\bar{U}\) and \(x\bar{D}\) PDFs, and the gluon PDF from the scaling violations;

  • the NC \(e^+p\) data include data at centre-of-mass energies sufficiently different to access different values of y at the same \(x_\mathrm{Bj}\) and \(Q^2\); this makes the data sensitive to \(F_\mathrm{L}\) and thus gives further information on the low-x gluon distribution;

  • it is based on a consistent data set with small correlated systematic uncertainties;

  • the experimental uncertainties are Hessian uncertainties;

  • the uncertainties introduced both by model assumptions and by assumptions about the form of the parameterisation are provided;

  • no heavy-target corrections were needed as all data are on ep scattering; the assumption of \(u_\mathrm{neutron}=d_\mathrm{proton}\) was not needed.

An overview of HERAPDF2.0 and its variants as released is given in Appendix B.

7.1 HERAPDF2.0 NLO, NNLO and 2.0AG

A summary of HERAPDF2.0 NLO is shown in Fig. 21 at the scale \(\mu ^2_\mathrm{f}=10\) GeV\(^2\). The experimental, model and parameterisation uncertainties, see Sects. 6.4 and 6.5, are shown separately. The model and parameterisation uncertainties are asymmetric. The uncertainties arising from the variation of \(\mu ^2_\mathrm{f_{0}}=1.9\) GeV\(^2\) and \(Q^2_\mathrm{min}=3.5\) GeV\(^2\) affect predominantly the low-x region of the sea and gluon distributions. The parameterisation uncertainty from adding the \(D_{u_{v}}\) parameter is important for the valence distributions for all x.

The gluon distribution of HERAPDF2.0 NLO does not become negative within the fitted kinematic region. The distributions of HERAPDF2.0AG with the alternative gluon parameterisation as described in Sect. 6.2 and discussed in Sect. 6.8 are shown superimposed on the standard PDFs.

The flavour breakdown of the sea into \(x\bar{u}\), \(x\bar{d}\), \(x\bar{c}\) and \(x\bar{s}\) for HERAPDF2.0 NLO at the scale \(\mu _\mathrm{f}^2=10\,\)GeV\(^2\) is shown in Fig. 22. The fractional uncertainties are also shown. The model uncertainties from the variation of \(f_s\) and \(M_c\) affect the \(x\bar{s}\) and \(x\bar{c}\) distributions. The \(x\bar{c}\) uncertainties also derive from the uncertainty on the gluon distribution, since charm is generated from \(g \rightarrow c \bar{c}\) splitting. The variation of \(M_c\) also affects the \(x\bar{u}\) distribution since the suppression (enhancement) of \(x\bar{c}\) results in an enhancement (suppression) of \(x\bar{u}\) in the u-type sea. Similarly the strangeness variations also affect \(x\bar{d}\), since the suppressed strangeness must be compensated by enhanced \(x\bar{d}\) in the d-type sea. However, since \(x\bar{d}\) is fixed to \(x\bar{u}\) at low x, this mostly affects the high-x uncertainty on \(x\bar{d}\). The central fit gives \(x\bar{d} - x\bar{u}\) negative at \(x \approx 0.1\). However, the uncertainty is very large because HERA data are not very sensitive to this difference. The uncertainty on \(x\bar{u}\) has a significant contribution from the parameterisation uncertainties. The values of the parameters of HERAPDF2.0 NLO are given in Table 5.

Fig. 47

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0 NLO at \(\mu _\mathrm{f}^{2}=10\,\)GeV\(^{2}\) compared to those of HERAPDF1.0 on logarithmic (top) and linear (bottom) scales. The bands represent the total uncertainties

Fig. 48

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0 NLO at \(\mu _\mathrm{f}^{2} = 10\,\)GeV\(^{2}\) compared to those of HERAPDF1.5 on logarithmic (top) and linear (bottom) scales. The bands represent the total uncertainties

Fig. 49

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0 NNLO at \(\mu _\mathrm{f}^{2} = 10\,\)GeV\(^{2}\) compared to the ones of HERAPDF1.5 on logarithmic (top) and linear (bottom) scales. The bands represent the total uncertainties

A summary of HERAPDF2.0 NNLO is shown in Fig. 23 at the scale \(\mu ^2_\mathrm{f}=10\) GeV\(^2\). At NNLO, the gluon distribution of HERAPDF2.0 ceases to rise at low x. Consequently, xg from HERAPDF2.0AG NNLO deviates significantly. As at NLO, the uncertainties arising from the variation of \(\mu ^2_\mathrm{f_{0}}\) and \(Q^2_\mathrm{min}\) affect predominantly the low-x region of the sea and gluon distributions. The parameterisation uncertainty from adding the \(D_{u_v}\) parameter is not important for the NNLO fit, since there was no significant improvement in \(\chi ^2\) from the addition of the 15th parameter. The parameters of the NNLO fit are listed in Table 6.

The flavour breakdown of the sea into \(x\bar{u}\), \(x\bar{d}\), \(x\bar{c}\) and \(x\bar{s}\) for HERAPDF2.0 NNLO is shown in Fig. 24. The uncertainties are also shown as fractional uncertainties. They are dominated by model uncertainties and derive from the same sources as already described at NLO. The parameterisation uncertainties are less important at NNLO than at NLO.

A comparison between HERAPDF2.0 NNLO and NLO is shown in Fig. 25 with total uncertainties, using both linear and logarithmic x scales. The main difference is the different shapes of the gluon distributions as expected from the differing evolution at NLO and NNLO.

At leading order, HERAPDF2.0 is only available as HERAPDF2.0AG LO with the alternative gluon parameterisation. It must therefore be compared to HERAPDF2.0AG NLO. HERAPDF2.0AG LO was determined with experimental uncertainties only, because its main purpose is to be used in LO Monte Carlo programmes. A comparison between the distributions of HERAPDF2.0AG LO and HERAPDF2.0AG NLO is shown in Fig. 26. The gluon distribution at LO rises much faster than at NLO, as expected from the different evolution. The \(xu_v\) distribution is softer at LO than at NLO.

7.1.1 Comparisons to inclusive HERA data

The data with the proton beam energy of 920 GeV (\(\sqrt{s}=318\,\)GeV) are the most precise data due to the large integrated luminosity, see Table 1. HERAPDF2.0 predictions are compared at NNLO, NLO and LO to these high-precision data.
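The quoted centre-of-mass energies follow from the beam energies given in the Introduction. The sketch below assumes head-on collisions and neglects the particle masses, so that \(s = 4E_eE_p\); the function name is illustrative.

```python
import math


def sqrt_s(e_electron_gev: float, e_proton_gev: float) -> float:
    """Centre-of-mass energy in GeV for head-on collisions, masses neglected."""
    return math.sqrt(4.0 * e_electron_gev * e_proton_gev)


E_ELECTRON_GEV = 27.5  # electron beam energy quoted in the Introduction

for e_proton in (920.0, 820.0, 575.0, 460.0):
    energy = sqrt_s(E_ELECTRON_GEV, e_proton)
    print(f"E_p = {e_proton:5.0f} GeV -> sqrt(s) = {energy:6.1f} GeV")
```

Rounded to the nearest GeV, the four proton beam energies give 318, 300, 251 and 225 GeV.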

The predictions of HERAPDF2.0 NNLO, NLO and AG LO are compared to the high-\(Q^2\) NC \(e^+p\) data in Figs. 27, 28 and 29. The data are well described by the predictions at all orders. Figure 30 shows the cross sections already shown in Fig. 5 together with the predictions of HERAPDF2.0 NNLO and NLO. The predictions at NNLO and NLO are very similar.

The predictions of HERAPDF2.0 NNLO, NLO and AG LO are compared to the NC \(e^-p\) data in Figs. 31, 32 and 33. The description of the \(e^-p\) data is as good as for the \(e^+p\) data.

Fig. 50

The parton distribution functions \(xu_v\), \(xd_v\), xg and \(xS=2x(\bar{U}+\bar{D})\) of HERAPDF2.0 NLO at \(\mu _\mathrm{f}^{2} = 10\,\)GeV\(^{2}\) compared to those of MMHT2014 [37], CT10 [39] and NNPDF3.0 [44]. The top panel shows the distribution with uncertainties only for HERAPDF2.0. The bottom panel shows the PDFs normalised to HERAPDF2.0 and with uncertainties for all PDFs

Fig. 51

The parton distribution functions \(xu_v\), \(xd_v\), xg and \(xS=2x(\bar{U}+\bar{D})\) of HERAPDF2.0 NNLO at \(\mu _\mathrm{f}^{2} = 10\,\)GeV\(^{2}\) compared to those of MMHT2014 [37], CT10 [105] and NNPDF3.0 [44]. The top panel shows the distribution with uncertainties only for HERAPDF2.0. The bottom panel shows the PDFs normalised to HERAPDF2.0 and with uncertainties for all PDFs

Fig. 52

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0 NLO at \(\mu _\mathrm{f}^{2} = 10\,\)GeV\(^{2}\) with \(Q^{2}_\mathrm{min} = 3.5\) GeV\(^{2}\) (top) and of HERAPDF2.0HiQ2 NLO with \(Q^2_\mathrm{min} = 10\) GeV\(^2\) (bottom). The gluon and sea distributions are scaled down by a factor of 20. The experimental, model and parameterisation uncertainties are shown. The dotted lines represent HERAPDF2.0AG NLO and HERAPDF2.0AG HiQ2 NLO

Fig. 53

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0 NNLO at \(\mu _\mathrm{f}^{2} = 10\,\)GeV\(^{2}\) with \(Q^{2}_\mathrm{min} = 3.5\) GeV\(^{2}\) (top) and of HERAPDF2.0HiQ2 NNLO with \(Q^2_\mathrm{min} = 10\) GeV\(^2\) (bottom). The gluon and sea distributions are scaled down by a factor of 20. The experimental, model and parameterisation uncertainties are shown. The dotted lines represent HERAPDF2.0AG NNLO and HERAPDF2.0AG HiQ2 NNLO

Fig. 54

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0 NLO at \(\mu _\mathrm{f}^{2} = 10\,\)GeV\(^{2}\) compared to those of HERAPDF2.0HiQ2 NLO on logarithmic (top) and linear (bottom) scales. The bands represent the total uncertainties

Fig. 55

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0 NLO at \(\mu _\mathrm{f}^{2} = 10{,}000\,\)GeV\(^{2}\) compared to those of HERAPDF2.0HiQ2 NLO on logarithmic (top) and linear (bottom) scales. The bands represent the total uncertainties

Fig. 56

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0 NNLO at \(\mu _\mathrm{f}^{2} = 10\,\)GeV\(^{2}\) compared to those of HERAPDF2.0HiQ2 NNLO on logarithmic (top) and linear (bottom) scales. The bands represent the total uncertainties

Fig. 57

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0 NNLO at \(\mu _\mathrm{f}^{2} = 10{,}000\,\)GeV\(^{2}\) compared to those of HERAPDF2.0HiQ2 NNLO on logarithmic (top) and linear (bottom) scales. The bands represent the total uncertainties

For \(e^+p\) scattering, data at low \(Q^2\) are available. Figures 34, 35, and 36 show comparisons between the predictions of HERAPDF2.0 NNLO, NLO and AG LO and these low-\(Q^2\) data. The description of the data is generally good and for the predictions at NNLO and NLO, it remains so even for \(Q^2\) below the fitted kinematic region. However, at low \(x_\mathrm{Bj}\) and low \(Q^2\), the turnover in the cross sections related to \(F_\mathrm{L}\) is not well described, and HERAPDF2.0 NNLO does not describe these data better than HERAPDF2.0 NLO. The HERAPDF2.0AG LO predictions show a clear turnover, but the prediction is significantly too high at all \(x_\mathrm{Bj}\) for the lowest \(Q^2\).

The predictions of the NNLO and NLO fits are compared to the CC \(e^+p\) data with \(\sqrt{s} = 318\,\)GeV in Figs. 37 and 38 and to CC \(e^-p\) data in Figs. 39 and 40. The precise predictions describe the CC cross sections well. The CC data are in general less precise than the NC data.

The predictions of HERAPDF2.0 NLO compared to low-\(Q^2\) and high-\(Q^2\) NC \(e^+p\) data for \(\sqrt{s} = 300\,\)GeV are shown in Figs. 41 and 42. Equivalent comparisons for \(\sqrt{s} = 251\,\)GeV and \(\sqrt{s} = 225\,\)GeV are shown in Figs. 43 and 44, and Figs. 45 and 46, respectively. The data with reduced proton beam energy are also reasonably well described.

7.1.2 Comparisons to HERAPDF1.0 and 1.5

Comparisons of HERAPDF2.0 NLO to HERAPDF1.0 NLO and HERAPDF1.5 NLO are shown in Figs. 47 and 48, respectively. Whereas HERAPDF1.5 already had a somewhat smaller uncertainty than HERAPDF1.0, the use of all HERA II data for HERAPDF2.0 has led to a much larger reduction of the uncertainties on all PDFs. The shapes of the PDFs have also changed somewhat. The valence distributions have become a little harder. This was caused by the additional data with high \(x_\mathrm{Bj}\) which were not yet available for HERAPDF1.5. The HERAPDF2.0 high-x gluon distribution is also slightly harder than that of HERAPDF1.5, while the sea distribution of HERAPDF2.0 at high x is considerably softer.

Fig. 58

The combined low-\(Q^2\) HERA data on inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions from HERAPDF2.0HiQ2 NNLO. The bands represent the total uncertainty on the predictions. Dotted lines indicate extrapolation into kinematic regions not included in the fit

Fig. 59

The combined low-\(Q^2\) HERA data on inclusive NC \(e^+p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions from HERAPDF2.0HiQ2 NLO. The bands represent the total uncertainty on the predictions. Dotted lines indicate extrapolation into kinematic regions not included in the fit

A comparison between HERAPDF2.0 NNLO and HERAPDF1.5 NNLO is provided in Fig. 49. As in the case of the NLO PDFs, a reduction of the uncertainty at high x has been achieved by including further high-\(x_\mathrm{Bj}\) data. There is also a reduction of uncertainties at low x. This is mostly due to the better stability of the fit under the variation of \(Q^2_\mathrm{min}\), which is part of the model uncertainties. The shapes of the HERAPDF1.5 and HERAPDF2.0 PDFs at NNLO are rather similar, but the gluon distribution at high x has moved to the lower end of its previous uncertainty band.

7.1.3 Comparisons to other sets of PDFs

The PDFs of HERAPDF2.0 NLO and NNLO can be directly compared to the PDFs of MMHT 2014 [37], for which the same heavy-flavour scheme, i.e. RTOPT, was used. Comparisons are also made to the PDFs of CT10 [39, 105], for which a heavy-flavour scheme based on ACOT was used, and NNPDF3.0 [44], for which the FONLL scheme was used. The results are shown in Figs. 50 and 51 for NLO and NNLO, respectively. For the PDFs themselves, the uncertainties are only shown for HERAPDF2.0. The uncertainties of all PDFs are shown in the ratios of the other PDFs to HERAPDF2.0. Taking the full uncertainties into account, all PDFs are compatible. The largest relative discrepancy (\(\approx \)2.5\(\sigma \)) is found in the shape of the \(xu_v\) distribution at \(x\approx 0.4\) for both NLO and NNLO PDFs. In addition, at NLO, the gluon distribution of HERAPDF2.0 at high x is softer than that of the other PDFs, whereas at NNLO it is close to their \(68\,\%\) uncertainty bands.

7.2 HERAPDF2.0HiQ2

Figures 52 and 53 show summaries for HERAPDF2.0 NLO and NNLO as already shown in Figs. 21 and 23 together with the equivalent plots for HERAPDF2.0HiQ2. The only difference is that HERAPDF2.0 has \(Q^2_\mathrm{min} = 3.5\,\)GeV\(^2\) while HERAPDF2.0HiQ2 has \(Q^2_\mathrm{min} = 10\,\)GeV\(^2\). At NLO, the gluon distributions of HERAPDF2.0 and HERAPDF2.0HiQ2 are compatible within uncertainties. At NNLO, the two gluon distributions differ significantly. Using the higher \(Q^2_\mathrm{min}\) at NNLO causes the gluon distribution to turn over significantly at low x. The distributions of HERAPDF2.0AG are also shown in Figs. 52 and 53. They are not very different for the two \(Q^2_\mathrm{min}\) values. At NNLO, this causes the gluon distribution of HERAPDF2.0AG to be completely different from that of the standard parameterisation for \(x < 10^{-3}\).

7.2.1 Comparison of HERAPDF2.0HiQ2 to HERAPDF2.0

A comparison of the NLO PDFs of HERAPDF2.0 to those of HERAPDF2.0HiQ2 at the scale \(\mu _\mathrm{f}^2=10\,\)GeV\(^2\) is shown in Fig. 54. The different shapes of the gluon distribution at low x are compatible within uncertainties. In Sect. 6.12, the question arose whether including data from the kinematic region of low \(x_\mathrm{Bj}\) and low \(Q^2\), i.e. below 10 GeV\(^2\), in the PDF fits would introduce a bias on predictions for high \(x_\mathrm{Bj}\) and high \(Q^2\). Figure 55 demonstrates that at the high scale of \(\mu _\mathrm{f}^2=10{,}000\,\)GeV\(^2\), the PDFs resulting from the two fits are very similar. This confirms that \(Q^2_\mathrm{min} = 3.5\,\)GeV\(^2\) is a safe value for pQCD fits to HERA data and that no bias is introduced for applications at higher scales, such as cross-section predictions for the LHC.

A comparison of the NNLO PDFs of HERAPDF2.0 to those of HERAPDF2.0HiQ2 at the scale \(\mu _\mathrm{f}^2=10\,\)GeV\(^2\) is shown in Fig. 56. The differences in the gluon distributions are pronounced. The gluon distribution of HERAPDF2.0HiQ2 NNLO turns over for \(x < 10^{-3}\). The valence distributions at NNLO also differ between HERAPDF2.0HiQ2 and HERAPDF2.0, but they are compatible within uncertainties. At the high scale of \(\mu _\mathrm{f}^2=10{,}000\,\)GeV\(^2\), the PDFs resulting from the two fits are, as at NLO, very similar, see Fig. 57. This demonstrates that again no bias is introduced at higher scales when low-\(x_\mathrm{Bj}\) and low-\(Q^2\) data are included in the fit at NNLO.

7.2.2 Comparison of HERAPDF2.0HiQ2 to data

Figures 58 and 59 show the predictions of HERAPDF2.0HiQ2 NNLO and NLO compared to the data, which were already presented and compared to HERAPDF2.0 NNLO and NLO in Figs. 34 and 35. In the region \(3.5 \le Q^2 \le 10\) GeV\(^2\), the standard HERAPDF2.0 NLO fit compromises between describing the low-\(x_\mathrm{Bj}\) (high-y) turnover, for which it is too high, and the data at slightly higher \(x_\mathrm{Bj}\), for which it is too low. In the corresponding HERAPDF2.0HiQ2 fit, these data are not fitted. The resulting fit, when extrapolated to the excluded region, is systematically lower than the data at lower \(x_\mathrm{Bj}\) and lower \(Q^2\), but then is significantly above the data at very low \(x_\mathrm{Bj}\), where the contribution from \(F_\mathrm{L}\) becomes important. This implies that the pQCD fit evolves more strongly towards lower \(x_\mathrm{Bj}\) and \(Q^2\) than do the data. The situation is not improved at NNLO where the fit evolves even more strongly. This suggests that the conventional DGLAP resummation may not be fully adequate. This observation was also made during investigations of the HERA I data [104].

Fig. 60

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0FF3A NLO and HERAPDF2.0FF3B NLO, at \(\mu _\mathrm{f}^{2}\) = 10 GeV\(^{2}\). The experimental, model and parameterisation uncertainties are shown

7.3 HERAPDF2.0FF

Summaries of HERAPDF2.0FF3A and HERAPDF2.0FF3B as introduced in Sect. 6.9 are shown in Fig. 60. The experimental, model and parameterisation uncertainties were evaluated as for the standard HERAPDF2.0 NLO, see Sects. 6.4 and 6.5, and are shown separately.

A comparison of the PDFs of HERAPDF2.0FF3A and HERAPDF2.0FF3B to the standard HERAPDF2.0 NLO using the RTOPT heavy-flavour scheme is shown in Fig. 61. This comparison is presented at the starting scale \(\mu _\mathrm{f_0}\), because a meaningful comparison can only be made at scales below the charm mass. There are differences in the valence and in the gluon distributions. The latter originate mainly from the different \(\mathcal{O}(\alpha _s)\) at which the massless contribution to \(F_\mathrm{L}\) is calculated and from the \(\alpha _s\) evolution scheme. A comparison of the predictions from HERAPDF2.0FF3B and HERAPDF2.0 NLO to selected data as already used for Fig. 30 is shown in Fig. 62. The predictions are very similar. However, at low \(x_\mathrm{Bj}\) and low \(Q^2\), the \(Q^2\) dependence predicted from HERAPDF2.0FF3B is a bit less steep than the prediction from HERAPDF2.0 NLO. The predictions of HERAPDF2.0FF3A are also very similar. The \(Q^2\) dependence predicted from HERAPDF2.0FF3A is however slightly steeper than the prediction from HERAPDF2.0 NLO at low \(x_\mathrm{Bj}\) and low \(Q^2\).

A comparison of the PDFs of HERAPDF2.0FF3A to the PDFs of ABM11 FF [42] is shown in Fig. 63. These two sets of PDFs can be directly compared as they use the same order for the description of \(F_\mathrm{L}\) and the same \(\alpha _s\) evolution. The largest difference is observed for the \(xd_v\) distribution. However, overall the two sets of PDFs are quite similar.

Fig. 61

The parton distribution functions \(xu_v\), \(xd_v\), xg and \(xS=2x(\bar{U}+\bar{D})\) of HERAPDF2.0FF3A and FF3B at the starting scale \(\mu _\mathrm{f_{0}}^{2} = 1.9\,\)GeV\(^{2}\) compared to those of HERAPDF2.0 NLO. The top panel shows the distributions. The bottom panel shows the PDFs normalised to HERAPDF2.0 NLO. The uncertainties are given as differently hatched bands in both panels

A comparison of the PDFs of HERAPDF2.0FF3B to the PDFs of NNPDF3.0 FF(3N) [44] is shown in Fig. 64. These two sets of PDFs can be directly compared at the starting scale due to their equivalent treatment of the \(F_\mathrm{L}\) contribution and of the \(\alpha _s\) evolution (see footnote 12). The gluon distributions are quite similar. Some differences are observed in the \(xu_v\) and \(xd_v\) valence distributions.

7.4 HERAPDF2.0Jets

Data on jet production were included in the analysis as described in Sect. 6.11. This inclusion was first used to validate the choice of \(\alpha _s(M_Z^2)=0.118\) for HERAPDF by investigating the dependence of the \(\chi ^2\) values of the HERAPDF pQCD fits on \(\alpha _s(M_Z^2)\). Three \(\chi ^2\) scans vs. the value of \(\alpha _s(M_Z^2)\) were performed at NLO for three values of \(Q^2_\mathrm{min}\). The result is depicted in the top panel of Fig. 65. A distinct minimum at \(\alpha _s(M_Z^2)\approx 0.118\) is observed, which is basically independent of \(Q^2_\mathrm{min}\). This validates the choice of \(\alpha _s(M_Z^2)= 0.118\) for HERAPDF2.0 NLO. Scans at NLO and NNLO were also performed for fits to inclusive data only. The middle and bottom panels of Fig. 65 show that these scans yielded similarly shallow \(\chi ^2\) dependences and that the minima were strongly dependent on \(Q^2_\mathrm{min}\). This demonstrates that the inclusive data alone cannot constrain \(\alpha _s(M_Z^2)\) reasonably.
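A \(\chi ^2\) scan of this kind is commonly summarised by a parabola around the minimum. The sketch below is one possible way to do this, not the procedure used in the analysis: it passes a parabola exactly through three scan points and reads off the minimum and the \(\Delta \chi ^2 = 1\) half-width. The scan points are generated from a hypothetical toy curve; none of the numbers are fit results.

```python
import math


def parabola_minimum(xs, ys):
    """Return (x0, y0, a) of the parabola y = a*(x - x0)**2 + y0 through three points."""
    (x1, x2, x3), (y1, y2, y3) = xs, ys
    # Newton divided differences give the quadratic and linear coefficients.
    d12 = (y2 - y1) / (x2 - x1)
    d23 = (y3 - y2) / (x3 - x2)
    a = (d23 - d12) / (x3 - x1)
    if a <= 0.0:
        raise ValueError("scan points do not bracket a minimum")
    b = d12 - a * (x1 + x2)
    c = y1 - a * x1 ** 2 - b * x1
    x0 = -b / (2.0 * a)
    return x0, c - b * b / (4.0 * a), a


def toy_chi2(alpha_s):
    """Hypothetical chi2 curve, used only to generate example scan points."""
    return 1500.0 + ((alpha_s - 0.1183) / 0.0009) ** 2


scan_alpha = (0.116, 0.118, 0.120)
scan_chi2 = tuple(toy_chi2(a) for a in scan_alpha)

alpha_min, chi2_min, curvature = parabola_minimum(scan_alpha, scan_chi2)
half_width = math.sqrt(1.0 / curvature)  # distance at which chi2 rises by 1
print(f"minimum at alpha_s = {alpha_min:.4f} +- {half_width:.4f}")
```

A shallow scan corresponds to a small curvature and hence a large half-width, which is the situation described for the fits to inclusive data only.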

7.4.1 PDFs and measurement of \({\alpha _s(M_Z^2)}\)

The PDFs resulting from a fit with free \(\alpha _s(M_Z^2)\), HERAPDF2.0Jets, and from a fit with fixed \(\alpha _s(M_Z^2)=0.118\) are shown in Fig. 66. A full uncertainty analysis was performed in both cases, including model and parameterisation uncertainties as well as additional hadronisation uncertainties on the jet data. The PDFs are very similar, because the HERAPDF2.0Jets fit with free \(\alpha _s(M_Z^2)\) yields a value which is very close to the value used for the fit with fixed \(\alpha _s(M_Z^2)\). The jet data determine the value of \(\alpha _s(M_Z^2)\) very well in the HERAPDF2.0Jets fit. Thus, the uncertainty on \(\alpha _s(M_Z^2)\) in this fit does not significantly increase the uncertainty on the gluon PDF with respect to the fit with \(\alpha _s(M_Z^2)\) fixed. The difference in the \(\alpha _s(M_Z^2)\) free fit is mostly due to extra uncertainty coming from the hadronisation corrections.

Fig. 62

Selected combined HERA inclusive NC \(e^+p\) reduced cross sections compared to predictions of HERAPDF2.0 NLO and HERAPDF2.0FF3B. The two differently shaded bands represent the total uncertainties on the two predictions

The PDFs from the HERAPDF2.0Jets fit with \(\alpha _s(M_Z^2)=0.118\) fixed are also very similar to the standard PDFs from HERAPDF2.0 NLO. This is demonstrated in Fig. 67. This is again the result of the choice of \(\alpha _s(M_Z^2)=0.118\) for HERAPDF2.0 which is also the preferred value for HERAPDF2.0Jets. Consequently, there is only a small reduction of the uncertainty on the gluon distribution observed for HERAPDF2.0Jets.

The \(\chi ^2\) of the HERAPDF2.0Jets fit with free \(\alpha _s(M_Z^2)\) is the same as for the fit with fixed \(\alpha _s(M_Z^2)=0.118\), see Table 4. This is again due to the fact that the value of \(\alpha _s(M_Z^2)\) obtained from the fit is very close to the value previously fixed. The strong coupling constant obtained is

$$\begin{aligned} \alpha _s(M_Z^2)= & {} 0.1183 \pm 0.0009\,\mathrm{(exp)} \pm 0.0005\,\mathrm{(model/parameterisation)} \\&\pm 0.0012\,\mathrm{(hadronisation)}\;{}^{+0.0037}_{-0.0030}\,\mathrm{(scale)}. \end{aligned}$$

The uncertainty on \(\alpha _s(M_Z^2)\) due to scale uncertainties was evaluated by varying the renormalisation and factorisation scales by a factor of two, both separately and simultaneously, and taking the maximal positive and negative deviations. The uncertainties were assumed to be 50 % correlated and 50 % uncorrelated between bins and data sets. This resulted in an asymmetric uncertainty of \(+0.0037\) and \(-0.0030\). The result on \(\alpha _s(M_Z^2)\) is compatible with the world average [52] and it is competitive with other determinations at NLO.
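The envelope prescription described above can be sketched as follows. The individual fit results for each scale variation are not given in the text, so the values in the dictionary are hypothetical and were chosen only so that the envelope reproduces the quoted \(+0.0037\) and \(-0.0030\). The treatment of the 50 % correlated and 50 % uncorrelated components is not modelled here.

```python
def scale_envelope(central, varied_results):
    """Return (up, down): the largest positive and negative shifts from central."""
    shifts = [value - central for value in varied_results]
    up = max((s for s in shifts if s > 0.0), default=0.0)
    down = min((s for s in shifts if s < 0.0), default=0.0)
    return up, down


ALPHA_S_CENTRAL = 0.1183  # central value quoted in the text

# Hypothetical alpha_s results for scale factors applied to (mu_r, mu_f).
hypothetical_variations = {
    (2.0, 1.0): 0.1213,
    (0.5, 1.0): 0.1160,
    (1.0, 2.0): 0.1189,
    (1.0, 0.5): 0.1176,
    (2.0, 2.0): 0.1220,
    (0.5, 0.5): 0.1153,
}

up, down = scale_envelope(ALPHA_S_CENTRAL, hypothetical_variations.values())
print(f"scale uncertainty: +{up:.4f} / {down:.4f}")
```

Because the maximal upward and downward shifts are taken independently, the resulting uncertainty is asymmetric in general.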

7.4.2 Comparison of HERAPDF2.0Jets to data

The predictions of HERAPDF2.0Jets with free \(\alpha _s(M_Z^2)\) are shown together with the charm input data [46] in Fig. 68. The description of the data is excellent.

Comparisons of the predictions of HERAPDF2.0Jets to the data on jet production used as input are shown in Figs. 69, 70, 71, 72 and 73. All analyses were performed using the assumption of massless jets, i.e. the transverse energy, \(E_T\), and the transverse momentum of a jet, \(p_T\), are equivalent. For inclusive jet analyses, each jet is entered separately with its \(p_T\). For dijet and trijet analyses, the average of the transverse momenta is used as \(p_T\). These different definitions of \(p_T\) were also used to set the renormalisation scale to \(\mu _\mathrm{r}^2 = (Q^2 + p_{T}^2)/2\) for calculating predictions. The factorisation scale was chosen as \(\mu _\mathrm{f}^2 = Q^2\). Scale uncertainties were not considered for the comparisons to data.
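The scale choices stated above can be written as a small helper. This is a sketch with an illustrative function name: for inclusive jets the list holds a single \(p_T\), for dijets and trijets it holds the transverse momenta whose average is used.

```python
def jet_scales(q2_gev2, jet_pts_gev):
    """Return (mu_r^2, mu_f^2) in GeV^2 for a jet observable.

    mu_r^2 = (Q^2 + pT^2) / 2 and mu_f^2 = Q^2, where pT is the single jet pT
    (inclusive jets) or the average pT of the jets (dijets, trijets).
    """
    if not jet_pts_gev:
        raise ValueError("at least one jet pT is required")
    pt = sum(jet_pts_gev) / len(jet_pts_gev)
    mu_r2 = (q2_gev2 + pt ** 2) / 2.0
    mu_f2 = q2_gev2
    return mu_r2, mu_f2


# Illustrative kinematics, not taken from the data tables.
print(jet_scales(200.0, [10.0]))        # inclusive jet
print(jet_scales(200.0, [10.0, 14.0]))  # dijet, average pT of 12 GeV
```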

Data from H1 on differential cross sections, \(\mathrm{d} \sigma / \mathrm{d} p_T\), at low \(Q^2\) [49] and high \(Q^2\) [50] are presented in Fig. 69 together with the predictions of HERAPDF2.0Jets. The high-\(Q^2\) data are normalised to the inclusive NC cross sections. Each event causes as many entries as there are jets. Data from ZEUS on differential cross-sections, \(\mathrm{d}\sigma /\mathrm{d}p_T\), at high \(Q^2\) for inclusive [47] and dijet [48] production are shown in Fig. 70 together with the predictions of HERAPDF2.0Jets. Finally, H1 inclusive-jet, dijet and trijet cross sections normalised to inclusive NC cross sections [51] are presented in Figs. 71, 72 and 73. The description of all the data on jet production by HERAPDF2.0Jets NLO is excellent.

Fig. 63

The parton distribution functions \(xu_v\), \(xd_v\), xg and \(xS=2x(\bar{U}+\bar{D})\) of HERAPDF2.0FF3A at \(\mu _\mathrm{f}^{2} = 10\,\)GeV\(^{2}\) compared to those of ABM11 FF [42]. The top panel shows the distributions. The bottom panel shows the PDFs normalised to HERAPDF2.0FF3A. The uncertainties are given as differently hatched bands in both panels

Fig. 64

The parton distribution functions \(xu_v\), \(xd_v\), xg and \(xS=2x(\bar{U}+\bar{D})\) of HERAPDF2.0FF3B at the starting scale \(\mu _\mathrm{f_{0}}^{2} = 1.9\,\)GeV\(^{2}\) compared to those of NNPDF3.0FF (3N). The top panel shows the distributions. The bottom panel shows the PDFs normalised to HERAPDF2.0FF3B. The uncertainties are given as differently hatched bands in both panels

8 Electroweak effects and scaling violations

The precise data and the predictions from HERAPDF2.0 were used to examine both electroweak effects and scaling violations.

8.1 Electroweak unification

The combined reduced cross sections were integrated to obtain the differential cross sections \(\mathrm{d}\sigma /\mathrm{d}Q^2\). The integration over \(x_\mathrm{Bj}\) of the double-differential cross sections \(\mathrm{d}^2\sigma /\mathrm{d}Q^2\mathrm{d}x_\mathrm{Bj}\) was performed in the region \(0 < y < 0.9\), using the shapes as predicted by HERAPDF2.0 NLO. All correlated and uncorrelated uncertainties were taken into account. The cross sections \(\mathrm{d}\sigma /\mathrm{d}Q^2\) are shown in Fig. 74 for NC and CC \(e^-p\) and \(e^+p\) scattering together with predictions from HERAPDF2.0 NLO. Whereas the NC cross sections are three orders of magnitude larger at low \(Q^2 \approx 100\) GeV\(^2\), where they are dominated by virtual photon exchange, the NC and CC cross sections become similar in magnitude at \(Q^2 \approx 10{,}000\,\)GeV\(^2\), i.e. at around the mass-scale squared of the electroweak bosons, demonstrating the success of electroweak unification in the Standard Model with impressive precision. The data also clearly demonstrate that the NC \(e^-p\) and NC \(e^+p\) cross sections are the same when photon exchange is dominant but they start to differ at \(Q^2 \approx 10{,}000\,\)GeV\(^2\) when \(\gamma \)Z interference becomes important.
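The requirement \(0 < y < 0.9\) translates into a range in \(x_\mathrm{Bj}\) at each \(Q^2\) through \(y = Q^2/(sx_\mathrm{Bj})\), as defined in the Introduction. The sketch below computes this range for \(\sqrt{s} = 318\) GeV; the function name and the example \(Q^2\) values are illustrative.

```python
SQRT_S_GEV = 318.0  # centre-of-mass energy for E_p = 920 GeV


def x_range_for_y_cut(q2_gev2, y_max=0.9, sqrt_s_gev=SQRT_S_GEV):
    """Return (x_min, x_max) in x_Bj allowed by 0 < y < y_max at fixed Q^2.

    Uses y = Q^2 / (s * x_Bj); the upper limit is the kinematic bound x_Bj = 1.
    """
    s = sqrt_s_gev ** 2
    x_min = q2_gev2 / (s * y_max)
    if x_min >= 1.0:
        raise ValueError("Q^2 is not reachable for this y cut")
    return x_min, 1.0


for q2 in (100.0, 1000.0, 10000.0):  # illustrative Q^2 values in GeV^2
    x_min, x_max = x_range_for_y_cut(q2)
    print(f"Q^2 = {q2:7.0f} GeV^2: {x_min:.5f} < x_Bj < {x_max:.0f}")
```

The lower limit in \(x_\mathrm{Bj}\) rises linearly with \(Q^2\), so the integration range shrinks towards high \(Q^2\).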

Fig. 65

\(\Delta \chi ^2 = \chi ^2 - \chi ^2_\mathrm{min}\) vs. \(\alpha _s(M_Z^2)\) for pQCD fits with different \(Q^2_\mathrm{min}\) using data on (a) inclusive, charm and jet production at NLO, (b) inclusive ep scattering only at NLO and (c) inclusive ep scattering only at NNLO

Fig. 66

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0Jets NLO at \(\mu _\mathrm{f}^{2} = 10\,\)GeV\(^{2}\) with fixed \(\alpha _s(M_Z^2)=0.118\) (top) and free \(\alpha _s(M_Z^2)\) (bottom). The experimental, model and parameterisation uncertainties are shown. The hadronisation uncertainty is also included, but it is only visible for the fit with free \(\alpha _s(M_Z^2)\)

Fig. 67

The parton distribution functions \(xu_v\), \(xd_v\), \(xS=2x(\bar{U}+\bar{D})\) and xg of HERAPDF2.0Jets NLO at \(\mu _\mathrm{f}^{2} = 10\,\)GeV\(^{2}\) compared to those of HERAPDF2.0 NLO on logarithmic (top) and linear (bottom) scales. The fits were done with fixed \(\alpha _s(M_Z^2)=0.118\). The bands represent the total uncertainties

8.2 The structure function \(xF_3^{\gamma Z}\)

Figures 75 and 76 show the reduced cross sections for both \(e^+p\) and \(e^-p\) inclusive NC scattering and predictions from HERAPDF2.0 at NLO and NNLO as a function of \(Q^2\) for selected values of \(x_\mathrm{Bj}\). The differences in the cross sections at high \(Q^2\) are clearly visible and well described by HERAPDF2.0, both at NLO and at NNLO. The predictions at NNLO have slightly lower uncertainties than at NLO. As described in Sect. 2, the structure function \(xF_{3}^{\gamma Z}\) can be extracted by subtracting the NC \(e^+p\) from the NC \(e^-p\) cross sections. This directly probes the valence structure of the proton. Equations 2 and 7 were used to obtain \(xF_{3}^{\gamma Z}\) for \(Q^2 \ge 1000\) GeV\(^2\). The result is shown in Fig. 77 in bins of \(Q^2\) together with the predictions of HERAPDF2.0 NLO. The values are listed in Table 7. The subtraction yields precise results above \(Q^2\) of 3000 GeV\(^2\).
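The subtraction step can be sketched as follows. Equations 2 and 7 are not reproduced in this section, so the sketch assumes the standard form of the NC reduced cross section, \(\sigma _{r,\mathrm{NC}}^{\pm } = \tilde{F}_2 \mp (Y_-/Y_+)\,x\tilde{F}_3 - (y^2/Y_+)\,\tilde{F}_\mathrm{L}\) with \(Y_\pm = 1 \pm (1-y)^2\), from which \(x\tilde{F}_3 = \frac{Y_+}{2Y_-}(\sigma _{r,\mathrm{NC}}^{-} - \sigma _{r,\mathrm{NC}}^{+})\). The further conversion from \(x\tilde{F}_3\) to \(xF_3^{\gamma Z}\) is not attempted here, and the input numbers are hypothetical.

```python
def x_f3_tilde(sigma_r_minus, sigma_r_plus, y):
    """Generalised structure function x*F3-tilde from e-p and e+p reduced cross sections.

    Assumes sigma_r(e-p) - sigma_r(e+p) = 2 * (Y_- / Y_+) * x*F3-tilde,
    with Y_+- = 1 +- (1 - y)^2.
    """
    if not 0.0 < y <= 1.0:
        raise ValueError("y must lie in (0, 1]")
    y_plus = 1.0 + (1.0 - y) ** 2
    y_minus = 1.0 - (1.0 - y) ** 2
    return y_plus / (2.0 * y_minus) * (sigma_r_minus - sigma_r_plus)


# Hypothetical reduced cross sections at one (x_Bj, Q^2) point.
print(x_f3_tilde(sigma_r_minus=0.62, sigma_r_plus=0.50, y=0.5))
```

The factor \(Y_+/(2Y_-)\) grows rapidly as y decreases, which is consistent with the subtraction yielding precise results only at high \(Q^2\).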

The valence-quark distributions and hence \(xF_{3}^{\gamma Z}\) depend only minimally on the scale, i.e. only small corrections are needed to translate all values of \(xF_{3}^{\gamma Z}\) to a common scale of \(1000\,\)GeV\(^{2}\). This was done using HERAPDF2.0 NLO. The translation factors were close to unity for most points. The largest factors of up to 1.6 were obtained for points at the highest \(Q^2\) and \(x_\mathrm{Bj}\) where \(xF_{3}^{\gamma Z}\) is very small.

The translated \(xF_{3}^{\gamma Z}\) values were averaged using the method described in Sect. 4. A full covariance matrix was built using the information on the individual sources of uncertainty. The averaging of the \(xF_{3}^{\gamma Z}\) values has a \(\chi ^2 /\mathrm{d.o.f.} = 58.8 / 57\) demonstrating the consistency of the data for different values of \(Q^2\). The result is presented in Fig. 78 together with the prediction of HERAPDF2.0 NLO. The values are listed in Table 8. The data are well described by the HERAPDF2.0 NLO prediction.
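As a simplified illustration of such an averaging, the sketch below combines measurements of the same quantity with an inverse-variance weighted mean and evaluates the \(\chi ^2\) of the combination. It treats all uncertainties as uncorrelated, which is an assumption made for brevity; the analysis itself uses a full covariance matrix. The input values are hypothetical.

```python
import math


def weighted_average(values, errors):
    """Inverse-variance weighted mean, its uncertainty and the chi2 of the average.

    Simplification: uncertainties are treated as uncorrelated.
    """
    if len(values) != len(errors) or not values:
        raise ValueError("values and errors must be non-empty and of equal length")
    weights = [1.0 / e ** 2 for e in errors]
    total_weight = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total_weight
    mean_error = math.sqrt(1.0 / total_weight)
    chi2 = sum(((v - mean) / e) ** 2 for v, e in zip(values, errors))
    return mean, mean_error, chi2


# Hypothetical measurements of one quantity at different Q^2, translated to a common scale.
values = [0.42, 0.38, 0.45]
errors = [0.05, 0.04, 0.08]
mean, mean_error, chi2 = weighted_average(values, errors)
print(f"average = {mean:.3f} +- {mean_error:.3f}, chi2/d.o.f. = {chi2:.2f}/{len(values) - 1}")
```

A \(\chi ^2\) per degree of freedom close to one, as in the quoted 58.8/57, indicates that the individual inputs are mutually consistent.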

An integration of \(F_{3}^{\gamma Z}\) was performed using the averaged cross-section values. For each bin, the shape prediction of HERAPDF2.0 NLO was used. The correlated and uncorrelated uncertainties were taken into account. Two intervals, I1: \(0.016 < x_\mathrm{Bj} < 0.725\) and I2: \(0 < x_\mathrm{Bj} < 1\), were considered. An integration of the prediction of HERAPDF2.0 NLO was also performed. The integration was performed in bins equidistant in log\((x_\mathrm{Bj})\). The integral boundaries for I1 were derived from the maximum y and kinematic boundaries. The results are:

$$\begin{aligned}&\text {I1: HERAPDF2.0: } 1.165^{+0.042}_{-0.053}\nonumber \\&\quad \text {Data: } 1.314 \pm 0.057(\text {stat}) \pm 0.057(\text {syst}) \end{aligned}$$
(33)
$$\begin{aligned}&\text {I2: HERAPDF2.0: } 1.588^{+0.078}_{-0.100}\nonumber \\&\quad \text {Data: } 1.790 \pm 0.078(\text {stat}) \pm 0.078(\text {syst}) \end{aligned}$$
(34)

The values from HERAPDF2.0 and data agree within uncertainties. For I2, they are also close to the QPM prediction of 5/3 from the integration of Eq. 8.
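The QPM value of 5/3 can be checked with exact arithmetic. Eq. 8 is not reproduced in this section; the sketch assumes the usual QPM expression \(F_3^{\gamma Z} = 2e_ua_u\,u_v + 2e_da_d\,d_v\), with quark charges \(e_q\), axial couplings \(a_q\), and valence-quark numbers of 2 and 1. The pull of the I2 data value is computed by adding the quoted statistical and systematic uncertainties in quadrature, which is a simplifying assumption.

```python
import math
from fractions import Fraction

# Assumed QPM inputs: quark charges, axial couplings and valence-quark numbers.
E_U, A_U, N_UV = Fraction(2, 3), Fraction(1, 2), 2
E_D, A_D, N_DV = Fraction(-1, 3), Fraction(-1, 2), 1


def qpm_f3_gamma_z_integral():
    """Integral of F3^{gamma Z} over x in the QPM, using valence-number sum rules."""
    return 2 * E_U * A_U * N_UV + 2 * E_D * A_D * N_DV


def pull(measured, stat, syst, expected):
    """Deviation from expectation in units of the quadrature-summed uncertainty."""
    return (measured - expected) / math.hypot(stat, syst)


qpm = qpm_f3_gamma_z_integral()
print(f"QPM integral = {qpm}")
print(f"pull of the I2 data value = {pull(1.790, 0.078, 0.078, float(qpm)):.2f}")
```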

Fig. 68

The HERA reduced cross sections for charm production with overlaid predictions of the HERAPDF2.0Jets NLO fit. The bands represent the total uncertainty on the predictions excluding scale uncertainties. Dotted lines indicate extrapolation into kinematic regions not included in the fit

Fig. 69

(a) Differential jet cross sections, \(\mathrm{d}\sigma /\mathrm{d}p_T\), normalised to NC inclusive cross sections, in bins of \(Q^2\) between 150 and 15,000 GeV\(^2\) as measured by H1. (b) Differential jet cross sections, \(\mathrm{d}\sigma /\mathrm{d}p_T\), in bins of \(Q^2\) between 5 and 100 GeV\(^2\) as measured by H1. Also shown are predictions from HERAPDF2.0Jets. The bands represent the total uncertainties on the predictions excluding scale uncertainties

Fig. 70

(a) Differential jet cross sections, \(\mathrm{d}\sigma /\mathrm{d}p_T\), in bins of \(Q^2\) between 125 and 20,000 GeV\(^2\) as measured by ZEUS. (b) Differential dijet cross sections, \(\mathrm{d}\sigma /\mathrm{d} \langle p_T \rangle _2\), in bins of \(Q^2\) between 125 and 20,000 GeV\(^2\) as measured by ZEUS. The variable \(\langle p_T \rangle _2\) denotes the average \(p_T\) of the two jets. Also shown are predictions from HERAPDF2.0Jets. The bands represent the total uncertainties on the predictions excluding scale uncertainties

Fig. 71

Differential jet cross sections, \(\mathrm{d}\sigma /\mathrm{d}p_T\), as measured by H1. All cross sections are normalised to NC inclusive cross sections. Also shown are predictions from HERAPDF2.0Jets. The bands represent the total uncertainties on the predictions excluding scale uncertainties

Fig. 72

Differential dijet cross sections, \(\mathrm{d}\sigma /\mathrm{d}\langle p_T \rangle _2\), in bins of \(Q^2\) between 150 and 15,000 GeV\(^2\) as measured by H1. The variable \(\langle p_T \rangle _2\) denotes the average \(p_T\) of the two jets. All cross sections are normalised to NC inclusive cross sections. Also shown are predictions from HERAPDF2.0Jets. The bands represent the total uncertainties on the predictions excluding scale uncertainties

Fig. 73

Differential trijet cross sections, \(\mathrm{d}\sigma /\mathrm{d} \langle p_T \rangle _3\), in bins of \(Q^2\) between 150 and 15,000 GeV\(^2\) as measured by H1. The variable \(\langle p_T \rangle _3\) denotes the average \(p_T\) of the three jets. All cross sections are normalised to NC inclusive cross sections. Also shown are predictions from HERAPDF2.0Jets. The bands represent the total uncertainties on the predictions excluding scale uncertainties

8.3 Helicity effects in CC interactions

Figures 79 and 80 present the reduced cross sections for CC inclusive \(e^+p\) and \(e^-p\) scattering. The \(e^+p\) cross sections are strongly affected by the helicity factor \((1-y)^2\), see Eq. 12. The contribution of the valence quarks is therefore suppressed at high y, which translates to high \(Q^2\) for fixed \(x_\mathrm{Bj}\). The \(e^-p\) cross section is almost unaffected, because the helicity factor applies to the anti-quarks, which as part of the sea are already suppressed at high \(x_\mathrm{Bj}\).
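
The size of this suppression can be illustrated numerically. The following minimal sketch evaluates \(y = Q^2/(sx_\mathrm{Bj})\) and the helicity factor \((1-y)^2\) at fixed \(x_\mathrm{Bj}\) for increasing \(Q^2\). The function names and the sample kinematic points are illustrative assumptions and are not taken from the analysis; \(\sqrt{s} = 318\) GeV is the value quoted for the combined CC data.

```python
SQRT_S = 318.0  # GeV, centre-of-mass energy quoted for the combined CC data


def inelasticity(q2, x_bj, sqrt_s=SQRT_S):
    """Return y = Q^2 / (s * x_Bj), with Q^2 in GeV^2 and sqrt_s in GeV."""
    return q2 / (sqrt_s**2 * x_bj)


def helicity_factor(q2, x_bj, sqrt_s=SQRT_S):
    """Return (1 - y)^2 for a kinematically allowed point (0 <= y <= 1)."""
    y = inelasticity(q2, x_bj, sqrt_s)
    if not 0.0 <= y <= 1.0:
        raise ValueError(f"y = {y:.3f} is outside the physical range [0, 1]")
    return (1.0 - y) ** 2


if __name__ == "__main__":
    x_bj = 0.13  # illustrative value, chosen only for this example
    for q2 in (1000.0, 3000.0, 8000.0):  # GeV^2, illustrative values
        y = inelasticity(q2, x_bj)
        print(f"Q2 = {q2:7.0f} GeV^2  y = {y:.3f}  (1-y)^2 = {helicity_factor(q2, x_bj):.3f}")
```

At fixed \(x_\mathrm{Bj}\) the factor falls monotonically as \(Q^2\) rises, which is the behaviour described above for the \(e^+p\) cross sections.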

8.4 Scaling violations

Scaling violations, i.e. the dependence of the structure functions on \(Q^2\) at fixed \(x_\mathrm{Bj}\), are a consequence of the strong interactions between the partons in the nucleon. The larger the kinematic range, the more clearly these violations are demonstrated. They have been used to extract the gluon content of the proton.

Figures 81 and 82 show the inclusive NC \(e^+p\) and \(e^-p\) HERA data together with fixed-target data [107, 108] and the predictions of HERAPDF2.0 NLO and NNLO, respectively. The data presented span more than four orders of magnitude, both in \(Q^2\) and \(x_\mathrm{Bj}\). The scaling violations are clearly visible and are well described by HERAPDF2.0, both at NLO and NNLO. They were already clearly visible in Fig. 30, which presented a close-up of a particular kinematic range.

The structure function \(\tilde{F_2}\), see Eq. 1, can be displayed as a function of \(x_\mathrm{Bj}\) at fixed \(Q^2\). For the part of the phase space where both \(x\tilde{F_3}\) and \(\tilde{F_\mathrm{L}}\) are small, the simple expression

$$\begin{aligned} \tilde{F}_{2} = \sigma _{r,{\mathrm {NC}}}^{\pm }\cdot \frac{\tilde{F}_{2}^{\mathrm {predicted}}}{\sigma _{r,{\mathrm {NC}}}^{\pm }{}^{\mathrm {predicted}}} = \sigma _{r,{\mathrm {NC}}}^{\pm }\cdot (1+C_F) \end{aligned}$$
(35)

can be used to extract the values of \(\tilde{F_2}\). Selected values and HERAPDF2.0 NLO predictions for \(\tilde{F_2}\), for which the correction \(|C_F| < 0.1\), are shown in Fig. 83.
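
The extraction in Eq. 35 amounts to rescaling each measured reduced cross section by the predicted ratio and keeping only points with a small correction. A minimal sketch follows, assuming the predicted \(\tilde{F_2}\) and predicted reduced cross section are already available as numbers; the function name `extract_f2` and its arguments are hypothetical and do not refer to any code used in the analysis.

```python
def extract_f2(sigma_r, f2_pred, sigma_r_pred, max_abs_cf=0.1):
    """Apply Eq. 35: F2 = sigma_r * (1 + C_F), with 1 + C_F = f2_pred / sigma_r_pred.

    Returns the tuple (F2, C_F), or None if |C_F| is not below max_abs_cf,
    mirroring the selection |C_F| < 0.1 quoted in the text.
    """
    c_f = f2_pred / sigma_r_pred - 1.0
    if abs(c_f) >= max_abs_cf:
        return None
    return sigma_r * (1.0 + c_f), c_f
```

For example, with placeholder inputs `extract_f2(1.00, 1.05, 1.00)` the correction is \(C_F = 0.05\) and the point is kept, whereas `extract_f2(1.00, 1.20, 1.00)` returns `None` because the correction exceeds the cut.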

The function \(\tilde{F_2}\) rises toward low \(x_\mathrm{Bj}\) at fixed \(Q^2\). The scaling violations manifest themselves by the rise becoming steeper as \(Q^2\) increases. In the conventional framework of pQCD, this implies an increasing gluon density. The predictions of HERAPDF2.0 NLO describe the data well.

9 Summary and conclusions

The H1 and ZEUS collaborations measured inclusive \(e^{\pm }p\) scattering cross sections at HERA from 1994 to 2007, collecting a total integrated luminosity of about 1 fb\(^{-1}\). The data were taken in two different beam configurations, called HERA I and HERA II, at four different centre-of-mass energies and with two different detectors changing and improving over time. All inclusive data were combined to create one consistent set of NC and CC cross-section measurements for unpolarised \(e^{\pm }p\) scattering, spanning six orders of magnitude in both negative four-momentum-transfer squared, \(Q^2\), and Bjorken x. The data from many measurements made independently by the two collaborations proved to be consistent, with a \(\chi ^2\) per degree of freedom of 1.04 for the combination. Combined cross sections are provided for values of \(Q^2\) between \(Q^2=0.045\) GeV\(^2\) and \(Q^2=50{,}000\) GeV\(^2\) and values of \(x_\mathrm{Bj}\) between \(x_\mathrm{Bj}=6\times 10^{-7}\) and \(x_\mathrm{Bj}=0.65\). They are the most precise measurements ever published for ep scattering over such a large kinematic range and have been used to illustrate scaling violations. The precision of the data has also been exploited to illustrate electroweak unification and to extract \(xF_3^{\gamma Z}\) above \(Q^2 = 1000\,\)GeV\(^2\).

The inclusive cross sections were used as input to a QCD analysis within the DGLAP formalism. In order to constrain the heavy-quark mass parameters, additional information from data on charm and beauty production at HERA was used. The resulting parton distribution functions are denoted HERAPDF2.0 and are available at LO, NLO and NNLO. They were calculated for a series of fixed values of \(\alpha _s(M_Z^2)\) around the central value of 0.118. HERAPDF2.0 has small experimental uncertainties due to the high precision and coherence of the input data. Parameterisation and model uncertainties have also been estimated. HERAPDF2.0 makes precise predictions which describe the input data well.

The heavy-flavour scheme used for HERAPDF2.0 is RTOPT, a variable-flavour number scheme. Two variants, HERAPDF2.0 FF3A and FF3B, which use fixed-flavour number schemes, are also available at NLO.

The perturbative QCD fits yielding HERAPDF2.0 are based on data with \(Q^2\) above 3.5 GeV\(^2\). Their \(\chi ^2/\mathrm{d.o.f.}\) values are around 1.2. An extensive investigation included fits with different \(Q^2_\mathrm{min}\), below which data were excluded. For \(Q^2_\mathrm{min} = 10\,\)GeV\(^2\), a full set of PDFs named HERAPDF2.0HiQ2 is also released. These fits have an improved \(\chi ^2/\mathrm{d.o.f.}\) of about 1.15. However, the resulting PDFs do not describe the data in the excluded low-\(Q^2\) region well. HERAPDF2.0 shows tensions between data and fit, independent of the heavy-flavour scheme used, at low \(Q^2\), i.e. below \(Q^2 = 15\,\)GeV\(^2\), and at high \(Q^2\), i.e. above \(Q^2 = 150\,\)GeV\(^2\). Comparisons between the behaviour of the fits with different \(Q^2_\mathrm{min}\) values indicate that the NLO theory evolves faster than the data towards lower \(Q^2\) and x. Fits at NNLO do not improve the agreement. HERAPDF2.0 NNLO and NLO have a similar fit quality.

A measurement of \(\alpha _s(M_Z^2)\) was made using a perturbative QCD fit for which the inclusive cross sections were augmented with selected jet- and charm-production cross sections as measured by both the H1 and ZEUS collaborations. The value obtained is \(\alpha _s(M_Z^2)=0.1183 \pm 0.0009 \mathrm{(exp)} \pm 0.0005\mathrm{(model/parameterisation)} \pm 0.0012\mathrm{(hadronisation)} ^{+0.0037}_{-0.0030}\mathrm{(scale)}\). This value is in excellent agreement with the value of the world average \(\alpha _s(M_Z^2)= 0.1185\) [109]. The set of PDFs obtained from the analysis with free \(\alpha _s(M_Z^2)\) is released as HERAPDF2.0Jets.
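
To gauge the overall size of the quoted uncertainties, the individual contributions can be combined. The sketch below adds them in quadrature, which is an assumption made purely for illustration: the text quotes the contributions separately and does not state a combined total. All names in the code are hypothetical.

```python
import math

ALPHA_S = 0.1183        # central value quoted in the text
WORLD_AVERAGE = 0.1185  # central value of the world average quoted in the text

# Symmetric uncertainty contributions as quoted in the text.
SYMMETRIC = {
    "exp": 0.0009,
    "model/parameterisation": 0.0005,
    "hadronisation": 0.0012,
}
# Asymmetric scale uncertainty as quoted in the text.
SCALE_UP, SCALE_DOWN = 0.0037, 0.0030


def quadrature(*terms):
    """Return the square root of the sum of squares of the given terms."""
    return math.sqrt(sum(t * t for t in terms))


def total_uncertainty():
    """Return (upward, downward) totals, ASSUMING independent contributions."""
    sym = list(SYMMETRIC.values())
    return quadrature(*sym, SCALE_UP), quadrature(*sym, SCALE_DOWN)


if __name__ == "__main__":
    up, down = total_uncertainty()
    print(f"alpha_s = {ALPHA_S} +{up:.4f} -{down:.4f} (quadrature sum, illustrative)")
    print(f"difference to world average: {WORLD_AVERAGE - ALPHA_S:+.4f}")
```

Under this assumption the scale uncertainty dominates the total, and the difference to the world-average central value is smaller than the experimental uncertainty alone.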

The precision data on inclusive ep scattering presented in this paper are one of the main legacies of HERA.

Fig. 74

The combined HERA NC and CC \(e^-p\) and \(e^+p\) cross sections, \(\mathrm{d}\sigma /\mathrm{d}Q^2\), together with predictions from HERAPDF2.0 NLO. The bands represent the total uncertainty on the predictions

Fig. 75

The combined HERA data for the inclusive NC \(e^+p\) and \(e^-p\) reduced cross sections as a function of \(Q^2\) for selected values of \(x_\mathrm{Bj}\) at \(\sqrt{s} = 318\) GeV with overlaid predictions of HERAPDF2.0 NLO. The bands represent the total uncertainties of the predictions

Fig. 76

The combined HERA data for the inclusive NC \(e^+p\) and \(e^-p\) reduced cross sections as a function of \(Q^2\) for selected values of \(x_\mathrm{Bj}\) at \(\sqrt{s} = 318\) GeV with overlaid predictions of HERAPDF2.0 NNLO. The bands represent the total uncertainties of the predictions

Fig. 77

The structure function \(xF_3^{\gamma Z}\) for ten values of \(Q^2\) together with predictions from HERAPDF2.0 NLO. The bands represent the total uncertainties on the predictions

Table 7 Structure function \(xF_3^{\gamma Z}\) for different values of \(Q^2\) and \(x_\mathrm{Bj}\); \(\delta _\mathrm{stat}\), \(\delta _\mathrm{syst}\) and \(\delta _\mathrm{tot}\) represent the statistical, systematic and total uncertainties, respectively
Fig. 78

The structure function \(xF_3^{\gamma Z}\) averaged over \(Q^2 \ge 1000\,\)GeV\(^2\) at the scale \(Q^2=1000\,\)GeV\(^{2}\) together with the prediction from HERAPDF2.0 NLO. The band represents the total uncertainty on the prediction

Table 8 Structure function \(xF_3^{\gamma Z}\) averaged over \(Q^2 \ge 1000\) GeV\(^2\) at the scale 1000 GeV\(^2\); \(\delta _\mathrm{stat}\), \(\delta _\mathrm{syst}\) and \(\delta _\mathrm{tot}\) represent the statistical, systematic and total uncertainties, respectively
Fig. 79

The combined HERA data for inclusive CC \(e^+p\) and \(e^-p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions of HERAPDF2.0 NLO. The bands represent the total uncertainties on the predictions

Fig. 80

The combined HERA data for inclusive CC \(e^+p\) and \(e^-p\) reduced cross sections at \(\sqrt{s} = 318\) GeV with overlaid predictions of HERAPDF2.0 NNLO. The bands represent the total uncertainty on the predictions

Fig. 81

The combined HERA data for the inclusive NC \(e^+p\) and \(e^-p\) reduced cross sections together with fixed-target data [107, 108] and the predictions of HERAPDF2.0 NLO. The bands represent the total uncertainties on the predictions. Dashed lines indicate extrapolation into kinematic regions not included in the fit

Fig. 82

The combined HERA data for the inclusive NC \(e^+p\) and \(e^-p\) reduced cross sections together with fixed-target data [107, 108] and the predictions of HERAPDF2.0 NNLO. The bands represent the total uncertainties on the predictions. Dashed lines indicate extrapolation into kinematic regions not included in the fit

Table 9 Settings for HERAPDF2.0 and HERAPDF1.5
Fig. 83

The structure function \(\tilde{F_2}\) as extracted from the measured reduced cross sections for four values of \(Q^2\) together with the predictions of HERAPDF2.0 NLO. The bands represent the total uncertainty on the predictions