1 Introduction

The recent discovery of the Higgs boson [1, 2] and the extensive searches for signals of new physics in LHC proton–proton collisions require high-precision calculations to test the validity of the Standard Model (SM) and factorisation in Quantum Chromodynamics (QCD). Using collinear factorisation, inclusive cross sections in hadron collisions may be written as

$$\begin{aligned} \sigma (\alpha _{\mathrm {s}} (\mu _{\mathrm {R}} ^2),\mu _{\mathrm {R}} ^2,\mu _{\mathrm {F}} ^2) = \sum \limits _{a,b}\, \int \limits _{0}^{1} \mathrm{d}x_1\, \mathrm{d}x_2\, f_a(x_1,\mu _{\mathrm {F}} ^2)\, f_b(x_2,\mu _{\mathrm {F}} ^2)\, \hat{\sigma }^{ab}(x_1,x_2;\alpha _{\mathrm {s}} (\mu _{\mathrm {R}} ^2),\mu _{\mathrm {R}} ^2,\mu _{\mathrm {F}} ^2) + \mathcal{O}\left( \frac{\varLambda _\mathrm{QCD}^2}{Q^2}\right) \end{aligned}$$
(1)

where the cross section \(\sigma \) is expressed as a convolution of Parton Distribution Functions (PDFs) \(f_a\) and \(f_b\) with the parton cross section \(\hat{\sigma }^{ab}\), involving a momentum transfer q such that \(Q^2 = |q^2| \gg \varLambda _\mathrm{QCD}^2\), where \(\varLambda _\mathrm{QCD}\) is the QCD scale. At Leading Order (LO) in the perturbative expansion in the strong coupling, the PDFs represent the probability of finding a specific parton a (b) in the first (second) hadron carrying a fraction \(x_1\) (\(x_2\)) of its momentum. The indices a and b in Eq. 1 indicate the various kinds of partons, i.e. gluons, quarks and antiquarks of different flavours that are considered as the constituents of the proton. The PDFs depend on the factorisation scale, \(\mu _{\mathrm {F}} \), while the parton cross sections depend on the strong-coupling constant, \(\alpha _{\mathrm {s}} \), and the factorisation and renormalisation scales, \(\mu _{\mathrm {F}} \) and \(\mu _{\mathrm {R}} \). The parton cross sections \(\hat{\sigma }^{ab}\) are calculable in perturbative QCD (pQCD), whereas PDFs are usually constrained by global fits to a variety of experimental data. The assumption that PDFs are universal, within a particular factorisation scheme [3–7], is crucial to this procedure. Recent review articles on PDFs can be found in Refs. [8, 9].
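The structure of Eq. 1 can be illustrated with a short numerical sketch: a Monte Carlo estimate of the double convolution for a single parton channel. Both the valence-like PDF shape and the constant partonic cross section below are invented for illustration and are not HERAFitter ingredients.

```python
import numpy as np

def toy_pdf(x):
    """Made-up valence-like density f(x) ~ x^(-1/2) (1-x)^3 (illustration only)."""
    return x**-0.5 * (1.0 - x)**3

def toy_sigma_hat(x1, x2):
    """Toy partonic cross section, taken constant for simplicity."""
    return np.ones_like(x1)

def hadronic_xsec(n=200_000, seed=1):
    """Monte Carlo estimate of sigma = int dx1 dx2 f(x1) f(x2) sigma_hat(x1, x2)."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(1e-4, 1.0, n)   # uniform sampling; importance sampling omitted
    x2 = rng.uniform(1e-4, 1.0, n)
    return np.mean(toy_pdf(x1) * toy_pdf(x2) * toy_sigma_hat(x1, x2))
```

A real calculation would sum over all parton channels a, b and use importance sampling tuned to the steep small-x behaviour of the PDFs.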

A precise determination of PDFs as a function of x requires large amounts of experimental data that cover a wide kinematic region and that are sensitive to different kinds of partons. Measurements of inclusive Neutral Current (NC) and Charged Current (CC) Deep Inelastic Scattering (DIS) at the lepton–proton (ep) collider HERA provide crucial information for determining the PDFs. The low-energy fixed-target data and different processes from proton–proton (pp) collisions at the LHC and proton–antiproton (\(p \bar{p}\)) collisions at the Tevatron provide complementary information to the HERA DIS measurements. The PDFs are determined from \(\chi ^2\) fits of the theoretical predictions to the data. The rapid flow of new data from the LHC experiments and the corresponding theoretical developments, which are providing predictions for more complex processes at increasingly higher orders, have motivated the development of a tool to combine them in a fast, efficient, open-source framework.

This paper describes the open-source QCD fit framework HERAFitter  [10], which includes a set of tools to facilitate global QCD analyses of pp, \(p\bar{p}\) and ep scattering data. It has been developed for the determination of PDFs and the extraction of fundamental parameters of QCD such as the heavy quark masses and the strong-coupling constant. It also provides a common framework for the comparison of different theoretical approaches. Furthermore, it can be used to test the impact of new experimental data on the PDFs and on the SM parameters.

This paper is organised as follows: The general structure of HERAFitter is presented in Sect. 2. In Sect. 3 the various processes available in HERAFitter and the corresponding theoretical calculations, performed within the framework of collinear factorisation and the DGLAP [11–15] formalism, are discussed. In Sect. 4 tools for fast calculations of the theoretical predictions are presented. In Sect. 5 the methodology to determine PDFs through fits based on various \(\chi ^2\) definitions is described. In particular, various treatments of correlated experimental uncertainties are presented. Alternative approaches to the DGLAP formalism are presented in Sect. 6. The organisation of the HERAFitter code is discussed in Sect. 7, specific applications of the package are presented in Sect. 8, which is followed by a summary in Sect. 9.

2 The HERAFitter structure

The diagram in Fig. 1 gives a schematic overview of the HERAFitter structure and functionality, which can be divided into four main blocks:

Fig. 1 Schematic overview of the HERAFitter program

Data: Measurements from various processes are provided in the HERAFitter package, including the information on their uncorrelated and correlated uncertainties. HERA inclusive scattering data are directly sensitive to quark PDFs and indirectly sensitive to the gluon PDF through scaling violations and the longitudinal structure function \(F_L\). These data are the basis of any proton PDF extraction and are used in all current PDF sets from the MSTW [16], CT [17], NNPDF [18], ABM [19], JR [20] and HERAPDF [21] groups. Measurements of charm and beauty quark production at HERA are sensitive to heavy quark PDFs, and jet measurements have direct sensitivity to the gluon PDF. However, the kinematic range of HERA data mostly covers low and medium ranges in x. Measurements from the fixed-target experiments, the Tevatron and the LHC are therefore also included: they provide additional constraints on the gluon and quark distributions at high x, on the heavy quark distributions and on the decomposition of the light-quark sea.

The processes that are currently available within the HERAFitter framework are listed in Table 1.

Table 1 The list of experimental data and theory calculations implemented in the HERAFitter package. The references for the individual calculations and schemes are given in the text

Theory: The PDFs are parametrised at a starting scale, \(Q_0^2\), using a functional form and a set of free parameters \(\mathbf {p}\). These PDFs are evolved to the scale of the measurements, \(Q^2>Q_0^2\). By default, the evolution uses the DGLAP formalism [11–15] as implemented in QCDNUM  [22]. Alternatively, the CCFM evolution [23–26] as implemented in uPDFevolv [27] can be chosen. The prediction of the cross section for a particular process is obtained, assuming factorisation, by the convolution of the evolved PDFs with the corresponding parton scattering cross section. Available theory calculations for each process are listed in Table 1. Predictions using dipole models [28–30] can also be obtained.

QCD analysis: The PDFs are determined in a least squares fit: a \(\chi ^2\) function, which compares the input data and theory predictions, is minimised with the MINUIT [31] program. In HERAFitter various choices are available for the treatment of experimental uncertainties in the \(\chi ^2\) definition. Correlated experimental uncertainties can be accounted for using a nuisance parameter method or a covariance matrix method as described in Sect. 5.2. Different statistical assumptions for the distributions of the systematic uncertainties, e.g. Gaussian or LogNormal [32], can also be studied (see Sect. 5.3).
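The nuisance-parameter treatment of a correlated systematic uncertainty can be sketched in a few lines. The \(\chi^2\) form below is a common textbook variant with a single correlated source, not HERAFitter's exact definition (the actual definitions are given in Sect. 5); all numerical inputs are illustrative.

```python
import numpy as np

def chi2_nuisance(data, theory, stat, gamma, b):
    """Illustrative chi^2 with one correlated systematic source:
    chi2(b) = sum_i (d_i - t_i (1 + b * gamma_i))^2 / stat_i^2 + b^2,
    where gamma_i is the relative correlated uncertainty on point i and
    b is the nuisance parameter, penalised by the +b^2 term."""
    shifted = theory * (1.0 + b * gamma)
    return np.sum((data - shifted) ** 2 / stat ** 2) + b ** 2

def profile_b(data, theory, stat, gamma):
    """Analytic minimum over b of the quadratic chi^2 above."""
    num = np.sum(gamma * theory * (data - theory) / stat ** 2)
    den = np.sum((gamma * theory) ** 2 / stat ** 2) + 1.0
    return num / den
```

In a full fit the theory parameters and the nuisance parameters are minimised together (here with MINUIT), and several correlated sources each get their own nuisance parameter.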

Results: The resulting PDFs are provided in a format ready to be used by the LHAPDF library [33, 34] or by TMDlib [35]. HERAFitter drawing tools can be used to display the PDFs with their uncertainties at a chosen scale. As an example, the first set of PDFs extracted using HERAFitter from HERA I data, HERAPDF1.0 [21], is shown in Fig. 2 (taken from Ref. [21]). Note that, following convention, the PDFs are displayed as parton momentum distributions \(xf(x,\mu _F^2)\).

Fig. 2 Distributions of valence (\(xu_v\), \(xd_v\)), sea (xS) and the gluon (xg) PDFs in HERAPDF1.0 [21]. The gluon and the sea distributions are scaled down by a factor of 20. The experimental, model and parametrisation uncertainties are shown as coloured bands

3 Theoretical formalism using DGLAP evolution

In this section the theoretical formalism based on the DGLAP [11–15] equations is described.

A direct consequence of factorisation (Eq. 1) is that the scale dependence or “evolution” of the PDFs can be predicted by the renormalisation group equations. By requiring physical observables to be independent of \(\mu _{\mathrm {F}} \), a representation of the parton evolution in terms of the DGLAP equations is obtained:

$$\begin{aligned} \frac{\mathrm{d}f_a(x,\mu _{\mathrm {F}} ^2)}{\mathrm{d} \log \mu _{\mathrm {F}} ^2} = \sum _{b=q,\bar{q},g}\int _{x}^{1}\frac{\mathrm{d} z}{z} P_{ab}\left( \frac{x}{z};\mu _{\mathrm {F}} ^2\right) f_b(z,\mu _{\mathrm {F}} ^2), \end{aligned}$$
(2)

where the functions \(P_{ab}\) are the evolution kernels or splitting functions, which represent the probability of finding parton a in parton b. They can be calculated as a perturbative expansion in \(\alpha _s\). Once PDFs are determined at the initial scale \(\mu _F^2 = Q_0^2\), their evolution to any other scale \(Q^2 > Q_0^2\) is entirely determined by the DGLAP equations. The PDFs are then used to calculate cross sections for various processes. Alternative approaches to the DGLAP evolution equations, valid in different kinematic regimes, are also implemented in HERAFitter and will be discussed in Sect. 6.
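In Mellin-moment space the convolution in Eq. 2 turns into an ordinary differential equation per moment, which makes for a compact numerical illustration. The sketch below evolves a non-singlet moment at leading order; the fixed coupling and the chosen normalisation (\(\mathrm{d}M/\mathrm{d}\ln Q^2 = (\alpha_s/4\pi)\,\gamma_N\,M\)) are simplifying assumptions for illustration only.

```python
import math

CF = 4.0 / 3.0  # QCD colour factor

def gamma_ns(N):
    """LO non-singlet anomalous dimension (Mellin moment of the qq splitting
    function), normalised so that dM/dln(Q^2) = (alpha_s / 4 pi) * gamma * M.
    gamma(1) = 0 expresses quark-number conservation."""
    harmonic = sum(1.0 / k for k in range(1, N + 1))
    return CF * (3.0 + 2.0 / (N * (N + 1)) - 4.0 * harmonic)

def evolve_moment(m0, N, q2_start, q2_end, alpha_s=0.2, steps=1000):
    """Euler integration of the moment-space DGLAP equation at fixed coupling."""
    t0, t1 = math.log(q2_start), math.log(q2_end)
    dt = (t1 - t0) / steps
    m = m0
    for _ in range(steps):
        m += dt * (alpha_s / (4.0 * math.pi)) * gamma_ns(N) * m
    return m
```

The second moment (the momentum fraction carried by the quark combination) decreases with increasing \(Q^2\), reflecting radiation of gluons; the first moment is exactly conserved.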

3.1 Deep inelastic scattering and proton structure

The formalism that relates the DIS measurements to pQCD and the PDFs has been described in detail in many extensive reviews (see, e.g., Ref. [36]) and it is only briefly summarised here. DIS is the process where a lepton scatters off the partons in the proton by the virtual exchange of a neutral (\(\gamma /Z\)) or charged (\(W^{\pm }\)) vector boson and, as a result, a scattered lepton and a hadronic final state are produced. The common DIS kinematic variables are the scale of the process \(Q^2\), which is the absolute squared four-momentum of the exchanged boson, Bjorken x, which can be related in the parton model to the momentum fraction that is carried by the struck quark, and the inelasticity y. These are related by \(y=Q^2/(sx)\), where s is the squared centre-of-mass energy.
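The relation between the DIS variables can be made concrete in a few lines; the massless approximation is assumed, and the HERA-II beam energies used in the check (27.5 GeV leptons on 920 GeV protons, \(\sqrt{s}\approx 318\) GeV) are the well-known machine parameters.

```python
import math

def inelasticity(q2, s, x):
    """y = Q^2 / (s x), neglecting lepton and proton masses."""
    return q2 / (s * x)

# HERA-II: 27.5 GeV leptons on 920 GeV protons, s = 4 E_e E_p
s_hera = 4.0 * 27.5 * 920.0     # squared centre-of-mass energy in GeV^2
y = inelasticity(100.0, s_hera, 0.01)
```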

The NC cross section can be expressed in terms of generalised structure functions:

$$\begin{aligned}&\frac{\mathrm{d}^2\sigma _{NC}^{e^{\pm } p}}{\mathrm{d}x\mathrm{d}Q^2}=\frac{2\pi \alpha ^2 Y_{+}}{xQ^4} \sigma _{r,NC}^{e^{\pm } p},\end{aligned}$$
(3)
$$\begin{aligned}&\sigma _{r,NC}^{e^{\pm } p}= \tilde{F}_2^{\pm } \mp \frac{Y_{-}}{Y_{+}}x \tilde{F}_3^{\pm } - \frac{y^2}{Y_{+}} \tilde{F}_L^{\pm }, \end{aligned}$$
(4)

where \(Y_{\pm } = 1 \pm (1-y)^2\) and \(\alpha \) is the electromagnetic-coupling constant. The generalised structure functions \(\tilde{F}_{2,3}\) can be written as linear combinations of the proton structure functions \(F^{\gamma }_2, F^{\gamma Z}_{2,3}\) and \(F^Z_{2,3}\), which are associated with pure photon exchange terms, photon–Z interference terms and pure Z exchange terms, respectively. The structure function \(\tilde{F}_2\) is the dominant contribution to the cross section, \(x \tilde{F}_3\) becomes important at high \(Q^2\) and \(\tilde{F}_L\) is sizable only at high y. In the framework of pQCD, the structure functions are directly related to the PDFs: at LO \(F_2\) is the weighted momentum sum of quark and antiquark distributions, \(F_2~\approx ~x \sum e^2_q (q+ \overline{q})\) (where \(e_q\) is the quark electric charge), \(xF_3\) is related to their difference, \(xF_3~\approx ~x \sum 2e_q a_q (q- \overline{q})\) (\(a_q\) is the axial-vector quark coupling), and \(F_L\) vanishes. At higher orders, terms related to the gluon distribution appear, in particular \(F_L\) is strongly related to the low-x gluon.
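The LO quark-parton-model expression for \(F_2\) quoted above translates directly into code. The PDF values below are arbitrary illustrative numbers at some fixed \((x, Q^2)\), not taken from a real PDF set.

```python
def f2_lo(x, pdfs, charges):
    """LO structure function: F2 = x * sum_q e_q^2 * (q + qbar).
    pdfs maps flavour -> (q(x), qbar(x)); charges maps flavour -> e_q."""
    return x * sum(charges[f] ** 2 * (q + qbar) for f, (q, qbar) in pdfs.items())

# arbitrary toy densities at a fixed (x, Q^2); only u and d kept for brevity
toy_pdfs = {"u": (2.0, 0.1), "d": (1.0, 0.1)}
quark_charges = {"u": 2.0 / 3.0, "d": -1.0 / 3.0}
f2 = f2_lo(0.1, toy_pdfs, quark_charges)
```

Beyond LO the same quantity involves the coefficient functions and, through them, the gluon distribution.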

The inclusive CC ep cross section, analogous to the NC ep case, can be expressed in terms of another set of structure functions, \(\tilde{W}\):

$$\begin{aligned}&\frac{\mathrm{d}^2\sigma _{CC}^{e^{\pm } p}}{\mathrm{d}x\,\mathrm{d}Q^2}=\frac{1\pm P}{2}\,\frac{G^2_F}{2\pi x} \left( \frac{m^2_W}{m^2_W+Q^2}\right) ^2 \sigma _{r,CC}^{e^{\pm } p}\end{aligned}$$
(5)
$$\begin{aligned}&\sigma _{r,CC}^{e^{\pm } p}= Y_{+} \tilde{W}_2^{\pm } \mp Y_{-}x \tilde{W}_3^{\pm } - y^2 \tilde{W}_L^{\pm }, \end{aligned}$$
(6)

where P represents the lepton beam polarisation. At LO in \(\alpha _s\), the CC \(e^+p\) and \(e^-p\) cross sections are sensitive to different combinations of the quark flavour densities:

$$\begin{aligned} \sigma _{r,CC}^{e^{+} p} \approx x [\overline{u} + \overline{c}] + (1-y)^2 x [d+s], \end{aligned}$$
(7)
$$\begin{aligned} \sigma _{r,CC}^{e^{-} p} \approx x[u+c] + (1-y)^2 x[\overline{d} + \overline{s}]. \end{aligned}$$
(8)
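Equations 7 and 8 can be sketched as a small helper that makes the flavour sensitivity explicit; the quark-density values used in the check are invented for illustration.

```python
def sigma_r_cc_lo(x, y, q, qbar, positron=True):
    """LO reduced CC cross sections of Eqs. (7)-(8).
    q, qbar: dicts mapping flavour -> density value at the given x."""
    if positron:
        return x * (qbar["u"] + qbar["c"]) + (1 - y) ** 2 * x * (q["d"] + q["s"])
    return x * (q["u"] + q["c"]) + (1 - y) ** 2 * x * (qbar["d"] + qbar["s"])
```

At \(y \rightarrow 1\) the \((1-y)^2\) term switches off, so the \(e^-p\) cross section probes only \(x(u+c)\) and the \(e^+p\) cross section only \(x(\bar{u}+\bar{c})\).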

Beyond LO, the QCD predictions for the DIS structure functions are obtained by convoluting the PDFs with appropriate hard-process scattering matrix elements, which are referred to as coefficient functions.

The DIS measurements span a large range of \(Q^2\) from a few \(\,\text {GeV} ^2\) to about \(10^5\,\text {GeV} ^2\), crossing heavy quark mass thresholds, thus the treatment of heavy quark (charm and beauty) production and the chosen values of their masses become important. There are different schemes for the treatment of heavy quark production. Several variants of these schemes are implemented in HERAFitter and they are briefly discussed below.

Zero-mass-variable flavour number (ZM-VFN): In this scheme [37], the heavy quarks appear as partons in the proton at \(Q^2\) values above \({\sim } m_h^2\) (heavy quark mass) and they are then treated as massless in both the initial and the final states of the hard-scattering process. The lowest-order process is the scattering of the lepton off the heavy quark via electroweak boson exchange. This scheme is expected to be reliable only in the region where \(Q^2\gg m_h^2\), and it is inaccurate for lower \(Q^2\) values since it misses corrections of order \(m_h^2/Q^2\), while the other schemes mentioned below are accurate up to order \(\varLambda _\mathrm{QCD}^2/Q^2\) albeit with different perturbative orderings. In HERAFitter this scheme is available for the DIS structure function calculation via the interface to the QCDNUM [22] package, thus it benefits from the fast QCDNUM convolution engine.

Fixed flavour number (FFN): In this rigorous quantum field theory scheme [38–40], only the gluon and the light quarks are considered as partons within the proton and massive quarks are produced perturbatively in the final state. The lowest-order process is heavy quark–antiquark pair production via boson–gluon fusion. In HERAFitter this scheme can be accessed via the QCDNUM implementation or through the interface to the open-source code OPENQCDRAD [41] as implemented by the ABM group. This scheme is reliable only for \(Q^2\sim m_h^2\), since it does not resum logarithms of the form \(\ln (Q^2/m_h^2)\) which become important for \(Q^2\gg m_h^2\). In QCDNUM, the calculation of the heavy quark contributions to DIS structure functions is available at Next-to-Leading Order (NLO) and only electromagnetic exchange contributions are taken into account. In the OPENQCDRAD implementation the heavy quark contributions to CC structure functions are also available and, for the NC case, the QCD corrections to the coefficient functions at Next-to-Next-to-Leading Order (NNLO) are provided in the best currently known approximation [42, 43]. The OPENQCDRAD implementation in addition uses the running heavy quark mass in the \(\overline{\text {MS}}\) scheme [44]. It is sometimes argued that this \(\overline{\text {MS}}\) scheme reduces the sensitivity of the DIS cross sections to higher-order corrections. It is also known to have smaller non-perturbative corrections than the pole mass scheme [45].

General-mass variable flavour number (GM-VFN): In this scheme (see [46] for a comprehensive review), heavy quark production is treated for \(Q^2 \sim m_h^2\) in the FFN scheme and for \(Q^2 \gg m_h^2\) in the massless scheme with a suitable interpolation in between. The details of this interpolation differ between implementations. Among the PDF-fitting groups, GM-VFN schemes are used by MSTW, CT (CTEQ), NNPDF and HERAPDF. Several variants of the GM-VFN scheme are implemented in HERAFitter.

  • GM-VFN Thorne–Roberts scheme: The Thorne–Roberts (TR) scheme [47] was designed to provide a smooth transition from the massive FFN scheme at low scales \(Q^2 \sim m_h^2\) to the massless ZM-VFN scheme at high scales \(Q^2 \gg m_h^2\). Because the original version was technically difficult to implement beyond NLO, it was updated to the TR\(^\prime \) scheme [48]. There are two variants of the TR\(^\prime \) schemes: TR\(^\prime \) standard (as used in MSTW PDF sets [16, 48]) and TR\(^\prime \) optimal [49], with a smoother transition across the heavy quark threshold region. Both TR\(^\prime \) variants are accessible within the HERAFitter package at LO, NLO and NNLO. At NNLO, an approximation is needed for the massive \(\mathscr {O}(\alpha _s^3)\) NC coefficient functions relevant for \(Q^2\sim m_h^2\), as for the FFN scheme.

  • GM-VFN ACOT scheme: The Aivazis–Collins–Olness–Tung (ACOT) scheme belongs to the group of VFN factorisation schemes that use the renormalisation method of Collins–Wilczek–Zee (CWZ) [50]. This scheme unifies the low-scale \(Q^2 \sim m_h^2\) and high-scale \(Q^2 \gg m_h^2\) regions in a coherent framework across the full energy range. Within the ACOT package, the following variants of the ACOT \(\overline{\text {MS}}\) scheme are available at LO and NLO: ACOT-Full [51], S-ACOT-\(\chi \) [52, 53] and ACOT-ZM [51]. For the longitudinal structure function higher-order calculations are also available. A comparison of PDFs extracted from QCD fits to the HERA data with the TR\(^\prime \) and ACOT-Full schemes is illustrated in Fig. 3 (taken from [21]).

Fig. 3 Distributions of valence (\(xu_v\), \(xd_v\)), sea (xS) and the gluon (xg) PDFs in HERAPDF1.0 [21] with their total uncertainties at the scale of \(Q^2 = 10\,\text {GeV} ^2\), obtained using the TR\(^\prime \) scheme and compared to the PDFs obtained with the ACOT-Full scheme using the k-factor technique (red). The gluon and the sea distributions are scaled down by a factor of 20

3.2 Electroweak corrections to DIS

Calculations of higher-order electroweak corrections to DIS at HERA are available in HERAFitter in the on-shell scheme. In this scheme, the masses of the gauge bosons \(m_W\) and \(m_Z\) are treated as basic parameters together with the top, Higgs and fermion masses. These electroweak corrections are based on the EPRC package [54]. The code calculates the running of the electromagnetic coupling \(\alpha \) using the most recent parametrisation of the hadronic contribution [55] as well as an older parametrisation from Burkhardt [56].

3.3 Diffractive PDFs

About 10 % of deep inelastic interactions at HERA are diffractive, such that the interacting proton stays intact (\(ep\rightarrow eXp\)). The outgoing proton is separated from the rest of the final hadronic system, X, by a large rapidity gap. Such events are a subset of DIS where the hadronic state X comes from the interaction of the virtual photon with a colour-neutral cluster stripped off the proton [57]. The process can be described analogously to the inclusive DIS, by means of the diffractive parton distributions (DPDFs) [58]. The parametrisation of the colour-neutral exchange in terms of a factorisable ‘hard’ Pomeron and a secondary Reggeon [59], both having a hadron-like partonic structure, has proved remarkably successful in the description of most of the diffractive data. It has also provided a practical method to determine DPDFs from fits to the diffractive cross sections.

In addition to the usual DIS variables x, \(Q^2\), extra kinematic variables are needed to describe the diffractive process. These are the squared four-momentum transfer of the exchanged Pomeron or Reggeon, t, and the mass \(m_X\) of the diffractively produced final state. In practice, the variable \(m_X\) is often replaced by the dimensionless quantity \(\beta =\frac{Q^2}{m_X^2+Q^2-t}\). In models based on a factorisable Pomeron, \(\beta \) may be viewed at LO as the fraction of the Pomeron longitudinal momentum carried by the struck parton, \(x=\beta x_{{IP}}\), where \(x_{{IP}}\) is the fraction of the proton momentum carried by the Pomeron.
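The diffractive kinematics above reduce to two one-line relations; the numerical values in the check are arbitrary illustrative inputs.

```python
def beta_var(q2, mx2, t):
    """beta = Q^2 / (m_X^2 + Q^2 - t)."""
    return q2 / (mx2 + q2 - t)

def x_bjorken(beta, x_pom):
    """LO factorisable-Pomeron relation: x = beta * x_IP."""
    return beta * x_pom
```

Note that t is negative (space-like), so a non-zero t increases the denominator of \(\beta\).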

For the inclusive case, the diffractive cross section reads

$$\begin{aligned} \frac{\mathrm{d}^4\sigma }{\mathrm{d}\beta \,\mathrm{d}Q^2\,\mathrm{d}x_{{IP}}\,\mathrm{d}t} = \frac{2\pi \alpha ^2}{\beta Q^4}\, \left( 1 + (1-y)^2 \right) \overline{\sigma }^{D(4)}(\beta ,Q^2,x_{{IP}},t) \end{aligned}$$
(9)

with the “reduced cross section”:

$$\begin{aligned} \overline{\sigma }^{D(4)} = F_2^{D(4)} - \frac{y^2}{1 + (1-y)^2}\, F_L^{D(4)}. \end{aligned}$$
(10)

The diffractive structure functions can be expressed as convolutions of calculable coefficient functions with the diffractive quark and gluon distribution functions, which in general depend on \(x_{I\!P}\), \(Q^2\), \(\beta \) and t.

The DPDFs [60, 61] in HERAFitter are implemented as a sum of two factorised contributions:

$$\begin{aligned} \varPhi _{IP}(x_{IP},t)\, f^{{IP}}_{a}(\beta ,Q^2) + \varPhi _{I\!R}(x_{IP},t)\, f^{{I\!R}}_{a}(\beta ,Q^2), \end{aligned}$$
(11)

where \(\varPhi (x_{IP},t)\) are the Pomeron and Reggeon fluxes. The Reggeon PDFs, \(f^{{I\!R}}_{a}\), are fixed as those of the pion, while the Pomeron PDFs, \(f^{{IP}}_{a}\), can be obtained from a fit to the data.

3.4 Drell–Yan processes in pp or \(p\bar{p}\) collisions

The Drell–Yan (DY) process provides valuable information about PDFs. In pp and \(p\bar{p}\) scattering, \(Z/\gamma ^*\) and W production probes bilinear combinations of quark densities. Complementary information on the different quark densities can be obtained from the \(W^{\pm }\) asymmetry (sensitive to the d and u distributions and their ratio), the ratio of the W and Z cross sections (sensitive to the flavour composition of the quark sea, in particular to the s-quark distribution), and associated W and Z production with heavy quarks (sensitive to the s-, c- and b-quark densities). Measurements at large boson transverse momentum \(p_T\gtrsim m_{W,Z}\) are potentially sensitive to the gluon distribution [62].

At LO the DY NC cross section triple differential in invariant mass m, boson rapidity y and lepton scattering angle \(\cos \theta \) in the parton centre-of-mass frame can be written as [63, 64]:

$$\begin{aligned} \frac{\mathrm{d}^3\sigma }{\mathrm{d}m\,\mathrm{d}y\,\mathrm{d}\cos \theta } = \frac{\pi \alpha ^2}{3ms}\sum \limits _{q}\hat{\sigma }^{q}(\cos \theta , m) \left[ f_q(x_1,m^2)f_{\bar{q}}(x_2,m^2) + (q\leftrightarrow \bar{q})\right] , \end{aligned}$$
(12)

where s is the squared centre-of-mass beam energy, the parton momentum fractions are given by \(x_{1,2} = \frac{m}{\sqrt{s}}\exp (\pm y)\), \(f_q(x_1,m^2)\) are the PDFs at the scale of the invariant mass, and \(\hat{\sigma }^{q}\) is the parton–parton hard-scattering cross section.
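The LO kinematic relation for the parton momentum fractions can be written directly; the mass, rapidity and beam energy in the check are illustrative values.

```python
import math

def dy_momentum_fractions(m, y, sqrt_s):
    """LO Drell-Yan kinematics: x_{1,2} = (m / sqrt(s)) * exp(+-y)."""
    tau = m / sqrt_s
    return tau * math.exp(y), tau * math.exp(-y)
```

The product \(x_1 x_2 = m^2/s\) is fixed by the invariant mass alone, while the rapidity y controls the asymmetry between the two fractions, which is why forward boson production probes the small-x and large-x regions simultaneously.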

The corresponding triple differential CC cross section has the form

$$\begin{aligned} \frac{\mathrm{d}^3\sigma }{\mathrm{d}m\,\mathrm{d}y\,\mathrm{d}\cos \theta } = \frac{\pi \alpha ^2}{48s\sin ^4\theta _W}\, \frac{m^3(1-\cos \theta )^2}{(m^2-m_W^2)^2+\varGamma _W^2m_W^2}\, \sum _{q_1,q_2}V_{q_1q_2}^2f_{q_1}(x_1,m^2)f_{q_2}(x_2,m^2), \end{aligned}$$
(13)

where \(V_{q_1q_2}\) is the Cabibbo–Kobayashi–Maskawa (CKM) quark mixing matrix and \(m_W\) and \(\varGamma _W\) are the W boson mass and decay width, respectively.

The simple LO form of these expressions allows for analytic calculation of integrated cross sections. In both the NC and CC expressions the PDFs depend only on the boson rapidity y and invariant mass m, while the integral over \(\cos \theta \) can be evaluated analytically even for the case of realistic kinematic cuts.

Beyond LO, the calculations are often time-consuming and Monte Carlo generators are employed. Currently, the predictions for W and \(Z/\gamma ^*\) production are available up to NNLO and the predictions for W and Z production in association with heavy-flavour quarks are available to NLO.

There are several possibilities to obtain the theoretical predictions for DY production in HERAFitter. The NLO and NNLO calculations can be implemented using k-factor or fast grid techniques (see Sect. 4 for details), which are interfaced to programs such as MCFM [65–67], available for NLO calculations, or FEWZ [68] and DYNNLO [69] for NLO and NNLO, with electroweak corrections estimated using MCSANC [70, 71].

3.5 Jet production in ep and pp or \(p \bar{p}\) collisions

The cross section for production of high \(p_T\) hadronic jets is sensitive to the high-x gluon PDF (see, e.g., Ref. [16]). Therefore this process can be used to improve the determination of the gluon PDF, which is particularly important for Higgs production and searches for new physics. Jet production cross sections are currently known only to NLO. Calculations for higher-order contributions to jet production in pp collisions are in progress [7274]. Within HERAFitter, the NLOJet++ program [75, 76] may be used for calculations of jet production. Similarly to the DY case, the calculation is very demanding in terms of computing power. Therefore fast grid techniques are used to facilitate the QCD analyses including jet cross section measurements in ep, pp and \(p\bar{p}\) collisions. For details see Sect. 4.

3.6 Top-quark production in pp or \(p \bar{p}\) collisions

At the LHC, top-quark pairs (\(t \bar{t}\)) are produced dominantly via gg fusion. Thus, LHC measurements of the \(t \bar{t}\) cross section provide additional constraints on the gluon distribution at medium to high values of x, on \(\alpha _{\mathrm {s}} \) and on the top-quark mass, \(m_t\) [77]. Precise predictions for the total inclusive \(t \bar{t}\) cross section are available up to NNLO [78] and they can be computed within HERAFitter via an interface to the program HATHOR [79].

Fixed-order QCD predictions for the differential \(t \bar{t}\) cross section at NLO can be obtained by using the program MCFM [67, 80–83] interfaced to HERAFitter with fast grid techniques.

Single top quarks are produced via electroweak boson exchange, and the measurement of their production cross section can be used, for example, to probe the ratio of the u and d distributions in the proton as well as the b-quark PDF. Predictions for single-top production are available at NLO accuracy by using MCFM.

Approximate predictions up to NNLO in QCD for the differential \(t\bar{t}\) cross section in one-particle inclusive kinematics are available in HERAFitter through an interface to the program DiffTop [84, 85]. It uses methods of QCD threshold resummation beyond the leading logarithmic approximation. This allows users to estimate the impact of the recent \(t\bar{t}\) differential cross section measurements on the uncertainty of the gluon density within a QCD PDF fit at NNLO. A fast evaluation of the DiffTop differential cross sections is possible via an interface to fast grid computations [86].

4 Computational techniques

Precise measurements require accurate theoretical predictions in order to maximise their impact in PDF fits. Perturbative calculations become more complex and time-consuming at higher orders due to the increasing number of relevant Feynman diagrams. The direct inclusion of computationally demanding higher-order calculations into iterative fits is thus not possible currently. However, a full repetition of the perturbative calculation for small changes in input parameters is not necessary at each step of the iteration. Two methods have been developed which take advantage of this to solve the problem: the k-factor technique and the fast grid technique. Both are available in HERAFitter.

4.1 k-factor technique

The k-factors are defined as the ratio of the prediction of a higher-order (slow) pQCD calculation to a lower-order (fast) calculation using the same PDF. Because the k-factors depend on the phase space probed by the measurement, they have to be stored including their dependence on the relevant kinematic variables. Before the start of a fitting procedure, a table of k-factors is computed once for a fixed PDF with the time-consuming higher-order code. In subsequent iteration steps the theory prediction is derived from the fast lower-order calculation by multiplying by the pre-tabulated k-factors.

This procedure, however, neglects the fact that the k-factors are PDF dependent and, as a consequence, they have to be re-evaluated for the newly determined PDF at the end of the fit as a consistency check. The fit must be repeated until input and output k-factors have converged. In summary, this technique avoids iterating the higher-order calculation at each fit step, but typically still requires a few re-evaluations.
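The iteration just described can be sketched schematically. All callables below are stand-ins for the actual codes: the slow higher-order prediction, the fast lower-order prediction, and the inner fit that minimises against the data.

```python
def fit_with_kfactors(lo_prediction, nlo_prediction, fit_step, pdf0,
                      tol=1e-6, max_iter=10):
    """Schematic k-factor iteration: compute k = NLO/LO for the current PDF
    (slow, once per outer iteration), refit using the fast LO * k theory,
    and repeat until the k-factor stabilises."""
    pdf = pdf0
    k_old = None
    for _ in range(max_iter):
        k = nlo_prediction(pdf) / lo_prediction(pdf)    # expensive step
        pdf = fit_step(lambda p: lo_prediction(p) * k)  # fast inner fit
        if k_old is not None and abs(k - k_old) < tol:
            break
        k_old = k
    return pdf, k
```

In this one-parameter toy the k-factor is exactly PDF independent, so the loop converges after the second pass; in a real fit the residual PDF dependence of k is what drives the extra iterations.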

In HERAFitter, the k-factor technique can also be used for the fast computation of the time-consuming GM-VFN schemes for heavy quarks in DIS. “FAST” heavy-flavour schemes are implemented with k-factors defined as the ratio of calculations at the same perturbative order but for massive vs. massless quarks, e.g. NLO (massive)/NLO (massless). These k-factors are calculated only for the starting PDF and hence, the “FAST” heavy-flavour schemes should only be used for quick checks. Full heavy-flavour schemes should be used by default. However, for the ACOT scheme, due to exceptionally long computation times, the k-factors are used in the default setup of HERAFitter.

4.2 Fast grid techniques

Fast grid techniques exploit the fact that iterative PDF fitting procedures do not impose completely arbitrary changes to the types and shapes of the parametrised functions that represent each PDF. Instead, it can be assumed that a generic PDF can be approximated by a set of interpolating functions with a sufficient number of judiciously chosen support points. The accuracy of this approximation is checked and optimised such that the approximation bias is negligibly small compared to the experimental and theoretical accuracy. This method can be used to perform the time-consuming higher-order calculations (Eq. 1) only once for the set of interpolating functions. Further iterations of the calculation for a particular PDF set are fast, involving only sums over the set of interpolators multiplied by factors depending on the PDF. This approach can be used to calculate the cross sections of processes involving one or two hadrons in the initial state and to assess their renormalisation and factorisation scale variation.

This technique serves to facilitate the inclusion of time-consuming NLO jet cross section predictions into PDF fits and has been implemented in the two projects, fastNLO [87, 88] and APPLGRID [89, 90]. The packages differ in their interpolation and optimisation strategies, but both of them construct tables with grids for each bin of an observable in two steps: in the first step, the accessible phase space in the parton momentum fractions x and the renormalisation and factorisation scales \(\mu _{\mathrm {R}}\) and \(\mu _{\mathrm {F}}\) is explored in order to optimise the table size. In the second step the grid is filled for the requested observables. Higher-order cross sections can then be obtained very efficiently from the pre-produced grids while varying externally provided PDF sets, \(\mu _{\mathrm {R}}\) and \(\mu _{\mathrm {F}}\), or \(\alpha _s(\mu _R)\). This approach can in principle be extended to arbitrary processes. This requires an interface between the higher-order theory programs and the fast interpolation frameworks. For the HERAFitter implementations of the two packages, the evaluation of \(\alpha _s\) is done consistently with the PDF evolution code. A brief description of each package is given below:
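The core idea, precompute quadrature-times-matrix-element weights once and then evaluate any new PDF as a dot product, can be sketched for a single-variable toy integral. Real grids (fastNLO, APPLGRID) additionally interpolate in the scales, use optimised x-node spacings and store one weight table per perturbative channel; none of that is attempted here.

```python
import numpy as np

def make_grid_weights(x_nodes, hard_xsec):
    """One-time 'grid fill': trapezoid quadrature weights times the
    (expensive) hard cross section on the nodes, so that later
    sigma ~ sum_j w_j * f(x_j) for any PDF f."""
    h = x_nodes[1] - x_nodes[0]            # uniform node spacing assumed
    quad = np.full_like(x_nodes, h)
    quad[0] = quad[-1] = h / 2.0           # trapezoid-rule end weights
    return quad * hard_xsec(x_nodes)

def fast_xsec(weights, pdf, x_nodes):
    """Fast re-evaluation for a new PDF: a dot product, no re-integration."""
    return float(np.dot(weights, pdf(x_nodes)))
```

Once the weights exist, changing the PDF (or, in the real packages, \(\alpha_s\) and the scales) costs only the dot product, which is what makes grid-based theory predictions cheap enough for iterative \(\chi^2\) minimisation.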

  • The fastNLO project [88] has been interfaced to the NLOJet++ program [75] for the calculation of jet production in DIS [91] as well as 2- and 3-jet production in hadron–hadron collisions at NLO [76, 92]. Threshold corrections at 2-loop order, which approximate NNLO for the inclusive jet cross section for pp and \(p\bar{p}\), have also been included into the framework [93] following Ref. [94]. The latest version of the fastNLO convolution program [95] allows for the creation of tables in which renormalisation and factorisation scales can be varied as a function of two predefined observables, e.g. jet transverse momentum \(p_{\perp }\) and Q for DIS. Recently, the differential calculation of top-pair production in hadron collisions at approximate NNLO [84] has been interfaced to fastNLO [86]. The fastNLO code is available online [96]. Jet cross section grids computed for the kinematics of various experiments can be downloaded from this site. The fastNLO libraries and tables with theory predictions for comparison to particular cross section measurements are included in the HERAFitter package. The interface to the fastNLO tables from within HERAFitter was used in a recent CMS analysis, where the impact on extraction of the PDFs from the inclusive jet cross section is investigated [97].

  • In the APPLGRID package [90, 98], in addition to jet cross sections for \(pp(p\bar{p})\) and DIS processes, calculations of DY production and other processes are also implemented using an interface to the standard MCFM parton level generator [65–67]. Variation of the renormalisation and factorisation scales is possible a posteriori, when calculating theory predictions with the APPLGRID tables, and independent variation of \(\alpha _S\) is also allowed. For predictions beyond NLO, the k-factor technique can also be applied within the APPLGRID framework. As an example, the HERAFitter interface to APPLGRID was used by the ATLAS [99] and CMS [100] Collaborations to extract the strange quark distribution of the proton. The ATLAS strange PDF extracted employing these techniques is displayed in Fig. 4 together with a comparison to the global PDF sets CT10 [17] and NNPDF2.1 [18] (taken from [99]).

Fig. 4
figure 4

The strange antiquark distribution versus x for the ATLAS epWZ free \(\bar{s}\) NNLO fit [99] (magenta band) compared to predictions from NNPDF2.1 (blue hatched) and CT10 (green hatched) at \(Q^2 = 1.9\ \,\text {GeV} ^2\). The ATLAS fit was performed using a k-factor approach for NNLO corrections

5 Fit methodology

When performing a QCD analysis to determine PDFs, various assumptions and choices must be made concerning, for example, the functional form of the input parametrisation, the treatment of heavy quarks and their mass values, alternative theoretical calculations, alternative representations of the fit \(\chi ^2\) and different ways of treating correlated systematic uncertainties. It is useful to discriminate or quantify the effect of a chosen ansatz within a common framework, and HERAFitter is optimally designed for such tests. The methodology employed by HERAFitter relies on a flexible and modular framework that allows independent integration of state-of-the-art techniques, either related to the inclusion of a new theoretical calculation, or of new approaches to the treatment of the data and their uncertainties.

In this section we describe the available options for the fit methodology in HERAFitter. In addition, as an alternative approach to a complete QCD fit, the Bayesian reweighting method, which is also available in HERAFitter, is described.

5.1 Functional forms for PDF parametrisation

Careful consideration must be given to the freedom allowed to the PDFs through the choice of functional form. The PDFs can be parametrised using several predefined functional forms and flavour decompositions, as described briefly below. The choice of functional form can lead to a different shape for the PDF distributions, and consequently the size of the PDF uncertainties can depend on the flexibility of the parametric choice.

Standard polynomials: The standard-polynomial form is the most commonly used. A polynomial functional form is used to parametrise the x-dependence of the PDFs, where the index j denotes each parametrised PDF flavour:

$$\begin{aligned} xf_j(x) = A_j x^{B_j} (1-x)^{C_j} P_j(x). \end{aligned}$$
(14)

The parametrised PDFs are the valence distributions \(xu_v\) and \(xd_v\), the gluon distribution xg, and the light sea quark distributions, \(x\bar{u}\), \(x\bar{d}\), \(x\bar{s}\), at the starting scale, which is chosen below the charm mass threshold. The form of the polynomials \(P_j(x)\) can be varied. The form \((1 + \epsilon _j \sqrt{x} + D_j x + E_j x^2)\) is used for the HERAPDF [21] with additional constraints relating to the flavour decomposition of the light sea. This parametrisation is termed HERAPDF-style. The polynomial can also be parametrised in the CTEQ-style, where \(P_j(x)\) takes the form \(e^{a_3x} (1 + e^{a_4} x + e^{a_5} x^2)\) and, in contrast to the HERAPDF-style, this is positive by construction. QCD number and momentum sum rules are used to determine the normalisations A for the valence and gluon distributions, and the sum-rule integrals are solved analytically.
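For the HERAPDF-style form the sum-rule integrals reduce to Euler Beta functions, which is why the normalisations can be fixed analytically. A minimal sketch for the valence number sum rule follows, with invented parameter values:

```python
import numpy as np
from math import gamma

def beta(a, b):
    # Euler Beta function, B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)
    return gamma(a) * gamma(b) / gamma(a + b)

# Toy valence parametrisation x u_v(x) = A x^B (1-x)^C (1 + D x + E x^2)
# (parameter values invented for illustration).
B, C, D, E = 0.7, 4.0, 1.5, 9.0

# Number sum rule \int_0^1 u_v(x) dx = 2, solved analytically for A:
#   \int_0^1 x^(B-1) (1-x)^C (1 + D x + E x^2) dx
#     = B(B, C+1) + D * B(B+1, C+1) + E * B(B+2, C+1)
A = 2.0 / (beta(B, C + 1) + D * beta(B + 1, C + 1) + E * beta(B + 2, C + 1))

def xuv(x):
    return A * x ** B * (1 - x) ** C * (1 + D * x + E * x ** 2)
```

The same Beta-function expansion applies to the momentum sum rule, with each Beta argument shifted by one unit in the first slot.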

Bi-Log-normal distributions: This parametrisation is motivated by multi-particle statistics and has the following functional form:

$$\begin{aligned} xf_j(x)=a_j x^{p_j-b_j\log (x)}(1-x)^{q_j-d_j \log (1-x)}. \end{aligned}$$
(15)

This function can be regarded as a generalisation of the standard polynomial form described above; however, numerical integration of Eq. 15 is required in order to impose the QCD sum rules.
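The numerical normalisation can be sketched as follows (a toy single flavour with invented parameter values; the momentum fraction assigned to it is equally hypothetical):

```python
import numpy as np

# Toy bi-log-normal flavour (Eq. 15); parameter values are invented.
#   x f(x) = a * x^(p - b*log x) * (1-x)^(q - d*log(1-x))
p, b, q, d = 0.5, 0.05, 3.0, 0.1

def shape(x):
    return x ** (p - b * np.log(x)) * (1 - x) ** (q - d * np.log(1 - x))

# No closed form exists for the sum-rule integrals, so the normalisation
# a is fixed numerically, here e.g. to a chosen toy momentum fraction.
x = np.linspace(1e-6, 1 - 1e-6, 400001)
s = shape(x)
momentum = float(np.dot(np.diff(x), 0.5 * (s[1:] + s[:-1])))
target = 0.35
a = target / momentum

def xf(x):
    return a * shape(x)
```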

Chebyshev polynomials: A flexible parametrisation based on the Chebyshev polynomials can be employed for the gluon and sea distributions. Polynomials with argument \(\log (x)\) are considered for better modelling the low-x asymptotic behaviour of those PDFs. The polynomials are multiplied by a factor of \((1-x)\) to ensure that they vanish as \(x\rightarrow 1\). The resulting parametric form reads

$$\begin{aligned} x g(x)= & {} A_g \left( 1-x\right) \sum _{i=0}^{N_g-1} A_{g_i} T_i \left( -\frac{\textstyle 2\log x - \log x_\mathrm{min} }{\textstyle \log x_\mathrm{min} } \right) , \nonumber \\\end{aligned}$$
(16)
$$\begin{aligned} x S(x)= & {} \left( 1-x\right) \sum _{i=0}^{N_S-1} A_{S_i} T_i \left( -\frac{\textstyle 2\log x - \log x_\mathrm{min} }{\textstyle \log x_\mathrm{min} } \right) , \end{aligned}$$
(17)

where \(T_i\) are first-type Chebyshev polynomials of order i. The normalisation factor \(A_g\) is derived from the momentum sum rule analytically. Values of \(N_{g,S}\) up to 15 are allowed; however, already for \(N_{g,S} \ge 5\) the fit quality is similar to that of the standard-polynomial parametrisation with a similar number of free parameters [101].
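The scaled argument in Eqs. 16–17 maps \([x_\mathrm{min}, 1]\) linearly in \(\log x\) onto the Chebyshev domain \([-1, 1]\). A small sketch with invented coefficients, using NumPy's Chebyshev evaluation:

```python
import numpy as np
from numpy.polynomial.chebyshev import chebval

x_min = 1e-5
A_g = 1.0
coeffs = [1.0, -0.3, 0.1, -0.05, 0.02]   # A_{g_0} ... A_{g_4}, invented

def t(x):
    # Maps x in [x_min, 1] onto [-1, 1], linear in log x:
    # t(x_min) = -1 and t(1) = +1.
    return -(2.0 * np.log(x) - np.log(x_min)) / np.log(x_min)

def xg(x):
    # Eq. 16: the (1-x) factor makes the gluon vanish as x -> 1.
    return A_g * (1.0 - x) * chebval(t(x), coeffs)
```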

External PDFs: HERAFitter also provides the possibility to access external PDF sets, which can be used to compute theoretical predictions for the cross sections for all the processes available in HERAFitter. This is possible via an interface to LHAPDF  [33, 34] providing access to the global PDF sets. HERAFitter also allows one to evolve PDFs from LHAPDF using QCDNUM. Figure 5 illustrates a comparison of various gluon PDFs accessed from LHAPDF as produced with the drawing tools available in HERAFitter.

Fig. 5
figure 5

The gluon PDF as extracted by various groups at the scale of \(Q^2=4 \ \,\text {GeV} ^2\), plotted using the drawing tools from HERAFitter

5.2 Representation of \(\chi ^2\)

The PDF parameters are determined in HERAFitter by minimisation of a \(\chi ^2\) function taking into account correlated and uncorrelated measurement uncertainties. There are various forms of \(\chi ^2\), e.g. using a covariance matrix or using nuisance parameters to encode the dependence of each measured data point on each source of correlated systematic uncertainty. The options available in HERAFitter are the following:

  • Covariance matrix representation: For a data point \(\mu _i\) with a corresponding theory prediction \(m_i\), the \(\chi ^2\) function can be expressed in the following form:

    $$\begin{aligned} \chi ^2 (m)= & {} \sum _{i,k}(m_i-\mu _i)C^{-1}_{ik}(m_k-\mu _k), \end{aligned}$$
    (18)

    where the experimental uncertainties are given as a covariance matrix \(C_{ik}\) for measurements in bins i and k. The covariance matrix \(C_{ik}\) is given by a sum of statistical, uncorrelated and correlated systematic contributions:

    $$\begin{aligned} C_{ik}= & {} C^\mathrm{stat}_{ik}+C^\mathrm{uncor}_{ik}+C^\mathrm{sys}_{ik}. \end{aligned}$$
    (19)

    Using this representation one cannot distinguish the effect of each source of systematic uncertainty.

  • Nuisance parameter representation: In this case, \(\chi ^2\) is expressed as

    $$\begin{aligned} \chi ^2\left( \varvec{m},\varvec{b}\right)= & {} \sum _i \frac{\left[ {\mu _i} - m_i \left( 1 - \sum _j \gamma ^i_j b_j \right) \right] ^2}{ \textstyle \delta ^2_{i,\mathrm{unc}}m_i^2 + \delta ^2_{i,\mathrm{stat}}\, {\mu _i} m_i \left( 1 - \sum _j \gamma ^i_j b_j\right) } \nonumber \\&+\, \sum _j b^2_j, \end{aligned}$$
    (20)

    where \(\delta _{i,\mathrm{stat}}\) and \(\delta _{i,\mathrm{unc}}\) are relative statistical and uncorrelated systematic uncertainties of the measurement i. Further, \(\gamma ^i_j\) quantifies the sensitivity of the measurement to the correlated systematic source j. The function \(\chi ^2\) depends on the set of systematic nuisance parameters \(b_j\). This definition of the \(\chi ^2\) function assumes that systematic uncertainties are proportional to the central prediction values (multiplicative uncertainties, \(m_i(1-\sum _j\gamma _j^ib_j)\)), whereas the statistical uncertainties scale with the square root of the expected number of events. However, additive treatment of uncertainties is also possible in HERAFitter. During the \(\chi ^2\) minimisation, the nuisance parameters \(b_j\) and the PDFs are determined, such that the effect of different sources of systematic uncertainties can be distinguished.

  • Mixed form representation: In some cases, the statistical and systematic uncertainties of experimental data are provided in different forms. For example, the correlated experimental systematic uncertainties are available as nuisance parameters, but the bin-to-bin statistical correlations are given in the form of a covariance matrix. HERAFitter offers the possibility to include such mixed forms of information.
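Both representations can be sketched with toy numbers (all values below are invented). The nuisance-parameter form additionally yields the fitted shift of each systematic source:

```python
import numpy as np

# --- Covariance form, Eq. 18, with C built as in Eq. 19 ---
mu = np.array([100.0, 80.0, 60.0])        # toy measurements
m = np.array([104.0, 83.0, 62.0])         # toy theory predictions

C_stat = np.diag([4.0, 3.0, 2.5])
C_uncor = np.diag([1.0, 1.0, 1.0])
s = 0.02 * mu                              # one fully correlated 2% source
C = C_stat + C_uncor + np.outer(s, s)
r = m - mu
chi2_cov = float(r @ np.linalg.solve(C, r))   # r^T C^{-1} r

# --- Nuisance-parameter form, Eq. 20, single correlated source ---
gamma = np.array([0.02, 0.02, 0.02])       # relative correlated uncertainty
d_stat = np.array([0.03, 0.03, 0.04])      # relative statistical
d_unc = np.array([0.01, 0.01, 0.01])       # relative uncorrelated

def chi2_nuis(b):
    shifted = m * (1.0 - gamma * b)        # multiplicative systematic shift
    var = (d_unc * m) ** 2 + d_stat ** 2 * mu * shifted
    return float(np.sum((mu - shifted) ** 2 / var) + b ** 2)

# Coarse scan over the nuisance parameter; the fitted b indicates how far
# the correlated source is pulled (in units of its one-sigma variation).
bs = np.linspace(-3.0, 3.0, 601)
b_min = bs[int(np.argmin([chi2_nuis(b) for b in bs]))]
```

In the toy setup the theory lies above the data, so the minimisation pulls the correlated source by a positive amount of order one sigma, at the price of the \(b^2\) penalty term.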

Any source of measured systematic uncertainty can be treated as additive or multiplicative, as described above. The statistical uncertainties can be included as additive or following Poisson statistics. Minimisation with respect to the nuisance parameters is performed analytically; however, for more detailed studies of correlations, individual nuisance parameters can be included in the MINUIT minimisation.

5.3 Treatment of the experimental uncertainties

Three distinct methods for propagating experimental uncertainties to PDFs are implemented in HERAFitter and reviewed here: the Hessian, Offset and Monte Carlo methods.

  • Hessian (Eigenvector) method: The PDF uncertainties reflecting the experimental uncertainties of the data are estimated by examining the shape of the \(\chi ^2\) function in the neighbourhood of the minimum [102]. Following the approach of Ref. [102], the Hessian matrix is defined by the second derivatives of \(\chi ^2\) with respect to the fitted PDF parameters. The matrix is diagonalised and the Hessian eigenvectors are computed. Due to their orthogonality, these vectors correspond to independent sources of uncertainty in the obtained PDFs.

  • Offset method: The Offset method [103] uses the \(\chi ^2\) function for the central fit, but only uncorrelated uncertainties are taken into account. The goodness of the fit can no longer be judged from the \(\chi ^2\) since correlated uncertainties are ignored. The correlated uncertainties are propagated into the PDF uncertainties by performing variants of the fit with the experimental data varied by \(\pm 1 \sigma \) from the central value for each systematic source. The resulting deviations of the PDF parameters from the ones obtained in the central fit are statistically independent, and they can be combined in quadrature to derive a total PDF systematic uncertainty. The uncertainties estimated by the offset method are generally larger than those from the Hessian method.

  • Monte Carlo method: The Monte Carlo (MC) technique [104, 105] can also be used to determine PDF uncertainties. The uncertainties are estimated using pseudo-data replicas (typically \({>}100\)) randomly generated from the measurement central values and their systematic and statistical uncertainties taking into account all point-to-point correlations. The QCD fit is performed for each replica and the PDF central values and their experimental uncertainties are estimated from the distribution of the PDF parameters obtained in these fits, by taking the mean values and standard deviations over the replicas.

    The MC method has been checked against the standard error estimation of the PDF uncertainties obtained by the Hessian method. Good agreement was found between the methods provided that Gaussian distributions of statistical and systematic uncertainties are assumed in the MC approach [32]. A comparison is illustrated in Fig. 6. Similar findings were reported by the MSTW global analysis [106].

    Since the MC method requires a large number of replicas, the eigenvector representation is a more convenient way to store the PDF uncertainties. It is possible to transform the MC representation into the eigenvector representation, as shown in Ref. [107]. Tools to perform this transformation are provided with HERAFitter and were recently employed for the representation of correlated sets of PDFs at different perturbative orders [108].
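The replica procedure can be sketched with a toy one-parameter fit (all inputs invented): fluctuate the data according to its correlated and uncorrelated errors, fit each replica, and read off the parameter spread.

```python
import numpy as np

# Toy MC replica propagation: pseudo-data replicas with correlated and
# uncorrelated Gaussian fluctuations, fitted one at a time.
rng = np.random.default_rng(1)
x = np.linspace(0.1, 1.0, 10)
data = 2.0 * x                             # toy central values, slope 2
d_unc = 0.05 * np.ones_like(x)             # absolute uncorrelated errors
d_cor = 0.03 * data                        # one fully correlated 3% source

fits = []
for _ in range(500):
    shift = rng.normal() * d_cor           # same fluctuation for all points
    replica = data + shift + rng.normal(size=x.size) * d_unc
    # analytic least-squares fit of the slope s in y = s * x
    fits.append(np.sum(x * replica) / np.sum(x * x))

slope = float(np.mean(fits))               # central value from replicas
slope_err = float(np.std(fits))            # uncertainty from replica spread
```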

Fig. 6
figure 6

Comparison between the standard error calculations as employed by the Hessian approach (black lines) and the MC approach (with more than 100 replicas) assuming Gaussian distributions for the uncertainties, shown for each replica (green lines) together with the evaluated standard deviation (red lines) [32]. The black and red lines are superimposed; the agreement between the methods is so good that they are hard to distinguish

The nuisance parameter representation of \(\chi ^2\) in Eq. 20 is derived assuming symmetric experimental errors, however, the published systematic uncertainties are often asymmetric. HERAFitter provides the possibility to use asymmetric systematic uncertainties. The implementation relies on the assumption that asymmetric uncertainties can be described by a parabolic function. The nuisance parameter in Eq. 20 is modified as follows:

$$\begin{aligned} \gamma ^i_{j} \rightarrow \omega ^i_{j}b_j + \gamma ^i_{j}, \end{aligned}$$
(21)

where the coefficients \(\omega ^i_{j}\), \(\gamma ^i_{j}\) are defined from the maximum and minimum shifts of the cross sections due to a variation of the systematic uncertainty j, \(S_{ij}^{\pm }\),

$$\begin{aligned} \omega ^i_{j}=\frac{1}{2}\left( S_{ij}^{+}+S_{ij}^{-}\right) ,\quad \gamma ^i_{j}=\frac{1}{2}\left( S_{ij}^{+}-S_{ij}^{-}\right) . \end{aligned}$$
(22)
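With the definitions of Eq. 22, the effective shift entering the \(\chi^2\), \((\omega^i_j b_j + \gamma^i_j)\, b_j\), reproduces the published shifts at \(b_j = \pm 1\) by construction. A sketch with invented shift values:

```python
import numpy as np

# Toy asymmetric shifts S^+/- for one systematic source over three bins.
S_plus = np.array([0.05, 0.03, 0.04])      # +1 sigma relative shifts
S_minus = np.array([-0.03, -0.04, -0.01])  # -1 sigma relative shifts

omega = 0.5 * (S_plus + S_minus)           # Eq. 22
gamma = 0.5 * (S_plus - S_minus)

def shift(b):
    # Eq. 21 replaces gamma by (omega * b + gamma); the shift entering
    # the chi2 is then quadratic in the nuisance parameter b.
    return (omega * b + gamma) * b
```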

5.4 Treatment of the theoretical input

The results of a QCD fit depend not only on the input data but also on the input parameters used in the theoretical calculations. Nowadays, PDF groups address the impact of the choices of theoretical parameters by providing alternative PDFs with different choices of the mass of the charm quark, \(m_c\), the mass of the bottom quark, \(m_b\), and the value of \(\alpha _{\mathrm {s}}(m_Z) \). Other important aspects are the choice of the functional form for the PDFs at the starting scale and the value of the starting scale itself. HERAFitter allows the user to vary all of these inputs.

5.5 Bayesian reweighting techniques

As an alternative to performing a full QCD fit, HERAFitter allows the user to assess the impact of including new data in an existing fit using the Bayesian Reweighting technique. The method provides a fast estimate of the impact of new data on PDFs. Bayesian Reweighting was first proposed for PDF sets delivered in the form of MC replicas in Ref. [104] and further developed by the NNPDF Collaboration [109, 110]. More recently, a method to perform Bayesian Reweighting studies starting from PDF fits for which uncertainties are provided in the eigenvector representation has also been developed [106]. The latter is based on generating replica sets by introducing Gaussian fluctuations on the central PDF set with a variance determined by the PDF uncertainty given by the eigenvectors. Both reweighting methods are implemented in HERAFitter. Note that the precise form of the weights used by both methods has recently been questioned [111, 112].

The Bayesian Reweighting technique relies on the fact that MC replicas of a PDF set give a representation of the probability distribution in the space of PDFs. In particular, the PDFs are represented as ensembles of \(N_\mathrm{rep}\) equiprobable (i.e. having weights equal to unity) replicas, \(\{f\}\). The central value for a given observable, \(\mathscr {O}(\{f\})\), is computed as the average of the predictions obtained from the ensemble as

$$\begin{aligned} \langle \mathscr {O}(\{f\})\rangle = \frac{1}{N_{\mathrm {rep}}} \sum _{k=1}^{N_{\mathrm {rep}}} \mathscr {O}(f^{k}), \end{aligned}$$
(23)

and the uncertainty as the standard deviation of the sample.

Upon inclusion of new data the prior probability distribution, given by the original PDF set, is modified according to the Bayes Theorem such that the weight of each replica, \(w_k\), is updated according to

$$\begin{aligned} w_k = \frac{(\chi ^2_k)^{\frac{1}{2} (N_{\mathrm {data}}-1) } e^{-\frac{1}{2}\chi ^2_k}}{ \frac{1}{N_{\mathrm {rep}}} \sum ^{N_{\mathrm {rep}}}_{k=1}(\chi ^2_k)^{\frac{1}{2}(N_{\mathrm {data}}-1)} e^{-\frac{1}{2}\chi ^2_k} }, \end{aligned}$$
(24)

where \(N_{\mathrm {data}}\) is the number of new data points, k denotes the specific replica for which the weight is calculated and \(\chi ^2_k\) is the \(\chi ^2\) of the new data obtained using the kth PDF replica. Given a PDF set and a corresponding set of weights, which describes the impact of the inclusion of new data, the prediction for a given observable after inclusion of the new data can be computed as the weighted average,

$$\begin{aligned} \langle \mathscr {O}(\{f\})\rangle = \frac{1}{N_{\mathrm {rep}}} \sum _{k=1}^{N_{\mathrm {rep}}} w_k \mathscr {O}(f^{k}). \end{aligned}$$
(25)

To simplify the use of a reweighted set, an unweighted set (i.e. a set of equiprobable replicas which incorporates the information contained in the weights) is generated according to the unweighting procedure described in [109]. The number of effective replicas of a reweighted set is measured by its Shannon Entropy [110],

$$\begin{aligned} N_{\mathrm {eff}}\equiv \exp \left\{ \frac{1}{N_{\mathrm {rep}}}\sum _{k=1}^{N_{\mathrm {rep}}}w_k\ln (N_{\mathrm {rep}}/w_k)\right\} , \end{aligned}$$
(26)

which corresponds to the size of a refitted equiprobable replica set containing the same amount of information. This number of effective replicas, \(N_{\mathrm {eff}}\), gives an indicative measure of the optimal size of an unweighted replica set produced with the reweighting/unweighting procedure. No extra information is gained by producing a final unweighted set that has a number of replicas (significantly) larger than \(N_{\mathrm {eff}}\). If \(N_{\mathrm {eff}}\) is much smaller than the original number of replicas, the new data have a large impact; however, the reweighted set is then unreliable and a full refit should be performed instead.
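Eqs. 24–26 can be sketched directly with toy \(\chi^2\) values and observables (evaluating the weights in log space avoids overflow for large \(N_{\mathrm{data}}\)):

```python
import numpy as np

# Toy reweighting: invented chi2 of the new data and observable values,
# one per replica.
rng = np.random.default_rng(7)
n_rep, n_data = 1000, 20
chi2 = rng.chisquare(n_data, size=n_rep)
obs = rng.normal(1.0, 0.1, size=n_rep)

# Eq. 24 in log space, normalised so that (1/N_rep) sum_k w_k = 1.
log_w = 0.5 * (n_data - 1) * np.log(chi2) - 0.5 * chi2
w = np.exp(log_w - log_w.max())
w *= n_rep / w.sum()

mean_before = float(np.mean(obs))          # Eq. 23, unit weights
mean_after = float(np.mean(w * obs))       # Eq. 25, weighted average

# Eq. 26: effective number of replicas from the Shannon entropy.
n_eff = float(np.exp(np.mean(w * np.log(n_rep / w))))
```

With all weights equal to unity, \(N_{\mathrm{eff}} = N_{\mathrm{rep}}\); any non-trivial reweighting lowers it.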

6 Alternatives to DGLAP formalism

QCD calculations based on the DGLAP [11–15] evolution equations are very successful in describing all relevant hard-scattering data in the perturbative region \(Q^2 \gtrsim \) few \( \,\text {GeV} ^2\). At small x (\(x < 0.01\)) and small \(Q^2\), DGLAP dynamics may be modified by saturation and other (non-perturbative) higher-twist effects. Various approaches alternative to the DGLAP formalism can be used to analyse DIS data in HERAFitter. These include several dipole models and the use of transverse-momentum dependent, or unintegrated, PDFs (uPDFs).

6.1 Dipole models

The dipole picture provides an alternative approach to proton–virtual photon scattering at low x which can be applied to both inclusive and diffractive processes. In this approach, the virtual photon fluctuates into a \(q\bar{q}\) (or \(q\bar{q} g\)) dipole which interacts with the proton [113, 114]. The dipoles can be considered as quasi-stable quantum mechanical states, which have a very long lifetime \({\propto }1/(m_p x)\) and a size which is not changed by scattering with the proton. The dynamics of the interaction are embedded in a dipole scattering amplitude.

Several dipole models, which show different behaviours of the dipole–proton cross section, are implemented in HERAFitter: the Golec-Biernat–Wüsthoff (GBW) dipole saturation model [28], a modified GBW model which takes into account the effects of DGLAP evolution, termed the Bartels–Golec–Kowalski (BGK) dipole model [30] and the colour glass condensate approach to the high parton density regime, named the Iancu–Itakura–Munier (IIM) dipole model [29].

GBW model: In the GBW model the dipole–proton cross section \(\sigma _\mathrm{{dip}}\) is given by

$$\begin{aligned} \sigma _\mathrm{{dip}}(x,r^{2}) = \sigma _{0} \left( 1 - \exp \left[ -\frac{r^{2}}{4R_{0}^{2}(x)} \right] \right) , \end{aligned}$$
(27)

where r corresponds to the transverse separation between the quark and the antiquark, and \(R_{0}^{2}\) is an x-dependent scale parameter which represents the spacing of the gluons in the proton. \(R_{0}^{2}\), called the saturation radius, takes the form \(R_0^2(x) = (x/x_0)^{\lambda }\ \text {GeV}^{-2}\). The cross-section normalisation \(\sigma _0\) and the parameters \(x_0\) and \(\lambda \) are fitted to the DIS data. This model gives exact Bjorken scaling when the dipole size r is small.
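The two limits of Eq. 27, colour transparency (\(\sigma \propto r^2\)) at small r and saturation at \(\sigma_0\) for large r, are easy to exhibit. The parameter values below are merely of the magnitude familiar from GBW-type fits and serve only for illustration:

```python
import numpy as np

# Toy GBW parameter values (illustrative only, not a fit result).
sigma0 = 23.0          # mb
x0, lam = 3.0e-4, 0.29

def R0_sq(x):
    # saturation radius squared in GeV^-2
    return (x / x0) ** lam

def sigma_dip(x, r_sq):
    # Eq. 27
    return sigma0 * (1.0 - np.exp(-r_sq / (4.0 * R0_sq(x))))

# Small dipoles are nearly transparent, large dipoles saturate at sigma0.
small = sigma_dip(1e-3, 1e-4)
large = sigma_dip(1e-3, 1e3)
```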

BGK model: The BGK model is a modification of the GBW model assuming that the spacing \(R_0\) is inverse to the gluon distribution and taking into account the DGLAP evolution of the latter. The gluon distribution, parametrised at some starting scale by Eq. 14, is evolved to larger scales using DGLAP evolution.

BGK model with valence quarks: The dipole models are valid in the low-x region only, where the valence quark contribution to the total proton momentum is 5 to 15 % for x from 0.0001 to 0.01 [115]. The inclusive HERA measurements have a precision which is better than 2 %. Therefore, HERAFitter provides the option of taking into account the contribution of the valence quarks.

IIM model: The IIM model assumes an expression for the dipole cross section which is based on the Balitsky–Kovchegov equation [116]. The explicit formula for \(\sigma _\mathrm{{dip}}\) can be found in [29]. The alternative scale parameter \(\tilde{R}\), together with \(x_{0}\) and \(\lambda \), are the fitted parameters of the model.

6.2 Transverse momentum dependent PDFs

QCD calculations of multiple-scale processes and complex final states can necessitate the use of transverse-momentum dependent (TMD) [7], or unintegrated, parton distribution and parton decay functions [117–125]. TMD factorisation has been proven recently [7] for inclusive DIS. TMD factorisation has also been proven in the high-energy (small-x) limit [126–128] for particular hadron–hadron scattering processes, like heavy-flavour, vector boson and Higgs production.

In the framework of high-energy factorisation [126, 129, 130] the DIS cross section can be written as a convolution in both longitudinal and transverse momenta of the TMD parton distribution function \(\mathcal{A}\left( x,k_t,\mu _{F}^2\right) \) with the off-shell parton scattering matrix elements as follows:

$$\begin{aligned} \sigma _j (x, Q^2)= & {} \int _x^1 \mathrm{d}z \int \mathrm{d}^2k_t \hat{\sigma }_j(x,Q^2,{z},k_t) \mathcal{A}\left( {z},k_t, \mu _{F}^2 \right) ,\nonumber \\ \end{aligned}$$
(28)

where the DIS cross sections \(\sigma _j\)(\(j= 2 , L\)) are related to the structure functions \(F_2\) and \(F_L\) by \(\sigma _j = 4 \pi ^2 F_j / Q^2\), and the hard-scattering kernels \({\hat{\sigma }}_j\) of Eq. 28 are \(k_t\)-dependent.

The factorisation formula in Eq. 28 allows for resummation of logarithmically enhanced small-x contributions to all orders in perturbation theory, both in the hard-scattering coefficients and in the parton evolution, fully taking into account the dependence on the factorisation scale \(\mu _F\) and on the factorisation scheme [131, 132].

Phenomenological applications of this approach require matching of small-x contributions with finite-x contributions. To this end, the evolution of the transverse-momentum dependent gluon density \(\mathcal{A} \) is obtained by combining the resummation of small-x logarithmic corrections [133–135] with medium-x and large-x contributions to parton splitting [11, 14, 15] according to the CCFM evolution equation [23–26]. Sea quark contributions [136] are not yet included at transverse-momentum dependent level.

The cross section \(\sigma _j\) (\(j= 2 , L\)) is calculated in a FFN scheme, using the boson-gluon fusion process (\(\gamma ^* g^* \rightarrow q \bar{q}\)). The masses of the quarks are explicitly included as parameters of the model. In addition to \(\gamma ^* g^* \rightarrow q\bar{q}\), the contribution from valence quarks is included via \(\gamma ^* q \rightarrow q\) by using a CCFM evolution of valence quarks [137139].

CCFM grid techniques: The CCFM evolution cannot be written easily in an analytic closed form. For this reason, a MC method is employed, which is, however, time-consuming and thus cannot be used directly in a fit program.

Following the convolution method introduced in [139, 140], the kernel \( \tilde{\mathcal{A}}\left( x'',k_t,p\right) \) is determined from the MC solution of the CCFM evolution equation, and then folded with a non-perturbative starting distribution \(\mathcal{A}_0 (x)\)

$$\begin{aligned}&x \mathcal{A}(x,k_t,p) \nonumber \\&\quad =x\int \mathrm{d}x' \int \mathrm{d}x'' \mathcal{A}_0 (x') \tilde{\mathcal{A}}\left( x'',k_t,p\right) \delta (x' x'' - x)\nonumber \\&\quad = \int \mathrm{d}x' {\mathcal{A}_0 (x') } \frac{x}{x'} \ { \tilde{\mathcal{A}}\left( \frac{x}{x'},k_t,p\right) }, \end{aligned}$$
(29)

where \(k_t\) denotes the transverse momentum of the propagator gluon and \(p\) is the evolution variable.

The kernel \(\tilde{\mathcal{A}}\) incorporates all of the dynamics of the evolution. It is defined on a grid of \(50\otimes 50\otimes 50\) bins in \( x, k_t, p\). The binning in the grid is logarithmic, except for the longitudinal variable x for which 40 bins in logarithmic spacing below 0.1, and 10 bins in linear spacing above 0.1 are used.

Calculation of the cross section according to Eq. 28 involves a time-consuming multidimensional MC integration, which suffers from numerical fluctuations. This cannot be employed directly in a fit procedure. Instead the following equation is applied:

$$\begin{aligned} \sigma (x,Q^2)= & {} \int _x^1 \mathrm{d}x_g \mathcal{A}(x_g,k_t,p) \hat{ \sigma }(x,x_g,Q^2)\nonumber \\= & {} \int _x^1 \mathrm{d}x' \mathcal{A}_0 (x') \tilde{ \sigma }(x/x',Q^2), \end{aligned}$$
(30)

where first \( \tilde{ \sigma }(x',Q^2)\) is calculated numerically with a MC integration on a grid in x for the values of \(Q^2\) used in the fit. Then the last step in Eq. 30 is performed with a fast numerical Gauss integration, which can be used directly in the fit.
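The second line of Eq. 30 can be sketched as follows. All functions here are toy stand-ins: in HERAFitter, \(\tilde{\sigma}\) comes from the tabulated MC integration, whereas here it is an explicit toy function so the sketch is self-contained.

```python
import numpy as np

def sigma_tilde(z):
    # Toy stand-in for the MC-tabulated reduced cross section.
    return (1.0 - z) ** 2

def A0(xp, pars):
    # Toy non-perturbative starting distribution with two parameters.
    n, b = pars
    return n * xp ** (-b) * (1.0 - xp) ** 3

def sigma(x, pars, n_gauss=40):
    # Fast Gauss-Legendre evaluation of
    #   sigma(x) = \int_x^1 dx' A0(x') sigma_tilde(x / x'),
    # mapping the nodes from [-1, 1] onto [x, 1].
    t, w = np.polynomial.legendre.leggauss(n_gauss)
    xp = 0.5 * (1.0 - x) * t + 0.5 * (1.0 + x)
    jac = 0.5 * (1.0 - x)
    return float(jac * np.sum(w * A0(xp, pars) * sigma_tilde(x / xp)))
```

Because only the cheap Gauss integration depends on the fitted parameters, this step can sit inside the minimisation loop, exactly as described above.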

Functional forms for TMD parametrisation: For the starting distribution \(\mathcal{A}_0\), at the starting scale \(Q_0^2\), the following form is used:

$$\begin{aligned} x\mathcal{A}_0(x,k_t)= & {} N x^{-B}(1 -x)^{C}\left( 1 -D x + E \sqrt{x} \right) \nonumber \\&\times \exp [ - k_t^2 / \sigma ^2 ], \end{aligned}$$
(31)

where \( \sigma ^2 = Q_0^2 / 2 \) and N, B, C, D, E are free parameters. Valence quarks are treated using the method of Ref. [137] as described in Ref. [139], with a starting distribution taken from any collinear PDF and imposition of the flavour sum rule at every scale p.
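A sketch of Eq. 31 with invented parameter values; integrating the Gaussian factor over \(k_t^2\) recovers the collinear-like x shape times \(\sigma^2\):

```python
import numpy as np

# Toy parameter values for the starting TMD distribution of Eq. 31.
Q0_sq = 2.0
sig_sq = Q0_sq / 2.0
N, B, C, D, E = 1.0, 0.2, 4.0, 1.0, 2.0

def xA0(x, kt_sq):
    # collinear-like x shape times a Gaussian intrinsic-kt factor
    coll = N * x ** (-B) * (1 - x) ** C * (1 - D * x + E * np.sqrt(x))
    return coll * np.exp(-kt_sq / sig_sq)

# \int_0^inf dkt^2 exp(-kt^2 / sig^2) = sig^2, so the kt^2 integral of
# xA0 reproduces the collinear shape up to that normalisation.
kt_sq = np.linspace(0.0, 60.0, 600001)
vals = xA0(0.01, kt_sq)
integral = float(np.dot(np.diff(kt_sq), 0.5 * (vals[1:] + vals[:-1])))
```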

The TMD parton densities can be plotted either with HERAFitter tools or with TMDplotter [35].

7 HERAFitter code organisation

HERAFitter is an open-source code under the GNU general public licence. It can be downloaded from a dedicated webpage [10] together with its supporting documentation and the fast grid theory files (described in Sect. 4) associated with the data files. The source code contains all the relevant information to perform QCD fits with HERA DIS data as a default set. The execution time depends on the fitting options and varies from 10 min (using “FAST” techniques as described in Sect. 4) to several hours when full uncertainties are estimated. The HERAFitter code is a combination of C++ and Fortran 77 libraries with minimal dependencies, i.e. for the default fitting options no external dependencies are required except the QCDNUM evolution program [22]. The ROOT libraries are only required for the drawing tools and when invoking APPLGRID. Drawing tools built into HERAFitter provide a qualitative and quantitative assessment of the results. Figure 7 shows an illustration of a comparison between the inclusive NC data from HERA I and the predictions based on HERAPDF1.0 PDFs. The consistency of the measurements and the theory can be expressed by pulls, defined as the difference between data and theory divided by the uncorrelated error of the data. In each kinematic bin of the measurement, pulls are provided in units of standard deviations. The pulls are also illustrated in Fig. 7.

Fig. 7
figure 7

An illustration of the consistency of HERA measurements [21] and the theory predictions, obtained in HERAFitter with the default drawing tool

In HERAFitter there are also available cache options for fast retrieval, fast evolution kernels, and the OpenMP (Open Multi-Processing) interface which allows parallel applications of the GM-VFNS theory predictions in DIS.

8 Applications of HERAFitter

The HERAFitter program has been used in a number of experimental and theoretical analyses. This list includes several LHC analyses of SM processes, namely inclusive Drell–Yan and W and Z production [99, 100, 141–143], inclusive jet production [97, 144], and inclusive photon production [145]. The results of QCD analyses using HERAFitter were also published by the HERA experiments for inclusive [21, 146] and heavy-flavour production measurements [147, 148]. The following phenomenological studies have been performed with HERAFitter: a determination of the transverse-momentum dependent gluon distribution using precision HERA data [139], an analysis of HERA data within a dipole model [149], and a study of the low-x uncertainties in PDFs determined from the HERA data using different parametrisations [101]. It is also planned to use HERAFitter for studying the impact of QED radiative corrections on PDFs [150]. A recent study based on a set of PDFs determined with HERAFitter and addressing the correlated uncertainties between different orders has been published in [108]. An application of the TMDs obtained with HERAFitter to W production at the LHC can be found in [151].

The HERAFitter framework has been used to produce PDF grids from QCD analyses performed at HERA [21, 152] and at the LHC [153], using measurements from ATLAS [99, 144]. These PDFs can be used to study predictions for SM or beyond SM processes. Furthermore, HERAFitter provides the possibility to perform various benchmarking exercises [154] and impact studies for possible future colliders as demonstrated by QCD studies at the LHeC [155].

9 Summary

HERAFitter is the first open-source code designed for studies of the structure of the proton. It provides a unique and flexible framework with a wide variety of QCD tools to facilitate analyses of the experimental data and theoretical calculations.

The HERAFitter code, in version 1.1.0, has sufficient options to reproduce the majority of the different theoretical choices made in MSTW, CTEQ and ABM fits. This will potentially make it a valuable tool for benchmarking and understanding differences between PDF fits. Such a study would, however, need to consider a range of further questions, such as the choices of data sets, treatments of uncertainties, input parameter values, \(\chi ^2\) definitions, nuclear corrections, etc.

The further progress of HERAFitter will be driven by the latest QCD advances in theoretical calculations and in the precision of experimental data.