HERAFitter, Open Source QCD Fit Project

HERAFitter is an open-source package that provides a framework for the determination of the parton distribution functions (PDFs) of the proton and for many different kinds of analyses in Quantum Chromodynamics (QCD). It encodes results from a wide range of experimental measurements in lepton-proton deep inelastic scattering and proton-proton (proton-antiproton) collisions at hadron colliders. These are complemented with a variety of theoretical options for calculating PDF-dependent cross section predictions corresponding to the measurements. The framework covers a large number of the existing methods and schemes used for PDF determination. The data and theoretical predictions are brought together through numerous methodological options for carrying out PDF fits and plotting tools to help visualise the results. While primarily based on the approach of collinear factorisation, HERAFitter also provides facilities for fits of dipole models and transverse-momentum dependent PDFs. The package can be used to study the impact of new precise measurements from hadron colliders. This paper describes the general structure of HERAFitter and its wide choice of options.


Introduction
The recent discovery of the Higgs boson [1,2] and the extensive searches for signals of new physics in LHC proton-proton collisions demand high-precision calculations to test the validity of the Standard Model (SM) and factorisation in Quantum Chromodynamics (QCD). Using collinear factorisation, inclusive cross sections in hadron collisions may be written as

\[ \sigma(\alpha_s,\mu_R,\mu_F) = \sum_{a,b} \int dx_1\, dx_2\; f_a(x_1,\mu_F)\, f_b(x_2,\mu_F)\, \hat{\sigma}_{ab}(x_1,x_2,\alpha_s,\mu_R,\mu_F), \qquad (1) \]

where the cross section σ is expressed as a convolution of Parton Distribution Functions (PDFs) f_a and f_b with the parton cross section σ̂_ab, involving a momentum transfer q such that Q² = |q²| ≫ Λ²_QCD, where Λ_QCD is the QCD scale. At Leading Order (LO) in the perturbative expansion of the strong-coupling constant, the PDFs represent the probability of finding a specific parton a (b) in the first (second) hadron carrying a fraction x₁ (x₂) of its momentum. The indices a and b in Eq. 1 indicate the various kinds of partons, i.e. gluons, quarks and antiquarks of different flavours that are considered as the constituents of the proton. The PDFs depend on the factorisation scale, μ_F, while the parton cross sections depend on the strong coupling constant, α_s, and the factorisation and renormalisation scales, μ_F and μ_R. The parton cross sections σ̂_ab are calculable in perturbative QCD (pQCD), whereas PDFs are usually constrained by global fits to a variety of experimental data. The assumption that PDFs are universal, within a particular factorisation scheme [3][4][5][6][7], is crucial to this procedure. Recent review articles on PDFs can be found in Refs. [8,9].
A precise determination of PDFs as a function of x requires large amounts of experimental data that cover a wide kinematic region and that are sensitive to different kinds of partons. Measurements of inclusive Neutral Current (NC) and Charged Current (CC) Deep Inelastic Scattering (DIS) at the lepton-proton (ep) collider HERA provide crucial information for determining the PDFs. Different processes in proton-proton (pp) and proton-antiproton (pp̄) collisions at the LHC and the Tevatron, respectively, provide complementary information to the DIS measurements. The PDFs are determined from χ² fits of the theoretical predictions to the data. The rapid flow of new data from the LHC experiments and the corresponding theoretical developments, which are providing predictions for more complex processes at increasingly higher orders, have motivated the development of a tool to combine them together in a fast, efficient, open-source framework.
This paper describes the open-source QCD fit framework HERAFitter [10], which includes a set of tools to facilitate global QCD analyses of pp, pp and ep scattering data. It has been developed for the determination of PDFs and the extraction of fundamental parameters of QCD such as the heavy quark masses and the strong coupling constant. It also provides a common framework for the comparison of different theoretical approaches. Furthermore, it can be used to test the impact of new experimental data on the PDFs and on the SM parameters.
This paper is organised as follows: The general structure of HERAFitter is presented in Sec. 2. In Sec. 3 the various processes available in HERAFitter and the corresponding theoretical calculations, performed within the framework of collinear factorisation and the DGLAP [11][12][13][14][15] formalism, are discussed. In Sec. 4 tools for fast calculations of the theoretical predictions are presented. In Sec. 5 the methodology to determine PDFs through fits based on various χ² definitions is described. In particular, different treatments of correlated experimental uncertainties are presented. Alternative approaches to the DGLAP formalism are presented in Sec. 6. The organisation of the HERAFitter code is discussed in Sec. 7, specific applications of the package are presented in Sec. 8, which is followed by a summary in Sec. 9.

The HERAFitter Structure
The diagram in Fig. 1 gives a schematic overview of the HERAFitter structure and functionality, which can be divided into four main blocks: Data: Measurements from various processes are provided in the HERAFitter package, including the information on their uncorrelated and correlated uncertainties. HERA inclusive scattering data are directly sensitive to quark PDFs and indirectly sensitive to the gluon PDF through scaling violations and the longitudinal structure function F_L. These data are the basis of any proton PDF extraction and are used in all current PDF sets from the MSTW [16], CT [17], NNPDF [18], ABM [19], JR [20] and HERAPDF [21] groups. Measurements of charm and beauty quark production at HERA are sensitive to heavy quark PDFs, and jet measurements have direct sensitivity to the gluon PDF. However, the kinematic range of HERA data mostly covers low and medium ranges in x. Measurements from fixed-target experiments, the Tevatron and the LHC are therefore also included: they provide additional constraints on the gluon and quark distributions at high x, a better understanding of heavy quark distributions, and the decomposition of the light-quark sea.

[Table 1: The list of experimental data and theory calculations implemented in the HERAFitter package. The references for the individual calculations and schemes are given in the text.]
The processes that are currently available within the HERAFitter framework are listed in Tab. 1.
Theory: The PDFs are parametrised at a starting scale, Q₀², using a functional form and a set of free parameters p. These PDFs are evolved to the scale of the measurements Q², Q² > Q₀². By default, the evolution uses the DGLAP formalism [11][12][13][14][15] as implemented in QCDNUM [22]. Alternatively, the CCFM evolution [23][24][25][26] as implemented in uPDFevolv [27] can be chosen. The prediction of the cross section for a particular process is obtained, assuming factorisation, by the convolution of the evolved PDFs with the corresponding parton scattering cross section. Available theory calculations for each process are listed in Tab. 1. Predictions using dipole models [28][29][30] can also be obtained.
QCD Analysis: The PDFs are determined in a least-squares fit: a χ² function, which compares the input data and theory predictions, is minimised with the MINUIT [31] program. In HERAFitter various choices are available for the treatment of experimental uncertainties in the χ² definition. Correlated experimental uncertainties can be accounted for using a nuisance parameter method or a covariance matrix method, as described in Sec. 5.2. Different statistical assumptions for the distributions of the systematic uncertainties, e.g. Gaussian or LogNormal [32], can also be studied (see Sec. 5.3).
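The nuisance-parameter treatment of correlated uncertainties can be illustrated with a small sketch. Everything below (the `profiled_chi2` helper, the toy data, and the single 2% normalisation source) is hypothetical and only illustrates the idea; HERAFitter's actual χ² definitions are those described in Sec. 5.2.

```python
import numpy as np

def profiled_chi2(data, theo, stat, gamma):
    """Toy chi^2 with correlated systematics treated as nuisance parameters
    b_j, profiled analytically.  gamma[i, j] is the relative effect of
    systematic source j on data point i."""
    w = 1.0 / stat**2                       # inverse uncorrelated variances
    G = theo[:, None] * gamma               # absolute shift per unit b_j
    r = data - theo
    # minimise over b:  (G^T W G + 1) b = -G^T W r
    A = G.T @ (w[:, None] * G) + np.eye(gamma.shape[1])
    b = np.linalg.solve(A, -G.T @ (w * r))
    shifted = theo * (1.0 - gamma @ b)
    chi2 = np.sum(w * (data - shifted) ** 2) + np.sum(b**2)
    return chi2, b

# toy example: 4 data points with one fully correlated 2% normalisation source
data  = np.array([1.02, 0.99, 1.03, 1.01])
theo  = np.array([1.00, 1.00, 1.00, 1.00])
stat  = np.array([0.02, 0.02, 0.02, 0.02])
gamma = np.full((4, 1), 0.02)
chi2_prof, b = profiled_chi2(data, theo, stat, gamma)
chi2_naive = np.sum((data - theo) ** 2 / stat**2)
```

Profiling always lowers the χ² relative to ignoring the correlated source, at the price of the penalty term Σ b², which keeps the nuisance parameters within their estimated one-sigma ranges.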
Results: The resulting PDFs are provided in a format ready to be used by the LHAPDF library [33,34] or by TMDlib [35]. HERAFitter drawing tools can be used to display the PDFs with their uncertainties at a chosen scale. As an example, the first set of PDFs extracted using HERAFitter from HERA I data, HERAPDF1.0 [21], is shown in Fig. 2 (taken from Ref. [21]). Note that, following convention, the PDFs are displayed as parton momentum distributions x f(x, μ_F²).

Theoretical formalism using DGLAP evolution
In this section the theoretical formalism based on the DGLAP [11][12][13][14][15] equations is described. A direct consequence of factorisation (Eq. 1) is that the scale dependence or "evolution" of the PDFs can be predicted by the renormalisation group equations. By requiring physical observables to be independent of μ_F, a representation of the parton evolution in terms of the DGLAP equations is obtained:

\[ \frac{\partial f_a(x,\mu_F^2)}{\partial \log \mu_F^2} = \sum_b \int_x^1 \frac{dz}{z}\, P_{ab}(z,\alpha_s)\, f_b\!\left(\frac{x}{z},\mu_F^2\right), \qquad (2) \]

where the functions P_ab are the evolution kernels or splitting functions, which represent the probability of finding parton a in parton b. They can be calculated as a perturbative expansion in α_s. Once PDFs are determined at the initial scale μ_F² = Q₀², their evolution to any other scale Q² > Q₀² is entirely determined by the DGLAP equations. The PDFs are then used to calculate cross sections for various different processes. Alternative approaches to the DGLAP evolution equations, valid in different kinematic regimes, are also implemented in HERAFitter and will be discussed in Sec. 6.
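A useful cross-check of the evolution picture is to look at the LO non-singlet kernel in Mellin-moment space, where the DGLAP convolution becomes multiplicative. The sketch below (with a fixed, illustrative α_s rather than a running coupling) shows that the N = 1 moment is exactly conserved, i.e. quark number does not change under evolution, while the N = 2 (momentum) moment of a non-singlet density decreases, the momentum being transferred to gluons.

```python
from math import exp, log, pi
from scipy.special import psi  # digamma, for non-integer harmonic numbers

CF = 4.0 / 3.0
EULER = 0.5772156649015329

def harmonic(N):
    """H_N = psi(N+1) + gamma_E, valid for non-integer N as well."""
    return psi(N + 1) + EULER

def gamma_qq(N):
    """LO non-singlet anomalous dimension: the N-th Mellin moment of P_qq."""
    return CF * (1.5 + 1.0 / (N * (N + 1)) - 2.0 * harmonic(N))

def evolve_moment(qN, N, Q2_from, Q2_to, alphas=0.2):
    """Evolve the N-th moment between scales with a toy fixed alpha_s."""
    return qN * exp(alphas / (2.0 * pi) * gamma_qq(N) * log(Q2_to / Q2_from))
```

With `gamma_qq(1) = 0` the valence sum rule survives evolution by construction, mirroring the statement that the splitting functions only redistribute momentum among partons.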

Deep Inelastic Scattering and Proton Structure
The formalism that relates the DIS measurements to pQCD and the PDFs has been described in detail in many extensive reviews (see e.g. Ref. [36]) and is only briefly summarised here. DIS is the process where a lepton scatters off the partons in the proton by the virtual exchange of a neutral (γ/Z) or charged (W±) vector boson; as a result, a scattered lepton and a hadronic final state are produced. The common DIS kinematic variables are the scale of the process Q², which is the absolute squared four-momentum of the exchanged boson, Bjorken x, which can be related in the parton model to the momentum fraction carried by the struck quark, and the inelasticity y. These are related by y = Q²/(sx), where s is the squared centre-of-mass energy. The NC cross section can be expressed in terms of generalised structure functions:

\[ \frac{d^2\sigma_{NC}^{e^\pm p}}{dx\,dQ^2} = \frac{2\pi\alpha^2}{x Q^4} \left[ Y_+\,\tilde{F}_2 \mp Y_-\, x\tilde{F}_3 - y^2\,\tilde{F}_L \right], \qquad (3) \]

where Y_± = 1 ± (1 − y)² and α is the electromagnetic coupling constant. The generalised structure functions F̃_{2,3} can be written as linear combinations of the proton structure functions F₂^γ, F_{2,3}^{γZ} and F_{2,3}^{Z}, which are associated with pure photon exchange terms, photon-Z interference terms and pure Z exchange terms, respectively. The structure function F̃₂ is the dominant contribution to the cross section, xF̃₃ becomes important at high Q² and F̃_L is sizeable only at high y. In the framework of pQCD, the structure functions are directly related to the PDFs: at LO F₂ is the weighted momentum sum of quark and antiquark distributions, xF₃ is related to their difference, and F_L vanishes. At higher orders, terms related to the gluon distribution appear; in particular F_L is strongly related to the low-x gluon. The inclusive CC ep cross section, analogous to the NC ep case, can be expressed in terms of another set of structure functions, W̃:

\[ \frac{d^2\sigma_{CC}^{e^\pm p}}{dx\,dQ^2} = \frac{(1 \pm P)\, G_F^2}{4\pi x} \left( \frac{m_W^2}{m_W^2 + Q^2} \right)^2 \left[ Y_+\,\tilde{W}_2^\pm \mp Y_-\, x\tilde{W}_3^\pm - y^2\,\tilde{W}_L^\pm \right], \qquad (4) \]

where P represents the lepton beam polarisation and G_F is the Fermi constant. At LO in α_s, the CC e⁺p and e⁻p cross sections are sensitive to different combinations of the quark flavour densities.
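At LO the relation between F₂ and the PDFs is just a charge-weighted sum, which the following sketch evaluates for invented toy densities (the shapes and normalisations below are purely illustrative, not a fitted PDF set):

```python
import numpy as np

# invented toy densities at a fixed scale -- not a real PDF set
def uv(x):  return 5.1 * x**0.8 * (1 - x) ** 3      # u valence
def dv(x):  return 3.0 * x**0.8 * (1 - x) ** 4      # d valence
def sea(x): return 0.3 * x**-0.2 * (1 - x) ** 7     # per light flavour, q = qbar

EQ2 = {"u": 4.0 / 9.0, "d": 1.0 / 9.0, "s": 1.0 / 9.0}  # squared charges

def F2_LO(x):
    """LO F2 = x * sum_q e_q^2 [q(x) + qbar(x)] for three light flavours."""
    u, ubar = uv(x) + sea(x), sea(x)
    d, dbar = dv(x) + sea(x), sea(x)
    s = sbar = sea(x)
    return x * (EQ2["u"] * (u + ubar) + EQ2["d"] * (d + dbar)
                + EQ2["s"] * (s + sbar))

def inelasticity(x, Q2, sqrt_s=319.0):
    """y = Q^2 / (s x), with the HERA centre-of-mass energy as default."""
    return Q2 / (sqrt_s**2 * x)
```

The `inelasticity` helper encodes the kinematic relation y = Q²/(sx) quoted in the text; at LO the reduced cross section is then just F₂ up to the small y²F_L/Y₊ correction, which vanishes at this order.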
Beyond LO, the QCD predictions for the DIS structure functions are obtained by convoluting the PDFs with appropriate hard-process scattering matrix elements, which are referred to as coefficient functions.
The DIS measurements span a large range of Q², from a few GeV² to about 10⁵ GeV², crossing heavy quark mass thresholds; thus the treatment of heavy quark (charm and beauty) production and the chosen values of their masses become important. There are different schemes for the treatment of heavy quark production. Several variants of these schemes are implemented in HERAFitter and they are briefly discussed below.

Zero-Mass Variable Flavour Number (ZM-VFN):
In this scheme [37], the heavy quarks appear as partons in the proton at Q² values above ∼ m_h² (the squared heavy quark mass) and are then treated as massless in both the initial and final states of the hard scattering process. The lowest order process is the scattering of the lepton off the heavy quark via electroweak boson exchange. This scheme is expected to be reliable in the region where Q² ≫ m_h². In HERAFitter this scheme is available for the DIS structure function calculation via the interface to the QCDNUM [22] package, and thus benefits from the fast QCDNUM convolution engine.

Fixed Flavour Number (FFN):
In this rigorous quantum field theory scheme [38][39][40], only the gluon and the light quarks are considered as partons within the proton, and massive quarks are produced perturbatively in the final state. The lowest order process is heavy quark-antiquark pair production via boson-gluon fusion. In HERAFitter this scheme can be accessed via the QCDNUM implementation or through the interface to the open-source code OPENQCDRAD [41] as implemented by the ABM group. This scheme is reliable for Q² ∼ m_h². In QCDNUM, the calculation of the heavy quark contributions to the DIS structure functions is available at Next-to-Leading Order (NLO) and only electromagnetic exchange contributions are taken into account. In the OPENQCDRAD implementation the heavy quark contributions to the CC structure functions are also available and, for the NC case, the QCD corrections to the coefficient functions at Next-to-Next-to-Leading Order (NNLO) are provided in the best currently known approximation [42,43]. The OPENQCDRAD implementation in addition uses the running heavy quark mass in the MS-bar scheme [44].
The use of the running heavy quark mass is reported to reduce the sensitivity of the DIS cross sections to higher-order corrections [42,43]. It is also known to have smaller non-perturbative corrections than the pole-mass scheme [45].

General-Mass Variable Flavour Number (GM-VFN):
In this scheme [46], heavy quark production is treated in the FFN scheme for Q² ∼ m_h² and in the massless scheme for Q² ≫ m_h², with a suitable interpolation in between. The details of this interpolation differ between implementations. The groups that use GM-VFN schemes in their PDF fits are MSTW, CT (CTEQ), NNPDF and HERAPDF. HERAFitter implements different variants of the GM-VFN scheme.
-GM-VFN Thorne-Roberts scheme: The Thorne-Roberts (TR) scheme [47] was designed to provide a smooth transition from the massive FFN scheme at low scales Q² ∼ m_h² to the massless ZM-VFN scheme at high scales Q² ≫ m_h². Because the original version was technically difficult to implement beyond NLO, it was updated to the TR′ scheme [48]. There are two variants of the TR′ scheme: TR′ standard (as used in the MSTW PDF sets [16,48]) and TR′ optimal [49], with a smoother transition across the heavy quark threshold region. Both TR′ variants are accessible within the HERAFitter package at LO, NLO and NNLO.
-GM-VFN ACOT scheme: The Aivazis-Collins-Olness-Tung (ACOT) scheme belongs to the group of VFN factorisation schemes that use the renormalisation method of Collins-Wilczek-Zee (CWZ) [50]. This scheme unifies the low-scale Q² ∼ m_h² and high-scale Q² > m_h² regions in a coherent framework across the full energy range. Within the ACOT package, the following variants of the ACOT MS-bar scheme are available at LO and NLO: ACOT-Full [51], S-ACOT-χ [52,53] and ACOT-ZM [51]. For the longitudinal structure function, higher order calculations are also available. A comparison of PDFs extracted from QCD fits to the HERA data with the TR and ACOT-Full schemes is illustrated in Fig. 3 (taken from [21]).

Electroweak Corrections to DIS
Calculations of higher-order electroweak corrections to DIS at HERA are available in HERAFitter in the on-shell scheme. In this scheme, the masses of the gauge bosons m_W and m_Z are treated as basic parameters together with the top, Higgs and fermion masses. These electroweak corrections are based on the EPRC package [54]. The code calculates the running of the electromagnetic coupling α using the most recent parametrisation of the hadronic contribution [55], as well as an older one from Burkhardt [56].

Diffractive PDFs
About 10% of deep inelastic interactions at HERA are diffractive, such that the interacting proton stays intact (ep → eXp). The outgoing proton is separated from the rest of the final hadronic system, X, by a large rapidity gap. Such events are a subset of DIS, where the hadronic state X comes from the interaction of the virtual photon with a colour-neutral cluster stripped off the proton [57]. The process can be described, analogously to inclusive DIS, by means of the diffractive parton distributions (DPDFs) [58]. The parametrisation of the colour-neutral exchange in terms of a factorisable 'hard' Pomeron and a secondary Reggeon [59], both having a hadron-like partonic structure, has proved remarkably successful in the description of most of the diffractive data. It has also provided a practical method to determine DPDFs from fits to the diffractive cross sections.
In addition to the usual DIS variables x, Q², extra kinematic variables are needed to describe the diffractive process. These are the squared four-momentum transfer of the exchanged Pomeron or Reggeon, t, and the mass m_X of the diffractively produced final state. In practice, the variable m_X is often replaced by the dimensionless quantity

\[ \beta = \frac{Q^2}{Q^2 + m_X^2 - t}. \qquad (5) \]

In models based on a factorisable Pomeron, β may be viewed at LO as the fraction of the Pomeron longitudinal momentum, x_IP, which is carried by the struck parton, x = β x_IP, where x_IP = q·(P − P′)/(q·P) and P (P′) denotes the momentum of the incoming (outgoing) proton. For the inclusive case, the diffractive cross section reads:

\[ \frac{d^4\sigma}{d\beta\, dQ^2\, dx_{I\!P}\, dt} = \frac{2\pi\alpha^2}{\beta Q^4} \left[ 1 + (1-y)^2 \right] \sigma_r^{D(4)}, \qquad (6) \]

with the "reduced cross section":

\[ \sigma_r^{D(4)} = F_2^{D(4)} - \frac{y^2}{Y_+}\, F_L^{D(4)}. \qquad (7) \]

The diffractive structure functions can be expressed as convolutions of calculable coefficient functions with the diffractive quark and gluon distribution functions, which in general depend on x_IP, Q², β and t.
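The two kinematic relations, β = Q²/(Q² + m_X² − t) and x = β x_IP, are simple enough to encode directly; the numerical values used in the sketch below are invented for illustration:

```python
def beta_from_mass(Q2, mX2, t):
    """beta = Q^2 / (Q^2 + m_X^2 - t); t is negative (space-like)."""
    return Q2 / (Q2 + mX2 - t)

def bjorken_x(beta, x_pom):
    """x = beta * x_IP: Bjorken x in terms of the diffractive variables."""
    return beta * x_pom
```

Heavier diffractive systems (larger m_X²) or larger |t| lower β for fixed Q², pushing the struck parton towards the low-momentum end of the Pomeron.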
The DPDFs [60,61] in HERAFitter are implemented as a sum of two factorised contributions:

\[ f_a^{D}(x, Q^2, x_{I\!P}, t) = \Phi_{I\!P}(x_{I\!P}, t)\, f_a^{I\!P}(\beta, Q^2) + \Phi_{I\!R}(x_{I\!P}, t)\, f_a^{I\!R}(\beta, Q^2), \qquad (8) \]

where Φ_IP(x_IP, t) and Φ_IR(x_IP, t) are the Pomeron and Reggeon fluxes. The Reggeon PDFs, f_a^IR, are fixed as those of the pion, while the Pomeron PDFs, f_a^IP, can be obtained from a fit to the data.
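A sketch of this two-component structure, using an invented Regge-type flux Φ(x_IP, t) = A x_IP^(1−2α(t)) e^(Bt) with a linear trajectory α(t) = α(0) + α′t; the trajectory and slope parameters below are placeholders, not the fitted H1/ZEUS values:

```python
from math import exp

def regge_flux(x_pom, t, alpha0, alpha_prime, B, A=1.0):
    """Toy Regge flux  Phi = A * x_IP^(1 - 2*alpha(t)) * exp(B*t)  with a
    linear trajectory alpha(t) = alpha0 + alpha' * t (illustrative values)."""
    alpha_t = alpha0 + alpha_prime * t
    return A * x_pom ** (1.0 - 2.0 * alpha_t) * exp(B * t)

def dpdf(beta, Q2, x_pom, t, f_pom, f_reg,
         pom=(1.11, 0.06, 5.5), reg=(0.5, 0.3, 1.6)):
    """Two-component diffractive PDF: Pomeron flux times Pomeron PDF plus
    Reggeon flux times Reggeon (pion-like) PDF, cf. Eq. 8."""
    return (regge_flux(x_pom, t, *pom) * f_pom(beta, Q2)
            + regge_flux(x_pom, t, *reg) * f_reg(beta, Q2))
```

In a fit, `f_pom` would carry the free parameters while `f_reg` is held fixed to a pion PDF, exactly as stated in the text.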

Drell-Yan Processes in pp or pp̄ Collisions
The Drell-Yan (DY) process provides valuable information about PDFs. In pp and pp̄ scattering, Z/γ* and W production probe bilinear combinations of the quark distributions. Complementary information on the different quark densities can be obtained from the W± asymmetry (d, u and their ratio), the ratio of the W and Z cross sections (sensitive to the flavour composition of the quark sea, in particular to the s-quark distribution), and associated W and Z production with heavy quarks (sensitive to the s-, c- and b-quark densities). Measurements at large boson transverse momentum p_T ≳ m_{W,Z} are potentially sensitive to the gluon distribution [62]. At LO the DY NC cross section, triple differential in invariant mass m, boson rapidity y and lepton scattering angle cos θ in the parton centre-of-mass frame, can be written as [63,64]:

\[ \frac{d^3\sigma}{dm\, dy\, d\cos\theta} = \sum_q \hat{\sigma}_q(\cos\theta, m) \left[ f_q(x_1, m^2)\, f_{\bar q}(x_2, m^2) + f_{\bar q}(x_1, m^2)\, f_q(x_2, m^2) \right], \qquad (9) \]

where s is the squared centre-of-mass beam energy, the parton momentum fractions are given by

\[ x_{1,2} = \frac{m}{\sqrt{s}}\, e^{\pm y}, \qquad (10) \]

f_q(x, m²) are the PDFs at the scale of the invariant mass, and σ̂_q is the parton-parton hard scattering cross section.
The corresponding triple differential CC cross section has the form:

\[ \frac{d^3\sigma}{dm\, dy\, d\cos\theta} = \frac{\pi\alpha^2}{48\, s \sin^4\theta_W}\, \frac{m^3\, (1-\cos\theta)^2}{(m^2 - m_W^2)^2 + \Gamma_W^2 m_W^2} \sum_{q_1, q_2} V_{q_1 q_2}^2\, f_{q_1}(x_1, m^2)\, f_{\bar q_2}(x_2, m^2), \qquad (11) \]

where V_{q1q2} is the Cabibbo-Kobayashi-Maskawa (CKM) quark mixing matrix, θ_W is the weak mixing angle, and m_W and Γ_W are the W boson mass and decay width, respectively.
The simple LO form of these expressions allows for analytic calculation of integrated cross sections. In both the NC and CC expressions the PDFs depend only on the boson rapidity y and invariant mass m, while the integral in cos θ can be evaluated analytically, even for the case of realistic kinematic cuts.
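The analytic cos θ integration can be made concrete: at LO the symmetric part of the NC angular distribution is proportional to (1 + cos²θ), so a symmetric acceptance cut |cos θ| < c gives 2c + 2c³/3 in closed form (the asymmetric interference term integrates to zero over a symmetric range). The sketch below, with illustrative numbers, checks this against a numerical sum and encodes x_{1,2} = (m/√s) e^{±y}:

```python
import numpy as np

def parton_x(m, y, sqrt_s):
    """LO DY kinematics: x_{1,2} = (m / sqrt(s)) * exp(+-y)."""
    tau = m / sqrt_s
    return tau * np.exp(y), tau * np.exp(-y)

def angular_integral(c):
    """Closed form of the (1 + cos^2 th) integral over |cos th| < c."""
    return 2.0 * c + 2.0 * c**3 / 3.0

# numerical cross-check with a midpoint rule
edges = np.linspace(-0.9, 0.9, 200001)
mids = 0.5 * (edges[:-1] + edges[1:])
numeric = np.sum(1.0 + mids**2) * (edges[1] - edges[0])
```

Because x₁x₂ = m²/s independently of y, measuring m and y fixes both parton momentum fractions, which is what makes the LO picture so constraining for PDFs.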
Beyond LO, the calculations are often time-consuming and Monte Carlo generators are employed. Currently, the predictions for W and Z/γ * production are available up to NNLO and the predictions for W and Z production in association with heavy flavour quarks are available to NLO.
There are several possibilities to obtain the theoretical predictions for DY production in HERAFitter. The NLO and NNLO calculations can be implemented using k-factor or fast grid techniques (see Sec. 4 for details), which are interfaced to programs such as MCFM [65][66][67], available for NLO calculations, or FEWZ [68] and DYNNLO [69] for NLO and NNLO, with electroweak corrections estimated using MCSANC [70,71].

Jet Production in ep and pp or pp̄ Collisions
The cross section for production of high p T hadronic jets is sensitive to the high-x gluon PDF (see e.g. Ref. [16]). Therefore this process can be used to improve the determination of the gluon PDF, which is particularly important for Higgs production and searches for new physics. Jet production cross sections are currently known only to NLO. Calculations for higher-order contributions to jet production in pp collisions are in progress [72][73][74]. Within HERAFitter, the NLOJet++ program [75,76] may be used for calculations of jet production. Similarly to the DY case, the calculation is very demanding in terms of computing power. Therefore fast grid techniques are used to facilitate the QCD analyses including jet cross section measurements in ep, pp and pp collisions. For details see Sec. 4.

Top-quark Production in pp or pp̄ Collisions
At the LHC, top-quark pairs (tt̄) are produced dominantly via gg fusion. Thus, LHC measurements of the tt̄ cross section provide additional constraints on the gluon distribution at medium to high values of x, on α_s and on the top-quark mass, m_t [77]. Precise predictions for the total inclusive tt̄ cross section are available up to NNLO [78,79]. Currently, they can be computed within HERAFitter via an interface to the program HATHOR [80].
Fixed-order QCD predictions for the differential tt̄ cross section at NLO can be obtained by using the program MCFM [67][81][82][83][84] interfaced to HERAFitter with fast grid techniques.
Single top quarks are produced by exchanging electroweak bosons and the measurement of their production cross section can be used, for example, to probe the ratio of the u and d distributions in the proton as well as the b-quark PDF. Predictions for single-top production are available at the NLO accuracy by using MCFM.
Approximate predictions up to NNLO in QCD for the differential tt̄ cross section in one-particle inclusive kinematics are available in HERAFitter through an interface to the program DiffTop [85,86], which uses methods of QCD threshold resummation beyond the leading logarithmic approximation. This allows users to estimate the impact of recent tt̄ differential cross section measurements on the uncertainty of the gluon density within a QCD PDF fit at NNLO. A fast evaluation of the DiffTop differential cross sections is possible via an interface to fast grid computations [87].

Computational Techniques
Precise measurements require accurate theoretical predictions in order to maximise their impact in PDF fits. Perturbative calculations become more complex and time-consuming at higher orders due to the increasing number of relevant Feynman diagrams. The direct inclusion of computationally demanding higher-order calculations into iterative fits is thus currently not possible. However, a full repetition of the perturbative calculation for small changes in input parameters is not necessary at each step of the iteration. Two methods that take advantage of this have been developed to solve the problem: the k-factor technique and the fast grid technique. Both are available in HERAFitter.

k-factor Technique
The k-factors are defined as the ratio of the prediction of a higher-order (slow) pQCD calculation to a lower-order (fast) calculation using the same PDF. Because the k-factors depend on the phase space probed by the measurement, they have to be stored including their dependence on the relevant kinematic variables. Before the start of a fitting procedure, a table of k-factors is computed once, for a fixed PDF, with the time-consuming higher-order code. In subsequent iteration steps the theory prediction is derived from the fast lower-order calculation by multiplying by the pre-tabulated k-factors.
This procedure, however, neglects the fact that the k-factors are PDF dependent; as a consequence, they have to be re-evaluated for the newly determined PDF at the end of the fit for a consistency check. The fit must be repeated until input and output k-factors have converged. In summary, this technique avoids iteration of the higher-order calculation at each step, but still typically requires a few re-evaluations.
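The iteration can be sketched with invented one-parameter "slow" and "fast" calculations (both stand-ins, not real QCD codes): the k-factor is recomputed after each fit until the fitted parameter stabilises.

```python
import numpy as np

# stand-ins for a slow higher-order and a fast lower-order calculation,
# both functions of one PDF parameter p (purely illustrative)
def slow_ho(p, x):  return x**p * (1.0 + 0.3 * p * x)
def fast_lo(p, x):  return x**p

xpts = np.array([0.1, 0.2, 0.4, 0.6])
data = slow_ho(0.5, xpts)                 # pseudo-data generated at p = 0.5

def fit(kfac):
    """One-parameter chi^2 scan using the fast calculation times k-factors."""
    ps = np.linspace(0.1, 1.0, 9001)
    chi2 = [np.sum((data - kfac * fast_lo(p, xpts)) ** 2) for p in ps]
    return ps[int(np.argmin(chi2))]

p = 0.8                                   # deliberately wrong starting PDF
for _ in range(5):                        # re-evaluate k-factors and refit
    kfac = slow_ho(p, xpts) / fast_lo(p, xpts)
    p = fit(kfac)
```

Starting from a wrong PDF parameter, each cycle of "tabulate k-factors, refit" moves the result closer to the true value, mirroring the convergence requirement described in the text.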
In HERAFitter, the k-factor technique can also be used for the fast computation of the time-consuming GM-VFN schemes for heavy quarks in DIS. "FAST" heavy-flavour schemes are implemented with k-factors defined as the ratio of calculations at the same perturbative order but for massive vs. massless quarks, e.g. NLO (massive)/NLO (massless). These k-factors are calculated only for the starting PDF and hence, the "FAST" heavy flavour schemes should only be used for quick checks. Full heavy flavour schemes should be used by default. However, for the ACOT scheme, due to exceptionally long computation times, the k-factors are used in the default setup of HERAFitter.

Fast Grid Techniques
Fast grid techniques exploit the fact that iterative PDF fitting procedures do not impose completely arbitrary changes to the types and shapes of the parameterised functions that represent each PDF. Instead, it can be assumed that a generic PDF can be approximated by a set of interpolating functions with a sufficient number of judiciously chosen support points. The accuracy of this approximation is checked and optimised such that the approximation bias is negligibly small compared to the experimental and theoretical accuracy. This method can be used to perform the time-consuming higher-order calculations (Eq. 1) only once for the set of interpolating functions. Further iterations of the calculation for a particular PDF set are fast, involving only sums over the set of interpolators multiplied by factors depending on the PDF. This approach can be used to calculate the cross sections of processes involving one or two hadrons in the initial state and to assess their renormalisation and factorisation scale variation.
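The essence of the method can be shown in a few lines: fold the (pretend-expensive) partonic weight once with a set of interpolation basis functions, then evaluate any new PDF as a cheap weighted sum over its values at the support points. Everything here (the integrand, node placement, linear basis) is a deliberately simplified stand-in for what fastNLO and APPLGRID do.

```python
import numpy as np

def sigma_hat(x):
    """Pretend-expensive partonic weight (illustrative shape only)."""
    return np.exp(-5.0 * x) * (1.0 - x)

nodes = np.linspace(0.0, 1.0, 51)          # support points in x

def hat(i, x):
    """Piecewise-linear basis: 1 at nodes[i], 0 at all other nodes."""
    return np.interp(x, nodes, np.eye(len(nodes))[i])

# step 1 (done once): fold sigma_hat with every basis function on a fine grid
fine = np.linspace(0.0, 1.0, 200001)
dx = fine[1] - fine[0]
weights = np.array([np.sum(sigma_hat(fine) * hat(i, fine)) * dx
                    for i in range(len(nodes))])

# step 2 (per fit iteration): any PDF enters only through its node values
def convolve_fast(pdf):
    return float(weights @ pdf(nodes))

def convolve_slow(pdf):
    return float(np.sum(sigma_hat(fine) * pdf(fine)) * dx)

trial_pdf = lambda x: x**0.5 * (1.0 - x) ** 3   # a new trial PDF shape
```

The fast path never re-evaluates `sigma_hat`: the entire PDF dependence collapses into a dot product, which is why grids make iterative χ² fits with NLO predictions affordable.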
This technique serves to facilitate the inclusion of time-consuming NLO jet cross section predictions into PDF fits and has been implemented in two projects, fastNLO [88,89] and APPLGRID [90,91]. The packages differ in their interpolation and optimisation strategies, but both construct tables with grids for each bin of an observable in two steps: in the first step, the accessible phase space in the parton momentum fractions x and the renormalisation and factorisation scales μ_R and μ_F is explored in order to optimise the table size. In the second step the grid is filled for the requested observables. Higher-order cross sections can then be obtained very efficiently from the pre-produced grids while varying externally provided PDF sets, μ_R and μ_F, or α_s(μ_R). This approach can in principle be extended to arbitrary processes; this requires an interface between the higher-order theory programs and the fast interpolation frameworks. For the HERAFitter implementations of the two packages, the evaluation of α_s is done consistently with the PDF evolution code. A brief description of each package is given below: -The fastNLO project [89] has been interfaced to the NLOJet++ program [75] for the calculation of jet production in DIS [92] as well as 2- and 3-jet production in hadron-hadron collisions at NLO [76,93]. Threshold corrections at 2-loop order, which approximate NNLO for the inclusive jet cross section for pp and pp̄, have also been included in the framework [94] following Ref. [95]. The latest version of the fastNLO convolution program [96] allows for the creation of tables in which renormalisation and factorisation scales can be varied as a function of two pre-defined observables, e.g. jet transverse momentum p_⊥ and Q for DIS. Recently, the differential calculation of top-pair production in hadron collisions at approximate NNLO [85] has been interfaced to fastNLO [87]. The fastNLO code is available online [97].
Jet cross section grids computed for the kinematics of various experiments can be downloaded from this site.
The fastNLO libraries and tables with theory predictions for comparison to particular cross section measurements are included in the HERAFitter package. The interface to the fastNLO tables from within HERAFitter was used in a recent CMS analysis, where the impact of the inclusive jet cross section on the extraction of the PDFs is investigated [98].
-In the APPLGRID package [91,99], in addition to jet cross sections for pp (pp̄) and DIS processes, calculations of DY production and other processes are also implemented using an interface to the standard MCFM parton level generator [65][66][67]. Variation of the renormalisation and factorisation scales is possible a posteriori, when calculating theory predictions with the APPLGRID tables, and independent variation of α_s is also allowed.
For predictions beyond NLO, the k-factors technique can also be applied within the APPLGRID framework.
As an example, the HERAFitter interface to APPLGRID was used by the ATLAS [100] and CMS [101] collaborations to extract the strange quark distribution of the proton. The ATLAS strange PDF extracted employing these techniques is displayed in Fig. 4 together with a comparison to the global PDF sets CT10 [17] and NNPDF2.1 [18] (taken from [100]).

[Fig. 4: The strange antiquark distribution versus x for the ATLAS epWZ free-s̄ NNLO fit [100] (magenta band) compared to predictions from NNPDF2.1 (blue hatched) and CT10 (green hatched) at Q² = 1.9 GeV². The ATLAS fit was performed using a k-factor approach for NNLO corrections.]

Fit Methodology
When performing a QCD analysis to determine PDFs, there are various assumptions and choices to be made concerning, for example, the functional form of the input parametrisation, the treatment of heavy quarks and their mass values, alternative theoretical calculations, alternative representations of the fit χ² and different ways of treating correlated systematic uncertainties. It is useful to discriminate or quantify the effect of a chosen ansatz within a common framework, and HERAFitter is optimally designed for such tests. The methodology employed by HERAFitter relies on a flexible and modular framework that allows independent integration of state-of-the-art techniques, either related to the inclusion of a new theoretical calculation or of new approaches to treat data and their uncertainties. In this section we describe the available options for the fit methodology in HERAFitter. In addition, as an alternative approach to a complete QCD fit, the Bayesian reweighting method, which is also available in HERAFitter, is described.

Functional Forms for PDF Parametrisation
The PDFs can be parametrised using several predefined functional forms and flavour decompositions: Standard Polynomials: The standard polynomial form is the most commonly used. A polynomial functional form is used to parametrise the x-dependence of the PDFs, where index j denotes each parametrised PDF flavour:

\[ x f_j(x) = A_j\, x^{B_j} (1-x)^{C_j}\, P_j(x). \qquad (12) \]

The parametrised PDFs are the valence distributions xu_v and xd_v, the gluon distribution xg, and the u-type and d-type sea, xŪ and xD̄, where xŪ = xū and xD̄ = xd̄ + xs̄ at the starting scale, which is chosen below the charm mass threshold. The form of the polynomials P_j(x) can be varied. The form (1 + ε_j √x + D_j x + E_j x²) is used for the HERAPDF [21], with additional constraints relating to the flavour decomposition of the light sea. This parametrisation is termed HERAPDF-style. The polynomial can also be parametrised in the CTEQ-style, where P_j(x) takes the form e^{a₃x}(1 + e^{a₄}x + e^{a₅}x²) and, in contrast to the HERAPDF-style, is positive by construction. QCD number and momentum sum rules are used to determine the normalisations A_j for the valence and gluon distributions, and the sum-rule integrals are solved analytically.
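The "sum-rule integrals solved analytically" statement corresponds to Euler Beta-function integrals. The sketch below normalises a valence-like x f(x) = A x^B (1−x)^C (1 + D x + E x²) so that ∫₀¹ f(x) dx = 2, using illustrative (not fitted) parameter values, and cross-checks the analytic result numerically:

```python
from scipy.special import beta
from scipy.integrate import quad

def valence_norm(B, C, D=0.0, E=0.0, nq=2.0):
    """Normalisation A of  x f(x) = A x^B (1-x)^C (1 + D x + E x^2)
    from the quark-number sum rule  int_0^1 f(x) dx = nq,
    using  int_0^1 x^(a-1) (1-x)^c dx = Beta(a, c+1)."""
    integral = (beta(B, C + 1) + D * beta(B + 1, C + 1)
                + E * beta(B + 2, C + 1))
    return nq / integral

# illustrative HERAPDF-like exponents (placeholders, not fitted values)
B, C, D, E = 0.66, 4.6, 0.0, 9.5
A = valence_norm(B, C, D, E, nq=2.0)

# numerical cross-check of the analytic normalisation
num, _ = quad(lambda x: A * x ** (B - 1) * (1 - x) ** C
              * (1 + D * x + E * x**2), 0.0, 1.0)
```

Because each monomial term reduces to a single Beta function, no numerical integration is needed inside the fit loop for this functional form, in contrast to the bi-log-normal case discussed next.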
Bi-Log-Normal Distributions: This parametrisation is motivated by multi-particle statistics and has the following functional form:

x p(x) = a x^{p − b log x} (1 − x)^{q − d log(1 − x)}.    (13)

This function can be regarded as a generalisation of the standard polynomial form described above; however, numerical integration of Eq. 13 is required in order to impose the QCD sum rules.
Chebyshev Polynomials: A flexible parametrisation based on Chebyshev polynomials can be employed for the gluon and sea distributions. Polynomials with argument log(x) are considered for better modelling of the low-x asymptotic behaviour of these PDFs. The polynomials are multiplied by a factor of (1 − x) to ensure that they vanish as x → 1. The resulting parametric forms read

xg(x) = A_g (1 − x) Σ_{i=0}^{N_g−1} A^g_i T_i(y(x)),    (14)

xS(x) = (1 − x) Σ_{i=0}^{N_S−1} A^S_i T_i(y(x)),    (15)

where the T_i are first-type Chebyshev polynomials of order i and y(x) maps log x linearly onto the interval [−1, 1]. The normalisation factor A_g is derived analytically from the momentum sum rule. Values of N_{g,S} up to 15 are allowed; however, the fit quality is already similar to that of the standard-polynomial parametrisation for N_{g,S} ≥ 5, which has a similar number of free parameters. Fig. 5 (taken from [102]) shows a comparison of the gluon distribution obtained with the parametrisation of Eqs. 14, 15 to the standard-polynomial one, for N_{g,S} = 9. The gluon density is shown at the starting scale Q² = 1.9 GeV². The black lines correspond to the uncertainty band of the gluon distribution using the standard parametrisation, compared to the case of the Chebyshev parametrisation [102]. The uncertainty band for the latter is estimated using the Monte Carlo technique (see Sec. 5.3), with the green lines denoting fits to data replicas. Red lines indicate the standard deviation about the mean value of these replicas.
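The Chebyshev sea parametrisation above can be sketched in a few lines; the linear mapping of log x onto [−1, 1] used here is an illustrative choice, not necessarily the exact mapping of Ref. [102]:

```python
import numpy as np
from numpy.polynomial import chebyshev

def x_sea(x, coeffs, xmin=1e-5):
    """Chebyshev-based sea parametrisation, Eq. (15)-style:
    xS(x) = (1 - x) * sum_i A_i T_i(y), with T_i first-type Chebyshev
    polynomials evaluated via numpy's chebval.
    y maps log(x) linearly onto [-1, 1] over [xmin, 1] (illustrative mapping)."""
    y = 2.0 * np.log(x) / np.log(xmin) - 1.0   # x = xmin -> y = +1, x = 1 -> y = -1
    return (1.0 - x) * chebyshev.chebval(y, coeffs)
```

The (1 − x) prefactor guarantees that the distribution vanishes at x = 1 regardless of the coefficients, as required in the text.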
External PDFs: HERAFitter also provides the possibility to access external PDF sets, which can be used to compute theoretical predictions for the cross sections for all the processes available in HERAFitter. This is possible via an interface to LHAPDF [33,34] providing access to the global PDF sets. HERAFitter also allows one to evolve PDFs from LHAPDF using QCDNUM. Fig. 6 illustrates a comparison of various gluon PDFs accessed from LHAPDF as produced with the drawing tools available in HERAFitter.

Representation of χ 2
The PDF parameters are determined in HERAFitter by minimisation of a χ² function that takes into account correlated and uncorrelated measurement uncertainties. There are various forms of χ², e.g. using a covariance matrix or using nuisance parameters to encode the dependence of each measured data point on each correlated systematic uncertainty. The options available in HERAFitter are the following:

Covariance Matrix Representation: For data points μ_i with corresponding theory predictions m_i, the χ² function can be expressed as

χ² = Σ_{i,k} (m_i − μ_i) C⁻¹_{ik} (m_k − μ_k),

where the experimental uncertainties are given as a covariance matrix C_{ik} for measurements in bins i and k. The covariance matrix is given by a sum of statistical, uncorrelated and correlated systematic contributions:

C_{ik} = C^{stat}_{ik} + C^{unc}_{ik} + C^{sys}_{ik}.

With this representation one cannot distinguish the effect of each individual source of systematic uncertainty.

Nuisance Parameter Representation: In this case, the χ² is expressed as

χ²(m, b) = Σ_i [μ_i − m_i (1 − Σ_j γ_ij b_j)]² / [δ²_{i,stat} μ_i m_i + (δ_{i,unc} m_i)²] + Σ_j b²_j,    (18)

where δ_{i,stat} and δ_{i,unc} are the relative statistical and uncorrelated systematic uncertainties of measurement i, and γ_ij quantifies the sensitivity of the measurement to the correlated systematic source j. The function χ² depends on the set of systematic nuisance parameters b_j. This definition of the χ² function assumes that systematic uncertainties are proportional to the central prediction values (multiplicative uncertainties, m_i (1 − Σ_j γ_ij b_j)), whereas the statistical uncertainties scale with the square root of the expected number of events. However, an additive treatment of uncertainties is also possible in HERAFitter.
During the χ 2 minimisation, the nuisance parameters b j and the PDFs are determined, such that the effect of different sources of systematic uncertainties can be distinguished. Mixed Form Representation: In some cases, the statistical and systematic uncertainties of experimental data are provided in different forms. For example, the correlated experimental systematic uncertainties are available as nuisance parameters, but the bin-to-bin statistical correlations are given in the form of a covariance matrix.
HERAFitter offers the possibility to include such mixed forms of information.
Any source of measured systematic uncertainty can be treated as additive or multiplicative, as described above. The statistical uncertainties can be included as additive or following Poisson statistics. Minimisation with respect to the nuisance parameters is performed analytically; however, for more detailed studies of correlations, individual nuisance parameters can be included in the MINUIT minimisation.
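The analytic minimisation over nuisance parameters can be made concrete: for fixed theory predictions, the nuisance-parameter χ² is quadratic in b, so profiling reduces to solving one linear system. The sketch below uses the multiplicative convention and, for simplicity, only uncorrelated uncertainties in the denominator (the Poisson scaling of the statistical term is omitted); all names are illustrative, not HERAFitter API:

```python
import numpy as np

def chi2_profiled(mu, m, delta_unc, gamma):
    """Nuisance-parameter chi^2 profiled analytically over the b_j.
    mu: data values, m: theory predictions, delta_unc: relative uncorrelated
    errors, gamma: (npoints, nsyst) relative correlated sensitivities.
    Shifted theory: m_i * (1 - sum_j gamma_ij b_j)."""
    mu, m = np.asarray(mu, float), np.asarray(m, float)
    d = mu - m
    w = 1.0 / (np.asarray(delta_unc, float) * m)**2   # inverse variances
    G = np.asarray(gamma, float) * m[:, None]          # absolute shifts per source
    # stationarity in b: (G^T W G + 1) b = -G^T W d
    A = G.T @ (w[:, None] * G) + np.eye(G.shape[1])
    b = np.linalg.solve(A, -G.T @ (w * d))
    r = d + G @ b                                      # residual with shifted theory
    return float(r @ (w * r) + b @ b), b
```

The penalty term Σ b² pulls the shifts back towards zero, so a correlated source is only exploited to the extent that it genuinely improves the data-theory agreement.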

Treatment of the Experimental Uncertainties
Three distinct methods for propagating experimental uncertainties to PDFs are implemented in HERAFitter and reviewed here: the Hessian, Offset and Monte Carlo methods.
Hessian (Eigenvector) method: The PDF uncertainties reflecting the experimental uncertainties of the data are estimated by examining the shape of the χ² function in the neighbourhood of the minimum [103]. Following the approach of Ref. [103], the Hessian matrix is defined by the second derivatives of χ² with respect to the fitted PDF parameters. The matrix is diagonalised and the Hessian eigenvectors are computed. Due to orthogonality, these vectors correspond to independent sources of uncertainty in the obtained PDFs.
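A minimal sketch of the eigenvector construction, assuming the convention Δχ² = δp^T H δp so that each scaled eigenvector corresponds to a Δχ² = 1 parameter shift (names and convention are illustrative, not HERAFitter's):

```python
import numpy as np

def hessian_eigenvectors(hess):
    """Diagonalise a chi^2 Hessian H. With the convention
    delta_chi2 = dp^T H dp, each column of the returned matrix is an
    eigenvector scaled by 1/sqrt(eigenvalue), i.e. an independent parameter
    shift giving delta_chi2 = 1 -- one eigenvector uncertainty direction."""
    vals, vecs = np.linalg.eigh(np.asarray(hess, dtype=float))
    return vecs / np.sqrt(vals)   # broadcast: column i divided by sqrt(vals[i])
```

In practice a tolerance Δχ² > 1 is often used to define the uncertainty bands; that simply rescales each column by √(Δχ²).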
Offset method: The Offset method [104] uses the χ 2 function for the central fit, but only uncorrelated uncertainties are taken into account. The goodness of the fit can no longer be judged from the χ 2 since correlated uncertainties are ignored. The correlated uncertainties are propagated into the PDF uncertainties by performing variants of the fit with the experimental data varied by ±1σ from the central value for each systematic source. The resulting deviations of the PDF parameters from the ones obtained in the central fit are statistically independent, and they can be combined in quadrature to derive a total PDF systematic uncertainty.
The uncertainties estimated by the offset method are generally larger than those from the Hessian method.
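The quadrature combination at the heart of the Offset method can be sketched as follows (a hedged illustration in which the shift per source is symmetrised; names are our own):

```python
import numpy as np

def offset_uncertainty(p_up, p_down):
    """Offset method: for each correlated systematic source, take the fitted
    parameters from the +1 sigma and -1 sigma variant fits, symmetrise the
    deviation from the central fit, and combine the sources in quadrature.
    p_up, p_down: arrays of shape (n_sources, n_parameters) of fitted values."""
    shifts = 0.5 * (np.asarray(p_up) - np.asarray(p_down))   # symmetrised shift
    return np.sqrt(np.sum(shifts**2, axis=0))                # quadrature sum
```

Quadrature addition is justified here because, as stated above, the deviations induced by the different sources are statistically independent.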
Monte Carlo method: The Monte Carlo (MC) technique [105,106] can also be used to determine PDF uncertainties. The uncertainties are estimated using pseudodata replicas (typically > 100) randomly generated from the measurement central values and their systematic and statistical uncertainties, taking into account all point-to-point correlations. The QCD fit is performed for each replica, and the PDF central values and their experimental uncertainties are estimated from the distribution of the PDF parameters obtained in these fits, by taking the mean values and standard deviations over the replicas.
The MC method has been checked against the standard error estimation of the PDF uncertainties obtained by the Hessian method. Good agreement was found between the methods provided that Gaussian distributions of statistical and systematic uncertainties are assumed in the MC approach [32]. A comparison is illustrated in Fig. 7. Similar findings were reported by the MSTW global analysis [107].

Fig. 7 Comparison between the standard error calculations as employed by the Hessian approach (black lines) and the MC approach (with more than 100 replicas) assuming Gaussian uncertainty distributions, shown here for each replica (green lines) together with the evaluated standard deviation (red lines) [32]. The black and red lines in the figure are superimposed because the agreement of the methods is so good that they are hard to distinguish.
Since the MC method requires a large number of replicas, the eigenvector representation is a more convenient way to store the PDF uncertainties. It is possible to transform the MC representation into an eigenvector representation, as shown in [108]. Tools to perform this transformation are provided with HERAFitter and were recently employed for the representation of correlated sets of PDFs at different perturbative orders [109].
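The replica generation step of the MC method can be sketched as follows, assuming Gaussian fluctuations with one shared nuisance shift per correlated systematic source and independent statistical noise per point (an illustrative simplification; asymmetric and Poisson options exist in HERAFitter but are not modelled here):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_replicas(mu, stat, syst, n_rep=500):
    """Gaussian pseudodata replicas around the central values mu.
    stat: (n,) absolute uncorrelated errors (independent per point);
    syst: (n, nsyst) absolute correlated errors -- each source j gets ONE
    random shift b_j shared by all points, which builds in the
    point-to-point correlations."""
    mu, stat, syst = map(np.asarray, (mu, stat, syst))
    b = rng.standard_normal((n_rep, syst.shape[1]))      # shared shift per source
    eps = rng.standard_normal((n_rep, mu.size)) * stat   # independent per point
    return mu + eps + b @ syst.T

reps = make_replicas([10.0, 20.0], [0.5, 0.8], [[1.0], [2.0]], n_rep=20000)
```

Fitting each row of `reps` and taking means and standard deviations over the fitted parameters reproduces the procedure described in the text.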
The nuisance parameter representation of χ² in Eq. 18 is derived assuming symmetric experimental errors; however, the published systematic uncertainties are often asymmetric. HERAFitter provides the possibility to use asymmetric systematic uncertainties. The implementation relies on the assumption that asymmetric uncertainties can be described by a parabolic function. The nuisance parameter contribution in Eq. 18 is modified as

γ_ij → γ_ij + ω_ij b_j,

where the coefficients ω_ij and γ_ij are determined from the maximum and minimum shifts of the cross sections, S^±_ij, due to a ±1σ variation of the systematic source j:

ω_ij = (S^+_ij + S^−_ij)/2,    γ_ij = (S^+_ij − S^−_ij)/2.
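The parabolic description of an asymmetric shift, δ_i(b) = γ_ij b + ω_ij b², fixes the two coefficients uniquely from the up and down variations; a small self-contained check (the sign convention here is ours):

```python
def asym_coeffs(s_plus, s_minus):
    """Parabolic parametrisation of an asymmetric systematic shift:
    delta(b) = gamma*b + omega*b^2 is required to reproduce the published
    +1 sigma shift s_plus at b = +1 and the -1 sigma shift s_minus at b = -1."""
    gamma = 0.5 * (s_plus - s_minus)   # odd (linear) part
    omega = 0.5 * (s_plus + s_minus)   # even (quadratic) part
    return gamma, omega
```

For symmetric errors (s_minus = −s_plus) the quadratic coefficient ω vanishes and Eq. 18 is recovered unchanged.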

Treatment of the Theoretical Input Parameters
The results of a QCD fit depend not only on the input data but also on the input parameters used in the theoretical calculations. Nowadays, PDF groups address the impact of the

Bayesian Reweighting Techniques
As an alternative to performing a full QCD fit, HERAFitter allows the user to assess the impact of including new data in an existing fit using the Bayesian reweighting technique. The method provides a fast estimate of the impact of new data on the PDFs. Bayesian reweighting was first proposed for PDF sets delivered in the form of MC replicas in [105] and further developed by the NNPDF Collaboration [110,111]. More recently, a method to perform Bayesian reweighting studies starting from PDF fits for which uncertainties are provided in the eigenvector representation has also been developed [107]. The latter is based on generating replica sets by introducing Gaussian fluctuations around the central PDF set with a variance determined by the PDF uncertainty given by the eigenvectors. Both reweighting methods are implemented in HERAFitter. Note that the precise form of the weights used by both methods has recently been questioned [112,113].
The Bayesian reweighting technique relies on the fact that the MC replicas of a PDF set give a representation of the probability distribution in the space of PDFs. In particular, the PDFs are represented as an ensemble of N_rep equiprobable (i.e. having weights equal to unity) replicas, {f}. The central value of a given observable, O({f}), is computed as the average of the predictions obtained from the ensemble,

⟨O({f})⟩ = (1/N_rep) Σ_{k=1}^{N_rep} O(f_k),

and the uncertainty as the standard deviation of the sample. Upon inclusion of new data, the prior probability distribution, given by the original PDF set, is modified according to Bayes' theorem such that the weight of each replica, w_k, is updated according to

w_k = (χ²_k)^{(N_data−1)/2} e^{−χ²_k/2} / [ (1/N_rep) Σ_{k=1}^{N_rep} (χ²_k)^{(N_data−1)/2} e^{−χ²_k/2} ],

where N_data is the number of new data points, k denotes the specific replica for which the weight is calculated, and χ²_k is the χ² of the new data obtained using the k-th PDF replica. Given a PDF set and a corresponding set of weights describing the impact of the inclusion of new data, the prediction for a given observable after their inclusion can be computed as the weighted average

⟨O_new({f})⟩ = (1/N_rep) Σ_{k=1}^{N_rep} w_k O(f_k).

To simplify the use of a reweighted set, an unweighted set (i.e. a set of equiprobable replicas which incorporates the information contained in the weights) is generated according to the unweighting procedure described in [110]. The number of effective replicas of a reweighted set is measured by its Shannon entropy [111], which corresponds to the size of a refitted equiprobable replica set containing the same amount of information. This number of effective replicas, N_eff, gives an indicative measure of the optimal size of an unweighted replica set produced with the reweighting/unweighting procedure. No extra information is gained by producing a final unweighted set with a number of replicas (significantly) larger than N_eff. If N_eff is much smaller than the original number of replicas, the new data have a large impact; however, in that case the reweighted set is unreliable and a full refit should be performed instead.
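The weight and effective-replica-number formulae can be sketched in a few lines of Python. This uses the NNPDF-style weight form, whose exact expression has been debated as noted above, together with the Shannon-entropy definition of N_eff; all names are illustrative:

```python
import numpy as np

def nnpdf_weights(chi2, n_data):
    """Bayesian reweighting weights, NNPDF-style:
    w_k proportional to chi2_k^((N_data-1)/2) * exp(-chi2_k/2),
    normalised so that sum(w) = N_rep. Computed in log space for stability."""
    chi2 = np.asarray(chi2, dtype=float)
    logw = 0.5 * (n_data - 1) * np.log(chi2) - 0.5 * chi2
    logw -= logw.max()                 # avoid overflow/underflow
    w = np.exp(logw)
    return w * len(w) / w.sum()

def n_eff(w):
    """Effective number of replicas from the Shannon entropy of the weights:
    N_eff = exp( (1/N) sum_k w_k * ln(N / w_k) )."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    return float(np.exp(np.sum(w * np.log(n / np.maximum(w, 1e-300))) / n))
```

With all χ²_k equal, the weights are all unity and N_eff equals N_rep; a strongly discrepant data set drives N_eff down, signalling that a full refit is preferable.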

Alternatives to DGLAP Formalism
QCD calculations based on the DGLAP [11-15] evolution equations are very successful in describing all relevant hard-scattering data in the perturbative region Q² ≳ a few GeV². At small x (x < 0.01) and small Q², DGLAP dynamics may be modified by saturation and other (non-perturbative) higher-twist effects. Several approaches alternative to the DGLAP formalism can be used to analyse DIS data in HERAFitter. These include several dipole models and the use of transverse-momentum-dependent, or unintegrated, PDFs (uPDFs).

Dipole Models
The dipole picture provides an alternative approach to virtual photon-proton scattering at low x which can be applied to both inclusive and diffractive processes. In this approach, the virtual photon fluctuates into a qq̄ (or qq̄g) dipole which interacts with the proton [114,115]. The dipoles can be considered as quasi-stable quantum-mechanical states, which have a very long lifetime ∝ 1/(m_p x) and a size which is not changed by scattering with the proton. The dynamics of the interaction are embedded in a dipole scattering amplitude. Several dipole models, which assume different behaviours of the dipole-proton cross section, are implemented in HERAFitter: the Golec-Biernat-Wüsthoff (GBW) dipole saturation model [28], a modified GBW model which takes into account the effects of DGLAP evolution, termed the Bartels-Golec-Kowalski (BGK) dipole model [30], and the colour glass condensate approach to the high parton density regime, named the Iancu-Itakura-Munier (IIM) dipole model [29].
GBW model: In the GBW model the dipole-proton cross section σ_dip is given by

σ_dip(x, r) = σ_0 [1 − exp(−r² / (4 R_0²(x)))],

where r corresponds to the transverse separation between the quark and the antiquark, and R_0² is an x-dependent scale parameter which represents the spacing of the gluons in the proton. R_0², called the saturation radius, takes the form R_0²(x) = (x/x_0)^λ · 1/GeV². The cross-section normalisation σ_0, together with x_0 and λ, are parameters of the model, fitted to the DIS data. This model gives exact Bjorken scaling when the dipole size r is small.
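For concreteness, the GBW dipole cross section can be coded directly, assuming the form σ_0 (1 − exp(−r²/4R_0²)) with R_0²(x) = (x/x_0)^λ GeV⁻²; the default parameter values below are merely indicative of typical fitted magnitudes, not HERAFitter results:

```python
import numpy as np

def sigma_dip_gbw(r, x, sigma0=23.0, x0=3.0e-4, lam=0.29):
    """GBW dipole-proton cross section (illustrative parameter values):
    sigma0 * (1 - exp(-r^2 / (4 R0^2))), with the saturation radius
    R0^2(x) = (x/x0)^lam in GeV^-2.  r in GeV^-1, sigma0 in mb."""
    R02 = (x / x0)**lam
    return sigma0 * (1.0 - np.exp(-r**2 / (4.0 * R02)))
```

The two limits stated in the text are manifest: for small r the exponential expands to give σ_dip ∝ r² (colour transparency and exact Bjorken scaling), while for large r the cross section saturates at σ_0.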
BGK model: The BGK model is a modification of the GBW model, in which the inverse of the spacing, 1/R_0², is taken to be proportional to the gluon density, whose DGLAP evolution is taken into account. The gluon distribution, parametrised at some starting scale by Eq. 12, is evolved to larger scales using DGLAP evolution.
BGK model with valence quarks: The dipole models are valid only in the low-x region, where the valence quark contribution to the total proton momentum is 5% to 15% for x from 0.0001 to 0.01 [116]. The inclusive HERA measurements have a precision better than 2%. Therefore, HERAFitter provides the option of taking into account the contribution of the valence quarks.

IIM model: The IIM model assumes an expression for the dipole cross section which is based on the Balitsky-Kovchegov equation [117]. The explicit formula for σ_dip can be found in [29]. The alternative scale parameter R̄, together with x_0 and λ, are fitted parameters of the model.
Transverse Momentum Dependent PDFs

In the framework of high-energy factorisation [127,130,131] the DIS cross section can be written as a convolution in both longitudinal and transverse momenta of the transverse-momentum-dependent (TMD) parton distribution function A(x, k_t, μ_F²) with off-shell parton scattering matrix elements:

σ_j(x, Q²) = ∫ dz d²k_t σ̂_j(x/z, Q², k_t²) A(z, k_t, μ_F²),    (26)

where the DIS cross sections σ_j (j = 2, L) are related to the structure functions F_2 and F_L by σ_j = 4π² F_j / Q², and the hard-scattering kernels σ̂_j of Eq. 26 are k_t-dependent. The factorisation formula in Eq. 26 allows resummation of logarithmically enhanced small-x contributions to all orders in perturbation theory, both in the hard-scattering coefficients and in the parton evolution, fully taking into account the dependence on the factorisation scale μ_F and on the factorisation scheme [132,133].
Phenomenological applications of this approach require matching of small-x contributions with finite-x contributions. To this end, the evolution of the transverse momentum dependent gluon density A is obtained by combining the resummation of small-x logarithmic corrections [134][135][136] with medium-x and large-x contributions to parton splitting [11,14,15] according to the CCFM evolution equation [23][24][25][26].
The cross section σ_j (j = 2, L) is calculated in the FFN scheme, using the boson-gluon fusion process (γ*g* → qq̄). The masses of the quarks are explicitly included as parameters of the model. In addition to γ*g* → qq̄, the contribution from valence quarks is included via γ*q → q by using a CCFM evolution of valence quarks [137-139].
CCFM Grid Techniques: The CCFM evolution cannot easily be written in an analytic closed form. For this reason a MC method is employed; it is, however, time-consuming and cannot be used directly in a fit program.
Following the convolution method introduced in [139,140], the kernel Ã(x', k_t, p) is determined from the MC solution of the CCFM evolution equation and then folded with a non-perturbative starting distribution A_0(x),

x A(x, k_t, p) = x ∫ dx' ∫ dx'' A_0(x') Ã(x'', k_t, p) δ(x' x'' − x),

where k_t denotes the transverse momentum of the propagator gluon and p is the evolution variable.
The kernel Ã incorporates all of the dynamics of the evolution. It is defined on a grid of 50 × 50 × 50 bins in x, k_t, p. The binning in the grid is logarithmic, except for the longitudinal variable x, for which 40 logarithmically spaced bins below 0.1 and 10 linearly spaced bins above 0.1 are used. Calculation of the cross section according to Eq. 26 involves a time-consuming multidimensional MC integration which suffers from numerical fluctuations, and therefore cannot be employed directly in a fit procedure. Instead, the following equation is applied:

σ_r(x, Q²) = ∫_x^1 dx_g A_0(x_g) σ̃(x/x_g, Q²),    (28)

where first σ̃(x', Q²) is calculated numerically with a MC integration on a grid in x' for the values of Q² used in the fit. Then the convolution in Eq. 28 is performed with a fast numerical Gauss integration, which can be used directly in the fit.
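The second, fast step of this two-stage procedure amounts to a one-dimensional quadrature. A hedged sketch, in which the pre-tabulated σ̃ grid is replaced by a callable and Gauss-Legendre nodes in log x_g are our illustrative choice of quadrature:

```python
import numpy as np

def sigma_fast(x, sigma_tilde, A0, n_gauss=64):
    """Fast fold of a pre-tabulated reduced cross section sigma~(x/x_g, Q^2)
    with a starting distribution A0(x_g), integrating x_g over [x, 1] with
    Gauss-Legendre quadrature in u = log(x_g). sigma_tilde and A0 are
    callables standing in for the grid interpolation (illustrative names)."""
    t, w = np.polynomial.legendre.leggauss(n_gauss)
    lo, hi = np.log(x), 0.0
    u = 0.5 * (hi - lo) * t + 0.5 * (hi + lo)   # nodes in log(x_g)
    xg = np.exp(u)
    jac = 0.5 * (hi - lo) * xg                  # dx_g = x_g du, plus interval map
    return float(np.sum(w * jac * A0(xg) * sigma_tilde(x / xg)))
```

Because the MC fluctuations are confined to the one-off tabulation of σ̃, the quadrature itself is smooth and cheap enough to sit inside the χ² minimisation loop.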
Functional Forms for TMD Parametrisation: For the starting distribution A_0 at the starting scale Q_0², a form factorised in x and k_t is used, with a Gaussian k_t dependence of width σ² = Q_0²/2 and with N, B, C, D, E as free parameters. Valence quarks are treated using the method of Ref. [137], as described in Ref. [139], with a starting distribution taken from any collinear PDF and the imposition of the flavour sum rule at every scale p. The TMD parton densities can be plotted either with HERAFitter tools or with TMDplotter [35].

HERAFitter Code Organisation
HERAFitter is an open-source code under the GNU General Public Licence. It can be downloaded from a dedicated webpage [10], together with its supporting documentation and the fast grid theory files (described in Sec. 4) associated with the data files. The source code contains all the relevant information to perform QCD fits with HERA DIS data as a default set. The execution time depends on the fitting options and varies from 10 minutes (using "FAST" techniques as described in Sec. 4) to several hours when full uncertainties are estimated. The HERAFitter code is a combination of C++ and Fortran 77 libraries with minimal dependencies, i.e. for the default fitting options no external dependencies are required except the QCDNUM evolution program [22]. The ROOT libraries are only required for the drawing tools and when invoking APPLGRID. Drawing tools built into HERAFitter provide a qualitative and quantitative assessment of the results. Fig. 8 shows an illustration of a comparison between the inclusive NC data from HERA I and the predictions based on the HERAPDF1.0 PDFs. The consistency of the measurements and the theory can be expressed by pulls, defined as the difference between data and theory divided by the uncorrelated error of the data. Pulls are provided in units of standard deviations in each kinematic bin of the measurement and are also illustrated in Fig. 8.

Fig. 8 An illustration of the consistency of HERA measurements [21] and the theory predictions, obtained in HERAFitter with the default drawing tool.
HERAFitter also provides cache options for fast retrieval, fast evolution kernels, and an OpenMP (Open Multi-Processing) interface which allows parallel evaluation of the GM-VFNS theory predictions in DIS. The default settings in HERAFitter are tuned to reproduce the central HERAPDF1.0 set.

Applications of HERAFitter
The HERAFitter program has been used in a number of experimental and theoretical analyses. These include several LHC analyses of SM processes, namely inclusive Drell-Yan and W and Z production [100,101,141-143], inclusive jet production [98,144], and inclusive photon production [145]. Results of QCD analyses using HERAFitter have also been published by the HERA experiments for inclusive [21,146] and heavy-flavour production measurements [147,148]. The following phenomenological studies have been performed with HERAFitter: a determination of the transverse-momentum-dependent gluon distribution using precision HERA data [139], an analysis of HERA data within a dipole model [149], a study of the low-x uncertainties in PDFs determined from the HERA data using different parametrisations [102], and a study of the impact of QED radiative corrections on PDFs [150]. A recent study based on a set of PDFs determined with HERAFitter and addressing the correlated uncertainties between different orders has been published in [109]. An application of the TMDs obtained with HERAFitter to W production at the LHC can be found in [151].
The HERAFitter framework has been used to produce PDF grids from QCD analyses performed at HERA [21,152] and at the LHC [153], using measurements from ATLAS [100,144]. These PDFs can be used to study predictions for SM or beyond-SM processes. Furthermore, HERAFitter provides the possibility to perform various benchmarking exercises [154] and impact studies for possible future colliders, as demonstrated by QCD studies at the LHeC [155].

Summary
HERAFitter is the first open-source code designed for studies of the structure of the proton. It provides a unique and flexible framework with a wide variety of QCD tools to facilitate analyses of the experimental data and theoretical calculations.
The HERAFitter code, in version 1.1.0, has sufficient options to reproduce the majority of the different theoretical choices made in MSTW, CTEQ and ABM fits. This will potentially make it a valuable tool for benchmarking and understanding differences between PDF fits. Such a study would however need to consider a range of further questions, such as the choices of data sets, treatments of uncertainties, input parameter values, χ 2 definitions, nuclear corrections, etc. The further progress of HERAFitter will be driven by the latest QCD advances in theoretical calculations and in the precision of experimental data.