Charm Production and QCD Analysis at HERA and LHC

This review is devoted to the study of charm production in ep and pp collisions. The total set of measurements obtained by the two collaborations H1 and ZEUS from HERA and their combination is outlined, as well as complementary data obtained by the LHCb collaboration at the LHC. After fitting the parton distribution functions the charm production cross sections are predicted within perturbative QCD at next-to-leading order using the fixed-flavour-number scheme. Agreement with the data is found. The combined HERA charm data are sensitive to the $c$-quark mass and enabled its accurate determination. The predictions crucially depend upon the knowledge of the gluon distribution function. It is shown that the shape of the gluon distribution based on the HERA data is considerably improved by adding the measurements from LHCb and applicable down to values x of about 10^{-6}, where x is the proton momentum fraction carried by a parton.


Introduction
The HERA collider was the first and unique machine in which electrons and protons were collided. It emerged from a series of earlier lepton-nucleon accelerator studies as the highest energy electron-proton collider to investigate simultaneously neutral and charged current reactions and their electroweak unification. The pointlike electron serves as probe to study the internal structure of the proton governed by strong interactions, i.e. Quantum Chromodynamics (QCD ). The generic electron-proton scattering process occurs via the exchange of an electroweak boson. The uniqueness of HERA consists in the clean distinction between electroweak and strong processes. The precise knowledge of electroweak interactions makes HERA ideal for investigation of QCD.
Measurements of deep inelastic scattering (DIS) at HERA have been the central topic in the investigation of the proton structure for the two collider experiments, H1 and ZEUS [1,2]. Such measurements are the core data to determine the proton structure in terms of parton distribution functions (PDFs). The inclusive cross section at HERA contains contributions from all active quark and antiquark flavours. It is remarkable that a large contribution, up to one third, is coming from events with charm. This necessitates the understanding of heavy-flavour production for global QCD analyses of HERA data and is the main subject of the present review.
The tests of perturbative QCD depend on phenomenological input, in particular on the knowledge of the gluon distribution function. For this reason an additional piece has been included in the analysis coming from charm production in the LHCb experiment at the Large Hadron Collider (LHC ).
This review describes various aspects of heavy-flavour physics at HERA and LHC. It presents one new measurement of charm production at HERA, which is further combined with other precise H1 and ZEUS charm measurements in order to obtain the most precise charm dataset from HERA. These combined data are extensively used in a comparison of data and theory and in a QCD analysis to extract the c-quark mass. Another combination is performed at the more exclusive level of D * + visible cross sections. In contrast to the inclusive one, it does not include theory-related uncertainties. Furthermore, charm and beauty measurements from LHCb are considered and included in a QCD analysis. They provide sensitivity to the gluon distribution at low values of fractions of the proton momenta carried by a parton. This is a kinematic range that is currently not covered by other input data, and therefore improves the PDF fits.
The review is organised in the following way. Section 2 introduces the theoretical concepts, relevant for the subsequent contents. Section 3 gives a description of the HERA experimental set-up, while Section 4 describes tagging techniques used to measure charm production at HERA and presents existing measurements. Section 5 deals with the new physics results, the measurement of D + -meson production performed with the ZEUS detector at HERA. Section 6 describes a combination of charm measurements from H1 and ZEUS, performed at the two levels: for D * + visible cross sections and for inclusive reduced charm cross sections. Section 7 switches to the LHC: it introduces measurements of heavy-flavour productions at the LHCb experiment and discusses the impact of the phase-space cover-age which is comprementary to the one from HERA. Section 8 presents a QCD analysis including the LHCb heavy-flavour data. Finally, Section 9 summarises the results.
This review is based on the PhD thesis of the author [3]. The physics results presented in detail in Sections 5-8 were part of the work for the thesis, and later on most of them were published [4,5,6].

Theoretical overview of heavy-flavour production in QCD
Section 2.1 gives a short introduction to perturbative calculations in QCD. Section 2.2 discusses ways of treating of heavyquark production and focuses on the fixed-flavour-number scheme, the preferred scheme in this review. In Section 2.3 various defintions of the heavy-quark mass are given. Sections 2.4 and 2.5 are the central part of this theoretical overview: they provide information on the current status of the calculations for heavy-quark production in different schemes in ep and pp collisions, respectively. Section 2.6 reviews an important nonperturbative aspect of heavy-flavour production: the fragmentation process of partons into hadrons. Finally, Section 2.7 gives a summary.

Perturbative calculations
In the approach of perturbative QCD (pQCD), any physical quantity, Γ, is given as a power series in the strong coupling constant, α s = g 2 /4π, where g is the constant representing the coupling strength in the QCD Lagrangian: 1) where n is the order of the calculation and the coefficients c i are determined using the Feynman rules. Contributions to the perturbative expansion of scattering amplitudes beyond the leading order (LO) are usually formally divergent. In order to regularise these divergences, different renormalisation schemes exist. Moreover, in subtracting the divergences in any renormalisation scheme, an arbitrary mass scale is introduced, known as the renormalisation scale, µ r . Most commonly the modified minimal subtraction scheme, MS, is used [7]. The renormalised coupling, g r , turns out to be scale dependent; keeping only the one-loop order, the running coupling is given by 2) where the constant of integration Λ QCD is a dimensionful quantity, known as the QCD scale, β 0 = (33 − 2n f )/(48π 2 ) is the one-loop beta-function coefficient with n f being the number of massless quark flavours. The strong coupling can be determined through experimental observables, e.g. jet production cross sections, event shapes, τ decay width etc. The measurements of α s as a function of the energy scale are shown in Fig. 2.1 [8]. The running of α s agrees with the expectation from pQCD. The renormalised coupling decreases as the relevant momentum scale grows. This behaviour is known as asymptotic freedom; it enables perturbative calculations at large momentum scales (short distances ). On the other hand, the perturbative approach breaks down at Λ QCD (long distances) as the coupling gets too large. This phenomenon is known as confinement.
Quarks and gluons are not observed as free particles, because, with increasing distance between them, the production of a new quark-antiquark pair instead is energetically preferred. α s (Q) 1 10 100

Q [GeV]
Heavy Quarkonia (NLO) e + ejets & shapes (res. NNLO) DIS jets (NLO) Sept. 2013 Lattice QCD (NNLO) (N 3 LO) τ decays (N 3 LO) 1000 pp -> jets ( Because of confinement hadrons are considered to be made up of massless constituents, known as partons, held together by their mutual interactions. Application of perturbative calculations to any process involving hadrons requires factorisation of short-and long-distance effects [9]. To define the separation, an arbitrary mass scale appears, known as the factorisation scale, µ f . It is introduced in a way similar to the way the renormalisation scale µ r appears in renormalisation, although serves different purposes. In the factorisation approach hadrons are described by parton distribution functions (PDFs), which are not perturbatively calculable and must be extracted from data, however pQCD predicts their evolution with µ f (see Section 2. 4.2 for more details ).
Thus the region where pQCD calculations are reliable is given by µ r , µ f Λ QCD . One usually chooses the two scales to be of the order of the energy involved in the hard process; e.g. for the inclusive production of heavy quarks µ 2 r = µ 2 f = m 2 Q is a possible choice, where m Q denotes the heavy-quark mass (see e.g. [10] for an exhaustive discussion).

Treatment of heavy flavours in pQCD
The masses of the heavy quarks satisfy m Q Λ QCD (m c ≈ 1.5 GeV, m b ≈ 4.5 GeV, m t ≈ 170 GeV) and then provide a hard scale for pQCD calculations. At the same time they complicate calculations, since the new hard scale leads to the appearance of terms proportional to ln( , where p T is the transverse momentum of the produced heavy quark, known as the multiscale problem. One has a freedom to treat the heavy quarks either as massive or massless in perturbative calculations; both choices have their advantages and disadvantages at different phase-space regions, as discussed below. The PDF evolution and α s running depend on the number of quark flavours assumed to be massless and appearing in loops and legs. Several schemes exist for the treatment of heavy flavours in pQCD.
In the present review in most cases the fixed-flavour-number scheme (FFNS) is used in comparisons of theory to the data, since it provides most reliable predictions in the phase space of existing experimental data. In this scheme, heavy quarks are treated as massive at all energy scales, thus they do not enter the PDF evolution of massless quarks and gluons and α s running. 1 One has to specify which particular quark flavours are treated as massless, e.g. the number of flavours n f = 3 for massless u, d and s. The FFNS is expected to be most precise in the threshold region p 2 T ∼ m 2 Q , while at high p T terms proportional to ln( may spoil the convergence of the perturbative series. Other schemes are known as variants of the variable-flavournumber scheme (VFNS), in which heavy quarks are treated as massive or massless depending on the energy scale: in the zero-mass variable-flavour-number scheme (ZM-VFNS) [14], heavy flavours are treated as infinitely massive (and thus completely vanishing) below a certain threshold and as massless above it. This scheme is expected to be appropriate at high energy scales, since the PDF evolution of the heavy quarks and the renormalisation of collinear and infrared singularities provides a resummation of terms proportional to ln in the general-mass variable-flavour-number scheme (GM-VFNS), an interpolation is made between the FFNS and the ZM-VFNS, avoiding double counting of common terms in the PDF evolution and coefficient functions. This scheme is expected to combine the advantages of the FFNS and ZM-VFNS, although some level of arbitrariness is unavoidably introduced in the treatment of the interpolation. Therefore, different variants of the GM-VFNS are available [15,16,17,18,19,20,21,22]. Moreover, this arbitrariness prevents a clear interpretation of the heavy-quark masses in terms of a specific scheme; therefore the heavy-quark masses in GM-VFNS must be treated as effective mass parameters. 2 In the context of VFNS many non-perturbative models, particularly those based on the light-cone wave-function picture, expect an "intrinsic charm" component of the nucleon at an energy scale comparable to the c-quark mass. This intrinsic-charm component, if present at a low-energy scale, will participate fully in QCD dynamics and evolve along with the other partons as the energy scale increases; for more details see, e.g. [24] and references therein. Such models predict a sizeable intrinsic-charm contribution to heavy-flavour production, but in the phase-space regions which are difficult to be probed with currently available experimental data [25] (see Section 7.2.1). In the recent analysis [26] some evidence was found that the intrinsic charm PDF at large parton momentum and low-energy scale carries about 1% of the total momentum of the proton. Future LHC data are expected to further constrain a possible intrinsic-charm component of the proton. 1 Note that in some variants of the FFNS, heavy quarks contribute to the loops in the PDF evolution and α s running (see, e.g. [11,12]); sometimes these variants are called the mixed-flavour-number scheme [13]. 2 Although at a certain order of pQCD a VFNS can be converted to use other heavy-quark mass definition, for example see [23] for FONLL structure functions with MS running masses.

Quark masses
Since free quarks are unobservable, one can consider different definitions of the quark mass m Q . One of the most popular choices is the pole quark mass, m pole Q , defined as the mass at the position of the pole in the quark propagator in perturbation theory. This quantity is introduced in a gauge invariant way and is well defined in each finite order of perturbation theory. This convenient feature has made it very popular and widely used in perturbative calculations, although it has an important drawback: any definition of this quantity suffers from an intrinsic uncertainty of order Λ QCD m Q . The problem arises for the reason, that the pole mass is sensitive to large-distance dynamics (infrared contributions ). 3 Alternative mass definitions avoid this problem. The most prominent example is the MS mass, m Q (µ r ), which is to be evaluated at the renormalisation scale µ r , where µ r Λ QCD , and which is free of ambiguities of order Λ QCD . One benefit of theoretical predictions using the MS mass is improved stability of the perturbative series with respect to scale variations as compared to the result in the pole-mass scheme [28]. The scale dependence of the running mass at LO is given by Here m Q (m Q ) denotes the MS running mass evaluated at the scale µ r = m Q . The scale dependence of the charm and beauty running masses has been measured at LEP and HERA 4 [29,30,31] and is shown in Fig. 2. 2. It is found to be consistent with the QCD expectation.
The relation between the pole mass m pole Q and the MS running mass m Q (m Q ) is known to three loops [32,33,34,35]; at one-loop order it is given by (2.4)

Heavy-quark production in ep collisions
Heavy-quark production in deep inelastic ep scattering collisions serves as a test of pQCD (see Sections 5 and 6); moreover, it is directly sensitive to the gluon density of the proton and to the heavy-quark masses (see Section 6. 5. 3). The charm contribution to the inclusive cross section at HERA reaches 30% [36], thus necessitating its understanding for any global QCD analysis based on HERA data. The contents of this Section is partially based on [37], where more details can be found. 3 In other words, the pole mass is unobservable, because of confinement no free colored quarks exist. Perturbation theory itself produces clear evidence for this non-perturbative correction to m pole Q : the signal is the peculiar factorial growth of the high-order terms in the α s expansion corresponding to a renormalon; for more details see, e.g. [27] and references therein. 4 As of November 2016 only preliminary data from HERA on charmmass running is available, i.e. only the most important figures have been publicly released by the H1 and ZEUS Collaborations, while the complete publication of the results is expected soon. The generic electron-proton 5 scattering process, ep → l X, where l is the scattered lepton and X is the hadronic final state, is shown in Fig. 2. 3. It occurs via the exchange of an electroweak boson V * (the superscript * denotes an intermediate vector boson .) of two types: a neutral γ or Z 0 boson; these reactions are called neutral current (NC); a charged W ± boson; these reactions are called charged current (CC ). Denoting the incoming electron and proton four-momenta with k and p, respectively, and the scattered-lepton fourmomentum with k , the event kinematics can be described by the following Lorentz invariant variables: (2.5) 5 Both electrons and positrons are referred to as electrons, unless explicitly stated otherwise. Q 2 is the virtuality of the exchanged boson, W 2 is the bosonproton energy squared, x and y are Bjorken scaling variables. The y variable is also referred to as inelasticity. The variables x, y and Q 2 are related by 6) with s = 2k · p ≈ s tot approximately equals to the centre-of-mass energy s tot of the experiment. The virtuality Q 2 can be interpreted as the power with which the exchanged boson can resolve the proton structure. Depending on Q 2 , the ep scattering phase space is divided into two regions: deep inelastic scattering (DIS), if Q 2 1 GeV 2 ; photoproduction (PHP), if Q 2 ≈ 0 GeV 2 .
The inelasticity y defines the relative fraction of the electron energy transferred to the hadronic system in the proton rest frame, while the Bjorken variable x determines the relative fraction of the proton energy involved in the partonic subprocess. More details on ep scattering physics, including description of the quark-parton model (QPM), can be found for instance in [38].

Factorisation approach
The inclusive differential cross section of heavy-flavour production in DIS, d 2 σ QQ dxdQ 2 , where QQ stands for c or b quark-antiquark pairs (top production is not accessible at HERA), is expressed in terms of the dimensionless reduced cross sections: xQ 4 2πα 2 (1 + (1 − y) 2 ) , (2. 7) where α is the running electromagnetic coupling. The reduced cross sections can be expressed in terms of the heavy-flavour structure functions F QQ 2 , F QQ L : σ cc red = F cc 2 − y 2 1 + (1 − y) 2 F cc L .
(2. 8) Here the term proportional to the parity-violating structure function, xF 3 , is neglected since Q 2 M 2 Z , where M Z is the Z 0boson mass. Conventionally the structure functions F QQ 2 , F QQ L are defined at the Born level without QED and electroweak radiative corrections, except for the running electromagnetic coupling α = α(Q 2 ). The heavy-flavour structure functions are predicted in the FFNS using light-flavour PDFs as input and the factorisation approach, which gives the field-theory realisation of the parton model in the form of the theorem of the separation of the long-distance from the short-distance dependence for DIS [9]. This theorem states that the sum of all the diagrammatic contributions to the structure functions is a direct generalisation of the parton-model results, given by 9) where i denotes the sum over all partons (massless quarks, antiquarks and gluons), ξ is the momentum fraction of the parton i, which goes from x/x max to 1, x max = 1 1+4M 2 /Q 2 , M = m pole Q is the heavy-quark pole mass, C QQ i 2,L are the heavy-flavour coefficient functions (known also as the hard-scattering functions, or Wilson coefficients, or matrix elements) and f i are the massless PDFs. The factorisation scale µ f serves to define the separation of short-distance from long-distance effects: any propagator that is off-shell by µ 2 f or more will contribute to C QQ i 2,L , while below this scale it will be absorbed into f i . Note that the left-hand side of Eq. 2.9, which is an observable quantity, does not depend on arbitrary scales µ r and µ f by definition. This is a requirement of the factorisation theorem. However, if the right-hand side of Eq. 2.9 is expressed as a perturbative series truncated at a certain order, the calculated value for the observable turns out to be scale-dependent due to the neglected orders. For the actual calculation of F QQ 2,L (x, Q 2 ) one sets the two scales to some fixed values and varies them within a certain range to estimate the effect of the missing higher-order corrections. The C QQ i 2,L are calculated in perturbation theory (see Section 2. 4. 3) but the f i must be extracted by comparing Eq. 2.9 to some standard set of cross sections.
The factorisation prescription is not unique and allows different choices. A set of rules that defines these choices is called a factorisation scheme. Common factorisation schemes are MS [7] or DIS [39,40]. Within such a scheme the PDFs have no physical meaning, since they are dominated by infrared effects and thus by infrared parameters that cannot be measured, although they can be extracted from data by comparing the theoretical calculation 2.9 with measured cross sections. The factorisation theorem ensures that the hard-scattering functions determined in this calculation are insensitive to infrared scales and parameters, and are applicable to cross sections calculated with phenomenologically determined PDFs.
A remarkable consequence of factorisation is that measuring PDFs for one scale µ f 1 allows their prediction for any other scale µ f 2 , as long as both µ f 1 and µ f 2 are large enough which means both α s (µ f 1 ) and α s (µ f 2 ) are small. The evolution of PDFs in µ f is most often, and most conveniently, described in terms of the integro-differential Dokshitzer-Gribov-Lipatov-Altarelli-Parisi (DGLAP) equations [41,42,43,44,45,46,47]: (2. 10) The evolution kernels P i j (x), or the splitting functions, are given by perturbative expansions, beginning with O(α s ); they represent the probability of a parton i to emit a parton j carrying a fraction z = x ξ of the momentum of the parton i. Note that the integral on the right-hand side of Eq. 2.10 begins at x. Thus, it is only necessary to know f i (ξ, µ f 2 1 ) for ξ > x at some starting value of the scale µ f 1 , in order to derive f i (x, µ f 2 2 ) at a higher value µ f 2 > µ f 1 . This is a great simplification, since data at small x are hard to obtain at moderate energies.
At very low values of x, terms proportional to α s ln( 1 x ) may spoil the accuracy of the DGLAP approach; there other evolution schemes, e.g. BFKL [48,49,50] or CCFM [51,52,53,54], might be more appropriate to use. The difference between the schemes comes from the ordering of the emitted partons before entering the hard-scattering process.
Since the perturbative series is truncated at a certain order, the approximation is µ f dependent due to neglected orders. In practice the two scales are often set to be equal, although it is not a requirement. To estimate the perturbative uncertainties of the neglected higher orders, the µ r and µ f scales are varied around the central values, simultaneously or independently.

Calculations in FFNS
In the FFNS with n f = 3 there are no c and b quarks in the proton at any scale, therefore the leading-order (LO) process (O(α s )) for heavy-flavour production in DIS is the boson-gluon-fusion (BGF) process [55,56,57,58,59], gγ * → QQ, shown in Fig. 2. 4. The corresponding hard-scattering functions C gγ * →QQ (0) 2,L are given by: 11) where e c = +2/3 is the c-quark charge in units of the proton charge, = M 2 Q 2 , M = m pole Q is the heavy-quark pole mass, ν = 1 − 4 z 1−z and z ≡ x ξ running from x to 1 1+4 . Note that the LO hard-scattering functions in Eq. 2.11 do not depend on the factorisation and renormalisation scales (the dependence on the latter appears only via the strong coupling), therefore the structure functions in Eq. 2.9, calculated at LO, necessarily turn out to be dependent on these arbitrary scales. For higher-order calculations this dependence partially cancels in the convolution of the hard-scattering functions, PDFs and running strong coupling. The two scales are chosen to be of the order of Q 2 or M 2 ; a typical choice is µ 2 r = µ 2 f = Q 2 + 4M 2 [60]. Next-to-leading-order (NLO) corrections (O(α 2 s )) were calculated in [61,62]. They can be classified into three groups: 1. real corrections to the BGF process, i.e. all processes containing an extra gluon in the final state gγ * → QQg; 2. virtual corrections to the BGF process coming from the interference of O(α s ) and O(α 3 s ) terms; 3. a new process, when the virtual photon interacts with a light quark q in the proton: γ * q(q) → QQq(q ).
The NLO predictions [61,62] are available in the HVQDIS program [63], which calculates fully differential double-particle inclusive cross sections. The pole-mass definition is used in these calculations. The NLO corrections for charm production are important as they change both the shape and normalisation of the transverse momentum, pseudorapidity, and x distributions, while the Q 2 distribution only receives a shift in normalisation [63]. In the kinematic region of HERA, the scale dependence of the NLO calculations for charm production is moderate: it varies from 10% at high Q 2 to 30% at low Q 2 (see Sections 5.8, 6.4.1 and 6. 5. 3).
In a recent variant of the FFNS from the ABM group the running-mass definition in the MS scheme is used [28]. This scheme has the advantage of improving the convergence of the perturbative series (see Section 2. 3). These predictions are provided for inclusive quantities only, i.e. at the F QQ 2,L level. At next-to-next-to-leading order (NNLO) (O(α 3 s )) only approximate calculations are available. For F 2 , four out of the five massive Wilson coefficients are known at large scales Q 2 [64,65,66,67,68], and an estimate has been made for the remaining coefficient [69] based on the anticipated small-x behavior, a series of moments [70], and two-loop operator matrix elements [71,72]. E.g. in Ref. [69] combined approximate expressions for three kinematic limits are given: in the limit of high partonic centre-of-mass energy squared,ŝ m 2 Q , in the threshold region,ŝ 4m 2 Q , and in the high-scale region Q 2 m 2 Q .

Calculations in VFNS
In the VFNS, the LO process for heavy-flavour production in ep collisions is the QPM scattering. At NLO, fully differential calculations exist only in the ZM-VFNS [73,74,75].
The main difference between the FFNS and ZM-VFNS mechanisms can be attributed to the fact that for heavy-quark production in the FFNS, two heavy particles appear in the final state instead of one as in the case of the intrinsic heavy-quark approach 6 . This reveals itself in the p T -distribution where for the FFNS the quark and antiquark appear back to back in the Breit frame. The heavy-flavour data from HERA [76,77] clearly prefer the p T -spectrum predicted by the FFNS production mechanism (see Fig. 4.2 in Section 4. 1).
Calculations in the GM-VFNS for heavy-flavour production in DIS exist only at the inclusive F QQ 2 level. Some of the most popular GM-VFNS are the Thorne-Roberts (RT) [21,78,79], Aivazis-Collins-Olness-Tung (ACOT) [16] and FONLL [22] schemes. The calculations are available at NLO and (approximate) NNLO orders. Predictions from various variants of GM-VFNS were compared to the combined HERA charm data in [60]; they are generally found to describe the data well in the region Q 2 5 GeV 2 .

Heavy-quark production in pp collisions
Similar to the case of ep collisions, heavy-quark production in hadronic collisions is interesting either as a benchmark process for the study of pQCD or as a probe of the nucleon structure [80].
Most important examples of the latter are: inclusive heavy-flavour production at high energy mostly probes the gluon density of the proton, since the leading process is gg → QQ [81,6,82]. This covers a wide kinematic range, because a hard scale provided by the mass of heavy quarks allows applicability of pQCD even at low transverse momentum p T ∼ Λ QCD ; -W ± +c final states probe the strange-quark content of the proton, since the LO production mechanism is gs → W ± c [83].
The understanding of heavy-quark production is also important in searches for possible new physics, where QCD-initiated heavy-quark final states cause large backgrounds for such analyses. The contents of this Section is partially based on [80,84], where more details can be found. The cross sections for heavy-flavour production in pp collisions are calculated in pQCD using the factorisation approach, similar to Eq. 2.9: (2. 12) Here the sum in i, j goes over all relevant partons,σ i j→QQ is the perturbatively calculated partonic cross section, x 1 , x 2 are momentum fractions carried by the two incoming partons, and f i , f j are the PDFs, introduced in Section 2. 4.2, for the two incoming protons p 1 and p 2 . Note that similarly to Eq. 2.9 the left-hand side of Eq. 2.12 is an observable quantity and does not depend on scales µ r and µ f by definition.
In the FFNS at LO (O(α 2 s )) two processes are responsible for heavy-quark production: qq → QQ and gg → QQ.
(2. 13) The diagrams are shown in Fig. 2.5 and the corresponding differential partonic cross sections are [85,86,87,88,89,56]: dσ i j→QQ (ŝ, θ) dφ (2) = M(i j → QQ) 2 , 14) where N = 3 is the number of colors, V = N 2 − 1 = 8 is the dimension of the SU(3) gauge group, i.e. the number of gluons,ŝ = (x 1 p 1 + x 2 p 2 ) 2 is the squared partonic centre-ofmass energy, τ 1,2 = (1 ∓ βcosθ)/2, θ is the partonic scattering angle, ρ = 4M 2 /ŝ, β = 1 − ρ, M = m pole Q is the heavy-quark pole mass, and dφ (2) is the two-body phase-space element given by (2. 15) The total production cross section for heavy quarks is finite at LO, owing to the fact that M is the minimum virtuality exchanged in the t-channel, therefore no poles can develop in the intermediate propagators. This is not the case for light quarks: the total production cross section for u or d quarks is not calculable in pQCD [80]. Note that the LO partonic cross sections in Eq. 2.14, similarly to the LO hard-scattering functions in Eq. 2.11, do not depend on the factorisation and renormalisation scales, except for the α 2 s (µ 2 r ) dependence. The scales are chosen to be of the order of the energy involved in the hard process.
T is the average squared transverse momentum of the produced heavy quark and antiquark. Other possible choices are the off-shell of the internal lines in the diagrams in Fig. 2 The total partonic cross section can be obtained by integrating over the partonic scattering angle: 16)  where L(β) = 1 β log 1+β 1−β − 2. At largeŝ the qq rate drops more quickly than gg, as can be seen from Eq. 2.16 (this remains true also when NLO effects are considered). In addition, threshold effects for the qq channel vanish very quickly as soon asŝ > 4m 2 Q ; this is related to the spin 1/2 of quarks [80].
To calculate the differential hadronic cross section of Eq. 2.12, the partonic cross section in Eq. 2.14 needs to be convoluted with the PDFs in the hadrons. The kinematics of the final state can be parametrised in terms of the transverse momenta p T 1 , p T 2 and rapidities y 1 , y 2 of the produced quark and antiquark, which are related at LO to the parton momentum fractions x 1 , x 2 : where = √ s/M T , M T = M 2 + p 2 T , p T = p T 1 = p T 2 and s is the squared hadron centre-of-mass energy. The resulting phase-space element is 1 4π 2 dy 1 dy 2 dp 2 T , (2. 18) and the differential cross section at LO is: dσ pp→QQ dy 1 dy 2 dp 2 (2. 19) As can be seen from Eq. 2.19, for a fixed value of p T the rate is suppressed when |y 1 − y 2 | becomes large, therefore the quark and antiquark tend to be produced with the same rapidity. At the LHC the bulk of the contribution to heavy-flavour production directly probes the gluon content of the proton and serves to improve its knowledge over the HERA determination (see Section 8).

MNR calculations
NLO corrections come from two sources of O(α 3 s ) diagrams: real-and virtual-emission diagrams. In the first case, the corrections come from the square of the real-emission matrix elements; in the second case, from the interference of the virtual matrix elements (of O(α 4 s )) with the tree-level ones (of O(α 2 s )). Ultraviolet divergences in the virtual diagrams are removed by the renormalisation process. Infrared and collinear divergences, which appear both in the virtual diagrams and in the integration over the emitted parton in the real-emission processes, cancel each other or are absorbed in the PDFs. The complete calculations of NLO corrections to the production of heavy-quark pairs in hadro-and in photoproduction were done in [90,91] (total hadroproduction cross sections), [92,93] (one-particle inclusive distributions in hadroproduction), [94,95] (total and oneparticle inclusive distributions in ep photoproduction), [96] (two-particle inclusive distributions in hadroproduction) and [97] (two-particle inclusive distributions in ep photoproduction). They are known as Mangano-Nason-Ridolfi (MNR) calculations and are available in the MNR program [98], which calculates double-or single-particle inclusive or total cross sections. The pole-mass definition is used in the calculations.
There are a few important remarks concerning the NLO calculations: no collinear singularities appear when gluons are emitted from the final-state heavy quarks, since they are screened by the heavy-quark mass. Therefore, contrary to the case of a light parton, the p T distribution for a heavy quark is a well-defined quantity in NLO. For light partons, a collinear singularity would be encountered that requires the introduction of a fragmentation function, not calculable from first principles (see also Section 2.6); at large p T , nevertheless, large ln(p T /m Q ) factors appear, signalling the increased probability of collinear gluon emission. At large p T , the massive quark behaves similar to a massless particle. These logarithms can be resummed using the fragmentation-function formalism (see next Section 2.5.2); new processes appear at NLO which drastically change thê s dependence of the cross sections and/or the kinematic distributions; there is evidence however that NLO is not sufficient to get accurate estimates, since a large scale dependence is still present. This is demonstrated in Fig. 2.6, which shows as an example the scale dependence of the inclusive p T distribution of b quarks at the Tevatron. At the LHC, the estimated uncertainty is of the order of 50% for b-quark production, and it is even larger for c-quark production, owing to the smaller value of the c-quark mass. Large scale dependence is a symptom of large NNLO corrections. 7 For an extensive and a more quantitative analysis of the NLO corrections, as well as a general discussion of the corrections beyond the Born level, see Ref. [80].

FONLL calculations
The fixed-order plus next-to-leading-logarithms (FONLL) calculations [101] were developed for improving the large-p T differential cross section for heavy-quark production in hadronhadron collisions and then were extended to photoproduction in ep collisions [102]. This approach is a variant of GM-VFNS, based on the matching of NLO massive and massless calculations according to the prescription [84]: 20) Here FO denotes the massive NLO cross section, where a heavy quark enters only in the partonic scattering through the flavourcreation processes, but not in the PDFs, and its mass is kept as a non-vanishing parameter. This part, which is singular in the massless limit, and the finite parts related to its different definition in dimensional and mass regularisation are denoted FOM0 and therefore resummed to next-to-leading-logarithm order in the contribution denoted RS. The RS contribution is then added to the FO calculation, while the overlap FOM0 is subtracted to avoid double counting. This is controlled by the matching function G(m Q , p T ), which must tend to unity in the massless limit p T m Q , where FO approaches FOM0 and the mass logarithms must be resummed. In FONLL its functional form is 21) with an ad-hoc constant a = 5.
Comparison of the NLO and FONLL calculations for beauty production at the Tevatron is shown in Fig. 2.7, where uncertainty bands obtained from the scale variations are shown. The resummation procedure indicates the presence of a small enhancement in the intermediate-p T region, followed by a reduction of the cross section (and of the uncertainty band) at larger p T [101]. Both uncertainty bands fully overlap in a wide p T range. FONLL predictions for LHC data are given in [103]; they can be also obtained using the public web interface [104].

Other GM-VFNS calculations
Other GM-VFNS calculations [105] were originally performed in the massless limit, valid at high p T , and therefore include flavour-creation, gluon-splitting and flavour-excitation processes [84]. Subsequently the calculations were improved by identifying the previously omitted finite-mass terms through a comparison with the massive NLO calculation, where together with the mass logarithms, finite terms were also subtracted in such a way that in the limit m Q → 0 the correct massless MS result was recovered, since the PDFs and perturbative fragmentation functions that are convoluted with the partonic cross sections are defined in the ZM-VFNS.

Fragmentation of heavy quarks
The production of hadrons in QCD can only be described by taking into account a non-perturbative hadronisation phase, i.e. the processes which transform objects amenable to a perturbative description (quarks and gluons) into real particles (see Fig. 5.1 in Section 5). The contents of this Section is partially based on [106], where more details can be found.
In the case of light hadrons, the QCD factorisation theorem [107,108,9,109,110,111] allows for factorisation of these non-perturbative effects into universal (but factorisation-scheme dependent) fragmentation functions: (2. 22) In this equation, valid up to higher-twist corrections of order Λ QCD p T , the partonic cross sections dσ h dp T for production of the hadron h are calculated in pQCD, while the fragmentation functions, D i→h (z, µ), are usually extracted from fits to experimental data (not to be confused with the heavy-quark perturbative fragmentation functions, introduced for the GM-VFNS calculations, which initial values at the starting scale are calculable perturbatively [112]). The fragmentation function D i→h (z, µ) describes the probability that a parton i fragments into a hadron h carrying a fraction z of the momentum of the parton i. Due to their universality they can be used to make predictions for different processes. The factorisation scale µ is a reminder of the nonphysical character of both the partonic cross sections and the fragmentation functions: it is usually taken of the order of the hard scale of the process (p T ), and D i→h (z, µ) are evolved from a low scale up to µ by means of the DGLAP evolution equations. 8 This general picture becomes different for the production of heavy-flavoured hadrons. NLO QCD calculations describe the production of an on-shell heavy quark. Still, mimicking the factorisation theorem given above, the quark to hadron transition can be described by convoluting the heavy-quark production cross section with a suitable scale-independent non-perturbative fragmentation function, D np Q→H (z), describing the hadronisation of the heavy quark: 23) It is worth noting that at this stage this formula is not the result of a rigorous theorem, but is used on a purely phenomenological basis. Moreover, it will in general fail (or at least be subjected to large uncertainties) in the region where the mass m Q of the heavy quark is not much smaller than its transverse momentum p T , since the choice of the scaling variable, z, is no longer unique, and O(m Q /p T ) corrections cannot be neglected. This leads to a modelling uncertainty which is, however, small compared to the perturbative uncertainty at NLO. An important characteristic of the non-perturbative fragmentation function is that the average fraction of momentum lost by the heavy quark when hadronising into a heavy-flavoured hadron, z np , is given by [113,114] 24) Since (by definition) the mass of a heavy quark is much larger than the scale Λ QCD , this amounts to saying that the nonperturbative fragmentation function for a heavy quark from Eq. 2.23 is very hard, i.e. the quark loses very little momentum when hadronising. This can also be seen by noting that a fast massive quark will lose very little speed (and hence momentum) when picking up a light quark of mass Λ QCD from the vacuum to form a heavy meson. 9 This basic behaviour is to be found as a common feature in all the non-perturbative heavy-quark fragmentation functions, derived from various phenomenological models. Among the most commonly used are the Kartvelishvili-Likhoded-Petrov [118], Bowler [119], Peterson-Schlatter-Schmitt-Zerwas [120] and Collins-Spiller [121] functions.
These models all provide some functional form for the D np Q→H (z) function and one or more free parameters that control its hardness. Such parameters are usually not predicted by the models (or only very roughly), and must be fitted to experimental data.
There are two important aspects concerning the fragmentation of heavy quarks: 1. a non-perturbative fragmentation function is designed to describe the transition from the heavy quark to the hadron, involving many soft gluons with energies of the order of Λ QCD . However, if a heavy quark is produced in a highenergy event, it will initially be far off-shell: hard gluons will be emitted to bring it on-shell, reducing the heavyquark momentum and yielding in the process large collinear logarithms. The amount of gluon radiation is related to the distance between the heavy-quark mass scale and the hard scale of the interaction, and is therefore process dependent.
To account for this dependence, different free parameters of the non-perturbative fragmentation function are used at different centre-of-mass energies or transverse momenta (see, e.g. [122]); 2. since only the final heavy-flavoured hadron is observed, both the non-perturbative fragmentation function and the perturbative cross section for producing heavy quarks must be regarded as non-physical objects. The details of the fitted non-perturbative fragmentation function (e.g. the precise value(s) of its free parameter(s)) depend on those of the perturbative cross sections: different perturbative calculations (LO, NLO, FONLL etc.) and different perturbative parameters (heavy-quark masses, strong coupling etc.) lead to different non-perturbative fragmentation functions. These in turn will have to be used only with a perturbative description similar to the one within which they have been determined [123].

Concluding remarks
QCD provides robust predictions for heavy-flavour production, owing to the presence of the finite heavy-quark mass which provides a hard scale for perturbative calculations. However application of perturbative calculations to any process involving hadrons requires a priori knowledge of proton PDFs which are not calculable perturbatively, but must be extracted from data. In addition to describe the transistion of heavy quarks into colorless heavy-flavoured hadrons phenomenological fragmentation functions have to be used. Alternative treatments of the heavyquark mass effects in perturbative calculations lead to several schemes. In the phase space of currently available experimental data the most rigorous calculations are performed in the FFNS, when mass effects of heavy quarks are fully taken into account in all parts of calculations at the price that this potentially may spoil the convergence of the perturbative series at high energy scales.
The dominant heavy-flavour production process at HERA is the boson-gluon-fusion process and at the LHC it is the gluongluon fusion. Therefore at both colliders the gluon distribution is an essential ingredient to predict the production rate. In other words existing precise heavy-flavour data help to pin down the gluon distribution (see Section 8).
Currently exact pQCD calculations exist at NLO for heavyflavour production both in ep and pp collisions; but only approximate NNLO calculations are available. Since the calculations depend on non-perturbative input (PDFs and fragmentation), it is important to remember that a careful treatment of the latter is crucial for a meaningful comparison of data and theory. Uncertainties in the predictions come from missing higher orders (known as scale uncertainties), QCD parameters (the heavyquark masses and strong coupling constant), input PDFs and phenomenological fragmentation functions; in the bulk of the available phase space, even at NLO, they are dominated by scale uncertainties (especially in the case of pp collisions, where these uncertainty are of the order of factor 2 for charm production at the LHC). Therefore progress in theoretical calculations is crucial for performing strong tests of QCD. Nevertheless already with currently available calculations one can not only test QCD but also use experimental data for significant improvement in the precision of parameters of QCD (mainly the heavy-quark masses) and the gluon content of the proton, and leading to an improved predictive power of the Standard Model.

HERA collider
HERA played a prominent role in the exploration of the proton structure. It emerged from a series of electron-proton accelerator studies in the 70's as the highest energy ep collider possible, which made it possible to produce both NC and CC reactions simulataneously and study electroweak unification. The description below is partially based on [124].
HERA (German: Hadron-Elektron Ring Anlage), at DESY, Hamburg, was the first, and so far the only, accelerator complex in which electrons and protons were collided [125]. It was built in the 80's with the capability to scatter polarised electrons and positrons off protons, at an energy of the proton beam of initially 820 GeV until it was increased to 920 GeV, in 1998. Together with an electron energy of 27.5 GeV, this resulted in a centreof-mass energy, √ s, of about 320 GeV. The energy was high enough to probe the phase space in x down to 10 −6 and Q 2 up to 30000 GeV 2 . The protons were accelerated and stored in a ring of superconducting magnets. The electron ring was normal conducting. A schematic view of the HERA accelerator ring and preaccelerators is shown in Fig. 3. 1.  Two general-purpose detectors with nearly 4π acceptance were proposed in 1985, H1 [127] and ZEUS [128]. They were operated over the 16 years of HERA operation. Two further experiments at HERA were built and run in the fixed-target mode. The HERMES experiment [129] (1994-2007) used the polarised electron beam to study spin effects in lepton-nucleon interactions using a polarised nuclear target. The HERA-B experiment [130] (1998-2003) was designed to investigate B-meson physics and nuclear effects in the interactions of the proton-beam halo with a nuclear wire target.
The first HERA data were taken in summer 1992. HERA had its first phase of operation (referred to as HERA-I) from 1992 through 2000. In this period, the collider experiments H1 and ZEUS each recorded data corresponding to integrated luminosities of approximately 120 pb −1 of e + p and 15 pb −1 of e − p collisions. The HERA collider was then upgraded to increase the specific luminosity by a factor of about four, as well as to provide longitudinally polarised lepton beams to the collider experiments [131]. The second data-taking phase (referred to as HERA-II) began in 2003, after completion of the machine and detector upgrades, and ended in 2007. The H1 and ZEUS experiments each recorded approximately 200 pb −1 of e + p and 200 pb −1 of e − p data with electron (positron) energy of approximately 27.5 GeV and proton energy of 920 GeV. The lepton beams had an average polarisation of approximately ±30% with roughly equal samples of opposite polarities recorded. In the last three months of HERA operation, data with lowered protonbeam energies of 460 GeV (referred to as LER, Low Energy Run) and 575 GeV (referred to as MER, Middle Energy Run) were taken; each experiment recorded approximately 13 pb −1 and 7 pb −1 of the LER and MER data, respectively. The primary purpose of the LER and MER data was the measurement of the longitudinal proton structure function F L .
HERA ceased operations in June 2007 after a long, successful data-taking period of 16 years. A wealth of results have been published. The studies have considerably enlaged the knowledge on the proton structure and provided tests of the Standard Model. There are still ongoing analyses.

H1 and ZEUS experiments
The collider detectors H1 [127,132,133] and ZEUS [128] were designed primarily for deep inelastic scattering (DIS) at large virtuality Q 2 and large final-state energies. Thus, much attention was paid to the electromagnetic and hadron calorimeters. The H1 Collaboration chose liquid argon as active material for their main calorimeter to maximise long-term reliability. The ZEUS Collaboration chose scintillator active media and depleted uranium as the absorber material with the property of equal "eπ" response to electrons and hadrons. The calorimeters were complemented by large-area wire chamber systems to measure muon momentum and the tail of hadron-shower energy. Because the electron-and proton-beam energies were very different, the detectors were asymmetric, with extended coverage of the forward (proton-beam) direction. Drift chambers inside the calorimeters, both in H1 and in ZEUS, were segmented into a forward and a central part. Later, in H1 starting in 1996 and in ZEUS from 2003 onwards, silicon detectors near the beampipe were installed for precision vertexing and tracking. Both apparatus were complemented with detector systems positioned near the beam axis in the accelerator tunnel, to measure backward photons and electrons, mainly for the determination of the beam interaction luminosity, and to tag leading protons and neutrons in the forward direction. Both experiments took data over the entire time of HERA's operation with efficiency of 70-80%. The main components of the H1 and ZEUS detectors are briefly described below with the main emphasis on the components most relevant for studies of heavy-flavour production: the tracking systems for the precise reconstruction of tracks and vertices, which is crucial for the identification of heavy-flavoured hadrons, and the calorimeters, needed for the identification of the scattered electron and for the reconstruction of event kinematical variables.

H1 detector
A schematic view of the H1 detector [127,132,133] along the beampipe and the main detector components are shown in Fig. 3.2. The coordinate system is a right-handed Cartesian system, with the Z axis in the proton-beam direction, referred to as the "forward direction", and the X axis pointing towards the centre of HERA. The coordinate origin is at the nominal interaction point. The pseudorapidity is defined as η = − ln tan θ 2 , where the polar angle, θ, is measured with respect to the Z axis. The azimuthal angle, φ, is measured with respect to the X axis.  Charged particles were measured within the central tracking detector (CTD) in the pseudorapidity range −1.85 < η < 1. 85. The CTD consisted of two large cylindrical jet chambers (CJCs), surrounding a system of three silicon detectors consisting of the Central Silicon Tracker (CST) [134], and the Forward and Backward Silicon Trackers [135]. The CJCs were separated by a drift chamber which improves the z-coordinate reconstruction. A multiwire proportional chamber [136], which was mainly used in the trigger, is situated inside the inner CJC. These detectors are arranged concentrically around the interaction region in a magnetic field of 1.16 T. The trajectories of charged particles were measured with a transverse momentum resolution of σ(p T )/p T = 0.005 · p T /GeV ⊕ 0.015 [137]. 10 The interaction vertex was reconstructed from CTD tracks. The CTD also provided triggering information based on track segments measured in the CJCs [138,139,140] and a measurement of the specific ionisation energy loss, dE/dx, of charged particles. The Forward Silicon Tracker measured tracks of charged particles at smaller polar angles (1.5 < η < 2.8) than the central tracker.
Charged and neutral particles were measured in the liquid argon (LAr) calorimeter, which surrounded the tracking chambers and covers the range −1.5 < η < 3.4 with full azimuthal acceptance [141]. Electromagnetic shower energies were measured with a precision of σ(E)/E = 12%/ √ E/GeV ⊕1% and hadronic energies with σ(E)/E = 50% √ E/GeV ⊕2%, as determined in test beam measurements [142,143]. A lead-scintillating fibre calorimeter, also referred to as the Spaghetti Calorimeter (SpaCal), [133] covered the backward region −4.0 < η < − 1.4 (the region of high Q 2 ) completed the measurement of charged and neutral particles. For electrons the SpaCal had a relative energy resolution of σ(E)/E = 7%/ √ E/GeV ⊕1%, as determined in test beam measurements [144]. The SpaCal provided energy and time-of-flight information used for triggering purposes. Because the LAr calorimeter was non-compensating and had on average a larger response to electromagnetic compared to hadron energy depositions, a software weighting method had to be applied for the energy reconstruction. The hadronic final state was reconstructed using an energy flow algorithm which combines charged particles measured in the CTD and the forward tracking detector with information from the SpaCal and LAr calorimeters.
The luminosity determination was based on the measurement of the Bethe-Heitler process [145] where the photon was detected in a calorimeter located at Z = −103 m downstream of the interaction region in the electron beam direction. Additionally, the overall integrated luminosity normalisation was determined using a precision measurement of the QED Compton process [146] with the best achieved relative uncertainty on the measured luminosity was 2.3%.
To reduce the event rate to technickally acceptable ≈ 10 Hz a sophisticated multilevel trigger system was used at H1 [147,148]. The first trigger level was supposed to stop the pipeline. The decision was based on special trigger signals from various detector components. The second trigger level started the readout and used neural networks and topological triggers. The third trigger level was placed into operation in 2005 and was mainly used for heavy-quark decays identification. It used timeoptimised routines for the reconstruction of decay resonances and event properties, therefore event building was started on this level. On the fourth trigger level an on-line event reconstruction was performed.

ZEUS detector
A schematic view of the ZEUS detector [128] along the beampipe and the main detector components are shown in Fig. 3. 3. The ZEUS coordinate system is the same as for the H1 detector. The main detector components are briefly described below. The momenta of charged particles were measured by the Central Tracking Detector (CTD) [149,150,151] in the 1.43 T magnetic field of the solenoid [152]. The CTD was a cylindrical drift chamber measuring the direction, momentum and energy loss (dE/dx). It was filled with a gas mixture of argon, carbon dioxide and ethane. The CTD was made of 72 layers of wires, which were grouped in 9 superlayers. The angular coverage of the CTD was 15°< θ < 164°and the momentum resolution for the full-length tracks in the HERA-I period was determined to be σ(p T )/p T = 0.0058 · p T /GeV ⊕ 0.0065 ⊕ 0.0014 GeV/p T .
At the time of the HERA luminosity upgrade during the shutdown period 2000-2001, the tracking system of the ZEUS detector was upgraded with the Microvertex Detector (MVD) [153] (Fig. 3.4). The MVD was a silicon-strip vertex detector, mainly supposed to allow reconstruction of secondary vertices and track impact parameters from heavy-quark decays. The MVD consisted of two sections: barrel (BMVD) with an angular coverage 30°< θ < 150°and forward (FMVD), which extended the coverage to 7°. The momentum resolution of the combined tracking system MVD+CTD for fulllength tracks in the HERA-II period was determined to be σ(p T )/p T = 0.0029· p T /GeV⊕0. 0081⊕0.0012 GeV/p T , indicating an improved transverse momentum resolution, although the MVD material between the interaction point and the CTD increases the probability for multiple scattering.
The forward region of the ZEUS detector required enhanced tracking and particle identification capabilities due to the asymmetric beam energies. It consisted of the Forward Tracking Detector (FTD) and the Transition Radiation Detector (TRD). The purpose of the FTD was to reconstruct low-angle tracks of ionising particles whereas the TRD separated electrons from hadrons. During the HERA luminosity upgrade programme the TRD was replaced by the Straw Tube Tracker (STT) [154], which improved the tracking efficiency in events with high multiplicities. In the rear direction the Rear Tracking Device (RTD) was located. To determine the position of the scattered electron near the beampipe, the small-angle rear tracking detector (SRTD) [155] was used.
The most important sub-detector that measured energies was the calorimeter (CAL) [156,157,158,159]. The CAL was mechanically subdivided in three parts: the Barrel Calorimeter (BCAL) covering polar angles from 36 The CAL was a sampling calorimeter consisting of plates of depleted uranium interleaved with plastic scintillator as active material. The ratio of absorber and scintillator thickness had been chosen to achieve equal signals from hadrons and electromagnetic showers, thereby producing the best possible resolution for hadrons. The CAL provided precise energy measurements for hadrons and jets, an angular resolution for jets better than 10 mrad, the ability to discriminate between hadrons and electrons using their different energy depositions, and a time resolution of 1 ns. The energy resolution for electrons and hadrons as determined under test-beam conditions was 18%/ √ E/GeV and 35%/ √ E/GeV, respectively. The Backing Calorimeter (BAC) was built to fulfill two tasks: to achieve a hermetic hadron jet-energy measurement and to aid the tracking of muons passing through the iron yoke of the detector. To measure the energy of hadron-shower leakages out of the CAL and to correct jet-energy measurements, the BAC was equipped with an analog readout, giving precise information on the deposited energy but only approximate information on the deposit position. To enable muon tracking in the iron yoke, a complementary digital readout was designed, giving basically no information about the deposited energy, but exact position in two dimensions. This information was used for better positioning of shower leakages and for discrimination between leaking hadron cascades and muons. To identify muons, the forward muon detector (FMUON) was located in front of the magnet yoke and the barrel and rear muon detectors (BMUON, RMUON) [160] inside and outside the iron yoke. Note that one of techniques used at HERA to measure the production of charmed and beauty hadrons is to identify their decays into muons (see Section 4).
The luminosity was measured at ZEUS using the bremsstrahlung reaction ep → e γp by a lead-scintillator calorimeter (PCAL) [161], located at Z = −107 m, and (after the HERA upgrade) an independent magnetic spectrometer (SPEC) [162], located at Z = −104 m. The best achieved relative uncertainty on the measured luminosity was 1.8%.
To reduce the event rate from the highest collision rate ≈ 10 MHz to technically acceptable ≈ 10 Hz, a three-level trigger system was used at ZEUS. The First Level Trigger (FLT) [163,164] consisted of hardware trigger systems in individual sub-detectors, which sent the information to the Global First Level Trigger (GFLT) to perform the decision. Events that passed the GFLT were processed further by the Second Level Trigger (SLT), based on software triggers, which used information on charged-particle tracks, the interaction vertex, calorimeter timing and global energy sums [165]. Events that passed the SLT were processed by the Third Level Trigger (TLT) [166], which took the decision based on the global information from an event. Finally, events that passed the TLT were written to tape to be fully reconstructed offline.

Overview of existing measurements of charm production at HERA
This Section describes tagging techniques used to measure open 11 charm production at HERA and gives an overview of the measurements in deep inelastic scattering (DIS) done by the H1 and ZEUS Collaborations. This overview is restricted to those measurements which are later (see Section 6) used for their combination; the measuremerents are listed in Table 4. 1. A detailed description of event reconstruction and inclusive DIS selection can be found in next Section 5, where the ZEUS measurement of D + production [4] is outlined, whereas the present Section merely discusses techniques used to identify charm production.

Reconstruction of D * + mesons in the "golden"
decay channel D * + mesons are reconstructed in the "golden" decay channel D * + → D 0 π + s and subsequently D 0 → K − π + . The π + s denotes a "slow" pion with a low momentum in the D * + centre-of-mass frame, since the mass of D * + is only slightly above the sum of the masses of D 0 and π + . This results in a narrow peak for the mass difference ∆M = M(K − π + π + s ) − M(K − π + ) near the threshold, accompanied with a not too large combinatorial background and hence the best signal-to-background ratio. The smallness of the mass difference in a charmed hadron decay with emission of a low momentum pion was first observed in a bubble chamber event at BNL [173]. It was proposed for charmed mesons in [174] and widely used in various experiments (e.g. [175,176,36,168,177]). The main shortcoming is that in practice D * + mesons can be measured in the limited kinematic space p T (D * + ) 1.25 GeV only, otherwise the transverse momentum of the slow pion is too small and its track cannot be reconstructed. Another limitation comes from the fact that all decay products have to be reconstructed in the tracking system, thus the production of D * + mesons can be measured in the central region only, typically |η(D * + )| 1. 8. Also, the product of the branching ratios for the decay channels D * + → D 0 π + s and D 0 → K − π + is about 3% only [8]. However still the most precise measurements of open charm production at HERA were obtained using this technique.
Distributions of the reconstructed mass difference ∆M for the most precise H1 and ZEUS HERA-II measurements [172,77] are shown in Fig. 4. 1. Note that these measurements are performed in slightly different ranges of p T (D * + ) and η(D * + ), therefore the ZEUS measurement has a better signal-to-background ratio and narrower peak at the cost of two times smaller statistics. Both experiments performed a subtraction of the background using the wrong-sign combinations, obtained by forming "D 0 candidates" by combining two tracks with the same sign. 11 When the measured final state contains a charmed hadron.
The measured cross sections of D * + production as a function of Q 2 , y, x, p T (D * + ), η(D * + ) and z(D * + ) = (E(D * + ) − p Z (D * + ))/(2E e y), with E e being the incoming electron energy, E(D * + ) and p Z (D * + ) the energy and longitudinal momentum of D * + , respectively, are shown in Fig. 4.2 and compared to the NLO predictions, obtained in the ZM-VFNS and FFNS (see Sections 2.2 and 2.4). The dominant experimental uncertainty is the systematic uncertainty on the tracking efficiency (≈ 4%); in most of the bins the statistical uncertainty is smaller than the total systematical one. The FFNS predictions describe the data reasonably well within uncertainties, with a possible exception for the shape of the z(D * + ) distribution. The ZM-VFNS predictions describe the data significantly less well; in particular, they fail to describe the shape of p T (D * + ), y and x distributions.

Reconstruction of weakly decaying D mesons
The exploitation of the long lifetime of weakly decaying charmed hadrons allows their identification. All final decay products must be charged particles reconstructed in the tracking system. Examples of such decay channels are D + → K − π + π + and D 0 → K − π + . Large combinatorial background can be significantly suppressed by applying a cut on lifetime information (e.g. track impact parameters or decay-length significance), although since the background rises steeply towards lower values of p T (D), a lower cut on p T (D) has to be applied; a cut on transverse momentum also improves the efficiency of the lifetime information. It should be noted that there are limitations of this technique, which are similar to those of the previous one: a measurement can be performed only in a fiducial transverse-momentum and pseudorapidity phase-space region and the branching ratios are small.
ZEUS measured the production of D 0 [170] and D + [4] mesons using the weak decays D 0 → K − π + and D + → K − π + π + , respectively. The measurement of D 0 production was based on the 134 pb −1 of data from 2005 only, while the measurement of D + production used the full HERA-II data of 354 pb −1 . 12 Both measurements were performed in the phase-space region p T (D + , D 0 ) > 1.5 GeV, |η(D + , D 0 )| < 1.6, 5 < Q 2 < 1000 GeV 2 , 0.02 < y < 0. 7. Lifetime information was used to reduce combinatorial background substantially, applying a cut on the decay-length significance of the secondary vertex. This technique benefits from the MVD tracking and vertexing, which is not feasible using the HERA-I data. The measurement of D + production is one of the important results further described in detail in Section 5, where also an example of an event with a selected D + candidate can be seen in Fig. 5.4.
Distributions of the discriminating variables are shown in Fig. 4. 3. Contributions from charm and beauty production are separated from light flavours and from each other by using a global template fit to the Monte Carlo (MC) expectation. The measured muon differential cross sections as a function of p T (µ), η(µ), Q 2 and x are shown in Fig. 4.4 and compared to the NLO predictions, obtained in the FFNS, and RAPGAP MC, normalised according to the result of the global fit. The NLO FFNS predictions describe the data well. The RAPGAP MC gives a good description of the shape of all the differential cross sections. Since MC was normalised according to the result of the fit, this can be considered as a verification of the validity of the fit results.

Fully inclusive analyses based on lifetime information
Events with charmed particles are identified by reconstruction of displaced secondary vertices based on the lifetime information. The measurement results benefit from the larger phase-space coverage and largest statistics, since they are not limited by any particular branching ratio, although the signal-to-background ratio is usually worst.
H1 measured inclusive charm and beauty cross sections using variables reconstructed by the vertex detector, including the impact parameter of tracks to the primary vertex and the position of the secondary vertex [167]. The measurement was based on the 189 pb    [77]. The data are compared to NLO predictions obtained in the ZM-VFNS and FFNS (HVQDIS). In the lower part of the figures the normalised ratio, R norm , of theory to data is shown, defined in Eq. 3 of [77], which has reduced normalisation uncertainties. measurement was 5 < Q 2 < 2000 GeV 2 and 0.0002 < x < 0.05. Similar to the technique used for measurements with semileptonic decays, described in Section 4.3, this measurement was based on the discrimination of charm and beauty contributions, performed with a neural network, using long-lifetime discriminating variables. Fig. 4.5 shows the distributions of the discriminating variables, used as input for the neural network. The measured quantities were the charm and beauty reduced cross sections as a function of Q 2 and x in the full p T and η range. The measurement [167] was then combined with previous H1 measurements [178,179] based on HERA-I data.
ZEUS measured the production of charm and beauty with at least one jet using the invariant mass of the charged tracks associated with secondary vertices and the decay-length significance of these vertices [30]. The measurement was based on the full HERA-II data of 354 pb −1 . The kinematic phase space of the charm measurement was E jet T > 4.2 GeV, −1.6 < η jet < 2.2, 5 < Q 2 < 1000 GeV 2 and 0.02 < y < 0.7, where E jet T is the transverse energy of the jet. Contributions from charm and beauty production were separated from light flavours and from each other by using a global template fit to the MC expectation.  samples were normalised according to the scaling factors obtained from the fit. A good agreement between data and MC is observed. The first two mass bins corresponding to the region 1 < m vtx < 2 GeV are dominated by charm events. In the third mass bin, 2 < m vtx < 6 GeV, beauty events are dominant at high values of the decay-length significance. The measured differential cross sections for inclusive jet production in charm events as a function of E jet T , η jet , Q 2 and x are shown in Fig. 4.7 and compared to the NLO predictions obtained in the FFNS with different proton PDFs and to the predictions from the RAPGAP MC, scaled to the ratio of the measured visible cross section to the RAPGAP prediction. All measured cross sections are better described by the NLO FFNS, while RAPGAP provides a worse description of the shape of the charm cross sections than the NLO FFNS calculations. The data are typically 20-30% above the NLO predictions, but in agreement within uncertainties. The differences between the NLO predictions using different proton PDFs are mostly very small.

Concluding remarks
Different tagging techniques have been used to measure open charm production in DIS at HERA. The most precise results were obtained in measurements of D * + production using the "golden" decay channel. In all cases (except for the H1 vertex measurement [167]) the measured quantities were visible cross sections in a limited p T (E jet T ) and η phase-space region. The largest phase-space coverage was obtained in fully inclusive analyses based on lifetime information. Techniques based on the usage of semi-leptonic decays and fully inclusive analyses are often used for a simultaneous measurement of charm and beauty production, while in measurements using the full reconstruction of D mesons usually the sum of heavy-flavoured hadron production of charm and beauty processes are measured (dominated by charm). Techniques that rely on the usage of lifetime information require precise tracking and vertexing, thus can be fully exploited only with the data taken with the silicon detectors near the beampipe.
Results of precise charm measurements, which provide a double-differential cross section, are used to extract the inclusive cross section, i.e. the charm structure function F cc 2 or the reduced cross section σ cc red . The extraction is based on the extrapolation procedure, which used the shape of theoretical predictions, thus measurements with larger transverse-momentum and pseudorapidity phase-space coverage are preferable (more details on the extrapolation procedure are provided in Section 6.2.3 in the context of charm data combination).    All measured cross sections were compared to the theoretical predictions obtained in different schemes. The NLO FFNS predictions provide a good description of the data within uncertainties in all cases, while the NLO ZM-VFNS predictions do not describe the shape of some kinematic variables well. A direct comparison of measured visible cross sections to the GM-VFNS predictions is not possible, since the GM-VFNS calculations were done only for inclusive cross sections. Comparisons to MC predictions did not aim to check theory, since in these cases MC simulations were LO and parton showers, re-normalised to the data (more details on the technique of MC simulations are provided in Section 5.3). These comparisons mainly aimed at justifying the validity of the template fit procedure or the acceptance corrections, which exploited MC.
In general, measurements performed using different methods are complementary to each other and thus can be combined to achieve the best precision. This is presented in Section 6. 5. Such combination requires an extrapolation of the visible cross sections to the full phase space using the shape of some theor-etical calculations. Since the FFNS predictions are consistent with the data in all kinematic regions (including high Q 2 ), the NLO FFNS is considered to be the best and presently the only practically available theoretical calculation for this extrapolation.

Measurement of D + production
Among the charm-tagging techniques, described in Section 4, measurements of D + -meson production are based on the full reconstruction of final-state charmed hadrons and crucially depend on the precise tracking and vertexing near the beampipe. This Section describes the measurement of D + -meson production with the ZEUS detector, using the full HERA-II data set. Combinatorial background can be significantly suppressed by applying a cut on lifetime information. The earlier ZEUS measurement of D + production [170] was performed using 134 pb −1 of data from 2005. It has demonstrated the high potential of this tagging method, although was not really competitive in precision to other ZEUS charm measurements, e.g. [169]. The present measurement is based on about 2.5 times larger data sample and improved tracking alignment. The results were published by the ZEUS Collaboration [4]. The H1 measurement of D + -meson production [180] is based on the HERA-I data.
Section 5.1 explains general aspects of the event reconstruction in ZEUS, relevant for the present analysis; it is partially based on [181]. Section 5.2 describes the selection of DIS events. Section 5.3 introduces the technique of MC simulations and provides information on MC samples, used in the analysis. Section 5.4 describes the reconstruction and selection of D + candidates, while Section 5.5 explains the procedure of attributing these candidates to D + and background. Section 5.6 describes the cross-section determination procedure and corrections applied. Details of the theoretical calculations are given in Section 5. 7. Finally, results are reported and summarised in Section 5.8.

Event reconstruction
Each single ep collision is referred to as an event. Events selected by the Third Level Trigger (TLT) were written to tape as raw data in the form of signals from all sub-detectors (see Section 3.2.2 for the description of the ZEUS detector). These data were used offline to reconstruct general characteristics of events which correspond to signatures of physical objects (particles, jets etc. ). Subsequently, the reconstruction of tracks (Section 5.1.1), vertices (Section 5. 1.2), hadronic final-state system (Section 5. 1. 3) and DIS kinematic variables (Section 5. 1.4) is briefly described.

Tracking
A charged particle is identified through the trajectory it leaves in the detector. This trajectory is referred to as a track. It depends not only on the inhomogeneous magnetic field, but also on energy loss and multiple scattering in the material; thus the reconstruction of tracks is a complicated task. The approach adopted in ZEUS made use of the Kalman filter [182] and ensured a rigorous treatment of all factors which affect particle trajectories. For the most precise reconstruction of tracks the information from the CTD and MVD was used.
The Kalman filter algorithm [182] is an iterative procedure for the reconstruction of tracks from the measured hits. It reconstructs tracks from the outermost point of the tracking system to the origin. Unlike other global methods which fit all the measurements to a single set of track parameters, the Kalman filter causes the track to "follow the measurements" through the detector [183]. A detailed description of the procedure can be found in [182] and an extended review of its properties and advantages can be found in [183].
Tracks were reconstructed in two stages: -pattern recognition. The first stage was performed in multiple steps by the VCTRACK package. It started from the outermost tracking detector layer, which was the 9th CTD superlayer for the central region, where the track density was lower than close to the interaction point. Combinations of three CTD hits from axial CTD superlayers formed the tracking seeds. A track seed was extrapolated inward, gathering additional hits with increasing precision as the trajectory parameters were updated. A very broad "fictitious" hit was added at the beam line to guide the trajectory. After a "road" of hits from the CTD through the MVD to the interaction point has been created, a least-squares fit of the track was performed using the selected hits on the road in order to determine the helix parameters at the beginning of the helix. In general the tracking reconstruction was not restricted to tracks with hits in all tracking devices; the socalled CTD-only and MVD-only tracks have hits in only one sub-detector; -trajectory refinement. A track fit was performed with the Kalman filter to improve the precision of the helix parameters in the vicinity of the interaction point. As input it took the fit output from the pattern recognition stage. The track fit was applied recursively in three steps: prediction, filtering and smoothing. At the prediction step, the present state i hits (i.e. hits that have already been used for the trajectory estimation) was used to predict the position of the next (i + 1) th hit on the next detector sensor (which could be a CTD wire or an MVD sensor). At the following filtering step the predicted and the measured values for the (i + 1) th hit positions were combined. At the last step a smoothing of the whole trajectory was performed and the covariance matrix was updated.

Vertexing
A vertex is the position where an interaction or decay happened. The evaluation of vertices serves two purposes [184]. The first is to evaluate the position of the primary ep interaction point and to calculate the appropriate track momenta at that point with improved precision due to the vertex constraint. The second purpose of using vertices is to estimate the probability that the tracks originate from a certain vertex. This probability might be estimated from the vertex fit quality (e.g. the χ 2 of the vertex fit) and used for the event selection. The essential information that is used in the fit consists of track parameters and their covariance matrices.
Proper identification of both the primary point of interaction and the D + decay vertex in an event was of particular importance for this analysis. Their position was reconstructed first with the VCTRACK package and further refinement was applied later [181].
The vertex pattern recognition started with a loose constraint that the primary vertex must be found along the proton-beam line. Track pairs, that were compatible with this soft constraint as well as with a common vertex, were combined with other track pairs. The final choice of the primary-vertex position after the pattern recognition stage was the vertex with the best overall χ 2 . To improve the precision of the vertex-position measurement, the Deterministic Annealing Filter (DAF) [185] was used. The main feature of the DAF algorithm is that tracks with the best quality get the largest weight in the fit, while tracks that are far from the vertex get the smallest weight in the fit [184]. In the chosen approach the vertex position was measured iteratively by calculating a weighted sum of the χ 2 contributions from individual tracks to the vertex [181].
For the primary-vertex fit, an important further improvement in precision was possible by the introduction of a constraint on the vertex position to be close to the averaged interaction point, the beamspot. The beamspot was defined as the overlap region of the colliding beams. It had a width of roughly 80 × 20 µm in the XY plane, but it was too large in the Z direction to use this information as a constraint [181].
In the case of secondary vertices, e.g. the D + -decay vertex, the fit was made with the same algorithm skipping the step of the pattern recognition, since the combination of tracks was chosen based on its compatibility with the D + mass. For each secondary vertex, the corresponding reduced primary vertex was recalculated removing the secondary-vertex tracks and repeating the standard primary-vertex fit [181].

Hadronic final states
To get the most precise hadron energy measurement, information from the calorimeter and the tracking detectors was combined into the so-called ZEUS Unidentified Flying Objects (ZUFOs) [186] 13 Ideally each ZUFO was supposed to represent one final-state particle. The energy resolution of the CAL developed for higher particle energies as σ(E)/E ∼ 1/E, while the tracking momentum resolution, parametrised by σ(p T )/p T = ap T ⊕ b ⊕ c/p T , gave a better energy estimate for lower particle momenta (see Section 3.2.2). For neutral particles, only CAL information could be used, whereas for charged particles the tracking information was mainly used below 10 GeV while calorimeter energy was used for higher energies.
ZUFOs were constructed in the following steps: -CAL cells were clustered into two-dimensional cell islands; the cell islands from the previous stage were used as input to clustering in (θ, φ) space to form three-dimensional energy clusters called cone islands; tracks, that have been fitted to a vertex and passed certain requirements, were extrapolated to the surface of the CAL taking into account the magnetic field; as a result of this procedure, groups of cone islands and tracks -ZUFOswere formed; the combination of the information from the CAL and the tracking system was carried out in the following way: • if one track has been matched to one cone island, the ZUFO energy was taken either from the CAL cluster or from the matched track momentum, depending on which measurement had better resolution; • for tracks that have not been associated to islands, the energy was derived from the momentum measurement with the assumption that the particle was a charged pion; • cone islands that have not been matched to any track were treated as neutral particles and the CAL energy was used; • cone islands with more than three associated tracks were treated as jets and the energy was taken from the CAL; • if a track has been matched to multiple islands or two tracks have been matched to one or two islands, the algorithm was similar to the one-to-one matching, but using the sum of energies or momenta instead.
Additional corrections were applied to account for the material of the detector, the inefficiency in the regions of cracks between the CAL sections, the presence of muons 14 and the imbalance in the compensation effect for low momentum (∼ 1 GeV) hadrons. In this analysis the reconstructed ZUFOs have been used to determine the kinematics of the hadronic system as well as DIS kinematic variables (see Section 5.1.4).

Scattered-electron identification and reconstruction of kinematic variables
The identification of the scattered electron is essential for the NC DIS event selection. The scattered electron leaves a signature in the detector which differentiates the NC DIS events from the CC DIS, where the neutrino escapes undetected, and photoproduction (PHP), where the scattered electron escapes through the beam hole. There have been two main electron finders developed in ZEUS: the neural-network-based SINISTRA95 (also referred to just as SINISTRA) [187] and the probabilistic EM [188]. The former was tuned for the kinematic region of the D + measurement, whereas the latter was better for the high-Q 2 region, where the electron was reconstructed in the BCAL.
A scattered electron passing through the CAL created an electromagnetic shower, therefore most of its energy was measured in the EMC with a small leakage in the HAC. SINISTRA started from the search of the cells with maximum energy deposits to form candidate clusters. These clusters were formed using the next-to-nearest neighbour algorithm on CAL towers to produce islands and then merging the islands from different CAL sections. This information was passed to the neural network, which had been trained using MC simulated hadronic and electromagnetic clusters in the RCAL. As an output SIN-ISTRA returned a number between 0 and 1, which represents the probability of the cluster to be the scattered electron. In the following only the candidate with the highest probability was considered. The identified electron was assigned the energy of the reconstructed CAL cluster.
After the reconstruction of the scattered electron and the hadronic system in an event, the kinematic variables Q 2 , x and y, introduced in Section 2. 4.1, can be calculated. There were several methods: -the electron method used only the electron energy and scattering angle. The kinematic variables were calculated as follows: x el = Q 2 el sy el , where E e is the incoming electron energy (which is known a priori), E e and θ e are the scattered-electron energy and angle, respectively, and s is the centre-of-mass energy squared. This method relies strongly on the measurement of the electron energy and position. Because of the characteristics of the ZEUS detector it is more precise in the rear region, therefore it is optimal at low Q 2 . In addition this method is strongly affected by initial-and final-state photon radiation, which spoils the measurement of the lepton energy and leads to deterioration of the results; -the Jacquet-Blondel method (JB) relied exclusively on the reconstruction of the hadronic final state [189]. The kinematic variables were calculated as follows: where P T had and δ had are given by where (E i had , P i x had , P i y had , P i z had ) is the four-momentum of each hadron final state and the sum goes over all hadronic energy, excluding the scattered electron, if any. The advantage of this method is that it does not require the scattered electron to be detected and thus can be used in PHP or CC events, although it has poor Q 2 resolution in DIS events; -The double-angle method (DA) combined information from the scattered electron and the hadronic system [190,191]. The kinematic variables were calculated as follows: cot(θ e /2) tan(θ e /2) + tan(θ had /2) , y DA = tan(θ had /2) tan(θ e /2) + tan(θ had /2) , 4) where θ had is the hadronic angle, defined as This method exploits the fact that the angular resolution for the hadronic system is usually better than the angular resolution for the scattered electron, while for energy it is vice versa. Thus the DA method leads to a more precise measurement of the kinematic variables in a large part of the phase space and was chosen as the main one for the present analysis [181].

DIS event selection
The analysis used the full HERA-II data with an integrated luminosity 354 pb −1 . Both electron-proton and positron-proton events were used, because the charm NC DIS cross sections at not too high Q 2 are invariant with respect to the lepton charge.
The DIS kinematic region of the measurement was restricted to 5 < Q 2 < 1000 GeV 2 and 0.02 < y < 0.7, where reliable reconstruction of the scattered electron was possible with the ZEUS detector after the HERA-II high-luminosity upgrade [131].
The selected events had to be triggered online by one of the inclusive DIS Third Level Trigger slots (see [181] for the description of these slots): -SPP02 for the 2004-2005 data period, or -SPP09, or HFL17, or HPP31 for the 2006-2007 data period.
Furthermore to ensure selection of good DIS events, the following cuts were applied offline: DA < 1000 GeV 2 , 0.02 < y DA < 0. 7. These criteria selected the considered DIS phase-space region; -E e > 10 GeV. The requirement ensured high efficiency of SINISTRA and rejected possible background PHP events with "fake" scattered electrons; -E cone non e < 5 GeV, where E cone non e is the energy deposit in the CAL in the cone centered around the scattered electron with a radius of 0.8 in the (η, φ) plane, not originating from it. This cut is known as the electron isolation and was supposed to improve further the quality of the scattered-electron reconstruction; prob SINISTRA > 0.9, where prob SINISTRA is the output of the SINISTRA neural network. 15 This selection further ensured high efficiency in SINISTRA; -y JB > 0.02. This requirement rejected events with the poorly reconstructed hadronic system, for which the DA method was not precise; -40 < δ had < 65 GeV. The lower cut reduced the PHP contamination (when the scattered electron was not detected) and the upper cut rejected events initiated by cosmic-ray particles; 16 -−30 < Z vtx < 30 cm, where Z vtx is the Z coordinate of the primary vertex. This requirement rejected events initiated by beam-gas and satellite-bunch interactions; a set of cuts on the geometric position of the scattered electron in the CAL (x e , y e , z e ), to remove events, in which the scattered electron passed through the regions of the CAL poorly simulated in Monte Carlo; note that these cuts are quoted as exclusion cuts, i.e. the events were removed if they satisfied any of the criteria: • |x e | < 13 cm and |y e | < 13 cm. This requirement is known as the box cut and removed the edges of the CAL; • x 2 e + y 2 e > 175 cm. This cut rejected the region between the RCAL and BCAL; • −104 < z e < −98.5 cm or 164 < z e < 174 cm. This requirement is known as the super-crack cut and removed the regions of cracks between the RCAL, BCAL and FCAL; • 6.5 < x e < 12 cm and y e > 0, or −14 < x e < −8.5 cm and y e < 0. This requirement is known as the modulegap cut and removed the region of gaps between halves of the RCAL; • |x e | < 12 cm and y e > 80 cm. This requirement is known as the chimney cut and removed the region of the RCAL where cooling tubes and supply cables for the solenoid were mounted; • in addition, for a subset of the data with the run ranges 59600-60780, 61350-61580, 61800-63000 the region 11 < x e < 27 cm and 10.5 < y e < 27 cm was removed, which was not described by the Monte Carlo simulations.

Monte Carlo simulations
The complexity of the HERA experiments makes it necessary to apply Monte Carlo (MC) methodes for their evaluation. The two tasks are: the descrption of all relevant physics processes with their complete final state using available MC generators, and the simulation of the detector response, i.e. the account of the effects as the final state particles pass through the various detector components.

Simulation of physics processes
In the generation of MC events the QCD factorisation theorem [107,108,9,109,110,111] is exploited to separate shortand long-distance effects. Fig. 5.1 illustrates the case for different phases in the boson-gluon fusion process (see also Fig. 2.4 for the corresponding diagram): -the hard scattering process is usually calculated at LO; -radiation corrections (referred to also as parton showers) are modelled using some phenomenological models. The difference between the fixed-order NLO calculation and LO accompanied by parton showers is that the latter is better reproducing the whole final state (the event shape), which is important for the correct simulation of the detector response, while the former gives a better description of inclusive quantities; 17 -hadronisation is the non-perturbative QCD process of the formation of colourless hadrons from coloured partons. It is performed by using some phenomenological models; -decays of unstable particles are simulated accordingly to available decay tables. 18 Examples of event generators commonly used in ZEUS are PYTHIA [192], ARIADNE [193], RAPGAP [194] etc.

Simulation of detector response
After the simulation of underlying physics processes, final-state particles are passed through a simulated detector. Simulation of the ZEUS detector was performed in the MOZART program, which is based on GEANT 3.21 [196]. Furthermore, generated events were passed through the simulated ZEUS trigger system and the reconstruction program ZEPHYR. More details on the ZEUS MC production system can be found in [181]. Finally, MC events were written to tape as regular data and processed by the same reconstruction and selection algorithms, although they contain additional information on generated particles, referred to as generated, or true information. However, the procedure of matching between generated particles and reconstructed ones has some complications (see Section 5. 6.3).

MC samples
In the present analysis the following MC samples have been used: the RAPGAP charm DIS MC sample, ep → e ccX, was the main sample used to determine acceptance corrections. MC events were simulated with the RAPGAP 3.00 [194] program, interfaced with HERACLES 4. 6.1 [197] to incorporate first-order electroweak corrections. The CTEQ5L [198] PDFs were used for the proton; the RAPGAP beauty DIS MC sample, ep → e bbX, similar to the previous one, was used to estimate the contribution to D + production from decays of beauty hadrons; the RAPGAP charm DIS MC sample without QED radiation, ep → e ccX, was used to correct the measured cross sections to the QED Born level; the ARIADNE inclusive MC sample, ep → e X, was used for simulation of combinatorial background and optimisation of selection cuts; the PYTHIA PHP charm MC sample, ep → e ccX, was used to estimate the contribution from PHP events.

Reconstruction and selection of D + candidates
The D + mesons were reconstructed in the decay channel D + → K − π + π + . The full final-state particle reconstruction consists in making combinations of all tracks with proper charges, if possible followed by the reconstruction of the secondary vertex (the place where the decay happened) and re-fitting of the considered tracks to this vertex, thus improving their reconstruction. The invariant mass, M(Kππ), is calculated using the energy and momentum conservation rules, and the masses of the daughter particles. If the reconstructed invariant mass is found to be close to the mass of the hadron under consideration, the combination is considered as a candidate. The tracks from the selected combinations are referred to as daughter tracks.
Inherently such a method leads to a large combinatorial background, which is due to combinations of tracks not originating from the analysed hadron decay channel or from wrongly combined daughter tracks. In order to suppress this background, additional cuts on the parameters of the daughter tracks and the quality of the secondary-vertex reconstruction have been applied.
The measurement was performed in the D + phase-space region 1.5 < p T (D + ) < 15 GeV, |η(D + )| < 1. 6. At lower values of p T (D + ) the combinatorial background increases drastically, making the measurement impossible, while at higher values of p T (D + ) the production cross section becomes too small to be measured with the available integrated luminosity. The η(D + ) range is determined by the coverage of the tracking system, since all daughter tracks have to be detected and well reconstructed.

Selection of secondary vertices
The large lifetime of D + mesons, cτ(D + ) = 311.8±2.1 µm [199], makes it possible to reconstruct their secondary vertices with the Microvertex Detector (MVD). Important characteristics of the reconstructed secondary vertices (Fig. 5

.2) include:
χ 2 of the secondary-vertex fit, χ 2 sec vtx ; the decay length, defined as the distance between the primary and secondary vertices; the uncertainty on the decay length; the collinearity of the directions from the primary to the secondary vertex and the D + momentum. The most efficient way of using the lifetime information is to combine the last three quantities into the projected decaylength significance (for simplicity referred to as just the decaylength significance), S l , defined as the ratio of the decay length, projected on the XY plane and on the D + momentum, to the uncertainty on this quantity: where l XY is the projected decay length, defined as and σ l XY is the uncertainty on l XY . Here P XY and S XY are the vectors pointing to the primary and secondary vertices, respectively, and the · sign denotes a scalar product. The projection on the XY plane was used because the resolution of the vertex position was most precise in the transverse plane. The optimal cuts on S l and χ 2 sec vtx were determined by maximising the statistical significance of the mass peak, S P , defined as the ratio of the signal 19 to its statistical uncertainty, assuming a Poisson distribution: 8) where S is the number of candidates in the signal peak and Bg is the number of candidates in the background, where the region of the signal peak is defined within three standard deviations. The study was performed on the inclusive ARIADNE MC sample.
The dependence of S P on a lower cut on S l and an upper cut on χ 2 sec vtx is shown in Fig. 5. 3. The optimal cuts are: -S l > 4, -χ 2 sec vtx < 10.

Selection of D + candidates
To ensure the selection of well reconstructed D + candidates and to improve the signal-to-background ratio, the following cuts were applied: sec vtx < 10, to reduce the combinatorial background, as explained in Section 5.4.1; l XY < 1.5 cm, to ensure that selected secondary vertices were inside the beampipe, thus did not originate from interactions with the beampipe or detector material; p T (K) > 0.5 GeV, p T (π) > 0. 35 GeV, to further reduce combinatorial background while still keeping the detector acceptance at a reasonable level at low p T (D + ); -|η(K, π)| < 1.75, to ensure the selection of well reconstructed daughter tracks; each track should have at least two MVD hits in both the Z and φ directions and pass through at least three CTD superlayers, to improve further the quality of the daughter tracks; the mass difference ∆M = M(Kππ) − M(Kπ) should not be within 0.143 < ∆M < 0.148 GeV, which is the difference between the D * + and D 0 masses, to reduce background from D * + mesons decaying in the "golden" channel D * + → D 0 π + s , D 0 → K − π + (see Section 4. 1), which result in identical final states; the invariant mass of a combination of the kaon and any of two pion daughter tracks assuming that they are kaons, M(KK), should not be within 1.0115 < M(KK) < 1.0275 GeV. This cut reduced background from D + s mesons decaying in the channel D + S → φπ + with subsequent φ → K − K + , which result in similar final states with an asymmetric mass peak (a so-called reflection).
An example of an event with a selected D + candidate, displayed in the ZEUS Event Display program, is shown in Fig. 5.4.

Extraction of D + signal
selected without the cuts on the decay-length significance and χ 2 of the secondary vertex is also shown. The cuts applied on S l and χ 2 sec vtx improved the statistical significance by a factor of 3 (a similar conclusion can also be drawn from Fig. 5. 3).
To extract the number of reconstructed D + mesons, the mass distribution was fitted to a function where the signal component, F signal (M), is given by a modified Gaussian function: 10) and the background component, F background (M), is given by a second-order polynomial. The signal position, M 0 , the peak width, σ M , as well as the signal normalisation parameter, C, and parameters of the background component were free parameters in the fit. The parameter β of the modified Gaussian function controls the deviation of its tails from the normal distribution (β = 0); the central value β = 0.5 was chosen to get the best description of the peak, while it has been varied in order to estimate the systematic uncertainty (see Section 5.6.6). The fit was performed using the least-squares method as implemented in the MINUIT package [200]. As the expectation values in the χ 2 -function, the integrals of the fit function within each bin of M were used. To account for possible non-linearities, the fit uncertainty was calculated as the average of the positive and negative fit uncertainty, obtained with the MINOS algorithm [201].
The number of D + mesons yielded by the fit is N(D + ) = 8356 ± 198. The fitted position of the peak is M 0 = 1868.97 ± 0.26 MeV, where only the statistical uncertainty is quoted, and is consistent with the PDG value of 1869.62 ± 0.15 MeV [199]. The peak width is σ = 12.2±0. 3 MeV, driven by the momentum resolution of the detector.  5. Mass distribution of the reconstructed D + candidates after final selection (left), and without the cuts on the decay-length significance and χ 2 of the secondary vertex (right). The solid curve represents a fit to the sum of a modified Gaussian for the signal and a second-order polynomial for the background.

Cross-section determination
The differential cross section as a function of a given observable Y in the i th bin was determined as where ∆Y i is the width of the i th bin, N DATA is the number of the reconstructed D + mesons in the data and N reco MC b is the number of D + mesons from beauty-hadron decays, as predicted by RAPGAP. The latter was additionally scaled by 1.6 according to previous ZEUS measurements [202,203,204] of beauty production in DIS. The determination of the cross section accounts for the branching ratio, B(D + → K − π + π + ) = 9.13 ± 0.19% [199], the acceptance correction, A, the radiative correction, C rad , and the contribution from beauty-hadron decays. The radiative corrections were applied to correct the measured cross sections corresponding to the QED Born level; more details on their calculations can be found in [181]. The determination of the acceptance corrections is described in Section 5.6.1.

Acceptance correction
MC simulations were used to determine efficiency, E, purity, P, and acceptance A. For the i th bin these quantities are defined as 12) where N gen i and N rec i are the numbers of the signal events, generated and reconstructed in the i th bin, respectively. The notation N gen i N rec i in the numerators means that events must be generated and reconstructed in the same bin. Therefore, the efficiency is the portion of events generated in a given bin, that were also reconstructed in the same bin; it determines the dependence of the measurement on the MC simulations. The purity is the portion of events reconstructed in a given bin, that were also generated in the same bin; it determines the level of migrations of events to different bins. The purity plots for p T (D + ), η(D + ), Q 2 and y are provided in Appendix A; the purity values are typically above 80%. Finally, the acceptance determines the correction from detector to generator level required to calculate the cross section.

Comparison of data and MC
To get the correct acceptance, the MC simulations must describe the shapes of all kinematic variables in the data. The acceptance determined from the MC and integrated over some variable, x, is given by: 13) where the integration is performed over the full range of the variable x, A(x) is the acceptance at a fixed value of x and dσ dx is the differential cross section as a function of x, used in the MC simulations. The correct detector simulation guarantees the correct value of A(x), although dσ dx is the generator-level cross section, thus even for correct A(x) at all x, incorrect dσ dx will lead to an incorrect total acceptance A.
Therefore the differential distributions of kinematic variables from the MC simulations and from the data were compared to each other; these comparison plots are referred to as control plots. Since the MC simulations usually describe the shapes of kinematic distributions, but not the normalisation, and moreover acceptance does not depend on the MC normalisation, the MC distributions are re-normalised to the data. To estimate the goodness of the description, for each control plot the χ 2 /n dof were calculated as follows: 14) where the sum goes over all bins, N DATA i and N MC i are the number of signal events in the i th bin in the data and MC, respectively, σ DATA i and σ MC i are the corresponding statistical uncertainties on N DATA i and N MC i , respectively, and n dof is the number of bins minus one 20 . Fig. 5.6 shows the control plots for p T (D + ), η(D + ), Q 2 and y. The data are compared to the sum of charm and beauty MC; the beauty contribution was scaled by 1.6 [202, 203, 204], while the charm contribution was re-normalised to the difference between the data and re-scaled MC beauty. 21 The beauty contribution is shown separately; typically it is below 5%. More control plots can be found in Appendix A (Fig. A.3). The MC does not describe well the shapes of p T (D + ), η(D + ) and Q 2 , thus the generator-level MC cross sections had to be reweighted. 22 The reweighting procedure is described later in this Section.  ). The data are shown as points, with bars representing the statistical uncertainty. The sum of charm and beauty MC is shown as the light shaded area; the beauty contribution is shown separately as the dark shaded area.
Fig. 5.7 shows control plots for S l and χ 2 sec vtx , obtained before applying the cuts on these quantities. MC simulations describe these distributions well. This fact is of crucial importance, because the detector acceptance strongly depends on the cuts applied on S l and χ 2 sec vtx , so that an incorrect simulation of their shape would lead to large systematic uncertainties; this was the dominant systematic uncertainty in the previous analysis [170], performed with an inferior tracking alignment and calibration. 20 Because of re-normalisation of MC to the data. 21 This procedure corresponds to the measurement of charm production, when the beauty contribution is assumed to be known a priori. 22 For a single correction one should say rather "weighting", but the term "reweighting" is much more convenient and will be used in this work. Moreover, this is not a single correction applied to MC in the analysis.  ). The data are shown as points, with bars representing the statistical uncertainty. The sum of charm and beauty MC is shown as the light shaded area; the beauty contribution is shown separately as the dark shaded area.

Data to MC matching
A general rule of the MC reweighting approach is that the kinematic weights must be applied at the generator level only. This is straightforward for reweighting in inclusive event quantities, e.g. Q 2 , although it becomes complicated if the shapes of D + kinematic variables should be corrected (namely p T (D + ) and η(D + )), because: in each MC event there may be more than one generated D + mesons; the efficiency of the D + reconstruction in the present analysis is not very high (E = 1.5-15% depending on p T (D + ); see Fig. A.2 in Appendix A), thus for a large fraction of events, the reweighting of all generated D + mesons will result in a reweighting of the combinatorial background, which does not make sense and potentially may introduce an additional systematic uncertainty. These complications arise from the fact that according to the general rule weights must be applied for events, while the control plots allow their determination for candidates for D + only. If applied for events, weights are unique for both generator and reconstructed level, while if applied for candidates, the uniqueness is lost.
Therefore in the present analysis a procedure of matching between true and reconstructed D + candidates was developed. It contained two steps: 1. for each daughter track the corresponding generator-level particle was matched, if the following criteria (motivated by the resolution of the tracking system) were fulfilled: where p gen T and p rec T are the transverse momenta of the generated and reconstructed particles, respectively, and -∆R = (φ gen − φ rec ) 2 + (η gen − η rec ) 2 < 0.035, where φ gen and φ rec are the azimuthal angles of the generated and reconstructed particles, respectively, and η gen and η rec are the pseudorapidities of the generated and reconstructed particles, respectively; 2. if all daughter tracks were successfully matched to generatorlevel particles and if the generator-level particles originated from a D + meson in the considered decay channel 23 , the re-constructed D + candidate was considered to be successfully matched to the generator-level one.
The efficiency of this matching procedure was found to be very close to 100% [205]. 24 Note that the matching procedure is needed also to determine purity and efficiency (the numerators in 5. 12), although it is not needed for the accpetance determination.

MC reweighting
Since transverse momentum p T (D + ) and virtuality Q 2 are significantly correlated, reweighting in these two variables was performed simultaneously, while the cross section in pseudorapidity η(D + ) was reweighted independently. In both cases step functions determined from the control plots as the ratios of the number of signal events in the data to the number of signal events in the charm MC were used as reweighting functions.
The beauty MC contribution was subtracted from the data and reweighting was applied only to the charm MC. The reweighting functions are shown in Fig. 5. 8. For the reweighting in p T (D + ) and η(D + ) only the matched D + candidates, as explained in Section 5. 6.3, were reweighted at the reconstruction level, because the reweighting of non-reconstructed D + effectively would result in a meaningless reweighting of combinatorial background, thus producing additional statistical fluctuations. For the acceptance calculation according to Eq. 5.12, all D + were reweighted at the generator level (for the N gen i calculation). 25 The control plots for p T (D + ), η(D + ) and Q 2 after reweighting are shown in Fig. 5. 9. The reweighted MC simulations describe the data well and were used to determine acceptance corrections. The acceptance as a function of p T (D + ), η(D + ), Q 2 and y is shown in Fig. 5. 10. It is not high, mainly because of the strong cut applied on the decay-length significance in order to reduce combinatorial background, varying from 1.5% at low p T (D + ) to 15% at high p T (D + ). The same plots for purity and efficiency are provided in Appendix A (

Additional corrections
Additional corrections were applied in the MC simulations: -trigger-inefficiency correction.

Most of the First Level
Trigger bits (see Section 3.2.2) used in this analysis had some requirements on the track multiplicity in the events. The efficiency of these criteria was measured [181] using a trigger without track requirements and the detector simulation was tuned to match the data. The trigger-inefficiency 24 The efficiency of the matching procedure is defined as the ratio of the number of matched particles to the number of candidates in the fitted signal. This quantity is not to be confused with the efficiency defined in Eq. 5. 12. 25 Note that this procedure does not guarantee that the same weights have been applied at both levels (generator and reconstruction), and therefore cannot a priori guarantee consistency for the determined acceptance. In order to check it, the p T (D + )-Q 2 reweighting was performed by applying the same weight, derived from the "best" D + (with highest p T (D + )), on both levels. The difference between the two procedures was found to be less than 0.5%.
corrections for the MC simulations were between 1-10% for different tracking requirements. The corrections changed the overall efficiency of the triggers used in the analysis by a negligible amount for medium-Q 2 values and up to ∼ 2% for the low-and high-Q 2 regions. More details on this study can be found in [181]; -tracking-inefficiency correction. A special study [206] was performed to assess the tracking inefficiency for charged pions due to hadronic interactions in the detector material and how well the MC simulations reproduce these interactions. The MC simulations were found to underestimate the interaction rate by about 40% for p T < 1.5 GeV and to agree with the data for p T > 1.5 GeV; more details can be found in [206] and references therein. A corresponding correction was applied to the MC simulations. The effect of the correction on the D + -production cross section was found to be about 3%. The effect of the correction on the D + differential cross sections is provided in Appendix A ( Fig. A.4); -decay-length smearing. The S l distribution was found to be asymmetric [181] with respect to zero, with charmed mesons dominating in the positive tail. Detector resolution effects cause the negative tail, which is dominated by lightflavour events. A smearing was applied to the decay length of a small fraction of the MC events in order to reproduce the negative decay-length data. The parameters of the smearing had to be tuned to describe the data. The effect of the smearing is typically below 3%. More details on this correction can be found in [181].

Systematic uncertainties
The systematic uncertainties were determined by changing the analysis procedure or varying parameter values within their estimated uncertainties and repeating the extraction of the signals and the cross-section calculations. The following sources of systematic uncertainties were considered with the impact on the cross sections given in parentheses: the cut on the positions |x e | and |y e | of the scattered electron in the RCAL was varied by ±1 cm in both the data and the MC simulations, to account for potential imperfections of the detector simulation near the inner edge of the CAL (±1%); the reconstructed electron energy was varied by ±2% in the MC only, to account for the uncertainty in the electromagnetic energy scale (< 1%); the energy of the hadronic system was varied by ±3% in the MC only, to account for the uncertainty in the hadronic energy scale (< 1%); the FLT tracking-efficiency corrections for the MC were varied within their estimated uncertainties (< 1%) [181]; uncertainties due to the signal-extraction procedure were estimated by repeating the fit in both the data and the MC using: an exponential function for the background parametrisation (< 1%); a signal parametrisation changed by simultaneously varying the β parameter of the modified Gaussian function(see Eq. 5. 10) in the data and MC by +0 nominal value 0. 5. The range was chosen to cover the values which give the best description of the mass peaks in the data and MC simulations in bins of the differential cross sections ( +0.7% −1.5% ); uncertainties due to the decay-length smearing procedure were estimated by varying its parameters by ±50% (±1%) [181]. As a further cross check, the cut on the decaylength significance was varied between 3 and 5. The resulting variations of the cross sections were compatible with the variation of the decay-length smearing and were therefore omitted to avoid double counting; the scaling factor for the MC beauty-production cross sections was varied by ±0.6 from the nominal value 1. 6. This was done to account for the range of the RAPGAP beautyprediction normalisation factors extracted in various analyses [202,203,204] (±2%); the uncertainties due to the model dependence of the acceptance corrections were estimated by varying the shapes of the kinematic distributions in the charm MC sample in a range of good description of the data: the shape of the η(D + ) reweighting function (±2%); the shape of the p T (D + )-Q 2 reweighting function (±4%); the uncertainty of the pion track inefficiency due to nuclear interactions was evaluated by varying the correction applied to the MC by its estimated uncertainty of ±50% of its nominal size (±1.5%); the contribution from the PHP processes was estimated using the PYTHIA MC sample and found to be < 0.5%, therefore it was neglected; overall normalisation uncertainties: the simulation of the MVD hit efficiency (±0.9%) [181]; the effect of the description of χ 2 sec vtx < 10 was checked by multiplying χ 2 sec vtx for D + candidates in the MC simulations by a factor 1.1 to match the distribution in the data (+2%) [181]; the branching-ratio uncertainty (±2.1%); the measurement of the luminosity (±1.9%). The size of each systematic effect was estimated bin-bybin except for the overall normalisation uncertainties. The overall systematic uncertainty was determined by adding the above uncertainties in quadrature. The normalisation uncertainties due to the luminosity measurement and that of the branching ratio were not included in the systematic uncertainties on the differential cross sections.

Theoretical calculations
NLO QCD predictions were obtained in the FFNS with the HVQDIS program [63] (see Section 2. 4. 3).The renormalisation and factorisation scales were set to µ r = µ f = Q 2 + 4m 2 c and the c-quark pole mass to m c = 1.5 GeV. The FFNS variant of the ZEUS-S NLO QCD PDF fit [207] to inclusive DIS data was used as the parametrisation of the proton PDFs. The same charm mass and choice of scales were used in the fit as in the HVQDIS calculation. The strong coupling constant was set to α 116. To calculate D + observables, events at the parton level were interfaced with a fragmentation model based on the Kartvelishvili function [118]. The fragmentation was performed in the γ * p centre-of-mass frame. The Kartvelishvili parameter, α K , was parametrised [181] as a smooth function of the invariant mass of the cc system, M cc , to fit the measurements of the D * + fragmentation function by ZEUS [208] and H1 [122]: , with m c and M cc in GeV. In addition, the mean value of the fragmentation function was scaled down to 0.95 since kinematic considerations [209] and direct measurements [210] show that, on average, the momentum of D + mesons is 5% lower than that of D * + mesons; this is due to some of the D + mesons originating from D * + decays. For the fragmentation fraction, f (c → D + ), the value 0.2297 ± 0.0078 was used [211].
The uncertainties on the theoretical predictions were estimated as follows: the renormalisation and factorisation scales were independently varied up and down by a factor of 2; the c-quark mass was consistently changed in the PDF fits and in the HVQDIS calculations by ±0.15 GeV; the proton PDFs were varied within the total uncertainties of the ZEUS-S PDF fit; the fragmentation function was varied by changing the functional dependence of the parametrisation function α(M cc ) within uncertainties [181]; the fragmentation fraction was varied within its uncertainties.
The total theoretical uncertainty was obtained by summing in quadrature the effects of the individual variations. The dominant contributions originate from the variations of the c-quark mass and the scales. In previous studies [60] the uncertainty due to the variation of α n f =3 s (M Z ) was found to be insignificant and neglected here.

Results
The production of D + mesons in the process ep → e ccX → e D + X (i.e. not including D + mesons from beauty decays) was measured in the kinematic range: (5. 15) The differential cross sections are defined according to Eq. 5.11. The measured cross sections in bins of p T (D + ), η(D + ), Q 2 and y are listed in Table 5.1 and shown in Fig. 5. 11. The cross section falls by about three orders of magnitude over the measured Q 2 range and one order of magnitude in y; it also falls with the transverse momentum p T (D + ), but is only mildly dependent on the pseudorapidity η(D + ). The measured cross sections are compared to the results of the previous ZEUS D + measurement [170] 26 , based on a subset of the HERA-II data. The present measurement has significantly smaller uncertainties and supersedes the previous results. The NLO QCD predictions, calculated in the FFNS, provide a good description of the data. The experimental uncertainties are smaller than the theoretical uncertainties, apart from the high-Q 2 region, where statistics is limited.
The measured cross sections as a function of y in five Q 2 bins are listed in Table 5.2 and shown in Fig. 5. 12. The data are well reproduced by the HVQDIS calculation. The effects of individual sources of systematic uncertainties (described in Section 5. 6.6) on the cross sections in bins of Q 2 and y can be found in [4]. The measured double-differential cross section as a function of Q 2 and y has been used to extract the charm contribution F cc 2 to the proton structure function; the results can be found in [4].
In summary, the present results supersede the previous ZEUS D + measurement, based on a subset of the data, and exhibit significantly better precision. The improvement in precision comes from the larger data sample, used in the present analysis and from a better control of experimental systematic uncertainties, owing to improved tracking alignment and calibration. Predictions from NLO QCD in the FFNS describe the measured cross sections well. The results presented here are of similar or higher precision than measurements of charm production, previously published by ZEUS 27 . The new precise data provide an improved check of pQCD and provide further constrains on the PDFs in the proton. Section 6.5 uses these data for the HERA charm combination. 26 The contribution of D + mesons from beauty decays was subtracted using the scaled RAPGAP MC predictions [202,203,204]. 27 At the moment when the results were being published (February 2013); later on the most precise ZEUS charm measurement became the measurement of D * + production using the full HERA-II data set [172].  Table 5.1: Differential cross sections for D + production in bins of p T (D + ), η(D + ), Q 2 and y. The cross sections are given in the kinematic region 5. 15. The statistical and systematic uncertainties, ∆ stat and ∆ syst , are presented separately. Normalisation uncertainties of 1.9% and 2.1% due to the luminosity and the branching-ratio measurements, respectively, were not included in ∆ syst . The correction factors to the QED Born level, C rad , are also listed. For reference, the beauty cross sections predicted by RAPGAP and scaled as described in the text, σ b , are also shown.
[nb] −  Table 5.2: Differential cross sections for D + production as a function of y in five regions of Q 2 . The cross sections are given in the kinematic region 5. 15. The statistical and systematic uncertainties, ∆ stat and ∆ syst , are presented separately. Normalisation uncertainties of 1.9% and 2.1% due to the luminosity and the branching-ratio measurements, respectively, were not included in ∆ syst . The correction factors to the QED Born level, C rad , are also listed. For reference, the beauty cross sections predicted by RAPGAP and scaled as described in the text, σ b , are also shown.

Combination of the HERA charm measurements
This Section is devoted to the combination of the open charm measurements at HERA in DIS. Measurements of open charm production at HERA provide an important input for tests of QCD. As outlined in Section 2.4, c quarks in ep collisions are predominantly produced by the boson-gluon-fusion process, γg → cc, thus charm production is sensitive to the gluon distribution in the proton, and charm measurements are a valuable input for studies of the proton structure and for the extraction of the c-quark mass. Section 6.1 explains the motivation and gives an overview of general aspects of the procedure. Section 6.2 describes the procedure of cross-section averaging, the extrapolation to a common phase-space region and the treatment of experimental uncertainties. Details of the theoretical calculations in the FFNS, which were used in the combination procedure for phase-space corrections and for the comparison with the combined data, are given in Section 6. 3. In Sections 6.4 and 6.5 the main results are presented: a combination of visible D * + cross sections and reduced charm cross sections, respectively. Finally, Section 6.6 gives a summary of the results.

Introduction
The main goal of a data combination is to obtain a single consistent dataset for a given physical process. State-of-the-art QCD analysis procedures use data from a number of individual experiments. The data points are correlated through common systematic uncertainties, within and also across the publications. The procedure can be significantly simplified by averaging the input data in a model-independent way before performing a QCD analysis of that data [212]: e.g. combined into a single dataset DIS charm cross-section measurements are much easier to handle compared to a scattered set of individual experimental measurements, overviewed in Section 4, while retaining the full correlations between data points. Also, a combination serves as a consistency cross check of the input data: a study of the global χ 2 /n dof of the average and the distribution of the pulls allows a model-independent consistency examination between the measurements. Although a combination is not supposed to provide new information, it is possible that a combination will give an extra reduction of correlated uncertainties due to usage of information from the phase-space corners which normally would not be used in analyses or theory fits.
A combination requires input data in the same bins covering the same phase-space region. Considering the existing charm measurements at HERA, there are two strategies for the combination: to combine a limited number of measurements that closely fulfill the above requirement; to combine all relevant measurements extrapolated to a common phase-space region and common bins.
The former provides a model-independent combination (or with minimised model dependency) which retains most of original information; this strategy is followed in the combination of the visible D * + cross sections (Section 6.4). The latter gains from a big number of input measurements, thus it has ultimate accuracy at the cost of some model dependence and a sizeable theoretical uncertainty from the extrapolation procedure; this strategy is followed in the combination of the reduced charm cross sections (Section 6.5).

Combination procedure
In this Section common aspects of the combination procedure are described: the combination method, needed to average quantities given in a common phase-space region (Section 6.2.1), the treatment of uncertainties of input quantities in the combination method (Section 6.2. 2), and phase-space corrections, needed to translate the input quantities to a common phase-space region (Section 6.2.3).

Combination method
The HERAverager package [213] 28 is used for the combination of the charm data. It is an averaging tool developed for the H1 and ZEUS data combination. The combination method is based on the minimisation of the χ 2 -function which includes correlated systematic uncertainties using the nuisance-parameter technique, also known as the Hessian method [214].
Consider N e sets of measurements of N m quantities µ i (e.g. from different experiments or from one experiment, but obtained in different analises), µ e i . Each measurement has one uncorrelated uncertainty, σ e i , and N s correlated, Γ e, j i : All correlated uncertainties are assumed to be Gaussian. Thus each measurement can be written as where a e i and b e, j are independent variables distributed according to the unit Gaussian distribution around zero. Note that b e, j are independent of i; that is, the uncertainties Γ e, j i are 100% correlated for all data points denoted with the same e. The b e, j are called nuisance parameters of correlated uncertainties. Then the generalised χ 2 can be written as where the vectors m and b denote the true parameters m i and nuisance parameters b e, j , respectively. Here the first term takes into account the effects of the shifts of the correlated uncertainties, and the second term is a penalty for the correlated uncertainty shifts from their nominal (zero) values. The uncorrelated uncertainties σ e i are the total uncorrelated uncertainties which may consist of several independent components (e.g. a statistical uncertainty and several different systematic ones, assumed to be uncorrelated between the data points) added in quadrature, according to the law of combination of errors [215,216]. Note, that some of Γ e, j i may be equal to 0 if the measurement µ e i is insensitive to the systematic source j. A formal derivation of the χ 2 expression 6.2 from the assumption 6.1 can be found in [217]. The averaging problem is solved by minimising χ 2 (m, b) w.r.t m and b, providing the average values m and the fitted nuisance parameters b; a variation of χ 2 (m, b) by 1 provides the uncertainties on these values. The formulas for these quantities are provided in Appendix B. 1.
The Hessian method usually leads to a reduction of correlated uncertainties in the combination procedure. This is a considerable advantage compared to the more conservative offset method [218], when error propagation is based on shifting the data by the systematic errors and adding the deviations in quadrature, therefore the size of the correlated uncertainties remains unchanged.
So far the form of the correlated uncertainties Γ e, j i was not specified. It is useful to define the relative correlated systematic uncertainties by the ratio Usually the relative, not absolute, systematic uncertainties are provided by the measurements. Several types of uncertainty treatment can be considered: the multiplicative treatment, when the systematic uncertainties are proportional to the true values: the additive treatment, when the systematic uncertainties are independent of the true value; then they are considered to be proportional to the measured values, i.e. by the definition 6.3, or independent of either: (in other words they are constant and not changed in the combination procedure); a mixed case is the signal-dominated statistical uncertainties, which obey the Poisson statistics; their values are scaled with the square root of m i : The same options exist for the treatment of the uncorrelated uncertainties in the denominator of 6.2. The additive treatment is appropriate for background-dominated uncertainties, which do not depend on the true value m, while the multiplicative treatment is appropriate for all others.
For the charm measurements at HERA the statistical uncertainties are mainly dominated by background, so in the combination they were treated additively. The systematic uncertainties are predominantly proportional to the central values and thus treated multiplicatively. So the χ 2 -function, used in the present combination, is given by 6) where in the denominator the statistical and uncorrelated systematic uncertainties are added in quadrature; δ e stat,i and δ e uncor,i are the relative statistical and uncorrelated systematic uncertainties, respectively, defined similar to 6.3: where σ e stat,i and σ e uncor,i are the absolute statistical and uncorrelated systematic uncertainties, respectively. In the previous combination of the reduced charm cross sections [60] the sensitivity of the result to the treatment of the uncertainties was studied and procedural uncertainties were assigned; however they turned out to be much smaller than the other ones (on average below 0-10% of the total uncertainty, reaching up to 40% only at few combined points [60]) and are neglected in the present combination.

Treatment of systematic uncertainties
As explained in Section 6.2.1, in the combination procedure uncertainties are treated either as fully uncorrelated or fully correlated between the data points of certain measurements. Neither of these is conservative in general. Experimental uncertainties of the input measurements consist of statistical and systematic uncertainties. The statistical component of uncertainties was always treated as uncorrelated between all data points. 29 The systematic component of uncertainties, in general, may have mixed nature: it may be partially correlated between the data points; moreover, the level of correlation may differ in different phase-space regions. In the current combination procedure the following "common sense" strategy was applied: normalisation uncertainties (reported as a single number) were treated as correlated (e.g. luminosity and branching ratios); they are marked as 'N'; those uncertainties that have smooth behaviour in the phase space of the measurement were also treated as correlated (typically these are different kinds of corrections, reweightings, inefficiencies etc., evaluated using studies based on MC); they are marked as 'S'; theory-related uncertainties that arose from the phase-space corrections (see Sections 6.2. 3) were treated as correlated; they are marked as 'T'; all other uncertainties were treated as uncorrelated (typically these are uncertainties estimated using cut variations in data, which are subject to statistical fluctuations).
Explicit information on the sources that were treated as correlated is given in Sections 6.4.1, 6.4.2 and 6.5. 1.
Many of the experimental systematic uncertainties are quoted as asymmetric and have been symmetrised before performing a combination. For the measurements [4,172,30], which have not been included in the previous charm combination [60], symmetrisation consisted in taking the largest deviation; no corrections to the central values were applied. For those measurements that have been included in the previous charm combination [60], the symmetrisation remains the same as in [60]. 30

Phase-space correction
Whenever the quantities to be averaged are measured in different phase-space regions, they have to be corrected before performing a combination. Assume that there is a measured quantity (e.g. a cross section) in the phase-space region 1, σ meas 1 , which needs to be shifted into the phase-space region 2. The correction procedure is called extrapolation and relies on usage of theoretical calculations: (6. 8) Here σ th 1 and σ th 2 are the predicted quantities in the phase-space regions 1 and 2, respectively. The closer the phase-space regions 1 and 2 (in particular, the more they overlap), the less model dependency the extrapolated quantity, σ extr 2 , has. In order to estimate the remaining model dependence, the parameters of the theoretical calculations are varied; the resulting uncertainty is called the extrapolation uncertainty. In addition to the extrapolation uncertainty the extrapolated quantity σ extr 2 has an original uncertainty of σ meas 1 (which, for instance, may consist of statistical and systematic uncertainties of the experimental measurement ).
It is important to distinguish between "small" extrapolations to another region of a measured phase-space region (these can be thought of rather as interpolations), referred to in the future as swimming, e.g. when a quantity is translated into a different binning scheme, and actual extrapolation to an unmeasured phase-space region, referred to in the future just as extrapolation.
In the first case it is important that the original measurement in general covers all the phase-space region where the swimming is performed, so the predictions can be compared to the measurements in order to check the adequacy of the swimming. In contrast, in the second case the results of the extrapolation depend on theoretical predictions in unmeasured phase-space corners; thus in general the adequacy of the results cannot be verified unless there are other measurements in the uncovered 30 It was found in [60] that the results are insensitive to the details of the symmetrisation procedure. regions. Note that in both cases the corrections do not depend on common normalisation factors. For the combinations presented in this review, the combination of D * + cross sections requires only swimming, while the combination of reduced charm cross sections requires genuine extrapolation.
For the charm combination presented here, phase-space corrections were always done using the theory outlined in Section 2.4.3: the NLO QCD calculations (O(α 2 s )) in the 3-flavour FFNS. Details of the theoretical calculations (including the variations which are used to estimate the extrapolation uncertainties) are given in Section 6.3.

Theoretical calculations in FFNS
The FFNS theoretical calculations were used for two purposes: NLO QCD predictions in the FFNS were obtained with the HVQDIS program [63]. The parameters used in the calculations, together with the corresponding variations which were used to estimate the uncertainties, are described below. 31 In the combination procedure each extrapolation uncertainty was treated as correlated between all points and all measurements. Most of the extrapolation uncertainties were originally asymmetric and have been symmetrised before performing the combination. Symmetrisation was performed by taking the largest deviation; no corrections to the central values were applied. For the data to theory comparison, to obtain total theoretical uncertainties, all the variations were added in quadrature and the summation was performed separately for positive and negative variations.

Parton-level cross sections
The parton-level cross sections were calculated using the following settings: -the renormalisation and factorisation scales were set to µ r = µ f = Q 2 + 4m 2 c and varied up and down by a factor of two. The variations were performed independently if the theoretical predictions were used for comparison with data, or simultaneously if they were used for extrapolation or swimming corrections (which are sensitive only to the shape of the predictions); -the pole mass of the c quark m c = 1.50 ± 0.15 GeV [60]; since the renormalisation and factorisation scale definitions include the c-quark mass, varying this also slightly affected the two scales; -the strong coupling constant α Charm measurements were not included in the determination of these PDF sets. For each of the parameter variations (the scales, mass and α s ), a different respective PDF set was used. By default, the scales for the charm contribution to the inclusive data in the PDF determination were chosen to be consistent with the factorisation scale used in HVQDIS, while the renormalisation scale in HVQDIS was decoupled from the PDF scales, except in the cases where the factorisation and renormalisation scales were varied simultaneously. As a cross check, instead of fitting the PDFs from inclusive data, 3-flavour NLO variants of the ABM [219] and MSTW [220] PDFs were also used to evaluate the cross sections. For MSTW, the variant with m c = 1.5 GeV was chosen. The differences were found to be much smaller compared to those from other parameter variations, therefore the PDF uncertainties are neglected; the plots are provided in Appendix C (Fig. C.6).

Fragmentation
The fragmentation model described in the previous publication [60] was used to provide hadron-level cross sections, if needed. It is based on the measurements by H1 [122] and ZEUS [208] using the production of D * + mesons, with and without associated jets, in DIS and PHP. This model uses the fragmentation function of Kartvelishvili et al. [118], controlled by the parameter α K , to describe the longitudinal fraction of the c-quark momentum transferred to the D * + mesons. The fragmentation was performed in the photon-proton centre-of-mass frame by rescaling the quark three-momentum, then the energy of the produced hadron was calculated and the hadron was boosted to the lab frame. The calculation of the hadron energy and the Lorentz boost were done by using the hadron mass. 32 Different values of α K [60] were used for different bins in the photonparton centre-of-mass-frame squared energy,ŝ, and for different hadrons. Since ground-state D mesons partly originate from decays of D * + and other excited mesons, the corresponding cquark fragmentation function is softer than that measured using D * + mesons. From kinematic considerations [209], supported by experimental measurements [210], the expectation value for the fragmentation function of c quarks into D 0, not D * + 33 , D + and in the mix of charmed hadrons decaying into muons, has to be reduced by ≈ 5% with respect to that for D * + mesons. The values of α K for the fragmentation into ground-state hadrons, used for the D 0, not D * + , D + and µ measurements, have been re-evaluated accordingly [60] and are reported in Table 6.1. The model also implements a transverse fragmentation component by assigning to the charmed hadron a transverse momentum, k T , with respect to the c-quark direction, with k T = 0.35 ± 0.15 GeV. If needed (for the phase-space corrections for the ZEUS muon measurement [171]), the charm-hadron cross sections were accompanied 32 As explained in Section 2.6, a phenomenological fragmentation model must be applied exactly in the same way as it was measured. Here the fragmentation model follows the original H1 and ZEUS measurements [122,208]. 33 D 0, not D * + refers to D 0 that do not originate from decays of D * + .
by the semi-leptonic decays from [221]. Fragmentation fractions were taken from [199,211] and are listed in Table 6.2.
In total, the following uncertainties were assigned to the fragmentation: the variation of α K (the upward and downward variations were performed simultaneously for allŝ bins and for all hadrons 34 ); the variation ofŝ 1 35 ; the variation of k T 36 ; the uncertainties on the fragmentation fractions.

Beauty contribution
In most of the analyses the cross sections of charmed hadrons either produced directly or in decays of beauty hadrons were measured. For the combination of reduced charm cross sections the beauty contribution needed to be subtracted, while for the data to theory comparison for D * + -production cross sections it must be added to the charm theoretical predictions. In previous H1 and ZEUS charm analyses the beauty contribution had been obtained using the RAPGAP MC [194], with the normalisation rescaled to dedicated beauty measurements. Typical normalisation factors vary from 1.0 to 2.0 [222, 202,203, 204, 30], thus an uncertainty ∼ 50% has to be assigned to the beauty contribution. Propagated to the uncertainty on charm and beauty production, this results in an uncertainty up to ∼ 5% and thus becomes a dominant uncertainty at high Q 2 , where the perturbative calculations are quite accurate. Moreover, this approach provides predictions at LO accompanied by parton showers, re-normalised to measured data.
In the present study the beauty contribution was obtained at NLO: from the NLO QCD predictions for beauty hadrons with subsequent decays into charmed hadrons. A non-trivial ingredient of these calculations is the decay kinematics of beauty to charmed hadrons, which, since many individual decay channels are involved, has to be obtained from some MC generator. In Fig. 6.1 the distributions of D-meson momenta in the Bhadron rest frame as obtained from the PYTHIA [192] and EvtGen [223] MC generators are compared with the data from CLEO [176] and ARGUS [224]. The shape from EvtGen describes the data reasonably well, therefore it was used for the predictions.
The parameters for the beauty contribution calculations and uncertainties were: Peterson et al. [120] parametrisation with b = 0.0035 ± 0.0020 [225]; -the fraction of beauty hadrons decaying into charmed hadrons was taken from [199] and listed in Table 6.2; -the PDFs, described by the same set (the 3-flavour FFNS) as the one used for the corresponding charm prediction.
The dominant uncertainty comes from the variation of the fraction of beauty hadrons decaying into D * + mesons; although it reaches only ≈ 2% in the highest Q 2 bins. Since the beauty contribution itself is small (varies from 1% at low Q 2 to 7% at high Q 2 ), all other uncertainties are negligible.

Combination of visible D * + cross sections
Among all techniques used at HERA to measure open charm production (see Section 4), measurements of D * + production have the best signal-to-background ratio and are the most precise. ZEUS and H1 have recently published single-and doubledifferential D * + cross sections for inclusive D * + -meson production in DIS from their respective final HERA-II datasets [76,77,172]. The measurements have been performed in similar  [192] and EvtGen [223] MC generators compared with the data from CLEO [176] (top left, bottom left and bottom right) and ARGUS [224] (top right). The distributions from the event generators are normalised to the data.
phase-space regions and used similar binning schemes 37 thus fulfilling the requirement for a combination with minimised theory dependence.
The phase-space region of HERA-II measurements in DIS is restricted compared to that of HERA-I measurements. Due to beam-line modifications related to HERA-II high-luminosity running [131] the visible phase space of these D * + cross sections at HERA-II is restricted to virtualities Q 2 > 5 GeV 2 . This fact prevents straightforward combination with HERA-I measurements for most of the single-differential D * + cross sections, although in the case of the single-or double-differential D * + cross sections as a function of Q 2 , the above restriction does not apply and the kinematic range can be extended to lower Q 2 using earlier HERA-I measurements. In fact only the doubledifferential D * + cross sections as a function of Q 2 and y can be combined with HERA-I measurements without applying extensive swimming corrections. 38 For this reason the treatment of the visible D * + cross section combination consists of two parts: a combination of single-differential D * + cross sections, described in Section 6.4.1, and a combination of the double-differential cross section, described in Section 6.4.2. While the common 37 An agreement on the phase-space region and binning schemes was achieved between the H1 and ZEUS Collaborations before performing the measurements. 38 Although the single-differential D * + cross sections as a function of Q 2 , in principle, also can be combined with HERA-I measurements without applying swimming corrections, this combination is not provided, because information on the single-differential D * + cross sections as a function of Q 2 can be obtained from the double-differential D * + cross sections as a function of Q 2 and y, provided in Section 6.4.2; to keep consistency between all the combined single-differential D * + cross sections, which requires the same input data. method and strategy for both parts remain the same, the input measurements and phase-space regions differ. All measurements to be combined for the single-and double-differential D * + cross sections are already corrected to the QED Born level with a running fine-structure constant and include both the charm and beauty contributions to D * + production. The results reported in this Section have been published by the H1 and ZEUS Collaborations [5].

Combination of single-differential cross sections
First the input measurements, combination phase-space region and details of the combination procedure are given, which includes all necessary corrections needed to transform the input data to the common phase-space region. Then the results of the combination are presented and discussed. Afterwards the combined data are compared to NLO QCD predictions. As a result of a detailed comparison of data and theory a 'customised' theoretical calculation is introduced. Table 6.3 presents the datasets used for the combination together with their visible phase-space regions and integrated luminosities. Note that the H1 Collaboration has published D * + cross-section measurements separately for 5 < Q 2 < 100 GeV 2 (dataset I) [77] 39 and for 100 < Q 2 < 1000 GeV 2 (dataset II) [76] because different sub-detectors had to be employed for the detection and measurement of the scattered electron in these two regions. Thus the overall phase-space region for the combined D * + cross sections is given by 5 < Q 2 < 1000 GeV 2 , 0.02 < y < 0.7, p T (D * + ) > 1.5 GeV, |η(D * + )| < 1.5.

Input measurements, phase-space region and combination details
(6.9) The combination was done for single-differential D * + cross sections as a function of the D * + transverse momentum, p T (D * + ), pseudorapidity, η(D * + ), and inelasticity, z(D * + ) = (E(D * + ) − p Z (D * + ))/(2E e y), with E e being the incoming electron energy, E(D * + ) and p Z (D * + ) the energy and longitudinal momentum of D * + , respectively, as well as of the DIS kinematic variables Q 2 and y. 40 Since the H1 datasets I and II are complementary to each other and give the phase-space region of the combination 6.9, their differential D * + cross sections are summed up on a bin-bybin basis and enter the combination as a single dataset. However, due to the limited statistics at high Q 2 a coarser binning scheme 39 From the two sets of measurements in [77], the one compatible with the cuts on p T (D * + ) and η(D * + ) quoted in Table 6.3, which are compatible with the phase-space region of the ZEUS measurement [172], was chosen and referred to as dataset I. 40 Although all input measurements from Table 6. 3 give also the single-differential cross section as a function of the Bjorken variable x, the binning differs significantly, preventing a combination without large swimming corrections.
in p T (D * + ), η(D * + ), z(D * + ) and y had to be used in dataset II compared to dataset I. This made a straightforward summation of the differential D * + cross sections from the two measurements complicated. Therefore the cross section in a bin i of a given observable integrated in the range 5 < Q 2 < 1000 GeV 2 was calculated according to .
Here σ int denotes the integrated visible cross section and NLO stands for the NLO predictions obtained from HVQDIS. 41 In this calculation both the experimental uncertainties of the visible cross section at high Q 2 and the theoretical uncertainties (described in Section 6.2.3) were included. The contribution from the region 100 < Q 2 < 1000 GeV 2 to the full Q 2 range amounts to 4% on average and reaches up to 50% at the highest p T (D * + ); the extrapolation uncertainty is negligible in most of the bins compared to the corresponding experimental uncertainty; only at the two highest p T (D * + ) bins it approaches 35% of the experimental uncertainty. Thus in the combination procedure the extrapolation uncertainties from all theoretical parameter variations were added in quadrature and treated as an uncorrelated uncertainty. The sensitivity of the shape to the beauty contribution was found to be negligible and therefore was ignored.
For the single-differential D * + cross sections as a function of Q 2 , the procedure described above was not needed. However the binning schemes used for these D * + cross sections differ between datasets I-II and dataset III. At low Q 2 this was solved by combining the cross-section measurements of the first two bins of dataset I into a single bin. For Q 2 > 100 GeV 2 no consistent binning scheme could be defined from the singledifferential cross-section measurements dσ/dQ 2 itself. However, the measurements of the double-differential D * + cross section d 2 σ/dydQ 2 have been performed in a common binning scheme. By integrating these D * + cross sections in y, single-differential D * + cross sections in Q 2 were obtained at Q 2 > 100 GeV 2 from datasets II, III which were used directly in the combination. The contribution to dataset III from the range p T (D * + ) > 20 GeV was found to be negligible ( 1%).
Applying the procedure described above provided exactly two input measurements for each combined bin: one from H1 (datasets I-II) and one from ZEUS (dataset III). Thus n dof is equal to the number of combined bins. Since the data are statistically correlated between the different distributions, each distribution was combined separately.
The branching ratios for datasets I, II were updated to the PDG value [199]. A full list of considered correlated sources is provided in Appendix C (Table C.1). All systematic uncertainties were treated as uncorrelated between the H1 and ZEUS measurements, except for the branching-ratio uncertainty; since the latter is fully correlated between all datasets, it is not

Combined D * + cross sections
The results of combining the HERA-II measurements [77,76,172] as a function of p T (D * + ), η(D * + ), z(D * + ), Q 2 and y are given in Table 6.4, together with their uncorrelated and correlated uncertainties. The total uncertainties were obtained by adding the uncorrelated and correlated uncertainties in quadrature. A detailed breakdown of the correlated uncertainties is provided in Appendix C (Table C. 6.2 indicates that the H1 data points lie on average below the ZEUS points, the pulls in Fig. 6.3 show an overall symmetric spread of the H1 and ZEUS input data around the combined results; this is explained by taking into account shifts of the correlated systematic uncertainties. The shifts and reductions of the correlated sources are consistent for the combinations of the D * + cross sections in different variables; they are provided in Appendix C (Table C.1).
The combined D * + cross sections exhibit significantly reduced uncertainties. While the effective doubling of the statistics of the combined result reduces the uncorrelated uncertainties (inner error bars in Fig. 6.2), the correlated uncertainties (quadratic difference of the outer and inner error bars) of the combined D * + cross sections are significantly reduced through cross-calibration effects between the two experiments. Typically, both effects contribute about equally to the reduction of the total uncertainty.

Comparison with theoretical predictions
The combined D * + cross sections as a function of p T (D * + ), η(D * + ), z(D * + ), Q 2 and y are compared to the NLO QCD predictions in the FFNS (described in Section 6. 3) in Fig. 6.4; there is also a dotted line referred to as 'customised' NLO QCD predictions shown there, which will be discussed below. In general the predictions describe the data well. The uncertainties of the data are as small as 5% over a large fraction of the measured phase-space region, while the typical theory uncertainty ranges from 30% at low Q 2 to 10% at high Q 2 . The data points between the different distributions are statistically and systematically correlated, so they can be quantitatively compared to theory only on a one-by-one basis.
The theoretical predictions describe the combined data well within the corresponding uncertainty band, however the central theoretical curves underestimate the data normalisation. The central theoretical prediction shows a somewhat softer y distribution than the data. The central prediction for z(D * + ) is slightly wider than the measured distribution.

'Customised' theoretical predictions
As stated above, in overall the theoretical uncertainties are larger than the experimental uncertainties of the combined data.
Since the theoretical uncertainties depend on several correlated sources, it is rather difficult to make a strong statement about agreement between the theory and the data from Fig. 6.4 itself.
In order to study the impact of the current theory uncertainties in more detail, the effect of each theoretical uncertainty on the predictions was studied. The most conclusive variations on the predictions are shown separately in Fig. 6.5, compared to the same data as in Fig. 6. 4. Plots with all the variations are available in Appendix C (Figs. C.1 to C. 5).
The NLO prediction as a function of p T (D * + ) ( Fig. 6.5a) describes the data better if either the c-quark pole mass is reduced to 1.35 GeV; or the renormalisation scale is reduced by a factor 2; or the factorisation scale is increased by a factor 2.
Simultaneous variation of both scales will largely compensate and will therefore result in a much smaller effect.
The prediction for the z(D * + ) distribution ( Fig. 6.5d) describes the shape of the data noticeably better if the fragmentation parameters are adjusted such that the bin boundarŷ s 1 between the two lowest fragmentation bins [60] is set to 30 GeV 2 (see Table 6.1). This also slightly improves the shape of the y distribution ( Fig. 6.5b).
The preference for a reduced renormalisation scale already observed for p T (D * + ) is confirmed by the z(D * + ) distribution ( Fig. 6.5c). However, the shape of the z(D * + ) distribution rather favours variations of the charm mass and the factorisation scale in the opposite direction to those found for the p T (D * + ) dis- Table 6.4: The combined single-differential D * + cross sections as a function of p T (D * + ), η(D * + ), z(D * + ), Q 2 and y, with their uncorrelated (δ unc ), correlated (δ cor ) and total (δ tot ) uncertainties. The cross sections are given in the kinematic region 6.9.  Table 6.5: The values of χ 2 , n dof and the corresponding χ 2probabilities for the combinations of the single-differential D * + cross sections as a function of different variables.
Cross section n dof χ 2 p(χ 2 ,n dof ) dσ/dp T (D * + ) 11 6.9 81% dσ/dη(D * + ) 12 7.8 80% dσ/dz(D * + ) 7 10.9 15% dσ/dQ 2 11 6.1 87% dσ/dy 8 5.8 67% tribution. The other kinematic variables do not contribute any additional information to these findings. As stated before, within the large uncertainties indicated by the theory bands in Fig. 6.4, all distributions are reasonably well described. However, the above study shows that the different contributions to these uncertainties do not only affect the norm-alisation but also change the shape of different distributions in various ways. It is therefore not a priori expected that a variant of the prediction which gives a good description in one variable will also give a good description in another.
Based on the above study, a 'customised' calculation was hence performed with the goal to demonstrate that it is possible to obtain an acceptable description of the data in all variables at the same time, for both shape and normalisation, within the theoretical uncertainties quoted in Section 6. 3. 42 For this calculation the renormalisation scale was reduced by a factor 2, with the factorisation scale unchanged; 42 Since several of the theory parameters (e.g. the renormalisation and factorisation scales) are not physical parameters, and hence their "uncertainties" have no physical relevance, a demonstration rather than a detailed fit will suffice to clarify this point. Another reason not to perform a detailed fit is that the data are statistically correlated between the different distributions, therefore all the distributions must not be fitted simultaneously. the change of the fragmentation parameterŝ 1 = 30 GeV 2 was applied; at this stage, the resulting distributions were still found to underestimate the data normalisation. As the renormalisation and factorisation scales are recommended to differ by at most a factor of two [106], the only significant remaining handle is the c-quark pole mass. This mass was set to 1.4 GeV, a value which was also found to be compatible with the partially overlapping data used for a previous dedicated study [60] of the c-quark mass; all other parameters, which were found to result in a much smaller effect than those treated above, were left at their central settings as described in Section 6. 3.
The result of this customised calculation is indicated as a dotted line in Fig. 6. 4. Indeed a reasonable agreement with the data is achieved in all variables at the same time. This a posteriori adjustment of theory parameters may be taken as a hint in which direction theoretical and phenomenological developments might need to move: the strong improvement of the description of the data relative to the central prediction through the customisation of the renormalisation scale indicates that NNLO calculations, which might reduce the scale-related uncertainties to a level which matches the data precision, are needed to obtain a more stringent statement concerning the agreement of the pQCD predictions with the data; the improvement from the customisation of one of the fragmentation parameters and the still not fully satisfactory description of the z(D * + ) distribution indicate that further dedicated experimental and theoretical studies of the fragmentation treatment will be helpful.
In general, the precise single-differential distributions resulting from the combination, in particular those as a function of p T (D * + ), η(D * + ) and z(D * + ), are sensitive to theoretical and phenomenological parameters in a way which complements the sensitivity of more inclusive variables like Q 2 and y.

Combination of double-differential cross section
This Section continues the D * + cross-section combination and presents a combination of double-differential cross section as a function of Q 2 and y.

Input measurements, phase-space region and combination details
Since for the combination of the double-differential cross section as a function of Q 2 and y the restriction to the same phase-space region in Q 2 does not apply, HERA-I D * + measurements can be included in the combination.  Table 6.6 presents the datasets considered for the combination of the visible D * + double-differential cross section. 43 Compared to Table 6.3, Table 6.6 contains also three most precise HERA-I measurements; it also has an additional column which reports the centre-of-mass energy, since the latter differs for one of the HERA-I measurements.
Inclusion of HERA-I measurements in the combination allows an extension of the kinematic range down to lower Q 2 . Although all three HERA-I measurements have different lower Q 2 boundaries, a reasonable compromise between them was to choose the lower Q 2 equal to Q 2 = 1.5 GeV 2 . Thus the overall phase-space region for the combined D * + cross sections is given 43 Same as in Table 6.3, from the two sets of measurements in [77], the one compatible with the quoted cuts on p T (D * + ) and η(D * + ), which are compatible with the phase space of the ZEUS measurement [172], was chosen and referred to as dataset I. by 1.5 < Q 2 < 1000 GeV 2 , 0.02 < y < 0.7, p T (D * + ) > 1. 5 GeV, |η(D * + )| < 1.5, √ s = 318 GeV. (6.11) Some of the HERA-I measurements from Table 6.6 have a slightly different phase-space region and are performed at a different centre-of-mass energy. Moreover, their binning schemes for the double-differential cross section significantly differ from that which has been used for datasets I-III; dataset VI reports the double-differential cross section not as a function of Q 2 and y but as a function of Q 2 and x. Thus inclusion of HERA-I measurements in the combination necessarily required applying swimming corrections.
A dedicated study was performed to select those HERA-I measurements which are reasonably compatible with the HERA-II ones. At first a common binning scheme had to be chosen. Since the HERA-II measurements still remain the most precise in the combination, the double-differential cross section as a function of Q 2 and y was selected with the binning scheme which is based on datasets I-III (although slightly revised to improve consistency with the HERA-I measurements ). It was extended at low Q 2 with the binning scheme based on the most precise HERA-I dataset IV. The new binning will be given together with the combined D * + cross sections in Table 6. 8. D * + cross sections in the new bins (also referred to as destination, or output, bins) were obtained from the original bins (referred to also as input bins) using the swimming procedure described in Section 6.2. 3. For each swum bin the following quantities were calculated: the fraction of the cross section of the original bin contained in the new one, efficiency, E; the fraction of the cross section of the new bin contained in the original one, purity, P; the ratio of the swimming uncertainty to the experimental uncorrelated uncertainty in the corresponding bin, R.
Definition of purity, efficiency, and swimming factor, F sw , which were used to translate the differential cross section from the original bin to the destination one, is illustrated in Fig. 6. 6. Note that in some cases for a given original bin there can be several candidates for destination bins; in this case a destination bin with maximum P, E, and minimum R was chosen. Sometimes it was advantageous to combine two input bins before swimming. The overlap of the binning schemes for the doubledifferential cross section from all input measurements and the new binning scheme is shown in Fig. 6.7. Fig. 6.8 shows P vs. E, and R vs. its denominator, the experimental uncorrelated uncertainty, for all considered datasets. Since the binning scheme was chosen to be based on the HERA-II measurements, for all bins from datasets I-III purity, efficiency and the ratio satisfy P, E > 80% and R < 10% (note that in most bins P = E = 100% and R = 0%, since the original and destination bins exactly coincide ).
Since the purpose of the combination is to provide the visible D * + cross sections, P and E should not be too low and R not too large. Thus it is natural to introduce cuts on these quantities. Several possible values for the cuts on P and E are presented in Table 6.7 together with the numbers of input bins which survive the cuts. A reasonable cut was chosen to be P, E > 50%. Then most of the input bins from dataset IV (29 of 31) survive this selection, although it eliminates most of the bins from datasets V and VI. 44 Therefore a decision was taken to include in the combination from the HERA-I measurements only dataset IV. 44 For dataset V the input bins are too large, while for dataset VI the main difficulty is the original differential cross section as a function of Q 2 and x.
In addition the cut R < 30% was introduced. This eliminated 3 other input bins from dataset IV, so finally 26 of 31 original bins were kept. The data points removed from dataset IV mainly correspond to the low-y region where larger bins were used for the HERA-I data; additionally they suffer more from the swimming uncertainties, since the NLO QCD predictions at low y have a large mass dependence. All input bins from datasets I-III survived the above cuts on P, E and R and were kept. The swimming procedure includes the contribution to dataset IV from the range p T (D * + ) > 15 GeV. Similar to the case of the single-differential cross-section combination (Section 6. 4. 1), the   sensitivity of the shape to the beauty contribution was found to be negligible and thus was ignored.
The branching ratios for datasets I, II and IV were updated to the PDG value [199]. A full list of considered correlated sources is provided in Appendix C (Table C. 1). Similar to the case of the single-differential D * + cross sections, all systematic uncertainties were treated as uncorrelated between H1 and ZEUS measurements, except for the branching-ratio uncertainty; although since the latter is fully correlated between all datasets, it is not changed in the combination and was not included in the combination but applied as an external uncertainty on the results.

Combined D * + cross sections
The combined double-differential cross section with the uncorrelated, correlated and total uncertainties as a function of Q 2 and y is given in Table 6. 8. The total uncertainties were obtained by adding the uncorrelated and correlated uncertainties in quadrature. A detailed breakdown of the correlated uncertainties are provided in Appendix C (Table C. 3).
The individual datasets as well as the results of the combination are shown in Fig. 6. 9. The combined D * + cross sections Table 6.8: The combined double-differential D * + cross section as a function of Q 2 and y, with its uncorrelated (δ unc ), correlated (δ cor ) and total (δ tot ) uncertainties. The cross sections are given in the kinematic region 6  exhibit significantly reduced uncertainties. The input HERA-II H1 and ZEUS datasets are similar in precision. The precision of the ZEUS HERA-I data is smaller; however this sample also provides valuable input in some bins. In the first two Q 2 bins, the combination is based on the HERA-I data only; note that the uncertainty on the combined data in these bins is a bit reduced comparing to the original one because of reduction of the correlated systematic uncertainties.
The combination has χ 2 /n dof = 38/48; the corresponding probability is 85%, indicating consistency of the input measurements and, possibly, some overestimation of the experimental systematic uncertainties. The pull distribution is shown in Fig. 6. 10. It is close to a unit Gaussian distribution. As was seen in the results of the single-differential cross section combination (Section 6.4.1), although Fig. 6.9 indicates that the H1 data points lie on average below the ZEUS points, the pull distribution in Fig. 6.10 shows an overall symmetric spread of all input data around the combined results. The shifts and reductions of the correlated sources are provided in Appendix C (Table C.

1).
The combined cross section is compared to the NLO QCD predictions in the FFNS (described in Section 6.3) in Fig. 6. 11. The customised calculation (see Section 6.4.1) is also shown. In general the predictions describe the data well. As seen before from the single-differential y cross section, the central theory prediction shows a somewhat softer y distribution than the data, in particular at low Q 2 . The data reach a precision of about 5-10% over a large fraction of the measured phase space, while the typical theory uncertainty ranges from 30% at low Q 2 to 10% at high Q 2 , so higher-order calculations would be very helpful to match the data precision. As well as the single-differential distributions, the double-differential distribution gives additional input to test further theory improvements.

Combination of reduced charm cross sections
This Section describes the combination of HERA charmproduction measurements in DIS at the level of the reduced cross sections (see Eq. 2.7). The information on charm production in DIS can be collected in a single dataset in the full phase space, integrated over p T and η of the c quark, because most theoretical predictions exist only for this inclusive quantity. Since all methods used to measure charm production at HERA, introduced in Section 4, have limited phase-space coverage and thus must be corrected to the full phase space using theory, there are no reasons to restrict this combination to a specific phase-space region and binning scheme, and all datasets from H1 [168,167,76,77] and ZEUS [36,169,170,171,4,172,30] were included for which the necessary information on systematic uncertainties needed for the combination is available and which have not been superseded by later measurements up to November 2016. The results reported in this Section extend the previous combination of H1 and ZEUS charm measurements [60] by including three recent ZEUS datasets [4,172,30] which appeared after the combination [60] (referred to as 'HERA 2012') has been performed. The combination procedure was kept as close as possible to [60] to allow for a consistent comparison with the published results. Section 6.5.1 introduces the combination details: the input datasets and treatment of their experimental uncertainties, definition of the reduced cross sections and the combination Q 2 -x grid and extraction of the reduced cross sections from the visible ones, needed to put the input measurements into the common grid. In Section 6.5.2 the results of the combination are presented and compared to the results from [60]. In Section 6.5.3 the combined data are compared to the theoretical predictions in the FFNS and the running and pole charm masses are extracted from the data.

Input data samples
The datasets included in the combination are listed in Table 6.9 and correspond to 209 individual cross-section measurements. 45 45 From the two sets of measurements in [77], the one in the wider p T (D * + ) and η(D * + ) range was chosen and referred to as dataset I; this is another dataset from the one that was used in the combination of the D * + cross sections, described in Section 6. 4.  The combination includes measurements of charm production performed using different tagging techniques: the reconstruction of particular decays of D-mesons (datasets 2-7, 9, 10), the inclusive analysis of tracks exploiting lifetime information (datasets 1, 11) and the reconstruction of muons from charm semi-leptonic decays (dataset 8).
Datasets 1-8 have been used in the previous 'HERA 2012' combination, while datasets 9-11 were newly included. Note that dataset 9 replaced one of the datasets from 'HERA 2012', which is its subset.
Correlations between systematic uncertainties of different measurements were accounted for as explained in Section 6.2.2. All experimental systematic uncertainties were treated as independent between H1 and ZEUS. A full list of correlated sources is provided in Appendix D (Table D. 1). The total uncorrelated systematic uncertainties were obtained by adding individual ones in quadrature. 46

Reduced cross sections and common Q 2 -x grid
The quantities to be combined are the reduced charm cross sections, defined as follows: 46 For dataset 11 an additional uncorrelated systematic uncertainty was considered: an uncertainty of 100% on ∆ had = C had − 1 ( The cross section d 2 σ cc /dxdQ 2 is given at the Born level without QED and electroweak radiative corrections, except for the running electromagnetic coupling α. The reduced cross sections (and not the structure functions F cc 2 ) were chosen for the combination because they are proportional to the directly measured double-differential cross sections. F cc 2 and F cc L depend only on Q 2 and x. The presence of y in definition 6.12 leads to a dependence of σ cc red on the centre-of-mass energy, √ s. The combined reduced cross sections are provided at the centre-of-mass energy √ s = 318 GeV. The values of σ cc red for individual measurements were determined at the 52 (Q 2 , x) points of a common grid, chosen such that they are close to the centre-of-gravity in Q 2 and x of the corresponding bins, taking advantage of the fact that the binning schemes used by the H1 and ZEUS experiments are similar (the grid points were kept the same as in the 'HERA 2012' combination ). For all but three grid points, at least 2 measurements entered into the combination; for points in the medium Q 2 bins, the number of input measurements is as much as 7. The phasespace region of the combined cross sections is determined as (6.13)

Extrapolation and corrections
The results of the inclusive lifetime analysis (dataset 1) were directly taken from the original measurement in the form of σ cc red . For all other measurements the inputs to the combination were visible cross sections, σ vis,bin , defined as the D-, µor jetproduction cross sections in a particular p T and η range, in bins of Q 2 and y or x.
The reduced cross sections σ cc red were obtained from the visible cross sections σ vis,bin measured in a limited phase-space region using theoretical predictions according to the procedure described in Section 6.2.3: the reduced charm cross section at a reference (x, Q 2 ) point is given by σ cc red (x, Q 2 ) = σ vis,bin σ cc,th red (x, Q 2 ) σ th vis,bin . (6.14) To calculate σ cc,th red (x, Q 2 ) and the visible cross sections σ th vis,bin , the NLO QCD FFNS theory set-up was used, consistent with the previous 'HERA 2012' combination. 47 This set-up is almost identical to the one described in Section 6.3, except for the following minor changes: 1. in the fragmentation process, calculation of the hadron energy and Lorentz boost were performed by using the mass of the c quark rather than the charmed hadron 48 ; 47 The fully consistent theory set-up allowed for using existing input tables for σ cc red for datasets 1-8, available from [60], and straightforward comparison of new results of the combination with the 'HERA 2012' results; also note that two of three newly included ZEUS measurements (datasets 10 and 11) already published σ cc red extracted from the visible cross sections using exactly this theoretical set-up. 48 Except for dataset 8.  Table 6. 9: The most precise H1 and ZEUS measurements of charm production performed with various techniques. For each dataset the charm tagging method, the Q 2 , p T (E T ) and η range, the number of cross-section measurements, N, the integrated luminosity, L, and the centre-of-mass energy, √ s, are given. The dataset with the D 0, no D * + tagging method is based on an analysis of D 0 mesons not originating from detectable D * + decays. Dataset Tagging method Q 2 range D 0,noD * + 5 < Q 2 < 1000 1.5 < p T (D 0 ) < 15 |η(D * + )| < 1. 6 2. if it needed to be subtracted, the beauty contribution was evaluated using the estimates of the corresponding publications (based on MC re-normalised to data).
The extrapolation factors, R = σ th bin /σ th vis,bin , where σ th bin is the cross section in the full (p T , η) phase-space region, vary in a wide range: from R 1 at high Q 2 to R ∼ 5 at low Q 2 . For dataset 5 the extrapolation procedure includes also the centreof-mass energy correction.
The extrapolation uncertainties were estimated from the variations described in Section 6.3, and were treated as correlated between datasets 2-11 49 . For dataset 1 the extrapolation uncertainties (except for the longitudinal fragmentation) do not appear explicitly and were covered by the experimental systematic uncertainties. The dominant contributions arise from the variation of the renormalisation and factorisation scales (average 5-6%, reaching 15% at lowest Q 2 ) and from the variation of the fragmentation function (average 3-5%).
Prior to the combination, datasets 1 and 11 were corrected, when needed, from the grids used in the original publications to the common grid using the NLO FFNS calculation. The corrections were always smaller than 25% and the associated uncertainties, obtained by varying the charm mass, the scales and the PDFs, were negligible. All D-meson cross sections were updated using the most recent branching ratios [199].

Combined charm cross sections
In total, 209 measurements were combined to 52 reduced crosssection measurements. The combination has χ 2 /n dof = 117/157; the corresponding probability is 99.3%, indicating conservative estimation of the experimental systematic uncertainties of the input measurements. The pull distribution is shown in Fig. 6. 12.
The values of the combined cross section σ cc red together with uncorrelated, correlated and total uncertainties are given in Table 6. 10. A detailed breakdown of the correlated uncertainties is provided in Appendix D (Table D.2).
The individual datasets as well as the results of the combination are shown in Fig. 6. 13. 50 The combined cross sections exhibit significantly reduced uncertainties. The input H1 and ZEUS data in total are similar in precision and contribute roughly equally to the averaged results. The combined data are significantly more precise than any of the individual input datasets. The uncertainty of the combined results is about 8% on average and reaches 4% in the region of small x and medium Q 2 . This is an improvement of about a factor of 2.5 with respect to each of the most precise datasets in the combination.
There are in total 78 sources of correlated systematic uncertainty, including global normalisations, characterising the separate datasets. The shifts and the reduction of the correlated uncertainties are provided in Appendix D (Table D.1). None of these shifts exceeds 1.3 standard deviation. The influence of several correlated systematic uncertainties was reduced by more than a factor of two, while on average the reduction factors 49 The PDF uncertainties were neglected for the newly included datasets 9-11, since for the other ones they were found to be negligibly small (1% on average) [60]. 50 The same plots, but separately for each Q 2 bin, are available in Appendix D (Figs. D.1to D. 12).     are about 20% of the nominal standard deviation. The reductions are due to the different charm-tagging methods, and to the requirement that different measurements probe the same cross section at each (x, Q 2 ) point. The reduction of systematic uncertainties propagated to the other average points, including those which are based solely on the less precise measurements. Due to this propagation the uncertainty on the combined data in the points, to which only one input measurement contributes, was also reduced compared to the original one.
The combined data are compared to the 'HERA 2012' results in Fig. 6.14; for a more detailed comparison Fig. 6.15 shows the same results normalised to the 'HERA 2012' and Fig. 6.16 shows the comparison of the relative uncertainties. The new results are consistent with the previously published ones, although on average they lie slightly above. This is explained by taking into account the changes in the shifts of correlated systematic uncertainties, which affect all points simultaneously (mainly the theory-related sources and luminosity uncertainties, see Table D    uncertainties contribute about equally to the total improvement. At medium Q 2 , where new datasets 9, 10 and 11 contribute directly, the improvement is on average of the order of 20% of the 'HERA 2012' uncertainties, reaching 35% in several points, and in the low and very high Q 2 bins the improvement is 5-15% owing to the reduction of the correlated uncertainties only.

Comparison to theoretical predictions and QCD analysis
Fig. 6.17 presents a comparison of the NLO QCD predictions in the FFNS, calculated as described in Section 6.3, to the combined data. This is more clearly seen in the ratio to the theoretical predictions, shown in Fig. 6. 18. The predictions describe the data well within the uncertainties in the whole kinematic range of the combination, although the central theoretical curve underestimates the data normalisation, as observed also in the combination of the D * + cross sections (Section 6.4). The 'customised' NLO calculation (Section 6.4.1), while it was determined mainly from the exclusive D * + quantities in the restricted phase-space region, provides an improved description of the reduced cross-section normalisation, although it does not improve the description of the x shape.
In Fig. 6.19 the data are compared to the predictions by the ABM group in the FFNS at NLO and NNLO, based on the running-mass scheme [28,226]. The uncertainties on the predictions include the uncertainties on the charm mass, which dominate at small Q 2 . The predictions at NLO and NNLO are very similar and describe the data well in the whole kinematic range of the measurement.  The sensitivity of the theoretical predictions to the charm mass allows for the determination of its best value from the data in a QCD fit. The analysis was performed with the HERAFitter program [227,228] , which is based on the NLO DGLAP evolution scheme [41,42,43,44,45,46,47] as implemented in QCDNUM [13]. The strategy of the HERAPDF1.0 fit [2,60] was followed. The combined H1 and ZEUS inclusive ep NC and CC DIS cross sections [2] were used to constrain the PDFs. The analysis was restricted to the inclusive data with Q 2 > Q 2 min = 3.5 GeV 2 to ensure the applicability of pQCD calculations; for the charm data this cut was not applied 51 . Theoretical predictions were obtained at NLO using the 'FF ABM' and 'FF ABM RUNM' scheme for the heavy-quark poleand running-mass treatment, respectively, as implemented in OPENQCDRAD [229]. The factorisation and renormalisation scales were set to µ f = µ r = Q for the light quarks and to µ f = µ r = Q 2 + 4m 2 Q for the heavy quarks. The number of active flavours in PDFs and α s evolution was set to n f = 3. The strong coupling constant was set to α  [199] for the pole-and running-mass treatments, respectively. 52 The following combinations of PDFs were chosen in the fit procedure at the initial scale of the QCD evolution Q 2 0 = 1.4 GeV 2 : the valence-quark distributions xu v (x), xd v (x), the gluon distribution xg(x) and the u-type and d-type anti-quark distributions (note that they are identical to the sea-quark distributions), xU(x), xD(x), where xU(x) = xu(x) and xD(x) = xd(x) + xs(x). At the scale Q 0 , the PDFs are parametrised by (6. 15) The normalisation parameters A u v , A d v , A g were determined by the QCD sum rules, the B parameters determine the PDFs at small x, and the C parameters describe the shape of the distributions as x → 1. A flexible form for the gluon distribution was adopted with the choice of C g = 25 motivated by the approach of the MSTW group [21,230]. The s-quark distribution is expressed as x-independent strangeness fraction, f s , of the d-type sea, xs = f s xD at Q 2 0 , where f s = 0.31 as in the analysis of [230]. Additional constraints B U = B D and A U = A D (1 − f s ) were imposed, with xū → xd as x → 0. The parameters D u v , DŪ and DD were set to 0 for the nominal variant of the fit. In a 51 For the charm data the applicability of pQCD calculations is ensured by the presence of a massive c quark-antiquark pair in the final state; see also the scale choices. 52 For the calculation of the beauty contribution to the inclusive cross sections. compact way, these constraints can be summarised as 16) The analysis was performed by fitting the remaining 13 free parameters in 6.15 53 . The charm mass was left free in the fit.
The free parameters are determined in HERAFitter by minimisation of a χ 2 -function as implemented in the MINUIT package [200]. The χ 2 function is similar to that described in Section 6.2.1: where the notation is equivalent to that in Eq. 6. 6. The parameters m i are theoretical predictions which depend on the fitted parameters. Systematic uncertainties are assumed to be proportional to the central prediction values, whereas statistical uncertainties scale with the square root of the predictions. Correlated uncertainties are treated using nuisance-parameter representation [228]. The χ 2 -function includes an additional logarithmic term which is relevant when the estimated statistical and uncorrelated systematic uncertainties of the data are rescaled during the fit [231]. The uncertainties were evaluated following the strategy of [2,60]. These include: the fit uncertainty was evaluated [214,228,232,233]  • µ f and µ r for heavy-flavour production were varied simultaneously by a factor of two (the framework allows only their simultaneous variation); the largest differences in this range were taken; the parametrisation uncertainties: • Q 2 0 was varied in the range 1.0 < Q 2 0 < 1.9 GeV 2 ; 53 Note that a negative gluon distribution was allowed at the parametrisation scale. 54 For the c-quark mass, the fit uncertainty was determined with the MINOS algorithm [201].
• the parameter D u v was released; 55 • the parameter D U was released; • the parameter D D was released.
The These values indicate a good consistency of the fit, although they are slightly larger than obtained in the previous analysis [60] with the 'HERA 2012' combined data (total χ 2 /n dof = 628/626 and charm χ 2 /n dof = 44/47 for the running-mass treatment ). This might indicate that the more precise charm data require a somewhat more flexible PDF parametrisation.
The determined m c (m c ) value is consistent with the 'HERA 2012' result m c (m c ) = 1.26 ±0.05(fit) ±0.03(mod) ±0.02(par) ± 0.02(α s ) GeV and has a better accuracy owing to the more precise combined charm data used in the fit and to the usage of all Q 2 bins 56 . The improvement of the model, α s and scale uncertainties is attributed partially to the usage of all Q 2 bins and partially to better constraints on the gluon distribution coming from more accurate charm data, which stabilise the fit results against variations of external parameters. This value is in agreement with the other analyses of the 'HERA 2012' charm data performed at NLO and partial NNLO [234,235] 56 Note, that for the previous 'HERA 2012' result the lowest Q 2 bin of the charm data has not been included in the fit; repeating the fit with the 'HERA 2012' data with the lowest Q 2 bin gives the closer value m c (m c ) = 1.228 +0.048 −0.038 (fit) +0.024 −0.000 (mod) +0.022 −0.006 (α s ) +0.025 −0.010 (scale) +0.015 −0.000 (par) GeV. 57 One of the four variants of the fitted m c (m c ) from [234] is quoted. 1.417 GeV, calculated using the appropriate one-loop relation 2.4, which is consisted with the fitted value for m pole c = 1.334 GeV, but the latter differs significantly from the worldaverage pole mass m pole c = 1.67 ± 0.07 GeV [199], calculated from the world-average running mass using the two-loop relation. This illustrates one of the possible caveats in determination and usage of the pole mass in applications of pQCD, mentioned in Section 2. 3. Since no attempt has been made to estimate the non-perturbative theoretical uncertainty on m pole c , the presented result should not be considered as a measurement, but rather as extraction of the value, which is optimal for these particular data. In order to show alternative ways to calculate predictions for charm production, Appendix D.2 presents a comparison with the theoretical predictions in different VFNS and a determination of optimal c-quark mass parameters for these schemes.

Summary
Measurements of charm production by the H1 and ZEUS experiments were combined. The combination was done separately for the single-and double-differential visible D * + cross sections, and for all available measurements of open charm production extrapolated to the full phase space. The combination was performed in the kinematic region 1.5 < Q 2 < 1000 GeV 2 (5 < Q 2 < 1000 GeV 2 for the single-differential cross sections), 0.02 < y < 0.7, p T (D * + ) > 1.5 GeV and |η(D * + )| < 1.5 for the visible D * + cross sections, and in the region 2.5 ≤ Q 2 ≤ 2000 GeV 2 and 3 × 10 −5 ≤ x ≤ 5 × 10 −2 for the reduced charm cross sections. The procedure takes into account detailed information on correlations of the systematic uncertainties. For both combinations, the data were found to be consistent, and the combined sets exhibit significantly reduced uncertainties. The combination of visible D * + cross sections does not induce significant theory-related uncertainties, while the combination of the reduced charm cross sections presents the most precise charm dataset from HERA, however is affected by the theory-related uncertainties in the extrapolation procedure.
For the visible D * + cross sections, the combination was performed separately for the single-differential cross sections using the HERA-II data only, and for the double-differential cross section using the HERA-I and HERA-II data. Inclusion of the HERA-I data allowed an extension of the kinematic region in Q 2 . NLO QCD predictions in the FFNS were compared to the combined D * + data. The predictions describe the data well within their uncertainties. Because the uncertainties of the combined data are smaller than the theoretical uncertainties, higher-order calculations and an improved treatment of the fragmentation process is needed to reduce the theory uncertainty to a level comparable with the data precision. The D * + combined data can be used further as the most precise purely experimental charm measurement from HERA for tests of pQCD and phenomenological approaches, e.g. of the fragmentation process.
The combined reduced charm cross sections are consistent with the previous H1 and ZEUS charm combination and have an improved precision owing to the inclusion of new ZEUS measurements. The combined data were compared to NLO QCD predictions in the FFNS and various VFNS. Most of the predictions describe the data well within their uncertainties. Similar to the D * + combination, the uncertainties of the combined data are smaller than the theoretical uncertainties, thus further improvement in the theoretical calculations is required to match the data precision. The best description of the data in the whole kinematic range is provided by the approximate NNLO FFNS predictions of the ABM group. The combined reduced charm cross sections were also used as input for the QCD analysis to determine the optimal values of the MS running charm mass and c-quark mass parameters in different VFNS. The extracted value of the MS running charm mass is consistent with the world-average value and has competitive precision to other individual determinations in pQCD. These data can be used further as the most precise inclusive charm measurement from HERA for tests of pQCD and in QCD analyses to constrain the gluon distribution and to determine the c-quark mass.

Heavy-flavour production at LHCb
This Section provides an overview of measurements of heavyflavour production at the LHCb experiment. This overview is restricted to the selected measurements that are used for the QCD analysis presented in Section 8, and describes their comparison with theoretical predictions, including a discussion of the theoretical uncertainties.

Introduction
As outlined in Section 2.4.2, PDFs are a necessary ingredient for QCD predictions in any process with incoming hadrons. Since they are not currently calculable from first principles, they must be extracted from data. At the present time several groups determine PDFs (for latest results see, e.g. [236,237,238,239,1,240,241], and for a recent review see [242,243]).
In Fig. 7.1 the gluon distributions from several PDF sets [219,244,220,245] are compared at the scale Q 2 = 10 GeV 2 . 58 The FFNS variants of the fits with the number of active flavours n f = 3 were chosen. While being consistent with each other and well constrained in the region of medium x, the distributions have a significant spread between the central values and also large uncertainties in the low-x region 59 , since presently no data constraining gluons in this region are included in the PDF fits. Note also that within the uncertainty bands, some of the sets predict a negative gluon distribution in the region x 5 × 10 −5 . This comparison illustrates that, in spite of using similar input data, depending on the assumptions made for a PDF extraction and the methods used to estimate the uncertainties a variety of predictions in the unmeasured region exists, and demonstrates the need of further experimental input.
Such data exist from the LHCb Collaboration, which has measured charm and beauty production in the forward rapidity region 2.0 < y < 4.5 at the centre-of-mass energy √ s = 7 TeV [177,247] 60 . The measured quantities are one-particle inclusive charm-and beauty-hadron production cross sections in the p T ranges 0 < p T < 8 GeV and 0 < p T < 40 GeV for the charm and beauty measurements, respectively. Since the dominant process for heavy-flavour production in pp collisions at these energies is gluon-gluon fusion (see Section 2. 5), these data are sensitive to gluons at low x. Indeed, the x-range can be estimated (neglecting quark-to-hadron fragmentation effects) using the LO formulae of Eq. 2. 17. For given values of p T and y, e.g. y 1 in Eq. 2.17, of one of the produced heavy quarks, the probed ranges of the two proton momentum fractions x 1 and x 2 58 Note that in this and the next Sections Q 2 denotes the PDF factorisation scale and not the virtuality of ep scattering, x denotes the longitudinal fraction of the proton momentum and not the Bjorken variable, y denotes rapidity and not inelasticity, unless otherwise stated explicitly. 59 The region x 10 −4 will be referred to as low x. 60   are: For the other momentum fraction x 1 the range is e y 1 −e −y 1 ≤ x 1 ≤ 1. Thus the lowest x values probed by the LHCb charm data are x ≈ e −4.5 /(7000/1.4 − e 4.5 ) ≈ 2 × 10 −6 . Equation 2.19 shows that the cross section is suppressed when |y 1 − y 2 | becomes large, implying that the quark and antiquark tend to be produced with the same rapidity. Assuming y 1 = y 2 = y (p z = 0 of the produced heavy quark in the parton rest frame) in Eq. 2.17, the probed ranges of x 1 and x 2 become This estimation gives the lowest x values probed by the LHCb charm data x ≈ 2e −4.5 × 1.4/7000 ≈ 4 × 10 −6 . The low-x data in the LHCb experiment provide new information to pin down the gluons in the region unconstrained by HERA. The corresponding LO theoretical predictions can be obtained by using Eq. 2.19 (Section 2.5) and integrating over the rapidity of the unmeasured produced heavy quark over its kinematically allowed range −ln( − e −y 1 ) ≤ y 2 ≤ ln( − e y 1 ). The actually effective (x 1 , x 2 ) region probed by these data is presented in Section 7.3.2 using NLO calculations. In Fig. 7.2 the kinematic regions which are covered by different HERA and LHCb data are plotted. The precise HERA DIS data [2] are only indirectly sensitive to gluons, so they constrain the gluon distribution well only in the region 10 −3 x 10 −1 . The HERA heavy-flavour data [60,30] cover the region 10 −4 x 10 −2 , while the LHCb data extend the coverage up to x 5 × 10 −6 (low-p T forward charm) and up to x 1 (high-p T forward beauty ). 61 Note that the LHCb data are sensitive to the product of gluon densities in two non-overlapping regions: forward and medium; since the latter is already well constrained by other data, the LHCb data will have an impact mainly on the low-x region. It is worth noting that using PDFs with a strongly negative values of the gluon distribution at low x results in negative and thus unphysical predicted cross sections for the forward region of the LHCb charm data. This emphasises the inclusion of the LHCb data in a PDF fit to constrain gluons at low x.

Measurements of charm and beauty production at LHCb
The Large Hadron Collider (LHC) is the world's largest and most powerful particle collider; its description can be found elsewhere [255]. The LHCb detector at the LHC provides unique information to the forward-rapidity region with a detector that is tailored for flavour physics, therefore LHCb heavy-flavour data provide a unique access to the gluon distribution in the proton at very low values of the partonic momentum fraction x. In this review LHCb data published up to August 2014 are included.
The LHCb detector [256] (Fig. 7. 3) is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b or c quarks. The right-handed coordinate system adopted has the Z axis along the beam. The detector has a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift-tubes placed downstream. The combined tracking system has a momentum resolution (δp/p) that varies from 0.4% at 5 GeV to 0.6% at 100 GeV and an impact-parameter resolution of 20 µm for tracks with high transverse momentum. Charged hadrons are identified using two ring-imaging Cherenkov detectors (RICH). The RICH system [257] of the LHCb experiment provides charged-particle identification over a wide momentum range, from 2 to 100 GeV 62 . It consists of two RICH detectors that cover between them the angular acceptance of the experiment, 15-300 mrad with respect to the beam axis. Photon, electron, and hadron candidates are identified by a calorimeter 61 The quoted regions are qualitatively determined in the following way: for the HERA DIS data, the x range is indicated, where the gluon HERAPDF1.0 [2] uncertainty at Q 2 = 10 GeV 2 is less than 10%. For the HERA charm and beauty data, the LO formula x = x bj (1 + 4m 2 Q /Q 2 ) is used, where x bj is the Bjorken variable, Q 2 is boson virtuality and m Q is the heavy-quark pole mass. For the LHCb charm and beauty data, the LO estimation is used as described above. 62 The typical momentum (in the laboratory frame) of the decay products in two-body B decays is about 50 GeV. The requirement of maintaining a high efficiency for the reconstruction of these decays leads to the need for particle identification up to at least 100 GeV. The lower momentum limit of about 2 GeV follows from the need to identify decay products from high-multiplicity B decays and also from the fact that particles below this momentum will not pass through the dipole magnetic field (4 Tm) of the LHCb spectrometer [257]. system consisting of scintillating-pad and pre-shower detectors, an electromagnetic calorimeter, and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multi-wire proportional chambers.

Measurement of prompt charm production
LHCb measured D 0 , D + , D + s , D * + and Λ + c production using data corresponding to an integrated luminosity of 15 nb −1 in the region of rapidity 2.0 < y < 4.5 and transverse momentum 0 < p T < 8 GeV in pp collisions at a centre-of-mass energy of 7 TeV [177]. The analysis was based on fully reconstructed decays of charmed hadrons in the following decay modes (Fig. 7.4): D 0 → K − π + , D + → K − π + π + , D * + → D 0 (K − π + )π + , D + s → φ(K − K + )π + and Λ + c → pK − π + . Charmed hadrons may be produced at the pp collision point either directly or as feed-down from the instantaneous decays of excited resonances. They may also arise in decays of beauty hadrons. The first two sources (direct production and feed-down) are referred to as prompt. Charmed particles from beauty-hadron decays are called secondary charmed hadrons. The result of the measurement is reported as the production cross sections of prompt charmed hadrons; secondary charmed hadrons were treated as background. The measurement was performed in twodimensional bins of p T and y. For the Λ + c measurement, only single-differential cross sections as a function of p T and y were measured. The prompt signal yields were selected using multidimensional extended maximum likelihood fits to the mass and log 10 (IP χ 2 ), where IP χ 2 is defined as the difference between the χ 2 of the primary vertex, reconstructed with and without the considered particle [177] (Fig. 7.4). The dominant systematic uncertainty is the uncertainty on the tracking efficiency, which is 3-4% per one final-state track, thus resulting in 6-10% for the measured cross sections.
The measured double-differential cross sections of D 0 , D + , D + s and D * + production are shown in Fig. 7.5 and compared to the theoretical predictions as provided by external groups at NLO in the FONLL [101,258,209,103] and other GM-VFNS approach [259,260,261,262,263,264] (see Sections 2.5.2 and 2. 5. 3). The GM-VFNS predictions are shown for p T > 3 GeV. Predictions for D 0 mesons are also compared with the GM-VFNS calculations using PDFs with intrinsic charm [24]. As shown in Fig. 7.5, in the phase space of the present measurement the effect of intrinsic charm is predicted to be small. All theoretical calculations describe the data well, although their uncertainties of the order of a factor 2 significantly exceed the experimental uncertainties of the data.

Measurement of beauty production
LHCb measured B + , B 0 and B 0 s production using data corresponding to an integrated luminosity of 0.36 fb −1 in the region of rapidity 2.0 < y < 4.5 and transverse momentum 0 < p T < 40 GeV in pp collisions at a centre-of-mass energy of 7 TeV [247]. The analysis was based on fully reconstructed decays of beauty hadrons in the following decay modes: B + → J/ψK + , B 0 → J/ψK * 0 and B 0 s → J/ψφ, with J/ψ → µ + µ − , K * 0 → K + π − and φ → K + K − . Similar to the case gluon momentum fraction x  of the charm measurement [177], also the beauty measurement was performed in two-dimensional bins of p T and y.
The mass distributions of the selected candidates for one of the p T and y bins are shown in Fig. 7. 6. The dominant systematic uncertainty comes from the tracking (2-9%) and trigger (2-8%) efficiencies and finite size of the bins (0-19%); for B 0 and B 0 s the branching-ratio uncertainties are also sizeable (≈ 10%).
The measured cross sections, integrated over p T and y, are compared to the FONLL theoretical predictions in Figs. 7.7 and 7.8, respectively. Similar to the results of the charm measurement [177], the FONLL calculations describe the data well, albeit within large uncertainties.

Comparison with theoretical predictions
Theoretical predictions for the charm and beauty data were obtained using the massive NLO O(α 3 s ) calculations in the FFNS [90,92,96] (see Section 2.5.1) using the MNR code [98], implemented in the HERAFitter package [227,228] for this purpose. Technical details of the implementation are described in Appendix E. 1. The parameters used in the calculations and the corresponding variations used to estimate the uncertainties are described below in Section 7.3. 1.    Figure 7.4: Mass and log 10 (IPχ 2 ) distributions for selected D 0 → K − π + and D + → K − π + π + candidates from the LHCb measurement of prompt charm production [177] showing the masses of the D 0 candidates (top left), log 10 (IP χ 2 ) distribution of D 0 candidates (top right), masses of the D + candidates (bottom left) and log 10 (IP χ 2 ) distribution of D + candidates (bottom right). Projections of likelihood fits to the full data samples are shown with components as indicated in the legends.

Parton-level cross sections
The parton-level cross sections were calculated using the oneparticle inclusive option of the MNR calculations [92] with the following settings:   [177] compared to the theoretical predictions as provided by external groups at NLO in the FONLL [101,258,209,103] and other GM-VFNS approach [259,260,261,262,263,264]. The cross sections for different y regions are shown as functions of p T . The y ranges are shown as separate curves and associated sets of points scaled by factors 10 −m , where the exponent m is shown on the plot with the y range.  the choice of the coefficients A c,b f,r is crucial for a reliable data description and has to be carefully studied; the explicit details are given in Section 8.1;

Fragmentation
Non-perturbative fragmentation functions for charm and beauty were extracted from e + e − and ep data (see, e.g. [265,225,209,122,208]). So far no fragmentation measurements were done in pp collisions. Universality of the fragmentation is often assumed; however it holds only if the perturbative part of the calculations is the same (see Section 2.6). Moreover, e.g. in [122] the fragmentation-function parameters were shown to be different for two different kinematic regions.
Since the kinematic region of the LHCb charm measurement is close to the HERA region where measurements were done by H1 [122] and ZEUS [208], the Kartvelishvili function [118] with α k = 4.4 ± 1.7 was used for the charm fragmentation, which covers the spread of the measurements [122,208]. The fragmentation was performed in the laboratory frame by rescaling the quark three-momentum, then the energy of the produced hadron was calculated using the hadron mass. The prescription for the fragmentation described above was used for D * + , D + s mesons and Λ + c hadrons, while for D 0 and D + mesons the contribution from D * + and D * 0 mesons was treated as described in [258]. For beauty no fragmentation measurements at HERA exist; therefore the value α k = 11 ± 4 was used, extracted from measurements at LEP [225]. All beauty hadrons were treated equally. The fragmentation fractions for charmed hadrons were taken from [211]. Their values are consistent with the recent determination from ee, ep and pp data [266]. The fragmentation fractions for beauty hadrons are taken from [247].

Kinematics of low-p T region
The dominant channel for heavy-flavour production at LHC is gg → QQ (see Fig. 7.10, left). Fig. 7.9 shows the twodimensional x distribution of the two incoming gluons, as predicted for the LHCb data by the calculations described above. This more complete evaluation confirms qualitatively the rough LO estimation (see Fig. 7.2): the main contribution comes from the two separated regions x 1 ≈ 10 −4.5 , x 2 ≈ 10 −1.5 for charm and x 1 ≈ 10 −4.0 , x 2 ≈ 10 −1.2 for beauty; although for charm an additional concentrated region is observed at x 1 ≈ x 2 ≈ 10 −2.0 -10 −1. 5 . The enhancement at medium x comes from a class of O(α 3 s ) corrections, given by the flavour-excitation diagrams (an example is given in Fig. 7.10, right), which can be thought of as initial-state gluon-splitting processes [80]. The relevant region of the phase space in this case is the one with the heavy-quark propagator close to the mass shell (the low-p T region). In GM-VFNS approaches these effects are reabsorbed in the evolution of the PDFs by defining a heavy-quark density inside a proton. The inclusion of the gQ → gQ process allows accounting for the higher-order effects in the evolution equations, while the inclusion of the NLO flavour-excitation diagrams reproduces instead more faithfully the exact kinematics and correlations of the flavour-creation process in the region close to the threshold [80]. The corrections from the flavour-excitation diagrams should in no way be thought of as a problem of the FFNS calculations, but rather as an important correction to the LO kinematics of the process. In order to demonstrate this, Fig. 7.11 shows the median of the centre-of-mass energy in the parton-parton rest frame,

√ŝ
, vs. transverse momentum and rapidity of the heavy quark for the charm and beauty LHCb data. While in the case of beauty, the √ŝ median smoothly increases with increasing p T almost independently of y, for charm in the threshold region 0 < p T 2 GeV a strong increase of the √ŝ median with increasing y is observed, This increase is correlated with the concentrated region at medium x in Fig. 7. 9. The results shown in these two Figs. 7.9, 7.11 indicate that the statement about the sensitivity of charm production at low p T and forward y to the low-x gluon region should be taken with some caution: in fact, about 50% of contribution in the corner region p T 2 GeV, y 3.5 does not come from low-x gluons; instead it comes from the medium-x region, and this particular contribution is described in the O(α 3 s ) calculations at LO.

Comparison to FONLL calculations
As mentioned in Section 2.5, one of the state of the art calculations for heavy-flavour production in hadron collisions is the FONLL approach [101]. Briefly reminding, the FONLL approach merges the massive NLO calculations (MNR) with massless ones using a phenomenologically chosen matching function. Owing to resummation of the NLL part the FONLL calculations are expected to have improved convergence of the perturbative expansion at high p T . In Fig. 7.12 the NLO predictions obtained with MNR as described in Section 7.3.1 are compared to the FONLL ones obtained using the public web interface [104] 63 for parton-level charm and beauty cross sections at LHCb. For the relevant regions of transverse momentum in the charm and beauty data, 0 < p T (c) 8 GeV, 0 < p T (b) 40 GeV 64 the maximum deviations of the order of 20% in the region p T ≈ 3m Q are observed. Note that these changes are well within the uncertainties from the scale variations.
It is also instructive to look at Fig. 2.7 (Section 2. 5. 2), taken from the original FONLL paper [101], which shows the bands obtained from the scale variations for the NLO and FONLL calculations for beauty production at the Tevatron. The behaviour of the bands in Fig. 2.7 is very similar to the change of the central values shown in Fig. 7. 12. Note that in Fig. 2.7 a significant reduction of the uncertainty band starts only at p T 40 GeV. Since the considered p T ranges of the LHCb data are not the high-p T region where the effects of the FONLL calculations become relevant, the usage of the NLO FFNS calculations as one of the currently best available theories for the LHCb kinematic region in the present study is justified. 65

Predictions based on PDFs from HERA
In Fig. 7.13 (left) the measured D 0 cross sections [177] are compared to the NLO predictions. 66 For these calculations the FFNS 63 The 'FONLL' option of the FONLL program was used; the settings consistently used in MNR and FONLL calculations were: PDFs set is MSTW2008nlo68cl [220], µ f = µ r = m 2 Q + p 2 T , m c = 1.5 GeV, m b = 4.75 GeV. 64 The fragmentation effects do not change significantly p T regions, since the heavy-flavour fragmentation functions are peaked near the scaling variable z = 1 and the cross sections are steeply falling with increasing p T . 65 Theorists are continuously making progress, and the approximate NNLO O(α 4 s ) predictions in the gg and qq channels for differential cross sections for heavy-flavour production at hadron colliders became available recently [267]. 66 The restriction to the D 0 final state for this comparison is motivated by the best experimental precision of the LHCb data [177] owing to variant of the HERAPDF1.0 set [2] was used, when charm is treated as massive and does not present in the proton at all energy scales. The factorisation and renormalisation scale were set to µ f = µ r = m 2 Q + p 2 T and varied independently up and down by a factor of 2, and the heavy-quarks masses chosen to be m c = (1.40 ± 0. 15) GeV, m b = (4.50 ± 0.25) GeV. The fragmentation functions were chosen and their uncertainties evaluated as described in Section 7.3.1. The theoretical uncertainties denoted as 'MNR' in Fig. 7.13 include uncertainties from scale, heavy-quark mass and fragmentation function variations, while the total theoretical uncertainties include also those arising from the PDFs. The latter are dominant in the lowest p T bin 0 < p T < 1 GeV, as well as in the next one 1 < p T < 2 GeV for the highest y values 3.5 < y < 4. 5. Although the 'MNR' uncertainties are of the order of factor 2 and exceed experimental data uncertainties in all bins. In overall, the predictions based on PDFs from HERA describe the LHCb data well within very large theoretical uncertainties. The right-hand side of Fig. 7.13 shows the same measurement, but normalised in y: each cross section is divided by the corresponding value in the central y bin 3.0 < y < 3. 5. Such normalisation leads to a great reduction of the 'MNR' uncertainties of the theoretical predictions (discussed in detail later in Section 8. 1.2), making them of the same order or below the experimental uncertainties, and enables the PDF uncertainties being visible in all bins. Note that the PDF uncertainties for the cross sections in the lowest p T bins of Fig. 7.13 extend to unphysical values below zero (although not appearing on the left-hand side of Fig. 7.13 because of the logarithmic scale) due to the usage of PDFs with a negative gluon distribution at low x.
This illustrates the power of the LHCb charm data to eventially constrain existing PDFs. The quantitative analysis follows in Section 8.
the presence of two charged daughter tracks only and therefore the smallest systematic uncertainties on the tracking efficiency.

QCD analysis of HERA and LHCb heavy-flavour data
This Section describes a QCD analysis, extending the HERA fit by including the LHCb data presented in Section 7. The general strategy of the study was to perform first a PDF fit with the HERA-only data, which is close to HERAPDF1.0 [2], and then repeat the fit with the LHCb data included. In this way the net effect of the additional information is becoming obvious. Results presented in this Section have been published by the PROSA Collaboration [6].

PDF fitting framework
The datasets used in the PDF fit are listed in Table 8.1: the combined HERA-I inclusive ep NC and CC DIS cross sections [2] (datasets 1-4) serve to constrain the core of the PDFs; the analysis is restricted to the inclusive data with virtuality Q 2 > Q 2 min = 3.5 GeV 2 to ensure the applicability of pQCD; the combined HERA charm data [60] and ZEUS beauty [30] data (datasets 5-6) were used to constrain the gluon PDF and the charm and beauty masses; the LHCb charm [177] and beauty [247] data (datasets 7-14) -the new ingredients of the study -serve to constrain the gluon PDF at low x.
The HERA data (datasets 1-6) were treated in the same way as in the original papers [2,60,30]. 67 For the LHCb data (datasets 7- 14), the double-differential cross sections as a function of p T and y, d 2 σ(pp→H c X) d p T dy , were used as published in [177,247] 68 . The correlations between the systematic uncertainties were taken into account as described in Section 3.3 of [177] and Section 4 of [247], treating as correlated those which are reported as single values for all (p T , y) 67 Except that for dataset 5 in contrast to [60] all Q 2 bins were used including the lowest one Q 2 = 2.5 GeV 2 ; the applicability of pQCD for the charm data is ensured by the presence of a massive c quarkantiquark pair in the final state. 68 For the Λ + c measurement from [177], where no double-differential distribution is available, the single-differential cross sections dσ(pp→Λ + c X) d p T and dσ(pp→Λ + c X) dy were used for the 'LHCb Abs' and 'LHCb Norm' approaches (see Section 8. 1.2), respectively.  69 The 3.5% luminosity uncertainty was treated as correlated between the charm and beauty measurements. In addition to the experimental uncertainties, the correlated fragmentationfraction uncertainties were assigned to the data.

Details of PDF fit
The PDF fitting framework is the same which has been used for the FFNS fit with the HERA charm combined data, described in Section 6.5. 3. The charm and beauty masses were left free in the fit and determined in HERAFitter by minimisation of a χ 2 -function. In variants of the fit with the LHCb heavy-flavour data additional uncertainties were evaluated, which are related to the uncertainties of the respective theoretical calculations; they are referred to as the 'MNR uncertainties': variations of the fragmentation and renormalisation scales, as described later in Section 8. 1.2, and variations of the fragmentation parameters α k = 4.4 ± 1. 7 for charm and α k = 11 ± 4 for beauty.
The MNR uncertainties were obtained by adding these variations in quadrature.

Strategy of QCD analysis
The strategy of the QCD analysis was to perform several PDF fits, with and without the LHCb data, and then compare the results. Two approaches of fitting the LHCb data were studied: fitting the absolute cross section or the cross section normalised as described below.

Fitting absolute LHCb cross sections
The absolute double-differential cross sections d 2 σ dp T dy measured by LHCb were fitted. These quantities contain the maximum information; they are therefore sensitive to all physical and nonphysical parameters of the theoretical calculations: the PDFs, heavy-quark mass, fragmentation function and especially factorisation and renormalisation scales. The scale dependence of the predictions is of the order of a factor of 2, thus much exceeding the experimental data uncertainties, although the PDF uncertainties from the available PDF sets at very low x and low Q 2 are even larger (see Fig. 7. 13).This fact makes the study rather complicated: in order to account for the uncertainties of the perturbative predictions, the scales should be varied, but these external variations change the description of the data and the fit results drastically, and thus are hard to control.
For this variant of the fit, the scale parameters A c f , A c r , A b f and A b r were therefore included in the fit, while for the estimation of the scale uncertainties the following procedure was used: the factorisation and renormalisation scales were varied independently one at a time in the ranges [0.50;2.00] and [0.25;1.00] 70 , respectively, while the other scale was being refitted. To be specific, the scale uncertainties include the following four variations: 50, the cut p T > 2 GeV was applied for the charm LHCb data to ensure that the factorisation scale is above 1 GeV 2 , since this is technically required in the framework. This sophisticated procedure for the treatment of the scale variations was necessary to give a reasonable description of the data for all variations of the fit.

Fitting normalised LHCb cross sections
The y shape of the cross-section ratio dσ dy / dσ dy 0 in each p T bin was fitted, where dσ dy 0 is the cross section in the central rapidity bin 3.0 < y 0 < 3. 5. The virtue is that the observable defined in this way has a much reduced scale dependence, while it still remains sensitive to the PDFs, namely to their x shape. This can be understood easily: the change in the production rate in neighboring bins of y is driven mainly by the change in the input PDFs, while the hard-scattering process remains essentially the same. Hence the µ r dependence is reduced to ∼ 1% and the µ f to ∼ 5-10% (the renormalisation scale affects the matrix elements only, while the factorisation scale affects both the matrix elements and the PDFs). Reduction of the scale dependence is illustrated in Figs. 8.1, 8.2. In addition, the dependence on the heavy-quark mass and the fragmentation function is also significantly reduced. For the mass, it is reduced almost to zero, while the fragmentation effects are still sizeable at low transverse momentum, since the fragmentation is performed by rescaling the quark three-momentum and thus it changes the rapidity of a massive particle. 70 The 'common' range for the variations [0.50;2.00] for µ r is ignored, since the χ 2 /n dof for A c r = A b r = 2.00 variation was found to be unacceptably large (χ 2 /n dof = 2497/1089).
Owing to the greatly reduced dependence on the scales, in this variant of the fit the 'common' scale choice and variations were used: the scale parameters were fixed to A c f = A b f = A c r = A b r = 1 and for the estimation of the scale uncertainties they were varied independently one at a time in the range [0.5;2.0]. Explicitly, the scale uncertainties include the following four variations: Similar to the previous approach, for the variation A c f = A b f = 0.50, the cut p T > 2 GeV was applied for the charm LHCb data to ensure that the factorisation scale is above 1 GeV 2 .
All correlated experimental systematic uncertainties and the fragmentation fractions cancel for the normalised cross sections, while for a given p T bin the uncorrelated uncertainty of the central dσ dy 0 bin was treated as correlated between the remaining y bins (because the cross sections in the remaining y bins were divided by the same dσ dy 0 ).

Fit results
The results of three fits are presented and discussed: the fit with the HERA-only data, referred to as 'HERA only' (Section 8.2.1); the fit with the HERA and LHCb data using the absolute LHCb cross sections, referred to as 'LHCb Abs' (Section 8.2.2); the fit with the HERA and LHCb data using the normalised LHCb cross sections, referred to as 'LHCb Norm' (Section 8.2.3).
The direct comparison of all fitted PDFs follow in Section 8.3.

'HERA only'
Here the results of the fit with the HERA-only data (datasets 1-6) are presented. The total χ 2 per degree of freedom for the fit is χ 2 /n dof = 647/646 yields a good consistency of the data. All individual χ 2 contributions (see Eq. 6.17 in Section 6. 5. 3) are given in Table 8.2. For reference, Table 8.3 lists the fitted parameters (see Eq. 6. 15) with their fit uncertainties only, while the total PDF uncertainties are shown below in Figs. 8.3 to 8. 10. The fit, model and parametrisation uncertainties of the gluon, sea 71 and valence-quark distributions at the scale Q 2 = 10 GeV 2 are shown in Fig. 8. 3. Note the gluon uncertainties in the low-x region: since this region is not covered directly by the HERA data, the dominant uncertainties are the parametrisation ones, namely those which arise from releasing the DŪ parameter. This is illustrated in Fig. 8.4, where all parametrisation variations are shown separately. At low values x 10 −4 the gluon distribution is not directly constrained by the HERA data and should be 71 The sea-quark distribution is defined as Σ = u + d + s.   considered as an extrapolation which relies on PDF parametrization assumptions. Qualitatively it can be understood in the following way: since gluons in this region are not constrained by the data, they are indirectly constrained via the sum rules for certain distributions of all other partons. When the parametrisation for other partons (in particular, for sea quarks) is changed, new parameter values via the sum rules necessarily result in a different distribution for gluons in the low-x region. However, the parametrisation variations do not result in a significant difference in the x region constrained by the HERA data. This also explains the large spread of the results obtained by the different PDF groups, which was observed in Fig. 7.1: since the different groups use different parametrisations and none of them uses data which constrain gluons at low x, they all obtain different gluon distributions in this region.

'LHCb Abs'
Here the results of the fit with the HERA and LHCb data (datasets 1-14) using the 'LHCb Abs' approach are presented. The total χ 2 per degree of freedom is χ 2 /n dof = 1073/1087. The partial χ 2 /n dof for all datasets are given in Table 8. 2. For the LHCb charm and beauty datasets they vary from 0.9 to 1.8 and from 0.7 to 1.0, respectively, indicating an overall reasonable description of the charm and beauty data. The inclusion of the LHCb data do not change significantly the quality of the description of the HERA datasets compared to the 'HERA only'  1.0 fit, assuring consistency between HERA and LHCb data. 72 As an example of the data description in the fit, in Fig. 8.5 the cross sections for D 0 and B + mesons for one of the y bins are shown.
Adjusting the scales in the fit results in the following fitted values for the scale parameters (see Table 8. 3): Note that all values are within the range [0.25;1.00]; also note the significant difference between the fitted scales for charm and beauty. Additionally, as expected, a positive correlation was found between A c f , A c r and A b f , A b r , respectively. The fitted pole masses of the c and b quarks are consistent with the ones obtained in the 'HERA only' fit within the intrinsic theoretical systematic uncertainty of the pole mass definition (see Section 2.3).
The parametrisation variations are shown in Fig. 8.6 (top left). As expected, in contrast to the results obtained with the HERA-only data, gluons are now strongly constrained in the low-x region. The fit uncertainties for the parameters of the 72 With the only exception of the worsening of the HERA charm data description due to the value of m c which is in a tension with that preferred by the HERA charm data (see Tab. 8. 3), if only the criterion of a χ 2 variation of 1 is considered. 'HERA' stands for the nominal fit, 'Q 2 0 = 1.9 GeV 2 ' for the parametrisation parametrisation corresponding to the change of the PDF evolution starting scale to 1.9 GeV 2 , and 'D u v free', 'DŪ free', 'DD free' for the parametrisation variations corresponding to released parameters D u v , DŪ, DD, respectively, one at a time (see Section 6.5.3 and Eq. 6.15).
gluon distribution are significantly reduced, as can be seen from Table 8. 3. Note that Table 8.3 list only the fit uncertainties of the PDF parameters, while the other components of the PDF uncertainties are shown in Fig. 8. 7.
The effect of scale variations on the predictions in the fit is shown in Fig. 8.5 and their effect on the fitted PDFs is shown in Fig. 8.6 (top right). The scale uncertainties are much larger than the parametrisation uncertainties, however compared to the fit with the HERA-only data they are a factor of 3 smaller than the total uncertainties. Another interesting observation from Figs. 8.5 and 8.6 (top right) is that the changes for the predictions from the scale variations are predominantly changes in their normalisation (see also Figs. 8.1,8.2); the fit handles them by adjusting the other scale and making large shifts for the correlated uncertainties of the data. It shows up in a formally bad χ 2 value, however the PDFs are less affected and remain reasonable, compared to the huge total 'HERA only' uncertainty at low x.
In addition, variations of the fragmentation-function parameters were performed, as described in Section 7.3. 1. The effect on the fitted PDFs is shown in Fig. 8.6 (bottom). All fragmentation variations yield in a good description of the data, since relatively small changes of the p T shape are easily compensated by adjusting the scales. The resulting uncertainties are much smaller compared to the scale variations. However, for charm a strong tendency was observed: the LHCb charm data favour a harder fragmentation function; this is discussed in more detail in Appendix E. 2. Finally, all individual contributions to the uncertainties are shown in Fig. 8. 7. The dominant uncertainties on the gluon distribution in the low-x region become the MNR ones, in particular coming from the scale variations.

'LHCb Norm'
Here the results of the fit with the HERA and LHCb data (datasets 1-14) using the 'LHCb Norm' approach are presented. The total χ 2 per degree of freedom is χ 2 /n dof = 958/994. The partial χ 2 /n dof for all datasets are given in Table 8. 2. For the LHCb charm and beauty datasets they vary from 0.4 to 0.9 73 , indicating description of the data and possible overestimation of the uncorrelated experimental uncertainties for the y shape (this can be thought of rather as an underestimation of correlations of the systematics). Similar to the 'LHCb abs' fit, the inclusion of the LHCb data do not change significantly the quality of the description of the HERA datasets compared to the 'HERA only' fit, assuring consistency between HERA and LHCb data.As an example of the data description in the fit, in Fig. 8.8 the cross sections for D 0 and B + mesons for one of the y bins are shown.
The parametrisation variations are shown in Fig. 8.9 (top left). The gluon distribution in the low-x region is constrained considerably compared to the 'HERA only' results, although somewhat weaker than in the 'LHCb Abs' approach.
The effect of the scale variations on the predictions in the fit is shown in Fig. 8.8 and their effect on the fitted PDFs is shown in Fig. 8.9 (top right). The effect of the fragmentation variations on the PDFs is shown in Fig. 8.9 (bottom). Note that fragmentation uncertainties obtained in this approach are larger than in the 'LHCb Abs' one, since the fragmentation effects are not reabsorbed in the refitted scales. All these variations result in a reasonable data description.
Finally, all individual contributions to the uncertainties are shown in Fig. 8. 10. The dominant uncertainties in the low-x and low-Q 2 region still remain the MNR ones, although they are comparable in size with the uncertainties from other sources. 73 Except for the low-statistics Λ + c dataset, where χ 2 /n dof = 4.9/3. variations for the gluon distribution at Q 2 = 10 GeV 2 in the fit with the HERA and LHCb data using the 'LHCb Abs' approach.

Impact of LHCb heavy-flavour data on PDFs
The main results of this study can be seen in Figs. 8.11 and 8.12: the PDFs obtained in the 'HERA only', 'LHCb Abs' and 'LHCb Norm' fits are compared at the scale Q 2 = 10 GeV 2 in Fig. 8.11, while their relative uncertainties are compared in Fig. 8. 12. 74 The two approaches of fitting the LHCb data result in strong and consistent constraints on the gluon distribution at low x. Improvement is also observed for the sea density, while the valence-quark distributions, which are decoupled in the evol- 74 The corresponding plots for Q 2 = 100 GeV 2 are available in Appendix E.3 (Figs. E.4, E. 5). the gluon distributions provided by several modern PDF sets, shown in Fig. 7.1, the shape of the gluon distribution favoured by the LHCb data is close to that in the ABM PDFs. The distributions at medium x mainly remain unchanged, although in the 'LHCb Abs' approach some enlargement of the uncertainty is observed, explained by the inclusion of the scale uncertainties for the LHCb data. No such effect is observed in the 'LHCb Norm' approach, where all variations describe the data well.
In addition, a noticeable improvement is observed for all partons in the region of high x also. This is presumably due to the constraints of the beauty LHCb data (see Fig. 7. 2) which cover this region, as well as a side effect of the improvement at low x, imposed via the momentum sum rule.
Obviously, in the fit with the absolute cross sections, more information is available to constrain the gluons: the absolute cross sections constrain the normalisation of the product of the gluon PDFs at low and medium x, leading to the correlation between the low-x region and the medium one. However In the fit with the normalised cross sections only the information of the y distribution of the cross sections was used, which is sensitive to the slope of the gluon distribution. In this approach therefore, the final impact of the LHCb data crucially depends on the presence of any x region where the absolute values of the gluon distribution must be constrained by other data (preferably at x ∼ 10 −4 -10 −3 ). Despite the reduced sensitivity, significantly smaller theoretical uncertainties are resulting owing to the reduced scale dependence of the normalised cross sections, so the final results are more precise than in the 'LHCb Abs' approach. As opposed to the absolute case, the theoretical calculations are data independent in this case. In Fig. 8.13 the measured D 0 cross sections [177] are compared to the NLO predictions obtained using the fitted 'LHCb Norm' PDFs. The theoretical uncertainties denoted as 'MNR' in Fig. 7.13 include uncertainties from scale, heavy-quark mass and fragmentation function variations, while the total theoretical uncertainties include also those arising from the PDFs. Compared to the predicitons obtained using the PDFs from HERA only (see Fig. 7.13), the PDF uncertainties are barely visible in this comparison, as expected, since PDFs are constrained by the data. It is important to note that the PDFs determined using the y shape of the LHCb data provide a good description of the absolute cross sections, within large theoretical 'MNR' uncertainties for the latter.

Concluding remarks
The observed strong impact of the charm and beauty LHCb measurements demonstrates that these data are a powerful addition to the existing global PDF fits. Quantitatively the reduction of the gluon and sea quark distribution uncertainties in the x range x 10 −4 , not probed by HERA, is of the order of a factor of 1.5-4. The inclusion of the LHCb data results in a positive definite gluon distribution in the full phase space covered by the data.
The study indicates that the provided constraints are subject to sizeable theoretical uncertainties. Currently none of the PDF fitting groups estimate perturbative uncertainties of theoretical predictions (e.g. uncertainties from scale variations), therefore the 'LHCb Norm' approach should look more attractive since the uncertainties from the scale variations are not crucial, while in the 'LHCb Abs' approach, results without the scale uncertainties will not be trustworthy.
Furthermore the inclusion of perturbative theoretical uncertainties in the PDF fits is becoming a pressing issue. Once an improved strategy is developed, both considered approaches of fitting the LHCb heavy-flavour data can be used in the future global fits. Having the scale uncertainties under control will provide a possibility to exploit the LHCb heavy-flavour data also for a precise measurement of the charm and beauty masses, using the absolute cross sections.
Improved precision of the gluon distribution at low x has implications in the physics of atmospheric showers, being important for calculations of prompt lepton fluxes in the atmosphere, see e.g. for recent studies [268,269,270,271]. At some point astrophysical measurements of high-energy neutrinos may provide an upper limit on charm hadroproduction in the atmosphere and complement collider-based measurements.
The final remark concerns the 'LHCb Norm' approach. As previously mentioned, the constraints at low x obtained in this approach crucially depend on the presence of any medium-x region already constrained by other data. In this context, the inclusion in the PDF fit of further datasets sensitive to gluons, e.g. jet measurements, should be very interesting. Another possible improvement may come from precise heavy-flavour measurements at the LHC near the kinematic threshold in the complementary rapidity region, which will extend the fitted y shape to the central region 0 < y < 2 and therefore to the medium range of gluon x.

Conclusions and outlook
In this review the total set of charm production measurements in deep inelastic scattering at HERA, both inclusive and exclusive, has been presented and analysed. The data were obtained by the two Collaborations H1 and ZEUS with their multipurpose detectors during the two operation phases of HERA and constitute an essential achievement of the HERA program. The measurements rely on several techniques and exploit specially developed detector components. All individual data sets were shown to be consistent with each other and could therefore be combined into a precise common set. The experimental uncertainty of the combined results reaches 4% in the region of medium Q 2 , and is about 8% on average. Also theoretical uncertainties entered, when extrapolations in unmeasured regions of phase space had to be performed.
Together with the HERA ep data also the LHCb pp data were included, since they provide complementary information about charm production. The constraining power of the precise double-differential LHCb data may be further improved by including in the QCD analysis also measurements of charm and beauty production at the LHC in the central rapidity phase-space region near the kinematic threshold, which can be obtained with the ATLAS and CMS detectors.
The theoretical framework for predicting the charm cross sections has been outlined. An essential feature is the heavy mass of the c-quark and thus the presence of a hard scale such that the application of perturbative calculations is justified in the whole kinematic range. In the phase space of currently available experimental data the most rigorous theoretical calculations are performed in the fixed-flavor-number scheme (FFNS). In this scheme the u, d, s quarks are massless, while the mass effects of heavy quarks (c, b) are fully accounted for. Heavy-flavour production at HERA and LHC are dominated by processes involving gluons in the initial state. Therefore the investigation of the gluon distribution was a central issue in this review and showed the importance of the LHCb data.
Quantitative predictions of charm productions were possible once the parton distribution functions had been extracted from experimental data. The NLO QCD predictions were shown to agree in the whole phase space with the data on charm production. The sensitivity of the predictions to the mass of the c-quark made it possible to determine from the HERA data the optimal value of the c-quark MS mass with a precision of about 3%. The extracted mass is consistent with the world-average value and has competitive precision to other individual determinations in perturbative QCD. The inclusion of the LHCb data to the HERA data improved considerably the accuracy of the gluon distribution at low x. The usage of the fixed-flavor-number scheme is of crucial importance in this analysis because it fully accounts for mass effects near the kinematic threshold without additional non-physical tunable parameters.
It turned out that the experimental precision of the data exceeds the precision of the theoretical calculations. Uncertainties in the perturbative QCD predictions arise from missing higher orders, QCD parameters, and non-perturbative parton distribution functions and fragmentation functions which must be extracted from data. Dominant theoretical uncertainties of the NLO calculations for charm production in the bulk of the available phase space come from missing higher orders. Higher-order calculations are needed in order to reduce the theoretical uncertainty to the level of the current data precision and thus enabling stronger tests of QCD. New measurements, e.g. those from the LHC or future collider experiments, with better precision or being performed in new regions of phase space, are expected to further improve the understanding of QCD and development of high-energy physics.

Acknowledgements
I would like to thank Dieter Haidt for the initiative to elaborate on my PhD thesis and his invaluable help in the preparation of the review. I thank my PhD supervisor Achim Geiser for several years of exciting collaboration, achieving the results described in this document. It is my pleasure to thank all members of ZEUS, H1 and PROSA Collaborations with whom I worked during these years.

A Measurement of D + production: additional information
In this Appendix additional information on the measurement of D + production (see Section 5) is provided. Fig. A.1 shows purity as a function of p T (D + ), η(D + ), Q 2 and y.

B Combination procedure: additional details
In this Appendix additional information on the combination procedure (Section 6.2) is provided.

B.1 Minimisation method
The minimisation method described below [272,273] is applicable if the uncertainties of the measurements do not depend on the central values (the additive treatment); in the case of the multiplicative treatment this method is extended with an iteration procedure described in the next Section B. 2. In this case the χ 2 -function 6.2 can be considered. Since χ 2 in 6.2 is a quadratic form of m and b, it may be rearranged such that it takes a simpler form. To show this explicitly, χ 2 can be written as a Taylor series up to its second derivatives near the minimum, (m 0 , b 0 ): Notation | 0 indicates that the expression is evaluated at m = m 0 , b = b 0 . Note, that this is an exact expression, because χ 2 is a quadratic form; moreover the second derivatives are constant, i.e. ∂ 2 χ 2 It is useful to give explicit expressions for the second derivatives and to introduce the following matrix notations: where δ i j is the Kronecker delta. The minimum χ 2 min = χ 2 0 is found by solving a system of linear equations: Although solving of this system can be performed directly by inversion of the whole matrix, it is more convenient to take an advantage of the diagonal structure of the block A M and solve the system using the method of the Schur complement: This method benefits from the fact, that the only non-trivial inversion to be performed is the inversion of the block A S . The size of this block is N s × N s and usually much smaller than the total size of the system B.5, (N m + N s ) × (N m + N s ), therefore this method is preferable for computation. Obtained solution (m 0 ,b 0 ) solves the minimisation problem for the central values. To find the uncertainties on (m 0 ,b 0 ), the χ 2 expansion in Eq. B.1 can be written, taking into account that (m 0 ,b 0 ) is its minimum: Denoting m − m 0 =m, b − b 0 =b: To separate contributions fromm andb in the term 2 m|A SM |b , introduce a variable substitution |m = |m − X |b : 10) thus choosing X = −A M −1 A SM : Eq. B.11 allows interpretation of matrices A M and A S in terms of uncertainties onm andb. Since variation of χ 2 -function of 1 corresponds to one standard deviation, diagonal elements of matrix A M gives the uncertainty onm and therefore the uncorrelated uncertainty on m 0 : 12) and diagonal elements of matrix A S gives the uncertainties on the fitted values of the nuisance parameter b 0 : 13) which are also referred to as the reduction factors of correlated uncertainties. Propagating them to |m = |m + X |b gives the correlated uncertainties on m 0 : (B.14) In Eq. B.11 variablesb are still mixed because of the nondiagonal structure of matrix A S , so the correlated uncertainties δ cor j m 0i are not independent. It is possible to decompose them, diagonilising this matrix 15) and introducing new independent (diagonalised) correlated error sources,b :b = DUb. (B. 16) Here U is an orthogonal matrix, composed of the eigenvectors of A S , and D is a diagonal matrix, composed of the corresponding square roots of eigenvalues. Using B.15 and B.16, χ 2 -function from Eq. B.11 can be written as: 17) where I is the unit matrix. Thus diagonalised correlated uncertainty sources are independent variables distributed according to the unit Gaussian distribution around zero. Propagating them to m = m + Xb gives the correlated uncertainties on m 0 : (B. 18) Summarising results of B.12 and B.18, averaged quantities can be written as: 19) with a i and b j being independently distributed according to the unit Gaussian distribution around zero.

B.2 Iterative procedure
If some of the uncertainties are treated multiplicatively, the extremum conditions B.3 do not produce a system of linear equations, since Γ e, j i are functions of unknown m 0 . In this case the averaging technique described in Section B.1 still can be used, but the average has to be found in an iterative procedure [272,273]: first equation 6 Then the determination of m 0i is repeated. Typically convergence is observed after two iterations and the iteration procedure is terminated.
Note that this iterative procedure does not give the exact minimum of the χ 2 -function 6. 6. Although there are arguments [231,227] that the exact minimum of 6.6 is biased, while the iterative procedure described above gives an unbiased result.

C Combination of visible D * + cross sections: additional information
In this Appendix additional information on the combination of the visible D * + cross sections (see Section 6.4) is provided.
Table C.1 provides information on the fitted nuisance parameters.
The combined data with all correlations are provided in Tables C.2 to C. 3.  Table C.1: Sources of bin-to-bin correlated uncertainties considered in the combination of the visible D * + cross sections. For each source the affected datasets, name, type (see Section 6.2.2) and reference to the place, where information can be found, are given, together with the shift (sh) and reduction factor (red) in the combination obtained after the first iteration. For sources which do not affect the combination of a given differential cross section, no shifts and reductions are quoted.  The combined single-differential D * + cross sections as a function of p T (D * + ), η(D * + ), z(D * + ), Q 2 and y, with their uncorrelated (δ unc ), correlated (δ 1 to δ 22 ) and external branching-ratio (δ br ) uncertainties. The cross sections are given in the kinematic region 6.9. p T (D * + ) dσ dp T (D * + ) δ unc δ 1 δ 2 δ 3 δ 4 δ 5 δ 6 δ 7 δ 8 δ 9 δ 10 δ 11 δ 12 δ 13 δ 14 δ 15 δ 16 Table C. 3: The combined double-differential D * + cross section as a function of Q 2 and y, with its uncorrelated (δ unc ), correlated (δ 1 to δ 28 ) and external branching-ratio (δ br ) uncertainties. The cross sections are given in the kinematic region 6.11.

D Combination of reduced charm cross sections
In this Appendix additional information on the combination of the reduced charm cross sections (see Section 6.5) is provided. Section D.2 presents a comparison with the theoretical predictions in different VFNS and a determination of optimal c-quark mass parameters for these schemes.

D.2 Comparison to theoretical predictions and QCD analysis in VFNS
In Figs D. 13 and D.14 the combined cross sections are compared with predictions of the MSTW group in the GM-VFNS at NLO and NNLO, respectively, using the RT-standard [21,230] and the RT-optimised [79] interpolation procedure of the cross section at the charm-production threshold. At NLO, the optimised prediction tends to describe the data better than the standard one at lower Q 2 . The description of the data is improved at NNLO compared to NLO. In Fig. D.15 the data are compared to the NLO predictions based on HERAPDF1.5 [275] extracted in the RT standard scheme using as inputs the published HERA-I [2] and the preliminary HERA-II combined inclusive DIS data. For the central PDF set a c-quark mass parameter M c = 1.4 GeV is used. The uncertainty bands of the predictions reflect the full uncertainties on the HERAPDF1.5 set. They are dominated by the uncertainty on M c which is varied between 1.35 GeV and 1.65 GeV [2]. Within these uncertainties the HERAPDF1.5 predictions describe the data well. The central predictions are very similar to those of the MSTW group for the same scheme.
In Fig. D. 16 the data are compared to the predictions in the GM-VFNS by the NNPDF Collaboration. Both the NNPDF FONLL-A [22] and FONLL-B [245,276] predictions describe the data fairly well at higher Q 2 , while they fail to describe the data at lower Q 2 . The description of the data at lower Q 2 is improved in the FONLL-C [245,276] scheme.
In Fig. D.17 the data are compared to the predictions in the GM-VFNS by the CTEQ Collaboration. The CT predictions [277,278] are based on the S-ACOT-χ heavy-quark scheme. The NLO prediction, which is very similar to the FONLL-A scheme, describes the data well for Q 2 > 5 GeV 2 but fails to describe the data at lower Q 2 . Similar to the FONNL-C case, the description of the data improves significantly at NNLO. In summary, conclusions similar to [60] can be drawn. The best description of the data is achieved by the predictions including partial O(α 3 s ) corrections (MSTW NNLO), however they do not fully describe the Q 2 slope of the data at low Q 2 Table D.1: Sources of bin-to-bin correlated uncertainties considered in the combination of the reduced charm cross sections. For each source the affected datasets, name, type (see Section 6.2.2) and reference to the place, where information can be found, are given, together with the shift (sh) and reduction (red) factor in the combination obtained after the first iteration. Descr  The combined reduced charm cross section, with its statistical (δ stat ),uncorrelated (δ unc ) and correlated (δ 1 to δ 78 ) uncertainties.
The correlated (δ 1 to δ 78 ) uncertainties for the combined reduced charm cross section. For the cross section values and their uncorrelated uncertainties see Table 6. 10.
Q 2 x δ1 δ2 δ3 δ4 δ5 δ6 δ7 δ8 δ9 δ10 δ11 δ12 δ13 δ14 δ15 δ16 δ17 δ18 δ19 δ20 δ21 δ22 δ23 δ24 δ25 δ26 δ27 δ28 δ29 δ30 δ31 δ32 δ33 δ34 δ35 δ36 δ37 δ38             x Figure D. 13: Combined measurements of σ cc red (closed circles) shown as a function of x for particular Q 2 , compared to the prediction by MSTW at NLO. The predictions obtained using the standard (optimised) parametrisation are represented by the shaded bands (solid lines). The uncertainties for the optimised parametrisation are not evaluated by the authors of the predictions but are expected to be of same size as those for the standard parametrisation.
(2.5 < Q 2 < 5.0 GeV 2 ). The predictions including O(α 2 s ) terms in all parts of the calculation (NNPDF FONLL-C, CT NNLO) as well as the MSTW NLO optimal scheme also agree with the data reasonably well. The largest deviations are observed for predictions based on O(α s ) terms only (NNPDF FONLL-A and CT NLO). As investigated below, further differences can be partially explained by the different choices for the value of the respective c-quark mass parameter M c .
Similar to the extraction of the c-quark mass in the FFNS, described in Section 6.5.3, the combined charm data were used to determine the effective parameters of the individual VFNS.
The following implementations of the VFNS were considered: ACOT full [15,16] as used for the CTEQHQ releases of PDFs; S-ACOT-χ [18,19,20] as used for the latest CTEQ releases of PDFs, and for the FONLL-A scheme [22] used by NNPDF; the RT standard scheme [21,230] as used for the MRST and MSTW releases of PDFs, as well as the RT optimised scheme providing a smoother behaviour across thresholds [79]. The ZM-VFNS as implemented by the CTEQ group [15,16] was also used for comparison. In all schemes, the onset of the heavy-quark PDFs is controlled by the parameter M c , in addition to the kinematic constraints.
The fitting procedure was the same as in the FFNS fit, described in Section 6. 5

.3, except:
since most of the considered VFNS at O(α 2 s ) fail to describe the Q 2 slope of the data in the range of 2.5 < Q 2 < 5.0 GeV 2 , the first Q 2 bin was excluded from the fit; 14: Combined measurements of σ cc red (closed circles) shown as a function of x for particular Q 2 , compared to the prediction by MSTW at NNLO. The predictions obtained using the standard (optimised) parametrisation are represented by the shaded bands (solid lines). The uncertainties for the optimised parametrisation are not evaluated by the authors of the predictions but are expected to be of same size as those for the standard parametrisation.
the strong coupling constant was chosen α n f =5 s (M Z ) = 0.1176 ± 0.0020; the renormalisation and factorisation scales for the heavy quarks were set to µ f = µ r = Q and not varied, since it is not technically possible in the framework; the preferred mass parameters were obtained from the scan, since the implementation of the calculations does not allow for their changes in the PDF fitting procedure. The step size 0.01 GeV was used.
In In the cases of the ACOT full and S-ACOT-χ schemes the dependence of χ 2 on M c has small discontinuities since these schemes are implemented using K-factors. 75 A smooth curve can be obtained by fitting the points with a parabolic function, although this will not significantly change the preferred M opt c values. In Table D. 3 the resulting values of M opt c are given together with the uncertainties and the χ 2 /n dof values; for comparison the 75 From a calculation point of view, the theoretical model consists of the numerical integration of an integro-differential equation and multiple convolution integrals that are evaluated mostly by adaptive algorithms (K-factors). Change of a parameter (M c in this case) results in the appearance of an uncontrolled numerical noise. Some details can be found also in [232].   'HERA 2012' results are also given. For ACOT full and S-ACOTχ schemes the uncertainties are not evaluated, since due to the present discontinuities in the χ 2 curves they can be misleading. The RT optimised scheme yields the best global χ 2 . The fit in the S-ACOT-χ scheme results in a very low value of M opt c as compared to the other schemes. In general the predictions of the different schemes become very similar for Q 2 ≥ 5 GeV 2 and describe the data well, once the charm-mass parameters are set to the preferred values. Note, that even the ZM-VFNS, which includes mass effects only indirectly [15,16], yields a reasonably good description of the combined charm data for Q 2 ≥ 5 GeV 2 (although it predicts a zero cross section in the lowest Q 2 bin), however ∼ 20 units of χ 2 worse than the other schemes.
Similar to the fit in the FFNS, all fitted M opt c values are consistent with those which have been determined in the previous analysis [60] with the 'HERA 2012' combined data. Those variants, for which the uncertainties are determined, exhibit improved precision.
Using different charm-mass parameters adjusted to the HERA data allows for a reduction of the theoretical uncertainty due to the choice of the heavy-flavour scheme for W ± and Z production at the LHC, as was demonstrated in [60].

E PDF fit with LHCb heavy-flavour data: additional information
In this Appendix additional information on the PDF fit with the LHCb heavy-flavour data (Section 8) is provided.

E.1 MNR calculations in HERAFitter: details of implementation
A PDF fit in the framework described in Section 8.1.1 typically requires several thousands of iterations to converge. In each iteration the theoretical predictions for each dataset must be recomputed. Since computation of the NLO predictions is usually very time consuming, this requires a "smart" implementation of the calculations with separating the bottleneck parts from the iterative procedure. Another popular solution is to use "fast" techniques, such as K-factors or precomputed perturbative grids (see, e.g. [279,280,281,282,283]). Although "fast" techniques are widely used by modern PDF groups, they usually have shortcomings, since they do not allow changing parameters of the calculations, like the factorisation and renormalisation scales or heavy-quark masses.
The MNR calculations (one-particle inclusive variant) as implemented originally in the FORTRAN code [98] require about several hours to calculate one set of the predictions for one of the considered LHCb datasets. 76 Numerical multi-dimensional integration over the phase space is done with the MC method using the VEGAS algorithm [284]. The main advantage of the MC integration is that it can be suitably performed for any configuration of the phase space; the only number to be adjusted to reach the desired accuracy is the total number of iterations. The disadvantage is that all parts of the calculations have to be repeated in each iteration.
Therefore in HERAFitter numerical multi-dimensional integration for the MNR calculations was implemented as nested loops using the trapezoidal rule for each one-dimensional integration. This allows for separation of the most time consuming parts in the top loop(s). The one-particle inclusive variant of the calculations was used. All flexibility of the original MNR code was retained: the factorisation and renormalisation scales, heavyquark mass, strong coupling constant, fragmentation function and PDFs may be changed in each iteration (in other words, may be treated as fit parameters). The typical timing to calculate one set of the predictions for the considered LHCb datasets is ∼ 1 s and the inaccuracy of the predictions is less than 1% comparing to the results obtained with the original MNR code. This allows a PDF fit with these data to converge typically within a few hours. Additionally the results were cross checked with the NLO predictions as calculated by the (semi)independent FONLL program, using the public web interface [104] 77 , and differences were found to be within 1-3%.
However note that the integration loops were adjusted for this particular configuration; another phase space and/or binning will need their readjustment. 76 The timing depends on the number of bins, desired accuracy of the predictions and CPU; the quoted one is for 40 bins from [177], 1% inaccuracy and Intel Core i7-3520M. 77 The 'NLO' option of the FONLL program was used.

E.2 Study of charm fragmentation function
In the 'LHCb Abs' fit the following tendency was observed: the LHCb charm data prefer a harder fragmentation function than was measured at HERA, since the variation of α k to upper values results in better χ 2 . This can be seen even from the nominal fit: predictions for the bins 1 < p T < 3 GeV are on average above the data, while the bins with higher p T are below; this non-perfect description of the p T shape actually explains the somewhat large χ 2 values for the LHCb charm datasets in Table 8. 2.
In order to investigate this further the fragmentationfunction parameter for charm was released in the fit. The fit converged to a very large α k value which corresponds to an almost z 1 parton to hadron transition. Another check was performed by using the BCFY fragmentation function [285] with r = 0.1 extracted from e + e − colliders within the FONLL approach [209], which corresponds approximately to α k = 12. A much better description of the charm data was found than with the fragmentation function derived from the HERA data. This study qualitatively confirms the recipe for heavy-flavour fragmentation provided in [106]: since FONLL resummations of NLL provide evolution of the perturbative part of the fragmentation function to the scale ∼ m Q , with NLO QCD predictions for hadro-and electroproduction of heavy flavours for the p T region close to the threshold it would be more appropriate to use a fragmentation function extracted at FONLL (e.g. those from e + e − at the Z 0 resonance), while for the high-p T region it is more appropriate to use a fragmentation function extracted at the NLO approach. However such a study was beyond the scope, so the QCD analysis was limited to the usage of the fragmentation function measured at HERA.
For beauty no tendencies were observed: the LHCb data clearly prefer the value α k ≈ 11 similar to that extracted from the LEP data.