A posteriori inclusion of parton density functions in NLO QCD final-state calculations at hadron colliders: The APPLGRID Project

A method to facilitate the consistent inclusion of cross-section measurements based on complex final-states from HERA, TEVATRON and the LHC in proton parton density function (PDF) fits has been developed. This can be used to increase the sensitivity of LHC data to deviations from Standard Model predictions. The method stores perturbative coefficients of NLO QCD calculations of final-state observables measured in hadron colliders in look-up tables. This allows the posteriori inclusion of parton density functions (PDFs), and of the strong coupling, as well as the a posteriori variation of the renormalisation and factorisation scales in cross-section calculations. The main novelties in comparison to original work on the subject are the use of higher-order interpolation, which substantially improves the trade-off between accuracy and memory use, and a CPU and computer memory optimised way to construct and store the look-up table using modern software tools. It is demonstrated that a sufficient accuracy on the cross-section calculation can be achieved with reasonably small look-up table size by using the examples of jet production and electro-weak boson (Z, W) production in proton-proton collisions at a center-of-mass energy of 14 TeV at the LHC. The use of this technique in PDF fitting is demonstrated in a PDF-fit to HERA data and simulated LHC jet cross-sections as well as in a study of the jet cross-section uncertainties at various centre-of-mass energies.


Introduction
The Large Hadron Collider (LHC) at CERN will collide protons at a centre-of-mass energy of up to 14000 GeV. The combination of its high collision rate and centre-of-mass energy will make it possible to probe new interactions at very short distances. Such interactions might be revealed in the production of cross-sections of particles at very high transverse momentum (p T ) as a deviation from the Standard Model theory.
The sensitivity to new physics depends on experimental uncertainties in the measurements and on theoretical uncertainties in the Standard Model predictions. It is therefore important to work out a strategy to minimise both the experimental and theoretical uncertainties from LHC data. Residual renormalisation and factorisation scale uncertainties in next-to-leading order (NLO) QCD calculations for single inclusive jet cross-sections are typically about 5 − 10% and should hopefully be reduced as NNLO calculations become available. However, in some kinematic regimes, PDF uncertainties can be substantially larger than the uncertainties from higher-order corrections, for example at large p T . One strategy to reduce such uncertainty is to use single inclusive jet or Drell-Yan cross-sections at lower p T to constrain the proton parton density function (PDF) uncertainties at high p T .
In order to further constrain PDF uncertainties, it would be useful to be able to include final state data such as p T and rapidity distributions for W/Z-boson and jet production in global NLO QCD PDF fits, without recourse to inexact methods like the use of simple factor correcting of LO cross-sections (k−factors). We propose here a method for a consistent inclusion of final-state observables in global QCD analyses.
For inclusive data, like the proton structure function F 2 in deep-inelastic scattering (DIS) the perturbative coefficients are known analytically. During the fit the cross-section can therefore be quickly calculated from the strong coupling (α s ) and the PDFs and then be compared to the measurements. However, final state observables, where detector acceptances or jet algorithms are involved in the definition of the perturbative coefficients (called "weights" in the following), have to be calculated using NLO QCD Monte Carlo programs. Typically such programs need about one day of CPU time to accurately calculate the cross-section. It is therefore necessary to find less time consuming methods.
Any NLO QCD calculation of a final-state observable involves Monte Carlo integration over a large number of events. For deep-inelastic scattering and at hadron colliders this must usually be repeated for each new PDF set, making it impractical to consider many 'error' PDF sets, or carry out PDF fits. Here, the "a posteriori" inclusion of PDFs is discussed, whereby the Monte Carlo run calculates a look-up table (in momentum fraction, x, and momentum transfer, Q) of cross-section weights that can subsequently be combined with an arbitrary PDF. The procedure is numerically equivalent to using an interpolated form of the PDF.
Many methods have been proposed to solve this problem in the past [1][2][3][4][5]. In principle the highest efficiencies can be obtained by taking moments with respect to Bjorken-x [1,2], because this converts convolutions into multiplications. This can have notable advantages with respect to memory consumption, especially in cases with two incoming hadrons. On the other hand, there are complications such as the need for PDFs in moment space and the associated inverse Mellin transforms.
Methods in x-space have traditionally been somewhat less efficient, both in terms of speed and in terms of memory consumption. They are, however, somewhat more transparent since they provide direct information on the x values of relevance. Furthermore they can be used with any PDF. The use of x-space methods can be further improved by using methods developed originally for PDF evolution [6][7][8].
Our method [9] bears a number of similarities to that of the fastNLO project [10] and the two approaches were to some extent developed in parallel. Relative to fastNLO, we take better advantage of the sparse nature of the x-dependent weights, allow for more flexibility in the scale choice by keeping explicitly the scale dependence as an additional dimension in the weighting table and provide a means to evaluate renormalisation and factorisation scale-dependence a posteriori. We also provide a broader range of processes, since in addition to di-jet production, we include W-and Z-boson production. In order to make easy use of the large number of weight files for practically all inclusive jet p T spectra and di-jet mass spectra made available by the fastNLO project, we provide a software interface to make use of these weight tables within the APPLGRID framework.

PDF-independent representation of cross-sections 2.1 Representing the PDF on a grid
We make the assumption that PDFs can be accurately represented by storing their values on a two-dimensional grid of points and using n th -order interpolations between those points. Instead of using the parton momentum fraction x and the factorisation scale Q 2 , we use a variable transformation that provides good coverage of the full x and Q 2 range with uniformly spaced grid points: y(x) = ln 1 x + a(1 − x) and τ (Q 2 ) = ln ln The parameter Λ should be chosen of the order of Λ QCD , but need not necessarily be identical. The parameter a serves to increase the density of points in the large x region 1 and can be chosen according to the needs of the concrete application. 2 The PDF f (x, Q 2 ) is then represented by its values q iy,iτ at the 2-dimensional grid point (i y δy, i τ δτ ), where δy and δτ denote the grid spacings, and is obtained elsewhere by interpolation: where n, n ′ are the interpolation orders. The interpolation function I (n) i (u) is 1 for u = i, and otherwise is given by: Defining int(u) to be the largest integer such that int(u) ≤ u, k and κ are defined as: Given finite grids whose vertex indices range from, 0 . . . N y −1, for the y grid and, 0 . . . N τ −1, for the τ grid, one should additionally require that eq. (2) only uses available grid points. This can be achieved by remapping, k → max(0, min(N y −1−n, k)), and, κ → max(0, min(N τ −1−n ′ , κ)).

Representing the final state cross-section weights on a grid (DIS case)
To illustrate the method we take the case of a single flavour in deep-inelastic scattering (DIS).
Suppose that we have an NLO Monte Carlo program that produces events, m = 1 . . . N . Each event m has an x value, x m , a Q 2 value, Q 2 m , as well as a weight, w m . We define p m as the number of powers in the strong coupling α s in event m. Normally one would obtain the final result W of the Monte Carlo integration for one sub-process from: 3 where f (x, Q 2 ) is the PDF of the flavour under consideration.
Instead one introduces a weight grid W (p) iy,iτ and then for each event one updates a portion of the grid with: The final result for W , for an arbitrary PDF and an arbitray α s , can then be obtained subsequent to the Monte Carlo run: where the sums with indices i y and i τ run over the number of grid points and we have explicitly introduced x (iy) and Q 2 (iτ ) such that: y(x (iy) ) = i y δy and τ Q 2 (iτ ) = i τ δτ.

Including renormalisation and factorisation scale dependence
If one has the weight matrix W (p) iy,iτ determined separately order by order in α s , it is straightforward to vary the renormalisation µ R and factorisation µ F scales a posteriori (we assume that they were set equal in the original calculation).
It is helpful to introduce some notation related to the DGLAP evolution equation: 3 Here, and in the following, renormalisation and factorisation scales have been set equal for simplicity.
where the P 0 and P 1 are the LO and NLO matrices of DGLAP splitting functions that operate on vectors (in flavour space) f of the PDFs. Let us now restrict our attention to the NLO case where we have just two values of p in eq. 7. For example, in jet production in DIS, p LO = 1 and p NLO = 2. Introducing ξ R and ξ F corresponding to the factors by which one varies µ R and µ F respectively, for arbitrary ξ R and ξ F we may then write: where β 0 = (11N c − 2n f )/(12π) and N c (n f ) is the number of colours (flavours). Though this formula is given for an x-space based approach, a similar formula applies for moment-space approaches. Furthermore it is straightforward to extend it to higher perturbative orders.
To obtain the full DIS cross-section a summation of the weights and the parton densities over the contributing sub-processes is required.

The case of two incoming hadrons
In hadron-hadron scattering one can use analogous procedures but with one more dimension. Besides Q 2 , the weight grid depends on the momentum fractions of the first (x 1 ) and second (x 2 ) hadrons.
The analogue of eq. 7 is given by: where n sub is the number of sub-processes and the initial state parton combinations F are specified in eqs. 12, 20 and 18.
The combinations of the incoming parton densities (defining the number of sub-processes) often can be simplified by making use of the symmetries in the weights. In the case of jet production only seven sub-processes are needed (see section 2.4.1). The case of W -boson and Z-boson production is treated in the Appendix A. The case of b-quark production is discussed in ref. [11].
An automated way to find the sub-processes is discussed in Appendix B.

Sub-processes for jet production in hadron-hadron collisions
In the case of jet production in proton-proton collisions the weights generated by the Monte Carlo program can be organised in seven possible initial-state combinations of partons: where g denotes gluons, q, quarks and r, quarks of different flavour, q = r and we have used the generalised PDFs defined as: where f i/H is the PDF of flavour i = −6 . . . 6 for hadron H and H 1 (H 2 ) denotes the first or second hadron. 4

Including scale dependence in the case of two incoming hadrons
It is again possible to choose arbitrary renormalisation and factorisation scales. Specifically for NLO accuracy: where F (l) q 1 →P 0 ⊗q 1 is calculated as F (l) , but with q 1 replaced with P 0 ⊗ q 1 , and analogously for F (l) q 2 →P 0 ⊗q 2 .

Reweighting to a different center-of-mass energy
From a weight grid W calculated at a particular centre-of-mass energy √ s it is also possible to calculate a cross-section at a different centre-of-mass energy √ s ′ by using transformed parton momentum fractions x ′ 1/2 and adding a flux factor in the cross-section convolution as given by eq. 11 or eq. 14: and the momentum fractions x 1/2 in the generalised parton densities F (x 1 , x 2 , Q 2 ) are replaced by: When √ s ′ < √ s it can occur that x ′ 1 > 1 or x ′ 2 > 1, in which case the parton densities should be set to zero. One should be aware that a jet transverse momentum that corresponds to moderate x values with centre-of-mass energy √ s (and correspondingly low density of grid points in x) may correspond to large x when using a smaller √ s ′ . In such cases, it can happen that the low density of grid points in x is no longer sufficient, given that PDFs vary more rapidly at large x than at moderate x.
Special care is also needed when taking √ s ′ > √ s insofar as there will be kinematic regions accessible with the larger √ s ′ values that were not probed at all in the original NLO calculation at centre-of-mass energy √ s. As a concrete example, with √ s ′ = 14 TeV, there can be events with three jets having respectively p T = 6, 4, 2 TeV. Such events contribute to the inclusive jet spectrum at p T = 4 TeV . However, taking a grid calculated with √ s = 10 TeV (where such events are kinematically disallowed) and using it to determine the inclusive jet spectrum with √ s ′ = 14 TeV, this kind of contribution will be left out.

Technical implementation
To test the scheme discussed above, the NLO QCD Monte Carlo programs NLOJET++ [13] for jet production and MCFM [14,15] for the production of W -and Z-boson are used. To illustrate the performance of the method jet and W -and Z-boson production are used as examples. However, it is worth noting that these these two programs give access to many of the NLO QCD calculations presently available.
The weight grid W (p)(l) iy 1 ,iy 2 ,iτ of eq. 11 is filled (for each cross-section bin) in the user module of the NLO program, where one has access to the event weights and the partons' momenta. This object is called "grid" in the following. At this point the cross-section definition is specified and the physical observables that are being studied are defined (e.g. using a jet algorithm).
The weight grid for each value of the observable in question is represented as a multidimensional object with one dimension each for x 1 , x 2 and Q 2 , one for the sub-process in question, and one for the order in α s . The task is to store the weight grid in such a way that as little memory as possible is used and the information can be extracted in a fast way. In the following several options to reduce the necessary memory are discussed: The simplest structure for a software implementation of the weight grid is a multidimensional array (for x 1 , x 2 and Q 2 ), like the TH3D-class available in the ROOT analysis framework.
The overhead of storing empty bins can be largely reduced by calculating the x 1 , x 2 and Q 2 boundaries of the weight grid using the NLO QCD program in a special run before the actual filling step. At the beginning of the filling step the adjusted boundaries of the weight grids are then read-in and an optimised weight grid is constructed.
Since the rectilinear region bounded by limits in x 1 , x 2 and Q 2 may contain many phasespace points that are unoccupied, additional memory can be saved by using methods to avoid storing elements in the weight grid that are not filled. Since the occupied regions are continuous, but irregular, grid formats for truly sparse matrices (such as the Harwell-Boeing format) are not used. Instead a custom format is favoured where the grid, lower-and upper-limits in each dimension are stored along with all the elements in between. This is illustrated in Fig. 1 for a simple two-dimensional grid. For the three-dimensional structure, each of the row-column elements would itself be a column with its own lower and upper range delimiters. The resulting saving of memory is usually around a factor of four, even after taking into account the additional storage for the range delimiters. 5 Figure 1: An example of the custom twodimensional sparse structure. Rows and columns are numbered from 0 to 20 from the top left. The elements with data members are shown filled, only rows 1 to 17 have data members, for each row the columns that have data members are shown on the right. A total of 117 elements, from the maximum of 400 elements are stored, along with the single pair of row-range delimiters, and the 17 pairs of column range delimiters for each of the individual rows.
Since the grid itself knows the index of the first and last filled element in each row, column etc., it is possible to only iterate over those elements of the grid that contain data. Similarly when interrogating the grid for the value of an element, it is possible to ascertain whether the element is in the occupied, or unoccupied region of the grid and return the value of the filled element if filled, or 0 otherwise. This makes accessing the unfilled members of the grid much faster than otherwise.
The actual implementation for the grid 6 involves a number of related classes written in 5 If additional savings are required in the future, packing the range delimiters for each sparse one dimensional structure into a single integer will halve this additional overhead, but will slightly increase the access time due to the unpacking. 6 The complete code including the interfaces to NLOJET++ and MCFM is available from C++ 7 . The grid for a given cross-section is represented by a concrete instance of a master class, appl::grid. This class has a number of constructors that allow the cross-section it will calculate to be defined in terms of a fixed number of regular or variable width bins in the cross-section observable.
For each bin in the observable, the master class has a number of instances of an internal class -one for each order of α s -so that for a cross-section with 10 bins, with contributions at leading order and next-to-leading order, the master class would contain a total of 20 instances of the internal class.
This internal class, appl:igrid, encodes all the information required to create the crosssection, at one particular order, for that bin. The class contains the x-to-y, y-to-x and Q 2to-τ , τ -to-Q 2 transform pairs, and a subclass that encodes information on how to generate the N generalised internal sub-processes for the particular interaction from the basic parton distribution functions. It also contains instances of the sparse grid class in x 1 , x 2 and Q 2 described above, for each of the N sub-process.
When requested to perform the convolution, the master class calls the convolute method of the subclass for each order of the cross-section in each bin. The convolute method of the subclass performs the convolution over x 1 , x 2 and Q 2 for each of the sub-processes.
For each bin in the observable, the master class takes the cross-sections from the subclasses for each order from each bin and adds them to arrive at the final cross section for that bin.
The subclass for the generalised internal sub-processes are very basic classes which encode the number of sub-processes, i.e. seven in the case of jet production, and twelve in the case of Z-boson production, and simply take the 13 parton distribution values for each incoming hadron at a given scale, and generate the N internal processes from these.
When the grid is saved to a ROOT file 8 the master class encodes the complete status of the internal grids, which transform pair, and which sub-process is required etc., so that once reading from the file, everything required to calculate the cross section (e.g. sub-process definition, CKM matrix elements etc.) is available. In this way all information to perform the cross-section calculation is available from the output file from a single function call by the user and the only additional information required is an input function for generating the PDFs and another one for calculating α s . We use the HOPPET program [8] to calculate the DGLAP splitting functions needed for the cross-section convolution when the renormalisation and factorisation is varied (see eq. 10).
All the various choices in the weight grid architecture and other information needed to calculate the cross-section are encoded in the output file. They are described in the following: • The centre-of-mass energy at which the weight grid has been produced.
• The choice of the coordinate transform function. By default the form of eq. 1 is used.
However, any other function can be provided by the user.
• The interpolation order as given by eq. 2.
http://svn.hepforge.org/applgrid. 7 A FORTRAN interface is also available so that the basic functionality can be accessed from within user FORTRAN code. 8 Technically, the grid is transformed to TH3D-histograms that are stored in the output ROOT file.
• The number of grid points to be used for each dimension x 1 , x 2 and Q 2 .
• The definition of the sub-processes via a 13 x 13 matrix.
• The CKM matrix elements or other constants needed to calculate the cross-sections.
• The required number of the points on the grid can be optionally reduced with the aid of reweighting factor in the filling step. This flattens out the PDF in the region where it is steeply falling.
By default the following functional form is used for the reweighting 9 : The parameter a 1 can be adjusted to flatten out the change of the PDF at low-x while the parameter a 2 can be optimised for the high-x region. The factor 0.99 prevents the weight from being zero for x = 1.
Reasonable values for the parameters a 1 and a 2 have been determined by fitting the sum of the up, down and gluon PDFs. For the CTEQ6 PDFs [16], values of a 1 = −1.5 to −1.6 and a 2 = 3.0 to 3.4 have been found for the range 5 < Q < 5000 GeV. The variation comes from a slight dependence of the a 1 and a 2 parameters on Q 2 . For other PDFs, the results of the fit can be slightly different.
The user can change the parameters or provide another functional form.

Accuracy of the weight grids
The choice of the weight grid architecture depends on the required accuracy, on the exact crosssection definition and on the available computer resources. For each possible application the weight grid architecture has to be carefully chosen in order to achieve the required accuracy with the available computer memory and computing time. For instance, for observables where the PDFs are steeply falling, e.g. the inclusive jet cross-section at high transverse momentum in the forward region, a fine grid in x is needed. The memory usage of weight grids for one cross-section should be kept small since, e.g. in global PDF fits, it might be necessary to read in a large number of weight grids. In addition, the convolution time depends on the number of grid nodes, and so keeping memory requirements as small as possible is in any case desirable. The number of points needed in the weight grid is kept modest by using the higher-order interpolation functions of eq. 2, and optionally also by introducing a PDF weight, as in eq. 17 during the filling step, or by using a sparse structure.
In the following, the influence of the grid architecture on the achievable accuracy in the cross-section calculation is discussed. The computer memory use and execution speed are also investigated. The production of jets and of W -and Z-bosons at LHC are used as examples.
In our test runs, to be independent from statistical fluctuations (which can be large, in particular in the NLO case), in addition to the weight grid, reference histograms are filled using the NLO QCD calculation without weights in the standard way. The result obtained from the weight grid is then compared to these reference histogram.   Figure 7: Ratios of grid and standard calculations of the single inclusive jet p T spectrum for 0 < y < 1 (a) and for 2 < y < 3 (b), with scale variation. The default weight grid is used, with 30 bins in x, 10 bins in Q 2 , a coordinate transform parameter a = 5 and fifth order interpolation.
The grid results are based on a posteriori variation of the renormalisation and factorisation scales, using eq. 10, while the standard results have been obtained separately for each choice of renormalisation and factorisation scale. The PDF set is CTEQ6mE.

Jet production at hadron colliders
The single inclusive jet cross-section as a function of the jet transverse momentum (p T ) is calculated for jets in the central rapidity (y) region of 0 < y < 1 and in the forward rapidity region of 2 < y < 3. Jets are defined via the seedless cone jet algorithm as implemented in NLOJET++, which corresponds to the seedless algorithm of ref. [17] (or SISCone [18]), except for small differences in the split-merge procedure which are irrelevant at this order. The cone radius has been set to R = 0.7, the overlap fraction to f = 0.5. 10 The renormalisation and factorisation scales are set to Q 2 = p 2 T,max , where p T,max is the p T of the highest p T jet in the required rapidity region 11 .
To discuss the dependence of the weight grid performance on the grid architecture, a default weight grid is defined from which variations in a single parameter are studied systematically. The default weight grid consists of 30 bins in x and 10 bins in Q 2 . The points are distributed according to eq. 1 with a = 5 and 5th order interpolation is used. No PDF reweighting (see eq. 17) is used.
The ratio of the cross-section calculated with the default weight grid to the reference crosssection calculation is shown in Fig. 2 for the jet cross section in the central rapidity region (0 < y < 1) (a) and the forward rapidity region (2 < y < 3) (b). The weight grid is produced in a run where the CTEQ6mE PDF [16] has been used to calculate the jet cross-section. This PDF is used as standard in the following. To show the independence of the weight grid performance on the used PDF, Fig. 2 also includes more recent PDFs based on the analyses of a large variety of data (global analysis) like CTEQ6.6 [20] and MSTW2008 [21] or only using inclusive DIS data based on combined H1 and ZEUS data (HERAPDF01) [22]. In addition, we include a PDF that does not use a parameterised input distribution NNPDF [23]. Further comparisons of the jet cross-sections calculated with these PDFs can be found in section 5.
In the central region the cross-section calculated with the weight grid reaches an accuracy of about 0.1% for all tranverse jet momenta and all PDFs. In the forward region a similar performance is achieved for transverse jet momenta up to 1000 GeV. For transverse jet momenta above that value the performance degrades to 0.6% and a variation with the PDF is observed.
The dependence of the accuracy on the number of x-bins is illustrated in Fig. 3. If only 25 x-bins are used, the accuracy is 0.3% in the central and 0.6% in the forward rapidity region. The accuracy decreases towards low jet transverse momenta. More accuracy is achieved by a larger number of x-bins. For 30 bins the accuracy is 0.1%. For 40 x-bins the improvement is small, but visible. A very sensitive kinematic region is the forward region with very high transverse momenta. In this region at least 30 x-bins are needed to get an accuracy of 0.1%. Fig. 4 shows the dependence of the accuracy on the number of Q 2 -bins. This dependence is rather small. When a large enough number of x-bins is chosen, no change is observed for 8 to 15 bins in Q 2 .
10 These choices are related to the fact that some of the NLOJET++ runs were performed some time in the past. A modern cone-algorithm (in the class of those with a split-merge procedure) would be SISCone [18], and a value of f = 0.75 would be recommended [19]. 11 Note that beyond LO the pT,max will in general differ from the pT of the other jets, so when binning an inclusive jet cross-section, the pT of a given jet may not correspond to the renormalisation scale chosen for the event as a whole. For this reason separate grid dimensions for the jet pT and for the renormalisation scale are used. This requirement has been efficiently circumvented in some moment-space approaches [2].
The dependence on the interpolation order (as defined in eq. 3) is shown in Fig. 5. While varying the default interpolation order n = 5 to n = 4 and n = 6 gives similar results within 0.1%, the interpolation order n = 3 leads to an accuracy loss of 0.5% at low transverse jet momenta in the central regions, and by 0.4 − 1% at low and high transverse jet momenta in the forward region.
In conclusion, the results in Figs. 2-5 demonstrate that an accuracy of 0.1 % can be reached with a reasonable weight grid size. The most critical parameter is the number of x-bins, which must be large enough to accommodate strong PDF variations in certain phase space regions. In comparison, the dependence on the number of Q 2 bins is rather weak. The interpolation between the grid points is sufficiently accurate to allow the grid technique to be used and fifth order interpolation produces reasonable results. The achieved accuracy is probably sufficient for all practical applications.
In applications where a very small weight grid is needed, one can also introduce a PDFweight to flatten out the x-dependence of the PDFs (see eq. 17). The PDF weight is calculated using a 1 = −1.5 and a 2 = 3. This is illustrated in Fig. 6, where grids with very low number of x-bins (8,9,10) and eight Q 2 bins are used, the interpolation is lowered to n = 4. Even with the smallest weight grid an accuracy of 1% is achieved using the PDF-weight. For a somewhat larger weight grid with 10 x-bins the accuracy is 0.5% in all phase space regions.
One of the important theoretical uncertainties in NLO QCD calculations is the variation of the results with the choice of the factorisation and renormalisation scale. Eq. 14 allows the calculation of the cross-section for any scale choice a posteriori from one weight grid produced at a fixed scale choice. The results from scale variations by a factor of 2 up and down is shown in Fig. 7. The renormalisation and factorisation scales are either varied together or varied individually. The weight grid result has been calculated with a single weight grid and the reference cross-sections have been calculated by repeating the standard NLO QCD calculation for each of the scale variations. The cross-section calculated with the weight grid reproduces the standard results to within about 0.1% in the central region and 0.1 − 0.2% in the forward region.

Reweighting jet cross-sections to a different centre-of-mass energy
As outlined in section 2.6 a weight grid produced at a given centre-of-mass energy can also be used to calculate the cross-section at a lower or higher centre-of-mass energy. This procedure works if the coverage in x in the weight grid is large enough. For instance, when lowering the centre-of-mass energy to calculate the jet cross-section at a fixed transverse jet momentum, it might happen that the required large x values are not present in the weight grid produced at a higher centre-of-mass energy. The variation of the centre-of-mass energy has therefore to be done with care by the user.
As an example, the accuracy of the jet cross-section calculation using the default weight grid at a fixed centre-of-mass energy of √ s = 14000 GeV is investigated. Reference cross-sections are calculated at various centre-of-mass energies, i.e. 1800, 5000, 7000, 10000, 14000, 16000 and 18000 GeV. Since the calculations at the various centre-of-mass energies are statistically independent, each reference cross-section as well as the default weight grid at √ s = 14000 GeV needs to be calculated with large event samples. Each of the calculations is done with 50 000 000 For large changes in centre-of-mass energy and large x T values the approximation of the standard grid becomes inaccurate. For instance, for √ s = 1800 GeV and x T = 0.6 the weight grid calculation gives a result that is 10% higher than the standard calculation. This discrepancy increases further for large x T values. The ratio of the last two x T values becomes very large. 12 Fig. 8b) shows the result for a larger grid using 50 bins in x and 20 bins in Q 2 . With such a grid the deviations are mostly reduced to statistical fluctuations. Only the largest x T value for the lowest centre-of-mass energy exhibts a deviation by 30 % from the standard calculation.
A small grid with a PDF weighting leads to large discrepancies to the standard calculation and cannot be used.
In conclusion, the grid technique gives a good accuracy to compute the jet cross-section at various centre-of-mass energies. For very high transverse momenta and extreme centre-of-mass variations a large grid might be required.

W-boson production at hadron colliders
To further demonstrate the performance of the weight grid method, the production of W -bosons at LHC energies is taken as example. The observable that will be examined is the transversemomentum distribution of the positron from W + -boson decays, when the positron is either central |η| ≤ 0.5, or very forward, |η| ≥ 3.0.
As in the previous section, a default weight grid is defined and variations in a few parameters are studied. The default weight grid consists of 25 bins in x. The points are distributed according to eq. 1 with a = 5 and a fifth order interpolation is used. No PDF weight (see eq. 17) is used. The cross-sections are calculated with the factorisation and renormalisation scale fixed to the mass of the W -boson. Therefore, the weight grid need only be two dimensional.
The influence of the number of bins in x is shown in Fig. 9. If the number of bins in x is too For the default weight grid, lowering the interpolation order from n = 5 to n = 4 results in an accuracy loss of about 0.2% over much of the p T range, as shown in Fig. 10. The accuracy for positrons with low transverse momenta degrades to 0.8%. The good precision for n = 5 can only be improved using n = 7. Fig. 11 shows the dependence on the grid spacing parameter corresponding to the parameter a in eq. 1. The x-values in the cross-section calculation are not large and consequently a fine spacing at large x (corresponding to a large a parameter) is not needed and the result improves for low a values. An accuracy of better than 0.1% is achieved for all variations.
Finally, Fig. 12 shows that, if a PDF weighting is used, it is possible to use very small grid sizes. For a weight grid with only eight x-bins an accuracy of 0.1% can be achieved. In this case the gain in accuracy is small when increasing the number of x-bins. Only in the forward region and for high transverse energies the increase in the number of x-bins is beneficial.
In summary, a sufficient accuracy is achieved with about 25 x-bins and a fifth order interpolation. An equidistant grid spacing (a = 0) is sufficient.

CPU and computer memory performance
The execution time for each call to the filling routine for the grid has been studied on a 1.5 GHz PowerPC and a 3 GHz Intel Xeon running Linux, using a dummy structure with N points in each dimension. Fig. 13 shows the performance for various grid architectures. The grids are based on either the ROOT TH3D class, the custom sparse class (SparseMatrix3d) described in the section 3, or the TMatrixDSparse class which implements the 2-dimensional Harwell-Boeing matrix representation. In the latter case, a sparse 1-dimensional structure of TMatrixDSparse matrices using the classes of the SparseMatrix3d has been used to create a sparse 3-dimensional structure. As expected the Harwell-Boeing based class is very quick for filling when the grid is small, but as the grid size becomes larger, since the occupation is reasonably large, the number of entries that must be examined becomes large and the filling time increases rapidly. For the TH3D and custom sparse structures, the filling time is largely independent of the grid size.
The reduction in memory occupied by the custom sparse grid structure after trimming away unoccupied elements is illustrated in Fig. 14. The bottom-left plot shows the absolute size of the stored elements in MBytes, both before, and after trimming away unfilled elements. The top-left plot shows the fraction of the total, untrimmed grid size, occupied by the filled elements. As the grid spacing decreases, the overall grid size naturally increases.
The execution time using the grid to perform the final cross-section calculation including the PDF convolution has also been studied using a 1.5 GHz PowerPC. The results are based on calculations of differential cross-sections with respect to the positron pseudo-rapidity and transverse momentum in W -boson production using MCFM [14,15], as presented in section 4.3.
The cross sections involve 20 and 24 bins for the lepton pseudo-rapidity and transverse energy distributions respectively. Fig. 14b shows the convolution time for grids with N bins in dimensions x 1 and x 2 for the sparse structure. Results are given for the trimmed and untrimmed structures In the case of the untrimmed grid, all data elements are retained in the convolution, even those with no entries.
Excluding the unfilled data elements in the convolution improves the convolution time by a factor approaching two. In addition, we see that the convolution time varies approximately linearly with the grid linear dimension. This is because the most costly part of the convolution is the calculation of the PDF at the grid nodes. With independent grid nodes for x 1 and x 2 , there are 2N evaluations of the PDF for each observable bin, and so the convolution scales linearly with N .
In conclusion, the custom sparse structure using trimmed blocks gives the best performance.

Application example: Calculation of NLO QCD uncertainty for inclusive jet cross-sections for proton proton collisions at various centre-of-mass energies
As an example in this section the uncertainties of the inclusive jet cross-section in the central region (0 < y < 1) are evaluated from the default grid obtained at a centre-of-mass energy of √ s = 14000 GeV. The jet cross-sections are calculated at various centre-of-mass energies.
The most recent PDF parameterisations along with their associated uncertainties are used, i.e. CTEQ6.6 [20], MSTW2008 [21], HERAPDF01 [22] and NNPDF [23]. Fig. 15 shows the effect of the PDF uncertainty from CTEQ6.6 (a), MSTW2008 (b), HER-APDF01 (c) and NNPDF (d) on the inclusive jet cross-section with respect to the central value of the somewhat older PDF, CTEQ6mE [16]. The uncertainty from the CTEQ6mE PDF is also overlayed. The band illustrates the result of adding the jet cross-sections obtained for each of the PDF variations 13 . The marker indicates the central value. Fig. 16 shows the PDF uncertainty together with the renormalisation and factorisation scale uncertainty added in quadrature with respect to the central value for each of the PDFs.
For p T < 1000 GeV the jet cross-section obtained with CTEQ6.6 is about 2% smaller than from CTEQ6mE. Above this value the CTEQ6.6 cross-section increases with respect to the one from CTEQ6mE as the jet p T increases. At p T = 2000 GeV it is about equal and at p T = 4000 GeV it is about 10% larger. The uncertainty is reduced for the CTEQ6.6 PDF. The uncertainty is about 3% for p T < 500 GeV, about 8% at p T = 1000 GeV and about 20% at p T = 3000 GeV.
The MSTW2008 PDF gives a jet cross-section that is 5% larger than the one obtained with Shown is the jet cross-section uncertainty induced by the CTEQ6mE PDF and the CTEQ6.6 (a) the MSTW2008 (b), the HERAPDF (c) and the NNPDF PDF (d). The reference cross-section σ 0 is the one obtained by the central value of the CTEQ6mE PDF. The default PDF is indicated by a marker. CTEQ6mE at p T < 500 GeV and is about the same at p T = 1000 GeV and then further decreases. The uncertainty is only about 2 % for p T < 500 GeV and then increases to about 6% at p T = 1000 GeV. The MSTW2008 PDF gives a smaller uncertainty than the CTEQ6.6 PDF. It seems that the differences between the jet cross-section calculated with CTEQ6.6 and MSTW2008 are a bit larger than the individual uncertainties.
The result obtained with the HERAPDF01 is more similar to the one obtained from MSTW2008 than the one from CTEQ6.6. At low p T the central value is about 2% higher than the one from CTEQ6mE. In the region 500 < p T < 1500 GeV the HERAPDF01 predicts a lower jet crosssection than the other PDFs. The uncertainty is about 5% for p T < 1000 GeV and then increases to about 20 − 40% at p T = 3000 GeV. The small uncertainty of the jet cross-section calculated with the HERAPDF01 is remarkable, since only DIS data are used. However, model and parametrization uncertainties are not included in this cross-section calculation. The MSTW and CTEQ sets do not yet include the most recent HERA data. The NNPDF predicts jet crosssections that are 5 − 10% higher than the one from the other PDFs; in particular in the region 300 < p T < 1000 GeV. The uncertainty is about 5% at low p T , 10% at 1000 GeV and 20 − 30% at 3000 GeV.
The overall uncertainty, i.e. including the PDF and the scale variation added in quadrature, is shown in Fig. 16. It is about 8% up to a p T of about 1000 GeV and then increases towards higher p T . It is about 20 − 30% at p T = 3000 GeV. For very high p T the PDF uncertainty dominates.   17a) shows the total inclusive cross-section for central jets (0 < y < 1) integrated for p T > 100 GeV, p T > 300 GeV and p T > 500 GeV as a function of the centre-of-mass energy. The markers denote the reference cross-section calculated in the standard way. The lines are obtained from a weight grid produced at √ s = 14000 GeV. The cross-section calculation from the default weight grid reproduces the reference cross-sections within 1 − 2% (see also section 4.2). For each jet transverse momentum threshold the total jet cross-section rises with increasing centreof-mass energy. Fig. 17b) shows the centre-of-mass energy dependence of the jet cross-section  Figure 18: Uncertainty of the inclusive jet cross-section for jets with transverse momenta p T > 100 GeV and within 0 < y < 1 as a function of the centre-of-mass energy √ s. Shown is the ratio of the cross-section with varied PDFs and renormalisation and factorisation scale (σ) to the cross-section calculated with the central value of each PDF set and no scale variation (σ 0 ). The inner uncertainty band shows only the PDF uncertainty. The outer band shows the PDF and the scale uncertainty added in quadrature. The uncertainty from CTEQ6.6 is shown in a), from MSTW2008 in b), from HERAPDF in c) and from NNPDF in d).
normalised to the jet cross-section at 5000 GeV for each jet transverse momentum threshold. As expected the centre-of-mass energy dependence is strongest for high transverse jet momenta. Fig. 18 shows for each of the considered PDF sets the PDF uncertainty along with the renormalisation and factorisation scale uncertainty added in quadrature for jets with p T > 100 GeV as a function of the centre-of-mass energy √ s. Both the PDF and the scale uncertainties only depend weakly on the centre-of-mass energy. For high centre-of-mass energies the uncertainties are a bit smaller.
6 Application example: PDF fit including DIS data and jet production data at hadron colliders An important application of the method outlined above, is the consistent inclusion of final state measurements from hadronic colliders into the final extraction of PDFs by NLO QCD fits. Measurements of final states -such as jet production or the production of lepton pairs via the Drell-Yan process -can provide important additional constraints on the proton PDFs, complementary to those from inclusive DIS data.
As a simple "proof-of-principle" example, the grid technique outlined in this paper has been used to include simulated LHC jet data into a NLO QCD fit. The fit framework used here is based on the recent ZEUS-JETS PDF, derived from a fit to inclusive DIS and jet data from HERA. Jet cross-sections from the TEVATRON or any other data than that from HERA are not used. Full details of the data-sets, PDF parameterisation and other assumptions are given elsewhere [5].
To represent the LHC data for inclusion in the fit, jet production from proton-proton collisions at a centre-of-mass energy of 14000 GeV was simulated using the JETRAD [25] program, using the CTEQ6.1 PDF [26]. Single inclusive jet cross sections, differential in p T , were obtained in three regions of rapidity: 0 < |y| < 1, 1 < |y| < 2 and 2 < |y| < 3. A grid with default parameters, as described in Sec. 4.1, was produced and interfaced to the ZEUS NLO QCD fit program. Several fits were performed, using different assumptions on the statistical and systematic uncertainties on the simulated data. The PDF uncertainties were calculated using the Hessian method [27,28], with ∆χ 2 = 1 14 .
A representative result is shown in Fig. 19. In this example, the statistical uncertainty on the simulated LHC jet data corresponds to an integrated luminosity of 10 fb −1 and uncorrelated systematic uncertainties have been assumed to be at a level of 5%. A precise jet energy scale uncertainty of 1% (corresponding to ∼ 5 − 15% on the generated cross-sections) has also been assumed, and is included as a correlated systematic in the fit. Fig. 19 a) shows the up-valence, down-valence, total sea and gluon PDF distributions as a function of x, at Q 2 = 10000 GeV 2 . The shaded band shows the results of the fit including the simulated LHC jet data. In Fig. 19 b), the fractional uncertainties on the gluon PDF, at a number of Q 2 values, are shown.
Comparison with the results from a fit which does not include the simulated LHC jet data indicates that some constraint on the high-x gluon could be provided by the LHC single inclusive jet data 15 . However, this is reliant on a very precise knowledge of the jet energy scale.
In fact, according to this study, a precise knowledge of the jet energy scale is the key factor. Other fits, which assumed a smaller integrated luminosity (1 fb −1 ) or larger uncorrelated systematics (10%), still indicated an improvement on the gluon uncertainties, provided the jet energy scale uncertainty was kept at a level of ∼ 1%. However, fits in which the latter uncertainty was assumed to be larger, indicated little or no improvement in the gluon uncertainty compared to the reference. More details can be found in ref. [31]. Such precision on the jet energy scale is achievable, but will require a lot of experimental work on the understanding of the LHC detectors. The inclusion of TEVATRON jet cross-sections in the NLO QCD fit might provide further constrains. However, it may be the case that ratios of jet cross sections -for example, in different rapidity regions -may have substantially smaller systematic uncertainties, while retaining sensitivity to the gluon density in the proton. Further constraints on the proton PDFs are also expected from Drell-Yan data measured at LHC or any other data than those from HERA. Such data sets can now be consistently included in NLO QCD fits. 15 Note that the fit without the simulated LHC jet data is not identical to the ZEUS-JETS fit since the standard ZEUS fit [5] uses the Offset method to determine the PDF uncertainties. The ZEUS fit shown here is a modified version of the published analysis, with uncertainties determined using the Hessian method, with ∆χ 2 = 1. Different treatments of experimental uncertainties in PDF analyses are discussed extensively elsewhere [27][28][29][30].

Conclusions
A technique has been developed to store the perturbative coefficients calculated by a NLO QCD Monte Carlo program in a look-up table (grid) allowing for a posteriori inclusion of an arbitrary parton density function (PDF) set and of alternative values of the strong coupling constant as well as for a posteriori variation of the renormalisation and factorisation scale. This extends a technique that has already been successfully used to analyse HERA data to the more demanding case of proton-proton collisions at LHC energies.
The technique can be used to constrain PDF uncertainties by allowing the consistent inclusion of final state observables in global QCD PDF fit analyses. This will help to increase the sensitivity of the LHC to find new physics as deviations from the Standard Model predictions.
An accuracy of better than 0.1% can be reached with reasonably small look-up tables for the single inclusive jet cross-section in the central rapidity region |y| < 1, for jet transverse momenta (p T ) from 100 to 4500 GeV and about 0.2% for jets in the forward rapidity region 2 < y < 3. Similar accuracy can be achieved for the differential cross-sections in rapidity and transverse momentum of electrons produced in Z and W -boson decays. This was examined in the central y < 0.5 and very forward y > 3 regions for transverse momentum up to p T < 500 GeV.
The look-up tables provide a powerful tool to quickly evaluate the PDF and scale uncertainties of the cross-section at various centre-of-mass energies. The most recent PDFs predict jet cross-sections in the central rapidity region within a few percent accuracy over a large range of jet transverse momenta. This technique has been successfully applied to a PDF fit using inclusive deep-inelastic scattering and jet data measured at the electron-proton collider HERA and using simulated LHC jet cross-sections. An improvement on the uncertainty of the gluon density can only be achieved if the jet energy scale is very precisely known. A more comprehensive analysis will be possible in the future, since the grid technique can be applied to most of the available NLO QCD calculations.

Appendix A: sub-processes for W-and Z-boson production
The production of W -and Z-bosons in proton-proton collisions involves flavour-dependent electro-weak couplings. Therefore, the number of sub-processes that need to be defined is larger than in the case of jet production. To reduce the number of sub-processes as much as possible, quarks are assumed to be massless and the CKM matrix elements [32,33] to describe the contributions of the various quark flavours are used.
In the case of W + -boson production 16 6 initial state combinations are needed: DU : gD : gU : where the generalised PDFs are used. They are defined as: where V ij are the CKM matrix elements. 17 For simplicity in the former equations we omitted the top contribution, since the parton densities are zero for most practical applications.

Appendix B: Automated identification of sub-processes
In general there are 169 (13×13 flavour) possible PDF combinations for proton proton collisions. In order to store only the minimal amount of information, one needs to establish which of those combinations always come with correlated weights, or equivalently one should identify the underlying physical sub-processes. So far, for each process under study, the sub-processes have been identified manually, on a case-by-case basis. However, the sub-processes can also be found in an automated way.
To simplify the discussion (without loss of generality) it will be convenient to assume that the PDFs are always evaluated at fixed values of x. For each event i and for each of the 169 PDF combination j (with PDF weight p j ), the NLO QCD program calculates matrix-element weights W ij . The total weight for the event i is j W ij p j . The PDF combinations are called channels in the following.
To identify the sub-processes, one determines the W ij weights for 169 events, giving a 169 × 169 matrix, whose i (event) index labels the rows and whose j (PDF channel) index labels the columns. One then carries out an eigenvalue decomposition of the W ij matrix. If v n denotes the n th eigenvalue and L n and R n the left and right eigenvectors (with components L ni , etc.), then as long as the there are no degenerate eigenvalues, an orthonormality relation can be written: where the normalisation is our specific choice. Then one can rewrite the W ij matrix as it being straightforward, for example, to verify that both sides satisfy i L ni W ij = v n L nj and j W ij R mj = v m R mi . Let us now assume that only N of the eigenvalues are non-zero. 18 Then eq. 23 can be interpreted as follows: there are N relevant sub-processes; each n ≤ N corresponds to a sub-process that multiplies a linear combination of PDF channels in which the contribution of channel j is L nj . In event i, sub-process n comes with a weight R ni v n . It will be convenient to denote this by w in . By virtue of the orthornormality condition eq. 22, we have that w in = j W ij R nj , i.e. to determine the weight of sub-process n (whose PDF channel combination is given by the left eigenvector L nj ) we take the right-eigenvector that corresponds to this channel and use it to right-multiply the full weight matrix, so to as to eliminate all but the contribution to the n th sub-process. The next step is to observe that the sub-processes determined for the first 169 events should hold for all remaining events. 19 So now for any event i, we can determine the weight for subprocess n, w in = j W ij R nj , using the R nj determined from the initial events. Having stored the w in one can then subsequently reconstruct the full W ij as W ij = n w in L nj .
We have verified, in the context of a number of MCFM processes, that this approach is viable in practice 20 . However, it is has yet to be fully integrated with the rest of our grid code and the results shown above are based on the manual sub-process decompositions explicitly spelled out in sections 2.4.1 and in the Appendix A.
One should be aware that while the automated suprocess decomposition yields a number of subprocesses that is identical to what can be found with manual decomposition, the specific linear combinations of PDF channels are usually different. To help understand why, one can take the example of jet production with the 7 subprocesses of eq. 12. There, rather than using qq and qq channels, one might have chosen instead to store weights for the combinations of qq + qq and qq − qq channels. More generally, one would have been free to base the grid on any 7 linearly independent combinations of the channels of eq. 12. For the automated channel decomposition process, the particular independent linear combinations that emerge depend on the random weights of the events used to identify the channels.