A meta-analysis of parton distribution functions

A"meta-analysis"is a method for comparison and combination of nonperturbative parton distribution functions (PDFs) in a nucleon obtained with heterogeneous procedures and assumptions. Each input parton distribution set is converted into a"meta-parametrization"based on a common functional form. By analyzing parameters of the meta-parametrizations from all input PDF ensembles, a combined PDF ensemble can be produced that has a smaller total number of PDF member sets than the original ensembles. The meta-parametrizations simplify the computation of the PDF uncertainty in theoretical predictions and provide an alternative to the 2010 PDF4LHC convention for combination of PDF uncertainties. As a practical example, we construct a META ensemble for computation of QCD observables at the Large Hadron Collider using the next-to-next-to-leading order PDF sets from CTEQ, MSTW, and NNPDF groups as the input. The META ensemble includes a central set that reproduces the average of LHC predictions based on the three input PDF ensembles and Hessian eigenvector sets for computing the combined PDF+$\alpha_s$ uncertainty at a common QCD coupling strength of 0.118.


I. INTRODUCTION
Parton distribution functions (PDFs) in the nucleon describe long-distance nonperturbative effects in high-energy QCD processes. Their detailed understanding is vital for the physics program at the Large Hadron Collider (LHC). The first years of LHC operation culminated in the discovery of a Higgs scalar particle [1,2] and tested various Standard Model processes at the highest precision attained to date. No direct indications of physics beyond the Standard Model have been found so far, yet future plans [3] envision reaching much higher accuracy in new physics searches at the raised LHC energy in the next decade. Toward this goal, many theoretical predictions for the LHC must include next-to-next-to-leading order (NNLO) QCD and next-to-leading order (NLO) electroweak contributions, as well as parametrizations of nonperturbative PDFs of comparable accuracy. Modern PDFs are determined by several groups from statistical analyses of multiple hadronic experiments, utilizing diverse methods. In this paper, we discuss estimation of the full uncertainty in QCD predictions based on combining the PDF inputs from several groups. The combination of PDF uncertainties is non-trivial, given the variety of considerations that go into their estimation, and has important implications for those LHC measurements where the PDF uncertainty results in a leading systematic error.
When predicting a new QCD observable X, a user may estimate the full uncertainty due to the PDFs by repeatedly computing X for every provided error PDF set and combining all such predictions according to a prescribed formula. Such estimation may cause a practical bottleneck if X is a complex observable that must be recomputed with hundreds of PDF parametrizations.
A related difficulty concerns the combination of estimated PDF uncertainties on X using PDFs from several ensembles. Various groups follow incongruous procedures when defining their PDF uncertainties, which must be somehow reconciled in order for the combination to proceed. The PDF4LHC study recommended in 2010 [27,28] to find the combined 68% c.l. uncertainty for X using NLO PDFs from three groups (CT, MSTW, and NNPDF) by first calculating the 68% c.l. interval independently for each group, in accordance with the definition that the group had adopted, and then taking the envelope of the three 68% c.l. intervals as the total uncertainty. Later, this procedure was applied to the NNLO PDFs [16]. The resulting estimate of the PDF uncertainty spans all individual confidence intervals obtained under non-identical definitions. In practice, it requires the user to evaluate X for a large number of redundant PDF sets, many of which predict X values that are close to the central prediction and contribute little to the total uncertainty.
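The envelope step of this prescription can be sketched as follows; the `envelope` helper and the interval values are hypothetical, purely for illustration:

```python
def envelope(intervals):
    """Combine per-group confidence intervals (lo, hi) on X into an envelope;
    returns the midpoint of the envelope and its symmetric half-width."""
    lo = min(i[0] for i in intervals)
    hi = max(i[1] for i in intervals)
    return 0.5 * (lo + hi), 0.5 * (hi - lo)

# Hypothetical 68% c.l. intervals on an observable X from three groups:
groups = [(0.97, 1.03), (0.95, 1.04), (0.98, 1.06)]
center, half_width = envelope(groups)
print(center, half_width)   # approximately 1.005 and 0.055
```

By construction the envelope covers every group's interval, which is why it tends to be conservative when the groups agree closely.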
We will now outline an alternative approach called a "meta-analysis" in which the input PDF sets are combined before the observable X is computed. Each member set of the initial PDF ensembles is approximated at an energy scale Q 0 by an intermediary functional form dependent on PDF parameters {a}. The distribution of the member sets over the parameters {a} is analyzed, and a hypersurface spanning parameters of all PDF error sets is determined. Finally, using the shared parametrization form, we can construct a relatively small number of PDF eigenvector sets spanning all 68% c.l. error sets using the Hessian or Monte-Carlo sampling method. When the input PDF ensembles are statistically consistent with one another, the new PDF error sets reproduce the average and total PDF uncertainty of the initial PDF sets, but the number of the independent sets can be considerably reduced in the new ensemble.
The final ensemble of META PDFs constructed this way serves the same purpose as the PDF4LHC recommendation, but combines the PDFs directly in the PDF parameter space while at the same time minimizing numerical computation efforts. In general it is difficult to bring all diverse PDFs into a common framework, for example because distinct heavy-quark schemes are used at momentum scales of a few GeV. But for purposes of the LHC studies, and focusing only on the x and Q ranges corresponding to the LHC kinematics, most of the difficulties can be circumvented by requiring the lowest scale Q0 for the PDF evolution to be above the bottom quark mass, where all PDF sets evolve assuming the same number of active flavors. The effects of choosing different heavy-quark schemes are then taken into account through the boundary conditions at the low scale Q0. A variety of checks is performed to guarantee that the physics inputs and statistical features of the input PDF sets are preserved by the META PDF ensemble.

Table I: Input NNLO PDF ensembles considered in the meta-analysis. In the ABM analysis, 28 eigenvector sets are provided with αs(MZ) varied around a best-fit value of 0.1134. The meta-parametrization is obtained for the ABM set for αs(MZ) = 0.118 and compared to other PDF ensembles at close αs(MZ) values.
In the next sections, we build a combined ensemble of NNLO META PDFs for LHC studies from the CT10, MSTW 2008, and NNPDF2.3 ensembles with 5 active quark flavors. These ensembles use compatible central values of the QCD coupling (αs), similar data sets and procedures, and are in quantitative agreement [16]. The combination of such compatible sets is the simplest one to carry through, but the framework of the meta-analysis does not limit the number of PDF ensembles that can be combined. We also derive and examine the meta-parametrizations for the ABM11 and HERAPDF1.5 sets, which were not included in the PDF4LHC combination. The ABM and HERAPDF ensembles are quite distinct from the three global ensembles, hence we cannot yet include them in the META ensemble using the chosen combination procedure. We comment on the comparison of the ABM and HERAPDF ensembles with the META PDFs in later sections.
The paper is organized as follows. Section II introduces the meta-parametrizations to approximate the input PDFs at energies relevant for the LHC. Section III constructs a META PDF ensemble from CT10, MSTW'08, and NNPDF2.3 NNLO ensembles. In Section IV we briefly discuss phenomenological applications of the META PDFs. Section V contains a short summary.

A. Selection of input PDF ensembles
In a general-purpose PDF ensemble, parton distributions are parametrized by certain functions Φ_f(x, Q) of the partonic momentum fraction x at an initial scale Q0 of order 1 GeV and found numerically at higher scales Q by solving QCD evolution equations. The PDF depends on the flavor f of the probed parton, but we omit the index f in Φ(x, Q) to simplify the notation, assuming that all considerations apply to every parton flavor.
At low scales comparable to the charm and bottom masses, the PDFs from various groups cannot be directly combined because of different implementations of heavy-quark mass contributions. In typical LHC applications, on the other hand, the hard factorization scale Q tends to be well above the bottom quark mass mb ≈ 4.8 GeV. At such scales the PDFs from all groups evolve in the same way according to the DGLAP evolution equations with 5 active quark flavors. At even higher scales, Q² ≫ mb², masses of charm and bottom quarks can be neglected in hard-scattering contributions. We therefore can circumvent explicit treatment of heavy-quark effects by approximating the PDFs Φ(x, Q0) by flexible meta-parametrizations f(x, Q0; {a}) at a scale Q0 above mb. For definiteness we choose Q0 = 8 GeV and assume 9 independent PDF parametrizations at this scale, for g, u, ū, d, d̄, s, s̄, c = c̄, and b = b̄. For simplicity we neglect small differences between c and c̄, and between b and b̄ PDFs.
The input PDF ensembles that will be considered in this analysis are summarized in Table I. They are obtained at the next-to-next-to-leading order (NNLO) in the QCD coupling strength αs. Every ensemble except ABM11 includes 28-100 PDF error sets probing the uncertainty due to the PDF parametrization at a fixed αs(MZ) close to 0.118, as well as additional best-fit PDF parametrizations for alternative αs(MZ) to examine the magnitude of the αs uncertainty. The 0.118 value is compatible with the world-average QCD coupling 0.1184 ± 0.0007 [29] and will be used as the common αs(MZ) value in most comparisons. In the ABM11 ensemble, the error PDFs are given for a lower central value of αs(MZ) = 0.1134 and include the αs(MZ) variation in the covariance matrix. An update of the ABM11 analysis tuned to the LHC data, called ABM12, has been released very recently [8]. The ABM11 ensemble provides only a member set for αs(MZ) = 0.118 from its αs series (but no PDF error sets at this αs). Because the ABM error sets correspond to a lower αs(MZ) value, they should not be compared on the same footing with the error PDFs for a fixed αs(MZ) ≈ 0.118 from the other groups. For this comparison, we will examine the meta-parametrization for the ABM11 member set at αs(MZ) = 0.118, as well as LHC predictions for ABM12 using the central αs(MZ) = 0.1132. Note that, for making theoretical predictions, the ABM group recommends using their PDF sets corresponding to their best-fit αs(MZ) ≈ 0.113.
The dependence of the input PDFs on the momentum fraction x at the scale Q = 8 GeV is illustrated in Fig. 1. The envelopes represent the 90% c.l. PDF uncertainties of CT10. All the other PDFs are normalized to the CT10 central PDF. All PDFs are reasonably close at Q = 8 GeV, especially CT10, MSTW, and NNPDF2.3.

Figure 1: Comparison of the NNLO input PDFs at Q = 8 GeV. Lines with red, blue, green, gray, and magenta colors correspond to the CT10, MSTW'08, NNPDF2.3, HERAPDF1.5, and ABM11 ensembles, respectively.
There exist two common methods for the statistical likelihood analysis of the PDFs: the Hessian method [12,13] and the Monte Carlo (MC) sampling method [14,15]. One major distinction between them is that, generally speaking, the Hessian approach provides 2N_eig PDF error sets (called "eigenvector sets") that only estimate a given confidence interval for the PDFs without specifying the actual probability distribution, while the MC method returns N_rep PDF error sets, called "replicas", that estimate the probability density for the desired QCD observable.
The following simplest statistics can be computed with either method.
1. The central prediction for a QCD observable X, such as a cross section or the PDF itself, is the value $X^H_C$ given by the most probable PDF in the Hessian approach (corresponding to the lowest log-likelihood χ² in the global fit), or the mean value $X^{MC}_C = \langle X \rangle_{replicas}$ of predictions from all the replicas in the MC method.

2. The PDF uncertainty in the Hessian method estimates the boundaries of the confidence interval on X, usually at the 68% or 90% c.l. Both the symmetric [12,13] uncertainty, $\delta^H$, and the asymmetric uncertainties $\delta^H_\pm$ [30] can be estimated:

$\delta^H X = \frac{1}{2} \left[ \sum_{i=1}^{N_{eig}} \left( X_i^+ - X_i^- \right)^2 \right]^{1/2}$,   (1)

$\delta^H_+ X = \left\{ \sum_{i=1}^{N_{eig}} \left[ \max\left( X_i^+ - X_0,\, X_i^- - X_0,\, 0 \right) \right]^2 \right\}^{1/2}$, $\quad \delta^H_- X = \left\{ \sum_{i=1}^{N_{eig}} \left[ \max\left( X_0 - X_i^+,\, X_0 - X_i^-,\, 0 \right) \right]^2 \right\}^{1/2}$,   (2)

where $X_i^\pm$ are the values of X corresponding to the upper and lower boundaries of the confidence interval for the parameter $a_i$, and $X_0$ is the central prediction. These formulas are used by the CTEQ, HERA, and MSTW groups, while the ABM group uses a variant of the Hessian master formula that estimates the symmetric uncertainty from the differences between the predictions based on the central PDF and one error set for each eigenvector direction [7].
In the MC approach, the symmetric 68% c.l. PDF uncertainty is given by the standard deviation $\delta^{MC}$:

$\delta^{MC} X = \left[ \frac{1}{N_{rep}-1} \sum_{k=1}^{N_{rep}} \left( X_k - \langle X \rangle \right)^2 \right]^{1/2}$.   (3)

3. The tolerance (error) ellipse is introduced to study correlations between two observables, e.g., X and Y. In the Hessian approach, we first define the correlation angle Δφ as [12,30,31]

$\cos(\Delta\varphi) = \frac{1}{4\,\delta^H X\,\delta^H Y} \sum_{i=1}^{N_{eig}} \left( X_i^+ - X_i^- \right)\left( Y_i^+ - Y_i^- \right)$.   (4)

The tolerance ellipse is the projection onto the XY plane of the Hessian hypersphere spanned by the eigenvector sets in the linear approximation for the PDF dependence of X and Y. The ellipse can be found from the parametric equations

$X(\theta) = X_0 + \delta^H X \cos\theta$,   (5)
$Y(\theta) = Y_0 + \delta^H Y \cos(\theta + \Delta\varphi)$,   (6)

for 0 ≤ θ < 2π.
In the MC approach we can also define the error ellipse by the equation

$\frac{\left( X - \langle X \rangle \right)^2}{\left( \delta^{MC} X \right)^2} - 2\rho\, \frac{\left( X - \langle X \rangle \right)\left( Y - \langle Y \rangle \right)}{\delta^{MC} X\, \delta^{MC} Y} + \frac{\left( Y - \langle Y \rangle \right)^2}{\left( \delta^{MC} Y \right)^2} = \left( 1 - \rho^2 \right) p_0^2$,   (7)

where $\rho \equiv \mathrm{cov}(X,Y) / \left( \delta^{MC} X\, \delta^{MC} Y \right)$ is the correlation of X and Y, and

$\mathrm{cov}(X,Y) = \frac{1}{N_{rep}} \sum_{k=1}^{N_{rep}} \left( X_k - \langle X \rangle \right)\left( Y_k - \langle Y \rangle \right)$.   (8)

Projections of the ellipse onto the X and Y axes coincide with the confidence intervals on X and Y determined by $p_0$, for instance $p_0$ = 1 (1.64) for the 68 (90)% c.l. Eqs. (5) and (7) delineate the regions of the prescribed confidence when the distributions of X and Y are close to Gaussian. The equivalence of the Hessian and MC error ellipses in the Gaussian case can be demonstrated by identifying $\rho = \cos(\Delta\varphi)$.
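These master formulas can be sketched in a few lines of Python; the function names and the toy inputs are illustrative, assuming Hessian sets given in (+, −) pairs at a common confidence level:

```python
import numpy as np

def hessian_sym(X_plus, X_minus):
    """Symmetric Hessian uncertainty: (1/2) sqrt(sum_i (X_i^+ - X_i^-)^2)."""
    dX = np.asarray(X_plus) - np.asarray(X_minus)
    return 0.5 * np.sqrt(np.sum(dX ** 2))

def hessian_asym(X0, X_plus, X_minus):
    """Asymmetric Hessian uncertainties from the largest upward (downward)
    displacements of X along each eigenvector direction."""
    Xp, Xm = np.asarray(X_plus), np.asarray(X_minus)
    zero = np.zeros_like(Xp)
    up = np.sqrt(np.sum(np.maximum.reduce([Xp - X0, Xm - X0, zero]) ** 2))
    dn = np.sqrt(np.sum(np.maximum.reduce([X0 - Xp, X0 - Xm, zero]) ** 2))
    return up, dn

def mc_std(X_replicas):
    """MC 68% c.l. uncertainty: sample standard deviation over replicas."""
    return np.std(X_replicas, ddof=1)

def correlation_cosine(Xp, Xm, Yp, Ym):
    """cos(delta-phi) between two observables in the Hessian approach."""
    dX = np.asarray(Xp) - np.asarray(Xm)
    dY = np.asarray(Yp) - np.asarray(Ym)
    return np.sum(dX * dY) / (4.0 * hessian_sym(Xp, Xm) * hessian_sym(Yp, Ym))

# Toy example with two eigenvector directions:
print(hessian_sym([1.1, 1.0], [0.9, 1.0]))            # approximately 0.1
print(hessian_asym(1.0, [1.1, 1.0], [0.9, 1.0]))      # approximately (0.1, 0.1)
print(correlation_cosine([1.1, 1.0], [0.9, 1.0],
                         [1.1, 1.0], [0.9, 1.0]))     # approximately 1
```

For fully correlated observables the cosine approaches 1 and the error ellipse degenerates into a line, as in the last call above.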

B. Parametrization and sum rules
The functional forms for Φ(x, Q0) are often assumed to behave as ∼ x^{a2} (1−x)^{a3} in the x → 0 and x → 1 limits, on the basis of Regge and quark counting arguments. They must satisfy the valence quark number and momentum sum rules and predict positive cross sections. The free parameters of Φ(x, Q0) are found from a fit; their total number is chosen to provide very flexible functional forms that agree with the data without overfitting it. NNPDF2.3 uses a neural network with 259 parameters to minimize the parametrization bias, while the CT10 and MSTW 2008 NNLO ensembles keep 25 and 20 free parameters, respectively. Another convenient form utilizes Chebyshev orthogonal polynomials T_i(y(x)) [32-34]. This form is particularly suited for constructing the meta-parametrizations and will be employed in our analysis.
In those x ranges where the experimental data impose tight constraints, the number of effective degrees of freedom is smaller than in the full PDF ensemble, so a smaller number of parameters is needed to approximate the acceptable PDF shapes. We will therefore fit the effective parametrizations to the input PDFs only in the x ranges where sufficient experimental constraints are available. The boundaries of these ranges are taken to satisfy x > 3·10^−5 for all quark flavors; x < 0.8 for the gluon and u, d, c, b quarks; x < 0.4 for ū, d̄ quarks; and x < 0.3 for s, s̄ quarks. By construction, the uncertainty bands of the input PDFs and their meta-parametrizations agree well in the fitted x regions. Outside these ranges, the meta-parametrizations are determined by extrapolation and span a wide uncertainty band, which is close, although not identical, to the original PDF uncertainty (which has a large uncertainty of its own at such x). The PDFs in the outside (unfitted) regions are hardly constrained at the moment and have negligible contributions to most LHC observables even at 14 TeV.
The specific effective form that we choose at the scale Q0 for each flavor is

$f(x, Q_0; \{a\}) = e^{a_1}\, x^{a_2}\, (1-x)^{a_3}\, \exp\left( \sum_{i \geq 4} a_i \left[ T_{i-3}(y(x)) - 1 \right] \right)$.   (9)

It satisfies the above asymptotic behaviors, ∼ x^{a2} at x → 0 and ∼ (1 − x)^{a3} at x → 1, while the detailed shape is regulated by the Chebyshev polynomials T_j(y(x)) with j ≥ 1, bound to lie between −1 and 1 for −1 ≤ y ≤ 1. The positivity condition is automatically satisfied for 0 ≤ x ≤ 1, in accord with the input PDFs, which are positive for all groups at the x and Q0 values we chose. The function y(x) maps the 0 ≤ x ≤ 1 interval onto the −1 ≤ y ≤ 1 interval. The form of y(x) is selected so as to avoid large cancellations between the coefficients of the Chebyshev polynomials in the y → ±1 limits, and hence to reduce the number of T_i(y(x)) needed for approximating Φ(x, Q0). We select y = cos(πx^β) with β = 1/4 as a mapping function that generally requires fewer Chebyshev polynomials than the other tried form, y = 1 − 2x^α with α = 1/2, suggested by [32]. The choice of y(x) is illustrated in Fig. 2, where the logarithmic derivatives d(ln f)/da_i are compared for y = 1 − 2x^{1/2} in the left subfigure and y = cos(πx^{1/4}) in the right subfigure. From Eq. (9), ln f is linear in the a_i parameters. The coefficients d ln f/da_i = T_1(y(x)) − 1, T_2(y(x)) − 1, ... for i ≥ 4 are shown by blue solid lines. With the choice y = cos(πx^{1/4}) in the right inset, the oscillations of the polynomials are stretched across a wider span of x, resulting in a better approximation of the PDF shapes.

Figure 3: Partial integrals over the fitted x ranges for the momentum sum rule, u-valence sum rule, and d-valence sum rule at Q0 = 8 GeV. In each inset, the points from left to right are found from the CT10 (up triangles), MSTW2008 (down triangles), NNPDF2.3 (squares), HERAPDF1.5 (diamonds), and ABM11 (circle) PDF sets.
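A minimal sketch of such a meta-parametrization, assuming the functional form written above with the mapping y = cos(πx^{1/4}); the parameter values are placeholders, not a fit to any ensemble:

```python
import numpy as np

def meta_pdf(x, a):
    """Meta-parametrization of the form discussed in the text:
    f = e^{a1} x^{a2} (1-x)^{a3} exp( sum_{i>=4} a_i [T_{i-3}(y) - 1] ),
    with the mapping y(x) = cos(pi x^{1/4})."""
    x = np.asarray(x, dtype=float)
    y = np.cos(np.pi * x ** 0.25)
    log_f = a[0] + a[1] * np.log(x) + a[2] * np.log(1.0 - x)
    for j, aj in enumerate(a[3:], start=1):
        coeffs = np.zeros(j + 1)
        coeffs[j] = 1.0                      # selects T_j alone in chebval
        log_f += aj * (np.polynomial.chebyshev.chebval(y, coeffs) - 1.0)
    return np.exp(log_f)

# Placeholder parameters (not fitted to any ensemble).  As x -> 0, y -> 1 and
# T_j(y) -> T_j(1) = 1, so every Chebyshev term vanishes and f ~ x^{a2}.
a = [0.5, -0.2, 3.0, 0.1, -0.05]
print(meta_pdf([1e-4, 0.1, 0.5], a))
```

Because ln f is linear in the a_i, the fit of this form to a tabulated PDF is a well-conditioned least-squares problem.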
We evaluate Φ(x_k, Q0) and f(x_k, Q0; {a}) on a lattice of momentum fractions {x_k} for each flavor, and fit f to Φ by minimizing a least-squares metric summed over the lattice points. For each of the 9 flavors, f(x, Q0; a) depends on at least three important parameters, a_{1,2,3}. The number of additional Chebyshev polynomials varies depending on the complexity of the PDF shape and is especially large for the NNPDF parametrizations, which oscillate. By trial and error, we found that all current NNLO PDFs can be approximated by including up to order-5 polynomials T_j for the gluon, u and d quarks, and up to order-4 polynomials for the other flavors. This will be our default choice, rendering a total of 66 PDF parameters. The differences between the input and fitted PDF uncertainty bands are much smaller than the uncertainty bands themselves in this case. The order of the Chebyshev polynomials is still low enough to avoid large cancellations between their coefficients.

We checked that the sum rules are automatically preserved by the approximate parametrizations. The integrals of the meta-parametrizations must obey the momentum and valence sum rules,

$\sum_f \int_0^1 x\, f(x, Q_0)\, dx = 1, \qquad \int_0^1 \left[ u(x, Q_0) - \bar{u}(x, Q_0) \right] dx = 2, \qquad \int_0^1 \left[ d(x, Q_0) - \bar{d}(x, Q_0) \right] dx = 1.$

In Fig. 3, we evaluate partial contributions to these integrals over the fitted x regions at the scale Q0. For all input PDFs, the partial integrals render an average of 0.996 for the momentum sum, 1.99 for the u-valence sum, and 0.997 for the d-valence sum. Thus the sum rules are satisfied by the approximate parametrizations even though they are not enforced by an explicit condition, and even though the integration runs only over the selected partial x regions. In Figs. 4 and 5, we compare the input NNLO PDFs (solid lines) for the 5 PDF ensembles to their respective meta-parametrizations, referred to as "fitted PDFs" (dashed lines). The comparisons are made at scales Q = 8 and 85 GeV after evolution of the fitted PDFs, which will be explained later. Due to limited space, we only show results for the gluon PDF g(x, Q) as a representative example.
Analogous plots for all flavors and PDF ensembles are available at [35].
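To illustrate, the partial sum-rule integral can be checked numerically on a toy valence distribution whose exact normalization is known; the exponents, the grid, and the `u_valence` helper are placeholders, not a fitted meta-parametrization:

```python
import numpy as np
from math import gamma

# Toy u-valence distribution normalized via the Beta function so that its
# number sum rule integral over 0 < x < 1 equals 2 exactly.
a2, a3 = 0.5, 3.0
norm = 2.0 * gamma(a2 + a3 + 2.0) / (gamma(a2 + 1.0) * gamma(a3 + 1.0))
u_valence = lambda x: norm * x ** a2 * (1.0 - x) ** a3

# Partial integral over a fitted range such as 3e-5 < x < 0.8, evaluated by
# the trapezoid rule on a logarithmic x grid:
x = np.logspace(np.log10(3e-5), np.log10(0.8), 2000)
y = u_valence(x)
partial = 0.5 * np.sum((y[1:] + y[:-1]) * (x[1:] - x[:-1]))
print(partial)   # slightly below the exact valence sum of 2
```

The small deficit comes almost entirely from the unfitted x > 0.8 tail, mirroring how the partial integrals in Fig. 3 fall slightly below the exact sum-rule values.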
In each subfigure, we show the ratio of the central fitted PDF to the central input PDF for each ensemble, indicated by the dashed line at the center. The bands of the 90% c.l. PDF uncertainties for the input and fitted PDFs are also shown, normalized to the central input PDF. When an input ensemble provides only 68% c.l. error sets, the 90% c.l. errors are obtained by multiplying the 68% c.l. ones by 1.64. For ABM11, only the best-fit PDF with αs(MZ) = 0.118 is presented, since their PDF error sets correspond to a lower central αs(MZ) and include the αs(MZ) uncertainty in the covariance matrix.
Apart from inconsequential differences at very small or large x values, the agreement between the input and fitted PDFs is good, especially for CT10. For MSTW at Q = 8 GeV and, similarly, for HERAPDF, the fitted PDFs show a systematic upward shift only for x below 10^−4, due to the double-pole structure of the MSTW gluon PDF [5] that is absent in our functional form. It is even more interesting to examine the fit to NNPDF2.3, which originally has a much more flexible parametrization with a total of 259 PDF parameters. From Fig. 4, we find that the fitted PDFs show almost the same statistical features as the original NNPDF, except for the PDF uncertainties in the region with x < 10^−4. The differences between the input and fitted PDF errors are further reduced at higher scales, such as Q = 85 GeV in Fig. 5. Since the PDF uncertainties are themselves not well defined in regions with poor experimental constraints, these small differences outside of the typical x-Q region of the LHC will hardly matter in practice.

In principle, above the initial scale Q0 = 8 GeV, the selected NNLO PDFs and associated QCD coupling strengths αs follow identical evolution equations based on 3-loop QCD splitting functions with 5 active flavors [36,37], since Q0 is above the PDF transition threshold from 4 to 5 flavors used in the variable flavor number schemes. On the other hand, some numerical differences cannot be excluded in the implementation of these equations by the various groups. To benchmark the αs running and PDF evolution above Q0, we compared the tabulated Q dependence of the PDFs and αs(Q) from the five ensembles available in the LHAPDF library [38] to the explicit evolution of their respective input parametrizations done with the computer code HOPPET [39]. The benchmarking comparison is summarized in the appendix.
Acceptable agreement between the tabulated and HOPPET evolution at better than 1-2% is observed for all PDF ensembles and at practically all x and Q.
We can also examine the quality of fits to individual error sets from each group, as is illustrated in Figs. 6-9. For instance, in Fig. 6 the green long-dashed lines indicate the ratios of 50 PDF eigenvector sets in CT10 to the central set. The blue short-dashed lines indicate the ratio of the meta-parametrization to the input parametrization for each eigenvector set. The differences between the input and meta-parametrizations are below 2% in most cases, much smaller than the spread of the error PDFs. A similar level of agreement is reached for the MSTW and HERA PDFs in Figs. 7 and 8, where slightly larger deviations (up to about 5%) are observed at x < 10^−4. Finally, for NNPDF in Fig. 9, the meta-parametrizations tend to have fewer oscillations than the input error PDF sets because of the functional form we chose. The differences between the individual error PDFs of the input and fitted NNPDF ensembles are larger in this case, but cancel well in the full PDF uncertainty, which comes out to be about the same in the input and fitted ensembles, as discussed above.

C. Comparison of meta-PDF parameters from various ensembles
Once all input ensembles are converted into a shared functional form, it becomes possible to directly examine and compare their meta-parameters as an alternative to the usual comparisons of the PDF shapes in x space. The distributions of the meta-parameters follow a variety of trends [35], some of which are illustrated in Figs. 10 and 11. Here we show the probability distributions for select pairs of the meta-parameters from the five fitted ensembles, with f(i) in the axis labels indicating the parameter a_i in the meta-parametrization (9) for flavor f = s̄, ū, etc. The discrete markers and lines indicate the central values and 90% c.l. error ellipses computed according to Eqs. (5)-(7) for each fitted ensemble. Blue, red, green, gray, and magenta colors correspond to the CT, MSTW, NNPDF, HERAPDF, and ABM ensembles, respectively.
In general, the figures indicate good agreement between the recent NNLO PDFs. While the CT, MSTW, and NNPDF parameters are the most compatible among themselves, the ABM and HERAPDF parameters are more likely to deviate from the averages of the three global sets, as is illustrated by some insets in Figs. 10 and 11, as well as later in Figs. 16 and 17. Fig. 10 shows distributions of the parameter a_3 that controls the PDFs in the x → 1 limit. The values of a_3 are well constrained for u and d quarks, but the spread of a_3 for the sea flavors is much wider. [The axis ranges are chosen to coincide in most subfigures to visually compare the spread of the a_3 parameters for various flavors.] For the HERA ensemble, the central a_3 value for u lies outside of the 90% error ellipses of the global ensembles, and similarly, the central a_3 value of ABM for the c quark is barely on the boundary of the 90% ellipses of the global ensembles. It is also remarkable that the a_3 parameters for many flavors are not correlated: the axes of the corresponding error ellipses are oriented along the axes of the plot, rather than along the diagonals. On the other hand, in Fig. 11 we observe strong correlations between the PDF parameters of different flavors that are due to basic physical properties of the PDFs, e.g., close connections between the heavy-quark PDFs and the gluon, between strangeness and anti-strangeness, and between ū and d̄.

D. Predictions based on the input and fitted PDFs for the LHC
As another test of consistency between the input and fitted PDF ensembles, we compared their predictions for several common observables at the LHC. We examined a group of observables that probe various combinations of PDFs in typical LHC measurements, as discussed in [4]. They include the total cross sections for W and Z boson production (with a decay into a lepton pair), computed at NNLO with Vrap [40]; SM Higgs boson production via gg and bb̄ fusion, computed at NNLO with iHixs1.3 [41]; tt̄ production with m_t = 173.5 GeV at partial NNLO [42-45], including resummed contributions [46-48], implemented in TOP++1.5; as well as differential distributions of single-inclusive jet production at NLO, computed with FastNLO2.0 [49] for ATLAS kinematical bins [50].
We compared the corresponding central predictions and their PDF uncertainties for the input and fitted CT10, MSTW'08, and NNPDF2.3 ensembles. The agreement between the input and fitted PDFs (including for the αs series) is nearly perfect for CT10 and NNPDF2.3. Minor differences can be noticed upon closer examination, but they are always smaller than the PDF uncertainties. For some MSTW2008 predictions, there is an overall shift (< 1%) due to the small difference between their tabulated evolution and the HOPPET numerical DGLAP evolution mentioned in the appendix.
The 90% c.l. error ellipses, computed according to Eqs. (5)-(7) for the above LHC cross sections, are plotted in Fig. 15. As before, the solid (dotted) ellipses represent predictions based on the input (fitted) PDFs for CT10 (in blue), MSTW (in red), and NNPDF (in green). The locations and shapes of the original error ellipses are well preserved by the meta-parametrizations. From these comparisons we conclude that the chosen meta-parametrization form with 66 parameters in Eq. (9) reproduces well the central LHC cross sections, their PDF uncertainties, and the correlated flavor dependence of all input PDF sets. We can now proceed to the combination of the error PDFs into a META ensemble.

† To compute the αs uncertainty, we also fit the αs PDF series of all the groups.

Figure 11: Correlations between representative pairs of meta-PDF parameters, for the same PDF ensembles as in Fig. 10.

A. Overview
In the previous section, we have shown that, by using a shared PDF parametrization form, we can closely approximate the NNLO error PDFs from all groups at the initial scale Q0 = 8 GeV. Moreover, if the numerical evolution by HOPPET is adopted, the fitted PDFs also agree with the input PDFs at all scales above Q0.
Using these meta-parametrizations, we can create a META ensemble of PDFs from all groups that would serve similar purposes as the PDF4LHC recommendation [27]. For the combination to proceed, the PDFs must use compatible values of α s , heavy-quark masses, number of active flavors, and other physics inputs. Furthermore, it is desirable to have about the same number of experimental points fitted in each input ensemble to be allowed to use unweighted averages when combining them, as we do below.
Thus we choose to combine the meta-parametrizations from CT10, MSTW2008, and NNPDF2.3, with αs(MZ) = 0.118, 0.11707, and 0.118, respectively, which satisfy these requirements. In the META PDF ensemble we will assume a common αs(MZ) = 0.118. The available error sets from MSTW correspond to a lower αs(MZ) = 0.11707, but MSTW also provides an additional central set corresponding to αs(MZ) = 0.118. We estimate the meta-parameters of all MSTW error PDF sets shifted to 0.118 by adding the differences between the parameters of the MSTW central sets for 0.118 and 0.11707 to the parameters of the MSTW error PDF sets for 0.11707. In the future, we hope that the different groups will make their PDF ensembles available for the same αs(MZ) to facilitate their combination into META PDFs.
Before proceeding, we should warn that the definitions of the PDF uncertainties, as well as the procedures for implementing them, differ significantly from group to group. It is beyond the scope of our study to reconcile all pertinent differences among the individual input PDF sets. For now, we will assume that the PDF error sets that nominally correspond to the same confidence level can be combined.
In the Hessian approach [12,13] adopted by CT10 and MSTW, a small number of independent PDF eigenvector sets spans only the boundary of the 68% or 90% c.l. hypervolume in the respective PDF parameter space. In the Monte-Carlo (MC) approach of NNPDF [15], the same hypervolume is populated by discrete unweighted PDF replicas that can be used to reconstruct the probability density in the space of PDF parameters. To combine the Hessian and MC error sets, we first generate MC replicas from the CT10 and MSTW2008 meta-sets using a prescription that is close to the one in [51]. The distribution of replicas from all ensembles approximates the a priori unknown probability distribution in the space of 66 meta-PDF parameters. With a sufficient number of replicas generated from the original sets, this probability distribution can be reconstructed with increasing accuracy.
We will follow the simplistic, but not unreasonable, assumption that the parameters of the META ensemble roughly obey a 66-dimensional Gaussian distribution. We will also symmetrize the PDF errors and include all PDF ensembles with the same weights to simplify many considerations. The combined META PDF uncertainties can be estimated using either the Hessian or MC approach. The Hessian approximation is particularly attractive, as we found that only about 100 eigenvector sets are required to estimate the combined confidence levels, regardless of the number of input error sets that we tried. Furthermore, if we are interested in the uncertainty of a specific observable, the number of needed Hessian META sets can be reduced to a few by the procedure dubbed "data set diagonalization" [52], a possibility that we plan to investigate in the future.

B. Generation of MC replicas from Hessian ensembles
The first step in the derivation of the META ensemble consists in generating the MC replicas for the CT and MSTW ensembles. Initially each of them contains 2N_eig Hessian eigenvector sets distributed on the boundary of their respective hypervolume, corresponding to 90% c.l. for both CT and MSTW. The MC replicas for each ensemble are produced by generating groups of N_eig random numbers R_j, each sampled independently from a standard normal distribution in the interval −∞ < R_j < +∞. For every group of R_j, we construct a pair of MC replicas f_MC(x, Q0) for each flavor by

$f^{MC}_{\pm}(x, Q_0) = f_0(x, Q_0) \pm \frac{1}{2r} \sum_{j=1}^{N_{eig}} R_j \left[ f_{j,+}(x, Q_0) - f_{j,-}(x, Q_0) \right]$,

where f_0(x, Q0) indicates the central PDF, and f_{j,±}(x, Q0) are the two Hessian error sets associated with the positive and negative displacements along the j-th eigenvector. The factor r = 1.64 converts from the 90% to the 68% c.l., assuming a Gaussian distribution of the PDF parameters. ‡ This formula symmetrizes the displacements of the MC replicas from the start, as indicated by "±". It is equivalent to the generation of MC replicas in [51], but operates with symmetrically distributed replicas. In this way we can convert any Hessian eigenvector ensemble into an arbitrary number N_rep of MC replicas representing the probability density in {a} space. The underlying assumption (which holds well in the {x, Q} regions with sufficient kinematical constraints) is that the input probability distributions are close to Gaussian, not too asymmetric, and compatible with one another. By trial and error, we found that N_rep = 100 replicas per ensemble is sufficient for the combination of errors. This is the number of MC replicas that was included from each PDF ensemble.
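A sketch of this construction, applied to toy parameter vectors rather than full PDF sets (the eigenvector values below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def hessian_to_replicas(f0, f_plus, f_minus, n_rep, r=1.64):
    """Generate n_rep MC replicas (in symmetric +/- pairs) from 2*N_eig
    Hessian sets given at 90% c.l.; r rescales 90% -> 68% c.l."""
    f0, f_plus, f_minus = map(np.asarray, (f0, f_plus, f_minus))
    replicas = []
    for _ in range(n_rep // 2):
        R = rng.standard_normal(len(f_plus))             # one R_j per eigenvector
        shift = 0.5 * (R[:, None] * (f_plus - f_minus)).sum(axis=0) / r
        replicas += [f0 + shift, f0 - shift]             # symmetric pair
    return np.array(replicas)

f0 = np.array([1.0, 2.0])
fp = np.array([[1.2, 2.0], [1.0, 2.3]])   # toy "+" sets, two eigenvectors
fm = np.array([[0.8, 2.0], [1.0, 1.7]])   # toy "-" sets
reps = hessian_to_replicas(f0, fp, fm, n_rep=10000)
print(reps.mean(axis=0))   # equals f0 exactly, by the pairwise symmetry
print(reps.std(axis=0))    # close to the 68% c.l. widths [0.2/1.64, 0.3/1.64]
```

Because each replica pair is symmetric about f_0, the mean of the replicas reproduces the central set exactly, while their spread reproduces the Hessian uncertainties rescaled to 68% c.l.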

C. Constructing META PDFs
Given $N_{\rm rep}$ values $a_i^{(k)}$ of a parameter $a_i$ on a group $g$ of PDF sets, we can find the expectation value $\langle a_i \rangle_g$, sample covariance ${\rm cov}(a_i,a_j)_g$, and standard deviation $(\delta a_i)_g$ of $a_i$ on $g$:
$$ \langle a_i \rangle_g = \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} a_i^{(k)}, \qquad (16) $$
$$ {\rm cov}(a_i,a_j)_g = \frac{1}{N_{\rm rep}-1} \sum_{k=1}^{N_{\rm rep}} \left( a_i^{(k)} - \langle a_i \rangle_g \right) \left( a_j^{(k)} - \langle a_j \rangle_g \right), \qquad (17) $$
$$ (\delta a_i)_g = \sqrt{{\rm cov}(a_i,a_i)_g}. \qquad (18) $$
When $N_g$ groups of PDF replicas ($N_g = 3$: CT10, MSTW'08, and NNPDF2.3) are combined into a META ensemble, the expectation value $\langle a \rangle_{\rm META}$ of $a$ on the whole ensemble of $N_{\rm rep} \cdot N_g$ replicas can be found by averaging the expectation values on each group:
$$ \langle a \rangle_{\rm META} = \frac{1}{N_g} \sum_{g=1}^{N_g} \langle a \rangle_g. \qquad (19) $$
The summation over $g$ runs over all PDF groups. The sample covariance and standard deviation on the META ensemble are similarly related to their values on the individual groups by (assuming $N_{\rm rep} \gg 1$)
$$ {\rm cov}(a_i,a_j)_{\rm META} = \frac{1}{N_g} \sum_{g=1}^{N_g} \left[ {\rm cov}(a_i,a_j)_g + \langle a_i \rangle_g \langle a_j \rangle_g \right] - \langle a_i \rangle_{\rm META} \langle a_j \rangle_{\rm META} \qquad (20) $$
and
$$ (\delta a)_{\rm META} = \sqrt{{\rm cov}(a,a)_{\rm META}}. \qquad (21) $$
Given these equations, the expectation value and standard deviation of any parameter or observable $a$ on the META ensemble can be derived either by averaging over the full META ensemble according to Eqs. (16)-(18) (and identifying the group $g$ with the whole META ensemble), or by averaging the expectation values and standard deviations of the individual input groups with the help of Eqs. (19) and (21). However, since much of the information contained in the 300 replicas is redundant, we can reduce the number of needed error PDFs by constructing a smaller Hessian eigenvector ensemble that is still sufficient for estimating the confidence intervals.
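As a numerical cross-check of Eqs. (19)-(21), the sketch below (with a hypothetical helper `combine_groups`, using the $1/N$-normalized sample covariance so that the relation is exact for equal group sizes) combines per-group statistics and reproduces the statistics of the pooled replica sample:

```python
import numpy as np

def combine_groups(groups):
    """Combine per-group replica samples (a list of arrays, each of
    shape (n_rep, n_par)) with equal weights, as in Eqs. (19)-(21)."""
    means = [g.mean(axis=0) for g in groups]
    covs = [np.cov(g, rowvar=False, bias=True) for g in groups]  # 1/N normalization
    mean_meta = np.mean(means, axis=0)
    # average of cov_g + <a>_g <a>_g^T over groups, minus the meta outer product
    second = np.mean([c + np.outer(m, m) for c, m in zip(covs, means)], axis=0)
    cov_meta = second - np.outer(mean_meta, mean_meta)
    return mean_meta, cov_meta
```

With equal numbers of replicas per group, the result coincides with the mean and covariance computed directly on all replicas at once, which is the content of Eqs. (19)-(21).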
Toward this goal, we select a new basis
$$ a'_i = \sum_{j} O_{ji} \left( a_j - \langle a_j \rangle \right). $$
Each new parameter $a'_i$ follows an independent one-dimensional Gaussian distribution with expectation value 0 and standard deviation $\sqrt{\lambda_i}$. Here $O_{ij}$ is an orthogonal matrix, and $\lambda_i$ are the eigenvalues of the covariance matrix. The eigenvalues $\lambda_i$ in the PDF analysis span many orders of magnitude [12]. The directions in $\{a'\}$ space associated with very small $\lambda_i$ make negligible contributions to the PDF uncertainties, so the corresponding parameters $a'_i$ can be fixed at their central values, thus reducing the number of independent eigenvectors. For each remaining eigenvector direction $a'_i$, we construct two meta-PDF sets corresponding to the lower and upper boundaries of the respective 68% c.l. interval.
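A minimal sketch of this diagonalization step, assuming the covariance matrix of Eq. (20) is available as a NumPy array; the name `hessian_eigensets` and the truncation threshold are illustrative:

```python
import numpy as np

def hessian_eigensets(mean, cov, keep=None, tol=1e-12):
    """Diagonalize the meta-parameter covariance and build Hessian
    displacement pairs mean +/- sqrt(lambda_i) * O[:, i].

    Directions with lambda_i below `tol` times the largest eigenvalue
    are dropped, reducing the number of independent eigenvectors.
    """
    lam, O = np.linalg.eigh(cov)           # ascending eigenvalues, orthonormal columns
    order = np.argsort(lam)[::-1]
    lam, O = lam[order], O[:, order]
    n = keep if keep is not None else int(np.sum(lam > tol * lam[0]))
    sets = []
    for i in range(n):
        step = np.sqrt(lam[i]) * O[:, i]   # one-sigma displacement along eigenvector i
        sets.append((mean + step, mean - step))
    return sets
```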
In the end, we obtain a set of Hessian eigenvector PDFs that spans the 68% c.l. hypersurface on the ensemble of META PDF replicas. We can repeat the same procedure to construct the 90% c.l. META PDF replicas.
In the current analysis, we found that a Hessian set with 50 eigenvectors (or 100 eigenvector sets) is sufficient for estimating the combined PDF uncertainty. The PDF parameter associated with each eigenvector direction follows an independent Gaussian distribution, so we can compute the PDF uncertainties and correlation angles for arbitrary observables according to the master formulas of the Hessian formalism:
$$ \delta^{H}_{\pm} X = \sqrt{ \sum_{i=1}^{N_{\rm eig}} \left[ \max\left( \pm\left(X_i^{+} - X_0\right),\ \pm\left(X_i^{-} - X_0\right),\ 0 \right) \right]^2 } \qquad (22) $$
and
$$ \cos\varphi = \frac{1}{4\, \delta^H X\, \delta^H Y} \sum_{i=1}^{N_{\rm eig}} \left( X_i^{+} - X_i^{-} \right) \left( Y_i^{+} - Y_i^{-} \right), \qquad \delta^H X = \frac{1}{2} \sqrt{ \sum_{i=1}^{N_{\rm eig}} \left( X_i^{+} - X_i^{-} \right)^2 }, \qquad (23) $$
where $X_0$ is the central prediction and $X_i^{\pm}$ are the predictions from the two error sets along the $i$-th eigenvector direction. The 68% and 90% c.l. hypervolumes of the META ensemble enclose the central predictions of all PDF groups, as is illustrated on the example of some gluon and u-quark PDF parameters in Fig. 16.
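The master formulas translate directly into code; `hessian_uncertainty` and `correlation_cosine` below are illustrative names, with `x_plus[i]` and `x_minus[i]` holding the predictions of the two error sets along eigenvector $i$:

```python
import numpy as np

def hessian_uncertainty(x0, x_plus, x_minus):
    """Asymmetric Hessian master formula, Eq. (22): returns (delta_plus,
    delta_minus) for a central prediction x0 and per-eigenvector pairs."""
    zero = np.zeros_like(x_plus)
    up = np.maximum.reduce([x_plus - x0, x_minus - x0, zero])
    dn = np.maximum.reduce([x0 - x_plus, x0 - x_minus, zero])
    return np.sqrt(np.sum(up**2)), np.sqrt(np.sum(dn**2))

def correlation_cosine(x_plus, x_minus, y_plus, y_minus):
    """Correlation angle between two observables X and Y, Eq. (23)."""
    dx = x_plus - x_minus
    dy = y_plus - y_minus
    sym = lambda d: 0.5 * np.sqrt(np.sum(d**2))  # symmetric Hessian error
    return float(dx @ dy) / (4.0 * sym(dx) * sym(dy))
```

A correlation cosine near $+1$ ($-1$) signals that the PDF-induced shifts of the two observables are (anti-)correlated, which underlies the tolerance ellipses discussed below.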

D. Combination with the αs uncertainty
To estimate the PDF+$\alpha_s$ uncertainty using the META PDF ensemble, we provide a series of additional meta-sets with $\alpha_s(M_Z)$ varied around the central value of 0.118, obtained by taking the average of the corresponding PDFs from the three groups. Depending on the PDF analysis, $\alpha_s(M_Z)$ is either treated as an external parameter determined by its world average (as in CT10 [53]) or fitted together with the PDFs (as in MSTW [54] or NNPDF [55]). In the latter case, the best-fit value of $\alpha_s(M_Z)$ from the PDF fit at NNLO (0.1171 for MSTW, 0.1173 for NNPDF, and 0.1132 for ABM12) tends to be lower than the latest world average of $0.1184 \pm 0.0007$ at 68% c.l. [29]. The PDF4LHC recommendation [27,28] suggests a slightly larger error of 0.0012 at 68% c.l. than in the world average. Note that the PDF4LHC recommendation also takes different central values of $\alpha_s(M_Z)$ for the different PDF ensembles, which further enlarges the $\alpha_s$ uncertainty.
Given these options, one can come up with several ways for calculating the combined PDF+α s uncertainty, depending on the interpretation and statistical confidence assigned to the α s input. The combined uncertainty may vary significantly depending on the prescription, especially for the QCD observables that are directly sensitive to α s or the gluon PDF. Here we suggest several practical possibilities for estimating the PDF+α s uncertainty using the META PDFs.
1. We may take the world-average $\alpha_s$ value as an external input for the PDF analysis and assume that the $\alpha_s$ uncertainty is decoupled from the systematic errors in the fit due to missing higher-order effects. In this approach, the 68% c.l. error on $\alpha_s(M_Z)$ is given by the world-average error of ±0.0007. We vary $\alpha_s$ in this range when estimating the $\alpha_s$ uncertainty of a QCD observable and add it to the PDF uncertainty in quadrature, which produces the PDF+$\alpha_s$ uncertainty with full correlations, as has been discussed by the CTEQ group [53].
2. An alternative prescription is to replace the input $\alpha_s(M_Z)$ error by 0.0012 at 68% c.l., as in the PDF4LHC recommendation [27,28], and add the $\alpha_s$ and PDF uncertainties in quadrature. The resulting PDF+$\alpha_s$ uncertainty will be smaller than the one from the PDF4LHC recommendation, as a single value of 0.118, rather than the PDF4LHC envelope of central $\alpha_s$ values from all ensembles, is used in the META case.
3. Lastly, we can enlarge the input $\alpha_s(M_Z)$ error to 0.002 at 68% c.l. and add the $\alpha_s$ and PDF uncertainties linearly. This prescription fully covers the $\alpha_s$ values preferred by the different groups and is numerically close to the PDF+$\alpha_s$ uncertainty based on the PDF4LHC envelope.
The first two conventions predict smaller $\alpha_s$ and PDF+$\alpha_s$ uncertainties than the envelope prescription used by the 2010 PDF4LHC study [27,28] and the 2012 PDF benchmarking study [16]. In these cases the PDF uncertainty always dominates the combined uncertainty. The third convention turns out to be numerically close to the envelope prescription. In the remaining comparisons, the PDF+$\alpha_s$ uncertainties are estimated based on the third convention.
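The three prescriptions differ only in the assumed $\alpha_s(M_Z)$ error and in whether the two uncertainties are added in quadrature or linearly; a schematic helper (the name `pdf_alphas_uncertainty` and argument convention are hypothetical) might look like:

```python
import numpy as np

def pdf_alphas_uncertainty(delta_pdf, delta_alphas, prescription=3):
    """Combine the PDF and alpha_s uncertainties of an observable.

    delta_pdf    : PDF uncertainty at alpha_s(MZ) = 0.118
    delta_alphas : half the shift of the observable when alpha_s(MZ) is
                   varied by its assumed 68% c.l. error (0.0007, 0.0012,
                   or 0.002, depending on the prescription)
    Prescriptions 1 and 2 add in quadrature; prescription 3 adds linearly.
    """
    if prescription in (1, 2):
        return np.hypot(delta_pdf, delta_alphas)  # quadrature sum
    return delta_pdf + delta_alphas               # linear sum
```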

IV. PHENOMENOLOGICAL APPLICATIONS
The META PDF ensemble incorporates measurements from HERA, fixed-target experiments, and the Tevatron. Although the NNPDF2.3 ensemble used here includes some of the early LHC data in the fit, such as jet production cross sections [50], W and Z rapidity distributions [56,57], and the W electron charge asymmetry [58], we do not expect these relatively weak LHC constraints to strongly impact the behavior of the META PDFs. Thus the META ensemble, based essentially on non-LHC measurements, can be used for any LHC predictions.
Before discussing the LHC applications, we would like to review two important features of the META PDFs. First, as mentioned earlier, the sum rule constraints are not enforced directly in the meta-fit. After obtaining our final META PDFs, we plot the partial integrals for the momentum and valence sums in Fig. 20, in the same fashion as in Fig. 3. The sum rules are obeyed by the META PDFs at an accuracy similar to, or better than, that of the individual PDF ensembles.
The Hessian method of error propagation relies on the assumption of an approximately linear dependence of QCD observables on the PDF parameters in the vicinity of the central fit. Uncertainties of the META PDF parameters are symmetric by definition, but the uncertainties of the PDF functional forms generally are not. We verified the validity of the linear approximation by first checking the asymmetry $|\delta^H_+ - \delta^H_-|/(\delta^H_+ + \delta^H_-)$ of the uncertainties on the PDFs and PDF luminosities, where $\delta^H_\pm$ indicate the asymmetric errors according to Eq. (22). For example, in Fig. 21 we plot the asymmetry of the PDF errors of different flavors, which estimates the nonlinear behavior of the PDFs due to both the parametrization and the PDF evolution. The asymmetries observed in the figure are very small, at the level of a few percent in the region with $x < 0.1$. They can exceed 15% in the large-$x$ region, where the PDFs are not well constrained. As $Q$ increases from 8 to 10000 GeV, the asymmetry slowly spreads from large toward smaller $x$ as a consequence of the PDF evolution. In Fig. 22 we further show the asymmetry of the PDF uncertainty of the parton luminosities in different kinematic regions at the LHC 8 TeV [59], as functions of the invariant mass and rapidity of the produced final state. These asymmetries are also small, except for very large invariant masses or rapidities that correspond to $x$ values close to 1. By investigating the asymmetry of the PDFs as well as of sample LHC cross sections, we conclude that the linear dependence of most LHC observables on small variations of the PDF parameters is well satisfied.
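The asymmetry diagnostic used above is a one-line computation; `error_asymmetry` is an illustrative name:

```python
import numpy as np

def error_asymmetry(delta_plus, delta_minus):
    """Asymmetry |d+ - d-| / (d+ + d-) of the Hessian uncertainties.
    Values near 0 indicate that the linear (Gaussian) approximation holds;
    values approaching 1 signal strongly asymmetric errors."""
    dp, dm = np.asarray(delta_plus), np.asarray(delta_minus)
    return np.abs(dp - dm) / (dp + dm)
```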
Using the eigenvector sets and $\alpha_s$ series of the META ensemble, we can calculate central predictions and uncertainties of cross sections at the LHC and compare them with the results from the individual groups. As an example, Figs. 23-25 show NNLO predictions for $W$, $Z$, SM Higgs, and $t\bar{t}$ production, and NLO predictions for inclusive jet production, using the same settings as in Figs. 12-14. The PDF or PDF+$\alpha_s$ uncertainties of the META ensemble are slightly smaller than those of the envelope prescription in the benchmarking study [16]. For instance, the NNLO Higgs cross section through gluon fusion at the LHC 8 TeV is $18.75 \pm 1.24$ pb according to the envelope prescription, while it is $18.78 \pm 1.15$ pb for the META PDFs. Fig. 26 shows the 90% c.l. tolerance ellipses for pairs of cross sections computed using either the META PDFs or the original PDF ensembles with $\alpha_s(M_Z) = 0.118$. The ellipses are computed according to Eq. (5) and correspond to probability density contours of 90%. The ellipses of the META PDFs preserve the PDF-induced (anti-)correlations observed in the three input PDF ensembles.

V. SUMMARY
Faithful estimation of the PDF dependence of theoretical predictions is practically relevant and conceptually challenging. The diversity of PDF ensembles provided by several groups reflects a variety of factors impacting the PDFs in modern QCD calculations. Comparisons of PDFs must deal with their miscellaneous assumptions and disparate formats, as well as with the large numbers of member sets included in the PDF ensembles. To estimate the net uncertainty in QCD cross sections due to the PDF inputs, theoretical predictions are currently calculated on a process-by-process basis for multiple PDF sets and combined at the final stage according to a certain prescription, e.g., the PDF4LHC recommendation. Such an approach, while providing a reasonable uncertainty estimate, handles the PDFs inefficiently and requires lengthy computations. Much of the information contained in the input PDF sets is discarded in the final combination. Development of fast interpolation interfaces for (N)NLO cross sections, such as ApplGrid [60] or FastNLO [49,61], speeds up the computation but does not eliminate the key hurdle of inefficient information processing, especially when simulations involve many scattering processes.
The meta-analysis described in this study follows an alternative approach in which a variety of PDFs are combined into a single PDF ensemble before the individual theoretical predictions are computed. Such an ensemble is suitable for most LHC applications and consists of a relatively small number of error PDFs that reproduce the total PDF+$\alpha_s$ uncertainty provided by an arbitrary number of input PDF ensembles. In the specific example that we constructed, the META ensemble comprises 100 Hessian eigenvector sets for evaluating the 68% c.l. combined uncertainty of CT10, MSTW'2008, and NNPDF2.3 at NNLO. The three ensembles are selected because they can be combined with minimal obstacles. They correspond to close values of the QCD coupling strength $\alpha_s(M_Z)$ (compatible with the world average of 0.118) and, although they follow different heavy-quark schemes, these differences can be absorbed into their parametrizations at the initial evolution scale of the META PDFs, chosen to be above the b quark mass (at 8 GeV).
Each error set of the input ensembles is approximated by an auxiliary functional form (a meta-parametrization) in the kinematical range typical for LHC studies, taken to be $x > 10^{-5}$ and $8 < Q < 10000$ GeV. As the meta-parametrizations of all error sets share the same functional form, the distributions of their free parameters (66 in total) can be easily compared. We determine these parameters at a common coupling $\alpha_s(M_Z)$ of 0.118. Small differences in the input $\alpha_s$ values are compensated for by minor shifts of the respective meta-parameters, found using the $\alpha_s$ series of the input PDF sets.
We then combine the meta-ensembles from CT10, MSTW'2008, and NNPDF2.3 by generating Monte Carlo replicas for each Hessian input ensemble and diagonalizing the covariance matrix constructed from the Monte Carlo replicas. The final result consists of a central META PDF set, equivalent to the unweighted average of the three input central sets, and 100 eigenvector META sets obtained in the Hessian method [12,13]. The PDF uncertainty for the META ensemble is computed according to the master formula in Eq. (22). We also provide an $\alpha_s$ series of the META PDFs for calculating the PDF+$\alpha_s$ uncertainty, including the correlations, by adding the PDF and $\alpha_s$ uncertainties in quadrature [53]. The META PDFs are stored in a tabulated format, together with an interface for their numerical interpolation, at [35].
The META ensemble predicts about the same PDF+α s uncertainty in the key LHC observables as the PDF4LHC recommendation, and in addition provides a natural way to estimate correlations of different observables. In comparison to massive computation efforts characteristic of LHC simulations, the construction of the META ensemble is relatively simple and was carried out entirely by using standard functions for statistics and data analysis in Mathematica 8.
Our results present an initial attempt to combine LHC predictions from different PDF groups independently of the processes studied. We note several avenues for further developments of this approach. The simple combination procedure that was tried (equivalent to using an unweighted average for finding expectation values) would be inappropriate for including too disparate PDF ensembles or for non-Gaussian parameter distributions. In our case, the three input ensembles are sufficiently compatible for the unweighted combination to be justified. For HERAPDF1.5, a part of the difference with the global sets can be attributed to a larger PDF uncertainty on some combinations of the HERA PDF parameters, given their smaller experimental data set. Such an ensemble could be added by averaging the META parameters with statistical weights that account for the different sizes of the experimental data samples.
The error PDFs of future META ensembles can be obtained using either the Monte Carlo (MC) sampling or Hessian methods. In the MC approach, probability distributions of arbitrary complexity can in principle be described. Constraints from new experiments can be imposed using the PDF reweighting technique [14,62,63] in either the MC or Hessian approach. The method of data set diagonalization [52] can be applied in the case of the Hessian representation to find a small number of eigenvector sets that dominate the PDF uncertainty of given observables. While the 100 META eigenvector sets are designated for a wide range of LHC theoretical predictions, a much smaller number of eigenvector sets is expected to dominate the uncertainty in specific processes, such as massive electroweak boson production. The framework of the meta-analysis is well-suited for exploring these possibilities.

Appendix
In this appendix, we verify that the dependence of $\alpha_s(Q)$ and the PDFs $\Phi(x,Q)$ on the scale $Q$ in the input ensembles is compatible and allows their combination. We compared the tabulated evolution of $\alpha_s(Q)$ and $\Phi(x,Q)$ for five ensembles from the LHAPDF library [38] with the numerical evolution by the HOPPET program [39] from the initial scale $Q_0 = 8$ GeV up to 10000 GeV. In all cases, we assume 3 QCD loops and 5 flavors in the QCD beta function and splitting kernels. Fig. 27 shows the ratio of $\alpha_s(Q)$ obtained from LHAPDF to the corresponding values from HOPPET, using the original $\alpha_s(M_Z)$ values of the input ensembles listed in Table I. The agreement of the tabulated and evolved $\alpha_s(Q)$ is excellent for all ensembles except HERAPDF, where a kink at the top quark mass occurs because HERAPDF uses the 6-flavor $\alpha_s(Q)$ above the top quark threshold.
The PDF evolution is examined by computing ratios of $\Phi(x,Q)$ returned by LHAPDF and HOPPET at several $Q$ values above 8 GeV. The $x$-dependent ratios at $Q = 500$ GeV shown in Fig. 28 are representative of the general trend at other $Q$ values. We observe excellent agreement, at the level of 0.1%, between LHAPDF and HOPPET for both CT10 and NNPDF almost across the entire shown region of $10^{-5} \leq x < 1$, except for $x \gtrsim 0.5$, where differences in the $Q$ dependence can reach a few percent for all sea PDFs. In the case of ABM, fluctuations for all sea PDFs are observed at somewhat smaller $x$ values of about 0.3, but again they occur in the region where the sea PDFs are small, and the accuracy of the LHAPDF interpolation is sufficient compared to the PDF uncertainties. The tabulated and numerical evolution of the quark PDFs from MSTW shows a systematic discrepancy reaching 1% at $x < 0.05$, while the tabulated and numerical evolution of the MSTW gluon is in perfect agreement. The tabulated and evolved HERA PDFs differ by $\approx 0.5\%$ across most of the $x$ range, which is expected at $Q = 500$ GeV given the 5-flavor evolution in HOPPET and the 6-flavor evolution of HERAPDF's tabulated $\alpha_s(Q)$.

Figure 28: Comparison of the PDF evolution from different best-fit PDF sets with HOPPET for Q = 500 GeV.