PDF reweighting in the Hessian matrix approach

We introduce the Hessian reweighting of parton distribution functions (PDFs). Similarly to the better-known Bayesian methods, its purpose is to address the compatibility of new data and the quantitative modifications they induce within an existing set of PDFs. By construction, the method discussed here applies to PDF fits whose error analysis was carried out in the Hessian framework with a non-zero tolerance $\Delta\chi^2$. The principle is validated by considering a simple, transparent example. We are also able to establish an agreement with the Bayesian technique provided that the tolerance criterion is appropriately accounted for and that a purely exponential Bayesian likelihood is assumed. As a practical example, we discuss inclusive jet production at the LHC.


Introduction
A large part of present-day high-energy collider physics depends, in one way or another, on the knowledge of the parton distribution functions (PDFs). The use of PDFs rests on a cornerstone theorem of Quantum Chromodynamics (QCD), collinear factorization [1,2]. Although it can be formally proven only in the simplest cases, it is often assumed to hold in general. Ultimately, it is the agreement with the experimental data that decides whether such an assumption is valid.
The PDFs are traditionally determined in global analyses [3][4][5][6] by finding the parameters that optimally reproduce a variety of experimental data. This is a complex procedure requiring the ability to efficiently solve the parton evolution equations and to calculate higher-order QCD cross sections. Nontrivial issues are also the actual way of finding the best fit and of quantifying its uncertainties. Although various PDF parametrizations are publicly available to a general user, for a long time it was difficult for e.g. an experimental collaboration to understand what the implications of their measurements would be in the context of a global PDF fit. For example, although a given measurement may be known to be most sensitive to, say, the up-quark distribution, other data in a global fit may already constrain the up quarks more stringently, and the real advantage of the measurement may come from a sub-leading contribution, say, from the strange quarks.
The Bayesian reweighting technique, first introduced in [7] and later on elaborated by the Neural Network PDF (NNPDF) collaboration [8,9], provides a way of addressing the consistency and quantitative effects of new experimental evidence in terms of PDF fits.
In essence, the underlying probability distribution, represented in the NNPDF philosophy [6] by an ensemble (∼ 1000) of PDF replicas, is updated by assigning each replica a weight based on how well it reproduces the new data. This method has become an increasingly popular way to estimate the effects of e.g. new LHC measurements [10][11][12][13][14][15][16]. However, the majority of the existing PDF fits use a rather different way of quantifying the PDFs and their uncertainties. Along with the best fit found by $\chi^2$ minimization, it is customary to provide a collection (∼ 50) of Hessian error sets [17] that quantify the neighborhood of the central fit within a certain confidence criterion $\Delta\chi^2$. An extension of the Bayesian reweighting technique to this particular case was suggested in [18], and has thereafter been used on some occasions [11,19,20]. However, a recent study [21] revealed some imperfections when comparing the results from reweighting to those obtained by a direct fit done in the usual way.
Here, we take a different strategy. Based on the ideas presented in Ref. [22], our principal goal is to show how a general user can directly study the consistency and consequences of a new data set within an existing set of PDFs that comes with Hessian error sets, without having to generate a multitude of PDF replicas. The method naturally incorporates the confidence criterion $\Delta\chi^2$ defined in the original fit. By considering a simple numerical example, we argue that the method is perfectly compatible with a new fit. Our second objective is to understand how the procedure suggested here relates to the Bayesian reweighting and to the discrepancies found in [21]. We prove that the original Bayesian method, proposed in [7] and advocated recently in [23], is equivalent to the one introduced here once the $\Delta\chi^2$ criterion is properly incorporated.

The Hessian method
The usual definition of an optimal correspondence between data and a set of PDFs $f \equiv f(x,Q^2)$ that depends on certain fit parameters $\{a\}$ is the minimum of a $\chi^2$ function. In its simplest form, we can write it as
$$\chi^2(\{a\}) \equiv \sum_i \left( \frac{y_i - y_i[f]}{\sigma_i} \right)^2, \qquad (2.1)$$
where $y_i$ are the experimental values with uncertainties $\sigma_i$, and $y_i[f]$ denote the corresponding theoretical predictions. Modifications to this definition are necessary if the experimental errors are correlated, or if some data sets are emphasized in the fit by assigning them an additional weight. In the Hessian approach to quantifying the PDF errors [17], the behaviour of $\chi^2$ around the best fit $S_0$ is approximated by a second-order polynomial in the space of fit parameters $\{a\}$,
$$\chi^2 \approx \chi^2_0 + \sum_{ij} \delta a_i\, H_{ij}\, \delta a_j, \qquad (2.2)$$
where $\delta a_j \equiv a_j - a_j^0$ are the excursions from the best-fit values and $\chi^2_0$ is the minimum value of $\chi^2$. Being symmetric, the Hessian matrix $H_{ij}$ has $N_{\rm eig}$ orthonormal eigenvectors $v^{(k)}$ and eigenvalues $\epsilon_k$ satisfying
$$\sum_j H_{ij}\, v^{(k)}_j = \epsilon_k\, v^{(k)}_i, \qquad \sum_i v^{(k)}_i v^{(\ell)}_i = \delta_{k\ell}. \qquad (2.3, 2.4)$$
Defining a new set of variables as
$$z_k \equiv \sqrt{\epsilon_k} \sum_j \delta a_j\, v^{(k)}_j, \qquad (2.5)$$
one easily finds that
$$\chi^2 \approx \chi^2_0 + \sum_i z_i^2. \qquad (2.6)$$
That is, the transformation in Eq. (2.5) diagonalizes the Hessian matrix. At this point, a $\Delta\chi^2$ criterion is needed to specify how much the term $\sum_i z_i^2$ can grow while the corresponding PDFs still remain "acceptable". Those PDF fits that employ the ideal choice $\Delta\chi^2 = 1$ are usually limited to a subset of the world data [24,25], while the global fits prefer to take $\Delta\chi^2 > 1$ [4,5] to allow for inconsistencies among different data sets. It follows that the corresponding uncertainty for a PDF-dependent quantity $O = O[f]$ can be computed as
$$\left( \delta O \right)^2 = \Delta\chi^2 \sum_k \left( \frac{\partial O}{\partial z_k} \right)^2. \qquad (2.7)$$
An essential feature of the Hessian approach is the introduction of the PDF error sets $S^\pm_k$, defined customarily (along with the best fit $S_0$) in the $z$-space as
$$z(S_0) = (0, 0, \ldots, 0), \qquad z(S^\pm_k) = \pm\sqrt{\Delta\chi^2}\,(0, \ldots, 1, \ldots, 0), \qquad (2.8)$$
where the non-zero entry of $z(S^\pm_k)$ sits in the $k$th position. Using these sets, one can evaluate the derivatives in Eq. (2.7) by the linear approximation $\partial O/\partial z_k \approx \left( O[S^+_k] - O[S^-_k] \right) / \left( 2\sqrt{\Delta\chi^2} \right)$, such that
$$\left( \delta O \right)^2 \approx \frac{1}{4} \sum_k \left( O[S^+_k] - O[S^-_k] \right)^2. \qquad (2.9)$$
This formula, or its generalization for asymmetric errors, provides an extremely simple and useful recipe for propagating the PDF uncertainties to observables.
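As a concrete illustration of Eqs. (2.2)-(2.9), the following sketch diagonalizes a Hessian, builds the $z$-coordinates, and propagates an uncertainty with the symmetric error-set formula. All numbers here (the two-parameter Hessian, the values of the observable at the error sets) are invented for illustration; they are not taken from any actual fit.

```python
import numpy as np

# Hypothetical two-parameter fit: Hessian H around the best fit (invented numbers).
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Eigen-decomposition, Eqs. (2.3)-(2.4): columns of V are orthonormal eigenvectors.
eps, V = np.linalg.eigh(H)

def to_z(da):
    """Transformation of Eq. (2.5): z_k = sqrt(eps_k) * sum_j da_j v_j^(k)."""
    return np.sqrt(eps) * (V.T @ da)

# The transformation diagonalizes chi^2: da.H.da equals sum_k z_k^2, Eq. (2.6).
da = np.array([0.3, -0.2])
z = to_z(da)
diagonalized_ok = np.isclose(da @ H @ da, np.sum(z**2))

# Error propagation, Eqs. (2.7)-(2.9): only the observable evaluated at the
# plus/minus error sets S_k^+- is needed (hypothetical values below).
O_plus  = np.array([1.10, 1.02])   # O[S_k^+]
O_minus = np.array([0.92, 0.99])   # O[S_k^-]
delta_O = 0.5 * np.sqrt(np.sum((O_plus - O_minus)**2))
```

Note that the tolerance $\Delta\chi^2$ cancels in the final formula: it enters the definition of the error sets and the derivative estimate with opposite powers, which is why Eq. (2.9) contains no explicit $\Delta\chi^2$.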

The Hessian reweighting
Let us now consider a new set of data $y = \{y_1, y_2, \ldots, y_{N_{\rm data}}\}$ with covariance matrix $C$. Our goal here is twofold: to find out whether these new data are consistent with the original set of PDFs and, if so, what the effect of incorporating them into the original analysis would be.
In order to answer these questions, we consider a function $\chi^2_{\rm new}$ defined as
$$\chi^2_{\rm new} \equiv \chi^2_0 + \Delta\chi^2 \sum_k w_k^2 + \sum_{ij} \left( y_i[f] - y_i \right) C^{-1}_{ij} \left( y_j[f] - y_j \right), \qquad (3.1)$$
where we have simply added the contribution of the new data on top of the "old" $\chi^2$ in Eq. (2.6), written in terms of the rescaled coordinates $w_k \equiv z_k/\sqrt{\Delta\chi^2}$. Using a similar linear approximation as earlier, we can estimate the theoretical values $y_i[f]$ in arbitrary $z$-space coordinates by
$$y_i[f] \approx y_i[S_0] + \sum_k D_{ik}\, w_k, \qquad (3.2)$$
where we have defined
$$D_{ik} \equiv \frac{y_i[S^+_k] - y_i[S^-_k]}{2}. \qquad (3.3)$$
Thus, $\chi^2_{\rm new}$ is a continuous, quadratic function of the parameters $w_k$,
$$\chi^2_{\rm new} = \chi^2_{\rm new}(S_0) + 2 \sum_k a_k w_k + \sum_{k\ell} w_k B_{k\ell}\, w_\ell, \qquad (3.4)$$
and its minimum is given by
$$w^{\rm min}_k = -\sum_\ell \left( B^{-1} \right)_{k\ell} a_\ell, \qquad (3.5)$$
where the matrix $B$ and vector $a$ are
$$B_{k\ell} = \Delta\chi^2\, \delta_{k\ell} + \sum_{ij} D_{ik}\, C^{-1}_{ij}\, D_{j\ell}, \qquad a_k = \sum_{ij} D_{ik}\, C^{-1}_{ij} \left( y_j[S_0] - y_j \right). \qquad (3.6, 3.7)$$
An important feature of the solution is the "penalty term"
$$P \equiv \Delta\chi^2 \sum_k \left( w^{\rm min}_k \right)^2, \qquad (3.8)$$
which we can use to decide whether the new data set is consistent within the original PDFs. First, if $P \ll \Delta\chi^2$, the new data could be incorporated into the original fit without causing a conflict with the other data. On the other hand, if $P \gtrsim \Delta\chi^2$, the new data appear to be in tension with the considered set of PDFs. This, however, does not necessarily mean that the new data are incompatible with the other data. A situation like this may arise if the new data probe unconstrained components of the PDFs whose behaviour was fixed by hand. For example, some recent PDF fits [26] still assume $s(x) \propto \left[ \bar{u}(x) + \bar{d}(x) \right]$ for the strange-quark distribution, and confronting such a fit with data sensitive to the strange quarks could lead to this kind of situation.
The components of the weight vector $w^{\rm min}$ also specify the set of PDFs $f_{\rm new}$ that corresponds to the new global minimum. They can be easily calculated by taking $y_i = f(x,Q^2)$ in Eq. (3.2). That is,
$$f_{\rm new}(x,Q^2) = f_{S_0}(x,Q^2) + \sum_k w^{\rm min}_k\, \frac{f_{S^+_k}(x,Q^2) - f_{S^-_k}(x,Q^2)}{2}. \qquad (3.9)$$
The resulting new PDFs are linear combinations of the original ones; they have been "reweighted". We note that the new PDFs constructed in this way still satisfy the necessary sum rules. For instance, the original best fit $S_0$ and the error sets $S^\pm_k$ all satisfy the momentum sum rule
$$\int_0^1 {\rm d}x\, x \sum_i f_i(x,Q^2) = 1, \qquad (3.10)$$
so the terms proportional to $w^{\rm min}_k$ in Eq. (3.9) cancel in the integral and $f_{\rm new}$ obeys the same sum rule. Due to the linearity of the parton evolution equations, $f_{\rm new}$ also satisfies them. Thus, the reweighted distributions comprise a proper set of PDFs which can be consistently utilized in pQCD calculations. Furthermore, assuming that the original confidence criterion is not much altered by the inclusion of the new data, one can also construct the new PDF error sets. Indeed, Eq. (3.1) can be rewritten as
$$\chi^2_{\rm new} = \chi^2_{\rm new}\left( w^{\rm min} \right) + \sum_{k\ell} \delta w_k\, B_{k\ell}\, \delta w_\ell, \qquad (3.11)$$
where $\delta w \equiv w - w^{\rm min}$, and the matrix $B$ takes the role of the Hessian matrix (compare to Eq. (2.2)). This can be brought into a diagonal form by an analogue of the transformation in Eq. (2.5), and the new error sets can be defined exactly as was done in Eq. (2.8).
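The whole procedure amounts to a few lines of linear algebra. The sketch below solves for $w^{\min}$ and the penalty term; every input (the data, the central predictions, and the matrix `D` playing the role of the finite differences of Eq. (3.3)) is randomly invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem size: N_eig error-set directions, N_data new data points.
n_eig, n_data = 4, 10
delta_chi2 = 5.0

y_data = rng.normal(1.0, 0.05, n_data)           # new measurements (invented)
y_S0   = np.ones(n_data)                         # predictions of the central set S_0
D      = rng.normal(0.0, 0.03, (n_data, n_eig))  # D_ik = (y_i[S_k^+] - y_i[S_k^-]) / 2
C_inv  = np.eye(n_data) / 0.05**2                # inverse covariance of the new data

# Eqs. (3.6)-(3.7)
B = delta_chi2 * np.eye(n_eig) + D.T @ C_inv @ D
a = D.T @ C_inv @ (y_S0 - y_data)

# Eq. (3.5): minimum of the quadratic chi^2_new, and the penalty term of Eq. (3.8)
w_min = -np.linalg.solve(B, a)
penalty = delta_chi2 * np.sum(w_min**2)

def chi2_new(w):
    """Eq. (3.1) in the linear approximation of Eq. (3.2)."""
    r = y_S0 + D @ w - y_data
    return delta_chi2 * w @ w + r @ C_inv @ r
```

The reweighted PDFs of Eq. (3.9) then follow by applying the same weights $w^{\min}_k$ to the PDF error sets themselves.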

Non-linear extension of the Hessian reweighting
The linear approximation of Eq. (3.2) can be improved by including also terms quadratic in $w_k$,
$$y_i[f] \approx y_i[S_0] + \sum_k \left( D_{ik}\, w_k + E_{ik}\, w_k^2 \right), \qquad (3.12)$$
where
$$E_{ik} \equiv \frac{y_i[S^+_k] + y_i[S^-_k] - 2\, y_i[S_0]}{2}, \qquad (3.13)$$
correcting for possible non-linear behaviour. With this replacement, $\chi^2_{\rm new}$ becomes a quartic function of $w_k$ and its minimum can be found using numerical methods. The corresponding PDFs can be computed by taking $y_i = f(x,Q^2)$ in Eq. (3.12). The matrix $B$ in Eq. (3.11) gets replaced by
$$B_{k\ell} = \Delta\chi^2\, \delta_{k\ell} + \sum_{ij} \frac{\partial y_i}{\partial w_k}\, C^{-1}_{ij}\, \frac{\partial y_j}{\partial w_\ell}, \qquad (3.14)$$
where the partial derivatives read
$$\frac{\partial y_i}{\partial w_k} = D_{ik} + 2\, E_{ik}\, w_k, \qquad (3.15)$$
and are understood to be evaluated at the found minimum.
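With the quadratic term included, $\chi^2_{\rm new}$ is quartic in $w$ and the minimum must be found numerically. A minimal sketch under invented inputs (the quadratic coefficients `E` correspond to the second finite differences over the error sets); plain gradient descent with the analytic derivative of Eq. (3.15) suffices for a problem this small, though any standard minimizer would do:

```python
import numpy as np

rng = np.random.default_rng(1)
n_eig, n_data = 3, 8
delta_chi2 = 5.0

y_data = rng.normal(1.0, 0.05, n_data)        # invented measurements
y_S0   = np.ones(n_data)
D = rng.normal(0.0, 0.03, (n_data, n_eig))    # linear coefficients, Eq. (3.3)
E = rng.normal(0.0, 0.005, (n_data, n_eig))   # quadratic coefficients, Eq. (3.13)
C_inv = np.eye(n_data) / 0.05**2

def chi2_new(w):
    # Quadratic extension of the prediction: y_i ~ y_i[S_0] + D_ik w_k + E_ik w_k^2
    r = y_S0 + D @ w + E @ w**2 - y_data
    return delta_chi2 * w @ w + r @ C_inv @ r

def grad(w):
    J = D + 2.0 * E * w                       # dy_i/dw_k, Eq. (3.15)
    r = y_S0 + D @ w + E @ w**2 - y_data
    return 2.0 * delta_chi2 * w + 2.0 * J.T @ (C_inv @ r)

# Plain gradient descent with a small fixed step; converges for this toy problem.
w = np.zeros(n_eig)
for _ in range(2000):
    w -= 1e-3 * grad(w)
w_min = w
```

The resulting $w^{\min}$ is then used in Eq. (3.12), rather than Eq. (3.9), when constructing the reweighted PDFs.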

Bayesian methods
Given a large ensemble of PDFs $f_k$, $k = 1 \ldots N_{\rm rep}$, such as those of the NNPDF collaboration [6], that represents the underlying probability distribution $\mathcal{P}_{\rm old}(f)$ of the PDFs, one can compute the expectation value $\langle O \rangle$ and variance $\delta\langle O \rangle$ of an observable $O$ as
$$\langle O \rangle = \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} O[f_k], \qquad (4.1)$$
$$\delta\langle O \rangle = \sqrt{ \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} \left( O[f_k] - \langle O \rangle \right)^2 }. \qquad (4.2)$$
Using the laws of statistics, the initial probability distribution $\mathcal{P}_{\rm old}(f)$ can be updated to include also the additional information contained in a new set of data $y$ since, by the Bayes theorem,
$$\mathcal{P}_{\rm new}(f) \propto \mathcal{P}(y|f)\, \mathcal{P}_{\rm old}(f), \qquad (4.3)$$
where $\mathcal{P}(y|f)$ stands for the conditional probability (the likelihood function) for the new data, given a set of PDFs. It follows that the average value of any observable depending on the PDFs becomes a weighted average,
$$\langle O \rangle_{\rm new} = \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} \omega_k\, O[f_k], \qquad (4.4)$$
$$\delta\langle O \rangle_{\rm new} = \sqrt{ \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} \omega_k \left( O[f_k] - \langle O \rangle_{\rm new} \right)^2 }, \qquad (4.5)$$
where the weights $\omega_k$ turn out to be proportional to the likelihood function $\mathcal{P}(y|f_k)$. The question of how to choose the likelihood appropriately has recently been addressed in [23], but a conclusive answer, if one exists, remains to be given. Two options, corresponding to different choices of the likelihood, have been discussed in the literature. The one suggested originally by Giele and Keller (GK) [7] follows from taking $\mathcal{P}(y|f)\,{\rm d}^n y$ as the probability to find the new data confined in a differential element ${\rm d}^n y$ around $y$, resulting in
$$\omega^{\rm GK}_k = \frac{ \exp\left( -\chi^2_k/2 \right) }{ \frac{1}{N_{\rm rep}} \sum_{i=1}^{N_{\rm rep}} \exp\left( -\chi^2_i/2 \right) }, \qquad (4.6)$$
where
$$\chi^2_k \equiv \sum_{ij} \left( y_i[f_k] - y_i \right) C^{-1}_{ij} \left( y_j[f_k] - y_j \right). \qquad (4.7)$$
The option advocated by the NNPDF collaboration derives from taking $\mathcal{P}(y|f)\,{\rm d}\chi$ as the probability for the corresponding $\chi^2$ to be confined in a differential volume ${\rm d}\chi$ around $\chi$, giving instead
$$\omega^{\text{chi-squared}}_k = \frac{ \left( \chi^2_k \right)^{(N_{\rm data}-1)/2} \exp\left( -\chi^2_k/2 \right) }{ \frac{1}{N_{\rm rep}} \sum_{i=1}^{N_{\rm rep}} \left( \chi^2_i \right)^{(N_{\rm data}-1)/2} \exp\left( -\chi^2_i/2 \right) }, \qquad (4.8)$$
which was shown to be consistent with a direct fit in the NNPDF framework [8,9]. It was pointed out in Ref. [23] that the former weights contain more information on the new data than the latter ones, as a given data set uniquely determines the value of $\chi^2$, while a fixed $\chi^2$ may correspond to various different data sets. The generic behaviour of these weights with respect to $\chi^2$ per number of points for $N_{\rm data} = 10$ is shown in Figure 1.

Figure 1. Comparison of the likelihoods for the original derivation of the Bayesian reweighting [7] (red) and the one proposed in [8,9] (blue). In this plot the number of points is $N_{\rm data} = 10$.

While the GK weights are always higher for those replicas that give lower $\chi^2$, the NNPDF option obviously favors the ones with $\chi^2/N_{\rm data} \approx 1$. We note that the latter likelihood may lead to a rather pathological situation: if the value of $\chi^2/N_{\rm data}$ (computed with the expectation values of the observables, or as an average of the individual $\chi^2_k$'s) is less than unity before the reweighting, the reweighting may actually cause the $\chi^2$ of the new data to grow, since the replicas with $\chi^2/N_{\rm data} \approx 1$ are favored. However, if the new data are directly included in a PDF fit as in Eq. (3.1), the value of $\chi^2$ for the new data can only decrease.
The large ensemble of PDFs required by the Bayesian approach can be constructed, in analogy to Eq. (3.9), by
$$f_k(x,Q^2) = f_{S_0}(x,Q^2) + \sum_i \frac{f_{S^+_i}(x,Q^2) - f_{S^-_i}(x,Q^2)}{2}\, R_{ik}, \qquad (4.9)$$
where the coefficients $R_{ik}$ are random numbers drawn from a Gaussian distribution centered at zero and with variance one. An asymmetric version of Eq. (4.9) to account for non-linearities was advocated in Ref. [18]. Specifically, it was proposed that the replicas should be generated by
$$f_k(x,Q^2) = f_{S_0}(x,Q^2) + \sum_i \left| R_{ik} \right| \left[ f_{S^\pm_i}(x,Q^2) - f_{S_0}(x,Q^2) \right], \qquad (4.10)$$
where the error set $S^+_i$ or $S^-_i$ is chosen according to the sign of $R_{ik}$. However, in this case the expectation values of the observables will not, in general, match those computed directly with the central set of the original fit, even before the reweighting. To accurately compare with the linear Hessian reweighting, we stick here to the symmetric prescription of Eq. (4.9). As pointed out earlier, the replicas built in this way satisfy the PDF sum rules and the parton evolution equations. After computing the weights $\omega_k$ for each replica, the reweighted PDFs can be written as
$$f_{\rm new}(x,Q^2) = \frac{1}{N_{\rm rep}} \sum_k \omega_k\, f_k(x,Q^2), \qquad (4.11)$$
and, similarly to the Hessian case, one can calculate the "penalty" induced in the original fit by
$$P = \Delta\chi^2 \sum_i \left( \frac{1}{N_{\rm rep}} \sum_k \omega_k R_{ik} \right)^2. \qquad (4.12)$$
We note that before the reweighting $\omega_k = 1$, and the sums in the parentheses above vanish since the mean of the random numbers $R_{ik}$ is zero. Another useful indicator for the Bayesian methods is the effective number of replicas $N_{\rm eff}$, defined as
$$N_{\rm eff} \equiv \exp\left[ \frac{1}{N_{\rm rep}} \sum_k \omega_k \ln\left( \frac{N_{\rm rep}}{\omega_k} \right) \right]. \qquad (4.13)$$
If a given replica $f_k$ ends up having a small weight $\omega_k \ll 1$, it has a negligible effect on the new predictions computed by Eqs. (4.4) and (4.5). The value of $N_{\rm eff}$ defined above serves as an estimate for such a "loss" of replicas. If $N_{\rm eff} \ll N_{\rm rep}$, the method becomes inefficient, which is a sign that the new data contain too much new information or that they are incompatible with the previous data. Should this happen, the penalty in Eq. (4.12) is probably also large.
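A compact numerical sketch of the two weight prescriptions and of $N_{\rm eff}$ follows; the data, central predictions, and error-set coefficients `D` are all invented, and the replicas live in a linearized toy model rather than a real PDF set.

```python
import numpy as np

rng = np.random.default_rng(2)
n_rep, n_data, n_eig = 50000, 10, 4

# Hypothetical linearized setup: replica k predicts y[S_0] + D R_k, cf. Eq. (4.9)
y_data = rng.normal(1.0, 0.05, n_data)
y_S0   = np.ones(n_data)
D      = rng.normal(0.0, 0.03, (n_data, n_eig))
sigma  = 0.05

R = rng.standard_normal((n_rep, n_eig))       # Gaussian coordinates of Eq. (4.9)
resid = y_S0 + R @ D.T - y_data
chi2_k = np.sum((resid / sigma)**2, axis=1)

# GK weights, Eq. (4.6), normalized so that their mean is one
w_gk = np.exp(-0.5 * (chi2_k - chi2_k.min()))
w_gk *= n_rep / w_gk.sum()

# Chi-squared weights, Eq. (4.8), evaluated in log-space for numerical stability
logw = 0.5 * (n_data - 1) * np.log(chi2_k) - 0.5 * chi2_k
w_chi = np.exp(logw - logw.max())
w_chi *= n_rep / w_chi.sum()

# Effective number of replicas, Eq. (4.13); zero-weight replicas contribute nothing
pos = w_gk > 0
n_eff = np.exp(np.sum(w_gk[pos] * np.log(n_rep / w_gk[pos])) / n_rep)
```

A reweighted expectation value, Eq. (4.4), is then simply `np.mean(w_gk * O_k)` for an array `O_k` of per-replica predictions.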

Simple example
In this section, we compare the different reweighting methods in a rather simple but illustrative example. We consider a function $g(x)$, depending on the parameters $a_0, \ldots, a_5$, which resembles a typical fit function used in PDF fits. We proceed as follows:

• Construct a set of pseudodata (data set 1) for $g(x)$. The value of each data point $y_k$ is computed using parameters $a^k_i = a^0_i \left( 1 + \alpha\, r_{ik} \right)$, where $\alpha = 0.02$ and the $r_{ik}$ are Gaussian random numbers. The figures in this section correspond to $a^0_0 = 30$, $a^0_1 = 0.5$, $a^0_2 = 2.4$, $a^0_3 = 4.3$. We assign each data point an uncertainty $\delta y_k = y_k \left( 1 + \sum_{i=1}^{4} r_{ik}^2 \right) \beta$, with $\beta = 0.04$. The parameters $a_4 = 2.4$ and $a_5 = -3$ are kept fixed.

• Perform a $\chi^2$ fit with the four free parameters $a_0, a_1, a_2, a_3$ to these data, and construct the corresponding Hessian error sets using a certain $\Delta\chi^2$ criterion.

• Slightly varying the parameters $a^0_i$, construct a second set of pseudodata (data set 2), and apply the above-introduced reweighting techniques to these data.

• Perform a direct fit using both data sets and compare these "exact" results to the predictions given by the reweighting methods.

We begin by considering the ideal case $\Delta\chi^2 = 1$, depicted in Fig. 2. We have chosen an example in which data set 2 (40 points) contains evidence from a region of $x$ that data set 1 (100 points) did not reach. In the case of the Bayesian methods, we have used a sufficiently large number ($10^5$) of replicas to get rid of all numerical inaccuracies. To compare the methods as accurately as possible, the linear version of the Hessian reweighting is used throughout this section. The results shown in Fig. 2 reveal that the Hessian reweighting and the Bayesian one with GK weights agree not only with each other, but also with the "exact" result. The outcome with the chi-squared weights, however, is in clear disagreement with the rest. The reason can be understood from the distribution of weights shown in Fig. 3 for this particular case: evidently, the chi-squared prescription gives preference to a rather different subset of replicas than the GK weights, unavoidably leading to a different result. From Fig. 2 we also notice that the value of $\chi^2/N_{\rm data}$ increased when using $\omega^{\text{chi-squared}}_k$. The global fits of PDFs typically use $\Delta\chi^2$ clearly larger than unity. In our simple example, a motivation for using $\Delta\chi^2 > 1$ could be to compensate for the restricted functional form at small values of $x$, where data set 1 did not have constraints [28]. Thus, we repeat the exercise taking this time $\Delta\chi^2 = 5$. The results are shown in Fig. 4. While the Hessian reweighting can still accurately reproduce the "exact" fit, now both Bayesian methods fail. The reason for the failure is that the likelihood function $\mathcal{P}(y|f)$ as such does not contain any information on $\Delta\chi^2$, although the distribution of replicas clearly depends on the value of $\Delta\chi^2$ through the PDF error sets.
As the spread among the replicas encoded by $\Delta\chi^2 = 5$ is wider than that covered by $\Delta\chi^2 = 1$, the new data thereby appear more constraining than they actually are. In the case of the Hessian reweighting, $\Delta\chi^2$ is simply an input parameter that encodes the behaviour of the original data near the best fit and controls the size of the uncertainty bands; it does not affect the resulting new central values. However, the agreement encountered with $\Delta\chi^2 = 1$ (Fig. 2) hints that it should be possible to generalize the Bayesian method with GK weights also to the case $\Delta\chi^2 > 1$. The key point is to note that we could divide, for example, Eq. (3.1) by $\Delta\chi^2$ and effectively use $\Delta\chi^2 = 1$ thereafter. This observation instructs us to rescale the values of $\chi^2_k$ in Eqs. (4.6) and (4.8) as
$$\chi^2_k \to \chi^2_k / \Delta\chi^2 \qquad (5.2)$$
when computing the weight for each replica. The corresponding results are shown in Fig. 5, which differs from Fig. 4 only in using the above-mentioned rescaling when computing the Bayesian weights. Evidently, the mutual agreement between the Bayesian method with GK weights, the Hessian reweighting, and the "exact" result is restored. In our simple example, the factor that mainly limits the accuracy of the reweighting is the precision of the original quadratic expansion of Eq. (2.2). Indeed, the small mismatch between the results of the reweighting and the real fit, e.g. in Fig. 5, can be largely attributed to this approximation not being perfect.

Equivalence of the Bayesian and Hessian reweighting
The close similarity of the results obtained using the (linear) Hessian reweighting and the Bayesian one with the rescaled GK weights indicates that the two are actually one and the same. In this short section we give a formal proof of this equivalence. From Eq. (4.11) we see that the coordinates specifying the GK-reweighted PDFs in the eigenvector space are given by
$$w^{\rm GK}_k = \frac{1}{\mathcal{N}}\, \frac{1}{N_{\rm rep}} \sum_{\ell=1}^{N_{\rm rep}} R_{k\ell} \exp\left( -\frac{\chi^2_\ell}{2\Delta\chi^2} \right), \qquad (6.1)$$
where the denominator $\mathcal{N}$ is
$$\mathcal{N} = \frac{1}{N_{\rm rep}} \sum_{\ell=1}^{N_{\rm rep}} \exp\left( -\frac{\chi^2_\ell}{2\Delta\chi^2} \right), \qquad (6.2)$$
and we have applied the rescaling $\chi^2_\ell \to \chi^2_\ell/\Delta\chi^2$ as in Eq. (5.2). Using the expression for $\chi^2$ in Eq. (4.7) and the linear approximation of Eq. (3.2) with $w_k = R_{k\ell}$, we find
$$\chi^2_\ell = \chi^2[f_{S_0}] + 2 \sum_k a_k R_{k\ell} + \sum_{km} R_{k\ell} \left( B_{km} - \Delta\chi^2\, \delta_{km} \right) R_{m\ell}, \qquad (6.3)$$
where $\chi^2[f_{S_0}]$ is the value of $\chi^2$ computed with the central set $S_0$, and the coefficients $D_{ik}$ and $a_k$ were defined in Eqs. (3.3) and (3.7). In the limit of infinitely large $N_{\rm rep}$, the sum over the replicas above can be replaced by an integral,
$$\frac{1}{N_{\rm rep}} \sum_{\ell=1}^{N_{\rm rep}} \longrightarrow \int \prod_m \frac{{\rm d}R_m}{\sqrt{2\pi}}\, \exp\left( -\frac{1}{2} \sum_m R_m^2 \right), \qquad (6.4)$$
where the additional exponential stems from the probability distribution of the random numbers $R_m$ being Gaussian. Using this in Eq. (6.1) above, we have
$$w^{\rm GK}_k = \frac{1}{\mathcal{N}} \int \prod_m \frac{{\rm d}R_m}{\sqrt{2\pi}}\, R_k\, \exp\left[ -\frac{1}{2\Delta\chi^2} \left( \chi^2[f_{S_0}] + 2 \sum_\ell a_\ell R_\ell + \sum_{\ell m} R_\ell\, B_{\ell m}\, R_m \right) \right], \qquad (6.5)$$
where the elements of the matrix $B$ were given in Eq. (3.6). The corresponding expression for the denominator $\mathcal{N}$ is
$$\mathcal{N} = \int \prod_m \frac{{\rm d}R_m}{\sqrt{2\pi}}\, \exp\left[ -\frac{1}{2\Delta\chi^2} \left( \chi^2[f_{S_0}] + 2 \sum_\ell a_\ell R_\ell + \sum_{\ell m} R_\ell\, B_{\ell m}\, R_m \right) \right]. \qquad (6.6)$$
Upon taking the ratio and completing the square in the exponent, the various prefactors cancel, and the coefficients $w^{\rm GK}_k$ reduce to
$$w^{\rm GK}_k = -\sum_\ell \left( B^{-1} \right)_{k\ell} a_\ell, \qquad (6.7)$$
which coincides with Eq. (3.5) specifying the coefficients of the linear Hessian reweighting. As shown in Ref. [23], the weights $\omega^{\text{chi-squared}}_k$ in Eq. (4.8) emerge from $\omega^{\rm GK}_k$ by integrating over all possible data sets that give an equal $\chi^2$. Being "contaminated" by such unwanted information from non-existing data readily explains their failure in the present context.
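The equivalence can also be checked numerically: for a linearized toy problem (all inputs below are invented), the rescaled-GK weighted average of the Gaussian replica coordinates converges to $-B^{-1}a$ as the number of replicas grows.

```python
import numpy as np

rng = np.random.default_rng(3)
n_rep, n_data, n_eig = 400000, 6, 3
delta_chi2 = 4.0

# Hypothetical linearized problem, same ingredients as in section 3.
y_data = rng.normal(1.0, 0.05, n_data)
y_S0   = np.ones(n_data)
D      = rng.normal(0.0, 0.04, (n_data, n_eig))
inv_s2 = 1.0 / 0.05**2                          # diagonal C^-1

# Hessian-reweighting solution, Eq. (3.5)
B = delta_chi2 * np.eye(n_eig) + (D.T @ D) * inv_s2
a = (D.T @ (y_S0 - y_data)) * inv_s2
w_hessian = -np.linalg.solve(B, a)

# Bayesian side: Gaussian replicas with rescaled GK weights exp(-chi^2/(2*Dchi^2))
R = rng.standard_normal((n_rep, n_eig))
chi2_k = np.sum(((y_S0 + R @ D.T - y_data)**2) * inv_s2, axis=1)
w = np.exp(-0.5 * (chi2_k - chi2_k.min()) / delta_chi2)
w_bayes = (R * w[:, None]).sum(axis=0) / w.sum()   # coordinates of Eq. (6.1)
```

Up to Monte Carlo noise, `w_bayes` and `w_hessian` agree, which is exactly the content of Eqs. (6.1)-(6.7).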

Inclusive jet production at the LHC
In this section, we apply the reweighting methods to the production of inclusive jets in proton+proton collisions at the LHC. In comparison to our simple example discussed earlier, a new source of non-linearity arises which could potentially decrease the accuracy of the linear reweighting. Namely, the quadratic PDF dependence of the proton+proton cross sections,
$$\sigma^{pp}[f] = \sum_{ij} f_i \otimes \hat{\sigma}_{ij} \otimes f_j \qquad (7.1)$$
(where $\hat{\sigma}_{ij}$ denotes the coefficient functions and, in general, the jet definitions), may or may not be well approximated by
$$\sigma^{pp}[f_{\rm new}] \approx \sigma^{pp}[f_{S_0}] + \sum_k w_k\, \frac{\sigma^{pp}[f_{S^+_k}] - \sigma^{pp}[f_{S^-_k}]}{2}, \qquad (7.2)$$
which is the approximation one effectively makes when using Eq. (3.2). In some other cases, such as deep-inelastic scattering (linear in the PDFs), this would not be an issue, whereas for e.g. the $W$ charge asymmetry the non-linearities could be even more intricate.
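The effect of this truncation can be made explicit in a toy model. Below, an observable exactly quadratic in the error-set coordinates, $\sigma(w) = (u + v \cdot w)^2$ with invented $u$ and $v$, stands in for the quadratic PDF dependence of Eq. (7.1); for such an observable the error of the linear error-set approximation is exactly $(v \cdot w)^2$, i.e. it grows quadratically with the distance from the central set.

```python
import numpy as np

rng = np.random.default_rng(5)
n_eig = 4

# Toy observable quadratic in the error-set coordinates w, mimicking the
# quadratic PDF dependence of a pp cross section: sigma(w) = (u + v.w)^2.
u = 1.0
v = rng.normal(0.0, 0.1, n_eig)   # invented sensitivity coefficients

def sigma_exact(w):
    return (u + v @ w)**2

# Linear error-set approximation in the spirit of Eq. (7.2): finite differences
# over the plus/minus error sets located at w = +-1 along each direction.
e = np.eye(n_eig)
d = np.array([(sigma_exact(e[k]) - sigma_exact(-e[k])) / 2 for k in range(n_eig)])

def sigma_linear(w):
    return sigma_exact(np.zeros(n_eig)) + d @ w

# Truncation error at a random point, and at a point twice as far out.
w = rng.standard_normal(n_eig)
err = sigma_exact(w) - sigma_linear(w)
```

Inside the tolerance hypersphere ($|w| \lesssim 1$) the error stays small, while replicas far outside it are described poorly, anticipating the behaviour discussed below for the actual jet cross sections.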
Here, we consider the recent CMS $\sqrt{s} = 7$ TeV jet measurements [29] ($N_{\rm data} = 133$), and use the CT10NLO PDFs [4], for which $\Delta\chi^2_{\rm CT10} = 100$ and $N_{\rm eig} = 26$. In practice, we have utilized the FASTNLO interface [30][31][32] for the computations. The renormalization scale $\mu_r$ and factorization scale $\mu_f$ were fixed to the jet transverse momentum as $\mu_r = \mu_f = p_T/2$, as was done also in the CT10NLO analysis. We account for the correlated systematic errors by constructing a covariance matrix to be used in the calculation of the $\chi^2$ values. To be specific, we compute the elements of the covariance matrix $C$ by
$$C_{ij} = \delta_{ij} \left( \sigma^{\rm uncorr}_i \right)^2 + \sum_k \beta^k_i \beta^k_j, \qquad (7.3)$$
where $\sigma^{\rm uncorr}_i$ is the uncorrelated error of data point $i$, and $\beta^k_i$ denotes the absolute shift of this data point corresponding to a 1-sigma deviation of the systematic parameter $k$. In addition to the luminosity, unfolding, and jet-energy-scale uncertainties, we also treat the quoted uncertainty in the multiplicative non-perturbative corrections (underlying event, hadronization) as a correlated systematic error. The uncorrelated errors $\sigma^{\rm uncorr}_i$ consist of the statistical and 1% uncorrelated systematic uncertainty added in quadrature. Calculating the $\chi^2$ using the covariance matrix $C$ is equivalent (see e.g. [33][34][35]) to minimizing
$$\chi^2 = \sum_i \left( \frac{y_i - \sum_k s_k \beta^k_i - y_i[f]}{\sigma^{\rm uncorr}_i} \right)^2 + \sum_k s_k^2 \qquad (7.4)$$
with respect to the systematic parameters $s_k$. The minimum occurs with the parameter values $s^{\rm min}_k$ solving
$$\sum_m \left[ \delta_{km} + \sum_i \frac{\beta^k_i \beta^m_i}{\left( \sigma^{\rm uncorr}_i \right)^2} \right] s^{\rm min}_m = \sum_i \frac{\beta^k_i \left( y_i - y_i[f] \right)}{\left( \sigma^{\rm uncorr}_i \right)^2}, \qquad (7.5)$$
and $-\sum_k s^{\rm min}_k \beta^k_i$ is the net systematic shift for the data point $y_i$. Figure 6 presents a comparison between the CMS data and the NLO predictions, including the PDF error bands, both before and after applying the systematic shifts. The $\chi^2$ values for the central CT10NLO set are $\chi^2_{\rm uncorr}/N_{\rm data} \approx 1.4$ adding all errors in quadrature, and $\chi^2_{\rm corr}/N_{\rm data} \approx 1.9$ accounting for the data correlations. Let us now discuss the adequacy of the approximation in Eq. (7.2). To this end, we have prepared random PDF replicas by Eq. (4.9), separating the cases $\ell < 1$ and $\ell > 1$, where $\ell$ measures the distance of a replica from the central set in units of the $\Delta\chi^2$ tolerance. We compute the jet cross sections "exactly" (by constructing parametrizations corresponding to these PDF replicas and using them in FASTNLO computations) and, on the other hand, by the linear approximation of Eq. (7.2). Typical results from such an exercise are shown in Fig. 7. First, if $\ell < 1$, the linear approximation proves rather accurate, the deviations being normally much less than a couple of percent. In the latter case, $\ell > 1$, the linear approximation evidently breaks down. We conclude that if the reweighted PDFs end up sufficiently close to the original ones, the linear approximation of Eq. (7.2), and thereby the linear Hessian reweighting, should be rather accurate. In the case of the Bayesian reweighting, the replicas are very often far outside the $\Delta\chi^2$ hypersphere and, from the lower panel of Fig. 7, we can anticipate some differences depending on whether the cross sections are evaluated using the replicas directly inside FASTNLO or "on the fly" by the linear approximation of Eq. (7.2).
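The equivalence of the covariance-matrix and nuisance-parameter definitions of the $\chi^2$ can be verified in a few lines; all numbers below (residuals, uncorrelated errors, systematic shift vectors) are invented, and the minimization over the $s_k$ is carried out analytically since the problem is quadratic.

```python
import numpy as np

rng = np.random.default_rng(4)
n_data, n_sys = 12, 3

resid = rng.normal(0.0, 1.0, n_data)          # y_i - y_i[f], invented residuals
sig   = np.full(n_data, 0.8)                  # uncorrelated errors sigma_i^uncorr
beta  = rng.normal(0.0, 0.5, (n_sys, n_data)) # beta_i^k: 1-sigma correlated shifts

# chi^2 with the full covariance matrix C_ij = delta_ij sig_i^2 + sum_k b_i^k b_j^k
C = np.diag(sig**2) + beta.T @ beta
chi2_cov = resid @ np.linalg.solve(C, resid)

# Equivalent nuisance-parameter form: solve the linear system for s_min
W = beta / sig**2                             # beta_i^k / sigma_i^2
M = np.eye(n_sys) + W @ beta.T
s_min = np.linalg.solve(M, W @ resid)
shifted = resid - beta.T @ s_min              # residuals after the systematic shifts
chi2_sys = np.sum((shifted / sig)**2) + s_min @ s_min
```

The two numbers `chi2_cov` and `chi2_sys` coincide exactly, which is the standard Sherman-Morrison-Woodbury identity underlying this equivalence.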
It is a straightforward task to apply the reweighting methods to these jet data. For the needs of the Bayesian techniques, we have generated $10^4$ replicas using Eq. (4.9) and computed the cross sections for each one separately. When computing the GK weights we divide the resulting values of $\chi^2$ by $\Delta\chi^2_{\rm CT10} = 100$, which was shown in the previous section to appropriately modify the underlying likelihood. We present the results of the reweighting in Fig. 8, separately accounting for the correlated errors and taking all errors as uncorrelated. As jet production is predominantly sensitive to the gluons, we find it reasonable to present the results here only in terms of the gluon PDFs. From Fig. 8 we find that, as expected, the Hessian and Bayesian methods are in almost perfect agreement also here. The small mismatch at large $x$ is mainly due to residual non-linear effects, and partly due to the fact that even $10^4$ replicas were not enough to reach a fully convergent Bayesian result. The effects on the gluons are qualitatively similar to those found in [21] by means of slightly different Bayesian weights, namely, that the LHC jet data prefer somewhat softer gluon distributions than the current ones. In any case, the CT10NLO PDFs are in good agreement with these data, the reweighting penalty being much less than $\Delta\chi^2$ and only a small fraction of the replicas being "lost" ($N_{\rm eff}$ remains large). Interestingly, the effects with uncorrelated errors end up being slightly larger than when using the data correlations. The reason is the systematic 5% deviation between the CT10NLO predictions and the central values of the data points (see Fig. 6). When using the correlated errors, such deviations partly disappear as the systematic parameters adjust themselves appropriately, and the shift in the PDFs is therefore smaller than when the errors are added in quadrature while keeping the "slightly-off" central values fixed. Loosely speaking, it is "cheaper" to increase the latter term in Eq. (7.4) than the reweighting penalty in Eq. (3.1).
As noticed above, the present case appears to involve a small but still visible amount of non-linearity. To gain a better understanding of this, we invoke the improved non-linear prescription of the Hessian reweighting introduced in section 3. The gluon PDFs obtained using the non-linear method end up being somewhat higher at large $x$ than those derived from the linear reweighting, as demonstrated in Fig. 9. The origin of the difference lies not so much in the determination of the actual weights $w_k$ (the approximation in Eq. (7.2) is good), but in whether one uses Eq. (3.9) or Eq. (3.12) when constructing the resulting reweighted PDFs. This is because at large $x$ the original CT10NLO error sets are not particularly symmetrically distributed around the central set. The slight differences between the Hessian and Bayesian results seen in Fig. 8 are also largely of this origin. However, as noted in section 4, it is not clear how to accurately improve the Bayesian methods to accommodate such non-linearities.
Finally, it is instructive to contrast the above results with what would have been obtained using the "naive" prescription for the Bayesian reweighting, that is, not dividing the values of $\chi^2$ by $\Delta\chi^2$ when computing the weights. In principle, the outcome of using the naive GK weights should coincide with the Hessian reweighting taking $\Delta\chi^2 = 1$. The resulting modifications in the gluons are shown in Fig. 10. The conclusions one would draw from these results are completely opposite to what was just found. Especially when accounting for the data correlations, the induced penalties are staggeringly large, clearly exceeding $\Delta\chi^2_{\rm CT10} = 100$, and the effective number of replicas becomes very low. These would (erroneously) indicate that the CMS jet data discussed here are not compatible with the CT10NLO PDFs! Indeed, the outcome using the naive GK weights ends up being so far outside the original error band that almost all the replicas get discarded. Obtaining a stable result in this case would require a tremendous number of replicas. All these difficulties, as we have demonstrated, are pure artefacts of not accounting properly for the original $\Delta\chi^2_{\rm CT10}$ tolerance.

Summary
We have discussed how to incorporate information from a new set of experimental measurements into an existing set of PDFs whose uncertainties are quantified via Hessian error sets with a fixed $\Delta\chi^2$. To this end, a semi-algebraic method, the Hessian reweighting, was introduced as an alternative to the prevailing Bayesian methods. While the Hessian reweighting is straightforwardly derived by considering a new set of data in a $\chi^2$ fit, the Bayesian methods are outwardly distinct, being based on statistical inference. We compared the different approaches against a direct re-fit in a simple example, verifying the adequacy of the Hessian method. In the case of the Bayesian procedure an agreement with a new fit was also established, but only after properly including the $\Delta\chi^2$ criterion in the Bayesian likelihood function, which, as we mathematically justified, must be a pure exponential, as originally proposed by Giele and Keller. This likely explains the difficulties that have been encountered elsewhere when applying the Bayesian reweighting in conjunction with a Hessian PDF fit. Our findings appear to be in contrast with the works of the NNPDF collaboration, in which a different functional form for the likelihood is derived. However, as the NNPDF methodology is far more involved than the direct $\chi^2$ minimization considered here, it is perfectly possible, though not obvious, that a different functional form for the weights applies.
Both of the methods discussed here have their pros and cons: while the Hessian procedure requires evaluating the observables only with the central and error sets (typically around 50 sets in total), in the Bayesian method one needs to deal with a much larger ensemble of PDFs (around $10^3$ to find converging results). On the other hand, the Hessian reweighting is procedurally somewhat more involved, requiring e.g. numerical linear algebra, while the Bayesian technique is simpler. The reliability of both methods depends essentially on the accuracy of the quadratic approximation around the minimum $\chi^2$ made in the original PDF fit and, to some degree, on the adequacy of the linear approximations that one makes. The former is something the end user cannot control, but in the case of the Hessian reweighting one can easily improve on the latter by invoking a non-linear procedure, as we explained.