# Information and treatment of unknown correlations in the combination of measurements using the BLUE method


## Abstract

We discuss the effect of large positive correlations in the combinations of several measurements of a single physical quantity using the Best Linear Unbiased Estimate (BLUE) method. We suggest a new approach for comparing the relative weights of the different measurements in their contributions to the combined knowledge about the unknown parameter, using the well-established concept of Fisher information. We argue, in particular, that one contribution to information comes from the collective interplay of the measurements through their correlations and that this contribution cannot be attributed to any of the individual measurements alone. We show that negative coefficients in the BLUE weighted average invariably indicate the presence of a regime of high correlations, where the effect of further increasing some of these correlations is that of reducing the error on the combined estimate. In these regimes, we stress that assuming fully correlated systematic uncertainties is not a truly conservative choice, and that the correlations provided as input to BLUE combinations need to be assessed with extreme care instead. In situations where the precise evaluation of these correlations is impractical, or even impossible, we provide tools to help experimental physicists perform more conservative combinations.

## 1 Introduction

Our knowledge about some of the most fundamental parameters of physics is derived from a vast number of measurements produced by different experiments using several complementary techniques. Many statistical methods are routinely used [1] to combine the available data and extract the most appropriate estimates of the values and uncertainties of these parameters, properly taking into account all correlations between the measurements. One of the most popular methods for performing these combinations is the Best Linear Unbiased Estimate (BLUE) technique. This approach was first introduced in the 1930s [2], and its reformulation in the context of high-energy physics [3, 4] has been routinely used for the combination of the precision measurements performed by experiments at the LEP [5], Tevatron [6] and LHC [7] colliders, as well as in other domains.

To quantify the “relative importance” of each measurement in its contribution to the combined knowledge about the measured physical quantity, its coefficient in the BLUE weighted average is traditionally used. In many examples in the literature where the BLUE technique has been used, the combinations are dominated by systematic uncertainties, which are often assumed to be fully correlated among different measurements. This often leads to situations where one or more measurements contribute with a negative BLUE coefficient. In turn, this has pushed experimentalists to redefine the “relative importance” of a measurement as the absolute value of its BLUE coefficient, normalised to the sum of the absolute values of all coefficients [6, 7]. In our opinion, this approach is incorrect.

In this paper, we propose a different approach for comparing the relative contributions of the measurements to the combined knowledge about the unknown parameter, using the well-established concept of Fisher information [8]. We also show that negative coefficients in the BLUE weighted average invariably indicate the presence of very high correlations, whose marginal effect is to reduce the error on the combined estimate, rather than to increase it. In these regimes, we stress that taking systematic uncertainties to be fully (i.e. 100 %) correlated is not a conservative assumption. We therefore argue that the correlations provided as inputs to BLUE combinations need to be assessed with extreme care. In those situations where their precise evaluation is impossible, we offer a few guidelines and tools for critically re-evaluating these correlations, in order to help experimental physicists perform more “conservative” combinations. In our discussion, we will generally limit ourselves to BLUE combinations of a single measured parameter where the correlations used as inputs to the combination are positive. Many of the concepts and tools we present could also be applied to the more general cases of BLUE combinations of several measured parameters, and/or to those involving negative correlations between measurements, but this discussion is beyond the scope of this paper.

The outline of this article is as follows. In Sect. 2 we review the definition of the “relative importance” of a measurement in a BLUE combination, as presented by some papers in the literature, and we use a simple numerical example to present our objections to it. In Sect. 3 we briefly recall the definition of Fisher information and its relevant features, and then present our alternative definitions of information weights. By studying marginal information and information derivatives, in Sect. 4 we show that negative BLUE coefficients in the combination of several measurements of one parameter are always a sign of a “high-correlation” regime. This generalises the results presented for two measurements by the authors of Ref. [3]. In Sect. 5 we go on to discuss practical guidelines and tools, illustrated by numerical examples, to identify correlations that may have been overestimated and to review them in a more “conservative” way. In Sect. 6 we summarize our discussion and present some concluding remarks.

## 2 “Relative importance” and negative BLUE coefficients

In the BLUE technique, the best linear unbiased estimate of each unknown parameter is built as a weighted average of all available measurements. The coefficients multiplying the measurements in each linear combination are chosen to minimize its variance. They are subject to a normalisation constraint, which ensures that the combination is an unbiased estimate of the corresponding parameter. As discussed extensively in Refs. [3, 4, 9], this technique is equivalent to minimizing the weighted sum of squared distances of the measurements from the combined estimates. The weighting matrix in this sum is the inverse of the input covariance matrix of the measurements, which is assumed to be known a priori.
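For reference, the single-parameter BLUE described above has the standard closed form shown below. In our own notation, \(\mathcal{M}\) is the \(n\times n\) input covariance matrix and \(\mathbf{U}\) is a vector of \(n\) ones. This is the textbook result, written out as a sketch rather than copied from the paper's numbered equations:

\[
\hat{Y}=\sum_{i=1}^{n}\lambda_i y_i,\qquad
\boldsymbol{\lambda}=\frac{\mathcal{M}^{-1}\mathbf{U}}{\mathbf{U}^{T}\mathcal{M}^{-1}\mathbf{U}},\qquad
\sum_{i=1}^{n}\lambda_i=1,\qquad
\sigma_{\hat{Y}}^{2}=\boldsymbol{\lambda}^{T}\mathcal{M}\,\boldsymbol{\lambda}=\frac{1}{\mathbf{U}^{T}\mathcal{M}^{-1}\mathbf{U}}.
\]

Equivalently, \(\hat{Y}\) is the value of \(Y\) that minimizes \(\chi^2(Y)=(\mathbf{y}-Y\mathbf{U})^{T}\mathcal{M}^{-1}(\mathbf{y}-Y\mathbf{U})\).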

More generally, the problem with defining the “relative importances” of measurements according to Eq. 2 is that it confuses two different things. The first is the coefficient with which a measurement enters the linear combination of all measurements in the BLUE, i.e. its “weight” in the BLUE weighted average. The second is the impact, or “weight”, of its contribution to the knowledge about the measured observable. In the following we will therefore clearly distinguish between these two categories of “weights”. We will sometimes refer to the BLUE coefficient \(\lambda _i\) of a measurement as its “central value weight” (CVW). We will instead use the term “information weight” (IW) for what Refs. [6, 7] call its “relative importance” or the “weight it carries in the combination”. In the next section, we will propose and discuss our definitions of intrinsic and marginal information weights, using the well-established concept of Fisher information.
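The difference between CVW and the normalised absolute values used in Refs. [6, 7] can be seen numerically. The sketch below assumes NumPy, and the helper names `blue` and `relative_importance` are our own. It computes the BLUE coefficients as \(\mathcal{M}^{-1}\mathbf{U}/(\mathbf{U}^{T}\mathcal{M}^{-1}\mathbf{U})\) for the A, B1, B2 example of Table 1, taking \(\sigma_{\mathrm{A}}=\sqrt{15}\) (quoted as 3.87) and a 0.875 correlation between B1 and B2.

```python
import numpy as np

def blue(y, cov):
    """Single-parameter BLUE: returns (coefficients, central value, error)."""
    y = np.asarray(y, dtype=float)
    cov = np.asarray(cov, dtype=float)
    u = np.ones(len(y))
    w = np.linalg.solve(cov, u)   # M^{-1} U
    info = u @ w                  # U^T M^{-1} U = 1 / sigma^2
    lam = w / info                # BLUE coefficients, summing to 1
    return lam, lam @ y, np.sqrt(1.0 / info)

def relative_importance(lam):
    """|lambda_i| normalised to the sum of all |lambda_j| (the RI definition)."""
    a = np.abs(lam)
    return a / a.sum()

# Centre of Table 1: A, B1, B2 with rho(B1, B2) = 0.875
sig = np.array([np.sqrt(15.0), 4.0, 8.0])
corr = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.875],
                 [0.0, 0.875, 1.0]])
cov = corr * np.outer(sig, sig)
lam, value, error = blue([103.0, 99.0, 101.0], cov)
ri = relative_importance(lam)
print(np.round(100 * lam, 2), np.round(100 * ri, 2))
```

This should give CVW of 40, 90 and −30 % but RI of 25, 56.25 and 18.75 %. The negative coefficient of B2 is thus turned into a positive “importance” by the RI definition.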

## 3 Fisher information and “information weights”

In this section, we briefly recall the definition of Fisher information and summarize its main relevant features, and we then present our definitions of intrinsic and marginal information weights. A more general discussion of Fisher information and its role in parameter estimation in experimental science is well beyond the scope of this paper. It can be found in many statistics textbooks and reviews, such as the two excellent ones in Refs. [9, 10], on which the overview presented in this section is largely based.

### 3.1 Definition of Fisher information

As pointed out in Ref. [10], Fisher information is a valuable tool for assessing quantitatively the contribution of an individual measurement to our knowledge about an unknown parameter inferred from it, because it possesses three remarkable properties.

First, information increases with the number of observations \(y_i\) and in particular it is additive, i.e. the total information yielded by two independent experiments is the sum of the information from each experiment taken separately.

Second, the definition of the “information obtained from a set of measurements” depends on which parameters we want to infer from them. This is clear from Eq. 6, which defines Fisher information \({\fancyscript{I}}^{(\mathbf {X})}\) about \(\mathbf {X}\) in terms of a set of derivatives with respect to the parameters \(\mathbf {X}\).
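For reference, under the usual regularity conditions the Fisher information matrix about \(\mathbf{X}\) can be written in terms of the likelihood \(L(\mathbf{y};\mathbf{X})\) as in the first expression below. This is the textbook definition, written in our own notation: \(\mathcal{I}\) is the information, \(\mathcal{M}\) the covariance and \(\mathbf{U}\) a vector of \(n\) ones. It is not copied from the paper's Eq. 6. The likelihoods of independent experiments multiply, so their log-likelihoods and hence their information matrices add, which is the additivity property above. For \(n\) Gaussian measurements of a single parameter \(Y\) with known covariance \(\mathcal{M}\), the information is a constant. The Cramér-Rao bound then limits the variance of any unbiased estimator \(\hat{Y}\):

\[
\mathcal{I}^{(\mathbf{X})}_{\alpha\beta}
= E\!\left[\frac{\partial \ln L}{\partial X_\alpha}\,\frac{\partial \ln L}{\partial X_\beta}\right]
= -E\!\left[\frac{\partial^{2} \ln L}{\partial X_\alpha\,\partial X_\beta}\right],
\qquad
\mathcal{I}^{(Y)} = \mathbf{U}^{T}\mathcal{M}^{-1}\mathbf{U},
\qquad
\mathrm{Var}(\hat{Y}) \ge \frac{1}{\mathcal{I}^{(Y)}}.
\]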

### 3.2 BLUE estimators and Fisher information

An unbiased estimator whose variance is equal to its Cramér-Rao lower bound, i.e. one for which the equality in Eq. 7 holds, is called an efficient unbiased estimator. It is not always possible to build one in the general case. An efficient unbiased estimator does exist, however, under the assumption that the \(n\) measurements \(\mathbf {y}\) follow a multivariate Gaussian distribution with a positive definite covariance matrix that is known a priori and does not depend on the unknown parameters \(\mathbf {X}\). This is the same assumption that was used for the description of the BLUE method in Ref. [4], and we will take it as valid throughout the rest of this paper.

As discussed at length in Refs. [9, 10], such distributions in fact possess a number of special properties that significantly simplify all statistical calculations involving them. In particular, under these assumptions it is easy to show, in the general case with several unknown parameters, that the best linear unbiased estimator is an efficient unbiased estimator. In other words, its covariance matrix is equal to the inverse of the Fisher information matrix. Moreover, under these assumptions the Fisher information matrix and the combined covariance do not depend on the unknown parameters \(\mathbf {X}\), which is not true in the general case. For Gaussian distributions, the best linear unbiased estimator also coincides with the maximum likelihood estimator [9]. This is not true in most other cases, including the case of Poisson distributions.
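For one parameter \(Y\), a short derivation shows why BLUE and maximum likelihood coincide in the Gaussian case. In our notation, \(\mathcal{M}\) is the known covariance and \(\mathbf{U}\) a vector of ones. Up to a constant, \(-2\ln L=\chi^2(Y)=(\mathbf{y}-Y\mathbf{U})^{T}\mathcal{M}^{-1}(\mathbf{y}-Y\mathbf{U})\), so that

\[
\frac{\partial\chi^{2}}{\partial Y}=-2\,\mathbf{U}^{T}\mathcal{M}^{-1}(\mathbf{y}-Y\mathbf{U})=0
\;\Longrightarrow\;
\hat{Y}=\frac{\mathbf{U}^{T}\mathcal{M}^{-1}\mathbf{y}}{\mathbf{U}^{T}\mathcal{M}^{-1}\mathbf{U}}=\sum_{i=1}^{n}\lambda_i y_i,
\qquad
-\frac{\partial^{2}\ln L}{\partial Y^{2}}=\mathbf{U}^{T}\mathcal{M}^{-1}\mathbf{U}=\frac{1}{\sigma_{\hat{Y}}^{2}}.
\]

The maximum likelihood estimate is therefore the BLUE weighted average. The second derivative depends neither on \(Y\) nor on \(\mathbf{y}\), so the Fisher information is a constant equal to the inverse BLUE variance, i.e. the estimator is efficient.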

### 3.3 Intrinsic and marginal information weights

Having recalled the relevance of the Fisher information concept to quantitatively assess the contribution of a set of measurements to the knowledge about an unknown parameter, we may now introduce our proposal about how to best represent the “weight that a measurement carries in the combination” or its “relative importance”. We define this in terms of intrinsic and marginal information weights. Our approach is radically different from that of Refs. [6, 7], because we do not attempt to make sure that the \(n\) weights for the different measurements sum up to 1.

The intrinsic information weights for the different measurements are, by construction, always positive. The weight for the correlations can, instead, be negative, null or positive. In other words, according to our definition, while every measurement always adds intrinsic information to a combination, the net effect of correlations may be to increase the combined error, to keep the combined error unchanged or, less frequently, to decrease it.

Marginal information weights are guaranteed to be non-negative (as discussed in more detail in Sect. 4.2). However, they generally differ from the corresponding intrinsic information weights if the measurement is correlated with any of the others. In particular, \(\mathrm {MIW}_i<\mathrm {IIW}_i\) represents the common situation where part of the intrinsic information contributed by a measurement is reduced by correlations. Conversely, \(\mathrm {MIW}_i>\mathrm {IIW}_i\) represents the cases where its correlations amplify its net contribution to information. We will discuss these issues in more detail in Sect. 4.1, in the specific case of two measurements of one observable.
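In formulas, the intrinsic and marginal information weights for \(n\) measurements of one parameter \(Y\) can be written as follows. Here \(\mathcal{I}_{\{1\ldots n\}}=1/\sigma_{\hat{Y}}^{2}\) is the information from the full combination and \(\mathcal{I}_{\{1\ldots n\}\setminus i}\) is that from the combination without measurement \(i\). This is our own reconstruction, and it reproduces the entries of Table 1:

\[
\mathrm{IIW}_i=\frac{1/\sigma_i^{2}}{1/\sigma_{\hat{Y}}^{2}}=\frac{\sigma_{\hat{Y}}^{2}}{\sigma_i^{2}},\qquad
\mathrm{IIW}_{\mathrm{corr}}=1-\sum_{i=1}^{n}\mathrm{IIW}_i,\qquad
\mathrm{MIW}_i=\frac{\mathcal{I}_{\{1\ldots n\}}-\mathcal{I}_{\{1\ldots n\}\setminus i}}{\mathcal{I}_{\{1\ldots n\}}}.
\]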

Table 1: Results for the combination of A and B (top, \(\chi ^2\)/ndof \(=\) 1.00/1), for that of A, B1 and B2 (center, \(\chi ^2\)/ndof = 1.17/2) and for that of A, B11, B12 and B2 (bottom, \(\chi ^2\)/ndof = 2.42/3). For each input measurement \(i\) the following are listed: the central value weight CVW\(_i\) or \(\lambda _i\), the intrinsic information weight \(\mathrm {IIW}_i\) (also shown for the correlations), the marginal information weight \(\mathrm {MIW}_i\) and the relative importance \(\mathrm {RI}_i\). The last row of each section shows the BLUE central value and error, together with the sum of all weights in each column.

Top: combination of A and B

| Measurement | Value | CVW/% | IIW/% | MIW/% | RI/% |
|---|---|---|---|---|---|
| A | 103.00 \(\pm \) 3.87 | 40.00 | 40.00 | 40.00 | 40.00 |
| B | 98.00 \(\pm \) 3.16 | 60.00 | 60.00 | 60.00 | 60.00 |
| Correlations | – | – | 0.00 | – | – |
| BLUE / Total | 100.00 \(\pm \) 2.45 | 100.00 | 100.00 | 100.00 | 100.00 |

Center: combination of A, B1 and B2

| Measurement | Value | CVW/% | IIW/% | MIW/% | RI/% |
|---|---|---|---|---|---|
| A | 103.00 \(\pm \) 3.87 | 40.00 | 40.00 | 40.00 | 25.00 |
| B1 | 99.00 \(\pm \) 4.00 | 90.00 | 37.50 | 50.63 | 56.25 |
| B2 | 101.00 \(\pm \) 8.00 | \(-\)30.00 | 9.38 | 22.50 | 18.75 |
| Correlations | – | – | 13.13 | – | – |
| BLUE / Total | 100.00 \(\pm \) 2.45 | 100.00 | 100.00 | 113.13 | 100.00 |

Bottom: combination of A, B11, B12 and B2

| Measurement | Value | CVW/% | IIW/% | MIW/% | RI/% |
|---|---|---|---|---|---|
| A | 103.00 \(\pm \) 3.87 | 40.00 | 40.00 | 40.00 | 25.00 |
| B11 | 99.01 \(\pm \) 4.00 | 45.00 | 37.50 | \(\sim \) 0 | 28.13 |
| B12 | 98.99 \(\pm \) 4.00 | 45.00 | 37.50 | \(\sim \) 0 | 28.13 |
| B2 | 101.00 \(\pm \) 8.00 | \(-\)30.00 | 9.37 | 22.50 | 18.75 |
| Correlations | – | – | \(-\)24.37 | – | – |
| BLUE / Total | 100.00 \(\pm \) 2.45 | 100.00 | 100.00 | 62.50 | 100.00 |
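The entries of Table 1 can be recomputed in a few lines. This NumPy sketch uses hypothetical helper names. It takes \(\sigma_{\mathrm{A}}=\sqrt{15}\) (quoted as 3.87) and \(\rho_{\mathrm{B1,B2}}=0.875\), and, for the bottom section, \(\rho_{\mathrm{B11,B12}}=0.99999\). The weights are computed as \(\mathrm{IIW}_i=\sigma_{\hat{Y}}^2/\sigma_i^2\) and \(\mathrm{IIW}_{\mathrm{corr}}=1-\sum_i\mathrm{IIW}_i\), with \(\mathrm{MIW}_i\) taken as the relative loss of information when measurement \(i\) is dropped. These definitions reproduce the tabulated values.

```python
import numpy as np

def information(cov):
    """Fisher information U^T M^{-1} U on one parameter (= 1 / sigma_BLUE^2)."""
    u = np.ones(cov.shape[0])
    return u @ np.linalg.solve(cov, u)

def make_cov(sig, corr):
    """Covariance matrix from errors and a correlation matrix."""
    sig = np.asarray(sig, dtype=float)
    return np.asarray(corr, dtype=float) * np.outer(sig, sig)

def table_weights(cov):
    """Return CVW, IIW, IIW of the correlations and MIW, all as fractions."""
    n = cov.shape[0]
    u = np.ones(n)
    w = np.linalg.solve(cov, u)
    i_tot = u @ w
    cvw = w / i_tot                        # BLUE coefficients lambda_i
    iiw = (1.0 / np.diag(cov)) / i_tot     # sigma_Y^2 / sigma_i^2
    iiw_corr = 1.0 - iiw.sum()             # remainder attributed to correlations
    miw = np.empty(n)
    for i in range(n):
        keep = [j for j in range(n) if j != i]   # combination without measurement i
        miw[i] = (i_tot - information(cov[np.ix_(keep, keep)])) / i_tot
    return cvw, iiw, iiw_corr, miw

s_a = np.sqrt(15.0)   # sigma_A, quoted as 3.87 in Table 1
c, r = 0.875, 0.99999

# Centre of Table 1: A, B1, B2
cov3 = make_cov([s_a, 4.0, 8.0],
                [[1, 0, 0],
                 [0, 1, c],
                 [0, c, 1]])
cvw3, iiw3, iiw_corr3, miw3 = table_weights(cov3)

# Bottom of Table 1: A, B11, B12, B2
cov4 = make_cov([s_a, 4.0, 4.0, 8.0],
                [[1, 0, 0, 0],
                 [0, 1, r, c],
                 [0, r, 1, c],
                 [0, c, c, 1]])
cvw4, iiw4, iiw_corr4, miw4 = table_weights(cov4)

for name, vals in [("CVW", cvw4), ("IIW", iiw4), ("MIW", miw4)]:
    print(name, np.round(100 * vals, 2))
```

The printed CVW, IIW and MIW should agree with the bottom section of the table up to rounding, with the MIW of B11 and B12 very close to zero. The centre section is reproduced by the results for `cov3`.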

We should stress at this point that information weights also have their own limitations and should be used with care. In particular, their main interest should not be to rank measurements. They are rather a quantitative tool for better understanding how the different measurements, individually and together, contribute to our combined knowledge about the parameters that we want to infer. We believe that attempting to determine which individual experiment provides the “best” or “most important” contribution to a combination is a goal of relatively limited scientific use. More importantly, it is a question that involves some degree of arbitrariness. As we mentioned above, when combining \(n\) correlated measurements, it is very difficult to unambiguously split \(\mathrm {IIW}_{\mathrm {corr}}\) into sub-contributions from the several correlations that simultaneously exist between those measurements. In particular, each correlation may have two competing effects on the information contributed by any given measurement, and these would be quite complex to disentangle. On the one hand, it may amplify this contribution through the collaboration with other measurements. On the other hand, it may reduce this contribution by making the measurements partially redundant with each other. As a consequence, “ranking” individual measurements by their intrinsic or marginal information weights is a practice that we do not advocate or recommend.

To better illustrate what we mean, in the bottom section of Table 1 we have added a slightly different example. Here it is assumed that \(y_{\mathrm{B1}}\) is itself the result of the combination of two very similar measurements, \(y_{\mathrm{B11}}=99.01\pm 4.00\) and \(y_{\mathrm{B12}}=98.99\pm 4.00\). These are 99.999 % correlated with each other, and each of them is individually 87.5 % correlated with \(y_{\mathrm{B2}}\). It is not surprising in this case that B11 and B12 each have a central value weight equal to half that of B1 and an intrinsic information weight equal to that of B1, but a marginal information weight that is essentially zero. This is because including B12 is largely redundant if the almost identical measurement B11 has already been included, and vice versa. In the combination of A, B1 and B2, the net effect of correlations was to amplify the information contribution of both B1 and B2 by \(\mathrm {MIW}_{\mathrm{B1}}\!-\!\mathrm {IIW}_{\mathrm{B1}}\!=\!\mathrm {MIW}_{\mathrm{B2}}\!-\!\mathrm {IIW}_{\mathrm{B2}}\!=\!13.1\,\%\). In this third example, instead, the information contributions of B11 and B12 are also affected by the competing effect of their mutual correlation, which brings their \(\mathrm {MIW}\) down essentially to zero. This example is also interesting because it clearly shows that very different “rankings” may be obtained for the individual measurements if they are ordered by decreasing values of \(\mathrm {IIW}_i\), \(\mathrm {MIW}_i\), CVW\(_i\) or \(\mathrm {RI}_i\). For instance, measurement B11 has the highest CVW and \(\mathrm {RI}\) and the second highest \(\mathrm {IIW}\), but the lowest \(\mathrm {MIW}\). Excluding \(\mathrm {RI}\), which we have already argued to be an ill-defined quantity, we see in this case that CVW, \(\mathrm {IIW}\) and \(\mathrm {MIW}\) all have their limitations if they are used for “ranking”.
Indeed, CVW can be negative, which may give the false impression that a measurement makes a combination worse instead of improving it; \(\mathrm {IIW}\) completely ignores the effect of correlations; \(\mathrm {MIW}\) only describes the marginal contribution of a single measurement and of its correlations. For these reasons, we propose to quote all of CVW, \(\mathrm {IIW}\) and \(\mathrm {MIW}\) whenever a combination of several measurements is presented, while explicitly refraining from using any of them for ranking individual measurements.

## 4 Negative BLUE coefficients and “high-correlation regimes”

In this section, we use the concept of Fisher information to explore the relation between negative BLUE coefficients and the size of correlations between measurements. We start by revisiting the discussion of these issues presented in Ref. [3] for two measurements of one parameter, whose conclusion was that negative weights appear when the positive correlation between the two measurements exceeds a well-defined threshold. We then generalize this conclusion to \(n\) measurements of one parameter, first by computing marginal information and then by analysing the derivatives of Fisher information with respect to the correlations between measurements: we show, in particular, that negative central value weights in BLUE combinations are always a sign of a “high-correlation” regime, where the marginal effect of further increasing one or more of these correlations is that of reducing the errors on the combined estimates rather than increasing them. In Sect. 5 we will discuss important practical consequences of what is presented in this section.

### 4.1 The simple case of two measurements of one parameter

In other words, the threshold value \(\rho \!=\!\sigma _{\mathrm{A}}/\sigma _{\mathrm{B}}\) effectively represents a boundary between two regimes. In the “low-correlation regime”, \(\lambda _{\mathrm{B}}\) is positive and \(\sigma _{\hat{Y}}^2\) increases as \(\rho \) grows. In the “high-correlation regime”, \(\lambda _{\mathrm{B}}\) is negative and \(\sigma _{\hat{Y}}^2\) decreases as \(\rho \) grows. Note that at the boundary between the two regimes, \(\rho \!=\!\sigma _{\mathrm{A}}/\sigma _{\mathrm{B}}\), the BLUE variance from the combination of A and B is equal to that from A alone (\(\sigma _{\hat{Y}}^2\!=\!\sigma _{\mathrm{A}}^2\)), while it is lower on either side of the boundary. In the same way, the Fisher information from the combination at the boundary is equal to that from A alone, while it is higher on either side of it. In other words, the marginal contribution to information from adding B to the combination is zero at the boundary, but positive on either side of it. Note in passing that a zero BLUE coefficient for B does not mean that the measurement is simply not used in the combination, because the central value measured by B still contributes to the overall \(\chi ^2\) of the combination. This remains true for the combination of \(n\) measurements, although we will not repeat it in the following.
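The two regimes can be seen by scanning \(\rho\) with the standard two-measurement BLUE formulas, \(\lambda_{\mathrm{B}}=(\sigma_{\mathrm{A}}^2-\rho\sigma_{\mathrm{A}}\sigma_{\mathrm{B}})/(\sigma_{\mathrm{A}}^2+\sigma_{\mathrm{B}}^2-2\rho\sigma_{\mathrm{A}}\sigma_{\mathrm{B}})\) and \(\sigma_{\hat{Y}}^2=\sigma_{\mathrm{A}}^2\sigma_{\mathrm{B}}^2(1-\rho^2)/(\sigma_{\mathrm{A}}^2+\sigma_{\mathrm{B}}^2-2\rho\sigma_{\mathrm{A}}\sigma_{\mathrm{B}})\). The sketch assumes NumPy and the illustrative values \(\sigma_{\mathrm{A}}=1\) and \(\sigma_{\mathrm{B}}=2\), so the boundary should sit at \(\rho=0.5\). The function name is hypothetical.

```python
import numpy as np

def two_meas(sig_a, sig_b, rho):
    """BLUE coefficient of B and combined variance for two measurements."""
    cov = rho * sig_a * sig_b
    denom = sig_a**2 + sig_b**2 - 2.0 * cov
    lam_b = (sig_a**2 - cov) / denom
    var = (sig_a**2 * sig_b**2 - cov**2) / denom
    return lam_b, var

sig_a, sig_b = 1.0, 2.0                 # illustrative values, sigma_A < sigma_B
rhos = np.linspace(0.0, 0.99, 100)      # steps of 0.01
lam_b, var = two_meas(sig_a, sig_b, rhos)
rho_max_var = rhos[np.argmax(var)]      # largest combined variance
print(rho_max_var, var.max())
```

The combined variance should peak at \(\rho\approx 0.5\) with value \(\sigma_{\mathrm{A}}^2=1\), with \(\lambda_{\mathrm{B}}\) changing sign there.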

### 4.2 Marginal information from the \(i{\mathrm {th}}\) measurement of one parameter

### 4.3 Information derivatives

In the first part of this section, we have described the boundary between low-correlation and high-correlation regimes in the simplest case of the combination of two measurements. We have also described it in the more complex but still specific case of the combination of \(n\) measurements, where only the \(n\!-\!1\) correlations of the \(i{\mathrm {th}}\) measurement with all of the others are allowed to vary. We now analyse the most general case of the combination of \(n\) measurements of one parameter, as a function of the \(n(n\!-\!1)/2\) correlations of all the measurements with one another. We do this by studying the first derivatives of information with respect to these correlations \(\rho _{ij}\).

Equation 35 clearly shows that, if all BLUE coefficients are positive, the first derivatives of information with respect to the correlations between any two measurements are always negative. In that case, information can only decrease if correlations are further increased. This is the equivalent in \(n(n\!-\!1)/2\)-dimensional space of what we have previously called a “low-correlation” regime, as this sub-space is guaranteed to contain the point where all correlations are zero. Conversely, suppose that at least one BLUE coefficient is negative (keeping in mind that they cannot all be negative, since they sum to 1). Then at least one information derivative must be positive, i.e. there is at least one correlation which leads to higher information if it is increased. This is the equivalent of what we have previously called a “high-correlation” regime. The boundary between the two regimes is a hypersurface in \(n(n\!-\!1)/2\)-dimensional space, defined by the condition that at least one BLUE coefficient is zero while all the others are non-negative. When this condition is satisfied, the information derivatives with respect to one or more correlations are also zero, meaning that information has reached a minimum in its partial functional dependency on those correlations. This completes the generalization to several measurements of one observable of the discussion presented in Ref. [3] for only two measurements. Note finally that in that case, i.e. for \(n\!=\!2\), all these considerations are trivially illustrated by Figs. 1 and 2. These show that the boundary between low and high correlation regimes in the 1-dimensional space of the correlation \(\rho \) is a 0-dimensional hypersurface (a point) at the value \(\rho =\sigma _{\mathrm{A}}/\sigma _{\mathrm{B}}\).
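The sign argument above can be checked numerically. Write \(\mathcal{M}_{ij}=\rho_{ij}\sigma_i\sigma_j\) and \(I=\mathbf{U}^{T}\mathcal{M}^{-1}\mathbf{U}\). Using \(\partial\mathcal{M}^{-1}=-\mathcal{M}^{-1}(\partial\mathcal{M})\mathcal{M}^{-1}\) together with \(\mathcal{M}^{-1}\mathbf{U}=I\boldsymbol{\lambda}\), one obtains \(\partial I/\partial\rho_{ij}=-2I^{2}\lambda_i\lambda_j\sigma_i\sigma_j\) for \(i\neq j\). This is negative when \(\lambda_i\lambda_j>0\) and positive when the two coefficients have opposite signs. The expression is our own derivation, not a quotation of Eq. 35. The NumPy sketch below, which uses hypothetical helper names, compares it with a central finite difference for the A, B1, B2 example of Table 1.

```python
import numpy as np

def blue_info(sig, corr):
    """Information I and BLUE coefficients for errors sig and correlations corr."""
    cov = corr * np.outer(sig, sig)
    u = np.ones(len(sig))
    w = np.linalg.solve(cov, u)
    info = u @ w
    return info, w / info

def d_info_analytic(sig, corr, i, j):
    """dI/drho_ij = -2 I^2 lambda_i lambda_j sigma_i sigma_j (i != j)."""
    info, lam = blue_info(sig, corr)
    return -2.0 * info**2 * lam[i] * lam[j] * sig[i] * sig[j]

def d_info_numeric(sig, corr, i, j, h=1e-6):
    """Central finite difference, keeping the correlation matrix symmetric."""
    up, dn = corr.copy(), corr.copy()
    up[i, j] += h
    up[j, i] += h
    dn[i, j] -= h
    dn[j, i] -= h
    return (blue_info(sig, up)[0] - blue_info(sig, dn)[0]) / (2.0 * h)

sig = np.array([np.sqrt(15.0), 4.0, 8.0])   # A, B1, B2
corr = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.875],
                 [0.0, 0.875, 1.0]])
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(i, j, d_info_analytic(sig, corr, i, j), d_info_numeric(sig, corr, i, j))
```

For the B1-B2 pair, whose coefficients are 0.9 and −0.3, the derivative should come out positive (0.48), which places this example in a high-correlation regime.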

## 5 “Conservative” estimates of correlations in BLUE combinations

A precise assessment of the correlations that need to be used as input to BLUE combinations is often very hard. Ideally, one should aim to measure these correlations in the data or by using Monte Carlo methods. This, however, often turns out to be impractical, if not impossible. One example is combining results produced by different experiments that use different conventions for assessing the systematic errors on their measurements. Another is combining results from recent experiments with older results, for which not enough details were published and the expertise and infrastructure to analyse the data are no longer available. In these situations, it may be unavoidable to combine results using input covariance matrices where the correlations between the different measurements have only been approximately estimated, rather than accurately measured. In the following, we will refer to these estimates of correlations as the “nominal” correlations, and we will extensively study the effect on BLUE combination results of reducing correlations below these initial “nominal” values.

In particular, it is not uncommon to read in the literature that correlations have been “conservatively” assumed to be 100 %. In this section, we question the validity of this kind of statement. A “conservative” estimate of a measurement error should mean that, in the absence of more precise assessments, an overestimate of the true error (at the price of losing some of the available information from a measurement) is more acceptable than taking the risk of claiming that a measurement is more accurate than it really is. Likewise, by “conservative” estimate of a correlation, one should mean an estimate which is more likely to result in an overall larger combined error than in a wrong claim of smaller combined errors.

When BLUE coefficients are all positive, i.e. in a low-correlation regime, information derivatives are negative and the net effect of increasing any correlation can only be that of reducing information and increasing the combined error: in this case, choosing the largest possible positive correlations (100 %) is clearly the most conservative choice. Our discussion in the previous section, however, shows that negative BLUE coefficients are a sign of a high-correlation regime, where the net effect of increasing some of these correlations is that of increasing information and reducing the combined error: in other words, if correlations are estimated as 100 % and negative BLUE coefficients are observed, it is wrong to claim that correlations have been estimated “conservatively”.

In this section, we will first analyse under which conditions it is indeed conservative to assume that correlations are 100 %, using a simple two-measurement combination as an example. For those situations where a precise evaluation of correlations is impossible, and where setting them to their “nominal” estimates would result in negative BLUE coefficients, we will then offer a few guidelines and tools to help physicists make more conservative estimates of correlations.

### 5.1 Conservative estimates of correlations in a two-measurement combination

This figure shows that there are two different regimes. When \((\sigma _{B (\mathrm {cor})}/\sigma _{A (\mathrm {cor})})<(\sigma _{\mathrm{A}}/\sigma _{A (\mathrm {cor})})^2\), the most conservative value of \(\rho _{\mathrm {cor}}\) is 1. In this case, both measurements A and B contribute to the combination with positive BLUE coefficients because, no matter how large \(\rho _{\mathrm {cor}}\) is, the combination always remains in a low-correlation regime. When \((\sigma _{B (\mathrm {cor})}/\sigma _{A (\mathrm {cor})})\ge (\sigma _{\mathrm{A}}/\sigma _{A (\mathrm {cor})})^2\), instead, the most conservative value of \(\rho _{\mathrm {cor}}\) is smaller than 1. In this case, the combination is at the boundary between the low- and high-correlation regimes. There the combined error is maximised and equal to \(\sigma _{\hat{Y}}\!=\!\sigma _{\mathrm{A}}\), while \(\lambda _A\!=\!1\) and \(\lambda _B\!=\!0\). This is more or less equivalent (modulo the effect on \(\chi ^2\) previously discussed) to excluding the less precise measurement B from the combination.
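The rule above can be turned into a small function. The sketch assumes that each error is the quadrature sum of an uncorrelated part and a correlated part. Only the correlated parts are correlated with each other, through \(\rho_{\mathrm{cor}}\). The argument names are our own. Since \(\lambda_{\mathrm{B}}=0\) when the covariance \(\rho_{\mathrm{cor}}\sigma_{A(\mathrm{cor})}\sigma_{B(\mathrm{cor})}\) equals \(\sigma_{\mathrm{A}}^2\), the most conservative correlation is \(\min\{1,\sigma_{\mathrm{A}}^2/(\sigma_{A(\mathrm{cor})}\sigma_{B(\mathrm{cor})})\}\). This is equivalent to the condition on \(\sigma_{B(\mathrm{cor})}/\sigma_{A(\mathrm{cor})}\) quoted above.

```python
import math

def combined_sigma(rho, sa_unc, sa_cor, sb_unc, sb_cor):
    """BLUE error of A and B when only their correlated parts are correlated."""
    var_a = sa_unc**2 + sa_cor**2
    var_b = sb_unc**2 + sb_cor**2
    cov = rho * sa_cor * sb_cor
    return math.sqrt((var_a * var_b - cov**2) / (var_a + var_b - 2.0 * cov))

def conservative_rho_cor(sa_unc, sa_cor, sb_unc, sb_cor):
    """Value of rho_cor in [0, 1] that maximises the combined BLUE error.

    A must be the more precise measurement (sigma_A <= sigma_B)."""
    var_a = sa_unc**2 + sa_cor**2
    var_b = sb_unc**2 + sb_cor**2
    if var_a > var_b:
        raise ValueError("label the more precise measurement as A")
    # lambda_B = 0 (regime boundary) when rho_cor * sa_cor * sb_cor == var_a
    rho_boundary = var_a / (sa_cor * sb_cor)
    return min(1.0, rho_boundary)

rho = conservative_rho_cor(1.0, 2.0, 1.0, 4.0)
print(rho, combined_sigma(rho, 1.0, 2.0, 1.0, 4.0))
```

With uncorrelated and correlated parts of (1, 2) for A and (1, 4) for B, the function returns 0.625. The combined error at that correlation equals \(\sigma_{\mathrm{A}}=\sqrt{5}\).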

### 5.2 Identifying the least conservative correlations between \(n\) measurements

Note that the sums of all \((\rho _{ij}^{[s]}/I)(\partial I/\partial \rho _{ij}^{[s]})\) over all error sources \(s\) effectively represent the effect on information of rescaling the correlations between measurements \(i\) and \(j\) by the same factor for all error sources, while their sums over measurements \(i\) and \(j\) represent the effect of rescaling the correlations between all measurements by the same factor in a given error source \(s\). Likewise, their global sum over measurements \(i\) and \(j\) and error sources \(s\) represents the effect on information of rescaling all correlations by a global factor. While they lack the granularity to give more useful insight about which correlations are most relevant when trying to make the combination more conservative, these sums also represent interesting quantities to analyse in some situations. In particular, we will point out in Sect. 5.3.1 that each of these different sums of derivatives becomes zero in one of the information minimization procedures that will be described in that section.
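The normalised derivatives can be computed directly from the BLUE coefficients. Consider a covariance built as \(\mathcal{M}=\sum_s\mathcal{M}^{[s]}\) with \(\mathcal{M}^{[s]}_{ij}=\rho^{[s]}_{ij}\sigma^{[s]}_i\sigma^{[s]}_j\). One then finds \((\rho^{[s]}_{ij}/I)\,\partial I/\partial\rho^{[s]}_{ij}=-2I\lambda_i\lambda_j\mathcal{M}^{[s]}_{ij}\). This is our own derivation, made independently of the paper's equations. The NumPy sketch below uses hypothetical helper names and a made-up two-source example. It computes these quantities and their per-source and per-pair sums, and the test checks the global sum against a finite-difference rescaling of all correlations.

```python
import numpy as np

def info_and_lambda(partials):
    """Information and BLUE coefficients from a list of partial covariances."""
    cov = sum(partials)
    u = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, u)
    info = u @ w
    return info, w / info

def normalised_derivatives(partials):
    """Array d[s, i, j] = (rho_ij^[s] / I) dI/drho_ij^[s] for pairs i < j."""
    info, lam = info_and_lambda(partials)
    n = len(lam)
    d = np.zeros((len(partials), n, n))
    for s, m in enumerate(partials):
        for i in range(n):
            for j in range(i + 1, n):
                d[s, i, j] = -2.0 * info * lam[i] * lam[j] * m[i, j]
    return d

def scale_offdiag(m, f):
    """Rescale the off-diagonal elements (i.e. the correlations) of m by f."""
    diag = np.diag(np.diag(m))
    return diag + f * (m - diag)

# Hypothetical example: three measurements, two error sources
stat = np.diag([1.0, 1.0, 4.0])                    # uncorrelated source
syst = np.outer([2.0, 3.0, 6.0], [2.0, 3.0, 6.0])  # 100 % correlated source
partials = [stat, syst]
d = normalised_derivatives(partials)
print("per source:", d.sum(axis=(1, 2)))
print("per pair:\n", d.sum(axis=0))
print("global:", d.sum())
```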

### 5.3 Reducing correlations to make them “more conservative”

Having proposed a way to identify which “nominal” correlations have not been estimated “conservatively” and may need to be reassessed, we now propose some practical procedures to reduce their values. These are meant to make the combination more conservative when a full and precise re-evaluation of these correlations is impossible. What follows must be understood as simple guidelines to drive the work of experimental physicists when combining measurements. We propose different methods, but the choice of one rather than another, which also implies some level of arbitrariness, would have to be judged on a case-by-case basis.

We propose three main solutions to the problem of reducing the (large and positive) “nominal” values of correlations to make the combination more conservative. The first is a numerical minimization of information with respect to these correlations. The second consists of ignoring some of the input measurements. The third is a prescription that we call “onionization”, which consists of decreasing the off-diagonal elements in the covariance matrices so that they are below a specific threshold. At the end of the section we will present a practical example that illustrates the different features of these methods.

#### 5.3.1 Minimizing information by numerical methods

A software package, called BlueFin¹, was specifically prepared to study all these issues. Within this package, numerical minimizations are performed using the Minuit [12] libraries through their integration in ROOT [13], imposing the constraint that scale factors remain between 0 and 1. All scale factors are varied in the minimization, except those which are known to have no effect on information. These are the scale factors for which the information derivatives (which are essentially those presented in Sect. 4.3) are zero both at “nominal” and at zero correlations, i.e. when all scale factors are 1 and 0, respectively.

The “ByOffDiagElem” minimization is the trickiest, as it may trespass into regions where the total covariance matrix is not positive definite, sometimes in an unrecoverable way. In that case we declare the minimization to have failed. Even when this minimization does converge to a minimum, one should keep in mind that at this point the partial covariance matrices for the different error sources may not be positive definite, i.e. they may have negative eigenvalues. This is a non-physical situation, which should be used for illustrative purposes only and is clearly not suitable for a physics publication.

Not surprisingly, in the very simple example presented in Sect. 2, where only one off-diagonal correlation is non-zero and errors are assumed to come from a single source of uncertainty, these three minimizations all converge to the same result, where the off-diagonal covariance is reduced to \(\rho \sigma _{\mathrm{B1}}\sigma _{\mathrm{B2}}\!=\!\sigma _{\mathrm{B1}}^2\!=\!16.00\), which leads to a combination where \(\lambda _{\mathrm{B2}}\!=\!0\) and again the less precise measurement B2 is essentially excluded. In a more general case with several non-zero correlations and many different sources of uncertainty, the three minimizations may instead converge to rather different outcomes. The BlueFin software will also be used for the numerical examples shown at the end of the section.
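A one-dimensional version of this minimization can be sketched without Minuit. The sketch assumes NumPy and replaces Minuit with a plain golden-section search. This substitution is ours, and it is adequate here because the information has a single minimum in the scale factor \(f\in[0,1]\) for this example. The code rescales the nominal 87.5 % correlation between B1 and B2 of the Sect. 2 example (with \(\sigma_{\mathrm{A}}=\sqrt{15}\)) and searches for the value of \(f\) that minimizes the information. The names `information` and `golden_min` are hypothetical, not BlueFin functions.

```python
import math
import numpy as np

SIG = np.array([math.sqrt(15.0), 4.0, 8.0])   # A, B1, B2
RHO_NOMINAL = 0.875                           # nominal rho(B1, B2)

def information(f):
    """Fisher information of the A, B1, B2 combination when the nominal
    B1-B2 correlation is rescaled by the factor f."""
    corr = np.eye(3)
    corr[1, 2] = corr[2, 1] = f * RHO_NOMINAL
    cov = corr * np.outer(SIG, SIG)
    u = np.ones(3)
    return u @ np.linalg.solve(cov, u)

def golden_min(func, lo, hi, tol=1e-9):
    """Golden-section search for the minimum of a unimodal func on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if func(c) < func(d):      # minimum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                      # minimum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)

f_min = golden_min(information, 0.0, 1.0)
off_diag = f_min * RHO_NOMINAL * SIG[1] * SIG[2]
print(f_min, off_diag)
```

The minimum should lie at \(f=16/28\approx 0.571\). At that point the off-diagonal covariance equals \(\sigma_{\mathrm{B1}}^2=16\) and the coefficient of B2 vanishes.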

#### 5.3.2 Iteratively removing measurements with negative BLUE coefficients

We have observed many times that choosing “more conservative” correlations may ultimately lead to combinations where the BLUE coefficients of one or more measurements are increased from a negative value to zero. It is therefore perfectly legitimate to think of excluding these measurements from the combination from the very beginning. If one chooses to adopt this approach, we suggest doing it iteratively. First remove the measurement with the most negative BLUE coefficient, then perform a new combination, and iterate until only positive BLUE coefficients remain. This procedure is guaranteed to converge, since the combination of a single measurement has a single BLUE coefficient equal to 1. We will present an example later on.
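The iterative procedure is short enough to sketch directly. The NumPy code below, which uses hypothetical helper names, recomputes the BLUE coefficients, drops the measurement with the most negative one, and repeats. A coefficient of exactly zero is kept, which is our own choice of tie-breaking. Applied to the A, B1, B2 example of Table 1, it should remove B2 and leave A and B1 with coefficients 16/31 and 15/31.

```python
import numpy as np

def blue_coefficients(cov):
    """Single-parameter BLUE coefficients M^{-1} U / (U^T M^{-1} U)."""
    u = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, u)
    return w / (u @ w)

def drop_negative(cov, labels):
    """Iteratively drop the measurement with the most negative BLUE
    coefficient until all remaining coefficients are non-negative."""
    idx = list(range(len(labels)))
    while True:
        lam = blue_coefficients(cov[np.ix_(idx, idx)])
        k = int(np.argmin(lam))
        if lam[k] >= 0.0:          # always true once one measurement is left
            return [labels[i] for i in idx], lam
        del idx[k]

sig = np.array([np.sqrt(15.0), 4.0, 8.0])
corr = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.875],
                 [0.0, 0.875, 1.0]])
kept, lam = drop_negative(corr * np.outer(sig, sig), ["A", "B1", "B2"])
print(kept, lam)
```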

Excluding measurements from a combination can be a very controversial decision. At the same time, if there are negative BLUE coefficients and the correlations cannot be determined precisely, this may be the truly conservative and scientifically soundest choice, as it avoids the risk of claiming combined results more accurate than they really are. Note that excluding a measurement differs from including it with a rescaled correlation that gives it a zero BLUE coefficient: in the latter case the measurement still contributes to the \(\chi ^2\) of the fit, while in the former case it does not. In any case, if the correlations for that measurement cannot be precisely assessed, even its contribution to the \(\chi ^2\) with an ad-hoc rescaled correlation is of questionable accuracy, and it may be better to simply exclude the measurement from the combination altogether.
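The difference in the \(\chi^2\) can be checked numerically on the cross-section example of Sect. 5.4. The minimal sketch below (plain Python, helper names our own) first combines all four measurements using the “ByOffDiagElem” covariance listed in the table of modified input covariances, for which \(\lambda_\mathrm{B}\) is essentially zero, and then combines \(y_\mathrm{A}\), \(y_\mathrm{C}\) and \(y_\mathrm{D}\) alone.

```python
def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(m, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n + 1):
                a[r][c] -= f * a[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (a[k][n] - sum(a[k][c] * x[c] for c in range(k + 1, n))) / a[k][k]
    return x


def blue(y, cov):
    """BLUE estimate, its error and the BLUE coefficients lambda_i."""
    x = solve(cov, [1.0] * len(y))
    info = sum(x)
    lam = [xi / info for xi in x]
    return sum(l * yi for l, yi in zip(lam, y)), info ** -0.5, lam


def chi2(y, cov, est):
    """chi^2 = r^T M^-1 r for residuals r_i = y_i - est."""
    r = [yi - est for yi in y]
    return sum(ri * zi for ri, zi in zip(r, solve(cov, r)))


y_all = [95.0, 144.0, 115.0, 122.0]  # A, B, C, D
by_off_diag = [[321.0, 272.0, 140.0, 0.0], [272.0, 1992.0, 219.0, 0.0],
               [140.0, 219.0, 433.0, 0.0], [0.0, 0.0, 0.0, 625.0]]
y_acd = [95.0, 115.0, 122.0]  # B excluded
cov_acd = [[321.0, 140.0, 0.0], [140.0, 433.0, 0.0], [0.0, 0.0, 625.0]]

est_in, err_in, lam_in = blue(y_all, by_off_diag)
est_ex, err_ex, _ = blue(y_acd, cov_acd)
print(f"B kept:     {est_in:.1f} +- {err_in:.1f}, chi2 = {chi2(y_all, by_off_diag, est_in):.1f}/3")
print(f"B excluded: {est_ex:.1f} +- {err_ex:.1f}, chi2 = {chi2(y_acd, cov_acd, est_ex):.1f}/2")
```

Both combinations give 108.2 \(\pm\) 13.4, but the \(\chi^2\) is 2.4/3 when \(y_\mathrm{B}\) is kept with a zero coefficient and 1.3/2 when it is excluded, as in the cross-section results table.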

Note finally that high correlations between different measurements in a combination are not only caused by correlated systematic uncertainties in the analyses of independent data samples: they are also expected for statistical uncertainties when different analyses are performed on the same data samples. In these cases, where combining two such measurements would likely yield negative BLUE coefficients, it is already common practice to publish only the more precise analysis and to use the less precise one simply as a cross-check.

#### 5.3.3 The “onionization” prescription

In more detail, if the full covariance matrix is built as the sum of \(S\) error sources as in Eq. 39, then \(S\!\times \!n(n\!-\!1)/2\) correlations \(({\fancyscript{M}}')_{ij}^{[s]}\!/\!\sqrt{({\fancyscript{M}}')_{ii}^{[s]}({\fancyscript{M}}')_{jj}^{[s]}}\) need to be estimated separately in the \(S\) partial covariances \(({\fancyscript{M}}')^{[s]}\). We considered two possible rules of thumb to provide conservative estimates of the partial covariances satisfying Eq. 43.
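The full onionization rule is not spelled out in this excerpt, but the BKGD and LUMI matrices in the onionization table of Sect. 5.4 are consistent with a simple reading: each off-diagonal covariance of a partial covariance matrix is capped at the smaller of the two corresponding variances. The sketch below implements that reading (our assumption, not necessarily the exact BlueFin prescription), rebuilds the partial covariances of the cross-section example from its table of error components, and recombines.

```python
def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(m, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n + 1):
                a[r][c] -= f * a[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (a[k][n] - sum(a[k][c] * x[c] for c in range(k + 1, n))) / a[k][k]
    return x


def blue(y, cov):
    """BLUE estimate, its error and the BLUE coefficients lambda_i."""
    x = solve(cov, [1.0] * len(y))
    info = sum(x)
    lam = [xi / info for xi in x]
    return sum(l * yi for l, yi in zip(lam, y)), info ** -0.5, lam


def fully_correlated(sigmas):
    """Partial covariance of one error source with 100 % correlations."""
    return [[si * sj for sj in sigmas] for si in sigmas]


def onionize(m):
    """Assumed rule: cap every off-diagonal covariance at the smaller variance."""
    n = len(m)
    return [[m[i][j] if i == j else min(m[i][j], m[i][i], m[j][j]) for j in range(n)]
            for i in range(n)]


y = [95.0, 144.0, 115.0, 122.0]  # A, B, C, D in pb
unc = [10.0, 14.0, 18.0, 25.0]   # uncorrelated errors
bkgd_on = onionize(fully_correlated([10.0, 40.0, 3.0, 0.0]))
lumi_on = onionize(fully_correlated([11.0, 14.0, 10.0, 0.0]))
total = [[(unc[i] ** 2 if i == j else 0.0) + bkgd_on[i][j] + lumi_on[i][j]
          for j in range(4)] for i in range(4)]
est, err, lam = blue(y, total)
print(f"{est:.1f} +- {err:.1f}, lambda/% = {[round(100 * l, 1) for l in lam]}")
```

Under this reading the sketch reproduces both onionized partial matrices and the “Onionization” row of the results table (109.2 \(\pm\) 13.1, with \(\lambda_\mathrm{B}\approx 2.4\,\%\)).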

Note that, in the procedure described above, as well as in its implementation in the BlueFin software that we used to produce the results presented in the next section, we systematically apply the “onionization” to the partial covariance matrix of each source of uncertainty. In a real combination, it may be more appropriate to apply this procedure only to the partial covariance matrices of those sources of uncertainty for which at least some of the information derivatives in Eq. 40 are positive. More generally, we stress again that we propose this prescription only as a rule of thumb: no automatic procedure can replace an estimate of correlations based on a detailed understanding of the physics processes responsible for each source of systematic uncertainty.

### 5.4 A more complex example

The assumption that the background is fully correlated between all experiments may be the result of a detailed analysis, or a supposedly “conservative” assumption in the absence of more precise correlation estimates. In a case like this one, it is rather unlikely that a more detailed analysis would not be performed. In particular, with such a large difference in the sizes of the fully correlated BKGD errors in the different measurements, we would recommend trying to split the BKGD systematics into its sub-components in the combination. This example, however, is meant for illustrative purposes only.

Results of the combination of \(y_\mathrm {A}\), \(y_\mathrm {B}\), \(y_\mathrm {C}\) and \(y_\mathrm {D}\) (\(\chi ^2\)/ndof = 4.23/3). The central value, total error and individual error components for each input measurement \(i\) are listed, followed by the central value weight CVW\(_i\) or \(\lambda _i\), the intrinsic information weight \(\mathrm {IIW}_i\) (also shown for the correlations), the marginal information weight \(\mathrm {MIW}_i\) and the relative importance \(\mathrm {RI}_i\). The last row displays the BLUE central value and errors, and the sum of all weights in each column.

Measurements | Value \(\pm\) total | \(\sigma _{Unc}\) | \(\sigma _{Bkgd}\) | \(\sigma _{Lumi}\) | CVW/% | IIW/% | MIW/% | RI/% |
---|---|---|---|---|---|---|---|---|
\(y_\mathrm {A}\) | 95.00 \(\pm \) 17.92 | 10.00 | 10.00 | 11.00 | 60.39 | 50.91 | 34.69 | 48.78 |
\(y_\mathrm {B}\) | 144.00 \(\pm \) 44.63 | 14.00 | 40.00 | 14.00 | \(-\)11.90 | 8.20 | 8.97 | 9.61 |
\(y_\mathrm {C}\) | 115.00 \(\pm \) 20.81 | 18.00 | 3.00 | 10.00 | 25.36 | 37.74 | 14.63 | 20.49 |
\(y_\mathrm {D}\) | 122.00 \(\pm \) 25.00 | 25.00 | 0 | 0 | 26.15 | 26.15 | 26.15 | 21.12 |
Correlations | – | – | – | – | – | \(-\)23.01 | – | – |
BLUE / Total | 101.30 \(\pm \) 12.78 | 10.14 | 2.04 | 7.51 | 100.00 | 100.00 | 84.44 | 100.00 |

Normalised information derivatives \(\rho \)/I*dI/d\(\rho \) for the combination of \(y_\mathrm {A}\), \(y_\mathrm {B}\), \(y_\mathrm {C}\) and \(y_\mathrm {D}\) in the cross-section example, computed at “nominal” correlation values. The last column and last row list information derivatives when the same rescaling factor is used for a given off-diagonal element or error source, which are equal to the sums of individual derivatives in each row and column, respectively

BLUE central values and variances for the cross section example, with “nominal” correlations, with correlations reduced using the procedures presented in this paper, as well as with no correlations

Combination | \((\hat{Y}\pm \sigma _{\hat{Y}})/\mathrm {pb}\) | UNC | BKGD | LUMI | \(\chi ^2\)/ndof | \(\lambda _\mathrm {A} (\%)\) | \(\lambda _\mathrm {B} (\%)\) | \(\lambda _\mathrm {C} (\%)\) | \(\lambda _\mathrm {D} (\%)\) |
---|---|---|---|---|---|---|---|---|---|
“Nominal” corr. | 101.3 \(\pm \)12.8 | \(\pm \)10.1 | \(\pm \)2.0 | \(\pm \)7.5 | 4.2/3 | 60.4 | \(-\)11.9 | 25.4 | 26.1 |
ByGlobFac | 105.2 \(\pm \)13.0 | \(\pm \)9.9 | \(\pm \)4.1 | \(\pm \)7.3 | 3.1/3 | 50.2 | \(-\)5.7 | 28.6 | 26.9 |
ByErrSrc | 107.3 \(\pm \)13.2 | \(\pm \)9.8 | \(\pm \)4.7 | \(\pm \)7.6 | 2.6/3 | 45.6 | \(-\)1.8 | 28.2 | 28.0 |
ByOffDiagElem | 108.2 \(\pm \)13.4 | \(\pm \)9.8 | \(\pm \)5.2 | \(\pm \)7.6 | 2.4/3 | 44.1 | 0.0 | 27.2 | 28.7 |
No CVWs \(<\) 0 | 108.2 \(\pm \)13.4 | \(\pm \)9.8 | \(\pm \)5.2 | \(\pm \)7.6 | 1.3/2 | 44.1 | – | 27.2 | 28.7 |
Onionization | 109.2 \(\pm \)13.1 | \(\pm \)9.5 | \(\pm \)4.9 | \(\pm \)7.6 | 2.2/3 | 42.0 | 2.4 | 28.3 | 27.3 |
No corr. | 110.1 \(\pm \)11.5 | \(\pm \)8.8 | \(\pm \)5.0 | \(\pm \)5.6 | 1.6/3 | 41.4 | 6.7 | 30.7 | 21.2 |

The most striking effect, perhaps, is that all modifications of the “nominal” correlations that make them “more conservative” lead to significant shifts in the central value (i.e. possibly to biased combined estimates) and to much larger combined BKGD systematics, despite relatively small increases in the total combined error. In particular, it is somewhat counter-intuitive that the combined uncorrelated error decreases when correlations are reduced, while the combined systematic errors increase: this is likely to be another feature of the high-correlation regime that characterizes the “nominal” correlations of this example. We stress that, in real situations, it is important to analyse effects of this type, and not only the effect on the total combined error, when testing different estimates of correlations. This is especially important because the results of BLUE combinations are generally meant to be combined further with other results (e.g. the combined top-quark mass from the LHC and the combined top-quark mass from the Tevatron will eventually be combined).
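The central-value shift can be reproduced with a crude version of the “ByGlobFac” procedure, assuming (as its name suggests) that it rescales all off-diagonal elements of the total covariance by one common factor \(f\in[0,1]\) and picks the \(f\) that minimizes the information, i.e. maximizes the combined variance. The sketch below replaces a proper numerical minimizer with a plain grid scan; it is an illustration, not the BlueFin code.

```python
def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(m, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n + 1):
                a[r][c] -= f * a[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (a[k][n] - sum(a[k][c] * x[c] for c in range(k + 1, n))) / a[k][k]
    return x


def blue(y, cov):
    """BLUE estimate, its error and the BLUE coefficients lambda_i."""
    x = solve(cov, [1.0] * len(y))
    info = sum(x)
    lam = [xi / info for xi in x]
    return sum(l * yi for l, yi in zip(lam, y)), info ** -0.5, lam


def rescale_off_diagonal(m, f):
    """Multiply every off-diagonal element of m by the common factor f."""
    n = len(m)
    return [[m[i][j] if i == j else f * m[i][j] for j in range(n)] for i in range(n)]


y = [95.0, 144.0, 115.0, 122.0]
nominal = [[321.0, 554.0, 140.0, 0.0], [554.0, 1992.0, 260.0, 0.0],
           [140.0, 260.0, 433.0, 0.0], [0.0, 0.0, 0.0, 625.0]]
est_n, err_n, _ = blue(y, nominal)
# Grid scan over f in [0, 1]: keep the factor giving the largest combined error.
best_f = max((k / 1000 for k in range(1001)),
             key=lambda f: blue(y, rescale_off_diagonal(nominal, f))[1])
est_g, err_g, lam_g = blue(y, rescale_off_diagonal(nominal, best_f))
print(f"nominal:     {est_n:.1f} +- {err_n:.1f}")
print(f"f = {best_f:.3f}: {est_g:.1f} +- {err_g:.1f}, lambda_B = {100 * lam_g[1]:+.1f} %")
```

The scan should settle on a factor close to 0.8: the total error grows only from about 12.8 to 13.0 pb, while the central value moves by almost 4 pb, close to the “ByGlobFac” row.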

It is not too surprising, conversely, that the effects on the combined BKGD systematics are much larger than those on the combined LUMI systematics. This could have been anticipated from the normalised information derivatives, which are much larger for the former than for the latter.
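These per-source derivatives have a simple closed form. With \(x={\fancyscript{M}}^{-1}u\), \(u=(1,\dots,1)^T\) and \(I=u^Tx\), scaling the off-diagonal part \(D^{[s]}\) of one partial covariance by a common factor \(f\) gives \(\mathrm{d}I/\mathrm{d}f=-x^{T}D^{[s]}x\), so the normalised derivative at the nominal point is \(-x^{T}D^{[s]}x/I\). The sketch below evaluates it for BKGD and LUMI. This is our own reconstruction of the quantity in the derivative table's last row and may differ in detail from Eq. 40.

```python
def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(m, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n + 1):
                a[r][c] -= f * a[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (a[k][n] - sum(a[k][c] * x[c] for c in range(k + 1, n))) / a[k][k]
    return x


def fully_correlated(sigmas):
    """Partial covariance of one error source with 100 % correlations."""
    return [[si * sj for sj in sigmas] for si in sigmas]


def normalised_source_derivative(cov, partial):
    """(1/I) dI/df at f = 1, when the off-diagonal part of `partial` is scaled by f."""
    n = len(cov)
    x = solve(cov, [1.0] * n)  # x = M^-1 u
    info = sum(x)              # I = u^T M^-1 u
    d_info = -sum(x[i] * partial[i][j] * x[j]
                  for i in range(n) for j in range(n) if i != j)
    return d_info / info


unc = [10.0, 14.0, 18.0, 25.0]
bkgd = fully_correlated([10.0, 40.0, 3.0, 0.0])
lumi = fully_correlated([11.0, 14.0, 10.0, 0.0])
total = [[(unc[i] ** 2 if i == j else 0.0) + bkgd[i][j] + lumi[i][j]
          for j in range(4)] for i in range(4)]  # the "nominal" total covariance
d_bkgd = normalised_source_derivative(total, bkgd)
d_lumi = normalised_source_derivative(total, lumi)
print(f"BKGD: {d_bkgd:+.3f}   LUMI: {d_lumi:+.3f}")
```

In our check this gives roughly \(+0.34\) for BKGD against about \(-0.02\) for LUMI: only for BKGD does lowering the correlations noticeably reduce the information.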

It is also not surprising that the “ByOffDiagElem” minimization gives essentially the same results (except for the \(\chi ^2\) value) that are found when excluding the measurements with negative BLUE coefficients. By construction, in fact, this is the only one of the three minimizations which almost always guarantees that BLUE coefficients which were initially negative end up equal to zero after the minimization: if the minimum is a local minimum, some of the derivatives in Eq. 35, which are directly proportional to the BLUE coefficients, must eventually be zero.
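The proportionality to the BLUE coefficients can be made explicit. In our own notation (which may not match Eq. 35 symbol for symbol), write \(M\) for the total covariance, \(u=(1,\dots,1)^T\) and \(x=M^{-1}u\). Differentiating the information with respect to one symmetric off-diagonal pair gives:

```latex
% Notation (ours): u = (1,\dots,1)^T, M = total covariance matrix, x = M^{-1}u
I = u^{T} M^{-1} u = \sum_k x_k, \qquad \lambda_k = \frac{x_k}{I}
% Vary one symmetric pair, M_{ij} = M_{ji} \to M_{ij} + \delta \; (i \neq j),
% using \delta(M^{-1}) = -M^{-1}\,(\delta M)\,M^{-1}:
\delta I = -\,x^{T}\,(\delta M)\,x = -2\,x_i\,x_j\,\delta
\quad\Longrightarrow\quad
\frac{\partial I}{\partial M_{ij}} = -2\,I^{2}\,\lambda_i\,\lambda_j
```

An interior stationary point in \(M_{ij}\) therefore requires \(\lambda_i\lambda_j=0\). With \(\lambda_\mathrm{A}>0\) and \(\lambda_\mathrm{B}<0\) the derivative with respect to \(M_{\mathrm{AB}}\) is positive, so lowering that covariance lowers the information until \(\lambda_\mathrm{B}\) reaches zero.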

Onionization of the covariance matrices for the BKGD and LUMI error sources in the cross-section example. The values are given in pb\(^2\)

Error source | “Nominal” covariance | | Onionized covariance |
---|---|---|---|
BKGD | \(\left( \begin{array}{rrrr} 100. & 400. & 30. & 0. \\ 400. & 1600. & 120. & 0. \\ 30. & 120. & 9. & 0. \\ 0. & 0. & 0. & 0. \\ \end{array}\right) \) | \(\rightarrow \) | \(\left( \begin{array}{rrrr} 100. & 100. & 9. & 0. \\ 100. & 1600. & 9. & 0. \\ 9. & 9. & 9. & 0. \\ 0. & 0. & 0. & 0. \\ \end{array}\right) \) |
LUMI | \(\left( \begin{array}{rrrr} 121. & 154. & 110. & 0. \\ 154. & 196. & 140. & 0. \\ 110. & 140. & 100. & 0. \\ 0. & 0. & 0. & 0. \\ \end{array}\right) \) | \(\rightarrow \) | \(\left( \begin{array}{rrrr} 121. & 121. & 100. & 0. \\ 121. & 196. & 100. & 0. \\ 100. & 100. & 100. & 0. \\ 0. & 0. & 0. & 0. \\ \end{array}\right) \) |

Modified input covariances for the four measurements in the cross-section example, when reducing correlations according to the procedures described in this paper. The values are given in pb\(^2\)

Correlations | Total covariance |
---|---|
“Nominal” corr. | \(\left( \begin{array}{rrrr} 321. & 554. & 140. & 0. \\ 554. & 1992. & 260. & 0. \\ 140. & 260. & 433. & 0. \\ 0. & 0. & 0. & 625. \\ \end{array}\right) \) |
ByGlobFac | \(\left( \begin{array}{rrrr} 321. & 442. & 112. & 0. \\ 442. & 1992. & 208. & 0. \\ 112. & 208. & 433. & 0. \\ 0. & 0. & 0. & 625. \\ \end{array}\right) \) |
ByErrSrc | \(\left( \begin{array}{rrrr} 321. & 341. & 124. & 0. \\ 341. & 1992. & 196. & 0. \\ 124. & 196. & 433. & 0. \\ 0. & 0. & 0. & 625. \\ \end{array}\right) \) |
ByOffDiagElem | \(\left( \begin{array}{rrrr} 321. & 272. & 140. & 0. \\ 272. & 1992. & 219. & 0. \\ 140. & 219. & 433. & 0. \\ 0. & 0. & 0. & 625. \\ \end{array}\right) \) |
Onionization | \(\left( \begin{array}{rrrr} 321. & 221. & 109. & 0. \\ 221. & 1992. & 109. & 0. \\ 109. & 109. & 433. & 0. \\ 0. & 0. & 0. & 625. \\ \end{array}\right) \) |

In particular, note in Table 5 that the onionization procedure (and the same is true for the minimizations) affects the correlations for the BKGD and LUMI error sources in exactly the same way, without distinction. If this were a real combination, one would instead most likely keep the LUMI correlation unchanged (because a common luminosity measurement would indeed result in a 100 % correlation between \(y_\mathrm {A}\), \(y_\mathrm {B}\) and \(y_\mathrm {C}\), and these three measurements together could even help to constrain its error), concentrating on the re-assessment of the BKGD correlation alone (because the initial “nominal” estimate of 100 % correlation is neither conservative nor realistic in the presence of different sensitivities to differential distributions).
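To see what this alternative would do in the toy example, one can onionize only the BKGD partial covariance and keep the LUMI correlations at 100 %. The sketch below does this under an assumed reading of the onionization rule (each off-diagonal covariance capped at the smaller of the two variances, which matches the onionization table); the resulting numbers are not quoted in the paper and come only from this illustrative calculation.

```python
def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(m, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n + 1):
                a[r][c] -= f * a[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (a[k][n] - sum(a[k][c] * x[c] for c in range(k + 1, n))) / a[k][k]
    return x


def blue(y, cov):
    """BLUE estimate, its error and the BLUE coefficients lambda_i."""
    x = solve(cov, [1.0] * len(y))
    info = sum(x)
    lam = [xi / info for xi in x]
    return sum(l * yi for l, yi in zip(lam, y)), info ** -0.5, lam


def fully_correlated(sigmas):
    """Partial covariance of one error source with 100 % correlations."""
    return [[si * sj for sj in sigmas] for si in sigmas]


def onionize(m):
    """Assumed rule: cap every off-diagonal covariance at the smaller variance."""
    n = len(m)
    return [[m[i][j] if i == j else min(m[i][j], m[i][i], m[j][j]) for j in range(n)]
            for i in range(n)]


y = [95.0, 144.0, 115.0, 122.0]
unc = [10.0, 14.0, 18.0, 25.0]
bkgd_on = onionize(fully_correlated([10.0, 40.0, 3.0, 0.0]))  # BKGD re-assessed
lumi = fully_correlated([11.0, 14.0, 10.0, 0.0])               # LUMI kept at 100 %
total = [[(unc[i] ** 2 if i == j else 0.0) + bkgd_on[i][j] + lumi[i][j]
          for j in range(4)] for i in range(4)]
est, err, lam = blue(y, total)
print(f"{est:.1f} +- {err:.1f}, lambda/% = {[round(100 * l, 1) for l in lam]}")
```

In this toy case the BLUE coefficient of \(y_\mathrm{B}\) comes out small but positive, i.e. reducing the BKGD correlation alone is already enough to leave the high-correlation regime.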

It should finally be added that the total covariance matrix derived from the onionization prescription is used as the starting point of the “ByOffDiagElem” minimization in the BlueFin software, as we have found this to improve the efficiency of the minimization procedure. As an additional cross-check of the onionization prescription, we also tested a fourth type of minimization, where information is independently minimized for each source of uncertainty as if this was the only one, varying each time only the correlations in that error source (after removing those measurements not affected by it and slightly reducing the allowed correlation ranges to keep the partial covariance positive definite). The preliminary results of this test (not included in Table 4) indicate that these minimizations do not seem to significantly move partial covariances or the final result away from those obtained through the onionization prescription, which are used as a starting point also in this case.

We conclude this section by recalling that the prescriptions presented here are only empirical recipes that assume no prior knowledge of the physics involved and, for this reason, can never be a valid substitute for a careful quantitative analysis of correlations using real or simulated data. A precise estimate of correlations is important in general, but absolutely necessary in high-correlation regimes, where it may be as important as a precise assessment of the measurement errors themselves.

## 6 Conclusions

Combining many correlated measurements is a fundamental and unavoidable step in the scientific process to improve our knowledge about a physical quantity. In this paper, we recalled the relevance of the concept of Fisher information to quantify and better understand this knowledge. We stressed that it is extremely important to understand how the information available from several measurements is effectively used in their combination, not only because this allows a fairer recognition of their relative merit in their contribution to the knowledge about the unknown parameter, but especially because this makes it possible to produce a more robust scientific result by critically reassessing the assumptions made in the combination.

In this context, we described how the correlations between the different measurements play a critical role in their combination. We demonstrated, in particular, that the presence of negative coefficients in the BLUE weighted average of any number of measurements is a sign of a “high-correlation regime”, where the effect of increasing correlations is that of reducing the error on the combined result. We showed that, in this regime, a large contribution to the combined knowledge about the parameter comes from the joint impact of several measurements through their correlation and we argued, as a consequence, that the merit for this particular contribution to information cannot be claimed by any single measurement individually. In particular, we presented our objections to the standard practice of presenting the “relative importances” of different measurements based on the absolute values of their BLUE coefficients, and we proposed the use of (“intrinsic” and “marginal”) “information weights” instead.

In the second part of the paper, we questioned under which circumstances assuming systematic errors as fully correlated can be considered a “conservative” procedure. We proposed the use of information derivatives with respect to inter-measurement correlations as a tool to identify those “nominal” correlations for which this assumption is wrong and a more careful evaluation is necessary. We also suggested a few procedures for trying to make a combination more “conservative” when a precise estimate of correlations is simply impossible.

We should finally note that BLUE combinations are not the only way to combine different measurements, but they are actually the simplest to understand when combinations are performed under the most favorable assumptions that measurements are multivariate Gaussian distributed with covariances known a priori, as in this case all relevant quantities become easily calculable by matrix algebra. We therefore stress that, while the results in this paper were obtained under these assumptions and using the BLUE technique, large positive correlations are guaranteed to have a big impact, and should be watched out for, also in combinations performed with other methods or under other assumptions.

## Footnotes

- 1.
Best Linear Unbiased Estimate Fisher Information aNalysis—https://svnweb.cern.ch/trac/bluefin.

## Notes

### Acknowledgments

This work has been inspired by many discussions, during private and public meetings, on the need for critically reviewing the assumptions about correlations and the meaning of “weights” when combining several measurements in the presence of high correlations between them. It would be difficult to mention and acknowledge all those colleagues who have pointed us in the right direction and with whom we have had very fruitful discussions. We are particularly grateful to the members of the TopLHCWG and to the ATLAS and CMS members who have helped in the reviews of the recent top mass combinations at the LHC. We would also like to thank our colleagues who have sent us comments about the first two public versions of this paper. In particular, it is a pleasure to thank Louis Lyons for his extensive feedback and his very useful suggestions. We are also grateful to the EPJC referees for their detailed and insightful comments, as well as for making us aware of the research presented in Ref. [11]. Finally, A.V. would like to thank the management of the CERN IT-ES and IT-SDC groups for allowing him the flexibility to work on this research alongside his other commitments in computing support for the LHC experiments.

## References

1. K. Nakamura et al. (Particle Data Group), Review of particle physics. J. Phys. G **37**, 075021 (2010)
2. A.C. Aitken, On least squares and linear combinations of observations. Proc. R. Soc. Edinb. **55**, 42 (1935)
3. L. Lyons, D. Gibaut, P. Clifford, How to combine correlated estimates of a single physical quantity. Nucl. Instr. Meth. A **270**, 110 (1988)
4. A. Valassi, Combining correlated measurements of several different physical quantities. Nucl. Instr. Meth. A **500**, 391 (2003)
5. The ALEPH, DELPHI, L3 and OPAL Collaborations and the LEP Electroweak Working Group, Electroweak measurements in electron-positron collisions at W-boson-pair energies at LEP. Phys. Rep. **532**, 119 (2013)
6. The Tevatron Electroweak Working Group for the CDF and D0 Collaborations, Combination of CDF and D0 results on the mass of the top quark using up to 5.8 fb\(^{-1}\) of data. arXiv:1107.5255v3 (2011)
7. The ATLAS and CMS Collaborations, Combination of ATLAS and CMS results on the mass of the top quark using up to 4.9 fb\(^{-1}\) of data. ATLAS-CONF-2012-095, CMS PAS TOP-12-001 (2012)
8. R.A. Fisher, Theory of statistical estimation. Proc. Camb. Phil. Soc. **12**, 700 (1925)
9. A. van den Bos, *Parameter Estimation for Scientists and Engineers* (Wiley-Interscience, 2007)
10. F. James, *Statistical Methods in Experimental Physics*, 2nd edn. (World Scientific, Singapore, 2006)
11. M.G. Cox et al., The generalized weighted mean of correlated quantities. Metrologia **43**, S268 (2006)
12. F. James, M. Roos, MINUIT: a system for function minimization and analysis of the parameter errors and correlations. Comput. Phys. Commun. **10**, 343 (1975)
13. L. Moneta et al., Recent improvements of the ROOT fitting and minimization classes. In *Proceedings of ACAT08*, Erice (2008)
14. P. Achard et al. (L3 Collaboration), Determination of \(\alpha _s\) from hadronic event shapes in \(e^+e^-\) annihilation at 192 \(\le \sqrt{s}\le \) 208 GeV. Phys. Lett. B **536**, 217 (2002)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Funded by SCOAP\(^{3}\) / License Version CC BY 4.0.