1 Introduction

The combination of a number of correlated estimates of a single observable is discussed in Ref. [1]. Here, the term estimate denotes a particular outcome (measurement) based on an estimator of the observable, which follows a probability density function (pdf). The particular estimate obtained may be a likely or unlikely outcome given that distribution. If the measurement is repeated numerous times under identical conditions, the estimates will follow the underlying multi-dimensional pdf of the estimators (Footnote 1). The analysis [1] makes use of a \(\chi ^2\) minimisation to obtain the combined value expressed in the mathematically equivalent BLUE language.

Provided the estimators are unbiased, when applying this formalism the Best Linear Unbiased Estimate of the observable is obtained with the following meaning: Best: the combined result for the observable obtained this way has the smallest variance; Linear: the result is a linear combination of the individual estimates; Unbiased Estimate: when the procedure is repeated for a large number of cases consistent with the underlying multi-dimensional pdf, the mean of all combined results equals the true value of the observable. For a real situation, for which the estimates are obtained by experiments that cannot be repeated numerous times, when performing the combination one has to rely on this fact, although the combined value obtained from the particular estimates may be far away from the true value. This fact however should not be mistaken for a bias inherent to the method.

The equations to solve the problem for the general case of \(m\) estimates and \(n\) observables with \(m\ge n\) are given in Ref. [2]. They have been implemented in a software package [3] that is embedded into the ROOT analysis framework [4], but are not repeated here. However, the special case of two correlated estimates of the same observable is discussed in some detail, because the main features of the combination can already be easily understood from this case.

This paper is organised as follows: the case of two estimators and the consequences of the conditional probability are explained in Sect. 2. The equations for the combination of a pair of estimates are given in Sect. 3. This is followed by a discussion of the properties of the estimates to be combined in Sect. 4. The impact of assigning relative uncertainties is reviewed in Sect. 5. The concept of reduced correlations is outlined in Sect. 6, and other methods, constructed to maximise the variance of the combined result, are discussed in Sect. 7. Based on an example, the consequences of using these methods are discussed in Sect. 8. A detailed proposal on how to decide on a combination and how to perform investigations of its stability is given in Sect. 9. Finally, conclusions are drawn in Sect. 10.

2 Correlated estimators and conditional probabilities

Let \(X_\mathrm {1}\) and \(X_\mathrm {2}\) with variances \(\sigma _{1}^{2}\) and \(\sigma _{2}^{2}\) be two unbiased, but correlated Gaussian estimators of a true value \(x_{T}\). They obey the two-dimensional pdf \(\mathcal{{P}}(X_\mathrm {1}, X_\mathrm {2})\), with identical mean values \(\langle X_\mathrm {1} \rangle =\langle X_\mathrm {2} \rangle =x_{T} \) for the two estimators if calculated based on the entire pdf. With a correlation of the two estimators of \(\rho \) the pdf reads:

$$\begin{aligned} \mathcal{{P}}(X_\mathrm {1}, X_\mathrm {2})&= \frac{1}{\sqrt{2\pi }\sigma _{1}}\frac{1}{\sqrt{2\pi }\sigma _{2}} \frac{1}{\sqrt{1-\rho ^2}} \nonumber \\&\cdot \exp {\left\{ - \frac{1}{2(1-\rho ^2)} \left( \frac{(X_\mathrm {1}-x_{T})^2}{\sigma _{1}^{2}}\right. \right. } \nonumber \\&\left. \left. +\frac{(X_\mathrm {2} \!-\!x_{T})^2}{\sigma _{2}^{2}} \!-\! \frac{2\rho (X_\mathrm {1}-x_{T})(X_\mathrm {2}-x_{T})}{\sigma _{1} \sigma _{2}}\right) \right\} \nonumber \\ \end{aligned}$$
(1)

The outcome of a pair of data analyses using these estimators will be two estimates denoted by \(x_\mathrm {1}\) and \(x_\mathrm {2}\) that will occur according to this pdf (Footnote 2). The estimates will have variances of \(\sigma _{1}^{2}\) and \(\sigma _{2}^{2}\) assigned, and their correlation is \(\rho \). Without loss of generality it is assumed that \(X_\mathrm {1}\) is at least as precise an estimator of \(x_{T}\) as \(X_\mathrm {2}\) is, such that \(z\equiv \sigma _{2}/\sigma _{1} \ge 1\).

In combinations of estimates of physics observables the typical situation is that one estimate, here \(x_\mathrm {1}\), is available, and the question arises of what improvement is achieved if the information from another estimate, here \(x_\mathrm {2}\), is also used, rather than determining \(x_{T}\) and its uncertainty solely from \(x_\mathrm {1}\). Therefore, it is important to understand what the likely outcome of \(x_\mathrm {2}\) is, given the existence of \(x_\mathrm {1}\). This is most directly seen by analysing the conditional pdf for \(X_\mathrm {2}\) given \(X_\mathrm {1} =x_\mathrm {1} \), which reads:

$$\begin{aligned} \mathcal{{P}}(x_\mathrm {1}, X_\mathrm {2})&= \frac{1}{\sqrt{2\pi }\sigma _{1}}\exp {\left\{ -\frac{1}{2} \left( \frac{x_\mathrm {1}-x_{T}}{\sigma _{1}}\right) ^2\right\} } \nonumber \\&\cdot \frac{1}{\sqrt{2\pi }\sigma _{2} \sqrt{1-\rho ^2}}\nonumber \\&\cdot \exp {\left\{ -\frac{1}{2}\left( \frac{X_\mathrm {2}-\left[ x_{T} +\rho z(x_\mathrm {1}-x_{T})\right] }{\sigma _{2} \sqrt{1-\rho ^2}}\right) ^2\right\} }.\nonumber \\ \end{aligned}$$
(2)

A few facts are worth noticing, see also Refs. [5, 6], and a related discussion in Ref. [7]. Firstly, this conditional pdf for \(X_\mathrm {2}\) at a given fixed value of \(x_\mathrm {1}\) is no longer centred at \(\langle X_\mathrm {2} \rangle =x_{T} \) but at \(\langle X_\mathrm {2} \rangle =x_{T} +\rho z(x_\mathrm {1}-x_{T})\). Although \(X_\mathrm {2}\) in itself is an unbiased estimator, given the existence of the estimate \(x_\mathrm {1}\) and the correlation of the estimators, it is no longer distributed around the true value, except for the situation in which the value of the more precise estimate coincides with the true value, i.e. \(x_\mathrm {1} =x_{T} \). This is a mere consequence of the correlation. As intuitively expected, in the case of positively correlated estimates, if one estimate is larger (smaller) than \(x_{T}\), the other will more likely also be larger (smaller). For negatively correlated estimates the situation is reversed.

For \(\rho >0\), and depending on whether \(\rho z\) is larger (smaller) than unity, the mean \(\langle X_\mathrm {2} \rangle \) is even further away from (closer to) the true value \(x_{T}\) than \(x_\mathrm {1}\) is. Given that the distribution in \(X_\mathrm {2}\) is still symmetric around its mean, for \(\rho >1/z\) in more than half of the cases in which \(x_{T} < x_\mathrm {1} \) also \(x_{T} < x_\mathrm {1} < X_\mathrm {2} \) is fulfilled. Secondly, the variance of \(X_\mathrm {2}\) no longer amounts to the initial value of \(\sigma _{2}^{2}\) but it is reduced to \((1-\rho ^2) \sigma _{2}^{2} \) which vanishes for \(\rho = \pm 1\), again a consequence of the correlation. Finally, for \(\rho =0\) the original values of the mean and width of the pdf for \(X_\mathrm {2}\) are recovered.

The consequences of the conditional probability are discussed by simulating the two-dimensional pdf \(\mathcal{{P}}(X_\mathrm {1}, X_\mathrm {2})\) with five million pairs of estimates, for the example of individually unbiased estimators obeying \(\langle X_\mathrm {1} \rangle = \langle X_\mathrm {2} \rangle = x_{T} = 0\). For uncertainties of \(\sigma _{1} =0.85\) and \(\sigma _{2} =1.15\), i.e. for \(z=1.35\), the results are shown in Fig. 1 for three different values of the correlation, \(\rho =0, 0.9, -0.9\). For the uncorrelated case, Fig. 1a, the half axes of the ellipses coincide with the coordinate axes. For any value of \(x_\mathrm {1}\), e.g. along the vertical red line shown, the conditional pdf is centred around \(X_\mathrm {2} =x_{T} \). A hypothetical outcome, namely the pair of estimates \(x_\mathrm {1}\) and \(x_\mathrm {2}\), is indicated by the red dot. Depending on the value of \(\rho \) this is a more or less likely outcome, as can be seen from the different colours of the pdf at the location of the point in Fig. 1. Numerically, for the three scenarios \(\rho =0, 0.9, -0.9\), the value of the pdf at the chosen point with respect to the maximum of the pdf, i.e. \(\mathcal{{P}}(x_\mathrm {1}, x_\mathrm {2}) / \mathcal{{P}}(0, 0) \), amounts to \(0.67, 0.48, 0.03\). Since for the chosen value of \(x_{T}\) this point lies in the upper right (i.e. first) quadrant, both estimates are larger than \(x_{T}\). Since the point is above the diagonal line, \(x_\mathrm {2}\) has been chosen to be larger than \(x_\mathrm {1}\), such that the order is \(x_{T} <x_\mathrm {1} <x_\mathrm {2} \). This means the true value is outside the interval given by the two estimates. Analysing the entire two-dimensional pdf one finds that, even for the uncorrelated case, for which the pdf is equally shared by the four quadrants in the \(X_\mathrm {1}\)-\(X_\mathrm {2}\) plane, in half of all possible outcomes (namely in quadrants one and three) the true value does not fall within the interval spanned by the estimates, despite the fact that both estimators are unbiased and not correlated (Footnote 3).

Fig. 1

The two-dimensional pdf \(\mathcal{{P}}(X_\mathrm {1}, X_\mathrm {2})\) for three values of the correlation \(\rho \) obtained using five million pairs of estimates. The black line corresponds to \(X_\mathrm {1} =X_\mathrm {2} \), the red line to \(X_\mathrm {1} =x_\mathrm {1} \), and finally the dot to a particular pair of estimates chosen to be \(x_\mathrm {1} =0.30\) and \(x_\mathrm {2} =0.95\). The variable \(f_\mathrm {out}\) denotes the fraction of events for which \(x_{T}\) does not lie within the interval spanned by the pair of estimates. Shown are a \(\rho =0\), b \(\rho =0.9\), and c \(\rho =-0.9\). In b, c the half axes, shown in blue, are changed and rotated (counter-) clockwise from the positive \(X_\mathrm {2}\) axis
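The simulation underlying Fig. 1 can be reproduced with a few lines of code. The following minimal sketch (assuming numpy is available; the sample size, the random seed and the window of width 0.01 around \(x_\mathrm {1}\) are arbitrary choices) draws pairs from Eq. 1, evaluates the fraction \(f_\mathrm {out}\) of pairs for which \(x_{T}\) lies outside the spanned interval, and compares the conditional mean and width of \(X_\mathrm {2}\) with Eq. 2.

```python
import numpy as np

def simulate(rho, sigma1=0.85, sigma2=1.15, x_t=0.0, n=5_000_000, seed=1):
    """Draw n pairs (X1, X2) from the bivariate Gaussian pdf of Eq. 1."""
    cov = [[sigma1**2, rho * sigma1 * sigma2],
           [rho * sigma1 * sigma2, sigma2**2]]
    rng = np.random.default_rng(seed)
    x1, x2 = rng.multivariate_normal([x_t, x_t], cov, size=n).T

    # Fraction of pairs for which x_T lies outside [min(x1, x2), max(x1, x2)]
    f_out = np.mean((np.minimum(x1, x2) > x_t) | (np.maximum(x1, x2) < x_t))

    # Conditional mean and width of X2 for X1 close to a fixed x1, to be compared
    # with x_T + rho*z*(x1 - x_T) and sigma2*sqrt(1 - rho^2) from Eq. 2
    z, x1_fix = sigma2 / sigma1, 0.30
    sel = np.abs(x1 - x1_fix) < 0.01
    print(f"rho={rho:+.1f}  f_out={f_out:.2f}  "
          f"<X2|x1>={x2[sel].mean():+.3f} (expected {rho * z * (x1_fix - x_t):+.3f})  "
          f"width={x2[sel].std():.3f} (expected {sigma2 * np.sqrt(1 - rho**2):.3f})")

for rho in (0.0, 0.9, -0.9):
    simulate(rho)
```

The printed fractions \(f_\mathrm {out}\) reproduce the values of about 0.50, 0.86 and 0.14 quoted above and shown in Fig. 1 for \(\rho =0, 0.9, -0.9\).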

The situation of largely positively correlated uncertainties with \(\rho =0.9\), a situation frequently referred to as Peelle’s Pertinent Puzzle [8, 9], is shown in Fig. 1b. This time, due to the positive correlation, the ellipses are deformed and rotated clockwise from the positive \(X_\mathrm {2}\) axis, with increasing rotation angle \(\theta \) for increasing \(\rho \), according to the following formula [6]:

$$\begin{aligned} \tan {2\theta }&= \frac{2\rho z}{1-z^2}. \end{aligned}$$

The shifted mean of the conditional pdf of \(X_\mathrm {2}\) given \(x_\mathrm {1}\) is apparent from the intersection of the ellipses with the vertical red line. In this case, since the ellipses are mostly contained in the first and third quadrants, only in about \(14\,\%\) of all cases does the true value fall within the interval spanned by the two estimates. Only for negatively correlated estimates, Fig. 1c, for which the pdf mostly populates the second and fourth quadrants, the likely situation is that \(x_{T}\) lies within the interval spanned by the estimates, which in this case occurs for about \(86\,\%\) of all cases.

In practice, the typical situation occurring for the combination of two estimates of the same observable is that the estimates are positively correlated. This is especially likely when the total uncertainties are dominated by systematic uncertainties and both estimates suffer from imperfect knowledge of the same sources of uncertainty. In this case the most likely place for the true value to lie is outside the interval spanned by the two estimates, a fact that should be kept in mind. The information on \(x_{T}\) that can be gained by adding the information from \(x_\mathrm {2}\) to the one from \(x_\mathrm {1}\) is discussed next.

3 The special case of two correlated estimates

Again, \(x_\mathrm {1}\) and \(x_\mathrm {2}\) with variances \(\sigma _{1}^{2}\) and \(\sigma _{2}^{2}\) obeying \(z=\sigma _{2}/\sigma _{1} \ge 1\) are two Gaussian estimates from two unbiased estimators of the true value \(x_{T}\) of the observable, and \(\rho \) denotes their total correlation with \(-1\le \rho \le 1\). In this situation the BLUE of \(x_{T}\) is:

$$\begin{aligned} x&= (1-\beta )\,x_\mathrm {1} + \beta \,x_\mathrm {2} \,, \end{aligned}$$
(3)

where \(\beta \) is the weight of the less precise estimate, and, by construction, the sum of weights is unity. The variable \(x\) is the combined result and \(\sigma _{x}^{2}\) denotes its variance, i.e. the uncertainty assigned to the combined value is \(\sigma _{x}\). In the following the derivation of the formulas for \(\beta \) and \(\sigma _{x}\)/\(\sigma _{1}\) within the BLUE formalism is repeated, see Ref. [1]. The covariance matrix for the general solution of the linear combinations in the BLUE formalism is given by Eq. 5 of Ref. [2]. For the studied case of two estimates of one observable it reduces to:

$$\begin{aligned} \sigma _{x}^{2}&= \left( \begin{array}{c} 1-\beta \\ \beta \\ \end{array} \right) ^\mathrm{T} \cdot \left( \begin{array}{cc} \sigma _{1}^{2} &{} \rho \,\sigma _{1} \,\sigma _{2} \\ \rho \,\sigma _{1} \,\sigma _{2} &{} \sigma _{2}^{2} \\ \end{array} \right) \cdot \left( \begin{array}{c} 1-\beta \\ \beta \\ \end{array} \right) \,, \end{aligned}$$
(4)

dividing by \(\sigma _{1}^{2}\) and inserting \(z\) yields:

$$\begin{aligned} \frac{\sigma _{x}^{2}}{\sigma _{1}^{2}}&= \left( \begin{array}{c} 1-\beta \\ \beta \\ \end{array} \right) ^\mathrm{T} \cdot \left( \begin{array}{cc} 1 &{} \rho z \\ \rho z &{} z^2 \\ \end{array} \right) \cdot \left( \begin{array}{c} 1-\beta \\ \beta \\ \end{array} \right) \,, \end{aligned}$$
(5)

multiplication results in:

$$\begin{aligned} \frac{\sigma _{x}^{2}}{\sigma _{1}^{2}}&= (1-\beta )^2 + 2\rho z \beta (1-\beta ) + \beta ^2z^2\nonumber \\&= 1 - 2\beta (1-\rho z) + \beta ^2 (1- 2\rho z +z^2), \end{aligned}$$
(6)

setting the derivative with respect to \(\beta \) equal to zero (i.e. performing the \(\chi ^2\) minimisation) gives:

$$\begin{aligned} \frac{\partial }{\partial \,\beta }\left( \frac{\sigma _{x}^{2}}{\sigma _{1}^{2}}\right)&= -2(1-\rho z) + 2\beta (1- 2\rho z + z^2) = 0. \end{aligned}$$
(7)

Finally, after solving for \(\beta \) one obtains:

$$\begin{aligned} \beta&= \frac{1 - \rho z}{1 - 2\rho z + z^{2}} = \frac{1 - \rho z}{(1 - \rho z)^2 + z^{2}(1-\rho ^2)} \, \end{aligned}$$
(8)

which is valid for \(-1 \le \rho \le 1\) and \(z \ge 1\), except for \(\rho = z = 1\).

The last term in Eq. 8 shows that the denominator of \(\beta \) is always positive, such that the sign of \(\beta \) is determined by the sign of the numerator. The resulting \(\beta \) as a function of \(\rho \), and for various \(z\) values, is shown in Fig. 2a. Combining Eqs. 3 and 8 yields:

$$\begin{aligned} \frac{1}{2} \ge \beta&= \frac{x - x_\mathrm {1}}{x_\mathrm {2}- x_\mathrm {1}} = \frac{1 - \rho z}{1 - 2\rho z + z^{2}} \ge \frac{1}{1 - z}\,, \end{aligned}$$
(9)

where the left limit is obtained for \(z = 1\) and \(\rho \ne 1\), and the right limit for \(\rho = 1\).

Fig. 2

The results for Eqs. 8, 11, 12 and 13 as functions of \(\rho \) for a number of \(z\) values. Shown are a \(\beta \) and b \(\sigma _{x}\)/\(\sigma _{1}\), and their derivatives with respect to \(\rho \), c \(\partial \beta /\partial \rho \) and d \(1/\sigma _{1}\ \partial \sigma _{x}/\partial \rho \)

A few features are important to understand the results of the combination. As expected, the value of \(\beta \) has to be smaller than or equal to 0.5, because otherwise \(x_\mathrm {2}\) would be the more precise estimate. Since the denominator in Eq. 8 is positive for all allowed values of \(\rho \) and \(z\), the function for \(\beta \) turns negative for \(\rho >1/z\), as shown in Fig. 2a. This is exactly the point beyond which, for a given \(x_\mathrm {1}\), the conditional probability exceeds 50\(\,\%\) that \(X_\mathrm {2}\) lies even further away from \(x_{T}\) than \(x_\mathrm {1}\) does, see Sect. 2.

The first equal sign in Eq. 9 means that the value of \(\beta \) can be interpreted as the difference of the combined value from the more precise estimate in units of the difference of the two estimates. If \(\beta \) is positive, the signs of the numerator and denominator are identical and \(x\) lies within the interval spanned by \(x_\mathrm {1}\) and \(x_\mathrm {2}\). Given \(\beta \le 0.5\), it never lies further away from the more precise estimate than half the difference of the two. Again, this is expected since the more precise estimate should dominate the combination. In contrast, if \(\beta \) is negative, the signs of the numerator and denominator are different. This means the value of \(x\) lies on the opposite side of \(x_\mathrm {1}\) than \(x_\mathrm {2}\) does, or in other words, the combined value lies outside the interval spanned by the two estimates. Given the discussion about the conditional pdf in Sect. 2, this is a very desirable feature.

Inserting the result for \(\beta \) into Eq. 6 yields:

$$\begin{aligned} \frac{\sigma _{x}^{2}}{\sigma _{1}^{2}}&= 1 - 2\frac{(1-\rho z)^2}{1- 2\rho z + z^2} + \frac{(1-\rho z)^2}{1- 2\rho z + z^2}\nonumber \\&= \frac{(1- 2\rho z + z^2)-(1-\rho z)^2}{1-2\rho z+z^2}\,, \end{aligned}$$
(10)

which after evaluating the numerator and taking the square root gives:

$$\begin{aligned} \frac{\sigma _{x}}{\sigma _{1}}&= \sqrt{\frac{z^{2}(1 - \rho ^{2})}{1 - 2\rho z + z^{2}}}\,. \end{aligned}$$
(11)

The resulting \(\sigma _{x}\)/\(\sigma _{1}\), as a function of \(\rho \) and for various \(z\) values, is shown in Fig. 2b. This variable quantifies the uncertainty of the combined value in units of the uncertainty of the more precise estimate, i.e. \(1-\sigma _{x}/\sigma _{1} \) is the relative improvement achieved by also using \(x_\mathrm {2}\), that is by including the information contained in the less precise estimator. Consequently, \(\sigma _{x}\)/\(\sigma _{1}\) can be used to decide whether it is worth combining.

Since in the numerator of Eq. 10 the first term is identical to the denominator (which is always positive, see Eq. 8), and the second term is positive for all values of \(\rho \) and \(z\), the value of \(\sigma _{x}\)/\(\sigma _{1}\) is always smaller than or equal to unity, as shown in Fig. 2b. Again this is expected, since including the information from the estimate \(x_\mathrm {2}\) should improve the knowledge on \(x\), which means its precision \(\sigma _{x}\). Not surprisingly, the value of \(\sigma _{x}\)/\(\sigma _{1}\) is exactly one for \(\rho = 1/z\), i.e. for \(\beta =0\). In this situation, the value of \(x_\mathrm {2}\) is irrelevant in the linear combination of Eq. 3, and consequently \(x=x_\mathrm {1} \) and \(\sigma _{x} =\sigma _{1} \). Finally, \(\sigma _{x}\)/\(\sigma _{1}\) is exactly zero if \(\rho =\pm 1\), in accordance with the variance of \(X_\mathrm {2} \) for the conditional pdf given \(x_\mathrm {1}\) and \(\rho \), shown in Sect. 2. This means that for the fully correlated or fully anti-correlated case of two estimators, given \(x_\mathrm {1}\), the result is known with certainty, and the outcome of the second estimate has to be \(x_\mathrm {2} =x_{T} +\rho z(x_\mathrm {1}-x_{T})\). For combinations of experimental results, for which for all pairs of estimates there are also uncorrelated components of the uncertainty, this situation never occurs.
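For quick numerical checks, Eqs. 3, 8 and 11 can be coded directly. The following minimal sketch (the function and variable names as well as the example values are illustrative only) also exhibits the two special cases discussed above, namely \(\beta =0\) at \(\rho =1/z\) and a negative weight for \(\rho >1/z\).

```python
import math

def blue_two(x1, x2, sigma1, sigma2, rho):
    """BLUE combination of two correlated estimates (Eqs. 3, 8 and 11)."""
    z = sigma2 / sigma1                                   # convention: sigma2 >= sigma1
    beta = (1 - rho * z) / (1 - 2 * rho * z + z**2)       # weight of x2, Eq. 8
    x = (1 - beta) * x1 + beta * x2                       # combined value, Eq. 3
    sigma_x = sigma1 * math.sqrt(z**2 * (1 - rho**2)
                                 / (1 - 2 * rho * z + z**2))   # Eq. 11
    return x, sigma_x, beta

# Illustrative pair of estimates with z = 1.35 as used in Sect. 2
x1, x2, s1, s2 = 0.30, 0.95, 0.85, 1.15
for rho in (0.0, 1 / (s2 / s1), 0.9):
    x, sx, beta = blue_two(x1, x2, s1, s2, rho)
    print(f"rho={rho:.3f}  beta={beta:+.3f}  x={x:+.3f}  sigma_x/sigma_1={sx / s1:.3f}")
```

For \(\rho =1/z\) the sketch returns \(x=x_\mathrm {1}\) and \(\sigma _{x}/\sigma _{1} =1\), whereas for \(\rho =0.9>1/z\) the weight \(\beta \) is negative and the combined value lies outside the interval spanned by the two estimates.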

The typical situation is that both \(\rho \) and \(z\) are only known with some precision. In this situation it is essential to analyse the sensitivity of the central value of the combination to this imperfect knowledge, which is encoded in the respective derivatives. The derivatives of \(\beta \) and \(\sigma _{x}\)/\(\sigma _{1}\) with respect to the parameters \(\rho \) and \(z\) have been derived in this paper and are given in Eqs. 12–15.

$$\begin{aligned} \frac{\partial \,\beta }{\partial \,\rho }&= \frac{z(1 - z^{2})}{(1 - 2\rho z + z^{2})^{2}}\end{aligned}$$
(12)
$$\begin{aligned} \frac{\partial \,\frac{\sigma _{x}}{\sigma _{1}}}{\partial \,\rho }&= z (z-\rho ) (1-\rho z) \sqrt{\frac{1}{(1-\rho ^{2})(1-2\rho z+z^{2})^{3}}} \end{aligned}$$
(13)
$$\begin{aligned} \frac{\partial \,\beta }{\partial \,z}&= \frac{\rho (1 + z^{2}) - 2z}{(1 - 2\rho z + z^{2})^{2}} \end{aligned}$$
(14)
$$\begin{aligned} \frac{\partial \,\frac{\sigma _{x}}{\sigma _{1}}}{\partial \,z}&= (1-\rho z) \sqrt{\frac{1-\rho ^{2}}{(1-2\rho z + z^{2})^{3}}} \end{aligned}$$
(15)

The resulting variations of the combined value, Eq. 3, are given in Eqs. 16 and 17.

$$\begin{aligned} \frac{\partial \,x}{\partial \,\rho }&= (x_\mathrm {2}-x_\mathrm {1})\, \frac{\partial \,\beta }{\partial \,\rho }\end{aligned}$$
(16)
$$\begin{aligned} \frac{\partial \,x}{\partial \,z}&= (x_\mathrm {2}-x_\mathrm {1})\, \frac{\partial \,\beta }{\partial \,z} \end{aligned}$$
(17)

The derivatives of \(\beta \) and \(\sigma _{x}/\sigma _{1} \) with respect to \(\rho \) as functions of \(\rho \), and for various \(z\) values, Eqs. 12 and 13, are shown in Fig. 2c, d. The equations for \(\beta \) and \(\sigma _{x}/\sigma _{1} \), this time as functions of \(z\) and for various \(\rho \) values, are shown in Fig. 3a, b. Finally, the derivatives of \(\beta \) and \(\sigma _{x}/\sigma _{1} \) with respect to \(z\) as functions of \(z\), and for various \(\rho \) values, Eqs. 14 and 15, are shown in Fig. 3c, d. These derivatives can be used to visualise the sensitivity of the combined result to the imperfect knowledge of both the correlation \(\rho \) and the uncertainty ratio \(z\) of the individual estimators, and help to decide on whether to refrain from combining. This decision should only be based on the parameters of the combination, but not on the outcome for a particular pair of estimates \(x_\mathrm {1}\) and \(x_\mathrm {2}\). This is because these parameters are features of the underlying two-dimensional pdf of the estimators, whereas the two specific values are just a pair of estimates, i.e. a single possible, likely or unlikely, outcome. A suggestion for how to proceed is given in Sect. 9.
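A corresponding sketch for such a sensitivity scan (Eqs. 12–17) is given below; the variations assumed for \(\rho \) and \(z\) in the linearised propagation are arbitrary choices for illustration only.

```python
import math

def derivatives(rho, z):
    """Derivatives of beta and sigma_x/sigma_1 with respect to rho and z (Eqs. 12-15)."""
    d = 1 - 2 * rho * z + z**2
    dbeta_drho = z * (1 - z**2) / d**2
    dratio_drho = z * (z - rho) * (1 - rho * z) * math.sqrt(1 / ((1 - rho**2) * d**3))
    dbeta_dz = (rho * (1 + z**2) - 2 * z) / d**2
    dratio_dz = (1 - rho * z) * math.sqrt((1 - rho**2) / d**3)
    return dbeta_drho, dratio_drho, dbeta_dz, dratio_dz

# Propagate assumed variations of rho and z to the combined value (Eqs. 16, 17)
x1, x2, rho, z = 0.30, 0.95, 0.5, 1.35
delta_rho, delta_z = 0.1, 0.05                  # illustrative variations only
db_drho, _, db_dz, _ = derivatives(rho, z)
print("shift of x from the rho variation:", (x2 - x1) * db_drho * delta_rho)
print("shift of x from the z   variation:", (x2 - x1) * db_dz * delta_z)
```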

Fig. 3

The results for Eqs. 8, 11, 14 and 15 as functions of \(z\) for a number of \(\rho \) values. Shown are a \(\beta \) and b \(\sigma _{x}\)/\(\sigma _{1}\) and their derivatives with respect to \(z\), c \(\partial \beta /\partial z\) and d \(1/\sigma _{1}\ \partial \sigma _{x}/\partial z\)

4 Estimator properties

In general, in experimental analyses an estimator is constructed by studying Monte Carlo simulated events that are taken as data substitutes. Using those events it is verified that the estimator is unbiased. By applying the method to data, the measured value of the estimator, i.e. the estimate, e.g. \(x_\mathrm {1}\), is obtained together with its statistical uncertainty. Subsequently, individual systematic uncertainties are obtained for the estimator and assigned to the estimate. For example, in top quark mass measurements like Ref. [10], this is achieved e.g. by changing the reconstructed objects like leptons and jets within their uncertainties, by altering the underlying Monte Carlo model for the signal, and by varying the background evaluations from data or simulations. In these procedures, the systematic variations for each source \(k\) of uncertainty are chosen to be performed in a way that is uncorrelated with any other source \(k^\prime \), and the actual values of the uncertainties are considered one standard deviation Gaussian uncertainties. Consequently, the total systematic uncertainty is calculated as the square root of the quadratic sum of the contributions from the individual sources. Finally, the result is quoted as:

$$\begin{aligned} x_\mathrm {1}&= \mathrm {value} \pm \mathrm {stat} \pm \sqrt{\sum _{k}{\mathrm {syst}_{k}^{2}}}. \end{aligned}$$
(18)

To enable their combination, the breakdown of systematic uncertainties is provided. Consequently, the features of the estimates are:

  1. they are unbiased,

  2. their uncertainties are assumed to be Gaussian,

  3. the uncertainty sources are constructed to be uncorrelated.

The property (3) relates to the correlation of two sources (\(k, k^\prime \)) of uncertainties and should not be confused with the correlation \(\rho _{ijk}\) of two estimates (\(i, j\)) for the same source \(k\) of uncertainty. If there are physics reasons to believe that two sources (\(k, k^\prime \)) are indeed correlated, it is advisable to reconsider the separation of the uncertainty sources, because otherwise, using the quadratic sum of Eq. 18 is questionable.

When performing the combination of a pair \(ij\) of estimates, for each source \(k\) of uncertainty a correlation \(\rho _{ijk}\) has to be assigned for that pair. The statistical uncertainties are either uncorrelated, or, for the case of two estimates obtained from overlapping or even the same data events, their correlation can be obtained within the analysis by means of pseudo-experiments, as described e.g. in Ref. [10]. For the systematic uncertainties, the value of the assigned correlation always is a physics motivated choice that can only be made with some uncertainty. The easiest case occurs if the uncertainties of the estimators have been determined in exactly the same way, e.g. within one experiment while using the identical procedure for all estimates. In this case, the assumption of \(\rho _{ijk} =1\) is justified, and any observed difference in the size of uncertainty \(\sigma _{ik} \ne \sigma _{jk} \) is likely caused by the different sensitivities of the estimators to that particular source of uncertainty. The uncertainty of this correlation assumption can be assessed by varying the value of \(\rho _{ijk}\) within bounds to be chosen. Given the estimator property (3), for each source \(k\) this should be performed independently from all other sources.
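For a pair of estimates, the total inputs \(\rho \) and \(z\) of Sect. 3 follow from this per-source breakdown. A minimal bookkeeping sketch (the source names, uncertainties and correlations are purely illustrative) is:

```python
import numpy as np

# Per-source uncertainties of the two estimates and assumed correlations rho_12k
sources = {          # name: (sigma_1k, sigma_2k, rho_12k) -- illustrative values only
    "stat":  (0.40, 0.55, 0.0),
    "scale": (0.50, 0.60, 1.0),
    "model": (0.45, 0.70, 1.0),
}

cov = np.zeros((2, 2))
for s1k, s2k, r in sources.values():
    cov += np.array([[s1k**2,        r * s1k * s2k],
                     [r * s1k * s2k, s2k**2      ]])

sigma1, sigma2 = np.sqrt(np.diag(cov))
rho = cov[0, 1] / (sigma1 * sigma2)       # total correlation of the pair
z = sigma2 / sigma1
print(f"sigma_1={sigma1:.2f}  sigma_2={sigma2:.2f}  rho={rho:.2f}  z={z:.2f}")
```

Varying the assumed \(\rho _{ijk}\) of a single source in this sum, while keeping all other sources fixed, directly implements the independent variations advocated above.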

A more complicated situation arises, however, when combining estimates obtained by different experiments, which may even partly have been derived without knowledge of the procedure applied for the respective other result. Given the difference in strategy, there may be a smaller correlation. In addition, even for \(\rho _{ijk} =1\), differences in the size of the uncertainties can originate from a different size of variation performed for the two estimators. As an example, one experiment may perform larger variations of Monte Carlo parameters than another, an example of which can be found in Ref. [11]. In this situation, given the different dependences of \(\beta \) and \(\sigma _{x}\)/\(\sigma _{1}\) on \(\rho \) and \(z\), the difference cannot be accounted for by changes in \(\rho _{ijk}\); the most appropriate choice is to vary \(\sigma _{ik}\) and/or \(\sigma _{jk}\).

Given the above, an individual assessment of the correlation assumptions per source \(k\), as is performed e.g. in Ref. [12], is strongly preferred. In contrast, any automated procedure of simultaneous variations very likely cannot properly account for the specific situations of all sources \(k\). This is discussed in more detail in Sect. 7. In any case, all systematic variations of the assumptions should be performed obeying the features of the estimators listed above.

Frequently, the question arises whether a pair (\(i, j = 1, 2\)) of estimates is compatible. This can be decided upon using a \(\chi ^2\) that is defined as the squared ratio of the difference of the estimates, \(\Delta \), and its uncertainty, \(\sigma _\Delta \):

$$\begin{aligned} \chi ^2 (x_\mathrm {1}, x_\mathrm {2})&= \left( \frac{\Delta }{\sigma _\Delta }\right) ^2 = \frac{\left( x_\mathrm {1}-x_\mathrm {2} \right) ^2}{\sigma _{1}^{2} +\sigma _{2}^{2}-2\rho \sigma _{1} \sigma _{2}}\,, \end{aligned}$$
(19)

which is the significance of the difference of the estimates of being incompatible with zero. Alternatively, one may exploit the related \(\chi ^2\) probability for one degree of freedom, \(P(\chi ^2, 1)\), defined as the integral:

$$\begin{aligned} P(\chi ^2, 1)&= \int _{\chi ^2}^{\infty } \mathcal{{P}}({\chi ^{\prime }}^{2}, 1) \, \mathrm{d}{\chi ^{\prime }}^{2}, \end{aligned}$$
(20)

which is the probability for an even larger \(\chi ^2\) to occur for any other pair [6].

Ideally, only compatible estimates should be combined; otherwise the combined result is not trustworthy. Unfortunately, given the statistical nature of the problem, the question of compatibility of a given pair of estimates cannot be answered unambiguously, i.e. for a single pair of estimates it is impossible to decide whether this is an unlikely case given the underlying pdf, or an incompatible case. In turn this also means that no conclusions on properties of the estimator distribution or even the combination method can be drawn solely based on specific pairs of estimates and the result of their combination.

For the situation of a larger number of estimates to be combined, it is advisable to inspect the distribution of the \(\chi ^2\) values of the pairwise compatibility tests calculated from Eq. 19 that should resemble a \(\chi ^2\) distribution for one degree of freedom. For pairs resulting in large \(\chi ^2\) values, the analysis procedures applied in obtaining the uncertainties should be investigated in detail for detecting possible incompatibilities. The outcome for a specific example is discussed in the next section.

Finally, the global \(\chi ^2\) of the combination, i.e. the quantity minimised, for \(i, j=1,\ldots , m\) estimates of a single observable \(x_{T}\), yielding a combined value \(x\) and with an inverse covariance matrix \(V^{-1}\) is defined as:

$$\begin{aligned} \chi ^2&= \sum _{i=1}^{m}\sum _{j=1}^{m} (x_\mathrm {i}-x) V_{ij}^{-1} \left( x_\mathrm {j}-x\right) \!. \end{aligned}$$
(21)

This \(\chi ^2\) is a single number per combination that should be small for a compatible set of estimates for the observable under investigation.
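The quantities of Eqs. 19–21 can be evaluated as sketched below (scipy is assumed to be available for the \(\chi ^2\) probability). As an illustration, the original Puzzle estimates are taken to be 1.0 and 1.5 with 10\(\,\%\) statistical and 20\(\,\%\) systematic uncertainties, an assumption consistent with the values of \(z=1.5\), \(\rho =0.8\) and \(\chi ^2 =5.9\) quoted in Sect. 5.

```python
import numpy as np
from scipy.stats import chi2

def pair_chi2(x1, x2, sigma1, sigma2, rho):
    """Pairwise compatibility of two estimates, Eqs. 19 and 20."""
    c2 = (x1 - x2)**2 / (sigma1**2 + sigma2**2 - 2 * rho * sigma1 * sigma2)
    return c2, chi2.sf(c2, df=1)            # P(chi^2, 1): probability of a larger chi^2

def global_chi2(x, estimates, cov):
    """Global chi^2 of the combination, Eq. 21, at the combined value x."""
    d = np.asarray(estimates) - x
    return float(d @ np.linalg.inv(cov) @ d)

s1 = np.hypot(0.10, 0.20)                   # total uncertainties from the two sources
s2 = np.hypot(0.15, 0.30)
print(pair_chi2(1.0, 1.5, s1, s2, 0.8))     # approximately (5.9, 0.015)

cov = np.array([[s1**2, 0.8 * s1 * s2], [0.8 * s1 * s2, s2**2]])
w = np.linalg.inv(cov) @ np.ones(2)
w = w / w.sum()                             # BLUE weights for a single observable
x = w @ [1.0, 1.5]
print(global_chi2(x, [1.0, 1.5], cov))      # equals the pairwise chi^2 above
```

For two estimates of one observable the minimised global \(\chi ^2\) of Eq. 21 coincides with the pairwise \(\chi ^2\) of Eq. 19, which the last two lines verify numerically.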

5 Relative uncertainties

The formulas described above are only valid for Gaussian estimators with absolute uncertainties \(\sigma _{ik}\) for all sources. Here, the term absolute uncertainty means that the value of the uncertainty is identical for all possible values of the estimator pdf, i.e. it is independent of the actual value of the estimate. This means it is the same for the actual estimate, any combined value, and the true value, such that \(\sigma _{i} =\sigma _{i} (x_\mathrm {i})=\sigma _{i} (x)=\sigma _{i} (x_{T})\). Therefore, irrespective of whether it was calculated for the estimate, it also applies to the combined value. In contrast, a relative uncertainty (Footnote 4), e.g. of some percent, depends on the actual value of \(x_{T}\). Consequently, for relative uncertainties, the uncertainty assigned to the estimate, \(\sigma _{i} =\sigma _{i} (x_\mathrm {i})\), is formally incorrect, since it should correspond to the uncertainty of the estimator pdf, i.e. \(\sigma _{i} =\sigma _{i} (x_{T})\), which has a different value.

Within the BLUE method this can be accounted for approximately by performing the combination in an iterative way, see Refs. [9, 13]. In this procedure, starting from the initially assigned value, after each iteration the uncertainty is replaced by the expected uncertainty of the true value \(x_{T}\), approximated by the one of the combined value \(x\). For most applications, for a given source \(k\) of systematic uncertainty, a linear dependence of the uncertainty \(\sigma _{ik}\) on \(x\) is assumed (Footnote 5); however, there also exist more complicated cases like the one discussed in Ref. [13].

It is worth noticing that during the iterations the originally assigned uncertainties of the estimates are altered, albeit at unchanged correlation assumptions. For example, when using the same linear dependence for all estimates \(i\) and a given source of uncertainty \(k\), this means that after the first iteration the uncertainty from this source is identical for all estimates, and finally, at convergence, its value amounts to a given fraction of the combined result. Assuming this behaviour for all uncertainties of a pair of estimates leads to \(z=1\). This results in \(\beta =0.5\), see Eq. 8, for all possible values of \(\rho \), and the combination reduces to averaging the estimates, i.e. \(x=(x_\mathrm {1} +x_\mathrm {2})/2\), irrespective of their correlation. Solely the uncertainty \(\sigma _{x}\) depends on the value of the correlation, i.e. Eq. 11 reduces to \(\sigma _{x}/\sigma _{1}\ = \sqrt{(1+\rho )/2}\). An example of this situation is Peelle’s Pertinent Puzzle.
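A minimal sketch of such an iteration is given below, in the simplified case where the total uncertainties (rather than each source separately) are assumed to scale linearly with the combined value; the convergence criterion and the example numbers are arbitrary choices.

```python
def blue_two(x1, x2, s1, s2, rho):
    """BLUE combination of two correlated estimates, Eqs. 3, 8 and 11."""
    z = s2 / s1
    beta = (1 - rho * z) / (1 - 2 * rho * z + z**2)
    x = (1 - beta) * x1 + beta * x2
    sx = s1 * (z**2 * (1 - rho**2) / (1 - 2 * rho * z + z**2))**0.5
    return x, sx

def blue_relative(x1, x2, s1, s2, rho, tol=1e-9):
    """Iterative BLUE with relative uncertainties: the fractional uncertainties
    are kept fixed and rescaled to the current combined value until convergence."""
    f1, f2 = s1 / x1, s2 / x2
    x, sx = blue_two(x1, x2, s1, s2, rho)
    while True:
        x_new, sx = blue_two(x1, x2, f1 * x, f2 * x, rho)
        if abs(x_new - x) < tol:
            return x_new, sx
        x = x_new

# Example with equal fractional uncertainties: the iteration converges to the plain
# mean of the two estimates, and only sigma_x depends on rho, as argued above
print(blue_relative(1.0, 1.5, 0.224, 0.336, 0.8))   # roughly (1.25, 0.27)
```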

Numerically, the difference between using absolute or relative uncertainties is rarely of importance, especially so when combining consistent precision measurements. This is because a difference of \(n\,\%\) between the estimates and the combined value only results in a relative change of \(n\,\%\) in \(\sigma _{ik}\). Given that \(\sigma _{ik}\) in itself is small compared to \(x_\mathrm {i}\), this likely results in very small differences in \(x\) and \(\sigma _{x}\), in any case well below the size of the respective uncertainty.

At first sight a counter-example is the original formulation of Peelle’s Pertinent Puzzle [8, 9] (Footnote 6), for which the estimates are given in Table 1, scenario \(\mathcal {A}\). This puzzle, albeit restricted to situations like scenario \(\mathcal {A}\) and to investigations of different models for the uncertainties, has been discussed in the literature, see e.g. Refs. [9, 14, 15].

Table 1 Comparison of the combinations for Peelle’s Pertinent Puzzle for the BLUE method with absolute and relative uncertainties using various scenarios for the estimates and their correlation. The five scenarios analysed are: \(\mathcal {A}\) the original values for the estimates \(i=1, 2\), the uncertainties \(k=0, 1\), and the correlations \(\rho _{12k}\) with \(\rho _{120} =0\) and \(\rho _{121} =1\); \(\mathcal {B} =\mathcal {A} \) but with all uncertainties scaled by a factor of two; \(\mathcal {C} ~(\mathcal {D})=\mathcal {A} \) but with a changed value for the second estimate and with the original (rescaled) uncertainties; and \(\mathcal {E} =\mathcal {A} \) but with a decreased value of the assumed correlation for the systematic uncertainty, i.e. for \(k=1\). The estimates are listed together with their uncertainties. In addition, the parameters and results of the combination are given

For scenario \(\mathcal {A}\), the statistical uncertainties are uncorrelated and the systematic uncertainties are fully correlated, which results in \(\rho = 0.8\). Given that a percentage uncertainty of \(10\,\%\, (20\,\%)\) is quoted for the statistical (systematic) uncertainty, the ratio of the total uncertainties equals the ratio of the estimates, i.e. \(z = 1.5\). The \(\chi ^2\) of the two estimates, calculated from Eq. 19, is large, i.e. \(\chi ^2 (x_\mathrm {1}, x_\mathrm {2})=5.9\) and \(P(\chi ^2, 1) =1.5\,\%\), which means that, whatever method is used, a combination of this pair of estimates is questionable.

Given the procedures applied to obtain the systematic uncertainty it should be possible to decide whether this source is an absolute or relative uncertainty. Here, the combination is performed for both assumptions, i.e. using either absolute and relative uncertainties for all sources of uncertainty, see also Ref. [9]. The results are listed in Table 1, scenario \(\mathcal {A}\). In the case of relative uncertainties, given the combined value, the final statistical (systematic) uncertainties assigned to the estimates are \(0.13\) (\(0.25\)), i.e. they are equal for both estimates and different from the values quoted in the upper part of the table. The resulting corresponding uncertainties of the combined result are \(0.09\) and \(0.25\), respectively. Due to the changes in uncertainties, for the BLUE method with relative uncertainties the \(\chi ^2\) of the two estimates, calculated from the finally assigned uncertainties, is even larger, i.e. \(\chi ^2 (x_\mathrm {1}, x_\mathrm {2})=8.0\) and \(P(\chi ^2, 1) =0.5\,\%\). As explained above, by construction, the combined result is the mean of the two estimates.

To assess the significance of the difference of the two combined results obtained with the two combination methods, utilising the \(\chi ^2\) of Eq. 19, the correlation of the two results has to be calculated. In general, given the iterative procedure of the BLUE method with relative uncertainties, this cannot be calculated analytically from the inputs to the combination, but has to be obtained numerically by performing numerous combinations. To do so, an underlying estimator distribution \(\mathcal{{P}}(X_\mathrm {1}, X_\mathrm {2})\) has to be constructed, see Eq. 1. All models of \(\mathcal{{P}}(X_\mathrm {1}, X_\mathrm {2})\) investigated here are based on the uncertainties and the correlations \(\rho _{12k}\) of the estimates for the two sources \(k\) of uncertainty as given in Table 1. In addition, a true value has to be assumed, together with an uncertainty model, based on either absolute or relative uncertainties. To ensure that the conclusions are neither biased towards the uncertainty model chosen nor towards a specific value of \(x_{T}\), six estimator distributions \(\mathcal{{P}}(X_\mathrm {1}, X_\mathrm {2})\) are investigated. They assume either absolute or relative uncertainties for three assumptions on the true value, namely \(x_{T} = 0.75, 1, 1.25\), thereby spanning the entire range of results obtained for all scenarios listed in Table 1 and both uncertainty models.

Technically, the pdfs are based on Eq. 1 at a given value of \(x_{T}\). The values for the uncertainties are taken from the upper part of Table 1. When simulating the absolute uncertainty model those uncertainties are taken at face value, whereas for the relative uncertainty model the fractions are retained, i.e. the uncertainties from Table 1 are scaled to the corresponding value of \(x_{T}\). Finally, the correlation of the estimators is obtained from the covariance and the total uncertainties assigned. For a given pair of estimates generated, before performing the combination, uncertainties have to be assigned to the estimates. When simulating the absolute uncertainty model, the uncertainties from the pdf are kept. When instead simulating the relative uncertainty model, the uncertainties are rescaled to the estimates to be combined.

As an example, for scenario \(\mathcal {A}\), for \(x_{T} = 1\) and assuming the model of absolute uncertainties, the results are visualised in Fig. 4. Figure 4a shows the predicted two-dimensional distribution for one hundred thousand pairs of estimates given the model. The red point in Fig. 4a indicates the pair of estimates from the original Puzzle, which, if it is assumed to stem from this pdf, is an unlikely outcome. In addition listed in Fig. 4a are the mean values and uncertainties of the estimator distributions together with their correlation. By construction they coincide with the values in Table 1, proving the consistency of the simulation. The corresponding \(\chi ^2\) distribution of the pairs of estimates shown in Fig. 4b exhibits the steep fall-off expected for pairs of estimates consistent with stemming from this two-dimensional pdf. Here, this is achieved by construction. For a set of compatible experimental estimates to be combined, ideally a similar distribution for the pairwise \(\chi ^2\) values obtained from Eq. 19 should be observed. In comparison, the corresponding \(\chi ^2\) value for the original pair of estimates is rather large, which makes it an unlikely case given this pdf, i.e. a larger \(\chi ^2\) will be observed in only about 1.6 \(\%\) of the cases. This observation holds for both uncertainty models, and also does not depend on the chosen value of \(x_{T}\), since this only moves the ellipses along the diagonal.

Fig. 4

Results of Peelle’s Pertinent Puzzle for scenario \(\mathcal {A}\) for one hundred thousand pairs of estimates. The simulation is based on a hypothetical two-dimensional pdf assuming \(x_{T} =1\), using the uncertainties and correlation of the estimates from this scenario, and simulating absolute uncertainties. Shown are a the two-dimensional distribution of the pairs of estimates, b the \(\chi ^2 (X_\mathrm {1}, X_\mathrm {2})\) of the pairs of estimates, c the two-dimensional distribution of the pairs of combined results when using either absolute uncertainties (X), or relative uncertainties (Y), and d the \(\chi ^2 (X, Y)\) of the pairs of results. Both \(\chi ^2\) distributions are truncated at \(\chi ^2 =8\). The red points correspond to the estimates (a) and combined results (c) for this scenario, see Table 1. In addition listed for the estimates are in a their mean values and uncertainties together with their correlation, and in b the fraction of pairs for which the \(\chi ^2\) value exceeds the one observed for this scenario. The analogous quantities for the combined results are given in c and d, respectively

For a given combination, the combined results are denoted by \(x\) (\(y\)) when assuming absolute (relative) uncertainties in the combination procedure (which are assigned irrespective of the assumed uncertainty model of the pdf). Their two-dimensional distribution \(\mathcal{{P}}(X, Y)\) is shown in Fig. 4c. It is found that for each estimator distribution chosen, the respective combination method is unbiased, whereas the other method shows a bias. For the example shown, assuming the uncertainty model with absolute uncertainties results in \(\langle X \rangle = x_{T} \) for the BLUE combination, whereas, in this case, the BLUE method with relative uncertainties has a bias, i.e. \(\langle Y \rangle = x_{T} + 0.03\). This is caused by the fact that, given the underlying absolute uncertainty model of the pdf, the wrong uncertainty model is assumed when performing the combination. However, this bias is insignificant, given the size of the statistical uncertainty. This conclusion applies to all scenarios and all six models for the estimator distribution. In all cases the combined results from the two methods are highly correlated, and the mean values differ by less than the statistical uncertainty of the combination method that shows the bias.

The red point in Fig. 4c denotes the pair of combined results from the original Puzzle, which lies far away from the ellipses. The correlation of the combined results from the two methods is deduced from all pairs of estimates, combining them with both prescriptions, and calculating the correlation of the two-dimensional distribution \(\mathcal{{P}}(X, Y)\). For this pdf the correlation amounts to 0.96. Figure 4d shows the \(\chi ^2\) distribution for all pairs of results, again a steeply falling distribution. Using the correlation obtained from the simulation, the resulting value for the original pair is \(\chi ^2 (x, y) = 18.8\), which sits in the tail of this distribution, i.e. a larger \(\chi ^2\) will be observed in only about 0.6\(\,\%\) of the cases.

Applying all six models to scenario \(\mathcal {A}\), the correlation of \(\mathcal{{P}}(X, Y)\) varies from 0.92 to 0.98, and the corresponding \(\chi ^2 (x, y)\) values for the original puzzle range from 11 to 27. Given this, for all models the pair of results is not very likely, or even incompatible. However, as demonstrated for example by the results in Fig. 4, this is a mere consequence of the very unlikely or incompatible input, and not of the differences between the methods. This can be more clearly seen by analysing the additional scenarios \(\mathcal {B}-\mathcal {E} \) given in Table 1. They are designed to artificially improve the compatibility of the input, while using different aspects of the estimates. The parameters of the combinations depend on \(\rho \) and \(z\), such that they only change if one of those changes. The value of \(\rho \) is defined by the scenario, and, due to the simultaneous scaling of both uncertainty sources, in this case is not altered by any of the methods, see Table 1. In contrast, for given initial values of the uncertainties and correlations per source, the value of \(z\) of the BLUE method with absolute uncertainties is altered by the BLUE method with relative uncertainties. This is caused by the dependence of the estimator uncertainties on the combined value, as can be seen e.g. by comparing the \(z\) values for both methods for scenario \(\mathcal {C}\). Given this, the uncertainty of the combined result of the BLUE method with relative uncertainties depends on the values of the estimates, i.e. on the likeliness of this particular experimental outcome, given an underlying pdf.

In these additional scenarios, the estimates are altered by either changing: \(\mathcal {B}\) the size of the uncertainties, \(\mathcal {C}\), \(\mathcal {D}\) the value of the less precise estimate, and \(\mathcal {E}\) the correlation of the systematic uncertainties. The target value of the estimate compatibility for the BLUE method with absolute uncertainties was a \(\chi ^2 (x_\mathrm {1}, x_\mathrm {2})\) of about 1.5.

For scenario \(\mathcal {B}\) the uncertainties are doubled. For none of the methods does this change the relative importance of the estimates; however, it improves their compatibility. For scenarios \(\mathcal {C}\), \(\mathcal {D}\) the value of the less precise estimate \(x_\mathrm {2}\) is reduced to make it more compatible with \(x_\mathrm {1}\). The difference of the two scenarios is that, motivated by the absolute uncertainty model, in \(\mathcal {C}\) the changed value for \(x_\mathrm {2}\) is considered another possible outcome, namely a value consistent with the conditional pdf for \(X_\mathrm {2}\). Consequently, the originally assigned uncertainties are kept. In contrast, for scenario \(\mathcal {D}\), this time motivated by the relative uncertainty model, the uncertainties are scaled to amount to the same fractional uncertainties as were originally assumed in \(\mathcal {A}\). Again the compatibility of the estimates and of the combined results is improved. For scenario \(\mathcal {D}\), by construction, all parameters of the combined result obtained using relative uncertainties are identical to the ones in scenario \(\mathcal {A}\). The combined value and its uncertainty are different, because the mean is changed due to the changed estimate \(x_\mathrm {2}\). For scenario \(\mathcal {E}\) the correlation is reduced, yielding a similar level of agreement of the estimates. Here, again by construction, the combined result obtained using relative uncertainties is identical to the one in scenario \(\mathcal {A}\), but for its uncertainty, which is reduced due to the smaller correlation of the estimates.

The parameters of the combination in Table 1 show that for the BLUE method the sensitivities of scenarios \(\mathcal {A}-\mathcal {C} \) to \(\rho \) and \(z\) are identical, such that the related conclusions drawn will not depend on the scenario. In addition, the derivatives reveal the fact that for the BLUE combination with relative uncertainties, for all scenarios that retain the initial relative uncertainties, the weights of the estimates are independent of \(\rho \) but have a large sensitivity to \(z\). For the BLUE combination the situation is rather different. Here, the weights have a much larger dependence on \(\rho \) than on \(z\).

For the additional scenarios, the resulting compatibilities of the combined results are estimated as described above for scenario \(\mathcal {A}\). As an example of the six estimator distributions investigated, the \(\chi ^2 (x, y)\) values for \(x_{T} = 1\) and using the absolute uncertainty model are: \(18.8, 1.5, 1.4, 3.4, 1.3\) for scenarios \(\mathcal {A}\), \(\mathcal {B}\), \(\mathcal {C}\), \(\mathcal {D}\), \(\mathcal {E}\), which means that the differences of the methods strongly diminish when using a more compatible input. This observation depends only weakly on the underlying estimator distribution, i.e. although the \(\chi ^2\) values differ, the pattern of the \(\chi ^2\) values for the different scenarios is very similar for all six cases. For all estimator distributions, firstly, the by far largest \(\chi ^2\) value is observed for scenario \(\mathcal {A}\). Secondly, for the remaining scenarios with the same correlation of the estimators, i.e. for scenarios \(\mathcal {B}-\mathcal {D} \), scenario \(\mathcal {C}\) in all but one case has the smallest \(\chi ^2\) value. Thirdly, within a given uncertainty model for the estimator distribution, scenario \(\mathcal {E}\) has either the smallest or the second smallest \(\chi ^2\) value of all scenarios.

Finally, applying all models to scenarios \(\mathcal {B}-\mathcal {E} \), the correlation of \(\mathcal{{P}}(X, Y)\) varies from 0.78 to 0.98 (0.94 to 0.97) for the estimator distributions with absolute (relative) uncertainties. The corresponding \(\chi ^2 (x, y)\) values for the remaining scenarios in Table 1 range from 0.7 to 4.8 (1.2 to 4.3), respectively, i.e. they are much smaller than what was observed for scenario \(\mathcal {A}\). Consequently, the apparently large difference observed for the two combined results for scenario \(\mathcal {A}\) is not caused by the differences in the methods, but by the unlikeliness of the specific pair of estimates for all scenarios, i.e. the incompatibility of the input to the BLUE combination.

As an alternative solution, for each scenario in Table 1, the most likely \(x_{T}\) given the estimates \(x_\mathrm {i}\), their uncertainties \(\sigma _{i}\) and correlation \(\rho \) is obtained from a maximum likelihood fit using Eq. 1 for \(X_\mathrm {i} =x_\mathrm {i} \) as the likelihood function. Two likelihood functions are constructed. Their results for \(x_{T}\) should be compared to the combined values \(x\) from the BLUE method with absolute and relative uncertainties, respectively. The first likelihood uses constant values for the \(\sigma _{i}\). In contrast, for the second likelihood, in view of the relative uncertainty model, the uncertainties are chosen to depend on \(x_{T}\) according to the given fractional uncertainties for the scenarios in Table 1, such that \(\sigma _{i} =\sigma _{i} (x_{T})=x_{T} \,\sigma _{i}/x_\mathrm {i} \) varies with \(x_{T}\). By construction, the results for \(x_{T}\) from the first likelihood are identical to the combined values \(x\) of the BLUE method with absolute uncertainties, since the likelihood is a Gaussian, i.e. it corresponds to the situation for which the BLUE formulas were derived. The second likelihood has non-Gaussian tails, and consequently, the results for \(x_{T}\) differ from the combined values \(y\) of the BLUE method with relative uncertainties, which is only an approximation. The results of the second likelihood are \(x_{T} =1.53, 1.25, 1.03, 1.15, 1.24\) for scenarios \(\mathcal {A}\), \(\mathcal {B}\), \(\mathcal {C}\), \(\mathcal {D}\), \(\mathcal {E}\). The corresponding symmetrised uncertainties are \(0.34, 0.44, 0.22, 0.23, 0.19\). Apart from the unlikely scenario \(\mathcal {A}\), the values for \(x_{T}\) nicely agree with the combined values from the BLUE combination with relative uncertainties, see Table 1. This demonstrates the quality of the approximation for consistent pairs of estimates. The uncertainties obtained from the likelihood and the BLUE combination with relative uncertainties differ more strongly for scenarios \(\mathcal {A}\), \(\mathcal {B}\), which have the largest non-Gaussian contributions, whereas for the remaining scenarios they are almost identical.
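A sketch of such a maximum likelihood fit is given below; it minimises the negative logarithm of Eq. 1 evaluated at the estimates, with either fixed uncertainties or uncertainties proportional to the trial value of \(x_{T}\). Scipy is assumed to be available, and the inputs are the Puzzle-like values 1.0 and 1.5 with 10\(\,\%\) and 20\(\,\%\) component uncertainties assumed above, so the specific numbers are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nll(x_t, x1, x2, f1, f2, rho, relative):
    """Negative log of Eq. 1 at (x1, x2) as a function of the trial true value x_t."""
    s1 = f1 * x_t if relative else f1       # relative model: uncertainties scale with x_t
    s2 = f2 * x_t if relative else f2
    d1, d2 = (x1 - x_t) / s1, (x2 - x_t) / s2
    return (np.log(s1 * s2) + 0.5 * np.log(1 - rho**2)
            + (d1**2 + d2**2 - 2 * rho * d1 * d2) / (2 * (1 - rho**2)))

x1, x2, rho = 1.0, 1.5, 0.8
# absolute model: fixed uncertainties; relative model: fixed fractional uncertainties
for relative, (f1, f2) in [(False, (0.224, 0.335)), (True, (0.224, 0.224))]:
    res = minimize_scalar(nll, bounds=(0.5, 2.5), method="bounded",
                          args=(x1, x2, f1, f2, rho, relative))
    print("relative" if relative else "absolute", "model: x_T =", round(res.x, 2))
```

For these inputs the first fit returns the BLUE value with absolute uncertainties (about 0.88), while the second returns a value close to the 1.53 quoted above for scenario \(\mathcal {A}\). This ends the discussion of Peelle’s Pertinent Puzzle.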

The decision of whether a given source of uncertainty is an absolute or a relative uncertainty has to be made in view of the actual procedure followed to determine this uncertainty. Nevertheless, to evaluate the numerical importance for real applications, the results for both assumptions are given below for a number of publicly available combinations, as purely numerical examples and without any physics motivation. All values quoted follow the convention of Eq. 18. The two examples for which originally relative uncertainties are assigned are the combination of lifetimes of \(B\) mesons [13], and of the cross-section for single top quark production at the LHC [12]. In these cases, for comparison, absolute uncertainties are assumed for all sources. The two examples for which originally absolute uncertainties are assigned are the latest combinations of the measurements of the top quark mass \(m_{\mathrm {top}}\), performed at the Tevatron [16] and the LHC [11]. In these cases, for comparison, relative uncertainties are assumed for all sources of systematic uncertainties.

The corresponding results are for the \(B\)-lifetime:

$$\begin{aligned} \tau ~[\mathrm{ps}]&= 1.13\,\pm 0.09\,\pm 0.11 \quad \text{(relative) } \nonumber \\ \tau ~[\mathrm{ps}]&= 1.13\,\pm 0.09\,\pm 0.09 \quad \text{(absolute) }, \end{aligned}$$
(22)

for the single top quark production cross-section:

$$\begin{aligned} \sigma ~[\mathrm{pb}]&= 85.3\,\pm 4.1\,\pm 11.5 \quad \text{(relative) } \nonumber \\ \sigma ~[\mathrm{pb}]&= 83.7\,\pm 4.6\,\pm 11.2 \quad \text{(absolute) }, \end{aligned}$$
(23)

for \(m_{\mathrm {top}}\) measured at the Tevatron:

$$\begin{aligned} m_{\mathrm {top}} ~[\mathrm {GeV} ]&= 173.21\,\pm 0.51\,\pm 0.71 \quad \text{(absolute) } \nonumber \\ m_{\mathrm {top}} ~[\mathrm {GeV} ]&= 173.26\,\pm 0.51\,\pm 0.71 \quad \text{(relative) }, \end{aligned}$$
(24)

and finally, for \(m_{\mathrm {top}}\) measured at the LHC:

$$\begin{aligned} m_{\mathrm {top}} ~[\mathrm {GeV} ]&= 173.29\,\pm 0.23\,\pm 0.92 \quad \text{(absolute) } \nonumber \\ m_{\mathrm {top}} ~[\mathrm {GeV} ]&= 173.30\,\pm 0.23\,\pm 0.92 \quad \text{(relative) }. \end{aligned}$$
(25)

In all cases the difference of the pair of results is small compared to their statistical uncertainties. The results on \(m_{\mathrm {top}}\) are almost indistinguishable, and at the quoted precision the uncertainties are identical.

This ends the discussion about relative uncertainties. In the remainder of the paper only absolute uncertainties are considered.

6 The concept of reduced correlations

Reduced correlations postulate that for each pair of estimates, e.g. the pair (1, 2), that is positively correlated for a given source of uncertainty \(k\), i.e. \(\rho _{12k} >0\), the smaller of the individual uncertainties, e.g. \(\sigma _{1k} <\sigma _{2k} \), is treated as fully correlated, and the remainder as uncorrelated. This replaces the covariance \(\rho _{12k} \sigma _{1k} \sigma _{2k} \) by the square of the smaller of the individual uncertainties, e.g. \(\sigma _{1k}^{2}\) for this source, see e.g. Ref. [17]. This is equivalent to assuming the correlation to amount to the ratio of the smaller to the larger uncertainty, \(\rho _{12k} =\sigma _{1k}/\sigma _{2k} =1/z_{12k} \).

The impact of this concept can be seen by analysing the contribution of the source \(k\) to the covariance matrix separated into the postulated uncorrelated (u) and correlated (c) parts that reads:

$$\begin{aligned} V_{k}&= \left( \begin{array}{cc} \sigma _{1k}^{2} &{} \rho _{12k} \sigma _{1k} \sigma _{2k} \\ \rho _{12k} \sigma _{1k} \sigma _{2k} &{} \sigma _{2k}^{2} \\ \end{array} \right) \nonumber \\&= \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} \sigma _{2k}^{2}-\sigma _{1k}^{2} \\ \end{array} \right) _\mathrm {u} + \left( \begin{array}{cc} \sigma _{1k}^{2} &{} \sigma _{1k}^{2} \\ \sigma _{1k}^{2} &{} \sigma _{1k}^{2} \\ \end{array}\right) _\mathrm {c} \end{aligned}$$
(26)

By construction, this effectively replaces one source of uncertainty by two and assigns zero (full) correlation to the first (second), i.e. \(\sigma _{1k}^{2} =1\cdot \sigma _{1k} \sigma _{1k} \). Typically, it is suggested to apply this concept to sources for which the initially assigned correlation of the estimates was \(\rho _{12k} =1\), or at least \(\rho _{12k} >1/z_{12k} \). This is because in this situation the correlation is always reduced with respect to the initial value, hence the name.

If this source is the only uncertainty, this will lead to \(\beta =0\). For the case in which \(\rho _{12k} \ge 0\) for all \(k\), with an arbitrary number of sources, and applying the concept to all sources with \(\rho _{12k} > 0\) (i.e. an unfavourable situation in which the correlation is partly even increased), the covariance with reduced correlations reads:

$$\begin{aligned} \rho _\mathrm {red} \sigma _{1} \sigma _{2}&= \! \sum _{\sigma _{1k}^{2} <\sigma _{2k}^{2}} \!\! \sigma _{1k}^{2} + \! \! \sum _{\sigma _{2k}^{2} <\sigma _{1k}^{2}} \!\! \sigma _{2k}^{2} \le \! \sum _{\rho _{12k} > 0} \!\! \sigma _{1k}^{2} \le \sigma _{1}^{2} \end{aligned}$$
(27)

where \(\rho _\mathrm {red}\) is the total reduced correlation of the pair of estimates. The first (second) term sums the variances of the sources for which initially the estimates were positively correlated and for which \(x_\mathrm {1}\) (\(x_\mathrm {2}\)) has the smaller uncertainty. If the second estimate does not have a smaller uncertainty for any of these sources, for the first inequality the equal sign is realised, otherwise replacing some \(\sigma _{2k}^{2}\) by \(\sigma _{1k}^{2}\) in the second sum will increase the covariance. Finally, if there are also no sources of uncertainty for which initially the estimates were taken as uncorrelated, for the second inequality the equal sign is valid. In any case, comparing the first and last terms the result is:

$$\begin{aligned} \rho _\mathrm {red}&\le \frac{\sigma _{1}}{\sigma _{2}} = \frac{1}{z}, \end{aligned}$$
(28)

which means \(\beta \ge 0\) is ensured by the method. This is also true if initially the total correlation is smaller, i.e. there are in addition sources for which the estimates are negatively correlated, or if the method is only applied to sources with \(\rho _{12k} > 1/z_{12k} \). As a consequence, by construction \(x\) is always within \(x_\mathrm {1}\) and \(x_\mathrm {2}\). However, as has been shown above, due to the conditional probability, the true value \(x_{T}\) is outside this interval in the majority of all cases.

Apart from this deficiency, the procedure is also questionable from physics arguments, as can be seen from an example. Let us assume there are two estimates from the same experiment that suffer from the same source of uncertainty (say, an energy scale uncertainty), but apply different phase space requirements, e.g. on the jet transverse momentum \(p_\mathrm {t}\). Typically, the uncertainty on such scales decreases with increasing \(p_\mathrm {t}\), such that the estimate with the stronger requirement has the smaller uncertainty. The method now effectively assigns a correlation to the uncertainty from this source that is zero (one) for \(p_\mathrm {t} <p_\mathrm {t,min} \) (\(p_\mathrm {t} >p_\mathrm {t,min} \)), where \(p_\mathrm {t,min}\) is the larger of the two minimum transverse momenta required for the two estimates, see Eq. 26. As a result, firstly, the limits of the correlation when approaching \(p_\mathrm {t} =p_\mathrm {t,min} \) from above and from below differ. Secondly, the uncertainty slightly below \(p_\mathrm {t,min}\) is by construction independent of the one slightly above. Given that the value of \(p_\mathrm {t,min}\) is arbitrary, and that the effects that cause the energy scale uncertainty do not disappear across the threshold, this is an unphysical situation. It is an example in which, for \(\rho _{12k} =1\), a difference in \(z_{12k}\) is attempted to be cured by an ad hoc change in \(\rho _{12k}\). However, the dependences of \(\beta \) and \(\sigma _{x}\)/\(\sigma _{1}\) on \(\rho \) and \(z\) are different.

It is worth noticing that this method is not conservative in the sense that applying it would always lead to an increased variance of the combined result \(\sigma _{x}^{2}\). If, for a given \(z\), the value of \(\rho \) is, e.g., only slightly larger than \(1/z\), the resulting \(\rho _\mathrm {red}\) may be much smaller than \(1/z\), such that \(\sigma _{x}\)/\(\sigma _{1}\) is actually reduced from its initial value, see Eq. 11 and Fig. 2b. Consequently, the uncertainty assigned to the combined value by using reduced correlations may be either larger or smaller than the one obtained with the initially assigned correlations, depending on the initial value of \(\rho \) and the size of the reduction. For a specific example the impact is evaluated below.
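
This can be illustrated numerically. The sketch below uses the standard two-estimate BLUE expression for \(\sigma _{x}\)/\(\sigma _{1}\), which is assumed here to correspond to Eq. 11, with purely hypothetical values of \(\rho \) and \(z\): for \(\rho \) only slightly above \(1/z\) the combined uncertainty is close to \(\sigma _{1}\), whereas a strongly reduced correlation yields a visibly smaller combined uncertainty.

```python
import numpy as np

def sigx_over_sig1(rho, z):
    """Standard two-estimate BLUE result for the ratio of the combined
    uncertainty to the smaller input uncertainty (z = sigma_2/sigma_1 >= 1)."""
    return z * np.sqrt((1.0 - rho**2) / (1.0 + z**2 - 2.0 * rho * z))

z = 1.5                    # hypothetical ratio of total uncertainties
rho_initial = 0.70         # slightly above 1/z = 0.67, i.e. beta < 0
rho_reduced = 0.45         # hypothetical outcome of the reduction

print(sigx_over_sig1(rho_initial, z))   # close to 1
print(sigx_over_sig1(rho_reduced, z))   # smaller than the initial value
```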

7 Methods to maximise the variance

On top of the reduced correlations discussed in the previous section, an even more rigorous way to avoid estimates with negative BLUE weights is the choice to simply exclude those estimates from the combination. A recipe of how to proceed if this is desired is given in Ref. [18]. However, this ad hoc choice does not respect the consequence of the conditional probability and consequently is disfavoured.

In addition to the above, a number of methods have been suggested to arrive at a conservative combined estimate, i.e. to maximise the variance of the combined result \(\sigma _{x}^{2}\). All of them work by reducing the correlation in an artificial, but controlled way. Given Fig. 2, they are only active for \(\rho >1/z\), which means \(\beta <0\). The three methods suggested in Ref. [18] multiply the initially assigned correlations per source \(k\) for any pair \(ij\) of estimates by factors \(f_{ijk}\); a minimal numerical sketch of method (i) is given after the list. These factors are either chosen:

(i) globally, \(f_{ijk} = f\) for all \(i,j,k\),

(ii) per uncertainty source, \(f_{ijk} = f_{k} \) for all \(i,j\),

(iii) per pair of estimates, \(f_{ijk} = f_{ij} \) for all \(k\).
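
As referred to above, the following minimal sketch illustrates method (i) for two estimates. All per-source uncertainties and correlations are hypothetical; the scan simply picks the global factor \(f\) that maximises the variance of the combined result.

```python
import numpy as np

# hypothetical inputs: per-source uncertainties and correlations of two estimates
sig1_k = np.array([0.5, 1.0])       # estimate 1
sig2_k = np.array([0.7, 1.6])       # estimate 2
rho_k  = np.array([1.0, 0.8])       # initially assigned correlations per source

def combined_variance(f):
    """Variance of the two-estimate BLUE combination when all assigned
    correlations are scaled by a single global factor f (method (i))."""
    cov = np.sum(f * rho_k * sig1_k * sig2_k)
    V = np.array([[np.sum(sig1_k**2), cov],
                  [cov, np.sum(sig2_k**2)]])
    w = np.linalg.solve(V, np.ones(2))        # V^{-1} u
    return 1.0 / w.sum()                      # sigma_x^2 = 1 / (u^T V^{-1} u)

# scan f in [0, 1] and pick the value that maximises the combined variance
fs = np.linspace(0.0, 1.0, 1001)
f_best = max(fs, key=combined_variance)
print(f_best, combined_variance(f_best))
```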

None of these methods is flexible enough to incorporate the different knowledge on the correlations that will be available for different pairs of estimates and different sources of uncertainty. In addition, they do not obey some of the properties of the estimates outlined in Sect. 4. More specifically, by varying the correlation for all sources simultaneously, method (i) does not obey property (3) of the estimates, namely that all sources of uncertainty are assumed to be uncorrelated. It also does not take into account that the knowledge on the correlation may differ from source to source. As an example, for the combination of \(m_{\mathrm {top}}\) in Ref. [11], the uncertainties related to colour reconnection and to the background determined from Monte Carlo are both assumed to be fully correlated between all estimates. However, there is no physics reason to believe that the two sources of uncertainty are correlated. Consequently, if the correlation assumption is changed, e.g. for colour reconnection by using \(f=0.9\), there is no reason to simultaneously apply the same factor to the uncertainty from the background determination, which would however be enforced when using method (i). In contrast, if there are physics arguments to vary two sources of uncertainty simultaneously, i.e. there are reasons to believe that two sources (\(k, k^\prime \)) are correlated, it is preferred to reconsider the separation of the uncertainty sources, see Sect. 4.

Method (ii) does not take into account that the uncertainties on \(\rho _{ijk}\) are likely better known for pairs of estimates from the same experiment than for pairs of estimates from different experiments, or even obtained at different colliders. Given this, although a correlated variation for some pairs \((i, j)\) can be well justified, applying it to all pairs is not flexible enough.

Method (iii), although calculated per pair, in reality corresponds to specific \(\rho _{ijk}\) values. Since the variation is done per pair, e.g. \((i, j)\) or \((i^\prime ,j)\), where \(i, i^\prime \) are assumed to be estimates from the same experiment and \(j\) from another experiment, this very likely leads to very different assumptions on the correlation for the source \(k\) across experiments. Again, the available knowledge on this cannot be respected by such an automated procedure.

8 A hypothetical example

The impact of the reduced correlations and the three ways to maximise the variance of the combined result are discussed on the basis of a hypothetical example, motivated by typical estimates occurring in top quark mass measurements. For simplicity, only two estimates and three uncertainty sources are used. The extension to more estimates and uncertainty sources is straightforward.

The two estimates are given in Table 2. They are analysed for four different scenarios in which the assumption on either the correlation, or the size of the uncertainty for one of the sources, is changed one at a time. Using Eq. 19 and calculating \(P(\chi ^2, 1)\), the compatibility of the estimates is assessed for the BLUE method and for all scenarios.Footnote 7 The values obtained with this procedure are \(P(\chi ^2, 1) =0.33, 0.45, 0.18, 0.18\), for scenarios \(\mathcal {A}\), \(\mathcal {B}\), \(\mathcal {C}\), \(\mathcal {D}\).
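
For illustration, a short sketch of such a pairwise compatibility test is given below. It assumes that Eq. 19 is the usual pairwise \(\chi ^2\) built from the difference of the two estimates and its variance; the numerical inputs are purely illustrative and are not those of Table 2.

```python
from scipy.stats import chi2

def pair_compatibility(x1, x2, sig1, sig2, rho):
    """Pairwise compatibility test, assuming Eq. 19 is the usual
    chi^2 = (x1 - x2)^2 / (sig1^2 + sig2^2 - 2 rho sig1 sig2), one dof."""
    chisq = (x1 - x2)**2 / (sig1**2 + sig2**2 - 2.0 * rho * sig1 * sig2)
    return chisq, chi2.sf(chisq, df=1)        # chi^2 and P(chi^2, 1)

# purely illustrative numbers, not the inputs of Table 2
print(pair_compatibility(172.5, 173.5, 0.9, 1.4, 0.5))
```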

Table 2 Combinations of two correlated estimates using the BLUE method for different scenarios \(\mathcal {A}-\mathcal {D} \), and using the different methods described in the text

Given the assigned correlations per source and \(z=1.54\), scenario \(\mathcal {A}\) corresponds to a situation where \(\rho =0.78>1/z=0.65\) and consequently \(\beta =-0.22<0\). This situation is visualised in Fig. 5, where the eight sub-figures correspond to Figs. 2, 3, and the black points to the pair of estimates investigated. Consequently, in Fig. 5e, the point is to the right of the peak which sits at \(\beta =0\).

Fig. 5

Results for the Blue combination using the hypothetical example from Table 2, scenario \(\mathcal {A}\). The sub-figures (a)–(h) correspond to Figs. 2, 3 for the pair of estimates investigated. The black points represent the actual values of the parameter shown at the given values of \(\rho \) and \(z\). In a also the estimates \(x_\mathrm {1}\) and \(x_\mathrm {2}\), as well as the combined value \(x\), together with their uncertainties, are listed. In each sub-figure three curves are shown in which, for parameters shown as a function of \(\rho \) (or \(z\)), the value of \(z\) (or \(\rho \)) is varied. The curves corresponding to the minimum/central/maximum value of this variation are shown in blue/black/red, and the three values used for \(z\) and \(\rho \) are given in b and d, respectively. For the derivatives of \(\beta \) and \(\sigma _{x}\)/\(\sigma _{1}\) with respect to \(\rho \) and \(z\), for each sub-figure the range of observed parameter values is given. This range is obtained for the three curves shown, while keeping the respective value of the other parameter. As an example, in b the range in \(\partial \,\beta /\partial \,\rho \) at \(\rho =0.78\) observed when changing \(z\) from 1.39 to 1.69 is quoted. Finally, for \(\beta \) and \(\sigma _{x}\)/\(\sigma _{1}\) their full range is quoted in a and e. This range is obtained using all nine possible pairs of the \(\rho \) and \(z\) values

In Fig. 5, the sensitivity of the combination to variations of \(\rho \) and \(z\) is visualised by the three curves per sub-figure. For a given functional dependence of one of the functions, e.g. \(\beta (\rho )\), they show the sensitivity to the respective other parameter, here \(z\), using the actual value (black dashed line) and two changed values (coloured full lines). For the pair of changed values, either \(z\) is multiplied by \(0.9\) (blue line) or \(1.1\) (red line), Fig. 5a, b, e, f, or \(\rho \) is changed by \(\pm 0.1\), Fig. 5c, d, g, h. This indicates the impact of \(10\,\%\) uncertainties on their respective initial values. The figure shows that the dependence of \(\beta \) and \(\sigma _{x}\)/\(\sigma _{1}\) and their derivatives on one of the parameters strongly depends on the value of the respective other parameter. As an example, the sensitivity to \(\rho \) of the derivative of \(\beta \) with respect to \(z\), visualised by the spread of the three lines in Fig. 5d, varies strongly with \(z\). For the chosen example it is smallest close to the black point, i.e. to the actual pair of values of \(\rho \) and \(z\). For two estimates with \(z=1.1\) the sensitivity to \(\rho \) would be much larger. In contrast, for the derivative of \(\sigma _{x}\)/\(\sigma _{1}\) with respect to \(z\), Fig. 5h, the chosen point in phase space lies close to the region with the largest spread of the curves, signalling a large \(\rho \) dependence. The quoted derivatives of \(\sigma _{x}\)/\(\sigma _{1}\) in Fig. 5f, h show that a \(10\,\%\) change in \(\rho \) has a much larger impact on the uncertainty of the combined value than the corresponding change in \(z\), which means that for this particular case it is more important to correctly determine \(\rho \) than \(z\). The values of all parameters for all scenarios investigated are listed in Table 2.
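
The derivatives discussed here can be evaluated numerically. The sketch below uses the standard two-estimate BLUE expressions for \(\beta \) and \(\sigma _{x}\)/\(\sigma _{1}\) (assumed to correspond to the quantities shown in Fig. 5, under the usual conventions) and simple finite differences at the scenario-\(\mathcal {A}\) point \(\rho =0.78\), \(z=1.54\); the exact inputs of Table 2 are not reproduced, so the numbers are only indicative.

```python
import numpy as np

def beta(rho, z):
    """Weight of the less precise estimate in the two-estimate BLUE
    combination (standard expression, z = sigma_2/sigma_1)."""
    return (1.0 - rho * z) / (1.0 + z**2 - 2.0 * rho * z)

def sigx_over_sig1(rho, z):
    """Ratio of the combined uncertainty to the more precise one."""
    return z * np.sqrt((1.0 - rho**2) / (1.0 + z**2 - 2.0 * rho * z))

def ddrho(func, rho, z, eps=1.0e-6):
    return (func(rho + eps, z) - func(rho - eps, z)) / (2.0 * eps)

def ddz(func, rho, z, eps=1.0e-6):
    return (func(rho, z + eps) - func(rho, z - eps)) / (2.0 * eps)

rho, z = 0.78, 1.54    # the values quoted for scenario A
print(ddrho(beta, rho, z), ddz(beta, rho, z))
print(ddrho(sigx_over_sig1, rho, z), ddz(sigx_over_sig1, rho, z))
```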

Given the initial correlation assumptions, the reduced correlations act on both systematic uncertainties and yield \(\rho = 0.44\). As a result of this strong reduction of the correlation, the resulting value of \(\sigma _{x}\)/\(\sigma _{1}\) is lower than for the initially assigned correlations, see Table 2. Because there is a non-zero uncorrelated component to the uncertainty of both estimates, the reduced correlations cannot switch off \(x_\mathrm {2}\) completely, as they would otherwise do, see Eq. 27.

For this example, the three methods for maximising the variance, at the quoted precision, all give the same combined result, which is achieved for \(f = 0.83\), \(f_{k} = 0.34, 1\) (or \(f_{k} = 1, 0.77\)) for \(k=1, 2\), and finally, \(f_{ij} = f = 0.83\), respectively. Consequently, with these algorithms, the second estimate is switched off in different ways, i.e. they all give \(\beta =0\) and \(x=x_\mathrm {1} \), as would be the case if estimates with negative weights were ignored.

For scenario \(\mathcal {B}\) the systematic uncertainty \(k=1\) is assumed to be uncorrelated rather than fully correlated. By this assumption the correlation is reduced such that the point moves to the left of the peak in Fig. 5e and the BLUE combination results in a positive value for \(\beta \). Given that \(\beta \) is very close to zero, the estimate \(x_\mathrm {2}\) would improve \(x_\mathrm {1}\) by less than \(1\,\%\). For the reduced correlations, which now only act on the source \(k=2\), the correlation is further decreased, such that the predicted improvement in precision of \(6\,\%\) is even larger than for scenario \(\mathcal {A}\). In contrast, since the maximisation of the variance is only attempted to the right of the peak in Fig. 5e, none of the algorithms (i)–(iii) proposes any change.

The scenarios \(\mathcal {C}\) and \(\mathcal {D}\) implement the situation in which, for \(\rho _{12k} = 1\) for \(k=2\), the difference has been caused by the use of different procedures. Either estimate \(x_\mathrm {1}\) has a 'too crude' procedure assigned, such that not all features of this source are accounted for and the quoted uncertainty is underestimated, scenario \(\mathcal {C}\), or for estimate \(x_\mathrm {2}\) a 'too generous' variation was performed, such that the quoted uncertainty is overestimated, scenario \(\mathcal {D}\). In these scenarios the BLUE combinations give significant and different improvements. This means it is worth investigating whether the difference in uncertainty is caused by different sensitivities of the estimators used, or by different procedures followed, and, in the latter case, to harmonise those if possible.

For the reduced correlations, given the assigned identical uncertainties, the source \(k=2\) is not altered, see Sect. 6. Because, in addition, the uncertainties for \(k=1\) are much smaller than those for \(k=2\), the method is almost switched off, i.e. \(\rho _\mathrm {red} \approx \rho \). It is worth noticing that the results of the BLUE method for scenarios \(\mathcal {C}\) and \(\mathcal {D}\) differ considerably from the result of the reduced correlations for scenario \(\mathcal {A}\), exemplifying the different sensitivities to \(\rho \) and \(z\). For the BLUE method, and at the quoted precision, the values of \(\beta \) in \(\mathcal {C}\) and \(\mathcal {D}\) are identical, but differ strongly from the one for scenario \(\mathcal {A}\), as well as from the value obtained by applying the reduced correlations to scenario \(\mathcal {A}\).

Again, since the maximisation of the variance is only attempted for \(\beta <0\), also for scenarios \(\mathcal {C}\) and \(\mathcal {D}\) all algorithms (i)–(iii) are inactive.

9 How to decide on and perform a combination

The proposed procedure is described for the situation of \(m\) estimates of the same observable and fully respects the properties of the estimates given in Sect. 4. The extension to more than one observable is straightforward. As an example, the procedure is applied to the input of the latest combination of \(m_{\mathrm {top}}\) measurements performed at the Tevatron [16]. Based on the initial input and the default assumptions on the correlations, the following questions are addressed:

(I) Are the estimates compatible?

(II) Which estimates are worth combining?

(III) What are the consequences of varying \(\rho _{ijk}\)?

(IV) What are the consequences of varying \(z_{ijk}\)?

Clearly, the outcome of the combination depends on the initial correlation assignments in Ref. [16] that are kept to obtain the central combined result.

For answering (I), the compatibility is addressed by the \(\chi ^2\) defined in Eq. 19, and calculating \(P(\chi ^2, 1)\). Incompatible sets of estimates should not be combined, instead the reason for this should be searched for. From the 66 \(\chi ^2\) values of the pairwise compatibility tests for the twelve estimates from Ref. [16], 18 are above one, of which one (one) is above two (three), the smallest value being about \(P(\chi ^2, 1) =8\,\%\), resulting in a reasonable distribution of \(\chi ^2\) values. In addition, the global \(\chi ^2\) of the combination, see Eq. 21, amounts to \(\chi ^2 =8.5\) for eleven degrees of freedom yielding a \(\chi ^2\) probability of 0.67.
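
The quoted probability of the global \(\chi ^2\) can be reproduced with any standard implementation of the \(\chi ^2\) distribution; a short sketch using scipy is:

```python
from scipy.stats import chi2

# global chi^2 of the combination of the twelve estimates: 8.5 for 11 dof
print(chi2.sf(8.5, df=11))    # approximately 0.67, as quoted in the text
```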

For answering (II), starting from the most precise estimate \(i\) it is proposed to rank the estimates \(j\ne i\) by their importance. Here, the importance of estimate \(j\) is defined as the potential improvement of the most precise estimate \(i\) by including the estimate \(j\), irrespective of the existence of any other estimate, calculated using Eq. 11 and identifying \(12 = ij\). The most precise estimate is chosen since it is special in the sense that, if no combination is performed, it represents the best knowledge of the observable, and the aim of any combination is to improve this information. The proposed procedure takes into account the correlation and the relative uncertainty of the two estimates, but is deliberately independent of the existence of all other estimates. This suggestion is motivated by the aim to only include the estimate \(j\) if it on its own significantly improves the most precise estimate of \(x_{T}\), irrespective of the information contained in other estimates. By construction, this definition is a subjective and not unique choice, and other measures of importance could be taken.

After producing this list, a combination is performed by using the most precise estimate and adding one additional estimate at a time following that list. Finally, setting a threshold for the minimum relative improvement required, it can be decided which estimates to use, and for which it is not worth performing the difficult task of finding the appropriate variations in \(\rho _{ijk}\) and \(z_{ijk}\) for assessing the stability of the combined result. If a selection of estimates is not attempted and all measurements are retained, the definition of importance is irrelevant.
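
A minimal sketch of the proposed ranking is given below. It uses the standard two-estimate BLUE expression for \(\sigma _{x}\)/\(\sigma _{1}\) (assumed to correspond to Eq. 11) and hypothetical \((\rho , z)\) values for each additional estimate with respect to the most precise one; the subsequent steps of successive combinations and of applying the improvement threshold are not reproduced here.

```python
import numpy as np

def sigx_over_sig1(rho, z):
    """Standard two-estimate BLUE expression for the uncertainty ratio."""
    return z * np.sqrt((1.0 - rho**2) / (1.0 + z**2 - 2.0 * rho * z))

# hypothetical (rho, z) of each additional estimate j with respect to the
# most precise estimate i, where z = sigma_j / sigma_i
pairs = {"estimate_A": (0.30, 1.2),
         "estimate_B": (0.55, 1.8),
         "estimate_C": (0.15, 3.0)}

# importance = relative improvement from the hypothetical pairwise combination
importance = {j: 1.0 - sigx_over_sig1(rho, z) for j, (rho, z) in pairs.items()}
for j in sorted(importance, key=importance.get, reverse=True):
    print(j, round(importance[j], 3))
```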

The details of the hypothetical pairwise combinations are listed in Table 3. Looking at the parameters of the combination it is apparent that the importance of the exact knowledge of \(\rho \) and \(z\) strongly depends on the pair of estimates under consideration. As an example, the derivatives of \(\sigma _{x}\)/\(\sigma _{1}\) with respect to \(\rho \) vary by about a factor of 10–20 in absolute size. In addition, they have different signs, such that for some estimates the uncertainty on the combined result is reduced when reducing the correlation, while for others it is instead increased. Using this information it becomes apparent for which estimate the proper assignment of the correlation is most important. The derivatives nicely show the sensitivity around the chosen default assumption. For example, for the estimates CDF(I) \(l+j\) and CDF(II) Met, the sensitivity of the combination with the most precise estimate to \(\rho \) is almost twice as large for the former as for the latter, and it is even larger than the one of the most important additional estimate D0(II) \(l+j\). In addition, increasing \(\rho \) for the estimate CDF(I) \(l+j\) would decrease the uncertainty of the combined result, whereas for the estimate CDF(II) Met it would instead be increased.

Table 3 The list of estimates of \(m_{\mathrm {top}}\) from Ref. [16]. The most precise estimate is CDF(II) \(l+j\). The other estimates are listed according to their importance, defined as the achieved improvement of the combined uncertainty with respect to the most precise estimate, obtained by performing pairwise combinations of each estimate with the most precise one. The correlation \(\rho \) and relative uncertainties \(z\) are given together with the two main parameters of the combination, \(\beta \) and \(\sigma _{x}\)/\(\sigma _{1}\) and their derivatives with respect to \(\rho \) and \(z\). Entries quoted as \(0.00\) mean that the absolute value of the actual number was below \(0.005\)

The result of applying the proposed procedure to the input of the latest combination of \(m_{\mathrm {top}}\) measured at the Tevatron [16] is shown in Fig. 6. The first line in Fig. 6 shows the result of the most precise estimate. All following lines report the results of successive combinations after adding the estimate listed to the previously accumulated list. If, as an example, each additionally included estimate is required to improve the total uncertainty by at least \(1\,\%\), only the first five estimates should be combined.

Fig. 6

Results of successive combinations according to importance of the estimates of \(m_{\mathrm {top}}\) from [16]. The first line shows the result of the most precise estimate. All following lines report the combined result after adding the estimate listed to the previously accumulated list. Combinations below the line with its \(m_{\mathrm {top}}\) value given in red never improve the total uncertainty by more than \(1\,\%\)

If the estimates were sorted according to their absolute BLUE weights for the combination based on all estimates, which takes into account the correlations of all estimates (and the fact that the uncertainty is reduced on both sides of \(\sigma _{x}/\sigma _{1} = 1, \beta =0\)), the same five estimates would have been chosen, i.e. the combined result is the same. If instead the estimates were sorted by their inverse variance 1/\(\sigma _{i}^{2}\), which deliberately ignores all correlations and weights the estimates as if \(\rho _{ij} = 0\), a slightly different list would be used. In the latter case, as can be seen from the values of \(z\) reported in Table 3, the estimate CDF(I) \(l+j\) would not be used, but D0(II) dil would be used instead, despite the fact that looking at \(\sigma _{x}\)/\(\sigma _{1}\) its impact is much smaller, demonstrating the large importance of the correlation.
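
For reference, the BLUE weights of a full combination of \(m\) estimates follow from the standard expression \(w = V^{-1}u/(u^{T}V^{-1}u)\), with \(u\) a vector of ones and \(V\) the total covariance matrix. A minimal sketch with a hypothetical covariance matrix (not the input of Ref. [16]) is:

```python
import numpy as np

def blue_weights(V):
    """BLUE weights for m estimates of one observable:
    w = V^{-1} u / (u^T V^{-1} u), with u a vector of ones;
    the combined variance is 1 / (u^T V^{-1} u)."""
    Vinv_u = np.linalg.solve(V, np.ones(V.shape[0]))
    return Vinv_u / Vinv_u.sum(), 1.0 / Vinv_u.sum()

# hypothetical 3x3 covariance matrix, not the input of Ref. [16]
V = np.array([[1.00, 0.30, 0.10],
              [0.30, 2.25, 0.20],
              [0.10, 0.20, 4.00]])
weights, var_x = blue_weights(V)
print(weights, np.sqrt(var_x))
```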

When using the proposed method, the corresponding result for \(m_{\mathrm {top}}\) is shown in red. The BLUE weights of the five estimates in the order they appear in Fig. 6 are: \(0.61, 0.23, -0.06, 0.12, 0.10\). No combination below this line improves the total uncertainty by more than \(1\,\%\). The situation for the pair containing the most precise estimate and the one with the negative BLUE weight is shown in Fig. 7. For all sub-figures and all coordinate axes, Figs. 5 and 7 are drawn using identical ranges. Compared to Fig. 5, the behaviour around the point representing this pair of estimates is very flat, except for \(\sigma _{x}\)/\(\sigma _{1}\) and its derivative with respect to \(\rho \), Fig. 7e, f.

Fig. 7

Same as Fig. 5 but for the pair containing the most precise estimate, and the one with the negative BLUE weight, for the \(m_{\mathrm {top}}\) combination using input from [16], see Table 3

After performing the selection, the combination of all selected estimates is performed to determine the central value and the breakdown of uncertainties. The compatibility of the pairs of selected estimates is improved, only two \(\chi ^2\) values exceed one, and the smallest \(P(\chi ^2, 1)\) value is about 19 \(\%\). For the selected estimates the total \(\chi ^2\) amounts to \(\chi ^2 =2.5\) for four degrees of freedom, yielding a \(\chi ^2\) probability of 0.65, very similar to the one for the full set of estimates. By construction, the result of the combination is very close to the one based on all estimates. Only little information is lost, but it is much clearer which estimates contain the information, and the investigation of the stability of the result is simpler.

As said above, the values of \(\rho _{12k}\) and \(z_{12k}\) are only known with some uncertainties. The task is to evaluate the consequences of this for the combined value. Looking at the figures of pairwise combinations like Fig. 7, or the values listed in Table 3, the most critical pairs and parameters can easily be identified. To assess the stability of the combined result, individual uncertainty sources have to be investigated for possible variations of \(\rho _{ijk}\) and \(z_{ijk}\). This should be done in view of the details of the procedures applied, and it should be decided whether a variation in \(\rho _{ijk}\) or \(z_{ijk}\) is the appropriate choice.

To investigate (III) independent variations per source \(k\) are performed in which \(\rho _{ijk}\) is varied within a range determined by analysing the procedures used for the estimates. This is performed by multiplying the initially assigned correlation by a factor \(r\), using the range \(r = 1 \rightarrow r_\mathrm {min} \), and investigating the difference in the uncertainty of the combined result. If found appropriate, the observed differences in the combined values could be added quadratically to the uncertainty of the combined result to account for the uncertainties in the assigned correlations.
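
A minimal sketch of this variation for two estimates and two correlated sources is given below; all numerical inputs are hypothetical, and the shifts of the combined value obtained when scaling one source at a time are added in quadrature as described.

```python
import numpy as np

def blue_combine(x, V):
    """Standard BLUE combination of m estimates of one observable."""
    Vinv_u = np.linalg.solve(V, np.ones(len(x)))
    w = Vinv_u / Vinv_u.sum()
    return w @ x, 1.0 / Vinv_u.sum()

# hypothetical inputs: two estimates and two correlated uncertainty sources
x      = np.array([172.5, 173.2])
sig1_k = np.array([0.6, 0.9])          # per-source uncertainties, estimate 1
sig2_k = np.array([0.8, 1.1])          # per-source uncertainties, estimate 2
rho_k  = np.array([1.0, 0.5])          # initially assigned correlations

def covariance(scale):
    """Total covariance; scale[k] multiplies the assigned correlation of source k."""
    cov = np.sum(scale * rho_k * sig1_k * sig2_k)
    return np.array([[np.sum(sig1_k**2), cov],
                     [cov, np.sum(sig2_k**2)]])

x_default, _ = blue_combine(x, covariance(np.ones(2)))

# vary one source at a time over r = 1 -> r_min and add the shifts in quadrature
r_min, shifts = 0.0, []
for k in range(len(rho_k)):
    scale = np.ones(len(rho_k))
    scale[k] = r_min
    x_varied, _ = blue_combine(x, covariance(scale))
    shifts.append(x_varied - x_default)
print(np.sqrt(np.sum(np.square(shifts))))
```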

Since the detailed information on reasonable variations of the initially assigned correlations is only available to the experiments that actually determined the estimates, for the example presented, the full range of \(r = 1 \rightarrow 0\) has been used for all sources that remain correlated after the selection of estimates, which likely is an overestimation of the effect. For this example all variations lead to an increase of the combined value \(x\). The square root of the quadratic sum of the differences between the combined value of the default assignments and the ones obtained with the changed assumption on the correlation for all sources \(k\) amounts to \(0.26\) \(\mathrm {GeV}\). This number is dominated by a single source that contributes with \(0.23\) \(\mathrm {GeV}\). Given this, a simultaneous variation of the correlation assumption of all sources would result in an only slightly larger value of \(0.29\) \(\mathrm {GeV}\). However, this evaluation is disfavoured, since it violates property (3) of the estimates. In addition, the individual variations also reveal which sources are the important ones for the stability.

Given these variations, in principle the list of importance for the estimates may differ from the initial one. If, as is usually done, the above variations are only used as stability checks and no additional uncertainty is assigned, this is of very little concern. This is the case for many combinations, including the example presented, see e.g. Refs. [11, 12, 16]. If an additional uncertainty is assigned, one may want to perform the selection iteratively or refrain from selecting estimates. Again, for the example presented, this is of minor numerical importance. When using the recommended individual variations, the first six estimates of the list of importance are always the same for the full range of \(r = 1 \rightarrow 0\). Only for two sources, and for correlations below \(0.6\) and \(0.4\) respectively, is one of the first five estimates exchanged with the sixth one.

To investigate (IV) an indicative procedure is to assume identical values \(\sigma _{ik} =\sigma _{jk} \) for pairs of estimates and to repeat the combination. If this test results in large variations, it is advisable to understand whether the difference of \(\sigma _{ik}\) and \(\sigma _{jk}\) is due to different sensitivities of the estimators, or caused by different procedures followed in determining the uncertainties. In the latter case one should try to harmonise the procedures. For a numerical example of such a situation see Table 2. When investigating the procedures in detail, smaller variations of \(\sigma _{ik} \) will likely turn out to be appropriate. Since this information is only available to the experiments that actually determined the estimates, this has not been investigated here for the example presented. Depending on the details of the situation this can easily be more important than variations of \(\rho \), as can be seen from the example of the hadronisation uncertainty for the LHC \(m_{\mathrm {top}}\) combination [11].
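
The following sketch illustrates this test for two estimates with a hypothetical breakdown into one uncorrelated and two fully correlated sources: the combination is repeated with the per-source uncertainty of one correlated source set to be identical for both estimates (here, to the smaller of the two values, one possible convention).

```python
import numpy as np

def blue_combine(x, V):
    """Standard BLUE combination of m estimates of one observable."""
    Vinv_u = np.linalg.solve(V, np.ones(len(x)))
    w = Vinv_u / Vinv_u.sum()
    return w @ x, 1.0 / Vinv_u.sum()

# hypothetical breakdown: one uncorrelated (statistical) and two fully
# correlated (systematic) sources for each of the two estimates
x      = np.array([172.5, 173.2])
sig1_k = np.array([0.7, 0.6, 0.9])
sig2_k = np.array([0.9, 0.8, 1.4])
rho_k  = np.array([0.0, 1.0, 1.0])

def covariance(s1, s2):
    cov = np.sum(rho_k * s1 * s2)
    return np.array([[np.sum(s1**2), cov], [cov, np.sum(s2**2)]])

# default combination
print(blue_combine(x, covariance(sig1_k, sig2_k)))

# test: equalise the uncertainties of the last correlated source
sig2_test = sig2_k.copy()
sig2_test[2] = sig1_k[2]
print(blue_combine(x, covariance(sig1_k, sig2_test)))
```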

10 Summary and conclusions

In this paper the combination of correlated estimates has been reviewed using the Best Linear Unbiased Estimate (BLUE) method, mainly concentrating on the special case of two estimates of the same observable.

It has been shown that the underlying conditional probability inevitably leads to the fact that for positively correlated estimators, for a given pair of estimates to be combined, in most of the cases the true value is not within the interval spanned by the estimates. This fact should be respected by any combination method. All combination methods deliberately constructed to force the combined value to always lie within the interval spanned by the estimates violate this consequence of the conditional probability and are wrong by construction. These methods lead to worse results than the BLUE method, which achieves the predicted behaviour by means of negative weights; these occur whenever they reduce the variance of the unbiased combined result. This situation is realised if the mean of the conditional probability of the less precise estimator is further away from the true value than the more precise estimate. This is the case whenever the correlation of the estimates \(\rho \) is larger than \(1/z\), the ratio of the smaller to the larger uncertainty.

For any pair of estimates, their combination is fully determined by the values of \(\rho \) and \(z\), which fix the main parameters of the combination, namely the weight of the less precise estimate \(\beta \), and the ratio \(\sigma _{x}\)/\(\sigma _{1}\) of the uncertainty of the combined result to that of the more precise estimate. However, \(\rho \) and \(z\) themselves are typically only known with some uncertainty. Therefore, for visualising the sensitivity of the central result to these uncertainties, the derivatives of \(\beta \) and \(\sigma _{x}\)/\(\sigma _{1}\) with respect to \(\rho \) and \(z\) were derived. The derivatives can be used to identify the estimates and uncertainty sources for which the knowledge on \(\rho \) and \(z\) is most critical.

The differences observed when using either relative or absolute uncertainties in the BLUE combination have been investigated, including a simulation of Peelle’s Pertinent Puzzle. It has been found that the apparent difference observed for the original formulation of the puzzle, i.e. for a single pair of estimates, is mainly a reflection of the unlikeliness of this pair of estimates. When instead combining numerous pairs of estimates based on a number of hypothetical underlying probability distributions that cover the full range of combined results observed for the original version of the puzzle, and both uncertainty models, the differences of the two methods are insignificant. The same holds true for a number of specific examples of publicly available combinations.

A critical assessment of methods proposed to deal with the uncertainty on the correlations has been given. Especially, it has been argued that reduced correlations mix \(\rho \) and \(z\) and act in an unphysical way. Other methods constructed to maximise the variance of the combined result are too general, do not respect all properties of the estimates, and do not reflect the different knowledge on the correlations that is likely available for estimates of the same experiment, or those obtained at the same collider, compared to those from different experiments and/or colliders. For all other methods discussed, the uncertainty in the knowledge of the relative size of the uncertainties per source \(k\) is ignored throughout; however, this can be numerically much more important.

A detailed proposal for a procedure to combine a number of estimates and to evaluate the stability of the result has been made. It has been argued that the decision on including a given estimate into the combination should be based on its potential improvement with respect to the most precise estimate, i.e. on the relative reduction of the uncertainty of the combined value with respect to the most precise one for hypothetical pairwise combinations, irrespective of the existence of other estimates. The most precise estimate is chosen since it is special in the sense that, if no combination is performed, it represents the best knowledge of the observable, and the aim of any combination is to improve this information. It is proposed to only include other estimates if they significantly improve on the most precise one. By construction, this definition is a subjective and not unique choice, and other measures of importance could be taken, or no selection could be performed.

In any case, the stability of the result should be assessed source by source in view of the uncertainty on the knowledge on \(\rho _{ijk}\) and \(z_{ijk}\), while respecting the properties of the estimates. Given the different dependence of the two parameters \(\beta \) and \(\sigma _{x}\)/\(\sigma _{1}\) of the pairwise combination on \(\rho \) and \(z\), it is advisable to assess the impact on a case by case basis performing appropriate changes in \(\rho _{ijk}\) or \(z_{ijk}\). A freely available software package to perform these investigations has been written.

Finally, all ways to assess the uncertainty on the combined result by variations of the \(\rho _{ijk}\) and \(z_{ijk}\) are only indicative of possible sensitivities. If large sensitivities occur, a better understanding and possibly a harmonisation of the input, as well as ways to calculate the correlations rather than postulate them, as is frequently done, are much preferred.