Modeling theoretical uncertainties in phenomenological analyses for particle physics
Abstract
The determination of the fundamental parameters of the Standard Model (and its extensions) is often limited by the presence of statistical and theoretical uncertainties. We present several models for the latter uncertainties (random, nuisance, external) in the frequentist framework, and we derive the corresponding p values. In the case of the nuisance approach where theoretical uncertainties are modeled as biases, we highlight the important, but arbitrary, issue of the range of variation chosen for the bias parameters. We introduce the concept of adaptive p value, which is obtained by adjusting the range of variation for the bias according to the significance considered, and which allows us to tackle metrology and exclusion tests with a single, well-defined and unified tool that exhibits interesting frequentist properties. We discuss how the determination of fundamental parameters is impacted by the model chosen for theoretical uncertainties, illustrating several issues with examples from quark flavor physics.
1 Introduction
In particle physics, an important part of the data analysis is devoted to the interpretation of the data with respect to the Standard Model (SM) or some of its extensions, with the aim of comparing different alternative models or determining the fundamental parameters of a given underlying theory [1, 2, 3]. In this activity, the role played by uncertainties is essential, since they limit the accuracy with which these parameters can be determined, and they can prevent one from reaching a definite conclusion when comparing several alternative models. In some cases, these uncertainties are of statistical origin: they are related to the intrinsic variability of the phenomena observed, they decrease as the sample size increases and they can be modeled using random variables. A large part of the experimental uncertainties belongs to this first category. However, another kind of uncertainty occurs when one wants to describe inherent limitations of the analysis process, for instance, uncertainties in the calibration or limits of the models used in the analysis. These uncertainties are very often encountered in theoretical computations, for instance when assessing the size of higher orders in perturbation theory or the validity of extrapolation formulas. Such uncertainties are often called “systematics”, but they should be distinguished from less dangerous sources of systematic uncertainties, usually of experimental origin, that roughly scale with the size of the statistical sample and may be reasonably modeled by random variables [4]. In the following we will thus call them “theoretical” uncertainties: by construction, they lack both an unambiguous definition (leading to various recipes to determine these uncertainties) and a clear interpretation (beyond the fact that they are not of statistical origin). It is thus a complicated issue to incorporate their effect properly, even in simple situations often encountered in particle physics [5, 6, 7].^{1}
The relative importance of statistical and theoretical uncertainties might be different depending on the problem considered, and the progress made both by experimentalists and theorists. For instance, statistical uncertainties are the main issue in the analysis of electroweak precision observables [11, 12]. On the other hand, in the field of quark flavor physics, theoretical uncertainties play a very important role. Thanks to the B-factories and LHCb, many hadronic processes have been very accurately measured [13, 14], which can provide stringent constraints on the Cabibbo–Kobayashi–Maskawa matrix (in the Standard Model) [15, 16, 17], and on the scale and structure of New Physics (in SM extensions) [18, 19, 20, 21]. However, the translation between hadronic processes and quark-level transitions requires information on hadronization from strong interaction, encoded in decay constants, form factors, bag parameters... The latter are determined through lattice QCD simulations. The remarkable progress in computing power and in algorithms over the last 20 years has led to a decrease of statistical uncertainties and a dominance of purely theoretical uncertainties (chiral and heavy-quark extrapolations, scale chosen to set the lattice spacing, finite-volume effects, continuum limit...). As an illustration, the determination of the Wolfenstein parameters of the CKM matrix involves many constraints which are now limited by theoretical uncertainties (neutral-meson mixing, leptonic and semileptonic decays, ...) [22].
The purpose of this note is to discuss theoretical uncertainties in more detail in the context of particle physics phenomenology, comparing different models not only from a statistical point of view, but also in relation with the problems encountered in phenomenological analyses where they play a significant role. In Sect. 2, we summarize fundamental notions of statistics used in particle physics, in particular p values and test statistics. In Sect. 3, we list properties that we seek in a good approach for theoretical uncertainties. In Sect. 4, we propose several approaches and in Sect. 5, we compare their properties in the simplest one-dimensional case. In Sect. 6, we consider multidimensional cases (propagation of theoretical uncertainties, average of several measurements, fits and pulls), which we illustrate using flavor physics examples related to the determination of the CKM matrix in Sect. 7, before concluding. An appendix is devoted to several issues connected with the treatment of correlations.
2 Statistics concepts for particle physics
We start by briefly recalling frequentist concepts used in particle physics, highlighting the role played by p values in hypothesis testing and how they can be used to define confidence intervals.
2.1 p values
2.1.1 Data fitting and data reduction
The latter choice of expressing the experimental likelihood in terms of model-dependent parameters such as \(\beta \) has, however, one technical drawback: the full statistical analysis has to be performed for each model one wants to investigate, e.g., the Standard Model, the Minimal Supersymmetric Standard Model, GUT models, ... In addition, building a statistical analysis directly on the initial likelihood requires one to deal with a very large parameter space, depending on the parameters in \(\theta \) that are needed to describe the detector response. One common solution to these technical difficulties is a two-step approach. In the first step, the data are reduced to a set of model- and detector-independent^{2} random variables that contains the same information as the original likelihood (to a good approximation): in our example the likelihood-based estimators \(\hat{C}\) and \(\hat{S}\) of the parameters C and S can play the role of such variables (estimators are functions of the data and thus are random variables). In a second step, one can work in a particular model, e.g., in the Standard Model, to use \(\hat{C}\) and \(\hat{S}\) as inputs to a statistical analysis of the parameter \(\beta \). This two-step procedure gives the same result as if the analysis were done in a single step through the expression of the original likelihood in terms of \(\beta \). This technique is usually chosen if the PDF g of the estimators \(\hat{C}\) and \(\hat{S}\) can be parameterized in a simple way: for example, if the sample size is sufficiently large, then the PDF can often be modeled by a multivariate normal distribution, where the covariance matrix is approximately independent of the mean vector.
Let us now extend the above discussion to a more general case. A sample of random events is \(\{E_i,i=1\ldots n\}\), where each event corresponds to a set of directly measurable quantities (particle energies and momenta, interaction vertices, decay times...). The distribution of these events is described by a PDF, the functional form f of which is supposed to be known. In addition to the event value E, the PDF value depends on some fixed parameters \({\theta }\), hence the notation \(f(E;\theta )\). The likelihood for the sample \(\{E_i\}\) is defined by \( \mathcal L_{\{E_i\}}(\theta ) = \prod _{i=1}^n f(E_i;\theta ). \) We want to interpret the event observation in a given phenomenological scenario that predicts at least some of the parameters \(\theta \) describing the PDF in terms of a set of more fundamental parameters \(\chi \).
To this aim we first reduce the event observation to a set of model- and detector-independent random variables X together with a PDF \(g(X;\chi )\), in such a way that the information that one can get on \(\chi \) from g is equivalent to the information one can get from f, once \(\theta \) is expressed in terms of \(\chi \) consistently with the phenomenological model of interest. Technically, it amounts to identifying a minimal set of variables x depending on \(\theta \) that are independent of both the experimental context and the phenomenological model. One performs an analysis on the sample of events \(\{E_i\}\) to derive estimators \(\hat{x}\) for x. The distribution of these estimators can be described in terms of a PDF that is written in the \(\chi \) parametrization as \(g(X;\chi )\), where we have replaced \(\hat{x}\) by the notation X, to stress that in the following X will be considered as a new random variable, setting aside how it has been constructed from the original data \(\{E_i\}\). Obviously, in our previous example for \(B^0(t)\) \(\rightarrow J/\psi K_S\), \(\{t_i\}\) corresponds to \(\{E_i\}\), C and S to x, and \(\beta \) to \(\chi \).
2.1.2 Model fitting
Through the simple change of variable \(\frac{\mathrm{d}p}{\mathrm{d}T}\frac{\mathrm{d}\mathcal P}{\mathrm{d}p}=\frac{\mathrm{d}\mathcal P}{\mathrm{d}T}\), one obtains that the null distribution (that is, the distribution when the null hypothesis is true) of a p value is uniform, i.e., the distribution of values of the p value is flat between 0 and 1. This uniformity is a fundamental property of p values that is at the core of their various interpretations (hypothesis comparison, determination of confidence intervals...) [1, 2].
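The uniformity of the p value under the null hypothesis can be checked with a small toy study. The sketch below is only an illustration under stated assumptions: a single Gaussian measurement of known width and a two-sided p value built from a quadratic test statistic. The function names `gaussian_p_value` and `null_rejection_rate` are hypothetical and do not come from the text.

```python
import math
import random


def gaussian_p_value(x, mu, sigma):
    """Two-sided p value for the hypothesis mu_t = mu, assuming X ~ N(mu_t, sigma)."""
    # P(|X' - mu| >= |x - mu|) under the null, via the complementary error function.
    return math.erfc(abs(x - mu) / (sigma * math.sqrt(2.0)))


def null_rejection_rate(alpha, mu=0.0, sigma=1.0, n_toys=20000, seed=1):
    """Fraction of toy experiments generated under the null that have p <= alpha."""
    rng = random.Random(seed)
    rejected = sum(
        gaussian_p_value(rng.gauss(mu, sigma), mu, sigma) <= alpha
        for _ in range(n_toys)
    )
    return rejected / n_toys


if __name__ == "__main__":
    # For a uniform null distribution, the rejection rate should be close to alpha.
    for alpha in (0.05, 0.32):
        print(alpha, null_rejection_rate(alpha))
```

If the null distribution of p is flat, the fraction of toys with p below a threshold should agree with that threshold up to statistical fluctuations of the toy sample.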
In the frequentist approach, one wants to design a procedure to decide whether to accept or reject the null hypothesis \({\mathcal H}_\chi \), by avoiding as much as possible either incorrectly rejecting the null hypothesis (Type-I error) or incorrectly accepting it (Type-II error). The standard frequentist procedure consists in selecting a Type-I error \(\alpha \) and determining a region of sample space that has the probability \(\alpha \) of containing the data under the null hypothesis. If the data fall in this critical region, the hypothesis is rejected. This must be performed before data are known (in contrast to other interpretations, e.g., Fisher’s approach of significance testing [1]). In the simplest case, the critical region is defined by a condition of the form \(T\ge t_\alpha \), where \(t_\alpha \) is a function of \(\alpha \) only, which can be rephrased in terms of p value as \(p\le \alpha \). The interest of the frequentist approach depends therefore on the ability to design p values assessing the rate of Type-I error correctly (its understatement is clearly not desirable, but its overstatement often yields a reduction in the ability to determine the truth of an alternative hypothesis), as well as avoiding too large a Type-II error rate.
A major difficulty arises when the hypothesis to be tested is composite. In the case of numerical hypotheses like (3), one gets compositeness when one is only interested in a subset \(\mu \) of the parameters \(\chi \). The remaining parameters are called nuisance parameters^{3} and will be denoted by \(\nu \), thus \(\chi =(\mu ,\nu )\). In this case the hypothesis \({\mathcal H}_\mu : \mu _t=\mu \) is composite, because determining the distribution of the observables requires the knowledge of the true value \(\nu _t\) in addition to \(\mu \). In this situation, one has to devise a procedure to infer a “p value” for \({\mathcal H}_\mu \) out of p values built for the simple hypotheses where both \(\mu \) and \(\nu \) are fixed. Therefore, in contrast to a simple hypothesis, a composite hypothesis does not allow one to compute the distribution of the data.^{4}
Once p values are defined, one can build confidence intervals out of them by using the correspondence between acceptance regions of tests and confidence sets. Indeed, if we have an exact p value, and the critical region \(C_\alpha (X)\) is defined as the region where \(p(X;\mu )<\alpha \), the complement of this region turns out to be a confidence set of level \(1-\alpha \), i.e., \(P[\mu \notin C_\alpha (X)]= 1-\alpha \). This justifies the general practice of plotting the p value as a function of \(\mu \), and reading the 68 or 95% CL intervals by looking at the ranges where the p value curve is above 0.32 or 0.05. This is illustrated for a simple example in Figs. 2 and 3. Once again, this discussion is affected by issues of compositeness and nuisance parameters, as well as the requirement of checking the coverage of the p value used to define these confidence intervals: an overcovering p value will yield confidence intervals that are too large, and thus conservative.
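As an illustration of reading confidence intervals off a p value curve, the following sketch scans \(\mu \) on a grid and keeps the points where the p value stays above the chosen threshold. It assumes a single Gaussian measurement with a two-sided p value; the input values (0.70 and 0.05) and the function names are hypothetical and chosen only for the example.

```python
import math


def gaussian_p_value(x, mu, sigma):
    """Two-sided p value for mu_t = mu, assuming X ~ N(mu_t, sigma)."""
    return math.erfc(abs(x - mu) / (sigma * math.sqrt(2.0)))


def confidence_interval(x, sigma, alpha, half_width=10.0, n_steps=20001):
    """Scan mu on a regular grid and return the range where p(x; mu) >= alpha."""
    lo = x - half_width * sigma
    step = 2.0 * half_width * sigma / (n_steps - 1)
    grid = (lo + i * step for i in range(n_steps))
    accepted = [mu for mu in grid if gaussian_p_value(x, mu, sigma) >= alpha]
    return accepted[0], accepted[-1]


# Hypothetical measurement X = 0.70 with statistical uncertainty 0.05.
low68, high68 = confidence_interval(x=0.70, sigma=0.05, alpha=0.3173)
low95, high95 = confidence_interval(x=0.70, sigma=0.05, alpha=0.0455)

if __name__ == "__main__":
    print("68.27% CL:", low68, high68)
    print("95.45% CL:", low95, high95)
```

The same scan applies unchanged to any other p value that is a function of \(\mu \), which is why the p value curve is a convenient summary of a measurement.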
A few words about the notation and the vocabulary are in order at this stage. A p value necessarily refers to a null hypothesis, and when the null hypothesis is purely numerical such as (3) we can consider the p value as a mathematical function of the fundamental parameter \(\mu \). This of course does not imply that \(\mu \) is a random variable (in frequentist statistics, it is always a fixed, but unknown, number). When the p value as a function of \(\mu \) can be described in a simple way by a few parameters, we will often use the notation \(\mu =\mu _0\pm \sigma _\mu \). In this case, one can easily build the p value and derive any desired confidence interval. Even though this notation is similar to the measurement of an observable, we stress that this does not mean that the fundamental parameter \(\mu \) is a random variable, and it should not be seen as the definition of a PDF. In line with this discussion, we will call uncertainties the parameters like \(\sigma \) that can be given a frequentist meaning, e.g., they can be used to define the PDF of a random variable. On the other hand, we will call errors the intermediate quantities such as \(\sigma _\mu \) that can be used to describe the p value of a fundamental parameter, but cannot be given a statistical meaning for this parameter.
2.2 Likelihoodratio test statistic
For the problems considered here, the MLR choice features alluring properties, and in the following we will use test statistics that are derived from this choice. First, if \(g(X;\chi _t)\) is a multidimensional Gaussian function, then the quantity \(-2\ln {\mathcal L}_X(\chi _t)\) is the sum of the squares of standard normal random variables, i.e., is distributed as a \(\chi ^2\) with a number of degrees of freedom (\(N_\mathrm{dof}\)) that is given by \(\mathrm{dim}(X)\). Secondly, for linear models, in which the observables X depend linearly on the parameters \(\chi _t\), the MLR Eq. (11) is again a sum of the squares of standard normal random variables, and is distributed as a \(\chi ^2\) with \(N_\mathrm{dof}=\mathrm{dim}(\mu )\). Wilks’ theorem [28] states that this property can be extended to non-Gaussian cases in the asymptotic limit: under regularity conditions and when the sample size tends to infinity, the distribution of Eq. (11) will converge to the same \(\chi ^2\) distribution depending only on the number of parameters tested.
3 Comparing approaches to theoretical uncertainties
Table 1 Summary table of various approaches to theoretical uncertainties considered in the text
Approach | Random-\(\delta \) | Nuisance-\(\delta \) | External-\(\delta \)
Hypothesis | Random var. \(\mathrm{PDF}_\varDelta (\delta )\) | Composite hyp. \({\mathcal H}_\mu :\mu _t=\mu \) | Family of simple hyp. \({\mathcal H}^{(\delta )}_\mu :\mu _t=\mu +\delta \)
Test | Likelihood ratio | Quadratic | Quadratic
Constraint on \(\delta \) | – | \(\varOmega \) | \(\varOmega \)
Associativity | Yes if normal \(\mathrm{PDF}\) | Yes if \(\varOmega \) hyperball | Yes if \(\varOmega \) hyperball
Splitting of errors | Yes if normal \(\mathrm{PDF}\) | Yes for all \(\varOmega \) | Yes for all \(\varOmega \)
Stationarity | Yes | Yes | Yes
Simple asympt. lim. | Yes if normal \(\mathrm{PDF}\) | Yes | Yes
Simple \(\sigma \rightarrow 0\) limit | Depends on \(\mathrm{PDF}\) | \(\varOmega \) | \(\varOmega \)
Particular cases | Naive Gaussian | Fixed/adaptive nuis. | Scan
If we take | Normal PDF | Fixed/adaptive \(\varOmega \) | Sup over fixed \(\varOmega \)

one can (contrary to what has just been said) treat the theoretical uncertainty on the same footing as a statistical uncertainty; in this case, in order to follow a meaningful frequentist procedure, one has to assume that one lives in a world where the repeated calculation of a given quantity leads to a distribution of values around the exact one, with some variability that can be modeled as a PDF (“random-\(\delta \) approach”),

one can consider that theoretical uncertainties can be modeled as external parameters, and perform a purely statistical analysis for each point in the theoretical uncertainty parameter space; this leads to an infinite collection of p values that will have to be combined in some arbitrary way, following a model averaging procedure (“external-\(\delta \) approach”),

one can take the theoretical uncertainties as fixed asymptotic biases,^{7} treating them as nuisance parameters that have to be varied in a reasonable region (“nuisance-\(\delta \) approach”).

as general as possible, i.e., apply to as many “kinds” of theoretical uncertainties as possible (lattice uncertainties, scale uncertainties) and as many types of physical models as possible,

leading to meaningful confidence intervals, in reasonable limit cases: obviously, in the absence of theoretical uncertainties, one must recover the standard result; one may also consider the type of constraint obtained in the absence of statistical uncertainties,

exhibiting good coverage properties, as it benchmarks the quality of the statistical approach: the comparison of different models provides interesting information but does not shed light on their respective coverage,

associated with a statistically meaningful goodness-of-fit,

featuring reasonable asymptotic properties (large samples),

yielding the errors as a function of the estimates easily (error propagation), in particular by disentangling the impact of theoretical and statistical contributions,

leading to a reasonable procedure to average independent estimates – if possible, it should be equivalent for any analysis to include the independent estimates separately or the average alone (associativity). In addition, one may wonder whether the averaging procedure should be conservative or aggressive (i.e., the average of similar theoretical uncertainties should have a smaller uncertainty or not), and if the procedure should be stationary (the uncertainty of an average should be independent of the central values or not),

leading to reasonable results in the case of averages of inconsistent measurements.
We summarize some of the points mentioned above in Table 1. As will be seen, however, it will prove challenging to fulfill all these criteria at the same time, and we will have to make compromises along the way.
4 Illustration of the approaches in the one-dimensional case
4.1 Situation of the problem

Take a model corresponding to the interpretation of \(\delta \): random variable, external parameter, fixed bias as a nuisance parameter...

Choose a test statistic \(T(X;\mu )\) that is consistent with the model and that discriminates the null hypothesis: Rfit, quadratic, other...

Compute, consistently with the model, the p value, which is in general a function of \(\mu \) and \(\delta \).

Eliminate the dependence with respect to \(\delta \) by some well-defined procedure.

Exploit the resulting p value (coverage, confidence intervals, goodness-of-fit).
4.2 The random-\(\delta \) approach
In the random-\(\delta \) approach, \(\delta \) is related to the variability of theoretical computations, which one can model with some PDF for \(\delta \), such as \({\mathcal N}_{(0,\varDelta )}\) (normal) or \({\mathcal U}_{(-\varDelta ,+\varDelta )}\) (uniform). The natural candidate for the test statistic \(T(X;\mu )\) is the MLR built from the PDF. One considers a model where \(X=s+\delta \) is the sum of two random variables, s being distributed as a Gaussian of mean \(\mu \) and width \(\sigma \), and \(\delta \) as an additional random variable with a distribution depending on \(\varDelta \).
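For the normal choice \({\mathcal N}_{(0,\varDelta )}\), the sum \(X=s+\delta \) of two independent Gaussian variables is itself Gaussian with mean \(\mu \) and width \(\sqrt{\sigma ^2+\varDelta ^2}\), so the p value reduces to the usual Gaussian one with the two uncertainties combined in quadrature. The sketch below illustrates this under that assumption; the function names and numerical inputs are hypothetical, and the toy check only verifies the width of the combined distribution.

```python
import math
import random


def p_naive_gaussian(x, mu, sigma, delta_cap):
    """Two-sided p value when delta ~ N(0, Delta): widths add in quadrature."""
    total = math.hypot(sigma, delta_cap)  # sqrt(sigma^2 + Delta^2)
    return math.erfc(abs(x - mu) / (total * math.sqrt(2.0)))


def toy_width(mu, sigma, delta_cap, n_toys=50000, seed=2):
    """Standard deviation of X = s + delta from toys, s ~ N(mu, sigma), delta ~ N(0, Delta)."""
    rng = random.Random(seed)
    xs = [rng.gauss(mu, sigma) + rng.gauss(0.0, delta_cap) for _ in range(n_toys)]
    mean = sum(xs) / n_toys
    var = sum((x - mean) ** 2 for x in xs) / (n_toys - 1)
    return math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical inputs: sigma = 0.6 and Delta = 0.8 give a total width of 1.
    print(toy_width(0.0, 0.6, 0.8))
    print(p_naive_gaussian(1.0, 0.0, 0.6, 0.8))  # a 1 sigma deviation
```

For a uniform PDF the distribution of X is no longer Gaussian and this shortcut does not apply; the MLR would then have to be built from the convolution of the two PDFs.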
4.3 The nuisance-\(\delta \) approach
In the nuisance approach, \(\delta \) is not interpreted as a random variable but as a fixed parameter so that in the limit of an infinite sample size, the estimator does not converge to the true value \(\mu _t\), but to \(\mu _t+\delta \). The distinction between statistical and theoretical uncertainties is thus related to their effect as the sample size increases: statistical uncertainties decrease, while theoretical uncertainties remain of the same size (see Refs. [29, 30, 31] for other illustrations in the context of particle physics). One works with the null hypothesis \(\mathcal {H}_{\mu }: \mu _t=\mu \), and one has then to determine which test statistic is to be built.
 If one wants to keep it fixed, \(\varOmega _r=r[-\varDelta ,\varDelta ]\):
$$\begin{aligned} p_\mathrm{fixed\ \varOmega _r}=\mathrm{Max}_{\delta \in \varOmega _r}[1- \mathrm {CDF}_\delta (\mu )]. \end{aligned}$$ (26)
One may wonder what the best choice is for r, as the p value gets very large if one works with the reasonable \(r=3\), while the choice \(r=1\) may appear as non-conservative. We will call this treatment the fixed r-nuisance approach.
 One can then wonder whether one would like to let \(\varOmega \) depend on the value considered for p. In other words, if we are looking at a \(k\,\sigma \) range, we could consider the equivalent range for \(\delta \). This would correspond to
$$\begin{aligned} p_\mathrm{adapt\ \varOmega }=\mathrm{Max}_{\delta \in \varOmega _{k_\sigma ( p)}}[1- \mathrm {CDF}_\delta (\mu )] \end{aligned}$$ (27)
where \(k_\sigma (p )\) is the “number of sigma” corresponding to p
$$\begin{aligned} k_\sigma (p )^2=\mathrm{Prob}^{-1}(p,N_\mathrm{dof}=1), \end{aligned}$$ (28)
where the function Prob has been defined in Eq. (12). We will call this treatment the adaptive nuisance approach. The correct interpretation of this p value is: p is a valid p value if the true (unknown) value of \(\delta /\varDelta \) belongs to the “would be” \(1-p\) confidence interval around 0. This is not a standard coverage criterion: one can use adaptive coverage, and adaptively valid p value, to name this new concept. Note that Eqs. (27), (28) constitute a non-algebraic implicit equation that has to be solved by numerical means.
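The fixed and adaptive prescriptions of Eqs. (26)–(28) can be sketched numerically in the one-dimensional case. The code below is a minimal illustration, not the authors' implementation. It assumes a quadratic test statistic and a Gaussian statistical part of width \(\sigma \), so that the p value at fixed bias \(\delta \) is the probability that a measurement distributed around \(\mu +\delta \) lies at least as far from \(\mu \) as the observed one. This probability grows with \(|\delta |\), so the maximum over \(\varOmega _r\) is reached at its edge. The implicit equation is solved by bisection in \(\log _{10}p\), and all function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def p_fixed_nuisance(x, mu, sigma, delta_cap, r):
    """Maximum over delta in r*[-Delta, +Delta] of the p value at fixed bias delta."""
    d = abs(x - mu)
    shift = r * delta_cap  # the maximum is reached at the edge |delta| = r*Delta
    # Upper tail plus lower tail of N(shift, sigma) outside the interval [-d, +d].
    return _STD.cdf((shift - d) / sigma) + _STD.cdf((-d - shift) / sigma)


def n_sigma(p):
    """k_sigma(p): two-sided Gaussian significance, i.e. sqrt of the chi2 (1 dof) quantile."""
    return -_STD.inv_cdf(0.5 * p)


def p_adaptive_nuisance(x, mu, sigma, delta_cap, n_iter=200):
    """Solve p = p_fixed(r = k_sigma(p)) by bisection in log10(p)."""
    if delta_cap == 0.0:
        return p_fixed_nuisance(x, mu, sigma, 0.0, 0.0)

    def f(p):
        # Increasing in p: a larger p means a smaller range, hence a smaller maximum.
        return p - p_fixed_nuisance(x, mu, sigma, delta_cap, n_sigma(p))

    lo, hi = -300.0, 0.0  # bracket for log10(p)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if f(10.0 ** mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 10.0 ** (0.5 * (lo + hi))


if __name__ == "__main__":
    s = d = 1.0 / math.sqrt(2.0)  # sigma = Delta, with sigma^2 + Delta^2 = 1
    for dist in (1.0, 3.0, 4.0886):
        p1 = p_fixed_nuisance(dist, 0.0, s, d, 1.0)
        pa = p_adaptive_nuisance(dist, 0.0, s, d)
        print(dist, p1, pa, n_sigma(pa))
```

With \(\sigma =\varDelta \) and \(\sigma ^2+\varDelta ^2=1\), a deviation of about 4.09 corresponds to roughly \(3\sigma \) in the adaptive approach, while the fixed 1-nuisance p value for the same deviation is much smaller.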
4.4 The external-\(\delta \) approach
5 Comparison of the methods in the one-dimensional case

the random-\(\delta \) approach with a Gaussian random variable, or naive Gaussian (nG), see Sect. 4.2,

the nuisance-\(\delta \) approach with quadratic statistic and fixed range, or fixed nuisance, see Sect. 4.3,

the nuisance-\(\delta \) approach with quadratic statistic and adaptive range, or adaptive nuisance, see Sect. 4.3,

the external-\(\delta \) approach with quadratic statistic and fixed range, equivalent to the Rfit approach in one dimension; see Sect. 4.4.
5.1 p values and confidence intervals

By construction, nG always provides the same errors whatever the relative proportion of theoretical and statistical uncertainties, and all the approaches provide the same answer in the limit of no theoretical uncertainty \(\varDelta =0\).

By construction, for a given \(n\sigma \) confidence level, the interval provided by the adaptive nuisance approach is identical to the one obtained using the fixed nuisance approach with a \([-n,n]\) interval. This explains why the adaptive nuisance approach yields identical results to the fixed 1-nuisance approach at 1\(\sigma \) (and similarly for the fixed 3-nuisance approach at 3\(\sigma \)). The corresponding curves cannot be distinguished on the upper and central panels of Fig. 5.

The adaptive nuisance approach is numerically quite close to the nG method; the maximum difference occurs for \(\varDelta /\sigma =1\) (up to 40% larger error size for 5\(\sigma \) intervals).

The p value from the fixed-nuisance approach has a very wide plateau if one works with the ‘reasonable’ range \([-3\varDelta ,+3\varDelta ]\), while the choice of \([-\varDelta ,+\varDelta ]\) might be considered as non-conservative.

The 1-external and fixed 1-nuisance approaches are close to each other and less conservative than the adaptive approach, which is expected, but also than nG, for confidence intervals at 3 or 5\(\sigma \) when theory uncertainties dominate.

When dominated by theoretical uncertainties (\(\varDelta /\sigma \) large), all approaches provide 3 and 5\(\sigma \) errors smaller than the nG approach, apart from the adaptive nuisance approach.
Table 2 Comparison of the size of one-dimensional confidence intervals at \(1,3,5\sigma \) for various methods and various values of \(\varDelta /\sigma \)
CL | nG | 1-nuisance | Adaptive nuisance | 1-external
\(\varDelta /\sigma =0.3\)
\(1\sigma \) | 1.0 | 1.0 | 1.0 | 1.2
\(3\sigma \) | 3.0 | 3.0 | 3.5 | 3.2
\(5\sigma \) | 5.0 | 5.0 | 6.1 | 5.1
\(\varDelta /\sigma =1\)
\(1\sigma \) | 1.0 | 1.1 | 1.1 | 1.4
\(3\sigma \) | 3.0 | 2.7 | 4.1 | 2.8
\(5\sigma \) | 5.0 | 4.1 | 7.0 | 4.2
\(\varDelta /\sigma =3\)
\(1\sigma \) | 1.0 | 1.1 | 1.1 | 1.3
\(3\sigma \) | 3.0 | 1.8 | 3.7 | 1.9
\(5\sigma \) | 5.0 | 2.5 | 6.3 | 2.5
\(\varDelta /\sigma =10\)
\(1\sigma \) | 1.0 | 1.0 | 1.0 | 1.1
\(3\sigma \) | 3.0 | 1.3 | 3.3 | 1.3
\(5\sigma \) | 5.0 | 1.5 | 5.5 | 1.5
5.2 Significance thresholds
Table 3 Comparison of 1D \(1,3,5\sigma \) significance thresholds for \(\varDelta /\sigma =1\). For instance, the first line should read: if a p value corresponding to 1\(\sigma \) is found with nG, then the corresponding values for the three other methods are 0.9/1.0/0.4\(\sigma \). \(\infty \) means that the corresponding p value was numerically zero (corresponding to more than 8\(\sigma \))
Method | nG | 1-nuisance | Adaptive nuisance | 1-external
1\(\sigma \) signif. threshold
nG | 1 | 0.9 | 1.0 | 0.4
1-nuisance | 1.1 | 1 | 1.0 | 0.5
Adaptive nuisance | 1.1 | 1.0 | 1 | 0.5
1-external | 1.4 | 1.4 | 1.2 | 1
3\(\sigma \) signif. threshold
nG | 3 | 3.4 | 2.3 | 3.2
1-nuisance | 2.7 | 3 | 2.0 | 2.8
Adaptive nuisance | 4.1 | 4.9 | 3 | 4.8
1-external | 2.8 | 3.2 | 2.1 | 3
5\(\sigma \) signif. threshold
nG | 5 | 6.2 | 3.6 | 6.1
1-nuisance | 4.1 | 5 | 3.0 | 4.9
Adaptive nuisance | 7.0 | \(\infty \) | 5 | \(\infty \)
1-external | 4.2 | 5.1 | 3.1 | 5
Table 4 Comparison of 1D \(1,3,5\sigma \) significance thresholds for \(\varDelta /\sigma =3\). Same comments as in the previous table
Method | nG | 1-nuisance | Adaptive nuisance | 1-external
1\(\sigma \) signif. threshold
nG | 1 | 0.8 | 0.9 | 0.2
1-nuisance | 1.1 | 1 | 1.0 | 0.5
Adaptive nuisance | 1.1 | 1.0 | 1 | 0.5
1-external | 1.3 | 1.4 | 1.1 | 1
3\(\sigma \) signif. threshold
nG | 3 | 6.6 | 2.4 | 6.5
1-nuisance | 1.8 | 3 | 1.5 | 2.8
Adaptive nuisance | 3.7 | \(\infty \) | 3 | \(\infty \)
1-external | 1.9 | 3.2 | 1.6 | 3
5\(\sigma \) signif. threshold
nG | 5 | \(\infty \) | 4.0 | \(\infty \)
1-nuisance | 2.5 | 5 | 2.0 | 4.9
Adaptive nuisance | 6.3 | \(\infty \) | 5 | \(\infty \)
1-external | 2.5 | 5.1 | 2.1 | 5
In agreement with the previous discussion, we see that fixed 1-nuisance and 1-external yield similar results for 3 and 5\(\sigma \), independently of the relative size of statistical and theoretical effects. Moreover, they are quicker to claim a tension than nG, the most conservative method in this respect being the adaptive nuisance approach.
5.3 Coverage properties
As indicated in Sect. 2.1.2, p values are interesting objects if they cover exactly or slightly overcover in the domain where they should be used corresponding to a given significance; see Eqs. (7)–(9). If coverage can be ensured for a simple hypothesis [1, 2], this property is far from trivial and should be checked explicitly in the case of composite hypotheses, where compositeness comes from nuisance parameters that can be related to theoretical uncertainties, or other parameters of the problem.
Table 5 Coverage properties of the various methods at 68.27, 95.45 and 99.73% CL, for different true values of \(\delta /\varDelta \) contained in, at the border of, or outside the fixed volume \(\varOmega \), and for various relative sizes of statistical and theoretical uncertainties \(\varDelta /\sigma \)
Method | 68.27% CL | 95.45% CL | 99.73% CL | 68.27% CL | 95.45% CL | 99.73% CL
First three columns: \(\varDelta /\sigma =1\), \(\delta /\varDelta =1\); last three columns: \(\varDelta /\sigma =1\), \(\delta /\varDelta =0\)
nG | 65.2% | 96.6% | 99.9% | 84.1% | 99.5% | 100.0%
1-nuisance | 68.2% | 95.4% | 99.7% | 86.5% | 99.3% | 100.0%
Adaptive nuisance | 68.3% | 99.6% | 100.0% | 86.4% | 100.0% | 100.0%
1-external | 83.9% | 97.8% | 99.9% | 95.4% | 99.7% | 100.0%
1-ext. (excl. \(p\equiv 1\)) | 69.2% | 95.7% | 99.8% | 85.5% | 99.1% | 100.0%
First three columns: \(\varDelta /\sigma =1\), \(\delta /\varDelta =3\); last three columns: \(\varDelta /\sigma =3\), \(\delta /\varDelta =0\)
nG | 5.76% | 43.2% | 89.1% | 99.8% | 100.0% | 100.0%
1-nuisance | 6.60% | 38.0% | 78.4% | 100.0% | 100.0% | 100.0%
Adaptive nuisance | 6.53% | 75.4% | 99.8% | 99.9% | 100.0% | 100.0%
1-external | 16.0% | 50.3% | 84.2% | 100.0% | 100.0% | 100.0%
1-ext. (excl. \(p\equiv 1\)) | 14.0% | 49.1% | 83.8% | 98.5% | 100.0% | 100.0%
First three columns: \(\varDelta /\sigma =3\), \(\delta /\varDelta =3\); last three columns: \(\varDelta /\sigma =3\), \(\delta /\varDelta =1\)
nG | 0.00% | 0.35% | 68.7% | 56.3% | 100.0% | 100.0%
1-nuisance | 0.00% | 0.00% | 0.07% | 68.1% | 95.5% | 99.7%
Adaptive nuisance | 0.00% | 9.60% | 99.8% | 68.2% | 100.0% | 100.0%
1-external | 0.00% | 0.00% | 0.13% | 84.1% | 97.7% | 99.9%
1-ext. (excl. \(p\equiv 1\)) | 0.00% | 0.00% | 0.13% | 68.2% | 95.4% | 99.7%
In order to compare the different situations, we take \(\sigma ^2+\varDelta ^2=1\) for all methods, and compute for each method the coverage fraction (the number of times the confidence level interval includes the true value of the parameter being extracted) for various confidence levels and for various values of \(\varDelta /\sigma \). Note that the coverage depends also on the true value of \(\delta /\varDelta \) (the normalized bias). The results are gathered in Table 5 and Fig. 6. We also indicate the distribution of p values obtained for the different methods.
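A coverage fraction of this kind can be estimated with toy experiments. The sketch below is a hedged illustration rather than the procedure actually used for the table: it assumes a Gaussian statistical part, the normalization \(\sigma ^2+\varDelta ^2=1\), and closed-form p values for the naive Gaussian and fixed 1-nuisance approaches in one dimension. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def p_naive_gaussian(d, sigma, delta_cap):
    """nG p value for a distance d = |X - mu| (widths combined in quadrature)."""
    return math.erfc(d / math.sqrt(2.0 * (sigma ** 2 + delta_cap ** 2)))


def p_fixed_nuisance(d, sigma, delta_cap, r=1.0):
    """Fixed r-nuisance p value: maximum over delta in r*[-Delta, +Delta]."""
    shift = r * delta_cap
    return _STD.cdf((shift - d) / sigma) + _STD.cdf((-d - shift) / sigma)


def coverage(p_value, cl, ratio, bias, n_toys=20000, seed=3):
    """Fraction of toys whose CL interval contains the true parameter value.

    ratio = Delta/sigma, bias = true delta/Delta, with sigma^2 + Delta^2 = 1.
    """
    sigma = 1.0 / math.sqrt(1.0 + ratio ** 2)
    delta_cap = ratio * sigma
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_toys):
        # Distance between the biased measurement and the true parameter value.
        d = abs(rng.gauss(bias * delta_cap, sigma))
        # The true value is inside the interval when its p value is >= 1 - CL.
        if p_value(d, sigma, delta_cap) >= 1.0 - cl:
            covered += 1
    return covered / n_toys


if __name__ == "__main__":
    for name, p_value in (("nG", p_naive_gaussian), ("1-nuisance", p_fixed_nuisance)):
        print(name, coverage(p_value, cl=0.6827, ratio=1.0, bias=1.0))
```

For \(\varDelta /\sigma =1\) and \(\delta /\varDelta =1\) at 68.27% CL, this sketch should give values close to 65% for nG and 68% for the fixed 1-nuisance approach, up to the statistical fluctuations of the toy sample.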
One notices in particular that the 1-external approach has a cluster of values for \(p=1\), which is expected due to the presence of a plateau in the p value. This behavior makes the interpretation of the coverage more difficult, and as a comparison, we also include the results when we consider the same distribution with the \(p=1\) values removed. Indeed one could imagine a situation where reasonable coverage values could only be due to the \(p=1\) clustering, while other values of p would systematically undercover: such a behavior would either yield no constraints or too liberal constraints on the parameters depending on the data.

If \(\varOmega \) is fixed and does not contain the true value of \(\delta /\varDelta \) (“unfortunate” case), both external-\(\delta \) and nuisance-\(\delta \) approaches lead to undercoverage; the size of the effect depends on the distance of \(\delta /\varDelta \) with respect to \(\varOmega \). This is also the case for nG.

If \(\varOmega \) is fixed and contains the true value of \(\delta /\varDelta \) (“fortunate” case), both the external-\(\delta \) and the nuisance-\(\delta \) approaches overcover. This is also the case for nG.

If \(\varOmega \) is adaptive, for a fixed true value of \(\delta \), a p value becomes valid if it is sufficiently small so that the corresponding interval contains \(\delta \). Therefore, for the adaptive nuisance-\(\delta \) approach, there is always a maximum value of CL above which all p values are conservative; this maximum value is given by \(1-\mathrm {Erf}[\delta /(\sqrt{2}\varDelta )]\).
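The expression \(1-\mathrm {Erf}[\delta /(\sqrt{2}\varDelta )]\) is the two-sided Gaussian p value associated with a deviation of \(|\delta |/\varDelta \) standard deviations. The short sketch below evaluates it for a few bias values; the function name is hypothetical, and reading the result as the p value below which the adaptive range contains the true bias is an interpretation of the statement above rather than an additional result.

```python
import math


def adaptive_validity_threshold(bias_ratio):
    """Evaluate 1 - Erf[delta / (sqrt(2) Delta)] for a true bias delta = bias_ratio * Delta."""
    # erfc(z) = 1 - erf(z), evaluated directly to keep precision for large ratios.
    return math.erfc(abs(bias_ratio) / math.sqrt(2.0))


if __name__ == "__main__":
    for ratio in (0.0, 1.0, 3.0):
        print(ratio, adaptive_validity_threshold(ratio))
```

For \(\delta /\varDelta =1\) the value is about 0.317 (68.27% CL) and for \(\delta /\varDelta =3\) about 0.0027 (99.73% CL), in line with the adaptive nuisance entries of the coverage table, which undercover at 95.45% CL but not at 99.73% CL when \(\delta /\varDelta =3\).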
5.4 Conclusions of the unidimensional case
It should be stressed that, by construction, all methods are conservative if the true value of the \(\delta \) parameter satisfies the assumption that has been made for the computation of the p value. Therefore coverage properties are not the only criterion to investigate in order to assess the methods: in particular one has to study the robustness of the p value when the assumption made on the true value of \(\delta \) does not hold. The adaptive approach provides a means to deal with a priori unexpected true values of \(\delta \), provided one is interested in a small enough p value, that is, a large enough significance. Other considerations (size of confidence intervals, significance thresholds) suggest that the adaptive approach provides an interesting and fairly conservative framework to deal with theoretical uncertainties. We now turn to the different approaches in the more general multidimensional case, with emphasis on the adaptive nuisance-\(\delta \) approach and the quadratic test statistic.
6 Generalization to multidimensional cases
Up to now we have only discussed the simplest example of a single measurement X linearly related to a single model parameter \(\mu \). The general case is obviously multidimensional: we deal with several observables, depending on several underlying parameters, possibly in a nonlinear way, with several measurements involving different sources of theoretical uncertainty. Typical situations correspond to averaging different measurements of the same quantity, and performing fits to extract confidence regions for fundamental parameters from the measurement of observables. In this section we will discuss the case of an arbitrary number of observables in a linear model with an arbitrary number of parameters, where we are particularly interested in a one-dimensional or two-dimensional subset of these parameters.
6.1 General formulas
Following the one-dimensional examples in the previous sections, we always assume that the measurements \(X_i\) have Gaussian distributions for the statistical part. We will consider two main cases of interest in our field: averaging measurements and determining confidence intervals for several parameters.
6.2 Averaging measurements
We start by considering the averages of several measurements of a single quantity, each with both statistical and theoretical uncertainties, with possible correlations. We will focus mainly on the nuisance-\(\delta \) approach, starting with two measurements before moving to other possibilities.
6.2.1 Averaging two measurements and the choice of a hypervolume
A first common issue is the combination of two uncorrelated measurements \(X_1\pm \sigma _1\pm \varDelta _1\) and \(X_2\pm \sigma _2\pm \varDelta _2\). The procedure is well defined in the case of purely statistical uncertainties, but it obviously depends on the way theoretical uncertainties are treated. As discussed in Sect. 3, associativity is a particularly appealing property for such a problem as it allows one to replace a series of measurements by its average without loss of information.
Each choice of volume provides an average with different properties. As discussed earlier, associativity is a very desirable property: one can average different observations of the same quantity prior to the full fit, since it gives the same result as keeping all individual inputs. The hyperball choice indeed fulfills associativity. On the other hand, the hypercube case does not: the combination of the inputs 1 and 2 yields the following test statistic: \((w_1+w_2)(\mu -\hat{\mu })^2\), whereas the resulting combination \(\hat{\mu }\pm \sigma _\mu \pm \varDelta _\mu \) has the statistic \((\mu -\hat{\mu })^2/(\sigma _\mu ^2+\varDelta _\mu ^2)\). The two statistics are proportional and hence lead to the same p value, but they are not equivalent when added to other terms in a larger combination.
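A minimal numerical sketch of this associativity argument follows. It assumes the weights \(w_i=1/(\sigma _i^2+\varDelta _i^2)\) suggested by the quadratic statistic, a statistical part propagated in quadrature, and theoretical parts combined quadratically (hyperball) or linearly (hypercube). The helper name and the three inputs are made up for illustration.

```python
import math

def average(measurements, volume):
    """Average uncorrelated inputs given as (x, sigma, Delta) triplets.

    Returns (mu_hat, sigma_mu, Delta_mu) in the same triplet format,
    so that an average can itself be fed back as an input.
    """
    weights = [1.0 / (s**2 + d**2) for _, s, d in measurements]
    total = sum(weights)
    mu_hat = sum(w * x for w, (x, _, _) in zip(weights, measurements)) / total
    sigma_mu = math.sqrt(sum((w * s)**2 for w, (_, s, _) in zip(weights, measurements))) / total
    if volume == "hyperball":      # quadratic combination of the biases
        delta_mu = math.sqrt(sum((w * d)**2 for w, (_, _, d) in zip(weights, measurements))) / total
    elif volume == "hypercube":    # linear combination of the biases
        delta_mu = sum(w * d for w, (_, _, d) in zip(weights, measurements)) / total
    else:
        raise ValueError("volume must be 'hyperball' or 'hypercube'")
    return mu_hat, sigma_mu, delta_mu

# Three made-up inputs (x, sigma, Delta)
m1, m2, m3 = (1.0, 0.1, 0.2), (1.2, 0.2, 0.1), (1.5, 0.1, 0.1)

for volume in ("hyperball", "hypercube"):
    direct = average([m1, m2, m3], volume)
    stepwise = average([average([m1, m2], volume), m3], volume)
    print(volume, direct, stepwise)
```

Under these assumptions the direct and stepwise hyperball averages coincide (\(\hat{\mu }\) close to 1.32 in both cases), whereas the stepwise hypercube average moves to about 1.35, because the intermediate combination no longer carries the weight \(w_1+w_2\).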
A comment is also in order concerning the size of the uncertainties for the average. In the case of the hypercube, the resulting linear addition scheme is the only one where the average of different determinations of the same quantity cannot lead to a weighted theoretical uncertainty that is smaller than the smallest uncertainty among all determinations.^{13} In the case of the hyperball, it may occur that the average of different determinations of the same quantity yields a weighted theoretical uncertainty smaller than the smallest uncertainty among all determinations.
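The bound can be made explicit under the assumption, not restated in this section, that the weights are \(w_i=1/(\sigma _i^2+\varDelta _i^2)\) and that the two volumes lead to the linear and quadratic combinations, respectively (the superscripts are our notation):

\[
\varDelta _\mu ^{\mathrm{cube}}=\frac{\sum _i w_i\varDelta _i}{\sum _i w_i}\ge \min _i \varDelta _i ,\qquad
\varDelta _\mu ^{\mathrm{ball}}=\frac{\sqrt{\sum _i w_i^2\varDelta _i^2}}{\sum _i w_i} .
\]

The first expression is a weighted mean of the \(\varDelta _i\) with positive weights and therefore cannot fall below the smallest of them. For n identical uncorrelated inputs (\(w_i=w\), \(\varDelta _i=\varDelta \)) it returns \(\varDelta \), while the second expression gives \(\varDelta /\sqrt{n}\).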
Whatever the choice of the volume, a very important and alluring property of our approach is the clean separation between the statistical and theoretical contribution to the uncertainty on the parameter of interest. This is actually a general property that directly follows from the choice of a quadratic statistic, and in the linear case it allows one to perform global fits while keeping a clear distinction between various sources of uncertainty.
6.2.2 Averaging n measurements with biases in a hyperball
We now consider the problem of averaging n, possibly correlated, determinations of the same quantity, each individual determination coming with both a Gaussian statistical uncertainty and a number of different sources of theoretical uncertainty. We focus first on the nuisance-\(\delta \) approach, as it is possible to provide closed analytic expressions in this case. We will first discuss the variation of the biases over a hyperball, before discussing other approaches, which will be illustrated and compared with examples from flavor physics in Sect. 7.
Statistical uncertainties are assumed here to be strictly Gaussian and hence symmetric (see Appendix C for more details on the asymmetric case). In contrast, in the nuisance approach, a theoretical uncertainty that is modeled by a bias parameter \(\delta \) may be asymmetric: that is, the region in which \(\delta \) is varied may depend on the sign of \(\delta \), e.g., \(\delta \in [-\varDelta _-,+\varDelta _+]\) in one dimension with the fixed hypercube approach (\(\varDelta _\pm \ge 0\)). In order to keep the stationarity property that follows from the quadratic statistic, we take the conservative choice \(\varDelta =\mathrm{Max}(\varDelta _+,\varDelta _-)\) in the definition Eq. (34). Let us emphasize that this symmetrization of the test statistic is independent of the range in which \(\delta \) is varied: if theoretical uncertainties are asymmetric, one computes Eqs. (46)–(48) to express the asymmetric combined uncertainties \(\varDelta _{\mu ,\pm }\) in terms of the \(\varDelta _{i\alpha ,\pm }\).
6.2.3 Averages with other approaches
As indicated before, this discussion applies to any linear transformation P and is not limited to the Cholesky decomposition. We have not been able to find other procedures that would avoid these difficulties while paralleling the hypercube case. In the following, we will thus use Eq. (49) even in the presence of theoretical correlations: therefore, the latter will be taken into account in the definition of T through \(\bar{W}\), but not in the definition of the range of variations to compute the error \(\varDelta \). We also notice that the problems that we encounter are to some extent due to contradictory expectations concerning the hypercube approach. In Sect. 6.2.1, the hypercube corresponds to values of \(\delta _1\) and \(\delta _2\) left free to vary without relation among them (contrary to the hyperball case). It therefore seems difficult to introduce correlations in a case that was designed to avoid them from the start and cannot accommodate them easily, which may explain our failure to do so.
In the case of the external-\(\delta \) approach, the scan method leads to the same discussion as for the nuisance case, provided that one uses the following statistic: \(T = (X-\mu -\delta )^2/(\sigma ^2+\varDelta ^2)\). This choice differs from that of Ref. [32] in the normalization (\(\sigma ^2+\varDelta ^2\) rather than \(\sigma ^2\)), in order to take into account the importance of both uncertainties when combining measurements (damping measurements which are imprecise in one way or the other). As indicated in Sect. 4.4, the difference of normalization of the test statistic does not affect the determination of the p value in the unidimensional case, but it has an impact once several determinations are combined. The choice above corresponds to the usual one when \(\varDelta \) is of statistical nature. It gives a reasonable balance when two or more inputs are combined that all come with both statistical and theoretical uncertainties.
A similar discussion holds for the random-\(\delta \) approach. However, although the combined errors \(\sigma _\mu \) and \(\varDelta _\mu \) are the same in the nuisance-\(\delta \) (with hyperball), the random-\(\delta \) and the external-\(\delta \) (with hyperball) approaches, we emphasize that the p value for \(\mu \) built from these errors is different and yields different uncertainties for a given confidence level in each approach, as discussed in Sect. 4.
6.2.4 Other approaches in the literature
There are other approaches available in the literature, often starting from the random-\(\delta \) approach (i.e., modeling all uncertainties as random variables).
The Heavy Flavor Averaging Group [36] chooses to perform the average including correlations. In the absence of knowledge on the correlation coefficient between uncertainties of two measurements (typically coming from the same method), they tune the correlation coefficient so that the resulting uncertainty is maximal (which does not correspond to \(\rho =1\) when the correlated uncertainties have different sizes and are combined assuming a statistical origin; see Appendix A.2). This choice is certainly the most conservative one when there is no knowledge concerning correlations.
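The statement that the most conservative correlation is not \(\rho =1\) can be checked on two inputs. The sketch below assumes the standard minimum-variance (best linear unbiased) average of two correlated Gaussian measurements, which we take to be what is meant here; the function names are ours.

```python
def combined_variance(s1, s2, rho):
    """Variance of the minimum-variance linear average of two measurements
    with uncertainties s1, s2 and correlation coefficient rho (|rho| < 1)."""
    return s1**2 * s2**2 * (1.0 - rho**2) / (s1**2 + s2**2 - 2.0 * rho * s1 * s2)

def most_conservative_rho(s1, s2, steps=20001):
    """Scan rho over the open interval (-1, 1) and return the value that
    maximizes the variance of the average."""
    grid = [-1.0 + 2.0 * i / (steps - 1) for i in range(1, steps - 1)]
    return max(grid, key=lambda rho: combined_variance(s1, s2, rho))

rho_star = most_conservative_rho(1.0, 2.0)
print(rho_star, combined_variance(1.0, 2.0, rho_star))
```

For uncertainties 1 and 2 the maximum sits at \(\rho =0.5\), the ratio of the smaller to the larger uncertainty, where the variance of the average equals that of the more precise input. As \(\rho \) approaches 1 the variance of the average instead drops towards zero.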
The Flavor Lattice Averaging Group [37] follows the proposal in Ref. [38]: they build a covariance matrix where correlated sources of uncertainties are included with 100% correlation, and they perform the average by choosing weights \(w_i\) that are not optimal but are well defined even in the presence of \(\rho =\pm 1\) correlation coefficients. As discussed in Appendix A.2, our approach to singular covariance matrices is similar but more general and guarantees that we recover the weights advocated in Ref. [38] for averages of fully correlated measurements.
Finally, the PDG approach [34] combines all uncertainties in a single covariance matrix. In the case of inconsistent measurements, one may then obtain an average with an uncertainty that may be interpreted as ‘too small’ (notice however that the weighted uncertainty does not increase with the incompatibility of the measurements). This problem occurs quite often in particle physics and cannot be solved by purely statistical considerations (even in the absence of theoretical uncertainties). If the model is assumed to be correct, one may invoke an underestimation of the uncertainties. A commonly used recipe in the pure statistical case has been adopted by the Particle Data Group, which consists in computing a factor \(S=\sqrt{\chi ^2/(N_\mathrm{dof}-1)}\) and rescaling all uncertainties by this factor. A drawback of this approach is the lack of associativity: the inconsistency is either removed or kept as it is, depending on whether the average is performed before any further analysis, or inside a global fit. Furthermore, since the ultimate goal of statistical analyses is indeed to exclude the null hypothesis (e.g. the Standard Model), it looks counterintuitive to first wash out possible discrepancies by an ad hoc procedure. Therefore we refrain from defining an S factor in the presence of theoretical uncertainties, and we leave the discussion of discrepancies between independent determinations of the same quantity to a case-by-case basis, based on physical (and not statistical) grounds.
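As a reminder of how the rescaling works in the purely statistical, uncorrelated case, here is a minimal sketch. It takes the count entering \(S\) to be the number of measurements and inflates the uncertainty only when \(S>1\); both points are assumptions on our side, and the function name and inputs are illustrative.

```python
import math

def scaled_average(values, errors):
    """Inverse-variance weighted average with the scale factor S.

    Returns (mean, error, S, rescaled error). Rescaling the error of the
    average by S is equivalent to rescaling every input error by S.
    """
    weights = [1.0 / e**2 for e in errors]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    error = 1.0 / math.sqrt(total)
    chi2 = sum(w * (x - mean)**2 for w, x in zip(weights, values))
    scale = math.sqrt(chi2 / (len(values) - 1))   # needs at least two inputs
    return mean, error, scale, error * max(scale, 1.0)

# Two made-up measurements that disagree by four standard deviations each way
print(scaled_average([1.0, 1.4], [0.1, 0.1]))
```

For these two made-up inputs the average is 1.2 with an uncertainty of about 0.07, which the factor \(S\simeq 2.8\) inflates to 0.2. Averaging first and fitting later therefore hides the tension that a global fit with the individual inputs would keep.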
In the case of the Rfit approach adopted by the CKMfitter group [15, 16], a specific recipe was chosen to avoid underestimating combined uncertainties in the case of marginally compatible values. The idea is to first combine the statistical uncertainties by combining the likelihoods restricted to their statistical part, and then to assign to this combination the smallest of the individual theoretical uncertainties. This is justified by the following two points: the present state of the art is assumed not to allow one to reach a better theoretical accuracy than the best of all estimates, and this best estimate should not be penalized by less precise methods. In contrast with the plain (or naive) Rfit approach for averages (consisting in just combining Rfit likelihoods without further treatment), this method of combining uncertainties was called educated Rfit and is used by the CKMfitter group for averages [17, 19, 22]. Let us note finally that the calculation of pull values, discussed in Sect. 6.3, is a crucial step for assessing the size of discrepancies.
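The uncertainty part of this recipe can be sketched in a few lines. The code assumes Gaussian statistical uncertainties combined with inverse-variance weights and, as an assumption on our side, that the several theoretical sources of a given input are added linearly. The central value, which comes from the full Rfit likelihood, is not computed. The inputs are the \(B_K\) uncertainties listed in Sect. 7.1, and the function name is illustrative.

```python
import math

def educated_rfit_errors(stat, theo_sources):
    """Return (sigma, Delta) for an educated-Rfit-like combination.

    sigma: inverse-variance combination of the statistical uncertainties.
    Delta: smallest individual theoretical uncertainty, each taken as the
           linear sum of its sources (assumption).
    """
    sigma = 1.0 / math.sqrt(sum(1.0 / s**2 for s in stat))
    delta = min(sum(sources) for sources in theo_sources)
    return sigma, delta

stat = [0.019, 0.0028, 0.0059, 0.008, 0.0034]
theo = [
    [0.003, 0.007, 0.003, 0.008, 0.005],
    [0.0045, 0.0033, 0.0039, 0.0006, 0.0134],
    [0.0022, 0.0008, 0.0006, 0.0006, 0.0002, 0.0056],
    [0.007, 0.003, 0.012],
    [0.0237, 0.0048, 0.0005, 0.0108, 0.0022, 0.0016, 0.0005],
]
sigma, delta = educated_rfit_errors(stat, theo)
print(sigma, delta)
```

With these inputs the sketch gives a statistical uncertainty close to 0.0020 and a theoretical uncertainty of 0.0100, in line with the educated Rfit row quoted for \(B_K\) in Sect. 7.1.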
6.3 Global fit
6.3.1 Estimators and errors
Another prominent example of a multidimensional problem is the extraction of a constraint on a particular parameter of the model from the measured observables. If the model is linear, Eq. (38), the discussion follows closely that of Sect. 6.2.2. In the case where there is a single parameter of interest \(\mu \), we do not write the calculations explicitly and refer to Sect. 7 for numerical examples.
6.4 Goodness-of-fit
6.5 Pull parameters
In addition to the general indication given by goodness-of-fit indicators, it is useful to determine the agreement between individual measurements and the model. One way of quantifying this agreement consists in determining the pull of each quantity. Indeed, the agreement between the indirect fit prediction and the direct determination of some observable X is measured by its pull, which can be determined by considering the difference of minimum values of the test statistic including or not the observables [22]. In the absence of non-Gaussian effects or correlations, the pulls are random variables of vanishing mean and unit variance.
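For purely Gaussian, uncorrelated inputs to an average, this definition of the pull can be sketched as follows. The helper names are ours, and the conversion of the difference of minima into a significance assumes a \(\chi ^2\) law with one degree of freedom, for which the significance in Gaussian units is the square root of the difference.

```python
import math

def t_min(values, variances):
    """Minimum over mu of the quadratic statistic sum_i (x_i - mu)^2 / var_i."""
    weights = [1.0 / v for v in variances]
    mu_hat = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return sum(w * (x - mu_hat)**2 for w, x in zip(weights, values))

def pull_significance(values, variances, i):
    """Significance of the pull of input i: square root of the difference
    between the minima of the statistic with and without that input.
    At least two inputs are required."""
    rest_values = values[:i] + values[i + 1:]
    rest_variances = variances[:i] + variances[i + 1:]
    diff = t_min(values, variances) - t_min(rest_values, rest_variances)
    return math.sqrt(max(diff, 0.0))

# Made-up example: the third input sits one unit away from the other two
print(pull_significance([1.0, 1.0, 2.0], [1.0, 1.0, 1.0], 2))
```

In this made-up example the difference of minima is 2/3, which is also the squared distance between the third input and the average of the other two divided by the sum of their variances (1 and 0.5).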
6.6 Conclusions of the multidimensional case
We have discussed several situations where a multidimensional approach is needed in phenomenological analyses. In addition to the issues already encountered in one dimension, a further arbitrary choice must be made in the multidimensional case for nuisance and external approaches concerning the shape of the volume in which the biases are varied: two simple cases are given by the hypercube and the hyperball, corresponding, respectively, to the well-known linear and quadratic combination of uncertainties. We have then discussed how to average two (or several) measurements, emphasizing the case of the nuisance approach. We have finally illustrated how a fit could be performed in order to determine confidence regions. Beyond the metrology of the model, we can also determine the agreement between model and experiments thanks to the pull parameters associated with each observable.
The unidimensional case (stationarity of the quadratic test statistic under minimization, coverage properties) has led us to prefer the adaptive nuisance approach, even though the fixed nuisance approach could also be considered. In the multidimensional case, the hyperball in conjunction with the quadratic test statistic allows us to keep associativity when performing averages, so that it is rigorously equivalent from the statistical point of view to keep several measurements of a given observable or to average them in a single value. We have also been able to discuss theoretical correlations using the hyperball case at two different stages: including the correlations among observables in the domain of variations of the biases when computing the errors \(\varDelta \), and providing a meaningful definition for the theoretical correlation among parameters of the fit. We have not found a way to keep these properties in the case of the hypercube. Moreover, choosing the hypercube may favor best-fit configurations where all the biases are at the border of their allowed regions, whereas the hyperball prevents such ‘fine-tuned’ solutions from occurring.
For comparison, in the following we will focus on two nuisance approaches: fixed 1-hypercube and adaptive hyperball, with a preference for the latter. The other combinations would yield far too conservative (adaptive hypercube) or too liberal (fixed 1-hyperball) ranges of variations for the biases.
7 CKM-related examples
We will now consider the differences between the various approaches considered using several examples from quark flavor physics. These examples will be only for illustrative purposes, and we refer the reader to other work [15, 16, 22, 35] for a more thorough discussion of the physics and the inputs involved. From the previous discussion, we could consider a large set of approaches for theoretical uncertainties.
We will restrict ourselves to a few cases compared to the previous sections. First, we will consider educated Rfit (Rfit with specific treatment of uncertainties for averages), as used by the CKMfitter analyses and described in Sect. 6.2.4, while the naive Rfit approach will only be shown for the sake of comparison and is not understood as an appropriate model. We will also consider two nuisance approaches, namely the adaptive hyperball and the 1-hypercube cases. Our examples will be chosen in the context of CKM fits, and correspond approximately to the situation for Summer 2014 conferences. However, for pedagogical purposes, we have intentionally simplified some of the inputs compared to actual phenomenological analyses performed in flavor physics [35].
7.1 Averaging theorydominated measurements
Table 6 Top: lattice determinations of the kaon bag parameter \(B_K^{\overline{\mathrm{MS}}}(2\,\mathrm{GeV})\). Middle: averages according to the various methods, and corresponding confidence intervals for various significances. Bottom: pulls associated to each measurement for each method. For Rfit methods, we quote only the significance of the pull, whereas other methods yield the pull parameter as well as the pull itself under the form \(p\pm \sigma \pm \varDelta \) (significance of the pull)
References  \(N_f\)  Mean  Stat  Theo  

ETMC10 [41]  2  0.532  ±0.019  \(\pm 0.003\pm 0.007\pm 0.003\pm 0.008\pm 0.005\)  
LVdW11 [42]  2 + 1  0.5572  ±0.0028  \(\pm 0.0045\pm 0.0033\pm 0.0039\pm 0.0006\pm 0.0134\)  
BMW11 [43]  2 + 1  0.5644  ±0.0059  \(\pm 0.0022\pm 0.0008\pm 0.0006\pm 0.0006\pm 0.0002\pm 0.0056\)  
RBC-UKQCD12 [44]  2 + 1  0.554  ±0.008  \(\pm 0.007 \pm 0.003\pm 0.012\)  
SWME14 [45]  2 + 1  0.5388  ±0.0034  \(\pm 0.0237\pm 0.0048\pm 0.0005\pm 0.0108\pm 0.0022\pm 0.0016\pm 0.0005\) 
Method  Average  1\(\sigma \) CI  2\(\sigma \) CI  3\(\sigma \) CI  5\(\sigma \) CI 

nG  \(0.5577 \pm 0.0063 \pm 0\)  \(0.5577 \pm 0.0063\)  \(0.5577 \pm 0.0126\)  \(0.5577 \pm 0.0189\)  \(0.5577 \pm 0.0315\) 
Naive Rfit  \(0.5562\pm 0.0120 \pm 0.0018 \)  \(0.5562 \pm 0.0138\)  \(0.5562 \pm 0.0258\)  \(0.5562 \pm 0.0379\)  \(0.5562 \pm 0.0619\) 
Educ Rfit  \(0.5562 \pm 0.0020 \pm 0.0100\)  \(0.5562 \pm 0.0120\)  \(0.5562 \pm 0.0139\)  \(0.5562 \pm 0.0159\)  \(0.5562 \pm 0.0198\) 
1-hypercube  \(0.5577\pm 0.0038 \pm 0.0176\)  \(0.5577 \pm 0.0193\)  \(0.5577\pm 0.0240\)  \(0.5577 \pm 0.0281\)  \(0.5577 \pm 0.0360\) 
Adapt hyperball  \(0.5577\pm 0.0038 \pm 0.0050\)  \(0.5577 \pm 0.0068\)  \(0.5577 \pm 0.0165\)  \(0.5577 \pm 0.0257\)  \(0.5577 \pm 0.0436\) 
Pull  nG  (e)Rfit  1-hypercube  Adaptive hyperball  

ETMC10  \(1.22\pm 1.04\pm 0\ (1.2\sigma )\)  \((0.0 \sigma )\)  \(1.22\pm 0.85\pm 1.88\ (0.3\sigma )\)  \(1.22\pm 0.85\pm 0.60\ (1.1\sigma )\)  
LVdW11  \(0.04\pm 1.10\pm 0\ (0.0\sigma )\)  \((0.0 \sigma )\)  \(0.04\pm 0.35\pm 2.71\ (0.0\sigma )\)  \(0.04\pm 0.35\pm 1.04\ (0.1\sigma )\)  
BMW11  \(\ 1.74\pm 1.49\pm 0\ (1.2\sigma )\)  \((0.0 \sigma )\)  \(\ 1.74\pm 0.86\pm 4.32\ (0.0\sigma )\)  \(\ 1.74\pm 0.86 \pm 1.21 \ (1.0 \sigma )\)  
RBC-UKQCD12  \(0.27\pm 1.08\pm 0\ (0.2\sigma )\)  \((0.0 \sigma )\)  \(0.27\pm 0.55\pm 2.38\ (0.0\sigma )\)  \(0.27\pm 0.56\pm 0.93\ (0.4\sigma )\)  
SWME14  \(0.75\pm 1.03\pm 0\ (0.7\sigma )\)  \((0.0 \sigma )\)  \(0.75\pm 0.19\pm 2.24\ (0.0\sigma )\)  \(0.75\pm 0.19\pm 1.01\ (0.7\sigma )\) 
The results for each method are given in Table 6 (middle). The first column corresponds to the outcome of the averaging procedure. In all the approaches considered, we can split statistical and theoretical uncertainties. In the case of naive Rfit, one combines the measurements by adding the well statistic corresponding to each measurement: the resulting test statistic T is a well with a bottom, the width of which can be interpreted as a theoretical uncertainty, whereas the width at \(T_{\min }+1\) determines the statistical uncertainty.^{17} The case of educated Rfit was described in Sect. 6.2.4. The confidence intervals are obtained from the p value determined from the “average” column.
We compute the pulls in the same way in both cases, interpreting the difference of \(T_\mathrm{min}\) with and without the observables as a random variable distributed according to a \(\chi ^2\) law with \(N_\mathrm{dof}=1\). The propagation of uncertainties for the quadratic statistic was detailed in Sects. 6.2.1 and 6.2.2 where the separate extraction of statistical and theoretical uncertainties was described. The tables are obtained by plugging the average into the onedimensional p value associated with the method, and reading from the p value the corresponding confidence interval at the chosen significance. The associated pulls are given in Table 6 (bottom).
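The averages quoted in Table 6 for the two nuisance approaches can be reproduced, to the quoted precision, with the following sketch. It relies on assumptions that we infer rather than take from the text of this section: the inputs are treated as uncorrelated, the weights are \(w_i=1/(\sigma _i^2+\varDelta _i^2)\) with \(\varDelta _i^2\) the quadratic sum of the theoretical sources, the hyperball uncertainty combines all sources quadratically and the hypercube uncertainty combines them linearly. The function name is ours.

```python
import math

# (mean, stat, [theoretical sources]) transcribed from the top part of Table 6
BK = [
    (0.532, 0.019, [0.003, 0.007, 0.003, 0.008, 0.005]),
    (0.5572, 0.0028, [0.0045, 0.0033, 0.0039, 0.0006, 0.0134]),
    (0.5644, 0.0059, [0.0022, 0.0008, 0.0006, 0.0006, 0.0002, 0.0056]),
    (0.554, 0.008, [0.007, 0.003, 0.012]),
    (0.5388, 0.0034, [0.0237, 0.0048, 0.0005, 0.0108, 0.0022, 0.0016, 0.0005]),
]

def nuisance_average(inputs):
    """Return (mu_hat, sigma_mu, Delta_hyperball, Delta_hypercube)."""
    weights = [1.0 / (s**2 + sum(d**2 for d in src)) for _, s, src in inputs]
    total = sum(weights)
    mu = sum(w * x for w, (x, _, _) in zip(weights, inputs)) / total
    sigma = math.sqrt(sum((w * s)**2 for w, (_, s, _) in zip(weights, inputs))) / total
    # Quadratic combination of every theoretical source (hyperball)
    ball = math.sqrt(sum(w**2 * sum(d**2 for d in src)
                         for w, (_, _, src) in zip(weights, inputs))) / total
    # Linear combination of every theoretical source (hypercube)
    cube = sum(w * sum(src) for w, (_, _, src) in zip(weights, inputs)) / total
    return mu, sigma, ball, cube

mu, sigma, ball, cube = nuisance_average(BK)
print(mu, sigma, ball, cube, math.hypot(sigma, ball))
```

Under these assumptions the sketch returns values that round to \(0.5577\pm 0.0038\pm 0.0050\) (hyperball) and \(0.5577\pm 0.0038\pm 0.0176\) (hypercube), and the quadratic sum of the first two uncertainties rounds to the 0.0063 quoted in the nG row.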
Table 7 Top: lattice determinations of the \(D_s\)-meson decay constant \(f_{D_s}\) (in MeV). Middle: averages according to the various methods, and corresponding confidence intervals for various significances. Bottom: pull associated to each measurement for each method. For Rfit methods, we quote only the significance of the pull, whereas other methods yield the pull parameter as well as the pull itself under the form \(p\pm \sigma \pm \varDelta \) (significance of the pull)
References  \(N_f\)  Mean  Stat  Theo  

ETMC09 [46]  2  244  ±3  \(\pm 2 \pm 7\)  
HPQCD10 [47]  2 + 1  248.0  ±1.4  \(\pm 0.4 \pm 1.4 \pm 1.0 \pm 0.8 \pm 0.3 \pm 0.3 \pm 0.3\)  
FNAL-MILC11 [48]  2 + 1  260.1  ±8.9  \(\pm 2.2 \pm 1.6\pm 1.0\pm 1.4 \pm 2.8 \pm 2.0 \pm 3.4 \pm 1.8\)  
FNAL-MILC14 [49]  2 + 1 + 1  248.8  ±0.3  \(\pm 1.2 \pm 0.2\pm 0.1 \pm 0.4\)  
ETMC14 [50]  2 + 1 + 1  247.2  ±3.9  \(\pm 0.7 \pm 1.2 \pm 0.3\) 
Method  Average  1\(\sigma \) CI  2\(\sigma \) CI  3\(\sigma \) CI  5\(\sigma \) CI 

nG  \(248.5\pm 1.1 \pm 0 \)  \(248.5 \pm 1.1\)  \(248.5 \pm 2.2\)  \(248.5 \pm 3.3\)  \(248.5 \pm 5.5\) 
Naive Rfit  \(248.1\pm 0.9 \pm 1.3 \)  \(248.1\pm 2.2\)  \(248.1 \pm 3.1\)  \(248.1 \pm 4.1\)  \(248.1\pm 5.9\) 
Educ Rfit  \(248.1\pm 0.3 \pm 1.9\)  \(248.1 \pm 2.2\)  \(248.1 \pm 2.5\)  \(248.1 \pm 2.8\)  \(248.1 \pm 3.4\) 
1-hypercube  \(248.5 \pm 0.5 \pm 2.7 \)  \(248.5 \pm 3.0\)  \(248.5 \pm 3.5\)  \(248.5 \pm 4.0\)  \(248.5 \pm 5.0\) 
Adapt hyperball  \(248.5\pm 0.5 \pm 1.0 \)  \(248.5 \pm 1.2\)  \(248.5 \pm 2.8\)  \(248.5 \pm 4.3\)  \(248.5 \pm 7.2\) 
Pull  nG  (e)Rfit  1-hypercube  Adaptive hyperball  

ETMC09  \(0.59\pm 1.01\pm 0\ (0.6\sigma )\)  \((0.0 \sigma )\)  \(0.59\pm 0.39\pm 1.47\ (0.0\sigma )\)  \(0.59\pm 0.39\pm 0.93\ (0.6\sigma )\)  
HPQCD10  \(0.28\pm 1.12\pm 0\ (0.3\sigma )\)  \((0.0 \sigma )\)  \(0.28\pm 0.60\pm 2.77 \ (0.0\sigma )\)  \(0.28\pm 0.60\pm 0.95\ (0.4\sigma )\)  
FNAL-MILC11  \(1.08\pm 1.00\pm 0\ (1.1\sigma )\)  \((0.0 \sigma )\)  \(1.08\pm 0.82\pm 1.74 \ (0.3\sigma )\)  \(1.08\pm 0.83\pm 0.57\ (1.0\sigma )\)  
FNAL-MILC14  \(0.63\pm 1.82\pm 0\ (0.3\sigma )\)  \((0.0 \sigma )\)  \(0.63\pm 1.05\pm 4.97 \ (0.0\sigma )\)  \(0.63\pm 1.05\pm 1.48 \ (0.5\sigma )\)  
ETMC14  \(0.35\pm 1.04\pm 0\ (0.3\sigma )\)  \((0.0 \sigma )\)  \(0.35\pm 0.94\pm 1.20\ (0.2\sigma )\)  \(0.35\pm 0.94\pm 0.43 \ (0.4\sigma )\) 
For both quantities \(B_K\) and \(f_{D_s}\) at large confidence level (3\(\sigma \) and above), the most conservative method is the adaptive hyperball nuisance approach, whereas the one leading to the smallest uncertainties is the educated Rfit approach. Below 3\(\sigma \), the 1-hypercube approach is more conservative than the adaptive hyperball nuisance approach, and it becomes less conservative above that threshold. The most important differences are observed at large CL/significance. The statistical uncertainty obtained in the nG approach is by construction identical to the combination in quadrature of the statistical and theoretical uncertainties obtained in the adaptive hyperball approach. However, one can notice that the confidence intervals for high significances in the two approaches are different, with nG being less conservative. The overall very good agreement of lattice determinations means vanishing pulls for Rfit methods (since all the wells have a common bottom with a vanishing \(T_\mathrm{min}\)). For the other methods, the pull parameter has statistical and theoretical errors of similar size in the adaptive hyperball case, whereas theoretical errors tend to dominate in the 1-hypercube method. This yields smaller pulls in the latter approach.
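The crossover between the two nuisance approaches can be reproduced from the \(B_K\) averages of Table 6 with a short sketch. It assumes that the one-dimensional nuisance p value is the largest Gaussian tail probability of \(|X-\mu |\) when the bias is varied over \([-r\varDelta ,+r\varDelta ]\), reached at the edge of the range, with \(r=1\) for the fixed 1-hypercube case and \(r=k\) for the adaptive case at \(k\sigma \). This form is our reading of the earlier sections, and the function names are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)

def nuisance_p(t, sigma, half_range):
    """p value for |X - mu| = t with the bias varied over [-half_range, +half_range]."""
    return 0.5 * (math.erfc((t + half_range) / (SQRT2 * sigma))
                  + math.erfc((t - half_range) / (SQRT2 * sigma)))

def half_width(k, sigma, Delta, adaptive):
    """Half-width of the k-sigma confidence interval, found by bisection."""
    alpha = math.erfc(k / SQRT2)                  # p value of a k-sigma effect
    half_range = k * Delta if adaptive else Delta
    lo, hi = 0.0, half_range + 40.0 * sigma       # p(lo) = 1, p(hi) is negligible
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if nuisance_p(mid, sigma, half_range) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Rounded B_K averages from Table 6 (sigma, Delta)
for k in (1, 2, 3, 5):
    cube = half_width(k, 0.0038, 0.0176, adaptive=False)
    ball = half_width(k, 0.0038, 0.0050, adaptive=True)
    print(k, round(cube, 4), round(ball, 4))
```

Starting from the rounded averages, the half-widths agree with the confidence intervals of Table 6 within about \(10^{-4}\). The fixed range dominates at low significance, while the adaptive range, which grows linearly with \(k\), takes over at high significance.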
A last illustration, which does not come solely from lattice simulations, is provided by the determination of the strong coupling constant \(\alpha _S(M_Z)\). The subject is covered extensively by recent reviews [34, 51], and we stress that we do not claim to provide an accurate alternative average to these reviews, which would require a careful assessment of the various determinations and their correlations. As a purely illustrative example, we will focus on the average of determinations from \(e^+e^-\) annihilation under a set of simplistic hypotheses for the separation between statistical and theoretical uncertainties. In order to allow for a closer comparison with Refs. [34, 62], we try to assess correlations this time. We assume that theoretical uncertainties for the same set of observables (\( j \& s\), 3j, T), but from different experiments, are 100% correlated, and the statistical uncertainties for determinations from similar experimental data are 100% correlated (BST, DWT, AFHMST).^{18}
We perform the average in the different cases considered, see Table 8 (middle), which are represented graphically in Fig. 8 (a similar plot at \(3\sigma \) is given in Fig. 13 in Appendix D). We notice that the various approaches yield results with similar central values to the nG case. The pulls for individual quantities are mostly around 1\(\sigma \), and they are smaller in the adaptive hyperball approach compared to the nG one, showing better consistency. Refs. [34, 62] take a different approach, “range averaging”, which amounts to considering the spread of the central values for the various determinations, leading to \(\alpha _S(M_Z)=0.1174 \pm 0.0051\) for the determination from \(e^+e^-\) annihilation data considered here [62]. This approach is motivated in Ref. [34] by the complicated pattern of correlations and the limited compatibility between some of the inputs and, more importantly, it does not take into account that the different determinations have different accuracies according to the uncertainties quoted. The approach in Refs. [34, 62] conservatively accounts for the possibility that some uncertainties are underestimated. On the contrary, our averages given in Table 8 and Fig. 8 assume that all the inputs should be taken into account and averaged according to the uncertainties given in the original articles. The difference in the underlying hypotheses for the averages explains the large difference observed between our results and the ones in Refs. [34, 62]. Note, however, that our numerics directly follow from the use of the different averaging methods, and lack the necessary critical assessment of the individual determinations of \(\alpha _S(M_Z)\) performed in Refs. [34, 62].
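A literal reading of “spread of the central values” is the midpoint of the central values with half their range as uncertainty. This reading is an assumption on our side and not necessarily the exact procedure of Refs. [34, 62], but the short sketch below shows that it matches the quoted number up to rounding when applied to the central values listed in Table 8.

```python
# Central values of alpha_S(M_Z) transcribed from the top part of Table 8
central = [0.1224, 0.1189, 0.1172, 0.1175, 0.1199,
           0.1172, 0.1165, 0.1135, 0.1134, 0.1123]

# Midpoint and half-range of the central values (our reading of the recipe)
midpoint = 0.5 * (max(central) + min(central))
half_range = 0.5 * (max(central) - min(central))
print(midpoint, half_range)
```

The individual uncertainties play no role here, which is the feature contrasted in the text with the averages of Table 8.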
7.2 Averaging incompatible or barely compatible measurements
Another important issue occurs when one wants to combine barely compatible measurements. This is for instance the case for \(V_{ub}\) and \(V_{cb}\) from semileptonic decays, where inclusive and exclusive determinations are not in very good agreement. The list of determinations used for illustrative purposes and the results for each method are given in Tables 9 and 10, together with the corresponding graphical comparisons in Fig. 9 (a similar plot at \(3\sigma \) is given in Fig. 14 in Appendix D). Our inputs are slightly different from Ref. [36] for several reasons. The inclusive determination of \(V_{ub}\) corresponds to the BLNP approach [64], and we consider the theoretical uncertainties from shape functions (leading and subleading), weak annihilation, and heavy-quark expansion uncertainties on matching and \(m_b\). We use only branching fractions measured for \(B\rightarrow \pi \ell \nu \) and average the unquenched lattice calculations quoted in Ref. [36]. For \(V_{cb}\) exclusive we also split the various sources of theoretical uncertainties coming from the determination of the form factors. We assume that there are no correlations among all these uncertainties.
The lack of compatibility between the two types of determination means in particular that the naive Rfit combined likelihood has no flat bottom, and thus no theoretical uncertainty. This behavior was one of the reasons to propose the educated Rfit approach, where the theoretical uncertainty of the combination cannot be smaller than the smallest of the individual theoretical uncertainties.
The same pattern of conservative and aggressive approaches can be observed, with a fairly good agreement at the 3\(\sigma \) level (apart from the naive Rfit approach, already discussed). At 5\(\sigma \), the adaptive hyperball proves again rather conservative, even though the theoretical errors of the averages are smaller than in the 1-hypercube nuisance and the educated Rfit approaches. The analysis of the pulls yields similar conclusions, with discrepancies at the 2\(\sigma \) level for \(V_{ub}\) and between 2 and 3\(\sigma \) for \(V_{cb}\). Once again, theoretical errors for the pull parameters are larger in the 1-hypercube approach than in the adaptive hyperball case. Let us also notice that in both cases, there are only two quantities to combine, so that the two pull parameters are by construction opposite to each other up to an irrelevant scaling factor, leading to the same pull for both quantities.
Table 8 Top: determinations of \(\alpha _S(M_Z)\) using \( e^+ e^- \) annihilation, taken from Ref. [34]. Middle: averages for \(\alpha _S(M_Z)\) from \(e^+e^-\) annihilation according to the various methods, and corresponding confidence intervals for various significances. Bottom: pull associated to each measurement for each method. For Rfit methods, we quote only the significance of the pull, whereas other methods yield the pull parameter as well as the pull itself under the form \(p\pm \sigma \pm \varDelta \)
References  Mean  Stat (\( \times 10^{-3} \))  Theo (\( \times 10^{-3} \))  

ALEPHj & s [52]  0.1224  \( \pm \)0.9 \( \pm \) 0.9 \( \pm \)1.2  \( \pm \)3.5  
OPALj & s [53]  0.1189  \( \pm \)0.8 \( \pm \) 1.6 \( \pm \)1.0  \( \pm \)3.6  
JADEj & s [54]  0.1172  \( \pm \)0.6 \( \pm \) 2.0 \( \pm \)3.5  \( \pm \)3.0  
Dissertori3j [55]  0.1175  \( \pm \)2.0  \( \pm \)1.5  
JADE3j [56]  0.1199  \( \pm \)1.0 \( \pm \) 2.1 \( \pm \) 5.4  \( \pm \)0.7  
BST [57]  0.1172  \( \pm \)1.0 \( \pm \) 0.8  \( \pm \) 1.2 \( \pm \)1.2  
DWT [58]  0.1165  \( \pm \)2.2  \( \pm \) 1.7  
AFHMST [59]  0.1135  \( \pm \)0.2  \( \pm \) 0.5 \( \pm \)0.9  
GLMT [60]  0.1134  \( \pm \) 2.5  ± 0.6  
HKMSC [61]  0.1123  \( \pm \)0.2  \( \pm \)0.7 \( \pm \)1.4 
Method  Average  1\(\sigma \) CI  2\(\sigma \) CI  3\(\sigma \) CI  5\(\sigma \) CI 

nG  \(0.1143 \pm 0.0010 \pm 0\)  \(0.1143 \pm 0.0010\)  \(0.1143 \pm 0.0020\)  \(0.1143 \pm 0.0030\)  \(0.1143 \pm 0.0050\) 
Naive Rfit  \(0.1145\pm 0.0002 \pm 0\)  \(0.1145 \pm 0.0002\)  \(0.1145 \pm 0.0004\)  \(0.1145 \pm 0.0006\)  \(0.1145 \pm 0.0011\) 
Educ Rfit  \(0.1145\pm 0.0001 \pm 0.0006\)  \(0.1145 \pm 0.0007\)  \(0.1145 \pm 0.0009\)  \(0.1145 \pm 0.0010\)  \(0.1145 \pm 0.0013\) 
1-hypercube  \(0.1143\pm 0.0005 \pm 0.0018\)  \(0.1143 \pm 0.0020\)  \(0.1143 \pm 0.0026\)  \(0.1143 \pm 0.0031\)  \(0.1143 \pm 0.0041\) 
Adapt hyperball  \(0.1143\pm 0.0005 \pm 0.0009\)  \(0.1143 \pm 0.0011\)  \(0.1143 \pm 0.0026\)  \(0.1143 \pm 0.0039\)  \(0.1143 \pm 0.0067\) 
Pull  nG  (e)Rfit  1-hypercube  Adaptive hyperball  

ALEPH-j & s  \(1.30 \pm 0.69 \pm 0\ (1.9\sigma )\)  (2.5\(\sigma \))  \(1.30 \pm 0.26 \pm 0.91\ (1.8\sigma )\)  \(1.30 \pm 0.26 \pm 0.63\ (1.6\sigma )\)  
OPAL-j & s  \(0.93 \pm 0.69 \pm 0\ (1.3\sigma )\)  (0.4\(\sigma \))  \(0.93 \pm 0.29 \pm 0.89\ (0.7\sigma )\)  \(0.93 \pm 0.29 \pm 0.63\ (1.2\sigma )\)  
JADE-j & s  \(0.76 \pm 0.79 \pm 0\ (0.9\sigma )\)  (0.0\(\sigma \))  \(0.76 \pm 0.55 \pm 0.84\ (0.6\sigma )\)  \(0.76 \pm 0.55 \pm 0.57\ (0.9\sigma )\)  
Dissertori-3j  \(1.13 \pm 0.77 \pm 0\ (1.4\sigma )\)  (0.9\(\sigma \))  \(1.13 \pm 0.58 \pm 0.95\ (0.9\sigma )\)  \(1.13 \pm 0.58 \pm 0.51\ (1.3\sigma )\)  
JADE-3j  \(1.10 \pm 1.00 \pm 0\ (1.1\sigma )\)  (0.8\(\sigma \))  \(1.10 \pm 0.98 \pm 0.46\ (1.0\sigma )\)  \(1.10 \pm 0.98 \pm 0.22\ (1.1\sigma )\)  
BS-T  \(0.36 \pm 0.92 \pm 0\ (0.4\sigma )\)  (0.2\(\sigma \))  \(0.36 \pm 0.88 \pm 0.41\ (0.4\sigma )\)  \(0.36 \pm 0.88 \pm 0.26\ (0.4\sigma )\)  
DW-T  \(0.15 \pm 0.97 \pm 0\ (0.2\sigma )\)  (0.1\(\sigma \))  \(0.15 \pm 0.96 \pm 0.18\ (0.2\sigma )\)  \(0.15 \pm 0.96 \pm 0.10\ (0.2\sigma )\)  
AFHMS-T  \(0.24 \pm 0.78 \pm 0\ (0.3\sigma )\)  (0.0\(\sigma \))  \(0.24 \pm 0.57 \pm 1.00\ (0.1\sigma )\)  \(0.24 \pm 0.57 \pm 0.53\ (0.4\sigma )\)  
GLM-T  \(0.29 \pm 0.95 \pm 0\ (0.3\sigma )\)  (0.2\(\sigma \))  \(0.28 \pm 0.88 \pm 0.73\ (0.2\sigma )\)  \(0.28 \pm 0.88 \pm 0.36\ (0.3\sigma )\)  
HKMS-C  \(2.27 \pm 1.35 \pm 0\ (1.7\sigma )\)  (1.4\(\sigma \))  \(2.27 \pm 0.72 \pm 2.27\ (0.7\sigma )\)  \(2.27 \pm 0.72 \pm 1.14\ (1.4\sigma )\) 
7.3 Averaging quantities dominated by different types of uncertainties
In order to illustrate the role played by statistical and theoretical uncertainties, we consider the question of averaging quantities dominated by one or the other. This happens for instance when one wants to compare a theoretically clean determination with other determinations potentially affected by large theoretical uncertainties. This situation occurs in flavor physics, for instance when one compares the extraction of \(\sin (2\beta )\) from time-dependent asymmetries in \(b\rightarrow c\bar{c}s\) and \(b\rightarrow q\bar{q}s\) decays (let us recall that, for the CKM global fit, only the charmonium input is used for \(\sin (2\beta )\)). The former have a very small penguin pollution, which we will neglect, whereas the latter are significantly affected by such a pollution. The corresponding estimates of \(\sin (2\beta )\) have large theoretical uncertainties, and for illustration we use the computation done in Ref. [63].
Top: determinations of \(V_{ub}\cdot 10^3\) from semileptonic decays. Middle: averages according to the various methods, and corresponding confidence intervals for various significances. Bottom: pulls associated to each determination for each method. For Rfit methods, we quote only the significance of the pull, whereas other methods yield the pull parameter as well as the pull itself under the form \(p\pm \sigma \pm \varDelta \) (significance of the pull)
Reference  Mean  Stat  Theo  

Exclusive  
CKMfitter Summer 14  3.28  ±0.15  ± 0.26  
Inclusive  
CKMfitter Summer 14  4.359  ±0.180  \(\pm 0.013 \pm 0.027 \pm 0.037\pm 0.161 \pm 0.200\) 
Method  Average  1\(\sigma \) CI  2\(\sigma \) CI  3\(\sigma \) CI  5\(\sigma \) CI 

nG  \(3.79\pm 0.22 \pm 0 \)  \( 3.79 \pm 0.22\)  \( 3.79 \pm 0.44\)  \( 3.79 \pm 0.65\)  \( 3.79 \pm 1.1\) 
Naive Rfit  \(3.70\pm 0.12 \pm 0\)  \(3.70 \pm 0.12\)  \(3.70 \pm 0.23\)  \(3.70 \pm 0.35\)  \(3.70 \pm 0.58\) 
Educ Rfit  \(3.70\pm 0.11 \pm 0.26\)  \(3.70 \pm 0.38\)  \(3.70 \pm 0.49\)  \(3.70 \pm 0.61\)  \(3.70 \pm 0.84\) 
1-hypercube  \( 3.79\pm 0.12 \pm 0.34\)  \( 3.79 \pm 0.40\)  \( 3.79 \pm 0.54\)  \( 3.79 \pm 0.67\)  \( 3.79 \pm 0.91\) 
Adapt hyperball  \( 3.79\pm 0.12 \pm 0.18 \)  \( 3.79 \pm 0.24\)  \( 3.79 \pm 0.57\)  \( 3.79 \pm 0.88\)  \( 3.79 \pm 1.49\) 
Pull  nG  (e)Rfit  1-hypercube  Adaptive hyperball  

Exclusive  \(3.60\pm 1.46\pm 0 \ (2.5\sigma )\)  \((1.6 \sigma )\)  \(3.60\pm 0.78\pm 2.31 \ (1.9\sigma )\)  \(3.60\pm 0.78\pm 1.23 \ (1.9\sigma )\)  
Inclusive  \(3.40\pm 1.38\pm 0 \ (2.5\sigma )\)  \((1.6 \sigma )\)  \(3.40\pm 0.74\pm 2.20 \ (1.9\sigma )\)  \(3.40\pm 0.74\pm 1.16 \ (1.9\sigma )\) 
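The central values and error budgets of the nG, 1-hypercube and adaptive hyperball averages in the table above can be recovered with a few lines of arithmetic. The sketch below is our reading of the numbers, not a transcription of the formulas of the paper: it assumes weights proportional to the inverse of the quadratic sum of all errors of each input, a linear combination of theoretical errors for the hypercube and a quadratic one for the hyperball. The function name `combine` and the tuple layout are ours.

```python
import math


def combine(measurements):
    """Average inputs given as (mean, stat, [theo_1, theo_2, ...]).

    Returns (mean, stat, theo_hypercube, theo_hyperball, naive_gaussian_error).
    """
    # Assumed weights: inverse of stat^2 + sum of theo_k^2 for each input.
    variances = [s**2 + sum(t**2 for t in theos) for _, s, theos in measurements]
    inverse = [1.0 / v for v in variances]
    weights = [w / sum(inverse) for w in inverse]

    mean = sum(w * m for w, (m, _, _) in zip(weights, measurements))
    stat = math.sqrt(sum((w * s)**2 for w, (_, s, _) in zip(weights, measurements)))
    # Hypercube: all theoretical errors are added linearly.
    theo_cube = sum(w * sum(theos) for w, (_, _, theos) in zip(weights, measurements))
    # Hyperball: all theoretical errors are added in quadrature.
    theo_ball = math.sqrt(sum(w**2 * sum(t**2 for t in theos)
                              for w, (_, _, theos) in zip(weights, measurements)))
    # Naive Gaussian: a single error, everything in quadrature.
    naive = math.sqrt(stat**2 + theo_ball**2)
    return mean, stat, theo_cube, theo_ball, naive


# V_ub inputs from the table above (units of 10^-3).
vub = [
    (3.28, 0.15, [0.26]),                                  # exclusive
    (4.359, 0.180, [0.013, 0.027, 0.037, 0.161, 0.200]),   # inclusive
]
mean, stat, cube, ball, naive = combine(vub)
print(f"nG:          {mean:.2f} +- {naive:.2f}")
print(f"1-hypercube: {mean:.2f} +- {stat:.2f} +- {cube:.2f}")
print(f"hyperball:   {mean:.2f} +- {stat:.2f} +- {ball:.2f}")
```

Under these assumptions the three lines agree, after rounding, with the nG, 1-hypercube and adaptive hyperball averages quoted for \(V_{ub}\); the same function applied to the \(V_{cb}\) inputs gives a similar agreement.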
7.4 Global fits
In order to illustrate the impact of the treatment of theoretical uncertainties, we consider a global fit including mainly observables that come with a theoretical uncertainty. The list of observables is given in Table 12. Their values are motivated by the CKMfitter inputs used in Summer 2014, but they are used only for purposes of illustration.^{19} We consider two fits: Scenario A involves only constraints dominated by theoretical uncertainties whereas Scenario B includes also constraints from the angles (statistically dominated).
As far as the CKM matrix elements are concerned, the Standard Model is linear, but it is not linear in all the other fundamental parameters of the Standard Model. For the illustrative purposes of this note, the first step thus consists in determining the minimum of the full (non-linear) \(\chi ^2\) and linearizing the Standard Model formulas for the various observables around this minimum (we choose the inputs of Scenario B to determine this point): this defines an exactly linear model, which at this stage should not be used for realistic phenomenology but is useful for the comparison of the methods presented here. One can use the results presented in the previous section in order to determine the p value as a function of each of the parameters of interest. In the case of the nuisance-\(\delta \) approach, we can describe this p value using the same parameters as before, namely a central value, a statistical error and a theoretical error.
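The linearization step can be illustrated on a toy observable. The sketch below is not the CKMfitter implementation: it assumes a single observable, the leading-order Wolfenstein expression \(|V_{cb}|\simeq A\lambda ^2\), a hypothetical expansion point standing in for the \(\chi ^2\) minimum, and a central finite-difference gradient. The helper name `linearize` is ours.

```python
def linearize(f, x0, step=1e-6):
    """Return f(x0), a finite-difference gradient at x0, and the linear model."""
    f0 = f(x0)
    gradient = []
    for i in range(len(x0)):
        up = list(x0)
        down = list(x0)
        up[i] += step
        down[i] -= step
        # Central difference for the partial derivative along parameter i.
        gradient.append((f(up) - f(down)) / (2.0 * step))

    def linear_model(x):
        # First-order Taylor expansion around x0.
        return f0 + sum(g * (xi - x0i) for g, xi, x0i in zip(gradient, x, x0))

    return f0, gradient, linear_model


def vcb(params):
    """Toy observable: |V_cb| ~ A * lambda^2 (leading order only)."""
    A, lam = params
    return A * lam**2


x0 = [0.81, 0.225]  # hypothetical expansion point (A, lambda)
f0, gradient, model = linearize(vcb, x0)

# Compare the exact and linearized observable slightly away from x0.
x = [0.82, 0.226]
print(vcb(x), model(x))
```

The linear model coincides with the exact expression at the expansion point and departs from it at second order in the parameter shifts, which is why the resulting fit is suitable for comparing methods rather than for realistic phenomenology.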
Top: determinations of \(V_{cb}\cdot 10^3\) from semileptonic decays. Middle: averages according to the various methods, and corresponding confidence intervals for various significances. Bottom: pulls associated to each determination for each method. For Rfit methods, we quote only the significance of the pull, whereas other methods yield the pull parameter as well as the pull itself under the form \(p\pm \sigma \pm \varDelta \) (significance of the pull)
Reference  Mean  Stat  Theo  

Exclusive  
CKMfitter Summer 14  38.99  \(\pm 0.49\)  \(\pm 0.04\pm 0.21\pm 0.13\pm 0.39\pm 0.17\pm 0.04\pm 0.19\)  
Inclusive  
CKMfitter Summer 14  42.42  \(\pm 0.44\)  \(\pm 0.74\) 
Method  Average  1\(\sigma \) CI  2\(\sigma \) CI  3\(\sigma \) CI  5\(\sigma \) CI 

nG  \(40.41\pm 0.55 \pm 0\)  \(40.41\pm 0.55\)  \(40.41 \pm 1.11\)  \(40.41 \pm 1.66\)  \(40.41 \pm 2.77\) 
Naive Rfit  \(41.00\pm 0.33 \pm 0 \)  \(41.00\pm 0.32\)  \(41.00 \pm 0.65\)  \(41.00 \pm 0.98\)  \(41.00\pm 1.64\) 
Educ Rfit  \(41.00\pm 0.33 \pm 0.74\)  \(41.00 \pm 1.07\)  \(41.00 \pm 1.39\)  \(41.00 \pm 1.72\)  \(41.00 \pm 2.38\) 
1-hypercube  \(40.41\pm 0.34 \pm 0.99\)  \(40.41 \pm 1.15\)  \(40.41 \pm 1.57\)  \(40.41 \pm 1.94\)  \(40.41\pm 2.65\) 
Adapt hyperball  \(40.41\pm 0.34 \pm 0.44\)  \(40.41 \pm 0.60\)  \(40.41 \pm 1.45\)  \(40.41 \pm 2.26\)  \(40.41 \pm 3.84\) 
Pull  nG  (e)Rfit  1-hypercube  Adaptive hyperball  

Exclusive  \(4.75\pm 1.56\pm 0 \ (3.1\sigma )\)  \((2.3\sigma )\)  \(4.75\pm 0.91\pm 2.65 \ (2.6\sigma )\)  \(4.75\pm 0.91\pm 1.26 \ (2.3\sigma )\)  
Inclusive  \(3.98\pm 1.30\pm 0 \ (3.1\sigma )\)  \((2.3\sigma )\)  \(3.98\pm 0.77\pm 2.22 \ (2.6\sigma )\)  \(3.98\pm 0.77\pm 0.74 \ (2.3\sigma )\) 
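The confidence intervals quoted for the nuisance approaches in the table above can be approximately recovered from the error budget of the average. The sketch below is a simplified reading under stated assumptions: a Gaussian estimator of width \(\sigma \) with a bias confined to \([-\varDelta ,\varDelta ]\), a p value maximized over the bias (the maximum sits at the edge of the range), and, for the adaptive case, a range rescaled to \(k\varDelta \) when testing at \(k\sigma \). The function names are ours, and the inputs are the rounded values from the table.

```python
import math


def gaussian_tail(x):
    """One-sided Gaussian tail probability 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def p_value(t, sigma, delta):
    """Largest two-sided p value for a bias in [-delta, delta].

    t is the distance between the tested value and the central value.
    """
    return gaussian_tail((t - delta) / sigma) + gaussian_tail((t + delta) / sigma)


def ci_half_width(k, sigma, delta, adaptive=False):
    """Half-width of the k-sigma confidence interval, found by bisection."""
    target = math.erfc(k / math.sqrt(2.0))  # two-sided p value at k sigma
    d = k * delta if adaptive else delta
    lo, hi = 0.0, d + (k + 1.0) * sigma     # p(lo) >= target >= p(hi)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if p_value(mid, sigma, d) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# V_cb average (units of 10^-3): stat 0.34, theo 0.99 (hypercube) or 0.44 (hyperball).
for k in (1, 2, 3, 5):
    fixed = ci_half_width(k, 0.34, 0.99)
    adaptive = ci_half_width(k, 0.34, 0.44, adaptive=True)
    print(f"{k} sigma: 1-hypercube {fixed:.2f}, adaptive hyperball {adaptive:.2f}")
```

With a vanishing theoretical error the function returns \(k\sigma \), as in the nG case. Because the inputs are rounded, the results agree with the tabulated intervals only to within about 0.01 to 0.02; the qualitative pattern (fixed hypercube wider at low significance, adaptive hyperball wider at high significance) is reproduced.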
8 Conclusion
A problem often encountered in particle physics consists in analyzing data within the Standard Model (or some of its extensions) in order to extract information on the fundamental parameters of the model. An essential role is played here by uncertainties, which can be classified in two categories, statistical and theoretical. If the former can be treated in a rigorous manner within a given statistical framework, the latter must be described through models. The problem is particularly acute in flavor physics, as theoretical uncertainties often play a central role in the determination of underlying parameters, such as the four parameters describing the CKM matrix in the Standard Model.
Top: symmetrized determinations of \(\sin (2\beta _\mathrm{eff})\) from various penguin \(b\rightarrow q\bar{q}s\) modes and from charmonia modes [36], and estimate within QCD factorization of the correction from penguin pollution in the Standard Model (symmetrized range quoted in Table 1 in Ref. [63]). We neglect any penguin pollution in the case of the charmonium extraction of \(\sin (2\beta )\). Middle: averages according to the various methods, and corresponding confidence intervals for various significances. Bottom: pulls associated to each determination for each method. For Rfit methods, we quote only the significance of the pull, whereas other methods yield the pull parameter as well as the pull itself under the form \(p\pm \sigma \pm \varDelta \) (significance of the pull)
Mode  \(\sin (2\beta _\mathrm{eff})\)  \(\varDelta S=\sin (2\beta _\mathrm{eff})-\sin (2\beta )\)  \(\sin (2\beta )\)  

\(\pi ^0 K_S\)  \(0.57\pm 0.17\pm 0\)  \( 0.085 \pm 0\pm 0.065\)  \(0.485\pm 0.17\pm 0.065\)  
\(\rho ^0 K_S\)  \(0.525 \pm 0.195\pm 0\)  \(-0.135 \pm 0\pm 0.155\)  \(0.66 \pm 0.195\pm 0.155\)  
\(\eta ' K_S\)  \(0.63\pm 0.06\pm 0\)  \( 0.015 \pm 0\pm 0.015\)  \(0.615\pm 0.06\pm 0.015\)  
\(\phi K_S\)  \(0.73 \pm 0.12\pm 0\)  \( 0.03 \pm 0\pm 0.02\)  \(0.7 \pm 0.12\pm 0.02\)  
\(\omega K_S\)  \(0.71 \pm 0.21\pm 0\)  \( 0.11 \pm 0\pm 0.10\)  \(0.6 \pm 0.21\pm 0.10\)  
\((c\bar{c}) K_S\)  \(0.689\pm 0.018\)  0  \(0.689\pm 0.018\pm 0\) 
Method  Average  1\(\sigma \) CI  2\(\sigma \) CI  3\(\sigma \) CI  5\(\sigma \) CI 

nG  \(0.681\pm 0.017\pm 0\)  \(0.681 \pm 0.017\)  \(0.681 \pm 0.034\)  \(0.681 \pm 0.051\)  \(0.681 \pm 0.085\) 
Naive Rfit  \(0.683 \pm 0.017 \pm 0\)  \(0.683 \pm 0.017\)  \(0.683 \pm 0.034\)  \(0.683 \pm 0.051\)  \(0.683 \pm 0.085\) 
Educ Rfit  \(0.683\pm 0.017\pm 0.\)  \(0.683 \pm 0.017\)  \(0.683 \pm 0.034\)  \(0.683\pm 0.051\)  \(0.683 \pm 0.084\) 
1-hypercube  \(0.681\pm 0.017\pm 0.003\)  \(0.681 \pm 0.017\)  \(0.681\pm 0.034\)  \(0.681\pm 0.052\)  \(0.681 \pm 0.086\) 
Adapt hyperball  \(0.681\pm 0.017\pm 0.002\)  \(0.681 \pm 0.017\)  \(0.681 \pm 0.034\)  \(0.681\pm 0.052\)  \(0.681\pm 0.090\) 
Pull  nG  (e)Rfit  1-hypercube  Adaptive hyperball  

\(\pi ^0 K_S\)  \(1.09\pm 1.00\pm 0 \ (1.1\sigma )\)  \((0.8 \sigma )\)  \(1.09\pm 0.94\pm 0.37 \ (1.1\sigma )\)  \(1.09\pm 0.94\pm 0.36\ (1.1\sigma )\)  
\(\rho ^0 K_S\)  \(0.09\pm 1.00\pm 0 \ (0.1\sigma )\)  \((0.0 \sigma )\)  \(0.09\pm 0.79\pm 0.63 \ (0.1\sigma )\)  \(0.09\pm 0.79\pm 0.62\ (0.1\sigma )\)  
\(\eta ' K_S\)  \(1.16\pm 1.04\pm 0 \ (1.1\sigma )\)  \((0.9 \sigma )\)  \(1.16\pm 1.01\pm 0.28 \ (1.1\sigma )\)  \(1.16\pm 1.01\pm 0.24 \ (1.1\sigma )\)  
\(\phi K_S\)  \(0.16\pm 1.01\pm 0 \ (0.1\sigma )\)  \((0.0 \sigma )\)  \(0.16\pm 1.00\pm 0.19 \ (0.2\sigma )\)  \(0.16\pm 1.00\pm 0.17\ (0.2\sigma )\)  
\(\omega K_S\)  \(0.35\pm 1.00\pm 0 \ (0.3\sigma )\)  \((0.0 \sigma )\)  \(0.35\pm 0.91\pm 0.44 \ (0.3\sigma )\)  \(0.35\pm 0.91\pm 0.43\ (0.4\sigma )\)  
\((c\bar{c}) K_S\)  \(3.79\pm 2.97\pm 0 \ (1.3\sigma )\)  \((1.1 \sigma )\)  \(3.79\pm 2.87\pm 1.63 \ (1.1\sigma )\)  \(3.79\pm 2.87\pm 0.78\ (1.2\sigma )\) 
Inputs for the theory-dominated CKM fits, inspired by the data available in Summer 2014. Scenario A is restricted to the upper part of the table, whereas Scenario B includes all inputs
Observable  Input 

\(V_{ud}\)  \(0.97425\pm 0\pm 0.00022\) 
\(V_{ub}\)  \((3.70\pm 0.12\pm 0.26)\times 10^{-3}\) 
\(V_{cb}\)  \((41.00\pm 0.33\pm 0.74)\times 10^{-3}\) 
\(\varDelta m_{d}\)  \((0.510\pm 0.003)\) ps\(^{-1}\) 
\(\varDelta m_{s}\)  \((17.757\pm 0.021)\) ps\(^{-1}\) 
\(B_s/B_d\)  \(1.023\pm 0.013\pm 0.014\) 
\(B_s\)  \(1.320\pm 0.017\pm 0.030\) 
\(f_{B_s}/f_{B_d}\)  \(1.205\pm 0.004\pm 0.007\) 
\(f_{B_s}\)  \(225.6\pm 1.1\pm 5.4\) MeV 
\(\eta _B\)  \(0.5510\pm 0\pm 0.0022\) 
\(\bar{m}_t\)  \(165.95\pm 0.35 \pm 0.64\) GeV 
\(\alpha \)  \((87.8\pm 3.4)^\circ \) 
\(\sin (2\beta )\)  \(0.682\pm 0.019\) 
\(\gamma \)  \((72.8\pm 6.7)^\circ \) 
Numerical results and p values for the CKM parameters in A and \(\lambda \) for Scenarios A and B, depending on the method chosen. For each quantity, we provide the error budget, whenever possible, and the plots of the p values for Scenarios A (left) and B (right)

Numerical results and p values for the CKM parameters in \(\bar{\rho }\) and \(\bar{\eta }\) for Scenarios A and B, depending on the method chosen. For each quantity, we provide the error budget, whenever possible, and the plots of the p values for Scenarios A (left) and B (right)

We have determined the p values associated with each approach for a measurement involving both statistical and theoretical uncertainties. We have also studied the size of error bars, the significance of deviations and the coverage properties. In general, the most conservative approaches correspond to a naive Gaussian treatment (belonging to the random-\(\delta \) approach) and the adaptive nuisance approach. The latter is better defined and more conservative than the former in the case where statistical and theoretical uncertainties are of similar size. Other approaches (fixed nuisance, external) turn out to be less conservative at large confidence level.
We have then considered extensions to multidimensional cases, focusing on the linear case where the quantity of interest is a linear combination of observables. Due to the presence of several bias parameters, one has to make another choice concerning the shape of the space over which the bias parameters are varied. Two simple examples are the hypercube and the hyperball, leading to a linear or quadratic combination of theoretical uncertainties, respectively. The hypercube is more conservative, as it allows for sets of values of the bias parameters that cannot be reached within the hyperball. On the other hand, the hyperball has the great virtue of associativity, so that one can average different measurements of the same quantity or put all of them in a global fit, without changing its outcome. It also allows us to include theoretical correlations easily, both in the range of variation of biases to determine errors and in the definition of theoretical correlations for the outcome of a fit. We have discussed the average of several measurements using the various approaches, including correlations. We considered in detail the case of 100% correlations leading to a non-invertible covariance matrix. We also discussed global fits and pulls in a linearized context. We have then provided several comparisons between the different approaches using examples from flavor physics: averaging theory-dominated measurements, averaging incompatible measurements, and linear fits to a subset of flavor inputs.
It is now time to determine which choice seems preferable in our case. Random-\(\delta \) has no strong statistical basis: its only advantage consists in its simplicity. External-\(\delta \) is closer in spirit to the determination of systematics as performed by experimentalists, but it starts with an inappropriate null hypothesis and tries to combine an infinite set of p values in a single p value. In contrast, the nuisance-\(\delta \) approach starts from the outset with the correct null hypothesis and deals with a single p value.
This choice is independent from another choice, i.e., the range of variation for the parameter \(\delta \). Indeed, when several bias parameters are involved, one may imagine different multidimensional spaces for their variations, in particular the hyperball and the hypercube. As said earlier, the hyperball has the interesting property of associativity when performing averages and avoids fine-tuned solutions where all parameters are pushed in a corner of phase space. The hypercube is closer in spirit to the Rfit model (even though the latter is not a bias model), but it cannot avoid fine-tuned situations and it does not seem well suited to deal with theoretical correlations, since it is designed from the start to avoid such correlations.
A third choice consists in determining whether one wants to keep the volume of variation fixed (fixed approach), or to modify it depending on the desired confidence level (adaptive approach). Adaptive hypercube is in principle the most conservative choice but, in practice, it yields errors that are too large, whereas fixed hyperball would give very small errors. Fixed hypercube is more conservative at low confidence levels (large p values), whereas adaptive hyperball is more conservative at large confidence levels (small p values).
This overall discussion leads us to consider the nuisance approach with adaptive hyperball as a promising approach to deal with flavor physics problems, which we will investigate in more phenomenological analyses in forthcoming publications [35].
Footnotes
 1.
The issue of theoretical uncertainties is naturally not the only question that arises in the context of statistical analyses. The statistical framework used to perform these analyses is also a matter of choice, with two main approaches, frequentist and Bayesian, adopted in different settings and for various problems in and beyond high-energy physics [1, 2, 3, 8, 9, 10]. In this paper, we choose to focus on the frequentist approach to discuss how to model theoretical uncertainties.
 2.
It may happen that the detector and/or background effects have a sizable impact on the fitted quantities \(\hat{C}\) and \(\hat{S}\); this can be viewed as uncertainties in the modeling of the event PDF f. These effects are reported as systematic uncertainties and in particle physics, it is customary to treat them on the same footing as the pure statistical uncertainties. Although we will not try to follow this avenue in the examples discussed here, it would be possible to consider these systematic uncertainties as theoretical uncertainties, to be modeled according to the methods that we describe in the following sections.
 3.
“Nuisance” does not mean that these parameters are necessarily unphysical, “pollution” parameters. They can be fundamental constants of Nature, and interesting as such.
 4.
We have defined compositeness for numerical hypotheses, since this is our case of interest in the following. More generally, compositeness also occurs in the case of non-numerical hypotheses such as “The Standard Model is true”, for which it is not possible to compute the distribution of data either. Indeed, assuming that the Standard Model is true does not imply anything on the value of its fundamental parameters, and thus one cannot compute the distribution of a given observable under this hypothesis.
 5.
Strictly speaking, the likelihood is only defined for the actually measured data \(X_0\): \({\mathcal L}_0(\chi )\equiv g(X_0;\chi )\) and thus is only a function of the parameters \(\chi \). Nevertheless it is common practice to use the word “likelihood” for the object \(g(X;\chi )\), considered as a function of both the observables X and the parameters \(\chi \).
 6.
More precisely, the asymptotic limit is reached when the model can be linearized for all values of the data that contribute significantly to the integral (4). It corresponds to the situation where the errors on the parameters derived from computing p values are small with respect to the typical parameter scales of the problem.
 7.
A bias is defined as the difference between the average of the estimator among a large number of experiments with finite sample size and the true value. An estimator is said to be consistent if it converges to the true value when the size of the sample tends to infinity (e.g., maximum likelihood estimators). Consistency implies that the bias vanishes asymptotically, while inconsistency may stem from theoretical uncertainties.
 8.
We discuss how the method can be adapted for asymmetric uncertainties in Appendix C.
 9.
 10.
The choice of the weight in the denominator of the test statistic will be discussed in the multidimensional case in Sect. 6.2.3, but it does not impact the result for the p value in one dimension where it plays only the role of an overall normalization that cancels when computing the p value.
 11.
Such a test statistic tends typically to be less sensitive to discrepancies in a global fit than the likelihood ratio. In the presence of quantities having no or little dependence on the scanned parameters, the impact of discrepancies is diluted in the case of the likelihood statistic.
 12.
In full generality, one should have kept the different sources of theoretical uncertainties separated, as their combination in a single theoretical uncertainty depends on the precise model used for theoretical uncertainties. We consider here the result of Ref. [34] where all theoretical uncertainties are already combined.
 13.
This is true at least for approaches where theoretical errors are modeled by fixed bias parameters: the combined error on the quantity of interest is a weighted sum as in Eq. (42), and the maximal value of this quantity can only be made always larger than each individual contribution if the corners of the hypercube are included in the maximization region.
 14.
This problem does not occur in the hyperball case, where the section of the hyperellipsoid by a hyperplane always yields an ellipse symmetric along the diagonal, with an elongation according to the theoretical correlation between the biases.
 15.
As discussed in Sect. 4.4, the overall normalization of \(T(X;\mu )\) is irrelevant to derive unidimensional p values.
 16.
In this section, we will not deal with asymmetric uncertainties, and for illustrative purpose, we symmetrize all uncertainties, statistical and theoretical, following Eq. (C.39).
 17.
In general, for naive Rfit, the tails of the resulting test statistic T are neither Gaussian nor symmetric. However, our approximation is valid to a good accuracy for our illustrative purposes and the examples discussed in this section.
 18.
In addition, we have made further choices concerning the separation of statistical and theoretical uncertainties based on the following considerations. Ref. [60] discusses the sources of uncertainties (scales, function parameters, b-quark mass) within a fit leading to uncertainties assumed to be of statistical nature, with a further systematic uncertainty coming from the difference between the two different schemes. The systematic uncertainties in Ref. [57] are assumed to be of statistical nature in the absence of any opposite statement. For the first two classes (j & s and 3j) hadronization is taken into account by Monte Carlo methods, while for the last two classes (T and C) analytic analyses are made: in the former (latter) case, the hadronic uncertainties are treated as statistical (theoretical).
 19.
In particular, most of the inputs have several sources of theoretical uncertainties, which should be combined together linearly or in quadrature according to the model of theoretical uncertainties chosen. Since we just want to illustrate the difference between the various approaches at the level of the fit, we take as inputs the values obtained in a given framework (Rfit) without recomputing the averages and uncertainties for each approach.
 20.
The definition of \(C_s^+\) can be extended for an arbitrary matrix C in the following way. \(\varSigma \) is defined as the diagonal matrix with entries \(\{\sqrt{C_{11}},\ldots ,\sqrt{C_{NN}}\}\) (if a diagonal entry is 0, one defines \(\varSigma \) with 1 in the corresponding entry). The matrix \(\varGamma =\varSigma ^{-1}.C.\varSigma ^{-1} \) can be written according to a singular value decomposition \(\varGamma =R.D.S\) with two rotation matrices R and S. Once the generalized inverse \(D^+\) is defined, the corresponding generalized inverse of C is defined as \(C^+=\varSigma ^{-1}.S^T.D^+.R^T.\varSigma ^{-1}\).
 21.
One could try to symmetrize the problem, but one would lose the connection with the Cholesky decomposition, with the unpleasant feature that all domains of variation would be identical and thus do not take into account correlations.
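As an illustration of the construction of footnote 20, one may work out the case of two fully correlated uncertainties \(\sigma _1,\sigma _2>0\), for which the covariance matrix is not invertible. The example below is ours; it assumes that \(D^+\) is obtained by inverting the non-vanishing diagonal entries of D and leaving the vanishing ones at zero, and it uses orthogonal matrices R and S.

```latex
\begin{align}
C &= \begin{pmatrix} \sigma_1^2 & \sigma_1\sigma_2 \\ \sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},
\qquad \varSigma = \mathrm{diag}(\sigma_1,\sigma_2),
\qquad \varGamma = \varSigma^{-1}.C.\varSigma^{-1}
  = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \\
\varGamma &= R.D.S,
\qquad R = S^T = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},
\qquad D = \mathrm{diag}(2,0),
\qquad D^+ = \mathrm{diag}(1/2,0), \\
C^+ &= \varSigma^{-1}.S^T.D^+.R^T.\varSigma^{-1}
  = \frac{1}{4} \begin{pmatrix} 1/\sigma_1^2 & 1/(\sigma_1\sigma_2) \\
                                 1/(\sigma_1\sigma_2) & 1/\sigma_2^2 \end{pmatrix}.
\end{align}
```

One can check directly that \(C.C^+.C=C\), as expected of a generalized inverse.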
Notes
Acknowledgements
We would like to thank S. T’Jampens for collaboration at an early stage of this work, as well as all our collaborators from the CKMfitter group for many useful discussions on the statistical issues covered in this article. We would also like to express a special thanks to the Mainz Institute for Theoretical Physics (MITP) for its hospitality and support during the workshop “Fundamental parameters from lattice QCD” where part of this work was presented and discussed. LVS acknowledges financial support from the Labex P2IO (Physique des 2 Infinis et Origines). SDG acknowledges partial support from Contract FPA201461478EXP. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant agreements Nos. 690575, 674896 and 692194.
References
 1. F. James, Statistical Methods in Experimental Physics (World Scientific, Hackensack, 2006)
 2. G. Cowan, Statistics for Searches at the LHC. doi: 10.1007/9783319053622_9. arXiv:1307.2487 [hep-ex]
 3. M.G. Kendall, A. Stuart, The Advanced Theory of Statistics (Griffin, London, 1969)
 4. P. Sinervo, eConf C 030908, TUAT004 (2003)
 5. M. Schmelling, arXiv:hep-ex/0006004
 6. W.A. Rolke, A.M. Lopez, J. Conrad, Nucl. Instrum. Methods A 551, 493 (2005). arXiv:physics/0403059
 7. R.D. Cousins, J.T. Linnemann, J. Tucker, Nucl. Instrum. Methods A 595, 480 (2008)
 8. W.M. Bolstad, J.M. Curran, Introduction to Bayesian Statistics (Wiley, New York, 2016)
 9. G. D’Agostini, Rep. Prog. Phys. 66, 1383 (2003). doi: 10.1088/00344885/66/9/201. arXiv:physics/0304102
 10. A.J. Bevan, Statistical Data Analysis for the Physical Sciences (Cambridge Press, Cambridge, 2013)
 11. S. Schael et al., ALEPH and DELPHI and L3 and OPAL and LEP Electroweak Collaborations, Phys. Rep. 532, 119 (2013). arXiv:1302.3415 [hep-ex]
 12. M. Baak et al., Gfitter Group Collaboration, Eur. Phys. J. C 74, 3046 (2014). arXiv:1407.3792 [hep-ph]
 13. R. Aaij et al., LHCb Collaboration, Eur. Phys. J. C 73(4), 2373 (2013). arXiv:1208.3355 [hep-ex]
 14. A.J. Bevan et al., BaBar and Belle Collaborations, Eur. Phys. J. C 74, 3026 (2014). arXiv:1406.6311 [hep-ex]
 15. A. Hocker, H. Lacker, S. Laplace, F. Le Diberder, Eur. Phys. J. C 21, 225 (2001). arXiv:hep-ph/0104062
 16. J. Charles et al., CKMfitter Group Collaboration, Eur. Phys. J. C 41, 1 (2005). arXiv:hep-ph/0406184
 17. J. Charles et al., Phys. Rev. D 84, 033005 (2011). arXiv:1106.4041 [hep-ph]
 18. O. Deschamps, S. Descotes-Genon, S. Monteil, V. Niess, S. T’Jampens, V. Tisserand, Phys. Rev. D 82, 073012 (2010). arXiv:0907.5135 [hep-ph]
 19. A. Lenz et al., Phys. Rev. D 83, 036004 (2011). arXiv:1008.1593 [hep-ph]
 20. A. Lenz, U. Nierste, J. Charles, S. Descotes-Genon, H. Lacker, S. Monteil, V. Niess, S. T’Jampens, Phys. Rev. D 86, 033008 (2012). arXiv:1203.0238 [hep-ph]
 21. J. Charles, S. Descotes-Genon, Z. Ligeti, S. Monteil, M. Papucci, K. Trabelsi, Phys. Rev. D 89(3), 033016 (2014). arXiv:1309.2293 [hep-ph]
 22. J. Charles, O. Deschamps, S. Descotes-Genon, H. Lacker, A. Menzel, S. Monteil, V. Niess, J. Ocariz et al., Phys. Rev. D 91(7), 073007 (2015). arXiv:1501.05013 [hep-ph]
 23. B. Aubert et al., BaBar Collaboration, Phys. Rev. D 79, 072009 (2009). doi: 10.1103/PhysRevD.79.072009. arXiv:0902.1708 [hep-ex]
 24. I. Adachi et al., Phys. Rev. Lett. 108, 171802 (2012). doi: 10.1103/PhysRevLett.108.171802. arXiv:1201.4643 [hep-ex]
 25. R. Aaij et al., LHCb Collaboration, Phys. Rev. Lett. 115(3), 031601 (2015). doi: 10.1103/PhysRevLett.115.031601. arXiv:1503.07089 [hep-ex]
 26. I.I.Y. Bigi, V.A. Khoze, N.G. Uraltsev, A.I. Sanda, Adv. Ser. Direct. High Energy Phys. 3, 175 (1989). doi: 10.1142/9789814503280_0004
 27. J. Neyman, E.S. Pearson, Philos. Trans. R. Soc. Lond. A 231, 289–337 (1933)
 28. S.S. Wilks, Ann. Math. Stat. 9(1), 60–62 (1938)
 29. F.C. Porter, arXiv:0806.0530 [physics.data-an]
 30. S. Fichet, G. Moreau, Nucl. Phys. B 905, 391 (2016). arXiv:1509.00472 [hep-ph]
 31. S. Fichet, Nucl. Phys. B 911, 623 (2016). arXiv:1603.03061 [hep-ph]
 32. G. Dubois-Felsmann, D.G. Hitlin, F.C. Porter, G. Eigen, arXiv:hep-ph/0308262
 33. G. Eigen, G. Dubois-Felsmann, D.G. Hitlin, F.C. Porter, Phys. Rev. D 89(3), 033004 (2014). arXiv:1301.5867 [hep-ex]
 34. K.A. Olive et al., Particle Data Group Collaboration, Chin. Phys. C 38, 090001 (2014)
 35. J. Charles et al., Work in progress
 36. Y. Amhis et al., Heavy Flavor Averaging Group (HFAG) Collaboration, arXiv:1412.7515 [hep-ex]
 37. S. Aoki et al., arXiv:1607.00299 [hep-lat]
 38. M. Schmelling, Phys. Scr. 51, 676 (1995)
 39. H. Ruben, Ann. Math. Stat. 33(2), 542–570 (1962)
 40. A. Castaño-Martínez, F. López-Blázquez, TEST 14, 397 (2005)
 41. M. Constantinou et al., ETM Collaboration, Phys. Rev. D 83, 014505 (2011). arXiv:1009.5606 [hep-lat]
 42. J. Laiho, R.S. Van de Water, PoS LATTICE 2011, 293 (2011). arXiv:1112.4861 [hep-lat]
 43. S. Durr, Z. Fodor, C. Hoelbling, S.D. Katz, S. Krieg, T. Kurth, L. Lellouch, T. Lippert et al., Phys. Lett. B 705, 477 (2011). arXiv:1106.3230 [hep-lat]
 44. R. Arthur et al., RBC and UKQCD Collaborations, Phys. Rev. D 87, 094514 (2013). arXiv:1208.4412 [hep-lat]
 45. T. Bae et al., SWME Collaboration, arXiv:1402.0048 [hep-lat]
 46. B. Blossier et al., JHEP 0907, 043 (2009). arXiv:0904.0954 [hep-lat]
 47. C.T.H. Davies, C. McNeile, E. Follana, G.P. Lepage, H. Na, J. Shigemitsu, Phys. Rev. D 82, 114504 (2010). arXiv:1008.4018 [hep-lat]
 48. A. Bazavov et al., Fermilab Lattice and MILC Collaboration, Phys. Rev. D 85, 114506 (2012). arXiv:1112.3051 [hep-lat]
 49. A. Bazavov et al., Fermilab Lattice and MILC Collaborations, Phys. Rev. D 90(7), 074509 (2014). arXiv:1407.3772 [hep-lat]
 50. N. Carrasco, P. Dimopoulos, R. Frezzotti, P. Lami, V. Lubicz, F. Nazzaro, E. Picca, L. Riggio et al., Phys. Rev. D 91(5), 054507 (2015). arXiv:1411.7908 [hep-lat]
 51. D. d’Enterria, P.Z. Skands, arXiv:1512.05194 [hep-ph]
 52. G. Dissertori, A. Gehrmann-De Ridder, T. Gehrmann, E.W.N. Glover, G. Heinrich, G. Luisoni, H. Stenzel, JHEP 0908, 036 (2009). arXiv:0906.3436 [hep-ph]
 53. G. Abbiendi et al., OPAL Collaboration, Eur. Phys. J. C 71, 1733 (2011). arXiv:1101.1470 [hep-ex]
 54. S. Bethke et al., JADE Collaboration, Eur. Phys. J. C 64, 351 (2009). arXiv:0810.1389 [hep-ex]
 55. G. Dissertori, A. Gehrmann-De Ridder, T. Gehrmann, E.W.N. Glover, G. Heinrich, H. Stenzel, Phys. Rev. Lett. 104, 072002 (2010). arXiv:0910.4283 [hep-ph]
 56. J. Schieck et al., JADE Collaboration, Eur. Phys. J. C 73(3), 2332 (2013). arXiv:1205.3714 [hep-ex]
 57. T. Becher, M.D. Schwartz, JHEP 0807, 034 (2008). arXiv:0803.0342 [hep-ph]
 58. R.A. Davison, B.R. Webber, Eur. Phys. J. C 59, 13 (2009). arXiv:0809.3326 [hep-ph]
 59. R. Abbate, M. Fickinger, A.H. Hoang, V. Mateu, I.W. Stewart, Phys. Rev. D 83, 074021 (2011). arXiv:1006.3080 [hep-ph]
 60. T. Gehrmann, G. Luisoni, P.F. Monni, Eur. Phys. J. C 73(1), 2265 (2013). arXiv:1210.6945 [hep-ph]
 61. A.H. Hoang, D.W. Kolodrubetz, V. Mateu, I.W. Stewart, Phys. Rev. D 91(9), 094018 (2015). arXiv:1501.04111 [hep-ph]
 62. S. Bethke, G. Dissertori, G.P. Salam, EPJ Web Conf. 120, 07005 (2016). doi: 10.1051/epjconf/201612007005
 63. M. Beneke, Phys. Lett. B 620, 143 (2005). arXiv:hep-ph/0505075
 64. B.O. Lange, M. Neubert, G. Paz, Phys. Rev. D 72, 073006 (2005). arXiv:hep-ph/0504071
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Funded by SCOAP^{3}